So... I got the trace logging going, and it provided a little insight. There were a bunch of aborted traces with fallbacks to the interpreter, even though there is not a single NYI call in the critical parts. I could eliminate all of these aborts by cranking the JIT options up to much higher values.
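For anyone who wants to see the same output: the trace log comes from the jit.v module that ships with LuaJIT (jit.dump gives even more detail). A minimal setup, assuming a stock LuaJIT install:

```lua
-- Print one line per trace event (trace start, abort reason, etc.).
-- Pass a filename to log to a file instead of stderr.
require("jit.v").start("jit_verbose.log")

-- For per-trace bytecode/IR/mcode dumps, use jit.dump instead:
-- require("jit.dump").start("tbim", "jit_dump.log")
```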
I now have this in my conf.lua:
Code: Select all
local JIT_MAGIC = 5000
jit.opt.start(3,
    'maxtrace=' .. JIT_MAGIC,        -- max. number of traces in the cache
    'maxrecord=' .. JIT_MAGIC * 4,   -- max. recorded IR instructions per trace
    'maxirconst=' .. JIT_MAGIC * 4,  -- max. IR constants per trace
    'maxmcode=' .. JIT_MAGIC * 4,    -- machine code limit (in KB)
    'maxside=' .. JIT_MAGIC,         -- max. side traces per root trace
    'tryside=' .. JIT_MAGIC,         -- attempts to compile a side trace
    'maxsnap=' .. JIT_MAGIC,         -- max. snapshots per trace
    'instunroll=' .. JIT_MAGIC,
    'loopunroll=' .. JIT_MAGIC
)
The code runs well enough with these options, but there is still the occasional slowdown, and sometimes it's just crawling at 5% of the usual speed right from the start.
The GC counter is still spinning, which is especially frustrating because I took great care to avoid garbage buildup, as well as small loops and branches.
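One thing worth trying against GC-related hiccups, independent of the JIT options: tuning the collector itself via the standard Lua 5.1 collectgarbage API. A sketch with made-up numbers (the defaults for both knobs are 200, so these would need tuning for the actual workload):

```lua
-- Wait longer between collection cycles (percentage; default 200).
collectgarbage("setpause", 400)

-- Do less collection work per allocation step (percentage; default 200),
-- spreading GC cost over more frames instead of spiking.
collectgarbage("setstepmul", 150)
```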
What's left are aborts labeled "too many snapshots". I couldn't get rid of those no matter how large I made the maxsnap option. And "leaving loop in root trace", no idea what to do about these; they seem to be a normal occurrence. "Loop unroll limit reached" was another abort reason. Seems harmless enough, and there were only two of them; there are not many loops in the code anyway.
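For the "too many snapshots" aborts, the usual advice (which I haven't been able to verify against my real code) is that every guard inside a hot loop costs a snapshot, so reducing data-dependent branching in the loop body helps more than raising maxsnap. A contrived sketch of the idea, splitting one branchy loop into branch-free passes over pre-partitioned data:

```lua
-- Instead of one loop with an if/else per element (each branch adds
-- guards and snapshots to the trace), keep active and idle objects in
-- separate arrays and run a branch-free loop over the active ones.
local function step_all(movers, dt)
    for i = 1, #movers do
        local m = movers[i]
        m.x = m.x + m.vx * dt
        m.y = m.y + m.vy * dt
    end
    -- idle objects live in another list and are skipped entirely
end
```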
And as previously mentioned, I invented a new tool to fight slowdowns: the no-op loop.
Code: Select all
for _ = 1, 100 do value = value end
I have two loops like this at random places in the code to prevent catastrophic slowdowns. I would love to understand what the hell is going on there. I wanted to isolate a test case but there's too much code to remove and too little determinism.