I hit the jackpot on my first try: I started a game, and at the end of the first level, with the screen black, the game froze (during the transition you mention, I guess).
Out of curiosity, I attached gdb to the still-running process (gdb --pid <pid>). To my surprise, it printed some useful information. The backtrace was useless, of course, but then I started stepping through instructions (gdb command si) and it printed the names of several functions it was executing. Here are the functions that got executed:
lj_BC_GGET
lj_BC_TGETS
lj_BC_CALL
lj_ff_ipairs
lj_fff_res
lj_fff_res_
lj_BC_JMP
lj_BC_ITERC
lj_ff_ipairs_aux
lj_fff_res2
lj_fff_res
lj_fff_res_
lj_BC_JITERL
lj_BC_JLOOP
??
lj_vm_next
??
lj_vm_next
??
lj_vm_next
??
lj_vm_exit_interp
lj_BC_ITERC
lj_ff_ipairs_aux
lj_fff_res2
lj_fff_res0
lj_fff_res
lj_BC_JITERL
lj_BC_ITERC
lj_ff_next
lj_tab_next
lj_tab_keyindex
hashkey.isra
lj_tab_keyindex
lj_obj_equal
lj_tab_keyindex
lj_obj_equal
lj_tab_keyindex
lj_obj_equal
lj_tab_keyindex
lj_tab_next
lj_ff_next
lj_fff_res2
lj_fff_res
lj_fff_res_
lj_BC_IITERL
lj_BC_ISEQP
lj_BC_GGET
lj_BC_TGETS
The lj_BC_xxx names mostly seem to correspond to the interpreter opcodes. At some point it enters lj_BC_JITERL followed by lj_BC_JLOOP, and after that it executes unnamed code (the ?? entries), which I believe is JIT-compiled code. Later it executes lj_vm_exit_interp, which I take as leaving compiled mode and returning to interpreter mode. There are some functions with names like lj_ff_*, lj_fff_* and lj_tab_*, which I take to be internal functions. Most relevantly, it's executing an ipairs() loop; towards the end lj_ff_next shows up as well, which suggests a next() or pairs() iteration.
I think I gave up tracing too early; unfortunately, I only got two freezes (consecutive!), and after that I couldn't get another one to trace further.
With more function names, it might become possible to identify which section of the code is executing and why it's freezing. I ran it about 20-30 more times, but I didn't get any other freezes.
After that first debugging session, my first suspicion was that it was executing the function findName(), so I added a print() statement to it, but I didn't get a freeze at the start of the function. I then added a print inside the outer loop, but I got no further freezes, so I couldn't test any more. Also, the function looks quite straightforward, and it doesn't look like it has any reason to freeze. It's still possible that the loop is in a caller, though.
If you can reproduce it again, try stepping with gdb and taking note of the names of the functions it executes, and post them here as I did.
It's also a good idea to print backtraces on suspects: print(debug.traceback()). If you manage to place one at the point where it gets stuck, you can see where the infinite loop is happening and where it is called from.
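Since the freeze is so rare, a plain print would mostly produce noise. One way to get a traceback only when a loop really runs away is a small iteration guard. This is just a sketch: makeLoopGuard and the limit are names and values I made up for illustration, and the message goes to stderr so it is less likely to get stuck in an output buffer when the process hangs.

```lua
-- Hypothetical helper (not part of the game or of LÖVE): returns a function
-- to call once per iteration of a suspect loop. When the number of calls
-- reaches `limit`, it prints a traceback once and returns true.
local function makeLoopGuard(label, limit)
  local count = 0
  return function()
    count = count + 1
    if count == limit then
      -- Level 2 starts the traceback at the function that called the guard.
      local msg = label .. ": reached " .. limit .. " iterations"
      io.stderr:write(debug.traceback(msg, 2), "\n")
      return true
    end
    return false
  end
end

-- Usage sketch: create the guard right before the suspect loop.
local guard = makeLoopGuard("suspect loop", 1000000)
for i = 1, 10 do
  guard() -- far below the limit here, so nothing is printed
end
```

The guard has to be created right before the loop it watches (not once at load time), otherwise the count accumulates across calls and you get a false alarm. The limit should be well above anything the loop does in normal play.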
Side note: instead of kill -9, you can interrupt the process by sending SIGQUIT (rather than SIGINT) with Ctrl+\ or Ctrl+4, or both, depending on your keyboard.
Edit: For a bug that was similarly tough to reproduce, I used the method that I outline here:
viewtopic.php?p=229957#p229957 - basically, make the delta time fixed (I used 0.015625, which is 1/64; that's exactly representable in floating point, so there are no rounding errors); count frames; record all key presses and releases along with the frame on which they happen; and once you get a crash, replay them, each at its corresponding frame count.
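A rough sketch of that record/replay idea in plain Lua, in case it helps. The names (newRecorder, newReplayer, FIXED_DT) are made up for this post and are not what the linked code uses. It assumes step() is called at the start of every love.update and that the returned value is used in place of the real dt, with record() called from love.keypressed and love.keyreleased.

```lua
-- 1/64 is exactly representable in binary floating point.
local FIXED_DT = 1 / 64

-- Recorder: counts frames and stores each input event with the number of
-- updates that had started when the event arrived.
local function newRecorder()
  local rec = { frame = 0, events = {} }
  function rec:step()
    self.frame = self.frame + 1
    return FIXED_DT
  end
  function rec:record(kind, key) -- kind: "pressed" or "released"
    self.events[#self.events + 1] = { frame = self.frame, kind = kind, key = key }
  end
  return rec
end

-- Replayer: before each update, feeds back every event that was recorded
-- at the current frame count, then advances the counter.
local function newReplayer(events, handler)
  local rep = { frame = 0, index = 1 }
  function rep:step()
    while events[self.index] and events[self.index].frame <= self.frame do
      local e = events[self.index]
      handler(e.kind, e.key)
      self.index = self.index + 1
    end
    self.frame = self.frame + 1
    return FIXED_DT
  end
  return rep
end
```

For the replay, the handler would call the same code that love.keypressed and love.keyreleased normally run. Anything else that varies between runs (random seeds, for example) would also need to be fixed for the replay to follow the same path.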