Very nice library!
I especially like how non-intrusive it is. That's always a big plus in my book.
Two things:
1. As far as I can tell, recursion is not handled correctly. Here's an example:
Code:
local profile = require("profile/profile")
profile.setclock(love.timer.getTime) -- this will measure wall time, which is more intuitive for this example

local function g(s)
    love.timer.sleep(1)
    if s > 0 then
        g(s-1)
    end
end

local function f()
    love.timer.sleep(1)
    g(7)
end

function love.load()
    profile.hookall("Lua")
    profile.start()
end

function love.update()
    f()
    print(profile.report())
    os.exit(0)
end
Running this prints the following report:
Code:
Profiling report
# fname count time source
================================================================================
1. g x 8 1.00101 main.lua:4
2. f x 1 9.00810 main.lua:11
3. [string "boot.lua"]:188 x 1 0.00001 [string "boot.lua"]:188
4. [string "boot.lua"]:182 x 1 0.00000 [string "boot.lua"]:182
5. update x 1 0.00000 main.lua:21
6. load x 0 0.00000 main.lua:16
According to the report, 9 seconds were spent in f, which is what we expect, but only 1 second in g, which is not correct: g is called 8 times and sleeps for 1 second per call, so the outermost call to g alone spans 8 seconds. The call counts, however, are correct for both functions.
I believe that this is caused by profile.lua:59: here, the timestamp is updated even if the function has not returned yet, so every recursive call overwrites the start time recorded for the outer call.
An easy patch seems to be:
Code:
_tcalled[f] = _tcalled[f] or clock()
Since _tcalled is cleared once a function returns, this change ensures that we only update the timestamp after the function has returned.
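To convince myself, I mocked the accounting with a fake clock so the arithmetic is exact (this is my own sketch, not profile.lua's code; on_call/on_return just stand in for the call/return branches of the hook):

```lua
-- Minimal mock of the hook's accounting (my names, not profile.lua's).
-- A fake clock makes the numbers exact: t advances by 1 per simulated sleep.
local t = 0
local function clock() return t end

local _tcalled, _telapsed = {}, {}

local function on_call(f)
    -- buggy original: _tcalled[f] = clock() -- overwrites the outer start time
    _tcalled[f] = _tcalled[f] or clock()     -- patched line
end

local function on_return(f)
    if _tcalled[f] then
        _telapsed[f] = (_telapsed[f] or 0) + (clock() - _tcalled[f])
        _tcalled[f] = nil
    end
end

-- Same shape as g from the example: sleep 1 "second", then recurse.
local function g(s)
    on_call(g)
    t = t + 1 -- stands in for love.timer.sleep(1)
    if s > 0 then
        g(s-1)
    end
    on_return(g)
end

g(7)
print(_telapsed[g]) -- 8 with the patched line; 1 with the buggy one
```

With the unconditional timestamp write, only the innermost slice (1 unit) survives, matching the report above; the patched line attributes all 8 units.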
Edit: This won't work for more complex cases. E.g. in the example above, my fix doesn't work if function g is:
Code:
local function g(s)
    if s > 0 then
        g(s-1)
        g(s-1)
    end
    love.timer.sleep(1)
end
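One approach that does survive this branching case is to track the recursion depth per function and only start/stop the clock at the outermost level. Again just a sketch with a fake clock and manual instrumentation; `_depth` is my own addition, not something profile.lua currently has:

```lua
-- Per-function recursion depth: the timestamp is written only on the
-- outermost call and consumed only on the outermost return.
-- Fake clock again: t advances by 1 per simulated sleep.
local t = 0
local function clock() return t end

local _tcalled, _telapsed, _depth = {}, {}, {}

local function on_call(f)
    _depth[f] = (_depth[f] or 0) + 1
    if _depth[f] == 1 then
        _tcalled[f] = clock()
    end
end

local function on_return(f)
    _depth[f] = _depth[f] - 1
    if _depth[f] == 0 then
        _telapsed[f] = (_telapsed[f] or 0) + (clock() - _tcalled[f])
        _tcalled[f] = nil
    end
end

-- The branching variant of g: recurse twice, then sleep.
local function g(s)
    on_call(g)
    if s > 0 then
        g(s-1)
        g(s-1)
    end
    t = t + 1 -- stands in for love.timer.sleep(1)
    on_return(g)
end

g(2)
print(_telapsed[g]) -- 7: all seven frames (1 + 2 + 4), one "second" each
```

The same depth guard could presumably wrap the two branches of profile.lua's hook, since the hook already knows which function it was triggered for.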
2. I implemented a metric that tracks the time spent within a function itself, excluding calls to other functions (similar to the --no-children option of Linux's perf tool). In the example above, only 1 second is reported for f instead of 9, since only 1 second was actually spent within f itself. This metric can sometimes make it easier to find the actual bottleneck.
The patch to add this metric is attached, in case you'd like to add it to the library. From some quick tests it seems to work as intended, but I'm expecting to hear about weird cases where it doesn't.
The new metric is returned by the iterator right after the total time, and you can sort the report by it by passing "tnochildren" to profile.query.
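For the curious, the idea behind the metric can be sketched independently of the attached patch: keep a shadow call stack and, whenever a callee starts, charge the caller for the slice it just ran. All names here are mine, and the fake clock again advances by 1 per simulated sleep:

```lua
-- Sketch of exclusive ("no children") timing via a shadow call stack.
local t = 0
local function clock() return t end

local _self = {}   -- f -> time spent in f itself, excluding callees
local _stack = {}  -- entries: { func = f, since = <clock when f last resumed> }

local function on_call(f)
    local top = _stack[#_stack]
    if top then -- charge the caller for the slice it just ran
        _self[top.func] = (_self[top.func] or 0) + (clock() - top.since)
    end
    _stack[#_stack + 1] = { func = f, since = clock() }
end

local function on_return()
    local top = table.remove(_stack)
    _self[top.func] = (_self[top.func] or 0) + (clock() - top.since)
    local caller = _stack[#_stack]
    if caller then
        caller.since = clock() -- the caller's clock resumes here
    end
end

-- f and g from the first example, with t + 1 standing in for sleep(1).
local function g(s)
    on_call(g)
    t = t + 1
    if s > 0 then
        g(s-1)
    end
    on_return()
end

local function f()
    on_call(f)
    t = t + 1
    g(7)
    on_return()
end

f()
print(_self[f], _self[g]) -- 1  8: only 1 "second" was spent in f itself
```

The inclusive time of f is still 9 units, but the exclusive metric correctly reports 1 for f and 8 for g, which is what makes the real bottleneck stand out.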