Very nice library!
I especially like how non-intrusive it is. That's always a big plus in my book.
Two things:
1. As far as I can tell, recursion is not handled correctly. Here's an example:
Code:
local profile = require("profile/profile")
profile.setclock(love.timer.getTime) -- this will measure wall time, which is more intuitive for this example

local function g(s)
    love.timer.sleep(1)
    if s > 0 then
        g(s-1)
    end
end

local function f()
    love.timer.sleep(1)
    g(7)
end

function love.load()
    profile.hookall("Lua")
    profile.start()
end

function love.update()
    f()
    print(profile.report())
    os.exit(0)
end
Running this prints the following report:
Code:
Profiling report
# fname count time source
================================================================================
1. g x 8 1.00101 main.lua:4
2. f x 1 9.00810 main.lua:11
3. [string "boot.lua"]:188 x 1 0.00001 [string "boot.lua"]:188
4. [string "boot.lua"]:182 x 1 0.00000 [string "boot.lua"]:182
5. update x 1 0.00000 main.lua:21
6. load x 0 0.00000 main.lua:16
According to the report, 9 seconds were spent in f, which is what we expect, but only 1 second in g, which is not correct: g is called 8 times and sleeps for 1 second per call, so the outermost call to g alone spans 8 seconds. The call counts, however, are correct for both functions.
I believe that this is caused by profile.lua:59: here, the timestamp is updated even if the function has not returned yet, so every recursive call overwrites the start time recorded for the outer call.
An easy patch seems to be:
Code:
_tcalled[f] = _tcalled[f] or clock()
Since _tcalled is cleared once a function returns, this change ensures that we only update the timestamp after the function has returned.
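To convince myself, I mocked the accounting with a fake clock so the arithmetic is exact (this is my own sketch, not profile.lua's code; on_call/on_return just stand in for the call/return branches of the hook):

```lua
-- Minimal mock of the hook's accounting (my names, not profile.lua's).
-- A fake clock makes the numbers exact: t advances by 1 per simulated sleep.
local t = 0
local function clock() return t end

local _tcalled, _telapsed = {}, {}

local function on_call(f)
    -- buggy original: _tcalled[f] = clock() -- overwrites the outer start time
    _tcalled[f] = _tcalled[f] or clock()     -- patched line
end

local function on_return(f)
    if _tcalled[f] then
        _telapsed[f] = (_telapsed[f] or 0) + (clock() - _tcalled[f])
        _tcalled[f] = nil
    end
end

-- Same shape as g from the example: sleep 1 "second", then recurse.
local function g(s)
    on_call(g)
    t = t + 1 -- stands in for love.timer.sleep(1)
    if s > 0 then
        g(s-1)
    end
    on_return(g)
end

g(7)
print(_telapsed[g]) -- 8 with the patched line; 1 with the buggy one
```

With the unconditional timestamp write, only the innermost slice (1 unit) survives, matching the report above; the patched line attributes all 8 units.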
Edit: This won't work for more complex cases. E.g. in the example above, my fix doesn't work if function g is:
Code:
local function g(s)
    if s > 0 then
        g(s-1)
        g(s-1)
    end
    love.timer.sleep(1)
end
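One approach that does survive this branching case is to track the recursion depth per function and only start/stop the clock at the outermost level. Again just a sketch with a fake clock and manual instrumentation; `_depth` is my own addition, not something profile.lua currently has:

```lua
-- Per-function recursion depth: the timestamp is written only on the
-- outermost call and consumed only on the outermost return.
-- Fake clock again: t advances by 1 per simulated sleep.
local t = 0
local function clock() return t end

local _tcalled, _telapsed, _depth = {}, {}, {}

local function on_call(f)
    _depth[f] = (_depth[f] or 0) + 1
    if _depth[f] == 1 then
        _tcalled[f] = clock()
    end
end

local function on_return(f)
    _depth[f] = _depth[f] - 1
    if _depth[f] == 0 then
        _telapsed[f] = (_telapsed[f] or 0) + (clock() - _tcalled[f])
        _tcalled[f] = nil
    end
end

-- The branching variant of g: recurse twice, then sleep.
local function g(s)
    on_call(g)
    if s > 0 then
        g(s-1)
        g(s-1)
    end
    t = t + 1 -- stands in for love.timer.sleep(1)
    on_return(g)
end

g(2)
print(_telapsed[g]) -- 7: all seven frames (1 + 2 + 4), one "second" each
```

The same depth guard could presumably wrap the two branches of profile.lua's hook, since the hook already knows which function it was triggered for.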
2. I implemented a metric that tracks the time spent within a function itself, excluding calls to other functions (similar to the --no-children option of Linux's perf tool). In the example above, only 1 second is reported for f instead of 9, since only 1 second was actually spent within f itself. This metric can sometimes make it easier to find the actual bottleneck.
The patch to add this metric is attached, in case you'd like to add it to the library. From some quick tests it seems to work as intended, but I'm expecting to hear about weird cases where it doesn't.
The new metric is returned by the iterator right after the total time, and you can sort the report by it by passing "tnochildren" to profile.query.
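For the curious, the idea behind the metric can be sketched independently of the attached patch: keep a shadow call stack and, whenever a callee starts, charge the caller for the slice it just ran. All names here are mine, and the fake clock again advances by 1 per simulated sleep:

```lua
-- Sketch of exclusive ("no children") timing via a shadow call stack.
local t = 0
local function clock() return t end

local _self = {}   -- f -> time spent in f itself, excluding callees
local _stack = {}  -- entries: { func = f, since = <clock when f last resumed> }

local function on_call(f)
    local top = _stack[#_stack]
    if top then -- charge the caller for the slice it just ran
        _self[top.func] = (_self[top.func] or 0) + (clock() - top.since)
    end
    _stack[#_stack + 1] = { func = f, since = clock() }
end

local function on_return()
    local top = table.remove(_stack)
    _self[top.func] = (_self[top.func] or 0) + (clock() - top.since)
    local caller = _stack[#_stack]
    if caller then
        caller.since = clock() -- the caller's clock resumes here
    end
end

-- f and g from the first example, with t + 1 standing in for sleep(1).
local function g(s)
    on_call(g)
    t = t + 1
    if s > 0 then
        g(s-1)
    end
    on_return()
end

local function f()
    on_call(f)
    t = t + 1
    g(7)
    on_return()
end

f()
print(_self[f], _self[g]) -- 1  8: only 1 "second" was spent in f itself
```

The inclusive time of f is still 9 units, but the exclusive metric correctly reports 1 for f and 8 for g, which is what makes the real bottleneck stand out.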