Re: Lua Performance Tips
Posted: Thu Aug 11, 2011 4:27 pm
Bruce Whiteside wrote: Make it work before you make it work fast.
Do agree with that.
vrld wrote: LuaProfiler is great for that.
How would one use that with Love? I saw the require, but it's also talking about compiling, which I'm pretty rusty on.
Roland_Yonaba wrote: I remember I have been using LuaProfiler the last week, while writing my basic RTS project.
It is pretty simple to use. Just place the 'profiler.dll' near your program folder, and load it this way:
Code: Select all
local profiler = require 'profiler'
No need to add '.dll' extension, require does this automatically.
Then start the profiler where you want in your script this way:
Code: Select all
profiler.start()
And stop it this way:
Code: Select all
profiler.stop()
Pretty simple. There will be an outputted file generally named 'lprof_xxxx.out', open it using any text editor to see the results.
I get something about lua5.1.dll?
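One way to see exactly what goes wrong when the module fails to load is to wrap the require in pcall and print the error instead of letting it abort the game. This is only a sketch: it assumes the start/stop calls quoted above, and the `profiled` helper is a hypothetical name, not part of LuaProfiler.

```lua
-- Try to load LuaProfiler without aborting if the binary module is missing
-- or was built for a different Lua DLL or architecture.
local ok, profiler = pcall(require, 'profiler')
if not ok then
  -- On failure, the second value returned by pcall is the error message.
  print('LuaProfiler not loaded: ' .. tostring(profiler))
  profiler = nil
end

-- Hypothetical helper: profile a single call when the module loaded,
-- otherwise just run the function. Returns only the first result of fn.
local function profiled(fn, ...)
  if profiler then profiler.start() end
  local result = fn(...)
  if profiler then profiler.stop() end
  return result
end

print(profiled(function(a, b) return a + b end, 1, 2))
```

The printed error message usually names the file or DLL that could not be found, which is a good starting point for the lua5.1.dll question.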
Code: Select all
################## test01
Time for 10000000 cycles (in ms):
localize_function_yes: 3388.999939 (100%) (0.000339 per cycle)
localize_function_no: 4285.999894 (126%) (0.000429 per cycle)
CONCLUSIONS:
- Always localize functions.
################## test02
Time for 10000000 cycles (in ms):
localize_method_yes: 30894.996643 (100%) (0.003089 per cycle)
localize_method_no: 33195.000648 (107%) (0.003320 per cycle)
CONCLUSIONS:
- If method is called many times in the function or loop, localize it first
inside the function or loop.
################## test03
Time for 10000000 cycles (in ms):
unpack_no: 4053.001404 (100%) (0.000405 per cycle)
unpack_localize: 4891.998291 (120%) (0.000489 per cycle)
unpack_custom: 5466.003418 (134%) (0.000547 per cycle)
CONCLUSIONS:
- Do not use unpack if you can avoid it.
- Strangely, custom unpack function is slower than localized unpack.
################## test04
Time for 10000000 cycles (in ms):
max_no: 1964.996338 (100%) (0.000196 per cycle)
max_yes: 3333.000183 (169%) (0.000333 per cycle)
CONCLUSIONS:
- Do not use math.max, math.min. Use conditionals instead.
################## test05
Time for 10000000 cycles (in ms):
nil_or: 2003.997803 (100%) (0.000200 per cycle)
nil_if: 2057.998657 (102%) (0.000206 per cycle)
nil_andor: 2117.004395 (105%) (0.000212 per cycle)
CONCLUSIONS:
- It does not really matter unless used in time-critical code, in which case,
use 'x = y or 1' instead of 'if y == nil then x = 1 end'.
- However, 'x = y==nil and 1 or y' is slowest of all.
################## test06
Time for 10000000 cycles (in ms):
square_mult: 341.003418 (100%) (0.000034 per cycle)
square_caret: 607.002258 (178%) (0.000061 per cycle)
square_pow: 1872.993469 (549%) (0.000187 per cycle)
CONCLUSIONS:
- Use multiplication 'x*x' instead of ^ operator 'x^2'.
- As expected, math.pow is super slow.
################## test07
Time for 10000000 cycles (in ms):
modulus_mod: 715.995789 (100%) (0.000072 per cycle)
modulus_fmod: 1896.003723 (264%) (0.000190 per cycle)
CONCLUSIONS:
- If you can, use % operator instead of math.fmod.
################## test08
Time for 10000000 cycles (in ms):
funcparam_localized: 2852.005005 (100%) (0.000285 per cycle)
funcparam_anonymous: 6011.001587 (210%) (0.000601 per cycle)
CONCLUSIONS:
- Always localize passed functions. Do not pass anonymous functions.
################## test09
Time for 10000000 cycles (in ms):
iterate_forilocal: 135803.985596 (100%) (0.013580 per cycle)
iterate_fori: 158284.004211 (116%) (0.015828 per cycle)
iterate_while: 196549.072266 (144%) (0.019655 per cycle)
iterate_fornext: 353249.023438 (260%) (0.035325 per cycle)
iterate_pairs: 354534.912109 (261%) (0.035453 per cycle)
iterate_ipairs: 367575.012207 (270%) (0.036758 per cycle)
CONCLUSIONS:
- If ONLY traversing the array portion of a table, whether order of elements
matters or not, use:
local size = #table
for i=1,size do
...
end
- Otherwise, use 'for k,v in pairs(table) do', or 'for k,v in next,table do'.
The difference in speed between these two is not noticeable.
################## test10
Time for 10000000 cycles (in ms):
access_method: 540.039062 (100%) (0.000054 per cycle)
access_array: 547.973633 (101%) (0.000055 per cycle)
access_array_variable: 572.998047 (106%) (0.000057 per cycle)
CONCLUSIONS:
- Use whichever you want. Time difference is small enough to barely register.
- The variable lookup has a small overhead, as expected.
################## test11
Time for 10000000 cycles (in ms):
buffertable_yes: 155816.040039 (100%) (0.015582 per cycle)
buffertable_no: 191192.016602 (122%) (0.019119 per cycle)
CONCLUSIONS:
- Localize table index if used more than once in the function/loop.
- This is basically the same conclusion as test02.
################## test12
Time for 10000000 cycles (in ms):
tableadd_direct: 1044.067383 (100%) (0.000104 per cycle)
tableadd_counter: 1216.918945 (116%) (0.000122 per cycle)
tableadd_tablesize: 3295.043945 (315%) (0.000330 per cycle)
tableadd_insert: 4183.959961 (400%) (0.000418 per cycle)
CONCLUSIONS:
- Do not use table.insert unless you have to! (Not even localized.)
- Using 'table[#table+1]' is not much better.
- Direct access usually is not feasible.
- So, use a counter variable when possible.
################## test13
Time for 10000000 cycles (in ms):
tableparam_localized_empty_ro: 3413.085938 (100%) (0.000341 per cycle)
tableparam_localized_constructed_ro: 3436.035156 (100%) (0.000344 per cycle)
tableparam_tablecopy: 10267.089844 (300%) (0.001027 per cycle)
tableparam_anonymous_direct: 10770.996094 (315%) (0.001077 per cycle)
tableparam_anonymous_lookup: 12754.882812 (373%) (0.001275 per cycle)
NOTE: The tests ending in _ro mean the function is expected to treat the
table as read-only (otherwise it would be modifying the same table on each
iteration). This skews the times in favor of these _ro tests.
CONCLUSIONS:
- Do not pass anonymous tables unless you have to!
- That is, try to avoid creating short-lived tables within loops or functions.
- Instead, localize a table outside the loop and modify it before passing it.
(This is only feasible if the function you're passing the table into treats
the passed table as read-only.)
- Surprisingly, whether you create an empty table or a constructed table (with
the correct number of elements) does not make much difference (at least, not
with only two key-value pairs).
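- Even more surprisingly, the fastest non-readonly solution is calling a
function to copy the table from within the loop, and pass the result table
on to the real function. I suppose this is similar to a closure. (Caveat: the
copy function in the test loops from 1 to #t, and #t is 0 for a table that
only has x and y keys, so it returns an empty table rather than a real copy.)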
Code: Select all
local T = {}
local getTime = love.timer.getTime
local RND = 100000
local RNDHALF = RND/2
local CLASS = {test = function() return math.random(RND) end}
local A = {math.random(RND), math.random(RND), math.random(RND), math.random(RND)}
local B = {}
for i=1,200 do B[i] = i end
local tests = {
  -- test01: calling math.min through the global table vs. through a local alias
  test01 = {
    localize_function_no = function(cycles)
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local x = math.min(random(RND), RNDHALF)
      end
      return getTime() - start
    end,
    localize_function_yes = function(cycles)
      local min = math.min
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local x = min(random(RND), RNDHALF)
      end
      return getTime() - start
    end,
  },
  -- test02: CLASS.test looked up on every call vs. cached in a local once per iteration
  test02 = {
    localize_method_no = function(cycles)
      local start = getTime()
      for n=1,cycles do
        local z = CLASS.test()
        local y = CLASS.test()
        local x = CLASS.test()
        local w = CLASS.test()
        local v = CLASS.test()
        local u = CLASS.test()
        local t = CLASS.test()
        local s = CLASS.test()
      end
      return getTime() - start
    end,
    localize_method_yes = function(cycles)
      local start = getTime()
      for n=1,cycles do
        local test = CLASS.test
        local z = test()
        local y = test()
        local x = test()
        local w = test()
        local v = test()
        local u = test()
        local t = test()
        local s = test()
      end
      return getTime() - start
    end,
  },
  -- test03: passing array elements directly vs. via unpack vs. via a hand-written unpack4
  test03 = {
    unpack_no = function(cycles)
      local min = math.min
      local start = getTime()
      for n=1,cycles do
        local x = min(A[1], A[2], A[3], A[4])
      end
      return getTime() - start
    end,
    unpack_localize = function(cycles)
      local min = math.min
      local unpack = unpack
      local start = getTime()
      for n=1,cycles do
        local x = min(unpack(A))
      end
      return getTime() - start
    end,
    unpack_custom = function(cycles)
      local min = math.min
      local function unpack4(a) return a[1],a[2],a[3],a[4] end
      local start = getTime()
      for n=1,cycles do
        local x = min(unpack4(A))
      end
      return getTime() - start
    end,
  },
  -- test04: running maximum with math.max vs. with a plain conditional
  test04 = {
    max_yes = function(cycles)
      local max = math.max
      local random = math.random
      local x = random(RND)
      local start = getTime()
      for n=1,cycles do
        x = max(random(RND), x)
      end
      return getTime() - start
    end,
    max_no = function(cycles)
      local random = math.random
      local x = random(RND)
      local start = getTime()
      for n=1,cycles do
        local r = random(RND)
        if r>x then x = r end
      end
      return getTime() - start
    end,
  },
  -- test05: three ways to default a nil value to 1
  test05 = {
    nil_if = function(cycles)
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local y,x
        if random()>0.5 then y=1 end
        if y==nil then x=1 else x=y end
      end
      return getTime() - start
    end,
    nil_or = function(cycles)
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local y
        if random()>0.5 then y=1 end
        local x = y or 1
      end
      return getTime() - start
    end,
    nil_andor = function(cycles)
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local y
        if random()>0.5 then y=1 end
        local x = y==nil and 1 or y
      end
      return getTime() - start
    end,
  },
  -- test06: squaring with x^2, x*x and math.pow
  test06 = {
    square_caret = function(cycles)
      local x = RNDHALF
      local start = getTime()
      for n=1,cycles do
        local y = x^2
      end
      return getTime() - start
    end,
    square_mult = function(cycles)
      local x = RNDHALF
      local start = getTime()
      for n=1,cycles do
        local y = x*x
      end
      return getTime() - start
    end,
    square_pow = function(cycles)
      local x = RNDHALF
      local pow = math.pow
      local start = getTime()
      for n=1,cycles do
        local y = pow(x,2)
      end
      return getTime() - start
    end,
  },
  -- test07: math.fmod vs. the % operator
  test07 = {
    modulus_fmod = function(cycles)
      local fmod = math.fmod
      local start = getTime()
      for n=1,cycles do
        if fmod(n,30)<1 then
          local x = 1
        end
      end
      return getTime() - start
    end,
    modulus_mod = function(cycles)
      local start = getTime()
      for n=1,cycles do
        if (n%30)<1 then
          local x = 1
        end
      end
      return getTime() - start
    end,
  },
  -- test08: passing a new anonymous function each iteration vs. a function created once
  test08 = {
    funcparam_anonymous = function(cycles)
      local func1 = function(a,b,func) return func(a+b) end
      local start = getTime()
      for n=1,cycles do
        local x = func1(1,2,function(a) return a*2 end)
      end
      return getTime() - start
    end,
    funcparam_localized = function(cycles)
      local func1 = function(a,b,func) return func(a+b) end
      local func2 = function(a) return a*2 end
      local start = getTime()
      for n=1,cycles do
        local x = func1(1,2,func2)
      end
      return getTime() - start
    end,
  },
  -- test09: six ways to iterate over the 200-element array B
  test09 = {
    iterate_ipairs = function(cycles)
      local start = getTime()
      for n=1,cycles do
        for i,v in ipairs(B) do
          local x=v
        end
      end
      return getTime() - start
    end,
    iterate_pairs = function(cycles)
      local start = getTime()
      for n=1,cycles do
        for i,v in pairs(B) do
          local x=v
        end
      end
      return getTime() - start
    end,
    iterate_fornext = function(cycles)
      local start = getTime()
      for n=1,cycles do
        for i,v in next,B do
          local x=v
        end
      end
      return getTime() - start
    end,
    iterate_fori = function(cycles)
      local start = getTime()
      for n=1,cycles do
        for i=1,#B do
          local x=B[i]
        end
      end
      return getTime() - start
    end,
    iterate_forilocal = function(cycles)
      local start = getTime()
      local size = #B
      for n=1,cycles do
        for i=1,size do
          local x=B[i]
        end
      end
      return getTime() - start
    end,
    iterate_while = function(cycles)
      local start = getTime()
      for n=1,cycles do
        local i = #B
        while i>0 do
          local x=B[i]
          i = i-1
        end
      end
      return getTime() - start
    end,
  },
  -- test10: field access as a['foo'], a[f] and a.foo
  test10 = {
    access_array = function(cycles)
      local a = {foo = function() end}
      local start = getTime()
      for n=1,cycles do
        local x = a['foo']
      end
      return getTime() - start
    end,
    access_array_variable = function(cycles)
      local f = 'foo'
      local a = {foo = function() end}
      local start = getTime()
      for n=1,cycles do
        local x = a[f]
      end
      return getTime() - start
    end,
    access_method = function(cycles)
      local a = {foo = function() end}
      local start = getTime()
      for n=1,cycles do
        local x = a.foo
      end
      return getTime() - start
    end,
  },
  -- test11: indexing a[i] twice vs. caching a[i] in a local
  test11 = {
    buffertable_no = function(cycles)
      local a = {}
      for i=1,100 do a[i] = {x=i} end
      local start = getTime()
      for n=1,cycles do
        for i=1,100 do
          a[i].x = a[i].x + 1
        end
      end
      return getTime() - start
    end,
    buffertable_yes = function(cycles)
      local a = {}
      for i=1,100 do a[i] = {x=i} end
      local start = getTime()
      for n=1,cycles do
        for i=1,100 do
          local y = a[i]
          y.x = y.x + 1
        end
      end
      return getTime() - start
    end,
  },
  -- test12: four ways to append to an array
  test12 = {
    tableadd_insert = function(cycles)
      local a = {}
      local insert = table.insert
      local start = getTime()
      for n=1,cycles do
        insert(a,n)
      end
      return getTime() - start
    end,
    tableadd_direct = function(cycles)
      local a = {}
      local start = getTime()
      for n=1,cycles do
        a[n] = n
      end
      return getTime() - start
    end,
    tableadd_tablesize = function(cycles)
      local a = {}
      local start = getTime()
      for n=1,cycles do
        a[#a+1] = n
      end
      return getTime() - start
    end,
    tableadd_counter = function(cycles)
      local a = {}
      local count = 1
      local start = getTime()
      for n=1,cycles do
        a[count] = n
        count = count + 1
      end
      return getTime() - start
    end,
  },
  -- test13: passing a new table on each call vs. reusing one table
  test13 = {
    tableparam_anonymous_direct = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local start = getTime()
      for n=1,cycles do
        func({x = n, y = n-1})
      end
      return getTime() - start
    end,
    tableparam_anonymous_lookup = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local a = {x = 1, y = 1}
      local start = getTime()
      for n=1,cycles do
        a.x, a.y = n, n-1
        func({x = a.x, y = a.y})
      end
      return getTime() - start
    end,
    tableparam_localized_empty_ro = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local a = {}
      local start = getTime()
      for n=1,cycles do
        a.x, a.y = n, n-1
        func(a)
      end
      return getTime() - start
    end,
    tableparam_localized_constructed_ro = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local a = {x = 1, y = 1}
      local start = getTime()
      for n=1,cycles do
        a.x, a.y = n, n-1
        func(a)
      end
      return getTime() - start
    end,
    tableparam_tablecopy = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local copy = function(t)
        local q = {}
        -- NOTE: 'a' below has only the hash keys x and y, so #t is 0 here
        -- and this loop copies nothing; a full copy would need pairs(t).
        local size = #t
        for i=1,size do q[i] = t[i] end
        return q
      end
      local a = {x = 1, y = 1}
      local start = getTime()
      for n=1,cycles do
        a.x, a.y = n, n-1
        func(copy(a))
      end
      return getTime() - start
    end,
  },
}
function T.run(cycles)
  -- Run the test groups in alphabetical order.
  local allTestsKeys = {}
  for k in pairs(tests) do allTestsKeys[#allTestsKeys+1] = k end
  table.sort(allTestsKeys)
  for i=1,#allTestsKeys do
    local testKey = allTestsKeys[i]
    local test = tests[testKey]
    print(string.format('\n################## %s', testKey))
    local testKeys = {}
    for k in pairs(test) do testKeys[#testKeys+1] = k end
    table.sort(testKeys)
    local results = {}
    for j=1,#testKeys do
      local key = testKeys[j]
      local time = test[key](cycles)
      -- Avoid a zero baseline (division by zero in the percentage below).
      time = time > 0 and time or 0.00001
      local ms = time*1000
      -- Key is the time in microseconds with the test index appended, so ties stay unique.
      local reskey = tostring(math.floor(ms*1000))..tostring(j)
      results[reskey] = {ms=ms, test=key, time=time}
    end
    -- Sort fastest first; the fastest result becomes the 100% baseline.
    local resultsKeys = {}
    for k in pairs(results) do resultsKeys[#resultsKeys+1] = k end
    table.sort(resultsKeys, function(a,b) return tonumber(a) < tonumber(b) end)
    local baseResult = results[resultsKeys[1]]
    print(string.format('Time for %d cycles (in ms):', cycles))
    for j=1,#resultsKeys do
      local r = results[resultsKeys[j]]
      local percent = (r.ms / baseResult.ms) * 100
      local avg_per_cycle = (r.time / cycles) * 1000
      print(string.format(' %s: %f (%d%%) (%f per cycle)',
        r.test, r.ms, percent, avg_per_cycle))
    end
  end
end
return T
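The module above depends on love.timer.getTime, so it only runs inside LÖVE. For trying a single idea from a plain Lua interpreter, a rough stand-in can be built on os.clock, which reports CPU time rather than wall-clock time. This is a minimal sketch; `bench` and `squareMult` are hypothetical names, not part of the posted module.

```lua
-- Time fn(cycles) with os.clock (CPU seconds); works outside LÖVE.
local function bench(name, fn, cycles)
  local start = os.clock()
  fn(cycles)
  local elapsed = os.clock() - start
  print(string.format('%s: %.3f ms for %d cycles', name, elapsed * 1000, cycles))
  return elapsed
end

-- Same loop body as square_mult in test06.
local function squareMult(cycles)
  local x = 50000
  for n = 1, cycles do
    local y = x * x
  end
end

local elapsed = bench('square_mult', squareMult, 100000)
```

Numbers from os.clock and love.timer.getTime are not directly comparable, so results are best compared only within one harness.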
Code: Select all
for n=1,cycles do
  local test = CLASS.test
Code: Select all
local test = CLASS.test
for n=1,cycles do
Quote: - Strangely, custom unpack function is slower than localized unpack.
I'm not that surprised since a custom unpack function would probably have to make more operations to the stack than the regular unpack. "unpack_custom" needs to start a function scope, look up values, push them on the stack, etc.
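For reference, the three variants from test03 produce the same result and differ only in how the values reach math.min. A small sketch, with the array contents made up for the example:

```lua
-- unpack is a global in Lua 5.1/LuaJIT and table.unpack in Lua 5.2+.
local unpack = table.unpack or unpack

local A = {40, 10, 30, 20}

-- Fixed-arity version from test03: a Lua function call doing four indexed reads.
local function unpack4(a) return a[1], a[2], a[3], a[4] end

local viaBuiltin = math.min(unpack(A))               -- built-in C function
local viaCustom = math.min(unpack4(A))               -- extra Lua call
local viaDirect = math.min(A[1], A[2], A[3], A[4])   -- no extra call at all
```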
Quote: - Do not use math.max, math.min. Use conditionals instead.
math.max can work with a variable number of arguments so I assume that's why it's slower in your test.
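To make the comparison concrete: math.max takes any number of arguments and costs a function call, while the two-value case from test04 can be written as a bare comparison. Whether the variable argument count or the call itself accounts for the measured gap is not something this sketch settles; `clampLow` is a made-up name.

```lua
-- math.max accepts any number of arguments.
local biggest = math.max(3, 9, 4, 1)

-- For exactly two values, a comparison does the same job inline.
local function clampLow(value, floor)
  if value < floor then value = floor end
  return value
end
```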
Code: Select all
local func1 = function(a,b,func) return func(a+b) end
local start = getTime()
for n=1,cycles do
  local func2 = function(a) return a*2 end
  local x = func1(1,2,func2)
end
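Giving the function a local name inside the loop does not change where it is created: the function expression is still evaluated on every iteration, and in Lua 5.1 that generally means a new closure each time. A sketch of the two placements, with hypothetical names, that can be dropped into a timing harness:

```lua
local function apply(a, b, func) return func(a + b) end

-- Variant 1: function expression evaluated on every iteration.
local function perIteration(cycles)
  local sum = 0
  for n = 1, cycles do
    sum = sum + apply(1, 2, function(a) return a * 2 end)
  end
  return sum
end

-- Variant 2: function created once, outside the loop.
local function hoisted(cycles)
  local double = function(a) return a * 2 end
  local sum = 0
  for n = 1, cycles do
    sum = sum + apply(1, 2, double)
  end
  return sum
end
```

Both variants return the same sum, so any timing difference comes from where the function value is built.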
kefka wrote: mach-o, but wrong architecture
If you're on OSX then you're using a version for the wrong architecture (x86, or ppc, since it is most likely you have an x64).
bartbes wrote: If you're on OSX then you're using a version for the wrong architecture (x86, or ppc, since it is most likely you have an x64). If you're not on OSX, well, then you have binaries for the wrong OS.
Actually in this case it's probably x64 when you need x86 or PPC, because LÖVE is 32-bit.