Bruce Whiteside wrote: Make it work before you make it work fast.
I do agree with that.
Lua Performance Tips
- Roland_Yonaba
- Inner party member
- Posts: 1563
- Joined: Tue Jun 21, 2011 6:08 pm
- Location: Ouagadougou (Burkina Faso)
- Contact:
Re: Lua Performance Tips
- tentus
- Inner party member
- Posts: 1060
- Joined: Sun Oct 31, 2010 7:56 pm
- Location: Appalachia
- Contact:
Re: Lua Performance Tips
vrld wrote: LuaProfiler is great for that.
How would one use that with Love? I saw the require, but it also talks about compiling, which I'm pretty rusty on.
Kurosuke needs beta testers
- Roland_Yonaba
- Inner party member
- Posts: 1563
- Joined: Tue Jun 21, 2011 6:08 pm
- Location: Ouagadougou (Burkina Faso)
- Contact:
Re: Lua Performance Tips
I remember using LuaProfiler last week, while writing my basic RTS project.
It is pretty simple to use. Just place 'profiler.dll' near your program folder, and load it this way:
Code: Select all
local profiler = require 'profiler'
No need to add the '.dll' extension; require does this automatically.
Then start the profiler wherever you want in your script:
Code: Select all
profiler.start()
And stop it this way:
Code: Select all
profiler.stop()
Pretty simple. It writes an output file, generally named 'lprof_xxxx.out'; open it in any text editor to see the results.
- Attachments
-
- profiler.zip
- LuaProfiler
- (45.24 KiB) Downloaded 244 times
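As a minimal sketch of the steps above, the calls can be wrapped so that a script still runs when the profiler module fails to load. Only `require 'profiler'`, `profiler.start()` and `profiler.stop()` come from the post; the helper name `with_profiler` and the sample workload `sum_to` are hypothetical.

```lua
-- Try to load LuaProfiler. pcall keeps the script alive if the module is
-- missing or was built for a different Lua or architecture.
local ok, profiler = pcall(require, 'profiler')
if not ok or type(profiler) ~= 'table' then
  print('profiler not available: ' .. tostring(profiler))
  profiler = nil
end

-- Hypothetical helper: run fn(...) between profiler.start() and
-- profiler.stop(), returning fn's first result.
local function with_profiler(fn, ...)
  if profiler then profiler.start() end
  local result = fn(...)
  if profiler then profiler.stop() end
  return result
end

-- Sample workload (illustrative only).
local function sum_to(n)
  local total = 0
  for i = 1, n do total = total + i end
  return total
end

local total = with_profiler(sum_to, 1000)
print(total)
```

If the module cannot be loaded, the message printed by the fallback branch is the same kind of loader error discussed later in this thread.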
- tentus
- Inner party member
- Posts: 1060
- Joined: Sun Oct 31, 2010 7:56 pm
- Location: Appalachia
- Contact:
Re: Lua Performance Tips
Roland_Yonaba wrote: I remember I have been using LuaProfiler the last week, while writing my basic RTS project.
It is pretty simple to use. Just place the 'profiler.dll' near your program folder, and load it this way:
Code: Select all
local profiler = require 'profiler'
No need to add '.dll' extension, require does this automatically.
Then start the profiler where you want in your script this way:
Code: Select all
profiler.start()
And stop it this way:
Code: Select all
profiler.stop()
Pretty simple. There will be an outputted file generally named 'lprof_xxxx.out', open it using any text editor to see the results.
I get something about lua5.1.dll?
Kurosuke needs beta testers
Re: Lua Performance Tips
LuaProfiler works fine for me when I use my system Lua (5.1), but when I include it in a love script, I get the following:
Error: error loading module 'profiler' from file '/usr/local/lib/lua/5.1/profiler.so':
dlopen(/usr/local/lib/lua/5.1/profiler.so, 2): no suitable image found. Did find:
/usr/local/lib/lua/5.1/profiler.so: mach-o, but wrong architecture
- slime
- Solid Snayke
- Posts: 3166
- Joined: Mon Aug 23, 2010 6:45 am
- Location: Nova Scotia, Canada
- Contact:
Re: Lua Performance Tips
You may need to compile LuaProfiler for 32-bit if you're on a 64-bit system.
Re: Lua Performance Tips
After reading through the link that Taehl posted on the first page, I got curious to see if I would get similar results. Also, I wanted to make some tests that I could run specifically with Love and its version of Lua. So I wrote a bunch of tests, looped them each 10 million times, and timed them with love.timer.getTime.
Here are the results, and some of my conclusions. In case it matters, I'm running this on an old Mac Pro, with OS X Lion.
And here are the tests, for reference:
I know there is a lot of repeated code in that test file, but I wanted to make sure I was only timing the inner loops and not any of the setup. Plus, I wrote it pretty quickly, so I'm sure it could be better.
I'm always open to comments and criticisms, and if you think of any useful tests to add, let me know.
Code: Select all
################## test01
Time for 10000000 cycles (in ms):
localize_function_yes: 3388.999939 (100%) (0.000339 per cycle)
localize_function_no: 4285.999894 (126%) (0.000429 per cycle)
CONCLUSIONS:
- Always localize functions.
################## test02
Time for 10000000 cycles (in ms):
localize_method_yes: 30894.996643 (100%) (0.003089 per cycle)
localize_method_no: 33195.000648 (107%) (0.003320 per cycle)
CONCLUSIONS:
- If method is called many times in the function or loop, localize it first
inside the function or loop.
################## test03
Time for 10000000 cycles (in ms):
unpack_no: 4053.001404 (100%) (0.000405 per cycle)
unpack_localize: 4891.998291 (120%) (0.000489 per cycle)
unpack_custom: 5466.003418 (134%) (0.000547 per cycle)
CONCLUSIONS:
- Do not use unpack if you can avoid it.
- Strangely, custom unpack function is slower than localized unpack.
################## test04
Time for 10000000 cycles (in ms):
max_no: 1964.996338 (100%) (0.000196 per cycle)
max_yes: 3333.000183 (169%) (0.000333 per cycle)
CONCLUSIONS:
- Do not use math.max, math.min. Use conditionals instead.
################## test05
Time for 10000000 cycles (in ms):
nil_or: 2003.997803 (100%) (0.000200 per cycle)
nil_if: 2057.998657 (102%) (0.000206 per cycle)
nil_andor: 2117.004395 (105%) (0.000212 per cycle)
CONCLUSIONS:
- It does not really matter unless used in time-critical code, in which case,
use 'x = y or 1' instead of 'if y == nil then x = 1 end'.
- However, 'x = y==nil and 1 or y' is slowest of all.
################## test06
Time for 10000000 cycles (in ms):
square_mult: 341.003418 (100%) (0.000034 per cycle)
square_caret: 607.002258 (178%) (0.000061 per cycle)
square_pow: 1872.993469 (549%) (0.000187 per cycle)
CONCLUSIONS:
- Use multiplication 'x*x' instead of ^ operator 'x^2'.
- As expected, math.pow is super slow.
################## test07
Time for 10000000 cycles (in ms):
modulus_mod: 715.995789 (100%) (0.000072 per cycle)
modulus_fmod: 1896.003723 (264%) (0.000190 per cycle)
CONCLUSIONS:
- If you can, use % operator instead of math.fmod.
################## test08
Time for 10000000 cycles (in ms):
funcparam_localized: 2852.005005 (100%) (0.000285 per cycle)
funcparam_anonymous: 6011.001587 (210%) (0.000601 per cycle)
CONCLUSIONS:
- Always localize passed functions. Do not pass anonymous functions.
################## test09
Time for 10000000 cycles (in ms):
iterate_forilocal: 135803.985596 (100%) (0.013580 per cycle)
iterate_fori: 158284.004211 (116%) (0.015828 per cycle)
iterate_while: 196549.072266 (144%) (0.019655 per cycle)
iterate_fornext: 353249.023438 (260%) (0.035325 per cycle)
iterate_pairs: 354534.912109 (261%) (0.035453 per cycle)
iterate_ipairs: 367575.012207 (270%) (0.036758 per cycle)
CONCLUSIONS:
- If ONLY traversing the array portion of a table, whether order of elements
  matters or not, use:
      local size = #table
      for i=1,size do
        ...
      end
- Otherwise, use 'for k,v in pairs(table) do', or 'for k,v in next,table do'.
  The difference in speed between these two is not noticeable.
################## test10
Time for 10000000 cycles (in ms):
access_method: 540.039062 (100%) (0.000054 per cycle)
access_array: 547.973633 (101%) (0.000055 per cycle)
access_array_variable: 572.998047 (106%) (0.000057 per cycle)
CONCLUSIONS:
- Use whichever you want. Time difference is small enough to barely register.
- The variable lookup has a small overhead, as expected.
################## test11
Time for 10000000 cycles (in ms):
buffertable_yes: 155816.040039 (100%) (0.015582 per cycle)
buffertable_no: 191192.016602 (122%) (0.019119 per cycle)
CONCLUSIONS:
- Localize table index if used more than once in the function/loop.
- This is basically the same conclusion as test02.
################## test12
Time for 10000000 cycles (in ms):
tableadd_direct: 1044.067383 (100%) (0.000104 per cycle)
tableadd_counter: 1216.918945 (116%) (0.000122 per cycle)
tableadd_tablesize: 3295.043945 (315%) (0.000330 per cycle)
tableadd_insert: 4183.959961 (400%) (0.000418 per cycle)
CONCLUSIONS:
- Do not use table.insert unless you have to! (Not even localized.)
- Using 'table[#table+1]' is not much better.
- Direct access usually is not feasible.
- So, use a counter variable when possible.
################## test13
Time for 10000000 cycles (in ms):
tableparam_localized_empty_ro: 3413.085938 (100%) (0.000341 per cycle)
tableparam_localized_constructed_ro: 3436.035156 (100%) (0.000344 per cycle)
tableparam_tablecopy: 10267.089844 (300%) (0.001027 per cycle)
tableparam_anonymous_direct: 10770.996094 (315%) (0.001077 per cycle)
tableparam_anonymous_lookup: 12754.882812 (373%) (0.001275 per cycle)
NOTE: The tests ending in _ro mean the function is expected to treat the
table as read-only (otherwise it would be modifying the same table on each
iteration). This skews the times in favor of these _ro tests.
CONCLUSIONS:
- Do not pass anonymous tables unless you have to!
- That is, try to avoid creating short-lived tables within loops or functions.
- Instead, localize a table outside the loop and modify it before passing it.
  (This is only feasible if the function you're passing the table into treats
  the passed table as read-only.)
- Surprisingly, whether you create an empty table or a constructed table (with
  the correct number of elements) does not make much difference (at least, not
  with only two key-value pairs).
- Even more surprisingly, the fastest non-readonly solution is calling a
  function to copy the table from within the loop, and pass the result table
  on to the real function. I suppose this is similar to a closure.
  (Caveat: the copy function in the test loops over 1..#t, and #t is 0 for a
  table that only has the keys x and y, so it returns an empty table. This
  timing therefore measures creating an empty table, not a full copy.)
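Several of the conclusions above can be sketched in a few lines. This is an illustrative sketch, not the poster's test code: the names `clamp_all`, `squares` and `sum_points` are hypothetical, and whether each pattern actually helps should be measured in your own program.

```lua
-- test01: cache a library function in a local before a hot loop.
local min = math.min

-- Hypothetical helper: clamp every element of an array to `limit`.
local function clamp_all(values, limit)
  local out = {}
  for i = 1, #values do
    out[i] = min(values[i], limit)   -- direct index, no table.insert
  end
  return out
end

-- test06 / test12: x*x instead of x^2, and a counter instead of #t+1.
local function squares(n)
  local out, count = {}, 0
  for x = 1, n do
    count = count + 1
    out[count] = x * x
  end
  return out
end

-- test13: reuse one scratch table rather than building a new one per call.
-- Only safe if consume() treats the table as read-only and does not keep it.
local function sum_points(n)
  local scratch = {}
  local total = 0
  local function consume(t) total = total + t.x + t.y end
  for i = 1, n do
    scratch.x, scratch.y = i, i - 1
    consume(scratch)
  end
  return total
end

local clamped = clamp_all({3, 9, 5}, 5)
local sq = squares(4)
print(clamped[1], clamped[2], clamped[3])
print(sq[4], sum_points(3))
```

The scratch-table pattern trades safety for speed: if the callee stores the table or modifies it, every caller sees the same object, which is why the results mark those tests as read-only.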
Code: Select all
local T = {}
local getTime = love.timer.getTime
local RND = 100000
local RNDHALF = RND/2
local CLASS = {test = function() return math.random(RND) end}
local A = {math.random(RND), math.random(RND), math.random(RND), math.random(RND)}
local B = {}
for i=1,200 do B[i] = i end
local tests = {
  test01 = {
    localize_function_no = function(cycles)
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local x = math.min(random(RND), RNDHALF)
      end
      return getTime() - start
    end,
    localize_function_yes = function(cycles)
      local min = math.min
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local x = min(random(RND), RNDHALF)
      end
      return getTime() - start
    end,
  },
  test02 = {
    localize_method_no = function(cycles)
      local start = getTime()
      for n=1,cycles do
        local z = CLASS.test()
        local y = CLASS.test()
        local x = CLASS.test()
        local w = CLASS.test()
        local v = CLASS.test()
        local u = CLASS.test()
        local t = CLASS.test()
        local s = CLASS.test()
      end
      return getTime() - start
    end,
    localize_method_yes = function(cycles)
      local start = getTime()
      for n=1,cycles do
        local test = CLASS.test
        local z = test()
        local y = test()
        local x = test()
        local w = test()
        local v = test()
        local u = test()
        local t = test()
        local s = test()
      end
      return getTime() - start
    end,
  },
  test03 = {
    unpack_no = function(cycles)
      local min = math.min
      local start = getTime()
      for n=1,cycles do
        local x = min(A[1], A[2], A[3], A[4])
      end
      return getTime() - start
    end,
    unpack_localize = function(cycles)
      local min = math.min
      local unpack = unpack
      local start = getTime()
      for n=1,cycles do
        local x = min(unpack(A))
      end
      return getTime() - start
    end,
    unpack_custom = function(cycles)
      local min = math.min
      local function unpack4(a) return a[1],a[2],a[3],a[4] end
      local start = getTime()
      for n=1,cycles do
        local x = min(unpack4(A))
      end
      return getTime() - start
    end,
  },
  test04 = {
    max_yes = function(cycles)
      local max = math.max
      local random = math.random
      local x = random(RND)
      local start = getTime()
      for n=1,cycles do
        x = max(random(RND), x)
      end
      return getTime() - start
    end,
    max_no = function(cycles)
      local random = math.random
      local x = random(RND)
      local start = getTime()
      for n=1,cycles do
        local r = random(RND)
        if r>x then x = r end
      end
      return getTime() - start
    end,
  },
  test05 = {
    nil_if = function(cycles)
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local y,x
        if random()>0.5 then y=1 end
        if y==nil then x=1 else x=y end
      end
      return getTime() - start
    end,
    nil_or = function(cycles)
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local y
        if random()>0.5 then y=1 end
        local x = y or 1
      end
      return getTime() - start
    end,
    nil_andor = function(cycles)
      local random = math.random
      local start = getTime()
      for n=1,cycles do
        local y
        if random()>0.5 then y=1 end
        local x = y==nil and 1 or y
      end
      return getTime() - start
    end,
  },
  test06 = {
    square_caret = function(cycles)
      local x = RNDHALF
      local start = getTime()
      for n=1,cycles do
        local y = x^2
      end
      return getTime() - start
    end,
    square_mult = function(cycles)
      local x = RNDHALF
      local start = getTime()
      for n=1,cycles do
        local y = x*x
      end
      return getTime() - start
    end,
    square_pow = function(cycles)
      local x = RNDHALF
      local pow = math.pow
      local start = getTime()
      for n=1,cycles do
        local y = pow(x,2)
      end
      return getTime() - start
    end,
  },
  test07 = {
    modulus_fmod = function(cycles)
      local fmod = math.fmod
      local start = getTime()
      for n=1,cycles do
        if fmod(n,30)<1 then
          local x = 1
        end
      end
      return getTime() - start
    end,
    modulus_mod = function(cycles)
      local start = getTime()
      for n=1,cycles do
        if (n%30)<1 then
          local x = 1
        end
      end
      return getTime() - start
    end,
  },
  test08 = {
    funcparam_anonymous = function(cycles)
      local func1 = function(a,b,func) return func(a+b) end
      local start = getTime()
      for n=1,cycles do
        local x = func1(1,2,function(a) return a*2 end)
      end
      return getTime() - start
    end,
    funcparam_localized = function(cycles)
      local func1 = function(a,b,func) return func(a+b) end
      local func2 = function(a) return a*2 end
      local start = getTime()
      for n=1,cycles do
        local x = func1(1,2,func2)
      end
      return getTime() - start
    end,
  },
  test09 = {
    iterate_ipairs = function(cycles)
      local start = getTime()
      for n=1,cycles do
        for i,v in ipairs(B) do
          local x=v
        end
      end
      return getTime() - start
    end,
    iterate_pairs = function(cycles)
      local start = getTime()
      for n=1,cycles do
        for i,v in pairs(B) do
          local x=v
        end
      end
      return getTime() - start
    end,
    iterate_fornext = function(cycles)
      local start = getTime()
      for n=1,cycles do
        for i,v in next,B do
          local x=v
        end
      end
      return getTime() - start
    end,
    iterate_fori = function(cycles)
      local start = getTime()
      for n=1,cycles do
        for i=1,#B do
          local x=B[i]
        end
      end
      return getTime() - start
    end,
    iterate_forilocal = function(cycles)
      local start = getTime()
      local size = #B
      for n=1,cycles do
        for i=1,size do
          local x=B[i]
        end
      end
      return getTime() - start
    end,
    iterate_while = function(cycles)
      local start = getTime()
      for n=1,cycles do
        local i = #B
        while i>0 do
          local x=B[i]
          i = i-1
        end
      end
      return getTime() - start
    end,
  },
  test10 = {
    access_array = function(cycles)
      local a = {foo = function() end}
      local start = getTime()
      for n=1,cycles do
        local x = a['foo']
      end
      return getTime() - start
    end,
    access_array_variable = function(cycles)
      local f = 'foo'
      local a = {foo = function() end}
      local start = getTime()
      for n=1,cycles do
        local x = a[f]
      end
      return getTime() - start
    end,
    access_method = function(cycles)
      local a = {foo = function() end}
      local start = getTime()
      for n=1,cycles do
        local x = a.foo
      end
      return getTime() - start
    end,
  },
  test11 = {
    buffertable_no = function(cycles)
      local a = {}
      for i=1,100 do a[i] = {x=i} end
      local start = getTime()
      for n=1,cycles do
        for i=1,100 do
          a[i].x = a[i].x + 1
        end
      end
      return getTime() - start
    end,
    buffertable_yes = function(cycles)
      local a = {}
      for i=1,100 do a[i] = {x=i} end
      local start = getTime()
      for n=1,cycles do
        for i=1,100 do
          local y = a[i]
          y.x = y.x + 1
        end
      end
      return getTime() - start
    end,
  },
  test12 = {
    tableadd_insert = function(cycles)
      local a = {}
      local insert = table.insert
      local start = getTime()
      for n=1,cycles do
        insert(a,n)
      end
      return getTime() - start
    end,
    tableadd_direct = function(cycles)
      local a = {}
      local start = getTime()
      for n=1,cycles do
        a[n] = n
      end
      return getTime() - start
    end,
    tableadd_tablesize = function(cycles)
      local a = {}
      local start = getTime()
      for n=1,cycles do
        a[#a+1] = n
      end
      return getTime() - start
    end,
    tableadd_counter = function(cycles)
      local a = {}
      local count = 1
      local start = getTime()
      for n=1,cycles do
        a[count] = n
        count = count + 1
      end
      return getTime() - start
    end,
  },
  test13 = {
    tableparam_anonymous_direct = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local start = getTime()
      for n=1,cycles do
        func({x = n, y = n-1})
      end
      return getTime() - start
    end,
    tableparam_anonymous_lookup = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local a = {x = 1, y = 1}
      local start = getTime()
      for n=1,cycles do
        a.x, a.y = n, n-1
        func({x = a.x, y = a.y})
      end
      return getTime() - start
    end,
    tableparam_localized_empty_ro = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local a = {}
      local start = getTime()
      for n=1,cycles do
        a.x, a.y = n, n-1
        func(a)
      end
      return getTime() - start
    end,
    tableparam_localized_constructed_ro = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      local a = {x = 1, y = 1}
      local start = getTime()
      for n=1,cycles do
        a.x, a.y = n, n-1
        func(a)
      end
      return getTime() - start
    end,
    tableparam_tablecopy = function(cycles)
      local func = function(t) local x,y = t.x, t.y end
      -- Note: this only copies the array part (1..#t). The table passed in
      -- below has only the string keys x and y, so #t is 0 and the result
      -- is an empty table; the x/y fields are not copied.
      local copy = function(t)
        local q = {}
        local size = #t
        for i=1,size do q[i] = t[i] end
        return q
      end
      local a = {x = 1, y = 1}
      local start = getTime()
      for n=1,cycles do
        a.x, a.y = n, n-1
        func(copy(a))
      end
      return getTime() - start
    end,
  },
}
function T.run(cycles)
  local allTestsKeys = {}
  for k in pairs(tests) do allTestsKeys[#allTestsKeys+1] = k end
  table.sort(allTestsKeys)
  for i=1,#allTestsKeys do
    local testKey = allTestsKeys[i]
    local test = tests[testKey]
    print(string.format('\n################## %s', testKey))
    local testKeys = {}
    for k in pairs(test) do testKeys[#testKeys+1] = k end
    table.sort(testKeys)
    local results = {}
    for j=1,#testKeys do
      local key = testKeys[j]
      local time = test[key](cycles)
      time = time > 0 and time or 0.00001
      local ms = time*1000
      local reskey = tostring(math.floor(ms*1000))..tostring(j)
      results[reskey] = {ms=ms, test=key, time=time}
    end
    local resultsKeys = {}
    for k in pairs(results) do resultsKeys[#resultsKeys+1] = k end
    table.sort(resultsKeys, function(a,b) return tonumber(a) < tonumber(b) end)
    local baseResult = results[resultsKeys[1]]
    print(string.format('Time for %d cycles (in ms):', cycles))
    for j=1,#resultsKeys do
      local r = results[resultsKeys[j]]
      local percent = (r.ms / baseResult.ms) * 100
      local avg_per_cycle = (r.time / cycles) * 1000
      print(string.format(' %s: %f (%d%%) (%f per cycle)',
        r.test, r.ms, percent, avg_per_cycle))
    end
  end
end
return T
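To try this kind of timing outside LÖVE, a stand-in timer is needed, since love.timer.getTime only exists inside the framework. The sketch below assumes os.clock from standard Lua is an acceptable substitute (it reports CPU time rather than wall-clock time, so the numbers are not directly comparable to the results above). The `bench` helper is hypothetical and is not part of the test file.

```lua
-- Stand-in for love.timer.getTime: os.clock() returns CPU time in seconds.
local getTime = os.clock

-- Hypothetical helper: time `cycles` calls of fn and return elapsed seconds.
local function bench(fn, cycles)
  local start = getTime()
  for n = 1, cycles do fn(n) end
  return getTime() - start
end

local cycles = 100000
local t_mult  = bench(function(n) return n * n end, cycles)
local t_caret = bench(function(n) return n ^ 2 end, cycles)

-- Report in milliseconds, like the results above.
print(string.format('square_mult:  %f ms', t_mult * 1000))
print(string.format('square_caret: %f ms', t_caret * 1000))
```

Unlike the test file, this helper adds a function call per cycle, so it measures call overhead as well as the operation itself; the original tests inline each loop to avoid exactly that. Inside LÖVE, assuming the test file is saved under a hypothetical name such as `perftest.lua`, `require('perftest').run(1000000)` would run the whole suite.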
Re: Lua Performance Tips
Interesting results. I do have a couple of criticisms however:
Code: Select all
for n=1,cycles do
  local test = CLASS.test
This is an extra assignment which is not present in the "localize_method_no" test.
I would change it to:
Code: Select all
local test = CLASS.test
for n=1,cycles do
Quote: - Strangely, custom unpack function is slower than localized unpack.
I'm not that surprised, since a custom unpack function would probably have to perform more stack operations than the regular unpack. "unpack_custom" needs to start a function scope, look up values, push them on the stack, etc.
Quote: - Do not use math.max, math.min. Use conditionals instead.
math.max can work with a variable number of arguments, so I assume that's why it's slower in your test.
Test 8 probably also does an extra assignment (to a temp variable which references the "anonymous" function).
A more 'fair' test for funcparam_localized might be:
Code: Select all
local func1 = function(a,b,func) return func(a+b) end
local start = getTime()
for n=1,cycles do
  local func2 = function(a) return a*2 end
  local x = func1(1,2,func2)
end
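The point about test 8 can be sketched as follows: a `function ... end` expression inside a loop is evaluated on every iteration (in Lua 5.1 that means a new closure each time), while a function stored in a local outside the loop is created once. This is only a sketch with hypothetical names (`apply`, `run_inline`, `run_hoisted`), and it uses os.clock rather than love.timer.getTime so that it runs in plain Lua.

```lua
local function apply(a, b, func) return func(a + b) end

-- Function expression evaluated inside the loop on every iteration.
local function run_inline(cycles)
  local sum = 0
  for n = 1, cycles do
    sum = sum + apply(1, n, function(a) return a * 2 end)
  end
  return sum
end

-- Function created once, outside the loop, and reused.
local function run_hoisted(cycles)
  local double = function(a) return a * 2 end
  local sum = 0
  for n = 1, cycles do
    sum = sum + apply(1, n, double)
  end
  return sum
end

local cycles = 100000
local start = os.clock()
local inline_sum = run_inline(cycles)
local inline_time = os.clock() - start

start = os.clock()
local hoisted_sum = run_hoisted(cycles)
local hoisted_time = os.clock() - start

-- Both variants compute the same value; only the timings may differ.
print(inline_sum == hoisted_sum, inline_time, hoisted_time)
```

Separating the two costs this way (creating the function versus calling it) is what the suggested 'fair' variant of funcparam_localized is getting at.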
- bartbes
- Sex machine
- Posts: 4946
- Joined: Fri Aug 29, 2008 10:35 am
- Location: The Netherlands
- Contact:
Re: Lua Performance Tips
kefka wrote: mach-o, but wrong architecture
If you're on OSX then you're using a version for the wrong architecture (x86, or ppc, since it is most likely you have an x64).
If you're not on OSX, well, then you have binaries for the wrong OS.
- slime
- Solid Snayke
- Posts: 3166
- Joined: Mon Aug 23, 2010 6:45 am
- Location: Nova Scotia, Canada
- Contact:
Re: Lua Performance Tips
bartbes wrote:
kefka wrote: mach-o, but wrong architecture
If you're on OSX then you're using a version for the wrong architecture (x86, or ppc, since it is most likely you have an x64).
If you're not on OSX, well, then you have binaries for the wrong OS.
Actually, in this case it's probably x64 when you need x86 or PPC, because LÖVE is 32-bit.