Lua Performance Tips

Roland_Yonaba
Inner party member
Posts: 1563
Joined: Tue Jun 21, 2011 6:08 pm
Location: Ouagadougou (Burkina Faso)

Post by Roland_Yonaba »

Bruce Whiteside wrote: Make it work before you make it work fast.
I agree with that.
tentus
Inner party member
Posts: 1060
Joined: Sun Oct 31, 2010 7:56 pm
Location: Appalachia

Post by tentus »

vrld wrote: LuaProfiler is great for that.
How would one use that with Love? I saw the require but it's also talking about compiling, which I'm pretty rusty on.

Post by Roland_Yonaba »

I used LuaProfiler last week while writing my basic RTS project.
It is pretty simple to use. Just place 'profiler.dll' next to your program and load it this way:

Code:

local profiler = require 'profiler'
There is no need to add the '.dll' extension; require does this automatically.

Then start the profiler wherever you want in your script:

Code:

profiler.start()
And stop it this way:

Code:

profiler.stop()
That's it. The profiler writes an output file, generally named 'lprof_xxxx.out'; open it in any text editor to see the results.
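
If the module may fail to load (a missing lua5.1.dll, a binary built for the wrong architecture), the calls above can be guarded so the program still runs without profiling. This is only a minimal sketch: it assumes nothing beyond the profiler.start() and profiler.stop() calls shown above, and the helper names startProfiling and stopProfiling are hypothetical.

```lua
-- Try to load LuaProfiler; pcall keeps a missing or incompatible binary
-- from aborting the whole program.
local ok, profiler = pcall(require, 'profiler')
if not ok or type(profiler) ~= 'table' then profiler = nil end

-- Hypothetical helper: start profiling only when the module actually loaded.
-- Returns true when profiling is active.
local function startProfiling()
	if profiler then profiler.start() end
	return profiler ~= nil
end

-- Hypothetical helper: stop profiling if it was started.
local function stopProfiling()
	if profiler then profiler.stop() end
end

local profiling = startProfiling()
-- ... code to be measured goes here ...
stopProfiling()
```

In a LÖVE game, the two helpers would most naturally be called from love.load and love.quit, but that placement is a suggestion rather than something LuaProfiler requires.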
Attachments: profiler.zip (LuaProfiler, 45.24 KiB)

Post by tentus »

Roland_Yonaba wrote: I used LuaProfiler last week while writing my basic RTS project.
It is pretty simple to use. Just place 'profiler.dll' next to your program and load it this way:

Code:

local profiler = require 'profiler'
There is no need to add the '.dll' extension; require does this automatically.

Then start the profiler wherever you want in your script:

Code:

profiler.start()
And stop it this way:

Code:

profiler.stop()
That's it. The profiler writes an output file, generally named 'lprof_xxxx.out'; open it in any text editor to see the results.
I get something about lua5.1.dll?
kefka
Prole
Posts: 9
Joined: Thu Feb 17, 2011 3:34 am


Post by kefka »

LuaProfiler works fine for me when I use my system Lua (5.1), but when I include it in a LÖVE script, I get the following:

Error: error loading module 'profiler' from file '/usr/local/lib/lua/5.1/profiler.so':
dlopen(/usr/local/lib/lua/5.1/profiler.so, 2): no suitable image found. Did find:
/usr/local/lib/lua/5.1/profiler.so: mach-o, but wrong architecture
slime
Solid Snayke
Posts: 3166
Joined: Mon Aug 23, 2010 6:45 am
Location: Nova Scotia, Canada

Post by slime »

You may need to compile LuaProfiler for 32-bit if you're on a 64-bit system.
benloran
Prole
Posts: 19
Joined: Tue Jul 05, 2011 4:52 pm


Post by benloran »

After reading through the link that Taehl posted on the first page, I got curious to see if I would get similar results. Also, I wanted to make some tests that I could run specifically with Love and its version of Lua. So I wrote a bunch of tests, looped them each 10 million times, and timed them with love.timer.getTime.
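
The basic measuring pattern is just "read the clock, loop, read the clock again". Here is a minimal sketch of it that runs in plain Lua. It assumes os.clock as the timer so it works outside LÖVE (inside LÖVE, love.timer.getTime can be swapped in), and the helper name timeIt is hypothetical, not part of the test file.

```lua
-- os.clock is used so the sketch runs in plain Lua; inside LÖVE,
-- love.timer.getTime can be used instead.
local clock = os.clock

-- Hypothetical helper: call fn `cycles` times and return the elapsed seconds.
local function timeIt(fn, cycles)
	local start = clock()
	for n = 1, cycles do
		fn(n)
	end
	return clock() - start
end

local random = math.random
local cycles = 100000
local elapsed = timeIt(function() local x = random(100000) end, cycles)
print(string.format('%d cycles: %f ms', cycles, elapsed * 1000))
```

Note that a helper like this adds one function call per cycle to every measurement, which is why writing the loop out inside each test gives cleaner numbers for very cheap operations.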

Here are the results, and some of my conclusions. In case it matters, I'm running this on an old Mac Pro, with OS X Lion.

Code:

################## test01
Time for 10000000 cycles (in ms):
   localize_function_yes: 3388.999939 (100%) (0.000339 per cycle)
   localize_function_no: 4285.999894 (126%) (0.000429 per cycle)

CONCLUSIONS:
- Always localize functions.


################## test02
Time for 10000000 cycles (in ms):
   localize_method_yes: 30894.996643 (100%) (0.003089 per cycle)
   localize_method_no: 33195.000648 (107%) (0.003320 per cycle)

CONCLUSIONS:
- If method is called many times in the function or loop, localize it first
  inside the function or loop.


################## test03
Time for 10000000 cycles (in ms):
   unpack_no: 4053.001404 (100%) (0.000405 per cycle)
   unpack_localize: 4891.998291 (120%) (0.000489 per cycle)
   unpack_custom: 5466.003418 (134%) (0.000547 per cycle)

CONCLUSIONS:
- Do not use unpack if you can avoid it.
- Strangely, custom unpack function is slower than localized unpack.


################## test04
Time for 10000000 cycles (in ms):
   max_no: 1964.996338 (100%) (0.000196 per cycle)
   max_yes: 3333.000183 (169%) (0.000333 per cycle)

CONCLUSIONS:
- Do not use math.max, math.min. Use conditionals instead.


################## test05
Time for 10000000 cycles (in ms):
   nil_or: 2003.997803 (100%) (0.000200 per cycle)
   nil_if: 2057.998657 (102%) (0.000206 per cycle)
   nil_andor: 2117.004395 (105%) (0.000212 per cycle)

CONCLUSIONS:
- It does not really matter unless used in time-critical code, in which case,
  use 'x = y or 1' instead of 'if y == nil then x = 1 end'.
- However, 'x = y==nil and 1 or y' is slowest of all.


################## test06
Time for 10000000 cycles (in ms):
   square_mult: 341.003418 (100%) (0.000034 per cycle)
   square_caret: 607.002258 (178%) (0.000061 per cycle)
   square_pow: 1872.993469 (549%) (0.000187 per cycle)

CONCLUSIONS:
- Use multiplication 'x*x' instead of ^ operator 'x^2'.
- As expected, math.pow is super slow.


################## test07
Time for 10000000 cycles (in ms):
   modulus_mod: 715.995789 (100%) (0.000072 per cycle)
   modulus_fmod: 1896.003723 (264%) (0.000190 per cycle)

CONCLUSIONS:
- If you can, use % operator instead of math.fmod.


################## test08
Time for 10000000 cycles (in ms):
   funcparam_localized: 2852.005005 (100%) (0.000285 per cycle)
   funcparam_anonymous: 6011.001587 (210%) (0.000601 per cycle)

CONCLUSIONS:
- Always localize passed functions. Do not pass anonymous functions.


################## test09
Time for 10000000 cycles (in ms):
   iterate_forilocal: 135803.985596 (100%) (0.013580 per cycle)
   iterate_fori: 158284.004211 (116%) (0.015828 per cycle)
   iterate_while: 196549.072266 (144%) (0.019655 per cycle)
   iterate_fornext: 353249.023438 (260%) (0.035325 per cycle)
   iterate_pairs: 354534.912109 (261%) (0.035453 per cycle)
   iterate_ipairs: 367575.012207 (270%) (0.036758 per cycle)

CONCLUSIONS:
- If ONLY traversing the array portion of a table, whether order of elements
  matters or not, use:
       local size = #table
       for i=1,size do
         ...
       end
- Otherwise, use 'for k,v in pairs(table) do', or 'for k,v in next,table do'.
  The difference in speed between these two is not noticeable.


################## test10
Time for 10000000 cycles (in ms):
   access_method: 540.039062 (100%) (0.000054 per cycle)
   access_array: 547.973633 (101%) (0.000055 per cycle)
   access_array_variable: 572.998047 (106%) (0.000057 per cycle)

CONCLUSIONS:
- Use whichever you want. Time difference is small enough to barely register.
- The variable lookup has a small overhead, as expected.


################## test11
Time for 10000000 cycles (in ms):
   buffertable_yes: 155816.040039 (100%) (0.015582 per cycle)
   buffertable_no: 191192.016602 (122%) (0.019119 per cycle)

CONCLUSIONS:
- Localize table index if used more than once in the function/loop.
- This is basically the same conclusion as test02.


################## test12
Time for 10000000 cycles (in ms):
   tableadd_direct: 1044.067383 (100%) (0.000104 per cycle)
   tableadd_counter: 1216.918945 (116%) (0.000122 per cycle)
   tableadd_tablesize: 3295.043945 (315%) (0.000330 per cycle)
   tableadd_insert: 4183.959961 (400%) (0.000418 per cycle)

CONCLUSIONS:
- Do not use table.insert unless you have to! (Not even localized.)
- Using 'table[#table+1]' is not much better.
- Direct access usually is not feasible.
- So, use a counter variable when possible.


################## test13
Time for 10000000 cycles (in ms):
   tableparam_localized_empty_ro: 3413.085938 (100%) (0.000341 per cycle)
   tableparam_localized_constructed_ro: 3436.035156 (100%) (0.000344 per cycle)
   tableparam_tablecopy: 10267.089844 (300%) (0.001027 per cycle)
   tableparam_anonymous_direct: 10770.996094 (315%) (0.001077 per cycle)
   tableparam_anonymous_lookup: 12754.882812 (373%) (0.001275 per cycle)

NOTE: The tests ending in _ro mean the function is expected to treat the
  table as read-only (otherwise it would be modifying the same table on each
  iteration). This skews the times in favor of these _ro tests.

CONCLUSIONS:
- Do not pass anonymous tables unless you have to!
- That is, try to avoid creating short-lived tables within loops or functions.
- Instead, localize a table outside the loop and modify it before passing it.
  (This is only feasible if the function you're passing the table into treats
  the passed table as read-only.)
- Surprisingly, whether you create an empty table or a constructed table (with
  the correct number of elements) does not make much difference (at least, not
  with only two key-value pairs).
- Even more surprisingly, the fastest non-readonly solution is calling a
  function to copy the table from within the loop, and pass the result table
  on to the real function. I suppose this is similar to a closure.
And here are the tests, for reference:

Code:

local T = {}

local getTime = love.timer.getTime
local RND = 100000
local RNDHALF = RND/2
local CLASS = {test = function() return math.random(RND) end}
local A = {math.random(RND), math.random(RND), math.random(RND), math.random(RND)}
local B = {}
for i=1,200 do B[i] = i end

local tests = {
	test01 = {
		localize_function_no = function(cycles)
			local random = math.random

			local start = getTime()
			for n=1,cycles do
				local x = math.min(random(RND), RNDHALF)
			end
			return getTime() - start
		end,

		localize_function_yes = function(cycles)
			local min = math.min
			local random = math.random

			local start = getTime()
			for n=1,cycles do
				local x = min(random(RND), RNDHALF)
			end
			return getTime() - start
		end,
	},

	test02 = {
		localize_method_no = function(cycles)
			local start = getTime()
			for n=1,cycles do
				local z = CLASS.test()
				local y = CLASS.test()
				local x = CLASS.test()
				local w = CLASS.test()
				local v = CLASS.test()
				local u = CLASS.test()
				local t = CLASS.test()
				local s = CLASS.test()
			end
			return getTime() - start
		end,

		localize_method_yes = function(cycles)
			local start = getTime()
			for n=1,cycles do
				local test = CLASS.test
				local z = test()
				local y = test()
				local x = test()
				local w = test()
				local v = test()
				local u = test()
				local t = test()
				local s = test()
			end
			return getTime() - start
		end,
	},

	test03 = {
		unpack_no = function(cycles)
			local min = math.min

			local start = getTime()
			for n=1,cycles do
				local x = min(A[1], A[2], A[3], A[4])
			end
			return getTime() - start
		end,

		unpack_localize = function(cycles)
			local min = math.min
			local unpack = unpack

			local start = getTime()
			for n=1,cycles do
				local x = min(unpack(A))
			end
			return getTime() - start
		end,

		unpack_custom = function(cycles)
			local min = math.min
			local function unpack4(a) return a[1],a[2],a[3],a[4] end

			local start = getTime()
			for n=1,cycles do
				local x = min(unpack4(A))
			end
			return getTime() - start
		end,
	},

	test04 = {
		max_yes = function(cycles)
			local max = math.max
			local random = math.random
			local x = random(RND)

			local start = getTime()
			for n=1,cycles do
				x = max(random(RND), x)
			end
			return getTime() - start
		end,

		max_no = function(cycles)
			local random = math.random
			local x = random(RND)

			local start = getTime()
			for n=1,cycles do
				local r = random(RND)
				if r>x then x = r end
			end
			return getTime() - start
		end,
	},

	test05 = {
		nil_if = function(cycles)
			local random = math.random

			local start = getTime()
			for n=1,cycles do
				local y,x
				if random()>0.5 then y=1 end
				if y==nil then x=1 else x=y end
			end
			return getTime() - start
		end,

		nil_or = function(cycles)
			local random = math.random

			local start = getTime()
			for n=1,cycles do
				local y
				if random()>0.5 then y=1 end
				local x = y or 1
			end
			return getTime() - start
		end,

		nil_andor = function(cycles)
			local random = math.random

			local start = getTime()
			for n=1,cycles do
				local y
				if random()>0.5 then y=1 end
				local x = y==nil and 1 or y
			end
			return getTime() - start
		end,
	},

	test06 = {
		square_caret = function(cycles)
			local x = RNDHALF

			local start = getTime()
			for n=1,cycles do
				local y = x^2
			end
			return getTime() - start
		end,

		square_mult = function(cycles)
			local x = RNDHALF

			local start = getTime()
			for n=1,cycles do
				local y = x*x
			end
			return getTime() - start
		end,

		square_pow = function(cycles)
			local x = RNDHALF
			local pow = math.pow

			local start = getTime()
			for n=1,cycles do
				local y = pow(x,2)
			end
			return getTime() - start
		end,
	},

	test07 = {
		modulus_fmod = function(cycles)
			local fmod = math.fmod

			local start = getTime()
			for n=1,cycles do
				if fmod(n,30)<1 then
					local x = 1
				end
			end
			return getTime() - start
		end,

		modulus_mod = function(cycles)
			local start = getTime()
			for n=1,cycles do
				if (n%30)<1 then
					local x = 1
				end
			end
			return getTime() - start
		end,
	},

	test08 = {
		funcparam_anonymous = function(cycles)
			local func1 = function(a,b,func) return func(a+b) end

			local start = getTime()
			for n=1,cycles do
				local x = func1(1,2,function(a) return a*2 end)
			end
			return getTime() - start
		end,

		funcparam_localized = function(cycles)
			local func1 = function(a,b,func) return func(a+b) end
			local func2 = function(a) return a*2 end

			local start = getTime()
			for n=1,cycles do
				local x = func1(1,2,func2)
			end
			return getTime() - start
		end,
	},

	test09 = {
		iterate_ipairs = function(cycles)
			local start = getTime()
			for n=1,cycles do
				for i,v in ipairs(B) do
					local x=v
				end
			end
			return getTime() - start
		end,

		iterate_pairs = function(cycles)
			local start = getTime()
			for n=1,cycles do
				for i,v in pairs(B) do
					local x=v
				end
			end
			return getTime() - start
		end,

		iterate_fornext = function(cycles)
			local start = getTime()
			for n=1,cycles do
				for i,v in next,B do
					local x=v
				end
			end
			return getTime() - start
		end,

		iterate_fori = function(cycles)
			local start = getTime()
			for n=1,cycles do
				for i=1,#B do
					local x=B[i]
				end
			end
			return getTime() - start
		end,

		iterate_forilocal = function(cycles)
			local start = getTime()
			local size = #B
			for n=1,cycles do
				for i=1,size do
					local x=B[i]
				end
			end
			return getTime() - start
		end,

		iterate_while = function(cycles)
			local start = getTime()
			for n=1,cycles do
				local i = #B
				while i>0 do
					local x=B[i]
					i = i-1
				end
			end
			return getTime() - start
		end,
	},

	test10 = {
		access_array = function(cycles)
			local a = {foo = function() end}

			local start = getTime()
			for n=1,cycles do
				local x = a['foo']
			end
			return getTime() - start
		end,

		access_array_variable = function(cycles)
			local f = 'foo'
			local a = {foo = function() end}

			local start = getTime()
			for n=1,cycles do
				local x = a[f]
			end
			return getTime() - start
		end,

		access_method = function(cycles)
			local a = {foo = function() end}

			local start = getTime()
			for n=1,cycles do
				local x = a.foo
			end
			return getTime() - start
		end,
	},

	test11 = {
		buffertable_no = function(cycles)
			local a = {}
			for i=1,100 do a[i] = {x=i} end

			local start = getTime()
			for n=1,cycles do
				for i=1,100 do
					a[i].x = a[i].x + 1
				end
			end
			return getTime() - start
		end,

		buffertable_yes = function(cycles)
			local a = {}
			for i=1,100 do a[i] = {x=i} end

			local start = getTime()
			for n=1,cycles do
				for i=1,100 do
					local y = a[i]
					y.x = y.x + 1
				end
			end
			return getTime() - start
		end,
	},

	test12 = {
		tableadd_insert = function(cycles)
			local a = {}
			local insert = table.insert

			local start = getTime()
			for n=1,cycles do
				insert(a,n)
			end
			return getTime() - start
		end,

		tableadd_direct = function(cycles)
			local a = {}

			local start = getTime()
			for n=1,cycles do
				a[n] = n
			end
			return getTime() - start
		end,

		tableadd_tablesize = function(cycles)
			local a = {}

			local start = getTime()
			for n=1,cycles do
				a[#a+1] = n
			end
			return getTime() - start
		end,

		tableadd_counter = function(cycles)
			local a = {}
			local count = 1

			local start = getTime()
			for n=1,cycles do
				a[count] = n
				count = count + 1
			end
			return getTime() - start
		end,
	},

	test13 = {
		tableparam_anonymous_direct = function(cycles)
			local func = function(t) local x,y = t.x, t.y end

			local start = getTime()
			for n=1,cycles do
				func({x = n, y = n-1})
			end
			return getTime() - start
		end,

		tableparam_anonymous_lookup = function(cycles)
			local func = function(t) local x,y = t.x, t.y end
			local a = {x = 1, y = 1}

			local start = getTime()
			for n=1,cycles do
				a.x, a.y = n, n-1
				func({x = a.x, y = a.y})
			end
			return getTime() - start
		end,

		tableparam_localized_empty_ro = function(cycles)
			local func = function(t) local x,y = t.x, t.y end
			local a = {}

			local start = getTime()
			for n=1,cycles do
				a.x, a.y = n, n-1
				func(a)
			end
			return getTime() - start
		end,
		
		tableparam_localized_constructed_ro = function(cycles)
			local func = function(t) local x,y = t.x, t.y end
			local a = {x = 1, y = 1}

			local start = getTime()
			for n=1,cycles do
				a.x, a.y = n, n-1
				func(a)
			end
			return getTime() - start
		end,

		tableparam_tablecopy = function(cycles)
			local func = function(t) local x,y = t.x, t.y end
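			local copy = function(t)
				local q = {}
				-- NOTE: #t only counts the array part; for {x = 1, y = 1} it is 0,
				-- so this loop never runs and an empty table is returned.
				local size = #t
				for i=1,size do q[i] = t[i] end
				return q
			end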
			local a = {x = 1, y = 1}

			local start = getTime()
			for n=1,cycles do
				a.x, a.y = n, n-1
				func(copy(a))
			end
			return getTime() - start
		end,
	},
}

function T.run(cycles)
	local allTestsKeys = {}
	for k in pairs(tests) do allTestsKeys[#allTestsKeys+1] = k end
	table.sort(allTestsKeys)

	for i=1,#allTestsKeys do
		local testKey = allTestsKeys[i]
		local test = tests[testKey]
		print(string.format('\n################## %s', testKey))

		local testKeys = {}
		for k in pairs(test) do testKeys[#testKeys+1] = k end
		table.sort(testKeys)

		local results = {}

		for j=1,#testKeys do
			local key = testKeys[j]
			local time = test[key](cycles)
			time = time > 0 and time or 0.00001
			local ms = time*1000
			local reskey = tostring(math.floor(ms*1000))..tostring(j)
			results[reskey] = {ms=ms, test=key, time=time}
		end

		local resultsKeys = {}
		for k in pairs(results) do resultsKeys[#resultsKeys+1] = k end
		table.sort(resultsKeys, function(a,b) return tonumber(a) < tonumber(b) end)
		local baseResult = results[resultsKeys[1]]

		print(string.format('Time for %d cycles (in ms):', cycles))
		for j=1,#resultsKeys do
			local r = results[resultsKeys[j]]
			local percent = (r.ms / baseResult.ms) * 100
			local avg_per_cycle = (r.time / cycles) * 1000
			print(string.format('   %s: %f (%d%%) (%f per cycle)',
				r.test, r.ms, percent, avg_per_cycle))
		end
	end
end

return T
I know there is a lot of repeated code in that test file, but I wanted to make sure I was only timing the inner loops and not any of the setup. Plus, I wrote it pretty quickly, so I'm sure it could be better.

I'm always open to comments and criticisms, and if you think of any useful tests to add, let me know.
ivan
Party member
Posts: 1915
Joined: Fri Mar 07, 2008 1:39 pm

Post by ivan »

Interesting results. I do have a couple of criticisms, however:

Code:

         for n=1,cycles do
            local test = CLASS.test
This is an extra assignment which is not present in the "localize_method_no" test.
I would change it to:

Code:

         local test = CLASS.test
         for n=1,cycles do
benloran wrote: - Strangely, custom unpack function is slower than localized unpack.
I'm not that surprised, since a custom unpack function probably has to perform more stack operations than the built-in unpack: "unpack_custom" needs to start a function scope, look up values, push them on the stack, etc.
benloran wrote: - Do not use math.max, math.min. Use conditionals instead.
math.max can work with a variable number of arguments, so I assume that's why it's slower in your test.
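
For the two-value case the tests measure, the general-purpose call and an inline comparison give the same answer, so the choice is purely about speed. A minimal sketch, using a made-up list of values rather than anything from the test file:

```lua
-- Illustrative data only.
local values = {3, 9, 4, 7}

-- Running maximum with math.max, which accepts any number of arguments.
local viaMax = values[1]
for i = 2, #values do
	viaMax = math.max(values[i], viaMax)
end

-- Same running maximum with an inline comparison, as in the max_no test.
local viaIf = values[1]
for i = 2, #values do
	local v = values[i]
	if v > viaIf then viaIf = v end
end

print(viaMax, viaIf)
```

The inline form avoids a function call entirely, which is probably a bigger factor than the variable argument handling alone.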

Test 8 probably also does an extra assignment (to a temporary variable that references the "anonymous" function).
A more 'fair' test for funcparam_localized might be:

Code:

         local func1 = function(a,b,func) return func(a+b) end

         local start = getTime()
         for n=1,cycles do
            local func2 = function(a) return a*2 end
            local x = func1(1,2,func2)
         end
bartbes
Sex machine
Posts: 4946
Joined: Fri Aug 29, 2008 10:35 am
Location: The Netherlands

Post by bartbes »

kefka wrote:mach-o, but wrong architecture
If you're on OSX then you're using a version for the wrong architecture (x86, or ppc, since it is most likely you have an x64).
If you're not on OSX, well, then you have binaries for the wrong OS.

Post by slime »

bartbes wrote:
kefka wrote:mach-o, but wrong architecture
If you're on OSX then you're using a version for the wrong architecture (x86, or ppc, since it is most likely you have an x64).
If you're not on OSX, well, then you have binaries for the wrong OS.
Actually in this case it's probably x64 when you need x86 or PPC, because LÖVE is 32-bit.