Equivalent to Gamemaker's sign() function?

coffeecat · Post by **coffeecat** » Sun Feb 11, 2018 2:13 am

Thanks for the benchmark. Wow, I didn't expect the branch version is actually this much slower than those with floating point * /

, assuming the benchmark is done correctly.

The clamp version won't work in cases that the number is very close to 0. This could be important for some algorithms. The minimum positive number representable in double is approximately 4.9 * 10^(−324), while the maximum is approximately 1.7976931348623157 * 10^308. (https://en.wikipedia.org/wiki/Double-pr ... int_format) It could still be the case that math.abs(n*math.huge) < 1. I think n*math.huge*math.huge would suffice.

I'd suggest to rewrite well-isolated Lua code in C, if the performance really matters. It will give you a tighter control over everything.

pgimeno · Post by **pgimeno** » Sun Feb 11, 2018 9:06 am

raidho36 wrote: ↑Sat Feb 10, 2018 11:10 pm
pgimeno wrote: ↑Sat Feb 10, 2018 7:50 pmHowever I'm having a hard trying to imagine a situation where the performance of a game critically depends on the performance of the sgn function
Oh that's easy. You have a tight loop that runs thousands of times per frame and does some very basic math that's only a few cycles, but some variables require flipping depending on sign of other variables, i.e. you do
Code: Select all
abc = abc * sign ( xyz )
and then the performance difference between conditional and branchless code is in double digit percentage.
Code: Select all
branch	0.6830	0.6939	0.6866	0.6846
absdiv	0.1585	0.1597	0.1595	0.1596
clamp	0.0815	0.0842	0.0822	0.0819
dummy	0.0805	0.0815	0.0807	0.0806

local function branch ( n )
	return n < 0 and -1 or n > 0 and 1 or 0
end
local function absdiv ( n )
	return n == 0 and 0 or n / abs ( n )
end
local function clamp ( n )
	return max ( min ( n * inf, 1 ), -1 )
end
local function dummy ( n )
	return n
end
The clamp variant is as cheap as functions come. LuaJIT even automatically puts it inline, there's no actual function call.

I couldn't reproduce your results. Here are my benchmarks:

Code: Select all

Branch: 0.154363
AbsDiv: 0.164355
Clamp : 0.152699

And here's my code:

Code: Select all

local function branch ( n )
	return n < 0 and -1 or n > 0 and 1 or n
end
local function absdiv ( n )
	return n == 0 and 0 or n / math.abs ( n )
end
local function clamp ( n )
	return math.max ( math.min ( n, 1 ), -1 )
end

local function benchmark(iter, fn)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + fn(math.random(-100, 100))
  end
  local finish = os.clock()
  print(finish - start, sum)
end

-- "Priming" pass
benchmark(50000, branch)
benchmark(50000, absdiv)
benchmark(50000, clamp)
-- Actual pass
benchmark(5000000, branch)
benchmark(5000000, absdiv)
benchmark(5000000, clamp)

coffeecat wrote: ↑Sun Feb 11, 2018 2:13 amThe clamp version won't work in cases that the number is very close to 0. This could be important for some algorithms.

Right, it's mostly of use in cases where the numbers are integer, not really general.

zorg · Post by **zorg** » Sun Feb 11, 2018 9:38 am

pgimeno wrote: ↑Sun Feb 11, 2018 9:06 am
coffeecat wrote: ↑Sun Feb 11, 2018 2:13 amThe clamp version won't work in cases that the number is very close to 0. This could be important for some algorithms.
Right, it's mostly of use in cases where the numbers are integer, not really general.

He might have meant that, but that clamp function that raidho defined literally returns NaN for exact 0 because of n * math.huge (inf in his code block) and it gets propagated by the min and max functions.

ivan · Post by **ivan** » Sun Feb 11, 2018 12:44 pm

Also, if you want the test to be deterministic, we need to either move math.rand outside of the test function or:

Code: Select all

local function benchmark(iter, fn)
  math.randomseed(0)
  ...

The branching variant is the fastest on my machine btw.
After localizing math.min/math.max, the branching version is still the fastest!

coffeecat · Post by **coffeecat** » Sun Feb 11, 2018 1:15 pm

I can see a problem in pgimeno's benchmark code: math.random is slow, and could dominate the cost of the loop body. (EDIT: I now realize that it's math.random rather than love.math.random. Now I am not sure that math.random is slow, as LuaJIT could possibly provide a very efficient implementation of math.random) Also note that all input numbers are integers.

He might have meant that, but that clamp function that raidho defined literally returns NaN for exact 0 because of n * math.huge (inf in his code block) and it gets propagated by the min and max functions.

Sorry, I thought math.huge is the maximum finite number representable in double. I think the clamp version is correct if it is (n * max_finite_double * max_finite_double)

pgimeno · Post by **pgimeno** » Sun Feb 11, 2018 3:44 pm

coffeecat wrote: ↑Sun Feb 11, 2018 1:15 pm I can see a problem in pgimeno's benchmark code: math.random is slow, and could dominate the cost of the loop body.

It probably does. However, note that a) it's unlikely that the sign function appears alone in any computation-intensive process, therefore it makes the benchmark a bit less unrealistic, b) the test results are consistent even when the generator is seeded as suggested by Ivan and even when the order of the tests is changed, c) a different benchmark makes the branch function slightly slower (by a tight margin), showing that certain fine details could make a difference. If the performance of your game critically depends on a loop like that, you'd better be coding that game in C++.

coffeecat wrote: ↑Sun Feb 11, 2018 1:15 pm Also note that all input numbers are integers.

Yes, otherwise it wouldn't be fair to include the clamp method. I've benchmarked with floats and excluding the clamp, same relative results.

coffeecat wrote: ↑Sun Feb 11, 2018 1:15 pmSorry, I thought math.huge is the maximum finite number representable in double. I think the clamp version is correct if it is (n * max_finite_double * max_finite_double)

No, floating-point numbers have a greater range near zero than near infinite, due to denormals:

Code: Select all

> print(1e-310, 1e310)
1e-310	inf

The maximum finite value for double is (in Python):

Code: Select all

>>> float.fromhex('0x1.fffffffffffffp+1023')
1.7976931348623157e+308

But

Code: Select all

>>> float.fromhex('0x1.fffffffffffffp+1023') * 1e-310
0.0179769313486231

Edit: However, this would work:

Code: Select all

function sgn(x)
  return math.max(-1, math.min(1, x/5e-324))
end

but that makes it substantially slower than the others in my tests.

Edit2: Oh wait, you said two products. Well, 1e200 works for that. And yeah, that's faster than the absdiv version always, and sometimes faster than the branch version.

coffeecat · Post by **coffeecat** » Sun Feb 11, 2018 4:01 pm

(REMOVED something here. I was being stupidly wrong)

No, floating-point numbers have a greater range near zero than near infinite, due to denormals

Yes, and that's why I suggest (n * max_finite_double * max_finite_double) rather than (n * max_finite_double)

pgimeno · Post by **pgimeno** » Sun Feb 11, 2018 4:02 pm

Yes, sorry, I realized later. See my edit2.

EDIT:
Okay, so I tested without the math.random calls and this is what I got. Branching loses over div by about 30%, but clamp with double multiply wins by an ample margin. test1 and test2 changed the balance in the previous test, but even if it gives a slight advantage to the branch version in test2, it's not enough to change the relative results in this case

Code: Select all

Branch test1: 0.046269
AbsDiv test1: 0.035799
Clamp  test1: 0.006607

Branch test2: 0.042319
AbsDiv test2: 0.035819
Clamp  test2: 0.006366

Test program:

Code: Select all

local t = {}
local iter = 5000000

math.randomseed(1)
for i = 1, iter do
  t[i] = math.random() * 200 - 100
end

local function branch(x)
  return x < 0 and -1 or x > 0 and 1 or x
end

local function absdiv(x)
  return x == 0 and 0 or x / math.abs(x)
end

local function clamp(x)
  return math.max(math.min(x * 1e200 * 1e200, 1), -1)
end

local function benchmark(fn)
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + fn(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end

local function benchmark_branch()
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + branch(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end

local function benchmark_absdiv()
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + absdiv(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end

local function benchmark_clamp()
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + clamp(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end


-- First "priming" pass
benchmark(branch)
benchmark(absdiv)
benchmark(clamp)
benchmark_branch()
benchmark_absdiv()
benchmark_clamp()
-- Actual pass
benchmark(branch)
benchmark(absdiv)
benchmark(clamp)
benchmark_branch()
benchmark_absdiv()
benchmark_clamp()

coffeecat · Post by **coffeecat** » Mon Feb 12, 2018 12:15 am

Thanks for the benchmark. It's an interesting result.

raidho36 · Post by **raidho36** » Mon Feb 12, 2018 3:48 am

coffeecat wrote: ↑Sun Feb 11, 2018 2:13 am The clamp version won't work in cases that the number is very close to 0. This could be important for some algorithms. The minimum positive number representable in double is approximately 4.9 * 10^(−324), while the maximum is approximately 1.7976931348623157 * 10^308. (https://en.wikipedia.org/wiki/Double-pr ... int_format) It could still be the case that math.abs(n*math.huge) < 1. I think n*math.huge*math.huge would suffice.

Don't be silly, any value other than zero multiplied by infinity is infinity, zero times infinity is NaN. Speaking of which - my bad! Forgot about that one. Putting a ternary operator for this case puts a tiny strain on the CPU, but that depends on how many zeroes you're trying to process: worst case scenario is every other value is zero, in which case it's gonna run the same as normal branched variant. If you don't, then this sign function becomes positively biased one - it returns 1 if the input is 0. I suppose that worked for my application because I never needed to zero out the variables, only to flip sign.

Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Re: Equivalent to Gamemaker's sign() function?

Who is online