Code: Select all
function sign ( x )
  x = x / ( math.abs ( x ) + 1 )
  return math.floor ( x ) + math.ceil ( x )
end
Code: Select all
local function sign_clamp(x)
  return math.max(math.min(x * 1e200 * 1e200, 1), -1)
end
Code: Select all
$ luajit
LuaJIT 2.0.4 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
> function sign(x)
>> x = x / (math.abs(x) + 1)
>> return math.floor(x) + math.ceil(x)
>> end
> print(sign(1e20))
2
>
Code: Select all
return math.min ( 1, math.max ( -1, math.ceil ( x ) + math.floor ( x ) ) )
Floating-point multiplication is well known for not being associative. Compilers don't fold these expressions unless you pass them special flags (e.g. -fassociative-math in GCC). I'm pretty sure LuaJIT doesn't do that either, and both the results and the compiled code agree.
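To make the associativity point concrete, here is a minimal sketch in plain Lua. The variable names are just for illustration. It assumes numbers are IEEE 754 doubles, so that 1e200 * 1e200 overflows to infinity.

```lua
-- Same three factors, two groupings, two different results.
local x = 0

-- Left to right, as written in sign_clamp: 0 * 1e200 = 0, then 0 * 1e200 = 0.
local left = (x * 1e200) * 1e200

-- If the two constants were folded first, the product would overflow...
local big = 1e200 * 1e200
-- ...and zero times infinity is NaN, not zero.
local right = x * big

print(left == 0)        -- true
print(big == math.huge) -- true
print(right ~= right)   -- true: NaN is the only value not equal to itself
```

So a compiler that regrouped the multiplications would change the result for a zero input, which is why this kind of folding is normally off by default.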
It fails to give the correct output on my hardware for certain inputs. See my edit above.
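Assuming the failure meant here is the one in the LuaJIT session above (sign(1e20) printing 2), this sketch shows where it comes from. With doubles, adding 1 to 1e20 is lost to rounding, so the division yields exactly 1 and floor and ceil both return 1.

```lua
-- Same function as in the session above, made local for the example.
local function sign(x)
  x = x / (math.abs(x) + 1)
  return math.floor(x) + math.ceil(x)
end

-- For large magnitudes the "+ 1" disappears in rounding...
print(1e20 + 1 == 1e20)        -- true
-- ...so the quotient is exactly 1 instead of slightly less than 1.
print(1e20 / (1e20 + 1) == 1)  -- true

print(sign(3))     -- 1: 3/4 = 0.75, floor gives 0, ceil gives 1
print(sign(1e20))  -- 2: floor(1) + ceil(1)
```

The same thing should happen for any input large enough that x + 1 rounds back to x.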
For SSE2 that won't happen, but even if it's not available, it doesn't matter if you get 1e400 inside the FPU. LuaJIT will still do the math in the order specified, meaning the first product will always give zero when the input is zero. If you get infinity (or 1e400) in the second product, it will be correctly clamped. Therefore the function performs as specified.
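A few edge cases that follow from this reasoning, as a quick sketch. It assumes IEEE 754 doubles and does not cover NaN input, which is a separate question.

```lua
local function sign_clamp(x)
  return math.max(math.min(x * 1e200 * 1e200, 1), -1)
end

-- Zero stays zero because the first product is already zero.
assert(sign_clamp(0) == 0)

-- The smallest positive subnormal is scaled far above 1, then clamped.
assert(sign_clamp(5e-324) == 1)
assert(sign_clamp(-5e-324) == -1)

-- Large inputs overflow to infinity in the products and are clamped as well.
assert(sign_clamp(1e300) == 1)
assert(sign_clamp(-1e300) == -1)
assert(sign_clamp(math.huge) == 1)
assert(sign_clamp(-math.huge) == -1)

print("all edge cases pass")
```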
raidho36 wrote: ↑Mon Feb 26, 2018 10:43 am
Code: Select all
return math.min ( 1, math.max ( -1, math.ceil ( x ) + math.floor ( x ) ) )
Now you're on to something. Works fine for any ranges I could test. But in my testing, the performance is poorer than the multiply version:
Code: Select all
Test 1 Test 2
Branch : 0.039085 0.035571
AbsDiv : 0.020082 0.019601
Mul Clamp : 0.005007 0.004514
Floor+Ceil: 0.011755 0.011638
Code: Select all
local t = {}
local iter = 5000000

-- Fill the input table with reproducible random values in [-100, 100).
math.randomseed(1)
for i = 1, iter do
  t[i] = math.random() * 200 - 100
end

-- Candidate sign implementations.
local function branch(x)
  return x < 0 and -1 or x > 0 and 1 or x
end

local function absdiv(x)
  x = x / (math.abs (x) + 1)
  return math.floor(x) + math.ceil(x)
end

local function clamp(x)
  return math.max(math.min(x * 1e200 * 1e200, 1), -1)
end

local function clamp2(x)
  return math.min(1, math.max (-1, math.ceil(x) + math.floor(x)))
end

-- Generic benchmark: the function under test is passed in as a parameter.
local function benchmark(fn)
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + fn(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end

-- Same loop as benchmark(), but each copy calls its function directly
-- instead of through the fn parameter.
local function benchmark_branch()
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + branch(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end

local function benchmark_absdiv()
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + absdiv(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end

local function benchmark_clamp()
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + clamp(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end

local function benchmark_clamp2()
  math.randomseed(1)
  local sum = 0
  local start = os.clock()
  for i = 1, iter do
    sum = sum + clamp2(t[i])
  end
  local finish = os.clock()
  print(finish - start, sum)
end

-- First "priming" pass
benchmark(branch)
benchmark(absdiv)
benchmark(clamp)
benchmark(clamp2)
benchmark_branch()
benchmark_absdiv()
benchmark_clamp()
benchmark_clamp2()

print("----- Results:")

-- Actual pass
benchmark(branch)
benchmark(absdiv)
benchmark(clamp)
benchmark(clamp2)
benchmark_branch()
benchmark_absdiv()
benchmark_clamp()
benchmark_clamp2()
Code: Select all
Branch : 0.268995 0.267536
AbsDiv : 1.001139 1.001986
Mul Clamp : 0.657281 0.655257
Floor+Ceil: 1.123145 1.117501
Code: Select all
->LOOP:
0bcef6a0 cmp dword [rcx+rbp*8+0x4], 0xfffeffff
0bcef6a8 jnb 0x0bce0018 ->2
0bcef6ae movsd xmm6, [rcx+rbp*8]
0bcef6b3 mulsd xmm6, xmm2
0bcef6b7 mulsd xmm6, xmm2
0bcef6bb minsd xmm6, xmm1
0bcef6bf maxsd xmm6, xmm0
0bcef6c3 addsd xmm7, xmm6
0bcef6c7 add ebp, +0x01
0bcef6ca cmp ebp, eax
0bcef6cc jle 0x0bcef6a0 ->LOOP
0bcef6ce jmp 0x0bce001c ->3
Code: Select all
->LOOP:
0bcef4e0 cmp dword [rcx+rbp*8+0x4], 0xfffeffff
0bcef4e8 jnb 0x0bce0018 ->2
0bcef4ee movsd xmm6, [rcx+rbp*8]
0bcef4f3 roundsd xmm5, xmm6, 0x0a
0bcef4f9 roundsd xmm6, xmm6, 0x09
0bcef4ff addsd xmm6, xmm5
0bcef503 maxsd xmm6, xmm1
0bcef507 minsd xmm6, xmm0
0bcef50b addsd xmm7, xmm6
0bcef50f add ebp, +0x01
0bcef512 cmp ebp, eax
0bcef514 jle 0x0bcef4e0 ->LOOP
Code: Select all
min max avg mean check
branch 0.7235 0.7268 0.7242 0.7239 2720000
div 0.2133 0.2134 0.2133 0.2133 2720000
fast 0.0841 0.0862 0.0849 0.0846 2720000
sign 0.0822 0.0849 0.0835 0.0829 2720000
dummy 0.0805 0.0807 0.0806 0.0806 1251059.0378535
Code: Select all
$ love10 .
testing branch .
testing div .
testing fast .
testing sign .
testing dummy .
testing branch ..........
testing div ..........
testing fast ..........
testing sign ..........
testing dummy ..........
min max avg mean check
branch 0.4934 0.4947 0.4942 0.4943 2720000
div 0.2432 0.2432 0.2432 0.2432 2720000
fast 0.0557 0.0558 0.0557 0.0557 2720000
sign 0.146 0.1461 0.146 0.146 2720000
dummy 0.0548 0.0548 0.0548 0.0548 1251059.0378535
Code: Select all
$ cat /proc/cpuinfo | grep model\ name | uniq
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Quote: "Number represents real (double-precision floating-point) numbers."
"Double-precision floating-point" should usually refer to the double-precision floating-point format as specified in IEEE 754, but I'm not sure whether that's the case with Lua.
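One way to probe this on a given build is to check a few behaviours that IEEE 754 binary64 predicts. This is only a sketch: passing these checks is consistent with binary64 but does not prove it, and the variable name is made up for the example.

```lua
-- binary64 has a 53-bit significand, so 2^53 + 1 is not representable
-- and rounds back to 2^53.
local two53 = 2^53
print(two53 + 1 == two53)     -- true with binary64
print(two53 - 1 ~= two53)     -- true: values below 2^53 are still exact

-- 0.1, 0.2 and 0.3 have no exact binary representation.
print(0.1 + 0.2 == 0.3)       -- false with binary64

-- Overflow saturates to infinity rather than raising an error.
print(1e308 * 10 == math.huge) -- true
```

If any of these print something else, the build is probably using a different number type (for example a single-precision or integer configuration of Lua).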