I don't know how LuaJIT does this, but a naive implementation of Lua 'and' 'or' needs branching. Branching is costly too (may be more costly than you expect).
It's a tricky puzzle to translate that sign bit to a result double. I won't bother with it; a simple naive implementation like the following is probably good enough for me, and it has the benefit of better clarity.
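A naive version along these lines could be sketched as a plain if/elseif chain. This is only an illustration: the name `sign` is hypothetical and not taken from any post in this thread.

```lua
-- Hypothetical helper name; a straightforward if/elseif chain.
local function sign(n)
  if n > 0 then
    return 1
  elseif n < 0 then
    return -1
  end
  -- Reached for n == 0 (and for NaN, since both comparisons are false).
  return 0
end
```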
Other than the fact that if/then/else is also branching, debating whether a simple conditional or a div/mul compiles to slower bytecode is still premature optimization territory, given that you guys are still a very long way from having your projects finished anyway.
coffeecat wrote: ↑Sat Feb 10, 2018 2:09 pm
Both * and / are quite costly for floating numbers, if LuaJIT can't help to optimize. A simple branching like grump's version could be more efficient.
I'm not an expert at LuaJIT, but Lua is an interpreted language, so I think it's kind of silly to talk about floating-point operations being costly.
Things like "branching" or any type of conditional logic probably affects the state of the Lua virtual machine so it's hard to say...
However, I do like your implementation for its clarity.
LuaJIT can find the code paths that are used most and compile them into optimized machine code on the fly. IIRC, LuaJIT can use integers for a variable that actually holds integer values.
What I mean by "branching is costly" is about very low-level stuff: the CPU pipeline. Modern CPUs are pipelined and work on several sequential instructions at the same time. A mispredicted branch means the work already done in the pipeline is thrown away.
I am not sure how Lua code would be executed via low-level machine code. It could still have an impact. I think we can find out by benchmarks, which I won't bother to do.
Lua is an interpreted language running in a virtual machine.
While the Lua bytecode could be optimized using LuaJIT it's still running in a VM.
The VM alone adds all sorts of overhead.
Yes, it's compiled to branches (checked with -jdump). However, I'm having a hard time trying to imagine a situation where the performance of a game critically depends on the performance of the sgn function.
But if you really want to avoid branches, when your numbers are always integral (as is often the case) and you don't care about special cases, you can just clamp:
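A minimal sketch of that clamping idea, assuming the input is always an integer. The name `sgn_int` is hypothetical, and whether the result is truly branch-free depends on how `math.min` and `math.max` end up being compiled.

```lua
local max, min = math.max, math.min

-- Hypothetical name. Valid only for integral n: every nonzero integer has
-- magnitude >= 1, so clamping to [-1, 1] leaves exactly -1, 0 or 1.
local function sgn_int(n)
  return max(min(n, 1), -1)
end
```

Non-integral inputs such as 0.5 pass through unchanged, which is why the integrality assumption matters.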
Coming back to say that a single failed branch prediction would entirely eclipse any gains you'd get from more efficient math. An FPU math operation is 1-4 cycles; a mispredicted branch is 10-20 cycles. In general, try to avoid branches unless they mostly go in the same direction.
pgimeno wrote: ↑Sat Feb 10, 2018 7:50 pm
However I'm having a hard time trying to imagine a situation where the performance of a game critically depends on the performance of the sgn function
Oh that's easy. You have a tight loop that runs thousands of times per frame and does some very basic math that's only a few cycles, but some variables require flipping depending on sign of other variables, i.e. you do
branch 0.6830 0.6939 0.6866 0.6846
absdiv 0.1585 0.1597 0.1595 0.1596
clamp 0.0815 0.0842 0.0822 0.0819
dummy 0.0805 0.0815 0.0807 0.0806
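local abs, max, min, inf = math.abs, math.max, math.min, math.huge

-- Comparison chain built from and/or.
local function branch ( n )
  return n < 0 and -1 or n > 0 and 1 or 0
end

-- Divide by the magnitude, guarding against 0 / 0.
local function absdiv ( n )
  return n == 0 and 0 or n / abs ( n )
end

-- Scale to +/-inf, then clamp to [-1, 1]. Note that 0 * inf is NaN,
-- so the result for n == 0 depends on how min/max treat NaN.
local function clamp ( n )
  return max ( min ( n * inf, 1 ), -1 )
end

-- Baseline that returns its argument unchanged.
local function dummy ( n )
  return n
end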
The clamp variant is as cheap as functions come. LuaJIT even inlines it automatically, so there's no actual function call.
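For anyone who wants to repeat the comparison, here is a rough sketch of a timing harness. It is not the benchmark that produced the numbers above: the `bench` helper, the input set and the iteration count are all assumptions, and `os.clock` is used only because it is available everywhere.

```lua
local abs, max, min, inf = math.abs, math.max, math.min, math.huge

local function branch(n) return n < 0 and -1 or n > 0 and 1 or 0 end
local function absdiv(n) return n == 0 and 0 or n / abs(n) end
local function clamp(n)  return max(min(n * inf, 1), -1) end
local function dummy(n)  return n end

-- Hypothetical helper: call f `iterations` times, cycling through `inputs`,
-- and return the elapsed CPU time (os.clock) plus the sum of the results.
local function bench(f, inputs, iterations)
  local sum = 0  -- accumulate results so the calls are not trivially dead code
  local count = #inputs
  local start = os.clock()
  for i = 1, iterations do
    sum = sum + f(inputs[(i - 1) % count + 1])
  end
  return os.clock() - start, sum
end

-- Assumed input set: nonzero values with randomly mixed signs, so the
-- branching variant cannot rely on one direction always being taken.
local inputs = {}
for i = 1, 1024 do
  inputs[i] = (math.random() < 0.5 and -1 or 1) * (math.random() + 0.5)
end

for _, case in ipairs({
  { "branch", branch }, { "absdiv", absdiv }, { "clamp", clamp }, { "dummy", dummy },
}) do
  local elapsed = bench(case[2], inputs, 1e6)
  print(string.format("%-6s %.4f", case[1], elapsed))
end
```

The absolute numbers will differ between machines and between plain Lua and LuaJIT; only the relative ordering of the variants is interesting.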