Class Best Practices

Jaston
Prole
Posts: 18
Joined: Sun Nov 25, 2018 5:43 pm

Class Best Practices

Post by Jaston »

Hi Everyone,

In my game I have 1000's of zombies that move around the level. I want to push as many on screen as possible. Since I'm new to Lua, I realize I might not be doing things as efficiently as possible. This has led me to the following clarifying questions that I need help with:

The below is how class objects are setup in my game.
a) Is it a problem that none of the supporting functions are local?
b) Is it bad to always use self.x or self.y etc when referring to this object's variables in the update and render functions?
c) Should I put local references to things like math.floor or love.graphics.draw at the top of the Particle.lua file?
d) If I do c), will that increase the memory significantly for the 1000's of particle objects I make? Does every object have its own local reference to math.floor etc., or is that one reference shared across all particle objects? Same question for its related functions: if I make 1000 particle objects, does that make 1000 copies of my update and render code in memory?
e) Currently, in my gameworld update function I loop through all these particle objects and go particle:update(dt) then do the same for rendering them. Is this the best practice or should I be doing things differently?
f) Should I put local floor = math.floor in every Lua file just to make it faster? Will this cause problems, since every module will have its own local copy of math.floor called floor?

Sorry for all the questions but I want to iron this out before my code gets any bigger than it is. Thanks :)

Code: Select all

--Particle.lua
Particle = Class{}

function Particle:create(x, y, width, height, lenLife)
  local this = {
    x = x,
    y = y,
    velX = 0, --velocities must be initialized; update() reads them
    velY = 0,
    width = width,
    height = height,
    xOffSet = width/2,
    yOffSet = height/2,
    angle = 0, --in radians
    animation = Animation:init(GAME_OBJECT_DEFS['cloud'].animations['walk left']),
    life = lenLife,
    timer = 0
  }

  setmetatable(this, self)

  return this --return the new instance, not the class table
end

function Particle:update(dt)
  --whatever logic
  self.x = self.x + self.velX*dt
  self.y = self.y + self.velY*dt
end

function Particle:render(dt)
  local curAnim = self.animation

  --Draw the object to the screen
  love.graphics.draw(gTextures[curAnim.texture], gQuadFrames[curAnim.texture][curAnim:getCurrentFrame()],
    math.floor((self.x - cameraXPos) + self.xOffSet), math.floor((self.y - cameraYPos) + self.yOffSet),
    self.angle, 1, 1, self.xOffSet, self.yOffSet)
end
grump
Party member
Posts: 947
Joined: Sat Jul 22, 2017 7:43 pm

Re: Class Best Practices

Post by grump »

Jaston wrote: Sat Jan 26, 2019 12:54 am In my game I have 1000's of zombies that move around the level.
How fast is it currently? I'm asking because most of your questions are concerned with micro-optimizations that might gain you a few percent at best. They won't salvage a game that runs very slowly already.
a) Is it a problem that none of the supporting functions are local?
What do you mean by 'supporting functions'? See answers c and f.
b) Is it bad to always use self.x or self.y etc when referring to this object's variables in the update and render functions?
I don't think so. JIT takes care of that.
c) Should I put in the Particle.lua file at the top local references of things like math.floor, or love.graphics.draw?
No. You can, however, put math and love.graphics in the local scope as a minor optimization. Generally, accessing local variables is a bit faster than accessing globals.
d) If I do c) will that increase the memory significantly for the 1000's of particle objects I make? Does every object have its own local reference of math.floor for etc, or does it share that one reference will all particle objects? Same question for its related functions. If I make 1000 particle objects is it make 1000's of copies of my update and render code in memory?
No to both questions. Your class definition does look a bit unusual though, but that might just be me being tired af. What class implementation are you using?
e) Currently, in my gameworld update function I loop through all these particle objects and go particle:update(dt) then do the same for rendering them. Is this the best practice or should I be doing things differently?
That is the one thing on your list that has the potential for huge improvements.
Update only what needs to be updated, and draw only things that are on screen. Depending on your game, you might gain a lot by using techniques like spatial hashing.
Also make sure you use as few textures as possible, and try to minimize texture switches.
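For readers who haven't seen the technique, here is a minimal sketch of a fixed-grid spatial hash (all names are made up for illustration; the integer key assumes the world spans fewer than 100000 cells per axis):

```lua
-- Minimal fixed-grid spatial hash sketch (hypothetical API, not from any library)
local SpatialHash = {}
SpatialHash.__index = SpatialHash

function SpatialHash.new(cellSize)
  return setmetatable({ cellSize = cellSize, cells = {} }, SpatialHash)
end

-- one integer key per grid cell; the 1e5 column stride is an assumption
-- that works for worlds narrower than 100000 cells
local function key(self, x, y)
  return math.floor(x / self.cellSize) * 1e5 + math.floor(y / self.cellSize)
end

function SpatialHash:insert(obj)
  local k = key(self, obj.x, obj.y)
  local cell = self.cells[k]
  if not cell then cell = {}; self.cells[k] = cell end
  cell[#cell + 1] = obj
end

function SpatialHash:queryCell(x, y)
  return self.cells[key(self, x, y)] or {}
end
```

A collision pass then only checks the handful of objects in nearby cells instead of every zombie on the map.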
f) Should I put local floor = math.floor in every lua file just to make it faster? Will this make more problems as every module will have its own local copy of math.floor called floor?
See c. Pulling single functions into the local scope is a valid pattern for vanilla Lua, in LuaJIT it's better to localize the module, not single functions.
monkyyy
Citizen
Posts: 52
Joined: Fri Mar 16, 2012 5:29 pm

Re: Class Best Practices

Post by monkyyy »

Drink my design koolaid that I have been drinking

https://www.youtube.com/watch?v=rX0ItVEVjHc

The short version of this talk: make simple loops that linearly go through an array if you want something to go fast

If you need it to go faster:

https://www.youtube.com/watch?v=WDIkqP4JbkE

Poke around and find out how to fit your data into a cache line; how you do that in Lua is beyond me, and how you do it in "luajit" is even further beyond me; but that's the sort of thing you do
Jaston
Prole
Posts: 18
Joined: Sun Nov 25, 2018 5:43 pm

Re: Class Best Practices

Post by Jaston »

Thanks grump for the detailed response. I will try to answer your questions to give you some context.
grump wrote: Sat Jan 26, 2019 2:57 am Jaston wrote: ↑Fri Jan 25, 2019 7:54 pm
In my game I have 1000's of zombies that move around the level.
How fast is it currently? I'm asking because most of your questions are concerned with micro-optimizations that might gain you a few percent at best. They won't salvage a game that runs very slowly already.
It runs between 24 fps and ~120 fps with 4500 zombies; the problem is it can slow down a lot depending on the number of zombies on screen and what is happening to them. My goal is to hit 60 fps at the lowest and remove any stuttering.
grump wrote: Sat Jan 26, 2019 2:57 am a) Is it a problem that none of the supporting functions are local?
What do you mean by 'supporting functions'? See answers c and f.
Supporting functions are just the other functions in the class that the object class has. For example my bomb class will have a bunch of functions like:

Code: Select all

--Bomb.lua

function Bomb:create(x, y, width, height, objectType, id)
--code 
end

function Bomb:reset()
--code
end

function Bomb:update(dt)
--code
end

function Bomb:onTileCol(dt, colTileType) --what to do on tile colission
--code
end

--Launches a bomb based on the angle you are pointing
function Bomb:launchBomb()
end

--Actions to perform when object collides with explosion
function Bomb:onHitBoxCollision(obj, dmgAmount, pulseAmt)
end

function Bomb:render(dt)
--code
end
All of these functions are global. Is that a big deal or no? Does it even really matter for performance?
grump wrote: Sat Jan 26, 2019 2:57 am d) If I do c), will that increase the memory significantly for the 1000's of particle objects I make? Does every object have its own local reference to math.floor etc., or is that one reference shared across all particle objects? Same question for its related functions: if I make 1000 particle objects, does that make 1000 copies of my update and render code in memory?
No to both questions. Although your class definition looks a bit unusual, but that might be just me being tired af. What class implementation are you using?
I am using the hump class https://github.com/vrld/hump/blob/master/class.lua
grump wrote: Sat Jan 26, 2019 2:57 am e) Currently, in my gameworld update function I loop through all these particle objects and go particle:update(dt) then do the same for rendering them. Is this the best practice or should I be doing things differently?
That is the one thing on your list that has the potential for huge improvements.
Update only what needs to be updated, and draw only things that are on screen. Depending on your game, you might gain a lot by using techniques like spatial hashing.
Also make sure you use as few textures as possible, and try to minimize texture switches.
I currently leverage spatial partitioning; I made two different methods (fixed-grid spatial hash and quad tree). The quad tree seems to work well with clumped zombies, but I swear that the garbage created from building and destroying the tree causes massive stutters. The fixed hash is fast with no stutters until the zombies clump.

I only do collisions on things near the player, only render objects that are near the camera. Only add objects to the quadtree that are near the camera too to save memory. I update offscreen zombies only once every 1/5 seconds to save CPU.

As for drawing I have 3, 1024 x 1024 textures and have ordered them to batch all the drawing calls in the proper order which has already increased performance a lot. I am always trying to squeeze more :).

For all of my loops I never use ipairs, I just always use numerical indexes with for i = 1, #gameObjects do gameObjects[i]:update(dt) end

Do you or anyone else have an efficient sorting algorithm for z-sorting objects? The one I use does the job, but I don't know whether it is slow as well. Mine uses function calls within function calls, so I only run the sort at 30 fps to save CPU.

Another question I have: how can I truly profile my game if the profiler is too slow to let me test which components are the worst offenders? For example, the game drops to 1 fps when profiling, so I can't play far enough to reach the parts that tax the system. Am I going about profiling the wrong way?

grump wrote: Sat Jan 26, 2019 2:57 am in LuaJIT it's better to localize the module, not single functions.


I don't understand Lua well enough to know what this means. What is localizing the module? How is that different from localizing functions?

@monkyyy - I will take a look at these videos to see if they give me any big ideas that will help. I made object pools for my enemies and particle systems to help perform this caching activity. I don't know if it actually helped, but I like to think it did.
monkyyy
Citizen
Posts: 52
Joined: Fri Mar 16, 2012 5:29 pm

Re: Class Best Practices

Post by monkyyy »

>@monkyyy - I will take a look at these videos to see if they give me any big ideas that will help. I made object pools for my enemies and particle systems to help perform this caching activity. I don't know if it actually helped, but I like to think it did.

I don't know what hump does, but it's fairly likely you're not doing data-oriented things when you use it like

>function Bomb:bombfunction()

Even C++ isn't always smart enough to separate data from code sanely; random Lua metaprogramming, eh, I have my doubts that it's doing things well.

You want a "bomb controller" or something at the very least, so your bombs exist as a very tiny fragment of data when you're doing big ugly loops

Let's assume your class system is fucking up, and when the bombs are colliding with the zombies you're running through a double-depth loop of

Code: Select all

for each zombie = {x, y, big fucking ai function} do
  for each bomb = {x, y, big fucking animation function} do
    if close-ish(z.x, b.x) and close-ish(z.y, b.y) then ....
Or

Code: Select all

for each zombie = {x, y} do
  for each bomb = {x, y, big fucking table of particles} do
    if close-ish(z.x, b.x) and close-ish(z.y, b.y) then ....
Or any number of things, if the data is generated in a way that it's grouped badly in memory

Yes, that's an n^2 loop and it's going to be a bit of work; but the problem isn't that it's n^2. It's that extra bits are being loaded into L1 cache, which is only KILOBYTES, while the data you need is likely only a couple of bytes; you could fit hundreds of them into cache at the same time if you're smart with the data. If something extra is slipping in, it could easily fill your L1 and L2 cache with a few dozen bombs, rather than likely all of them.

99.999% of the time that if statement is going to fail, so the code you need to run is tiny; being pessimistic, a dozen instructions going over 1000 * 1000 is *nothing*, IF it fits in cache; nothing else really matters.

Rather than doing something fancy, try rewriting your code so the big loops access tiny data that you initialized to be grouped together

One of the things I've seen data-oriented people do is have "hot" and "cold" data structures: they tell/trick the compiler into grouping the hot arrays together and carefully manage them, while having a cold array that can hold bad things like uncompressed bools, random data that's used by only one process, etc. You could try that if you're going to stick with OO. Then, when you need an ugly loop, you stick to "if hot_logic, then cold_logic" so 99.99999% of the run time is spent on pretty fast code
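As a rough illustration of that hot/cold split in Lua (hypothetical names; a real game would also compact the arrays when zombies die): positions go into flat arrays that the tight loop reads, and everything else into a parallel cold table.

```lua
-- Hypothetical hot/cold layout: flat position arrays ("hot") that the
-- collision loop scans, and a parallel "cold" table for everything else.
local hotX, hotY = {}, {}
local cold = {}

local function addZombie(x, y, extras)
  local i = #hotX + 1
  hotX[i], hotY[i] = x, y
  cold[i] = extras -- AI state, animation, etc.; never touched by the hot loop
  return i
end

-- Count zombies within `radius` of a point, reading only the hot arrays
local function countNear(px, py, radius)
  local r2, n = radius * radius, 0
  for i = 1, #hotX do
    local dx, dy = hotX[i] - px, hotY[i] - py
    if dx * dx + dy * dy <= r2 then n = n + 1 end
  end
  return n
end
```

Only when something is actually close do you pay the cost of touching the cold data for that index.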
grump
Party member
Posts: 947
Joined: Sat Jul 22, 2017 7:43 pm

Re: Class Best Practices

Post by grump »

Jaston wrote: Sat Jan 26, 2019 8:52 pm Supporting functions are just the other functions in the class that the object class has. For example my bomb class will have a bunch of functions like:
[...]
All of these functions are global. Is that a big deal or no? Does it even really matter for performance?
They are not global. They are entries in the Bomb class table, therefore by definition not global.
It may be a little bit faster to write "private" functions not as member functions, but as functions local to the module/file they are used in.
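A sketch of what that looks like in a module file; `applyBlastDamage` is a hypothetical helper, not from Jaston's code:

```lua
Bomb = {}
Bomb.__index = Bomb

-- "private" helper: local to this file, never looked up through the class table
local function applyBlastDamage(obj, dmgAmount)
  obj.hp = obj.hp - dmgAmount
end

function Bomb:onHitBoxCollision(obj, dmgAmount)
  applyBlastDamage(obj, dmgAmount)
end
```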
I swear that the garbage created from building and destroying the tree makes massive stutters.
That's entirely possible. You could try to reduce the strain on the garbage collector by reusing objects, where possible. In some cases, it may be faster to clear and reuse existing tables instead of creating new ones all the time.
I update offscreen zombies only once every 1/5 seconds to save CPU.
How about stretching these updates over multiple frames instead? Partition them into n slices and update one slice per frame?
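A sketch of that slicing idea, assuming the off-screen zombies sit in a plain array and using 5 slices to mirror the existing 1/5-second cadence:

```lua
-- Spread off-screen updates over SLICES frames instead of doing them all at once.
local SLICES = 5
local currentSlice = 0

local function updateOffscreen(zombies, dt)
  currentSlice = currentSlice % SLICES + 1
  -- walk every SLICES-th zombie, starting at the current slice offset
  for i = currentSlice, #zombies, SLICES do
    zombies[i]:update(dt * SLICES) -- each zombie still sees the full elapsed time
  end
end
```

Every zombie is still updated at the same rate as before, but the per-frame cost is flattened, which should reduce stutter.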
Do you or anyone else have an efficient sorting algorithm for z-sorting objects? The one I use does the job but I don't know if it is very slow as well. Mine uses function calls within function calls so I only run the sort at 30 fps to save cpu.
table.sort is not sufficient? It's (annoyingly) not a stable sort.
Maybe try binary insertion sort? Keeps your array sorted at all times, without having to iterate over all elements. Implementing the __lt metamethod for depth-sorted objects may result in a small perf boost too.
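For reference, a sketch of binary insertion, assuming each object carries a numeric zindex field; it finds the insert position in O(log n) comparisons (the table.insert shift is still O(n)):

```lua
-- Insert obj into list, keeping list sorted by ascending zindex.
local function insertSorted(list, obj)
  local lo, hi = 1, #list + 1
  while lo < hi do
    local mid = math.floor((lo + hi) / 2)
    if list[mid].zindex <= obj.zindex then
      lo = mid + 1 -- obj goes somewhere after mid
    else
      hi = mid     -- obj goes at or before mid
    end
  end
  table.insert(list, lo, obj)
end
```

Since equal zindex values insert after existing ones, draw order stays stable for ties, unlike table.sort.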
grump wrote: Sat Jan 26, 2019 2:57 am in LuaJIT it's better to localize the module, not single functions.
I don't understand Lua enough to understand what this means. What is localizing the module? how is that different from localizing functions?

Code: Select all

-- do this
local math = math
local graphics = love.graphics

-- don't do this
local floor = math.floor
local draw = love.graphics.draw
Take this tip with a grain of salt though, I might be wrong about this. I can't remember where I got it from, and can't find the source right now. I have a suspicion it applies to C modules only, not pure Lua modules.
Edit: yeah, ignore this. The performance guide linked below contradicts what I wrote. It's just for ffi modules, not regular code.

Maybe this could be useful. Would definitely help with cache locality.
Also, read this if you haven't already. And this.
pgimeno
Party member
Posts: 3656
Joined: Sun Oct 18, 2015 2:58 pm

Re: Class Best Practices

Post by pgimeno »

grump wrote: Sun Jan 27, 2019 11:59 am In some cases, it may be faster to clear and reuse existing tables instead of creating new ones all the time.
Not just in some cases. In my experiments, clearing all elements of the table in a loop beat recreating the object every time. I couldn't get recreation to be faster. EDIT: Importantly, they were array tables. I haven't tried with tables with hash values.

I'm not that into LJ's internals but my guess is that as you keep adding elements to an existing table, reallocation must happen, and reallocation is slower than writing.

Apart from that, I wanted to mention that LJ uses mixed techniques, and while some sections will be traced and compiled, others will be interpreted all the time. Those that are compiled will probably be unaffected by whether a function is inside an object or in a local, but those that are interpreted will, so it's wise to avoid them as much as possible. Also, using numeric indices instead of string indices where possible will help. For example, have vectors use [1] and [2] instead of .x and .y.
grump wrote: Sun Jan 27, 2019 11:59 amAlso, read this if you haven't already. And this.
These two pages are important. Functions that are NYI stop traces, therefore impede compilation and can slow things down significantly in the affected sections.

One minor thing:
Jaston wrote: Sat Jan 26, 2019 8:52 pm I just always use numerical indexes with for i = 1, #gameObjects do gameObjects[i]:update end
If you write the [i] without any guards, it will convert to italics all the text starting there. It would help everyone with readability if you could preview your post before sending, see if that happened, and fix it if so. What I did to avoid that effect was to write it like this: [[i][/i]i] (i.e. insert an open italics/close italics between the '[' and the 'i')... and you don't want to know what I had to write to explain what I did :D
Jaston
Prole
Posts: 18
Joined: Sun Nov 25, 2018 5:43 pm

Re: Class Best Practices

Post by Jaston »

@monkyyy - I watched the videos and understand at an extremely high level what they are trying to do. But are there any practical examples? Also, how can I reduce the data needed to process each zombie to such a tiny level? Each one of them currently needs a bunch of variables in order to be updated and rendered, etc.

I found a really good article that summarizes all these ideas, which I thought I would share with others, as I found it helpful.
http://gameprogrammingpatterns.com/data-locality.html
pgimeno wrote: Sun Jan 27, 2019 12:35 pm Not just in some cases. In my experiments, clearing all elements of the table in a loop beat recreating the object every time. I couldn't get recreation to be faster. EDIT: Importantly, they were array tables. I haven't tried with tables with hash values.
When you say clearing, are you referring to looping through the list and setting each element to nil? Also, would the speed still be the same if it is just an array of table references with no explicit keys? (i.e. {#eA23dc, #dsf234, etc..})
pgimeno wrote: Sun Jan 27, 2019 12:35 pm grump wrote: ↑Sun Jan 27, 2019 6:59 am
Also, read this if you haven't already. And this.
These two pages are important. Functions that are NYI stop traces, therefore impede compilation and can slow things down significantly in the affected sections.
Those links had me lost. They use functions way beyond what I understand of Lua. All I can grasp is that this gets at which core functions the compiler/JIT runs quickly vs. interprets, and that one could leverage these very specific techniques to ensure the unoptimized functions are avoided.
grump wrote: Sun Jan 27, 2019 11:59 am That's entirely possible. You could try to reduce the strain on the garbage collector by reusing objects, where possible. In some cases, it may be faster to clear and reuse existing tables instead of creating new ones all the time.
I considered doing this, but with a quadtree a list/array is so dynamic. It could have a length of 0 one second and a length of 400 the next. And from what I read, having to allocate more memory every time it reaches a new maximum wouldn't help the speed. So if I did this, I would have to make every tree list very long so it would get a big chunk of memory at the beginning and wouldn't have to grow. I wondered how to clear a list in the tree without it being garbage collected. My best guess was to just track and manage the index of the last item in the list.
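That last guess is a workable pattern. A sketch of a count-tracked list (hypothetical names) where clearing is O(1) and produces no garbage:

```lua
-- A reusable list that tracks its logical length instead of freeing slots.
local function makeList()
  return { items = {}, count = 0 }
end

local function push(list, obj)
  list.count = list.count + 1
  list.items[list.count] = obj -- old slots are overwritten, never freed
end

local function clear(list)
  list.count = 0 -- entries beyond count are stale; iteration must stop at count
end
```

The caveat is that every loop must use `for i = 1, list.count` rather than `#list.items`, and stale entries keep their old objects alive until overwritten.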
grump wrote: Sun Jan 27, 2019 11:59 am How about stretching these updates over multiple frames instead? Partition them into n slices and update one slice per frame?
That is a great idea and I am definitely going to do this.
grump wrote: Sun Jan 27, 2019 11:59 am table.sort is not sufficient? It's (annoyingly) not a stable sort.
Maybe try binary insertion sort? Keeps your array sorted at all times, without having to iterate over all elements. Implementing the __lt metamethod for depth-sorted objects may result in a small perf boost too.
What is this metamethod you are referring to?

grump wrote: Sun Jan 27, 2019 11:59 am Maybe this could be useful. Would definitely help with cache locality.
So I have to use C-type arrays etc. to fit my data into the cache most effectively, and I can do that with this library?

Thank you all for your responses. I really appreciate it.
monkyyy
Citizen
Posts: 52
Joined: Fri Mar 16, 2012 5:29 pm

Re: Class Best Practices

Post by monkyyy »

>@monkyyy - I watched the videos and understand at an extremely high level what they are trying to do. But are there any practical examples? Also, how can I reduce the data needed to process each zombie to such a tiny level? Each one of them currently needs a bunch of variables in order to be updated and rendered, etc.

Without your code I don't know what your other variables are *shrug*

I would suggest that "bombs" are probably easier to make smaller.

Who's checking if they are touching whom? As it's the latter that matters more.

i.e. try

for zombies for bombs do; rather than for bombs for zombies do; then get your bomb data small..... maybe
grump
Party member
Posts: 947
Joined: Sat Jul 22, 2017 7:43 pm

Re: Class Best Practices

Post by grump »

Jaston wrote: Tue Jan 29, 2019 5:17 am When you say clearing, are you referring to looping through the list and setting each element to nil? Also, would the speed still be the same if it is just an array of table references with no explicit keys? (i.e. {#eA23dc, #dsf234, etc..})
That is the gist of it.
Those links had me lost. They use functions way beyond what I understand of Lua. All I can grasp is that this gets at which core functions the compiler/JIT runs quickly vs. interprets, and that one could leverage these very specific techniques to ensure the unoptimized functions are avoided.
Basically, avoid library functions that are not compiled. The performance guide is pretty self-explanatory, ask specific questions if you don't understand something.
I considered doing this, but with a quadtree a list/array is so dynamic. It could have a length of 0 one second and a length of 400 the next. And from what I read, having to allocate more memory every time it reaches a new maximum wouldn't help the speed. So if I did this, I would have to make every tree list very long so it would get a big chunk of memory at the beginning and wouldn't have to grow. I wondered how to clear a list in the tree without it being garbage collected. My best guess was to just track and manage the index of the last item in the list.
Have a pool of empty tables that you can pull from. Once a table is not needed anymore, set all entries in it to nil and put it back into the pool. The point is to avoid creation of new tables. Some experimentation is required. You won't know for sure if it helps until you tried and measured it.
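A minimal sketch of such a pool (names are illustrative):

```lua
-- Recycle tables instead of allocating new ones, to reduce GC pressure.
local pool = {}

local function acquire()
  local n = #pool
  if n > 0 then
    local t = pool[n]
    pool[n] = nil
    return t
  end
  return {} -- pool empty: allocate a fresh table
end

local function release(t)
  -- clear the array part so stale references don't pin garbage
  for i = #t, 1, -1 do t[i] = nil end
  pool[#pool + 1] = t
end
```

Quadtree nodes would call acquire() when a cell fills and release() when the tree is torn down, so steady-state operation allocates nothing.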
What is this metamethod you are referring to?
Instead of doing this:

Code: Select all

table.sort(zombies, function(a, b) return a.zindex < b.zindex end)
Do this:

Code: Select all

function Zombie:__lt(other)
    return self.zindex < other.zindex
end

table.sort(zombies)
According to the performance guide:
"Avoid inventing your own dispatch mechanisms. Prefer to use built-in mechanisms, e.g. metamethods."
This may help get better performance.
So I have to use C-type arrays etc. to fit my data into the cache most effectively, and I can do that with this library?
The thought behind it is that C structs are more compact than Lua tables, and you can preallocate them. It may prove difficult to implement, because C types are pretty restrictive.