[Solved]How to sort data (when table.sort cannot be used)?

Questions about the LÖVE API, installing LÖVE and other support related questions go here.
Forum rules
Before you make a thread asking for help, read this.
qwdqwqwffqw
Prole
Posts: 33
Joined: Sat Jan 25, 2020 4:11 pm

[Solved]How to sort data (when table.sort cannot be used)?

Post by qwdqwqwffqw »

A simple example :
ID | Age | Height(cm) | Weight(kg)
001 | 32 | 175 | 70
002 | 24 | 182 | 82
(...)
220 | 62 | 168 | 65
If each record is a lua table (e.g {1, 32, 175, 70}), we can just use table.sort.
However, this is inefficient if number of records become huge. To reduce memory size :

1) Each field is a lua table. Then we only need three long lua sequences.
db[field][id] == value
db["Age"][1] == 32
db["Weight"][220] = 65
...

2) C Array/Struct
ID(4byte)-Age(1byte)-Height(1byte)-Weight(1byte)...

For these methods, inserting and removing records were not that difficult - but I am having a trouble in sorting them as there is no provided standard function. As a half measure, I am currently using a temporary sequence of ID :

Code: Select all

local function sort(a, b)
    return db[field][a] > db[field][b]
end
local tmp = {1, 2, 3, ..., 220}
table.sort(tmp, sort)
newDB = clone(DB) -- copy same structure
for i=1, #tmp do
    newDB:set(i, tmp[i], db["Age"][tmp[i]], db["Height"][tmp[i]], ...) -- db:set(index, id, field1, field2, ...)
end
DB = newDB
As it seems to be a common problem, I speculate that there are more proficient ways to handle this. What would you recommend?
Always thankful for your help, forum!
Last edited by qwdqwqwffqw on Thu Oct 31, 2024 7:41 pm, edited 1 time in total.
User avatar
marclurr
Party member
Posts: 158
Joined: Fri Apr 22, 2022 9:25 am

Re: How to sort data (when table.sort cannot be used)?

Post by marclurr »

Your only option if you want to continue with your database (which looks to me a bit like a "struct of arrays"), is to write a sort function that understands the structure of your database and can swap all required fields. This wouldn't be too hard, there are quicksort implementations that are more or less copy-pastable on the internet, you'd just need to integrate the code that understands your database structure.

I'd be interested to hear how much memory you've managed to save going down this route, especially given the extra book keeping and cloning required for sorting. If you haven't measured the difference, I'd do that first and perhaps revert to the simpler approach if the numbers don't heavily favour the struct of arrays style.
qwdqwqwffqw
Prole
Posts: 33
Joined: Sat Jan 25, 2020 4:11 pm

Re: How to sort data (when table.sort cannot be used)?

Post by qwdqwqwffqw »

I completely agree with you that it isn't worth using such data structures for fairly small memory gain, if I have to manually code for each different cases or to clone database everytime when they are sorted.

However, if there is any simple, effortless and universal way to sort field(column)-based 2d tables(just like table.sort for row-based 2d tables or 1d lists), I would be glad to know it!
User avatar
pgimeno
Party member
Posts: 3672
Joined: Sun Oct 18, 2015 2:58 pm

Re: How to sort data (when table.sort cannot be used)?

Post by pgimeno »

I'm not aware of any better or more direct method. But I don't understand the difference between db and DB in your code, nor the arguments of DB:set() (especially the first two), so I can't say if it can be improved. To me it seems that this might work, assuming you only have DB and not db (replacing all the code under and including the line that says `newDB = clone(DB)`):

Code: Select all

newDB = {}
for i = 1, #tmp do
  newDB.Age[i] = DB.Age[tmp[i]]
  newDB.Height[i] = DB.Height[tmp[i]]
  ...
end
DB = newDB
I like how your storage method saves allocations and reduces garbage generation. That's a plus to me, regardless of speed.
qwdqwqwffqw
Prole
Posts: 33
Joined: Sat Jan 25, 2020 4:11 pm

Re: How to sort data (when table.sort cannot be used)?

Post by qwdqwqwffqw »

Yes, db and DB sould be same. Sorry for messy code; I tried to show pseudo code that includes C data as DB as well, plus possible wrapper functions(set). Your code correction is right.

For additional speed one could use table.new for each field when making clone, as size is known and many rehashings can be avoided in this case.
User avatar
pgimeno
Party member
Posts: 3672
Joined: Sun Oct 18, 2015 2:58 pm

Re: How to sort data (when table.sort cannot be used)?

Post by pgimeno »

Alas, there's no table.new - wish you could do that. Maybe you're thinking about Luau's table.create?

Anyway I made a mistake, I forgot to create the individual tables. It should have been:

Code: Select all

newDB = {Age = {}, Height = {}, ...}
qwdqwqwffqw
Prole
Posts: 33
Joined: Sat Jan 25, 2020 4:11 pm

Re: How to sort data (when table.sort cannot be used)?

Post by qwdqwqwffqw »

I learned that, for C array of structs, I can just use qsort in stdlib through ffi.

Code: Select all

local ffi = require "ffi"
ffi.cdef[[
	void qsort(void *base, size_t, size_t, int (*)(void *, void *));
	typedef struct { unsigned int id; unsigned char age, height, weight; } row;
]]
local n = 4 -- number of DB items
local size = ffi.sizeof("row") -- needed byte size for item
local db = ffi.cast("row*",  love.data.newByteData(size*n):getFFIPointer())

db[0].id, db[0].age, db[0].height, db[0].weight = 1, 32, 175, 70
db[1].id, db[1].age, db[1].height, db[1].weight = 2, 24, 182, 82
db[2].id, db[2].age, db[2].height, db[2].weight = 3, 42, 152, 58
db[3].id, db[3].age, db[3].height, db[3].weight = 100, 62, 168, 72

local function comp_age(a, b) return a.age - b.age end
local function comp_height(a, b) return a.height - b.height end
local function comp_weight(a, b) return a.weight - b.weight end

local function show()
	for i=0, n-1 do
		local row = db[i]
		print(row.id, " | ", row.age.." yo", " | ", row.height.." cm", " | ",  row.weight.." kg")
	end
	print(string.rep("=", 20))
end

show()
ffi.C.qsort(db, n, size, ffi.cast("int (*) (row*, row*)", comp_age)); show()
ffi.C.qsort(db, n, size, ffi.cast("int (*) (row*, row*)", comp_height)); show()
ffi.C.qsort(db, n, size, ffi.cast("int (*) (row*, row*)", comp_weight)); show()
I'm not familiar with C - so some of this code might be incorrect/unsafe (correction will be appreciated!), but it is showing desired result.

For column-based 2d array in lua, I still couldn't find any ready-made solution. So, as marclurr suggested, I should probably code cutstom quick sort function, that swaps values in other fields simultaneously. In that way, cloning table or tracking ID won't be necessary.

--------

Another advantage of column-based array is that, as values of a field is all of same data type, you can use specific container for each different fields. For text and complex data, we use lua table to hold strings and various objects; but for coordinates and simple values, we can use C array of char/bool/int. Other than slight advantages in memory, reading/writing in C array is noticeablely faster (with JIT on) and data serialization is simpler(saved file is also smaller).
For some cases, being able to selectively loading/saving needed fields of database is also useful.

--------
Alas, there's no table.new - wish you could do that. Maybe you're thinking about Luau's table.create?
No, I was referring to table.new function in luajit extension, which can also be used in love. What I tried to tell was :

Code: Select all

require "table.new"
newDB = {Age = table.new(n, 0), Height = table.new(n, 0), ...} -- n is number of items in db
RNavega
Party member
Posts: 385
Joined: Sun Aug 16, 2020 1:28 pm

Re: How to sort data (when table.sort cannot be used)?

Post by RNavega »

Considering the marshalling that would happen with using stdlib qsort through FFI, did you profile it and is it faster than your original table.sort method in the first post?
qwdqwqwffqw
Prole
Posts: 33
Joined: Sat Jan 25, 2020 4:11 pm

Re: How to sort data (when table.sort cannot be used)?

Post by qwdqwqwffqw »

You pointed out very well. Test(row consisting of 4 ints, 1M+ number of rows) shows that my first method(table.sort) takes only half the time, even with including time for cloning db and creating temporary table for indexing. So in terms of speed, qsort through ffi is not a better option.

Also, I noticed strange behavior of my code when n is bigger than a few thousands. Love shuts off(without error) or sorted DB shows bizarre value (not even integer). Using ffi.new() instead solved this issue.

Code: Select all

local n = 1000000 -- number of DB items
-- local db = ffi.cast("row*",  love.data.newByteData(size*n):getFFIPointer()) -- doesn't work in large numbers(why?)
local db = ffi.new("row[?]", n) -- this works
RNavega
Party member
Posts: 385
Joined: Sun Aug 16, 2020 1:28 pm

Re: How to sort data (when table.sort cannot be used)?

Post by RNavega »

That's interesting, thanks for testing.

There's an interesting repo here: https://github.com/DarkRoku12/lua_sort
It says that rewriting the sorting algorithm from table.sort (implemented in these lines here from LuaJIT), will make it get JIT'ed, leading to a faster implementation.
Post Reply

Who is online

Users browsing this forum: Ahrefs [Bot] and 19 guests