Reading binary files quickly.


Re: Reading binary files quickly.

Post by zorg »

ingsoc451 wrote: Sun Mar 03, 2019 5:31 pm Prior to running your game, you can export the bin files to a format that is easier/faster for your game to parse (like a (compressed) Lua table). This is a one-time processing step. But if you are not able to do that for some reason, then a compressed Lua table would be of no use.
Then you can write a C/C++ module to do the parsing of the legacy format.
To my understanding, the reason they were using binary files was: "It's an old game they're trying to port", so they have no other option.
Then, instead of needing to touch C/C++ at all if they don't want to, they can just use what has already been suggested: either love.data.unpack or moonblob.

Re: Reading binary files quickly.

Post by gradualgames »

Thanks for the responses. I've been looking at the documentation for unpack and moonblob, but it is not clear to me whether I could use these without modification. For instance, this file format contains strings that have a uShort header (2 bytes) followed by the characters of the string. I couldn't infer from the unpack or moonblob documentation whether the length header for a string was a ushort or a normal 32-bit integer.

"cn: a fixed-sized string with n bytes"

What is that c? A 2-byte integer? A 4-byte integer? The documentation is unclear. People do say rtfm a lot, but people don't say wtfm enough, I think.

So far I have been writing my own parser for the data, which just advances an offset through a string containing the file's data, roughly like the sketch below. I may wind up sticking with this approach for maximum control and minimum dependencies.
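
Roughly like this (a minimal sketch; the helper names are illustrative, not my actual code):

Code: Select all

-- Read a little-endian u16 and return the value plus the advanced offset
local function readU16(data, offset)
	local lo, hi = data:byte(offset, offset + 1)
	return lo + hi * 256, offset + 2
end

-- Read a u16-length-prefixed string, advancing the offset past it
local function readString(data, offset)
	local len
	len, offset = readU16(data, offset)
	return data:sub(offset, offset + len - 1), offset + len
end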

Not sure yet how to parse floating point numbers, though; that will be interesting.

Re: Reading binary files quickly.

Post by grump »

gradualgames wrote: Mon Mar 04, 2019 8:17 pm For instance, this file format contains strings that have a uShort header (2 bytes) followed by the characters of the string. I couldn't infer from the unpack or moonblob documentation whether the length header for a string was a ushort or a normal 32-bit integer.

Code: Select all

local reader = BlobReader(data)
local len = reader:u16()
local string = reader:raw(len)
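
The n in the cn specifier, by the way, is a byte count written literally in the format string. love.data.unpack follows Lua 5.3's string.pack format, where sn is the variant that reads an n-byte length prefix from the data, so the unpack equivalent of the above would be:

Code: Select all

-- '<s2': little-endian, string preceded by a 2-byte unsigned length
local str, nextPos = love.data.unpack('<s2', data)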

Re: Reading binary files quickly.

Post by gradualgames »

grump wrote: Mon Mar 04, 2019 8:50 pm
local reader = BlobReader(data)
local len = reader:u16()
local string = reader:raw(len)
Good idea :) Thank you. I was actually just about to dig into the moonblob code to learn how to parse floating point numbers, because I think I'm almost through all the types I need to parse in my own mini binary parser.

Re: Reading binary files quickly.

Post by grump »

gradualgames wrote: Mon Mar 04, 2019 8:56 pm I was actually just about to dig into the moonblob code to learn how to parse floating point numbers, because I think I'm almost through all the types I need to parse in my own mini binary parser.
BlobReader:f32 reads 32-bit floating point numbers. BlobReader:f64 reads 64-bit floating point numbers.
Here is the documentation for BlobReader.

moonblob makes heavy use of ffi cdata types, so there is no actual "parsing" involved. It reads 4 or 8 bytes and interprets those bits as a floating point number by using "type punning".

love.data.unpack provides identifiers for floating point numbers, but you can't be sure their size matches the actual size of your data. It depends on the platform.
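
The idea behind the punning, as a hand-rolled sketch (illustrative only, not moonblob's actual code):

Code: Select all

local ffi = require('ffi')

-- Reinterpret 4 raw bytes of a Lua string as an IEEE-754 float
local buf = ffi.new('float[1]')
local function readF32(data, offset)
	-- offset is 1-based like Lua strings; the pointer is 0-based
	ffi.copy(buf, ffi.cast('const char *', data) + offset - 1, 4)
	return buf[0]
end

print(readF32('\0\0\128\63', 1)) -- 1.0 on a little-endian machine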

Re: Reading binary files quickly.

Post by gradualgames »

grump wrote: Mon Mar 04, 2019 9:02 pm BlobReader:f32 reads 32-bit floating point numbers. BlobReader:f64 reads 64-bit floating point numbers.
Thanks; I'll give moonblob a try rather than continuing to roll my own. :awesome:

Re: Reading binary files quickly.

Post by pgimeno »

Code: Select all

print(love.data.unpack('<s2s2', '\005\000ABCDE\002\000FG')) -- prints "ABCDE    FG     12"
print(love.data.unpack('<c5c2', 'ABCDEFG')) -- prints "ABCDE    FG      8"
print(love.data.unpack('<i2i4s2I2', '\254\255\001\000\000\000\018\000STRING OF 18 BYTES\254\255')) --prints:
-- -2	1	STRING OF 18 BYTES	65534	29

-- Example: Decode a TGA header - http://www.paulbourke.net/dataformats/tga/
local idlen, cmaptype, imgtype, cmapstart, cmaplength, cmapbits, xorigin, yorigin, xsize, ysize, pixelbits, flags =
  love.data.unpack('<BBBI2I2BI2I2I2I2BB', TGA_file)

Re: Reading binary files quickly.

Post by grump »

The thing with unpack is: if your data is more complex and you can't use static format strings anymore, it starts getting ugly pretty fast.

A typical case: parse a header with offsets and size information, seek to the offset of each chunk you're interested in, then read chunks of varying sizes and formats. You have to do lots of string slicing and formatting/concatenation for that, and it results in slow code that's hard to read.

Re: Reading binary files quickly.

Post by pgimeno »

grump wrote: Wed Mar 06, 2019 5:09 am You have to do lots of string slicing and formatting/concatenation for that, and it results in slow code that's hard to read.
unpack accepts an offset parameter. Wouldn't that obviate the need of slicing?
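
For instance (a minimal sketch; the chunk table layout is made up purely for illustration, and data is assumed to hold the file contents):

Code: Select all

-- Hypothetical header: u32 chunk count, then { u32 offset, u32 size } per chunk
local count, pos = love.data.unpack('<I4', data)
local chunks = {}
for i = 1, count do
	local offset, size
	offset, size, pos = love.data.unpack('<I4I4', data, pos)
	chunks[i] = { offset = offset, size = size } -- each chunk can later be read via unpack(fmt, data, offset + 1)
end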

Where would you need the slicing/formatting/concatenation with unpack that you don't need with other methods? Could you give an example?

Re: Reading binary files quickly.

Post by grump »

pgimeno wrote: Wed Mar 06, 2019 12:51 pm unpack accepts an offset parameter. Wouldn't that obviate the need of slicing?
Ah, you're right, my bad. No slicing required then.
pgimeno wrote: Wed Mar 06, 2019 12:51 pm Where would you need the slicing/formatting/concatenation with unpack that you don't need with other methods? Could you give an example?
After thinking about it for a bit, concatenation may not always be required, but...
Consider this simple structure:

Code: Select all

{
	uint16_t len
	uint16_t data[len]
}
in a file with 1,000,000 records of this type.

With love.data.unpack:

Code: Select all

local data = "\x10\x0000112233445566778899aabbccddeeff"
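-- "\x10\x00" is a little-endian u16 length (16), followed by 16 u16 values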
local result = {}
for i = 1, 1e6 do
	local len = love.data.unpack('<H', data)
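	-- one unpack call reads all len values, via a generated format string ('H' repeated len times)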
	result = { love.data.unpack('<' .. ('H'):rep(len), data, 3) }
	result[#result] = nil -- unpack returns an additional value
end
Runtime: 0.65s

But we can get rid of concatenation and also the table:

Code: Select all

local data = "\x10\x0000112233445566778899aabbccddeeff"
local result = {}
for i = 1, 1e6 do
	local len = love.data.unpack('<H', data)
	for i = 1, len do
		result[i] = love.data.unpack('<H', data, i * 2 + 1)
	end
end
Runtime: 1.39s

The concatenation is gone, but now it takes more than twice as long to complete. With more complex data structures and more calls to unpack, this overhead can quickly become considerable.

moonblob:

Code: Select all

local data = "\x10\x0000112233445566778899aabbccddeeff"
local r = BlobReader(data, '<')
local result = {}
for i = 1, 1e6 do
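	-- rewind to the start, read the u16 count, then bulk-read that many u16 values into result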
	r:rewind():array('u16', r:u16(), result)
end
Runtime: 0.13s

moonblob is 5x-10x faster here, and the code is a lot more readable imho: more succinct, no fiddling with strings, and no strange-looking format identifiers that you have to look up to understand their meaning (except the eyesore that is the endianness specifier).

You'll probably come up with an ingenious solution using unpack that proves me utterly wrong :) I can't believe there's no way to tell unpack to parse n values at once.