Bad Apple!! in Love2D

Segfaultd · Post by **Segfaultd** » Fri Aug 26, 2016 3:54 am

Hello everyone!

As some people may know, there is an video on the Internet called Bad Apple!!. This video is interesting because it is entirely in black and white, so it can be compressed fairly efficiently. That made some people run it on some fairly underpowered systems, like the Sega Genesis, the NES and even the Atari 2600!

That said, I've created a simple video decoder for Love2D, that plays this video. The goal was to try and make the video as small as possible. I know almost nothing about video and audio compression, so I used a very very very simple compression scheme.

Encoding

The player runs the video at 64x64 resolution, monochrome at 30fps. I've created a (very primitive) video encoder that does the following:
- Resizes the video to 64x64 and dumps all frames to a folder (using the awesome FFMpeg)
- Converts all frames to monochrome (the video file is actually in grayscale; it gets converted to pure black and white)
- For each frame, do:
-- Find the x and y coordinates of all the pixels that are different (black to white or white to black) from this frame to the other
-- Dump those coordinates to a file
-- Put a frame delimiter
- Compress the file using love.math.compress (zlib)

I limited the resolution of the video in order to use only printable ASCII characters on the compression; I was having problems reading the data from the file

.
This results in a file like this (before compression):

Code: Select all

###XYXYXYXYXY#XYXYXY#XYXY#XYXYXYXYXY#

Where # is the frame delimiter, X and Y the position of the pixel .

The audio is simply a highly compressed MP3 file playing on the background. That makes the video to go off-sync with the audio.

Playing

The player code is extremely simple:

Code: Select all

-- index is theindex of the current byte being read by the file
function love.update(dt)
  value = string.byte(framedata, index) -- get the value of current byte
  if value == 32 then                   -- if it is 32 (frame delimiter), simply skip 
    index = index + 1                   -- (no changes between frames)
  end
  
  while value ~= 32 do                            -- else, while value isn't the frame delimiter
    local i = string.byte(framedata, index) - 62  -- get x position of changed pixel
    index = index + 1
    local j = string.byte(framedata, index) - 62  -- get y position of changed pixel
    index = index + 1
    if frame[i][j] == 1 then                      -- swap the pixel on the "framebuffer"
      frame[i][j] = 0                             -- (black to white or white to black)
    else 
      frame[i][j] = 1 
    end
    value = string.byte(framedata, index)         -- get next byte
  end
end

Here's a video of it running.

There are probably lots of better ways to compress this video. I'd love to see another implementations of this!

Thanks for reading!

Sulunia · Post by **Sulunia** » Fri Aug 26, 2016 4:43 am

You could just throw away the delimiter and just check if the next pixel changed was on a X and Y position smaller than the last changed pixel. This would mean it's a new frame information, so you could break your decoding loop and save a few bytes of space.

Lemme see if i can cook something up here.

@edit

Done. I won't post it yet till i figure out a way to compress the output file, since it's HUGE (27mb). But it runs smoothly 30FPS 255x255 res.

File format is:

Code: Select all

XYXYXYXYXYXYXYXYXYXYXYXYXY...

Where X and Y are a byte stored in a text file.

The encoder was written using C#. Writing an encoder in Lua sounds really tough, tbh.

If a previous frame is exactly the same as the next frame, then the very last pixel (255,255) will be flipped, which means that for every frame at least one exact pixel must change.
Pic for now:

zorg · Post by **zorg** » Fri Aug 26, 2016 7:50 am

I think i will try to tackle this myself as well, though with a few differences:

gonna try and render the video at a resolution of 128x128 instead
for each group of four pixels, besides full black and white, we can have 3 shades of gray (WB pixel count of 0-4, 1-3, 2-2, 3-1, 4-0) for a crude anti-aliasing job (bringing the resolution back down to 64x64)
use delta encoding, storing bit patterns (either fixed 3bit or huffman encoded 1-6bits [0,4,1,2,3,-1,-2,-3]:[0,11,1001,1010,1011,100000,10001,100001], whichever's smaller) for each pixel; maybe even do this per-frame, with one bit at the start of each, defining which method we used...
run-length-encode the file (actually experiment with this and the neat BWT algorithm)
forego the frame delimiter, as sulunia said, since all frames would have the same pixelcount anyway... and in any case, decoding would reconstruct it... maybe store one global blocksize as one byte at the start of the file so we know how much to decode at once?
use all bits in a byte when serializing the data to a file, since we can read in binary data with the appropriate flag set, so it'd be a waste of space not to.
Have an extra input file where i can define rows with varying widths and colors, just for tinting various frames based on characters and scenes.
Use FFI c arrays internally, since tables may be slower.
Either go even lower, with mono 8000kHz sampling rate and 8 kbps bitrate, or as an aside, with a bit of tweaking, i could use something like this in showcasing my realtime sound generation efforts, meaning i'd "re-compose" the tune myself. (So no need to store the music, just generate it realtime) Will be fun though

Sulunia · Post by **Sulunia** » Fri Aug 26, 2016 3:30 pm

zorg wrote: *Either go even lower, with mono 8000kHz sampling rate and 8 kbps bitrate, or as an aside, with a bit of tweaking, i could use something like this in showcasing my realtime sound generation efforts, meaning i'd "re-compose" the tune myself. (So no need to store the music, just generate it realtime) Will be fun though

Psst... I just found this yesterday!
Hyping to see some generated audio

zorg · Post by **zorg** » Fri Aug 26, 2016 4:47 pm

Sulunia wrote:
zorg wrote:*Either go even lower, with mono 8000kHz sampling rate and 8 kbps bitrate, or as an aside, with a bit of tweaking, i could use something like this in showcasing my realtime sound generation efforts, meaning i'd "re-compose" the tune myself. (So no need to store the music, just generate it realtime) Will be fun though
Psst... I just found this yesterday!
Hyping to see some generated audio

Thanks! Usually i prefer to just notate it by ear, but in this case, since i'll need to convert it to a custom (minimal-ish) format, i might as well cheat a bit

Truth be told, i wanted to use the original NEC PC-98 MML code, but it might have been pointlessly hard.
In any case, i was thinking that i'd only need to have a sampler for this. Even if it would raise the filesize a bit, it's still easier to code in than pitch bending support, envelopes and whatnot.
(Also i already coded down the encoder as pseudo lua-ish code, just need to get home tomorrow to test it

)
(Segfaultd: You did encourage others to implement things their way, hope you're happy

)

Edit: @Segfaultd, actually, could you give a (youtube or something) link to the video you used? Just so i can render out frames from the same source.

Segfaultd · Post by **Segfaultd** » Fri Aug 26, 2016 6:26 pm

Sulunia wrote: If a previous frame is exactly the same as the next frame, then the very last pixel (255,255) will be flipped, which means that for every frame at least one exact pixel must change.

Nice delimiter logic! Maybe you can check the surrounding pixels and paint it the same color (only when drawing; your array would still have the right value).

zorg wrote:I think i will try to tackle this myself as well, though with a few differences:
gonna try and render the video at a resolution of 128x128 instead

for each group of four pixels, besides full black and white, we can have 3 shades of gray (WB pixel count of 0-4, 1-3, 2-2, 3-1, 4-0) for a crude anti-aliasing job (bringing the resolution back down to 64x64)

use delta encoding, storing bit patterns (either fixed 3bit or huffman encoded 1-6bits [0,4,1,2,3,-1,-2,-3]:[0,11,1001,1010,1011,100000,10001,100001], whichever's smaller) for each pixel; maybe even do this per-frame, with one bit at the start of each, defining which method we used...

run-length-encode the file (actually experiment with this and the neat BWT algorithm)

forego the frame delimiter, as sulunia said, since all frames would have the same pixelcount anyway... and in any case, decoding would reconstruct it... maybe store one global blocksize as one byte at the start of the file so we know how much to decode at once?

use all bits in a byte when serializing the data to a file, since we can read in binary data with the appropriate flag set, so it'd be a waste of space not to.

Have an extra input file where i can define rows with varying widths and colors, just for tinting various frames based on characters and scenes.

Use FFI c arrays internally, since tables may be slower.

Either go even lower, with mono 8000kHz sampling rate and 8 kbps bitrate, or as an aside, with a bit of tweaking, i could use something like this in showcasing my realtime sound generation efforts, meaning i'd "re-compose" the tune myself. (So no need to store the music, just generate it realtime) Will be fun though

Lots of great ideas there! Didn't knew about the BWT algorithm, that looks very nice. I thought something about encoding the video in 2x2 blocks; because there can by only 16 permutations of them, maybe creating a "dictionary" of possible blocks could reduce the file size.
About the audio, I'm still trying to find a way to put the audio data in the same file as the video frames, but I guess that would be very hard to me.

zorg wrote: (Segfaultd: You did encourage others to implement things their way, hope you're happy )

My job is done here!

zorg · Post by **zorg** » Fri Aug 26, 2016 7:02 pm

To be honest, i'd either just dump the audio (if i were to use a streaming format like mp3, as well) to the end, or not combine it at all, since audio and "stream of images" both compress wildly differently, so it would probably make the file not compress that well as a whole.

As for 2x2 matrices, i'm not really retaining any "shape" information, just "density", that's why 6 bits is enough; for 4 "dice" with 2 "faces", you can have the following sums: 0,1,2,3,4; let's say zero being the blackest, 4 the whitest, and the other 3 being 25%, 50% and 75% grey. This is how my antialiasing would work, just blending with a 2x2 kernel.

Also, i think i worked out my final file representation; imho the hardest part will be the bit manipulating, since it needs to mess with data across byte boundaries (constant width encoding uses 3bits per pixel, for example, so that means "aaabbbcc cdddeeef ffggghhh" in 3 bytes translate to 8 pixels; manipulating that tidily from code can be difficult. The other thing would be whether to compress per-frame, or just the whole file once, at the end. The latter is more beneficial if we can load the whole thing uncompressed into memory.

The spec:

Code: Select all

-- The video file's layout (uncompressed)

	-- 64 bits: A double value denoting the exact FPS the video should be played back at. (For a one-shot thing, this could be hardcoded)

	-- 64 bits: A double value containing how many frames the video has. (similarly as the above)

	--  * bits: All the frames follow, using the below format(s).

-- One frame's layout in the encoded "video" file

	-- 1 bit:
		-- 0 - delta frame (differences are stored using one of the below schemes)
		-- 1 - key frame (absolute values are stored: [0,1,2,3,4]->[000,001,010,011,111])
		-- Note: The first frame always has to be a keyframe.

	-- 1 bit:
		-- 0 - data is huffman encoded ([0,4,1,2,3,-1,-2,-3]->[0,11,1001,1010,1011,100000,10001,100001])
		--     size is constrained to [4096,24576] bits, or [512,3072] bytes, or [0.5,3] kilobytes.
		-- 1 - data is constant width  ([0,4,1,2,3,-1,-2,-3]->[000,111,001,010,011,100,101,110])
		--     size is always 12288 bits, or 1536 bytes, or 1.5 kilobytes.

	-- 1 bit:
		-- 0 - data is coded LtR in inner loop, and TtB in outer loop.
		-- 1 - data is coded TtB in inner loop, and LtR in outer loop.
		-- Note: This may only have any effect on the BWT and compression.

	-- 1 bit:
		-- 0 - data is not transformed by the BWT.
		-- 1 - data is transformed by the BWT.
		-- Note: Calculating a BWT of size 4096 may be slow, but thankfully encoding speed doesn't matter. The representation may need to be padded, and interestingly, the max size is not 4096, but 3072 bytes, though this doesn't modify the required 12 bits for the row pointer; no extra info is needed to be retained.

	-- If the last bit was 1, then 12 more bits follow, denoting which "row" we need to use to get back our original data with the inverse BWT.

	-- * bits - the actual frame data; when decoding, keeping count is needed to detect whether we have processed 64x64 	pixels or not.

-- Some stats about the uncompressed representation

	-- If we used constant width representation for all frames, the total size for a 1 minute 60 FPS video would be 5.4 MB.
	-- If we used huffman encoding, the size would be between 1.8 MB and 10.8 MB.
	-- Thing is, compression would probably work very well on these representations, and the BWT may also help, meaning that in most cases, the sizes should be reduced drastically.

Also, the weights used for the huffman encoding assumes that there will be loads of pure black and pure white pixels, while the middle values will be sparse; and this will be true, since those will only exist at the edges between those two main colours.

As for actually tinting the thing, i was thinking about having a separate file, with framecount * 64 color triplets, that can be used to set the pixels of ImageDatas with dimensions of 1x64, then uploading that to the GPU to be used with a simple shader that uses said 1D image to color each "row" of pixels to what was defined.

Hand-creating the table would be a chore, but i could also export the frames from a color/model version of the video, and average all the columns per row to get a useable value.

Segfaultd · Post by **Segfaultd** » Sat Aug 27, 2016 1:45 am

zorg wrote: Edit: @Segfaultd, actually, could you give a (youtube or something) link to the video you used? Just so i can render out frames from the same source.

Here: https://www.youtube.com/watch?v=G3C-VevI36s

zorg · Post by **zorg** » Sat Aug 27, 2016 6:52 pm

Thanks! Though i think i found a slightly better version and ended up using that.

Second issue i had was that the video doesn't have an aspect ratio of 1:1, so i needed to adjust the size (cropped 22 pixels off each side after resizing to 172x128, close enough) Then again, i might make my enc/dec work with arbitrary sizes, and test the video out with various sizes (160x120, 120x120, 128x128)

Third issue is that i couldn't set the output pixel format to monob (that would have been neat) so i settled with the default bgr24, and i'll just clamp inside my encoder.

Also, i think i will forego that colorization idea for the time being.

Results coming soon

(this time, without a tm)

Sulunia · Post by **Sulunia** » Sat Aug 27, 2016 7:51 pm

I'm so inclined on implementing a partial IT tracker on my decoder... But love already plays IT files! Should i just bite the bullet and do it for the sake of "do everything yourself"? I think it's better to parse an IT file instead of a full blown MIDI one. Besides the fact i always wanted to try implementing a tracker player by hand.

Thoughts?
IT file attached in case it's useful to someone. Notice the samples were made in a bit of hurry, so they sound a bit off besides being probably too big.

@edit
Remade the samples and fixed the looping points. Added triangle wave for bass channel. Tracker size went from 269.076 Bytes to 50.635 Bytes! Look for post up ahead.

Bad Apple!! in Love2D

Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Re: Bad Apple!! in Love2D

Who is online