Hi !
Featured games (90)
games approved by the League of Dukes
Games in Showcase (732)
Games in Android Showcase (222)
games submitted by our members
Games in WIP (806)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
    Home     Help   Search   Login   Register   
Pages: [1]
  ignore  |  Print  
  Compressing chunk of tile data?  (Read 791 times)
0 Members and 1 Guest are viewing this topic.
Offline theagentd
« Posted 2017-06-23 17:34:01 »

I need to compress a 32x32 chunk of tile data. The data consists of 1 byte for the terrain type, and 1 short for a detail tile (second layer essentially). The terrain tiles are very continuous, while most detail tiles are 0 (null). What would a good compression method?

Here's my current idea:

1. Reorder data into z-curve order or Hilbert curve order to improve spatial coherency

2. Run length encode data using a special repeat byte/short value (say Byte/Short.MAX_VALUE) followed by number of repetitions, like this:
 baaaaab ---> ba*5b

3. Compress data using a precomputed Hilbert code. The idea is to precompute several Hilbert trees from statistical data from a lot of worlds based on the two most common tiles in the chunk, so there'll be one tree for each biome essentially. The chunk encodes a byte for which tree to use, then the encoded data. The RLE symbol is encoded in the tree, but the RLE count could be encoded using a different huffman code tree specifically for RLE counts.

My question is: Would this beat the DEFLATE compression I'm currently using? Is it even worth trying to implement this? I feel like DEFLATE should have problems compressing small chunks of data like this as it tries to learn about the data as it goes. Also, would it make sense to use delta encoding here?

Offline KaiHH

JGO Kernel

Medals: 429

« Reply #1 - Posted 2017-06-23 18:08:25 »

Make it a contest here! Smiley

Provide 4-5 example "byte arrays" of your tile chunk data and maybe a simple test fixture that calls a user-implemented compress(InputStream in, OutputStream out) and prints the compression ratio and then calls the other user-implemented decompress(InputStream is, OutputStream out) to validate with the initial data.
Offline theagentd
« Reply #2 - Posted 2017-06-23 18:43:18 »

That's actually a pretty good idea! I'm currently in Italy on vacation though, but I'll post a proper challenge when I get back!

Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline KaiHH

JGO Kernel

Medals: 429

« Reply #3 - Posted 2017-06-26 20:16:04 »

I've thought about compression a bit more. Delta encoding together with a hand-crafted LZ77 variant (really not that hard to do) is probably a good option for you, under the following assumptions:

- within a "scanline" or a Morton/Hilbert path there are long runs of constant data (-> run-length encoding would shine here)
- if there is variance it changes with more or less constant frequency so the first order derivative would be more or less constant (-> delta encoding would shine here which in turn makes LZ77 perform well, too) (to handle positive as well as negative derivatives you can zigzag-encode the deltas to get all numbers positive again)
- if there are not many "runs" of the same tile type (or first order derivative) there _might_ be recurring patterns (which again would be great for LZ77)

These are also the assumptions being made by the PNG compression algorithm, which is basically delta encoding plus deflate (LZ77 + Huffman).

Now you can tune a custom LZ77 implementation to be more suited for run-length encoding as opposed to pattern matching by altering the coding scheme used for the <offset, length> pairs and literals, both in how many bits to allocate for each of that information and especially how you entropy-encode these bits.
If you expect there to be lots of runs of the same data (tile numbers or derivative) and not so much patterns found in some distances to each another, then first use the popular LZ77 scheme which uses lengths longer than the offset to indicate repetition and therefore enables runlength encoding, and then Huffman-encode the offset using for example 2 bits for the "offset = 1" case (i.e. directly following the current encoder position in the LZ77 lookahead buffer) and a 3-bit extra-bits code with 2-3 extra bits for the offset up to 13 bytes behind in the LZ77 window. Then probably a 1-bit extra-bit code with 2-3 extra bits for the length of the run.

For example, using this scheme:

## Encoding of literal
* '0' + <literal byte> <-- first 0 bit indicates "now comes a literal byte" (like LZSS does)

## Encoding of offsets (with always a first 1 bit to indicate "now comes <offset, length> and not literal")
* 1     = '10' <-- we expect large single-byte runs so keep it the most common and shortest case
* 2-5   = '110' + 2 extra bits
* 6-13  = '111' + 3 extra bits <-- we don't want to find recurring patterns that far behind

## Encoding of lengths
* 3-6     = '0' + 2 extra bits <-- here is where we can tweak things, too...
* 7-14    = '1' + 3 extra bits <-- ...if we expect longer runs to be more common, we just swap the intervals

This is to be efficient to represent moderately long runs of data very close to the current encoding position, but rather inefficient for patterns that are further behind in the LZ77 window (while hopefully still being more efficient than not caring about patterns at all and _just_ implementing run-length coding).
Essentially, what we want here as well is to avoid having to encode a (canonical) Huffman tree in the data as well, since the 32x32 tiles data is rather very short in total.

Next, I would encode the tile type/number completely separately from the "second layer" which you mentioned. If the latter is more constant we can make use of that by first encoding the tile type/number of all 32x32 tiles which we expect to have a higher entropy than the second layer, and then encode the second layer of all 32x32 tiles with an encoding scheme that favors low-entropy long runs even more, probably by just using run-length coding.

But we won't know what works unless there is some example data. Smiley
Offline KaiHH

JGO Kernel

Medals: 429

« Reply #4 - Posted 2017-06-27 23:54:32 »

I pushed a small and working implementation of an arithmetic coder, which you can use as a basis for a final coder.
Arithmetic coding is superior to Huffman when your probabilities are not powers of 1/2, which is almost always the case.
It uses 64-bit integers to represent symbols of any alphabet (so you can even use delta with zigzag encoding and encode those the [0-510] range), and you can plug a probability distribution model which directly reflects your <tile type, second layer, ...> values to implement different probabilities for the two values. See the "RoundRobinModel" class.
Offline ddyer
« Reply #5 - Posted 2017-06-28 00:12:01 »

If you just use a standard compression algorithm such as LZW, you'll come very close
to that you can achieve by hand-crafting something of your own.  Maybe even better.
And you'll have more time to work on your game instead of reinventing the wheel.
Offline Spasi
« Reply #6 - Posted 2017-06-28 07:16:23 »

On the topic of compression, but not necessarily applicable to this use-case, I've been meaning to add a few different implementations to LWJGL. Specifically, I'm interested in compression algorithms that make unusual space/speed trade-offs. For example, Zstandard and LZFSE. Suggestions for other algorithms are welcome. I'm eagerly looking for an rANS-based implementation, like cbloom's Kraken, that is open source.
Pages: [1]
  ignore  |  Print  

Sralse (13 views)
2017-07-25 17:13:48

Archive (412 views)
2017-04-27 17:45:51

buddyBro (593 views)
2017-04-05 03:38:00

CopyableCougar4 (1054 views)
2017-03-24 15:39:42

theagentd (1086 views)
2017-03-24 15:32:08

Rule (1057 views)
2017-03-19 12:43:22

Rule (1043 views)
2017-03-19 12:42:17

Rule (1032 views)
2017-03-19 12:36:21

theagentd (1175 views)
2017-03-16 05:07:07

theagentd (1132 views)
2017-03-15 22:37:06
List of Learning Resources
by elect
2017-03-13 14:05:44

List of Learning Resources
by elect
2017-03-13 14:04:45

SF/X Libraries
by philfrei
2017-03-02 08:45:19

SF/X Libraries
by philfrei
2017-03-02 08:44:05

SF/X Libraries
by SkyAphid
2017-03-02 06:38:56

SF/X Libraries
by SkyAphid
2017-03-02 06:38:32

SF/X Libraries
by SkyAphid
2017-03-02 06:38:05

SF/X Libraries
by SkyAphid
2017-03-02 06:37:51 is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!