Hi !
Featured games (90)
games approved by the League of Dukes
Games in Showcase (731)
Games in Android Showcase (217)
games submitted by our members
Games in WIP (799)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
   Home   Help   Search   Login   Register   
  Show Posts
Pages: [1] 2 3 ... 122
1  Discussions / General Discussions / Re: Compressing chunk of tile data? on: 2017-06-23 18:43:18
That's actually a pretty good idea! I'm currently in Italy on vacation though, but I'll post a proper challenge when I get back!
2  Game Development / Shared Code / Re: "Better" LinkedList implementation on: 2017-06-23 17:46:11
A pooled Node version would probably be slightly worse performance-wise when looping over the list due to the Node object and the element object being in two different places in memory. Also, without exposing the pointers, it becomes a bit annoying to loop through the list, but that's no real concern I guess.

Interesting, it'd be cool to see a.comparison between that and my implementation.

Hmm, how exactly would that work?
3  Discussions / General Discussions / Compressing chunk of tile data? on: 2017-06-23 17:34:01
I need to compress a 32x32 chunk of tile data. The data consists of 1 byte for the terrain type, and 1 short for a detail tile (second layer essentially). The terrain tiles are very continuous, while most detail tiles are 0 (null). What would a good compression method?

Here's my current idea:

1. Reorder data into z-curve order or Hilbert curve order to improve spatial coherency

2. Run length encode data using a special repeat byte/short value (say Byte/Short.MAX_VALUE) followed by number of repetitions, like this:
 baaaaab ---> ba*5b

3. Compress data using a precomputed Hilbert code. The idea is to precompute several Hilbert trees from statistical data from a lot of worlds based on the two most common tiles in the chunk, so there'll be one tree for each biome essentially. The chunk encodes a byte for which tree to use, then the encoded data. The RLE symbol is encoded in the tree, but the RLE count could be encoded using a different huffman code tree specifically for RLE counts.

My question is: Would this beat the DEFLATE compression I'm currently using? Is it even worth trying to implement this? I feel like DEFLATE should have problems compressing small chunks of data like this as it tries to learn about the data as it goes. Also, would it make sense to use delta encoding here?
4  Game Development / Shared Code / Re: "Better" LinkedList implementation on: 2017-06-20 13:01:46
The removeFirst() and removeLast() functions don't check for empty lists yet.  Pointing  Grin

I like your solution, esp. because its an intrusive list - the node pointers are members of the list element.
NOTHING in this list checks for errors. I should've mentioned that in the original post I guess. xd

>Add already existing element again ---> Corrupt link references.
>Add element from another list ---> Corrupt other list.
>moveToHead() on object not in the list ---> head is set to object not in the list.
>remove() object not in list ---> corrupts size counter and may corrupt link references.
>removeFirst/Last() on empty list ---> nuppo

I keep track of if an element is in the list outside the list for the cache, so such testing wasn't necessary (and IMO not the job of the list in the first place).

I'd be inclined to have a reference in the Element, to the List that owns it.

It'd allow fail-fast protection against multiple logic errors (adding an Element to multiple lists/the same list multiple times, removing/moving an element from a List that it's not a member of, etc), making the code far less fragile.
Yeah, that'd be how you add testing. Add a reference to the list the element currently is on, check this reference in add*(), moveToFirst() and remove(). Null out the reference on removal. Also add an isEmpty() check for the removeFirst/Last() functions.

Actually you could make use of this. Just don't move the object to the top of the list, if it's in the upper half (or some other threshold) already. If anything, just swap it with an object a couple of indicies up in the buffer. This way you would only produce holes for the less used objects. Since you don't need additional references, you can easily make up for the lost cache entries by increasing the array size.

Or you only move the head pointer for adding new objects and just swap existing objects up in the buffer. This would not produce any holes and while it is technically not an LRU list anymore, it might provide good enough heuristics to function as a cache.
I'm worried about experimenting too much with my use case. The cache is critical for performance, and if I were to start thrashing, the game will essentially freeze up. I could try some kind of system where you don't move an accessed element to the head of the list, but just "upgrade" it a couple of elements up by swapping it with an element x indices up in the list. If x is randomly chosen for each swap, then the risk of getting to a pathological case would be low. Like you said, this wouldn't be an LRU cache anymore, but some kind of heuristic priority cache I guess. It's an interesting idea, but I'm afraid I don't have much more time to spend on this right now. =/
5  Game Development / Shared Code / Re: "Better" LinkedList implementation on: 2017-06-18 22:15:01
Are you thinking about a concurrency-safe version by chance, or is that definitely not an issue in your situation?
Not really, it's not a requirement for my use case. It'd just get messy if I needed to support concurrency in the system I'm working on, really. You can always just synchronize on the whole class if you really wanted to.
6  Game Development / Shared Code / Re: "Better" LinkedList implementation on: 2017-06-18 13:06:41
The problem with a simple ring buffer is that it's essentially a FIFO queue. When an element is accessed, I need to move it to the start of the queue again to make sure it's the most recently used again. I don't think that's doable in an array-based ring buffer. Removing an element would leave a null hole in the array. If you want to fill the hole, you need to shift half the elements to fill the hole (either the ones before or after, so 0 to size/2 elements). If you just leave the hole there, you've essentially reduced the size of the cache until the head reaches that point and can fill the null hole, right? I imagine a pathological (but very common) case where the cache is full of objects, and then two elements are accessed repeatedly, causing them to be moved to the head of the list repeatedly. If each movement leaves a null hole, then one element would be evicted from the cache each time an element is moved to the front of the list. =/ If I've misunderstood or missed something, please let me know.
7  Game Development / Shared Code / "Better" LinkedList implementation on: 2017-06-18 01:43:53
Hey, everyone.

Today I found myself in a need of a least-recently-used (LRU) cache system. Essentially, I wanted a cache of the last 10 000 used objects. When a new object was to be loaded, I wanted to unload the oldest element in the cache. Before, the cache was only 96 elements, at which point looping through the list to find the least recently used element was perfectly fine for the load I had, but with 10 000 objects, that situation changed. I found that LinkedHashMap could be used as a makeshift cache, as it can be set to update the internal order of elements when they are accessed, but it just seemed inefficient and weird for what I actually wanted to accomplish here. Some research made me realize that a simple linked list was a good fit for this, as a double linked list can easily be maintained in LRU order. Basically, whenever an element in the cache is used, it is moved to the beginning of the list (an O(1) operation on linked list), meaning that the cache stores objects sorted from most recently used to least recently used. Then to remove the most least recently used object, I just remove the tail of the list, again an O(1) operation.

Now, I hope we all know how bad Java's LinkedList class is. Not only does it generate garbage every time an object is added to it, but moving an object to the beginning of a list is an O(n) operation as I'd need to use list.remove(object), which would need to scan through the list until it finds the object in question and can remove it. A proper linked list implementation would store the previous and next references in the object itself, meaning that finding the previous and next objects become O(1) operations, and the element can be removed without going through the entire list. So... I wrote a minimal linked list that could do what I needed and not much more.

As mentioned, the main difference is that my linked list class does not allocate Node objects to store the next and previous references, but instead stores those references in the elements themselves. This leads to a couple of peculiarities.

 - All elements must implement an interface that allows the list to set the previous and next pointers (or alternatively extend a class which implements those functions for it).
 - You can't place the same element in a linked list twice, as each element can only track one previous and next neighbor.
 - For the same reason, you can't even use the same element in two DIFFERENT lists at the same time.

In my case, none of these were an issue, so I went ahead and implemented it. I even added a special method for moving an element in the list to the start of the list for maximum performance in that sense. The class provides the following functions, all of which are O(1):

 - addFirst(e)
 - addLast(e)
 - moveToFirst(e)
 - remove(e)
 - removeFirst()
 - removeLast()

There is no random access function. The correct way of looping through the list is to simply call element.getNext() until it returns null (the current size of the list is tracked though). The somewhat complicated usage of generics is there to allow for type safety, both when extending the Element class and when working with FastLinkedList.

Usage example:
private class MyElement extends FastLinkedList.SimpleElement<MyElement>{ //Note this definition here, it's particularly clever =P

FastLinkedList<MyElement> list = new FastLinkedList<>();

Performance test on 100 000 randomly shuffled elements:

Adding 100 000 objects to the list:
    LinkedList: 1.617 ms
    FastLinkedList: 0.627 ms
~3x faster

Move 2000 random elements from their current position to the start of the list:
    LinkedList: 203.315 ms
    FastLinkedList: 0.118 ms
Insanely much faster (O(n)--->O(1))

Remove 2000 random elements from the list:
    LinkedList: 175.541 ms
    FastLinkedList: 0.094 ms
Insanely much faster (O(n)--->O(1))

Remove remaining 98 000 objects using removeFirst():
    LinkedList: 0.298 ms
    FastLinkedList: 2.78 ms
Noticeably slower in this test due to worse cache coherency. FastLinkedList needs to access each element's data to find the next and previous node in the list. As the elements are shuffled, this ends up jumping around randomly in memory. Java's LinkedList instead creates Node objects for each element, which end up sequentially in memory as they're recreated each iteration. This is not an issue with real-life data, and is the cost of going garbage-less.

package engine.util;

import engine.util.FastLinkedList.Element; //this import is required

public class FastLinkedList<E extends Element<E>> {
   private E head, tail;
   private int size;
   public FastLinkedList() {}

   public void addFirst(E element){
      if(size == 0){
         head = element;
         tail = element;
         head = element;
   public void addLast(E element){
      if(size == 0){
         head = element;
         tail = element;
         tail = element;
   public void moveToFirst(E element){
      if(element == head){
      E prev = element.getPrevious();
      E next = element.getNext();
      prev.setNext(next); //prev cannot be null thanks to if-statement above
      if(next != null){
         //element was tail, update the tail.
         tail = prev;
      head = element;
   public void remove(E element){
      E prev = element.getPrevious();
      E next = element.getNext();
      if(prev != null){
         //prev == null means element was head.
         head = next;
      if(next != null){
         //next == null means element was tail.
         tail = prev;
   public E removeFirst(){
      E h = head;
      E next = h.getNext();
      if(next != null){
         //next and prev are null, list is now empty
         tail = null;

      head = next;

      return h;
   public E removeLast(){
      E t = tail;
      E prev = t.getPrevious();
      if(prev != null){
         //next and prev are null, list is now empty
         head = null;
      tail = prev;
      return t;
   public E getFirst(){
      return head;
   public E getLast(){
      return tail;
   public int size() {
      return size;
   public boolean isEmpty(){
      return size == 0;
   public String toString() {
      StringBuilder b = new StringBuilder("[");
      E e = head;
      for(int i = 0; i < size; i++){
         if(i != size-1){
            b.append(", ");
         e = e.getNext();
      return b.append(']').toString();
   public static interface Element<E extends Element>{
      public void setNext(E next);
      public void setPrevious(E previous);
      public E getNext();
      public E getPrevious();

   public static class SimpleElement<E extends SimpleElement> implements Element<E>{
      private E next, previous;

      public void setNext(E next) { = next;

      public void setPrevious(E previous) {
         this.previous = previous;

      public E getNext() {
         return next;

      public E getPrevious() {
         return previous;
8  Game Development / Shared Code / Re: Compute view/proj matrices for head tracking rendering on: 2017-06-14 14:50:50
I may have something to show at some point. =P
9  Java Game APIs & Engines / OpenGL Development / Re: Optimizing Performance on: 2017-06-12 12:57:48
Are you on mobile? I've heard about that being a major issue on mobile, but never on desktop. Mobile likes to prefetch texture data before the shader starts, which isn't possible if you need to run the shader to figure out the texture coordinates.
10  Java Game APIs & Engines / OpenGL Development / Re: Optimizing Performance on: 2017-06-12 03:28:38
Cool, let me know if you have any questions. I'd love to hear about your results when you're ready, too. =P
11  Game Development / Newbie & Debugging Questions / Re: (LibGDX) Memory Leak/Issue With ShaderProgram on: 2017-06-12 01:41:10
I'm not familiar with that program, but are you sure you're interpreting this correctly? Isn't it saying that there are a crapload of ShaderProgram objects, and they're all stored in a single Object[]? Wouldn't that simply mean that you may be loading shaders, throwing them on an ArrayList somewhere and forgetting about them, preventing them from being garbage collected?
12  Discussions / Miscellaneous Topics / Re: What I did today on: 2017-06-11 19:52:35
Today I reworked a large part of RF's rendering pipeline. I finally added proper sRGB support throughout the pipeline, and the fake bloom just looks so much better with it. We'll need to re-tweak a lot of colors as all non-texture colors (e.g. hardcoded float color values for as vertex colors or uniform color values) will look significantly brighter now, but it'll just be a tiny amount of work to fix.

A very common problem I've encountered is essentially wanting to scale something nicely without aliasing, but still retaining the pixely look of GL_NEAREST filtering. Essentially, I want antialiased edges between the different texels, but the actual interior of each texel to be just a single color from the texture. I found the solution to this: I enable linear filtering and then use a shader to "sharpen" the texture coordinates before sampling. This lead to very good results.

What I actually wanted to use this for was to upscale the entire rendered scene in Robot Farm to screen resolution. Our tiles are 16x16 pixels big, which meant that we needed to render those tiles (and everything else) at a scale that was a multiple of 16. We then realized that having a higher screen resolution or simply lowering the tile scale meant that the player would be able to see much further than intended, which gave the player a big advantage when exploring. To keep the view distance fixed, we needed to support arbitrary scaling with of tiles without them looking like absolute crap. For example, here are our 16x16 tiles drawn as 17x17 quads.

The issue here is that with nearest neighbor filtering, the pixels from the original tiles end up covering either 1 or 2 pixels, which causes heavy distortion of objects depending on where they are on the screen. This is especially visible during motion, but the anti-aliased text drawn at tile resolution shows the issue very well in a still image. The numbers look choppy and have weird thickness here and there. When using the new shader to upscale, here's what I get:

In this case, the new shader essentially works as bilinear filtering of the screen as the upscaled resolution is so similar to the original resolution, but it is a tiny bit sharper than bilinear filtering at least. Not too impressive actually...

Looking at a more zoomed-in result where we upscale from 16x16 to 34x34, we get the following result with nearest neighbor filtering:

Ugly again, the fact that each pixel is upscaled to either 2 or 3 pixels causes visible differences and artifacts in the text again. Let's try bilinear filtering:

Well, it's soft and all, but it's also very blurry. How about the new shader?

Perfect! The image is as sharp as it can be without introducing aliasing, but still filtered to the point where the pixels in the text look perfectly even. Try switching between this and the unfiltered image and see the massive difference without a significant loss of sharpness.

Essentially, what the new shader does is give you the output of extremely super-sampled nearest neihgbor filtering, without requiring rendering at a higher resolution. It looks great and costs pretty much nothing! =D

13  Java Game APIs & Engines / OpenGL Development / Re: Optimizing Performance on: 2017-06-10 10:59:42
I strongly recommend that you first of all try to optimize the terrain rendering. It's by far the slowest part, so optimizing it should be a priority. You should try to figure out what's making it slow. Either you're drawing too many triangles and/or your vertex shader is too expensive, so processing the geometry is the slow part, or the slow part is processing the pixels. You can diagnose this by changing the resolution you render at. If you halve the resolution (width/2, height/2), does the timing of the terrain rendering stay the same?
> Performance is ~4x faster ---> The slow part is either the fragment shader and/or the writing of the data to the G-buffer textures, so we should take a look at the fragment shader of the terrain to investigate further.
> Performance roughly stays the same ---> The slow part is the sheer number of vertices and/or the vertex shader. Take a look at the vertex shader and consider adding a LOD system to reduce the number of vertices of distant terrain, if you don't already have that.

Concerning your lighting shader...

 - Be careful! Some compilers don't accept automatic int-->float conversion. Line 30 and 33 seem to cause issues on at least some AMD hardware. Make sure those are float literals.

 - I'd recommend having different shaders for different types of lights. If you want, you can inject #defines into the source code to specialize the shader for different lights instead of relying on runtime branching on uniform variables. Although branching is not usually very expensive anymore (especially branching on uniforms as that means all shader invocations will take the same branch), it still forces the GPU to allocate enough registers for the worst case branch for all invocations, which can negatively impact texture read performance.

 - Consider using a signed texture format for your normalTexture (GL_RGB8_SNORM for example). They'll be more accurate and are automatically normalized to (-1, +1) instead of (0, 1), so you won't have to do that conversion yourself.

 - You seem to be doing lighting in world space instead of view space which is more common. Doing it in view space has the advantage of placing the camera at (0, 0, 0), which simplifies some of the math you have.

 - Even if you choose to do the lighting in world space, precompute the inverse projection view matrix and do a single matrix multiply. Currently, around half of the assembly instructions in your shader comes from this single line:
vec4 homogenousLocation = invViewMatrix * invProjectionMatrix * clipSpaceLocation;

This line first calculates a mat4*mat4 operation, which you might recognize requires computing a 4D dot product for every single element in the array. This requires 4 operations per element, so that's 64 operations right there. The resulting matrix is then used to do a mat4*vec4 operation, which is much cheaper; this only requires 4 dot products = 16 operations. That means that changing it to the following code is 60% faster as it avoids the mat4*mat4 operation:
vec4 homogenousLocation = invViewMatrix * (invProjectionMatrix * clipSpaceLocation);

but the fastest will always be
vec4 homogenousLocation = invViewProjectionMatrix * clipSpaceLocation;

which will make the entire shader around 80% faster in total. In other words, doing that instead gets rid of 44% of the instructions in your entire shader. As your lighting shader is ALU-bound (lots of math instructions), you're likely to see a very significant performance increase from that optimization alone.

Together, all the optimizations above (signed normal texture + view space lighting + precomputed matrix) should theoretically yield a 87% increase in performance of the lighting.

In addition:

 - Doing a fullscreen pass for the ambient light is extremely inefficient. That requires you to do an entire fullscreen pass just to add the ambientLight*diffuseColor to the lighting computation. This requires your GPU to read in the entire diffuse texture and blend with the entire lighting buffer, which is going to involve gigabytes of memory moved around and millions of pixels filled just so you can do three math instructions per pixel. I can see that you're doing bloom/HDR right after the lighting. See if you can pack the ambient light calculation into one of those shaders instead. Adding 3 math instructions to a different shader is always going to be faster than doing an entire fullscreen pass.

 - I get the impression that fog could be merged into another shader as well to save the overhead of a fullscreen pass (like the DoF).

 - Your FXAA shader looks a bit expensive for some reason. Are you using a custom one?

 - You shouldn't be doing fullscreen passes for local lights either. There are a number of different techniques for making sure you're not rendering too many unnecessary pixels.
 > Render an actual sphere and only compute lighting for the pixels covered by the sphere (pretty simple to implement).
 > Use the scissor test and the depth bounds test to only process pixels in a rectangular area around the sphere that are within the depth bounds of the sphere (best and fastest, simple but some complicated math to calculate everything).
Consider implementing one of those.

14  Java Game APIs & Engines / OpenGL Development / Re: Optimizing Performance on: 2017-06-09 09:37:42
Your first step should be getting basic GPU profiling working. See this thread: Using that, you can get the exact time your different render passes and postprocessing effects take, which will allow you to see where you should focus your efforts.

Once you've figured out your bottleneck, you can start optimizing it. If you find anything that stands out, I can help you diagnose what's making that particular part slow.
15  Game Development / Newbie & Debugging Questions / Re: LibGDX Box2D Applying Horizontal Linear Impulse Causes Object To Glide on: 2017-06-07 03:32:37
From what little I remember from playing with Box2D, I think I know what's going on. You're applying too much force and have a too low maximum velocity for your player. What happens is that the horizontal motion completely overwhelms the vertical motion, so when the velocity is capped to the max velocity, the vertical motion pretty much disappears.

Example: Max velocity 10, velocity (100, 1). Velocity is calculated to ~100.005, which is capped to the max velocity. The end result is that velocity is set to around (10, 0.1). If you keep pumping horizontal impulse into the entity, he'll never be able to pick up speed and fall due to the max velocity clamping continually reducing the vertical velocity.
16  Discussions / Miscellaneous Topics / Re: Silly Programming Mistakes on: 2017-06-04 21:30:47
Did this one today:
BufferedImage tilesetImage = File("test resources\tilesets\tileset.png"));
17  Java Game APIs & Engines / Engines, Libraries and Tools / Re: LWJGL 3.1.2 on: 2017-05-15 19:31:05
Oh, man, this is awesome! I've never heard of MoltenVK, maybe that'll save MacOS for now. I also didn't know that stb_dxt existed, gotta try that out! Also, stb_image now supports 16-bit PNGs, which are perfect for normal maps!
18  Game Development / Newbie & Debugging Questions / Re: Switching Shaders - Painter's Algorithm on: 2017-05-11 13:26:55
 - Orthographic projection matrices still calculate a depth value, which is configured using the near and far values when setting up your orthographic matrix.

 - Z-buffers do not work well with transparency, as a semi-transparent sprite will update the depth buffer, preventing things from being drawn behind it. If you're OK with binary alpha (either fully opaque or fully transparent), you can use alpha testing / discard; in your shader to prevent depth writes for the fully transparent parts (this is a very common technique used for foliage in 3D games).

 - If-statements are not slow per se. They're only slow if 1. the shader cores in a group take different branches, as that forces all cores to execute all branches, and/or 2. the worst path of your shader uses a lot of registers, as the GPU needs to allocate registers for the worst case branch regardless of if that branch executes or not (this is very rarely a problem). If you have if-statements where all pixels of a triangle take the same path, the overhead is essentially zero. In addition, 2D games are often CPU limited as they often need to micromanage the sprites, so getting rid of CPU work at the cost of GPU work is often a net win as the GPU was sitting idle for most of the time anyway.

 - If you can draw all stumps first, then all leaves, then you only have two shader switches regardless of the number of trees you have, in which case the number of shader switches is constant. In that case, 2 switches instead of 1 is insignificant.
19  Discussions / Miscellaneous Topics / Re: What I did today on: 2017-05-10 14:03:30
A couple of days ago, I managed to get some huge improvements in NNGINE's performance. Seriously, it's frigging insane how much faster it got. In addition, it was really simple and just took a day to do. There were huge GPU performance improvements throughout the entire engine, especially in the G-buffer filling pass, and the memory controller load is down a lot too! A scene which could barely maintain 60 FPS is now doing a crazy 280 FPS!!!

TL;DR: Upgraded from one GTX 770 to two Zotac GTX 1080s, with 5 years warranty to avoid one of them breaking 2 months after the 3 year warranty ends again...
20  Game Development / Newbie & Debugging Questions / Re: [Solved] GDX Bullet - Lag Spikes on: 2017-05-10 09:22:08
ViSualVM is awesome, yeah. It's really good for finding out where your garbage is coming from and can give you a good indicator of hotspots in your code. It can however sometimes be a bit confusing though and the profiler generally makes games drop to 0.01 FPS or so. =/
21  Game Development / Newbie & Debugging Questions / Re: [Solved] GDX Bullet - Lag Spikes on: 2017-05-10 04:38:09
Wait, so your issue was just a crapload of hashmap insertion? =P
22  Game Development / Newbie & Debugging Questions / Re: GDX Bullet - Lag Spikes on: 2017-05-10 04:16:44
Well, try running some memory profiling using VisualVM. It can log stack traces of all allocations you do.
23  Game Development / Game Mechanics / Re: Skeletal Animation structure on: 2017-05-10 03:52:31
My suggestion:

class Bone{
    Bone parent;
    double x, y;
    double sin, cos; //of the rotation

When you want to animate a bone, you can then easily get the parent (if there is one), transform it by the parent and go on. You don't generally need a list of children in each bone as long as you have a "global" list of all bones in a skeleton.
24  Java Game APIs & Engines / OpenGL Development / Re: [LWJGL] Conditional rendering? on: 2017-05-03 16:31:43
For such small pieces of geometry, hardware instancing is said to be fairly inefficient. Since your geometry seems to essentially be <6 vertices each, you're probably better off packing all your vertex data of a chunk in a single buffer, using an index buffer to construct polygons out of triangles. If your per-object matrices are static, then manually premultiplying them into the vertex data is a good idea. Otherwise, you can add a manual instance ID variable to each vertex (just 4 bytes per vertex), which you can use to in turn look up a per-object matrix from a texture buffer. Then you can reduce each chunk to a single draw call, and you can then adjust the CPU/GPU tradeoff by modifying the chunk size.
25  Game Development / Performance Tuning / Re: Pathfinding over too large grid on: 2017-04-27 20:12:57
I investigated a lot of different solution, from hierarchical systems to waypoint systems, but they all had different problems, ranging from expensive precomputing to not being able to handle dynamic worlds (units can block a tile). In the end I decided to try optimizing the existing implementation and see how far that got me.

The first issue was the memory usage of the array that holds all the nodes. As the worlds we generate are mostly inaccessible terrain like mountains or ocean, a majority of the elements in the array are going to be null. However, a 40 million element array of 32-bit references is still going to use 150MBs, so the single monolithic array had to go. I split up the world into chunks of 16x16 tiles. That left me with a 480x320 chunk array instead, which only uses around half a megabyte of memory instead. This reduced the constant by around 150MBs. Chunk objects are then allocated (and reused) on demand when the pathfinder first enters a chunk for each job. Chunks have all their nodes preallocated, so there's a tiny bit of inefficiency for the edge around an inaccessible area; if just one tile in a chunk is accessible and explored by the pathfinder, the entire chunk will need to be allocated/reserved.

The second problem was the size of each node. The Node object I used to store the pathfinding data of each tile had the following data:
      int x, y;
      int lastOpenedStart, lastOpenedGoal;
      Node parentStart, parentGoal;

For some crazy reason, this ended up taking up 40 bytes per node, plus 4 byte per node in the arrays used to hold the Nodes in each chunk!!! I count 24 bytes of actual data in there. 24 bytes of actual data taking 44 bytes! I decided to look for ways of reducing the size of the information.

As our max world size is 7680x5120, it's clear that for x and y we really don't need the full range of a 32-bit integer. With 13 bits for each of X and Y, we can accommodate up to 8192x8192 worlds, which is just enough for us.

The reason why the lastOpened* values are ints instead of booleans is to avoid having to reset the entire world after each pathfinding job. If those were booleans, I'd need to go through all opened nodes and reset them back to false after each job. By making them ints, I can implicitly "reset" the entire world by simply doing jobID++; once in the pathfinder. Then when I want to mark a node as opened, I do
lastOpenedStart = jobID;
, and to check if a node is opened, I just do
if(lastOpenedStart == jobID) ...
. However, with the chunk system, it became trivial (and necessary anyway) to reset chunks when they're reused, so such a system isn't necessary here. Hence, we can compress the two opened fields down to just 2 bits in total.

Lastly, the two parent references are used for backtracking once a path has been found. However, in our simple tile world there are only 4 possible directions you can travel in, so we can store each parent with just 2 bits used to encode which of the 4 directions the parent is in.

This all sums up to a grand total of...... exactly 32 bits per node. In other words, my "node array" is currently just an int[] which I pack in all the data into. I can fit an entire node in just 4 bytes per node instead of the 44 bytes used before. In addition, the chunk system means that in most cases the worst case scenario for memory usage is never reached due to more efficient reuse of chunks between multiple pathfinding jobs. The new system averages less than 50MBs, with an absolute worst case scenario at under 75MBs. The improvements in memory locality also almost doubled the performance of the system.

Here's a comparison normal breadth-first search and bi-directional breadth first search:



There's a huge difference in the number of tiles explored by the two (the red area). The normal search explores almost every single reachable tile in the world, while the bidirectional one generally avoids exploring half the world in worst case scenarios. This reduces peak memory usage by quite a bit.
26  Discussions / Miscellaneous Topics / Re: What I did today on: 2017-04-27 20:02:44
Two monitor person here. Best decision ever. Code on one monitor, talk about anime girls with your best friend on the other.
SHHHHHH!!! That was supposed to be our secret! >___<

Jokes aside, two monitors is great. Having a Eclipse on one monitor, Skype/an internet resource/a second Eclipse window on the other helps a lot.
27  Game Development / Performance Tuning / Pathfinding over too large grid on: 2017-04-26 02:23:11

I'm taking a look at the pathfinder we have for RF, and to my horror I realized that it had a super bad worst case scenario for memory usage. At worse, this memory usage could skyrocket to over a gigabyte of memory! >__<

The problem is that our worlds are really big, up to 7680x5120 or so. The game world is essentially a grid, and the data is split up into chunks of 32x32 tiles that can be paged to disk when they aren't used. The pathfinder I'm using is a bi-directional breadth first search that searches from both the start and the goal and finds where the searches meet to reduce the number of tiles that get checked. For each tested tile, the tile is checked if it's accessible, which may require loading in data from disk. The last N chunks are cached in RAM until a new chunk is needed. The pathfinding may potentially need to explore the entire map to find a path (or admit failure), but this is rare (yet still possible).

The problem I'm having is that the pathfinding requires a huge amount of data per tile. For each visited tile, I need to know if it's been opened by the start search and the goal search, which I store using a job-ID system to avoid having to clear nodes. Secondly, to get neighbors and be able to find neighbors and to backtrack, I need to track the XY position (stored as shorts right now) and two parent references. This all sums up to 28 bytes per tile. Hell, just the array for holding the nodes of the entire world is huge, as 7680*5120*4 = 150MBs. The tiles then add an additional 7680*5120*28 = 1050MBs for a total of 1200MBs of raw pathfinding data. It's insane! The entire point of the chunk system goes out the window. In addition, I cannot only track pathfinding data of loaded chunks as if the search is wide we'd start unloading the chunks looked at at the start of the search (the start and finish), which would mean that we would be unable to backtrack. There's no way of telling which nodes we'll need for the final path.

I tried limiting the search area to reduce memory usage. The idea was to preallocate a square area of nodes. This square grid of nodes would then be centered on the average of the start and goal coordinates so that the search could run with "local" nodes. This obviously limited the maximum distance between the start and the goal, and also made it impossible to find paths that need to venture outside the square area of preallocated nodes. In the end, I realized that less than 1000 tiles of search distance in all directions (2048x2048 square search area) was the minimum I could drop it to, and that's still hundreds of megabytes of data.

The thing is that the number of traversible tiles is pretty low. Most of the world is either ocean, mountains or forest, so if I could eliminate those tiles I could cut memory usage by a lot. However, the 150MBs of memory needed just for the array is a lot, and it's so useless when a majority of the elements will be null anyway. If I could find a better way to store the pathfinding objects than in an array, I could eliminate a lot of that data. Similarly, if I can avoid allocating 70% of the nodes as they're not traversible, That would cut memory usage from 1050MBs to ~350MBs. Finally, I could reduce memory usage of the node class from 28 bytes to 20 bytes, another 100MBs saving for a final total of 250MBs. That would be...... acceptable.

So I guess my real question is: What's a good data structure for storing a very sparse/lazily allocated grid? It needs to quickly be able to retrieve a node given its coordinates (or null if none exist). Maybe a custom hashmap that doesn't use linked lists to resolve collisions? Any other ideas?
28  Discussions / General Discussions / Re: Programmer jokes on: 2017-04-24 23:53:28
Sorry, but even I don't have any advice on how to make UI design fun.

Hahahahahaha! Hahaha! Hahaha... Haha... Ha..... Ha....... Q___Q
29  Discussions / Miscellaneous Topics / Re: What I did today on: 2017-04-20 15:01:41
The quote could be interpreted as "Things always get better, even when they get worse.", which I assume is the cause of Riven's distress.
30  Game Development / Game Play & Game Design / Re: Graphics Backend Abstraction on: 2017-04-19 20:35:51
Porting being a no-op is, I think, an illusion. Different platforms require different gameplay and different performance trade offs. Whether or not your graphics-engine can seamlessly transition from HTML5/WebGL to multiple dedicated GPUs is a mere side note. The effort that is put into building an VK-like API on top of OpenGL is impressive, but by the time you're ready to publish your game, VK has probably gone mainstream, and WebGL would support it by then.

It all reminds me of some of the efforts Cas and I took upon us, to scale the sprite-engine from the low-budget to high-end GPUs, and seeking fast-paths through outdated drivers and/or hardware. It just got in the way of publishing the games, and that framerate bump wasn't ultimately worth it.

If this technical tinkering takes you 18 months, in that time high-end GPU performance has doubled, but more interestingly: your low-end supported hardware doubled in performance too. To make all that raw power, and super efficient API/engine worth the effort, you need a small army of artists and designers. A one or two man team most likely cannot come close to what the big (and free) engines support, so why not pick an art-style that hides the technical inefficiencies? It's hard enough to turn a tech demo into a polished, playable, purchable game. Very little of game programming is performance sensitive, so why focus all your energy on it?

My $0.02 Kiss
Of course, that different platforms require different trade offs is a given. A game designed for a keyboard will not work in app form, and a graphics engine designed for 500GB/sec bandwidth GPUs will not run well on a mobile GPU. However, there most definitely are overlaps in the technical department. For example, allowing the user of an Android tablet to plug in a controller and play the game just like on PC is definitely a plus, and a shitty Intel GPU has more in common with a mobile GPU than a dedicated 400$ GPU from AMD or Nvidia. In other words, alternate ways of controlling the game will be a nice feature to have on mobile/tablets, while a version of the engine tweaked for mobile will work very well as a "toaster mode" setting for weak desktop GPUs too, something we've gotten requests for for WSW.

Note: By "engine", I mean the engine built on top of the abstraction, not the abstraction itself. In the example above, I was specifically referring to using deferred rendering for desktop to take full advantage of the huge amounts of processing power and bandwidth available on desktop to achieve great lighting, while falling back to forward rendering on mobile GPUs where bandwidth is scarce and a 24 byte/pixel G-buffer isn't practical. Again, this is pretty much exactly the same problem that Intel GPUs have as they use system RAM, so using forward rendering on Intel GPUs could easily double or triple performance for us (but only at the absolute lowest settings with minimal lighting of course).

I appreciate your advice regarding technical stuff, but I think that this will be worth it. SkyAphid focuses 100% on developing games using the stuff I write, and I spend most of my time on helping out with that. It's worked out so far. If anything, a solid backend will allow us to work faster when creating the actual game as there's much less to worry about when it comes to performance.
Pages: [1] 2 3 ... 122
Archive (343 views)
2017-04-27 17:45:51

buddyBro (540 views)
2017-04-05 03:38:00

CopyableCougar4 (992 views)
2017-03-24 15:39:42

theagentd (1023 views)
2017-03-24 15:32:08

Rule (997 views)
2017-03-19 12:43:22

Rule (980 views)
2017-03-19 12:42:17

Rule (977 views)
2017-03-19 12:36:21

theagentd (1082 views)
2017-03-16 05:07:07

theagentd (1008 views)
2017-03-15 22:37:06

theagentd (779 views)
2017-03-15 22:32:18
List of Learning Resources
by elect
2017-03-13 14:05:44

List of Learning Resources
by elect
2017-03-13 14:04:45

SF/X Libraries
by philfrei
2017-03-02 08:45:19

SF/X Libraries
by philfrei
2017-03-02 08:44:05

SF/X Libraries
by SkyAphid
2017-03-02 06:38:56

SF/X Libraries
by SkyAphid
2017-03-02 06:38:32

SF/X Libraries
by SkyAphid
2017-03-02 06:38:05

SF/X Libraries
by SkyAphid
2017-03-02 06:37:51 is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!