I investigated a lot of different solution, from hierarchical systems to waypoint systems, but they all had different problems, ranging from expensive precomputing to not being able to handle dynamic worlds (units can block a tile). In the end I decided to try optimizing the existing implementation and see how far that got me.
The first issue was the memory usage of the array that holds all the nodes. As the worlds we generate are mostly inaccessible terrain like mountains or ocean, a majority of the elements in the array are going to be null. However, a 40 million element array of 32-bit references is still going to use 150MBs, so the single monolithic array had to go. I split up the world into chunks of 16x16 tiles. That left me with a 480x320 chunk array instead, which only uses around half a megabyte of memory instead. This reduced the constant by around 150MBs. Chunk objects are then allocated (and reused) on demand when the pathfinder first enters a chunk for each job. Chunks have all their nodes preallocated, so there's a tiny bit of inefficiency for the edge around an inaccessible area; if just one tile in a chunk is accessible and explored by the pathfinder, the entire chunk will need to be allocated/reserved.
The second problem was the size of each node. The Node object I used to store the pathfinding data of each tile had the following data:
int x, y;
int lastOpenedStart, lastOpenedGoal;
Node parentStart, parentGoal;
For some crazy reason, this ended up taking up 40 bytes per node, plus 4 byte per node in the arrays used to hold the Nodes in each chunk!!! I count 24 bytes of actual data in there. 24 bytes of actual data taking 44 bytes! I decided to look for ways of reducing the size of the information.
As our max world size is 7680x5120, it's clear that for x and y we really don't need the full range of a 32-bit integer. With 13 bits for each of X and Y, we can accommodate up to 8192x8192 worlds, which is just enough for us.
The reason why the lastOpened* values are ints instead of booleans is to avoid having to reset the entire world after each pathfinding job. If those were booleans, I'd need to go through all opened nodes and reset them back to false after each job. By making them ints, I can implicitly "reset" the entire world by simply doing jobID++; once in the pathfinder. Then when I want to mark a node as opened, I do
, and to check if a node is opened, I just do
if(lastOpenedStart == jobID) ...
. However, with the chunk system, it became trivial (and necessary anyway) to reset chunks when they're reused, so such a system isn't necessary here. Hence, we can compress the two opened fields down to just 2 bits in total.
Lastly, the two parent references are used for backtracking once a path has been found. However, in our simple tile world there are only 4 possible directions you can travel in, so we can store each parent with just 2 bits used to encode which of the 4 directions the parent is in.
This all sums up to a grand total of...... exactly 32 bits per node. In other words, my "node array" is currently just an int which I pack in all the data into. I can fit an entire node in just 4 bytes per node instead of the 44 bytes used before. In addition, the chunk system means that in most cases the worst case scenario for memory usage is never reached due to more efficient reuse of chunks between multiple pathfinding jobs. The new system averages less than 50MBs, with an absolute worst case scenario at under 75MBs. The improvements in memory locality also almost doubled the performance of the system.
Here's a comparison normal breadth-first search and bi-directional breadth first search:
There's a huge difference in the number of tiles explored by the two (the red area). The normal search explores almost every single reachable tile in the world, while the bidirectional one generally avoids exploring half the world in worst case scenarios. This reduces peak memory usage by quite a bit.