Show Posts
1  Game Development / Newbie & Debugging Questions / Re: Low poly vs. highres normal maps on: 2016-10-25 05:43:34
Texture size has pretty much no impact at all on performance; it really just uses more memory. The actual bandwidth usage doesn't really go up with texture size. This is assuming you're using mipmaps and coherent sampling patterns though, which is essentially what always happens when you do normal mapping or any kind of texturing of triangles. There is usually no real reason to not use a normal map if you have one, since they're so cheap and can improve the lighting a lot.

The real trade-off is between mesh complexity and performance. More vertices = more work for the GPU = lower FPS. Whether you're using a normal map or not, you usually want more triangles anyway to get better silhouettes. In addition, tessellation doesn't replace normal mapping. Calculating new vertex normals for the generated vertices doesn't improve things much, so simply sampling a normal map that matches the heightmap will give by far the best result with tessellation.

Finally, there really are cases where a normal map is actually better than real geometry. If you have tiny seams, edges, etc. that you want strong specular reflections on, not even high amounts of MSAA can fully anti-alias them if you're rendering with HDR. In this case, a normal map can do the job better and faster, as you can filter the normal map in realtime to anti-alias those details much more effectively, all while using fewer vertices. You can even prefilter the normal map ahead of time as well, for no realtime cost.
2  Java Game APIs & Engines / Engines, Libraries and Tools / Re: GdxState is a state library for use in libgdx made by me. take a look on: 2016-10-25 04:45:11
Looking good. Congrats on your first GitHub project.

To make your repository look a bit more professional, you should move your test/demo code to a separate package so that the core classes are clearly separated. I'm also still a bit confused about what your project does. I get that it's about managing game states, but what people want to know is what kind of problems it solves; basically, why they should use it. Adding that as an introduction/summary would help a lot. Right now it's hard to get a good overview of what exact purpose the library fills.

Sorry if I'm sounding negative; I just have a strong tendency to focus on things that can be improved. Good if you want to solve problems, bad in social situations.
3  Java Game APIs & Engines / Engines, Libraries and Tools / Re: GdxState is a state library for use in libgdx made by me. take a look on: 2016-10-24 22:52:32
And maybe actually tell us what the library does and doesn't do. xd
4  Game Development / Newbie & Debugging Questions / Re: Low poly vs. highres normal maps on: 2016-10-24 18:56:46
Tessellation can actually be faster and have less popping. It's faster because you can draw the exact same mesh/LOD level as many times as you want and have the GPU dynamically tessellate it, leading to less CPU overhead and the perfect number of triangles for a given view distance of the mesh (the optimal quality/performance trade-off on the GPU). In addition, skinning is done only for the base vertices and interpolated for the vertices added by tessellation, so it can be cheaper on the GPU in that sense too. It has quite a few advantages, in other words.

That being said, it's probably completely overkill for OP. I was simply mentioning that there's a reason why tessellation exists, because it solves a problem that normal mapping doesn't solve.
5  Game Development / Newbie & Debugging Questions / Re: Low poly vs. highres normal maps on: 2016-10-24 18:39:54
A normal map is not a full replacement for a high-poly mesh. You could use a cube as a low-poly "sphere". It'd still look like an obvious cube, but the lighting would be computed as if it were a sphere. No matter how high-res your normal map is, it will not be able to compensate for the shortcomings of the geometry.

Normal maps are still awesome for filling in detailed stuff that would both be too expensive to keep in the model and look bad due to containing details smaller than a pixel, requiring a lot of anti-aliasing/multisampling. They just don't help with improving the silhouette of an object. Tessellation, on the other hand, could be used to turn that cube into a sphere, and normal mapping could then be used to fix up the normals.
6  Java Game APIs & Engines / Engines, Libraries and Tools / Re: JOML 1.8.0 Release on: 2016-10-22 18:47:44
I had a rather unique situation where I was using a "circular heightmap", a 1D heightmap that wraps around a circle with a certain resolution. I needed to find out which height index was closest to a certain point (= round(heightmapResolution * atan2(dy, dx) / (2*PI) + 0.5)), one of the rare cases where I actually had to work with angles instead of just vectors. This was being used for collision detection, to figure out which part of the heightmap to check an object against, so performance was critical. However, every atan2() approximation I could find had too low precision, and in my collision detection precision was so critical that the approximations caused missed collisions.
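For reference, a rough sketch of that kind of lookup (made-up names, not my actual code; the wrap-around at the seam is the part to get right):

// Sketch of the index lookup described above; centerX/centerY, resolution
// and the wrap handling are assumptions.
static int closestHeightIndex(double px, double py,
                              double centerX, double centerY, int resolution) {
   double dx = px - centerX;
   double dy = py - centerY;
   // atan2 returns [-PI, PI]; remap to [0, 1] around the circle.
   double angle01 = Math.atan2(dy, dx) / (2.0 * Math.PI) + 0.5;
   // Round to the nearest heightmap sample and wrap around the seam.
   return (int) Math.round(angle01 * resolution) % resolution;
}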

My point is that for any actual valid use of atan2() that you can't use vectors for instead, you need very good precision. That being said, if we could find such an approximation I would make very good use of it.
7  Java Game APIs & Engines / Engines, Libraries and Tools / Re: JOML 1.8.0 Release on: 2016-10-19 17:59:54
Tested Roquen's version for error:


Precision is mostly unaffected. I'll leave the benchmarking to KaiHH. =P
8  Game Development / Newbie & Debugging Questions / Re: Artemis-odb, how to initialize components? on: 2016-10-15 00:35:23
Hmm, that looks a bit cleaner, I guess. I just wanted that extra performance from the Archetypes... I got a response on their GitHub: apparently you can get the injected ComponentMappers from World, and those are usable.
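For reference, it looks roughly like this (a sketch assuming artemis-odb's World.getMapper(); spawnX/spawnY are made-up values, and the API may differ between versions):

// Sketch, assuming artemis-odb's World.getMapper(); check your version's API.
ComponentMapper<Position> positionMapper = world.getMapper(Position.class);

int entity = world.create(archetype);
Position p = positionMapper.get(entity); // the Archetype already attached it
p.x = spawnX; // spawnX/spawnY: hypothetical initial position
p.y = spawnY;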
9  Java Game APIs & Engines / Engines, Libraries and Tools / Re: JOML 1.8.0 Release on: 2016-10-14 23:58:28
Err... isn't that just a simple Taylor expansion of sin(x) around x = 0? You can calculate almost optimal coefficients for that very easily. I wrote a tiny program that calculates them, then refines them by testing extremely tiny random variations of the coefficients against Math.sin(). Took me a solid 20 min to code up:

Current error: 9.706913049840798E-14

Error was the total squared error of 100 evenly distributed sin() samples between 0 and PI/2.

private static double sinApprox(float[] constants, float x){
   float x2 = x * x;
   float tx = x * x2; // current odd power of x, starting at x^3
   float sin = x * constants[0];
   for(int i = 1; i < constants.length; i++){
      sin += tx * constants[i];
      tx *= x2; // advance to the next odd power
   }
   return sin;
}
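The refinement step is nothing fancy; roughly this (a sketch, the step size and iteration count are made up):

// Sketch of the random-variation refinement described above.
private static float[] refine(float[] constants, int iterations) {
   java.util.Random random = new java.util.Random();
   float[] best = constants.clone();
   double bestError = totalSquaredError(best);
   for(int i = 0; i < iterations; i++){
      float[] candidate = best.clone();
      int j = random.nextInt(candidate.length);
      candidate[j] += (random.nextFloat() - 0.5f) * 1e-7f; // extremely tiny variation
      double error = totalSquaredError(candidate);
      if(error < bestError){ // keep the variation only if it helps
         bestError = error;
         best = candidate;
      }
   }
   return best;
}

// Total squared error of 100 evenly distributed samples in [0, PI/2],
// compared to Math.sin().
private static double totalSquaredError(float[] constants) {
   double sum = 0;
   for(int i = 0; i < 100; i++){
      float x = (float) (Math.PI / 2 * i / 99);
      double diff = sinApprox(constants, x) - Math.sin(x);
      sum += diff * diff;
   }
   return sum;
}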
10  Java Game APIs & Engines / OpenGL Development / Re: Issues with batching meshes on: 2016-10-14 22:07:53
Literally just dump all your meshes sequentially into a single VBO. Keep track of the vertex and index offset of each mesh in the buffer, and use those to draw the right one with the *BaseVertex() draw calls. The base vertex lets you add an offset to every index read, which is nice because it saves you from having to use 32-bit indices when the buffer holds more than 65536 vertices. So you can pack 100 meshes with 60000 vertices each and still use 16-bit indices.
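Rough sketch of the bookkeeping (LWJGL-style GL calls; the MeshEntry class is made up):

// Offsets recorded while packing each mesh into the shared VBO/IBO.
class MeshEntry {
   int indexCount;   // number of indices in this mesh
   long indexOffset; // byte offset of its first index in the shared index buffer
   int baseVertex;   // position of its first vertex in the shared VBO
}

// With one VAO bound for the whole batch, drawing any mesh is just:
void drawMesh(MeshEntry mesh) {
   // requires OpenGL 3.2 (GL32.glDrawElementsBaseVertex in LWJGL)
   glDrawElementsBaseVertex(GL_TRIANGLES, mesh.indexCount,
         GL_UNSIGNED_SHORT, mesh.indexOffset, mesh.baseVertex);
}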
11  Java Game APIs & Engines / OpenGL Development / Re: Issues with batching meshes on: 2016-10-14 16:28:44
I'm assuming, for instance, that "attribute vec3 vertices" is only capable of accepting one buffer of vertices passed in via glVertexAttribPointer(), but how can many different objects be passed in with only one draw call?
There are major advantages to placing all your geometry (at least the geometry that uses the same shader) into a single VBO, just packed sequentially into the buffer. This way, you can bind a single VAO and use index buffer offsets and the BaseVertex versions of the draw calls to read the right mesh from inside the VBO. This still requires one draw call per mesh, but the cost of a draw call is proportional to how much state you change between calls, as the OpenGL driver has to do extensive validation, state setup, looking up premade resources for certain state combinations, etc.
12  Java Game APIs & Engines / OpenGL Development / Re: Issues with batching meshes on: 2016-10-14 14:05:09
Batching has two purposes:

1. Improving performance.
Batching is faster than doing one draw call per cube. Draw calls are expensive, so packing your vertex data together and drawing it all in one draw call will be faster.

2. As an investment.
In the case of cube worlds, batching also works as an investment. Looping through the volume of an entire cube world is slow as hell; a 1000x1000x1000 world is already 1 billion blocks to check. Doing that every frame would be a huge waste, because in the end only a tiny mesh is generated: 99.99% of the world is either solid ground or air. Hence, we precompute the mesh once and store it, giving us the flexibility of cube worlds with the performance of a mesh.

The problem occurs when you want to change a single cube in the world. It's too slow and expensive to try to modify the mesh based on the cube added/removed, so the only real choice is to regenerate the mesh from scratch. This will cause a massive spike even when just a single cube is changed. The solution is to split the world into chunks, so that you only need to regenerate the chunks affected by the changed cube instead of the entire world. In theory this introduces a trade-off, as we now have more than 1 draw call, but even 10 or 100 draw calls isn't really that significant. The main point is to avoid having to regenerate the mesh each frame, and that still holds.
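In code, it boils down to something like this (a sketch with made-up names):

// Sketch: only the chunk(s) containing the changed cube get rebuilt.
void setBlock(int x, int y, int z, int block) {
   blocks[x][y][z] = block;
   markDirty(x / CHUNK_SIZE, y / CHUNK_SIZE, z / CHUNK_SIZE);
   // Blocks on a chunk border also affect the neighboring chunk's faces,
   // so mark the neighbor dirty too in that case (omitted here).
}

void update() {
   for (Chunk chunk : dirtyChunks) {
      chunk.rebuildMesh(); // regenerate only what actually changed
   }
   dirtyChunks.clear();
}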

You get further problems if you want to update the terrain every frame. The answer to this is: don't. If you want to animate the terrain textures, you can either update the texture to animate all cubes identically, or do the animation in the shader based on an ID (this way all cubes can be animated individually).

Instancing is NOT a good choice for a cube world. You want to draw only the faces that lie between a solid block and an air/transparent block; in 99% of all cases, you won't even be drawing an entire block. Hence, only being able to control visibility at the granularity of an entire cube with all faces visible is not good enough. For a simple case of a flat floor, this draws 6x as many triangles as a face-based mesh. In addition, instancing is not as effective for small meshes. A cube is only 24 vertices; you should aim for batches of at least 100 vertices or so, or you'll get reduced GPU performance drawing all the tiny meshes. Basically, on the CPU side instancing is much faster as it's just one draw call, but on the GPU side you're still drawing a crapload of small meshes, which GPUs aren't good at.

TL;DR: Use chunking, and if you need to animate the texture of a cube either update the texture or use a shader to animate it.
13  Game Development / Newbie & Debugging Questions / Artemis-odb, how to initialize components? on: 2016-10-14 12:56:32
Hey, everyone.

I recently tried out Artemis a bit, and boy has it changed since I last used it.

Basically, I'm having trouble initializing a component after creating an entity. I'm doing this:

Archetype archetype = new ArchetypeBuilder().add(Position.class).build(world);
int entity = world.create(archetype);

At this point, I want to initialize the Position component to the initial position of the entity. However, this seems to be impossible to do cleanly.

 - ComponentMapper doesn't work since I'm not inside a System.
 - If I use an Entity object, I can use entity.getComponent(Position.class), but that function delegates to a protected function in ComponentManager which I can't call directly with an int entity.

Literally the only way I've managed to get the Position component is to use this monstrosity:
Position p = (Position) world.getComponentManager().getComponentsFor(entity, new Bag<>()).get(0);

Am I approaching this wrong? Am I supposed to do this in some other way? Am I missing something obvious?
14  Game Development / Newbie & Debugging Questions / Re: [LWJGL] Best way to handle resizing textures on: 2016-10-05 07:50:32
1. Irrelevant, nothing is drawn.
2. Irrelevant, nothing is drawn.
3. Irrelevant, the point is to improve relative performance.
15  Game Development / Newbie & Debugging Questions / Re: LibGDX Depth Testing is horrendously slow on: 2016-10-05 07:49:01
@Hydroque: Completely irrelevant to the thread. The problem here is depth TESTING.
16  Game Development / Articles & tutorials / Re: GPU blur performance analysis on: 2016-10-04 17:29:51
Sure, although there are quite a few dependencies on my own utility classes (Texture2D, ShaderProgram, etc.). They're not too difficult to replace, I guess.

Java code:
17  Java Game APIs & Engines / OpenGL Development / Re: Gaussian Blur Blobs? on: 2016-10-04 12:31:24
It's an issue with the alpha I bet. Don't blur the alpha.

Also, you can rewrite your clamping code to
brightColor = max(outColour - 1.0, 0.0);
18  Java Game APIs & Engines / OpenGL Development / Re: Gaussian Blur Blobs? on: 2016-10-04 10:51:12
Are you accidentally writing negative values after thresholding? A float texture can hold negative values, you know. This could mess up your tone mapping later.

Side note: You don't need 32-bit precision, use GL_RGB16F instead.
19  Java Game APIs & Engines / OpenGL Development / Re: Gaussian Blur Blobs? on: 2016-10-04 10:28:04
Can you show me what the input to the blur looks like? I assume you do some kind of thresholding to extract the bright parts. Also, what kind of texture formats are you using?
20  Java Game APIs & Engines / OpenGL Development / Re: Gaussian Blur Blobs? on: 2016-10-04 10:25:36
Is your input texture simply messed up?
21  Java Game APIs & Engines / OpenGL Development / Re: Gaussian Blur Blobs? on: 2016-10-04 09:59:34
Are you possibly trying to read and write to the same texture?
22  Game Development / Articles & tutorials / GPU blur performance analysis on: 2016-10-04 08:23:16
Blurring is a very useful effect in games, used in numerous postprocessing effects, but it can also be very performance-heavy. Hence, optimizing it as much as possible is often important to allow for big blur kernels.

In this article I have tested four different blur algorithms in an attempt to help people pick the best blur algorithm for their particular use case. For this test, I have used a simple box blur for simplicity, but all of these techniques can be used for Gaussian blurs as well and all yield the exact same results (bar floating point/render target rounding errors). The four techniques tested are:

  • Naive blur: The naive NxN blur shader simply evaluates the full NxN kernel. It does exactly N^2 texture samples, but is the most straightforward way of doing a blur, and it only requires a single pass over the data.
  • Separable blur: The separable NxN blur shader splits the NxN kernel into two N-tap passes. It does exactly 2*N texture samples, each of the two passes doing N texture samples.
  • Naive linear blur: The naive linear NxN blur shader works very similarly to the naive NxN blur shader, but uses hardware linear filtering to read up to 4 values per texture sample. It therefore needs fewer texture samples than the naive NxN blur shader, ((N+1)/2)^2 of them, and it only requires a single pass over the data.
  • Separable linear blur: The separable linear NxN blur shader splits the NxN kernel into two N-tap passes, using linear texture filtering to read up to two values per texture sample. It does exactly N+1 texture samples, each of the two passes doing (N+1)/2 texture samples (see the sketch below for how the offsets and weights work out).
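To make the linear-filtering trick concrete: for a box blur, sampling exactly halfway between two texels makes the hardware average them, so one fetch covers two taps. Here's a sketch of how the offsets and weights work out for an odd kernel size N (my own illustration, not code from the benchmark):

// Offsets are in texels, relative to the center pixel of an N-tap box blur.
static void printLinearBoxBlurTaps(int n) {
   int fetches = (n + 1) / 2;
   float firstTap = -n * 0.5f + 0.5f; // offset of the first of the N taps
   for (int i = 0; i < fetches; i++) {
      float offset, weight;
      if (2 * i + 1 < n) {
         offset = firstTap + 2 * i + 0.5f; // halfway between two texels
         weight = 2.0f / n;                // one fetch covers two taps
      } else {
         offset = firstTap + 2 * i;        // lone last texel (odd N)
         weight = 1.0f / n;
      }
      System.out.println("fetch " + i + ": offset=" + offset + ", weight=" + weight);
   }
}

For N = 5 this gives three fetches (offsets -1.5, 0.5 and 2.0 with weights 0.4, 0.4 and 0.2), which sums to exactly the 5-tap average; a Gaussian version works the same way but places each pair's sample position according to the ratio of the two weights.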

There are 3 main points to take into consideration when choosing which blur algorithm to use:
  • Kernel size: As the kernel size increases, the naive blurs become less viable, since their quadratic dependence on the kernel size makes them slower than the better-scaling separable blurs.
  • Render target bit-depth: The bit-depth of the render target affects the performance of the blur, mostly due to a higher write cost. This heavily affects the overhead of doing two passes for the separable blurs, while the texture sample cost isn't as heavily impacted thanks to the texture cache.
  • Bilinear hardware acceleration: Some render target formats (read: 32-bit float texture formats) are not able to do full-speed linear filtering, so the algorithms that rely on linear filtering may perform worse for those render targets.

Since the performance depends on the render target format/bit-depth, I have run separate benchmarks for three different bit depths. The benchmarks also include a baseline, which is the cost of simply copying the render target once. This can be thought of as the minimum overhead of an additional pass.

32-bit render targets (GL_R11F_G11F_B10F / GL_RGB8 / GL_SRGB8)

Relatively small 32-bit render targets with full-speed bilinear filtering mean that the overhead of the extra pass of the separable blurs is small, so the separable linear blur dominates. The only exception is 3x3 blurs, where the naive linear blur shader is much faster. This is because both read the same total number of samples, four, but the separable blur requires two passes, which adds too much overhead relative to the cost of the texture samples.

64-bit render targets (GL_RGB16F)

For the bigger 64-bit render targets, the overhead of the additional pass for the separable blurs becomes greater, favoring the naive blurs for smaller blur kernels. In this case, the naive linear blur beats the separable blurs for both 3x3 and 5x5 kernels. Linear filtering is still done at full speed on all major GPUs.

128-bit render targets (GL_RGB32F)

If you for some crazy reason find yourself needing to blur a 32-bit floating-point format, here you go. For 32-bit floating-point render targets, bilinear filtering runs at half speed. The naive linear blur shader can still extract some performance despite that, as it reads up to four values per sample (although it's still slower than the others), but the separable linear blur shader suffers since it only reads up to two values per sample, leaving it slower than the plain separable blur shader. In the end, the naive blur shader wins for 3x3 kernels, while the separable blur shader wins at 5x5 and above. You're probably much better off writing some fancy compute shader blur if you really need a 32-bit float blur.

Result matrix

Kernel size:     3x3            5x5               7x7 and up
32-bit format    Naive linear   Separable linear  Separable linear
64-bit format    Naive linear   Naive linear      Separable linear
128-bit format   Naive          Separable         Separable
23  Game Development / Newbie & Debugging Questions / Re: LibGDX Depth Testing is horrendously slow on: 2016-10-02 04:57:38
I'm a bit interested in a follow-up on this, if you have time sometime. =P
24  Game Development / Newbie & Debugging Questions / Re: [LWJGL] Best way to handle resizing textures on: 2016-10-01 06:37:06
Generating mipmaps at load time is way too costly, especially if you want to compress them as well. All major engines, especially the ones that do texture streaming, store all mipmaps precomputed in the file(s). That also makes it trivial to choose which mipmap levels to load at all when reading the file.

If you're using LWJGL 3, look into using STBImage. It's much faster, thread-safe (so you can load multiple images in parallel, which ImageIO can't), uses memory that is explicitly allocated by you (so no garbage collection whatsoever), handles sRGB correctly, etc.
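A minimal sketch of loading pixels with LWJGL 3's stb_image bindings (error handling kept short):

import static org.lwjgl.stb.STBImage.*;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import org.lwjgl.BufferUtils;

ByteBuffer loadPixels(String path, IntBuffer w, IntBuffer h) {
   IntBuffer comp = BufferUtils.createIntBuffer(1);
   ByteBuffer pixels = stbi_load(path, w, h, comp, 4); // force RGBA
   if (pixels == null) {
      throw new RuntimeException("Failed to load " + path + ": " + stbi_failure_reason());
   }
   // ...upload to OpenGL, then free the natively allocated memory:
   // stbi_image_free(pixels);
   return pixels;
}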
25  Game Development / Newbie & Debugging Questions / Re: LibGDX Depth Testing is horrendously slow on: 2016-09-29 02:03:44
Thanks for the in-depth reply!

I'm drawing the same conclusion as you. You're most likely hitting a slow path on Intel for some reason. Some possibilities to explore:
 - gl_FragDepth disables hierarchical depth buffers, depth compression, etc., making it slow.
 - gl_FragDepth may just be inherently slow in hardware on Intel cards.
 - Clearing the depth buffer to 0.0 and using GL_GREATER depth testing may be slow and/or disable hardware optimizations.
26  Game Development / Newbie & Debugging Questions / Re: LibGDX Depth Testing is horrendously slow on: 2016-09-28 05:20:08
- The game runs slower the more we depth-test
= "the more overdraw we add"? That would of course be expected, as more pixels = more work, especially for a shitty integrated laptop GPU.

That being said, depth testing with 3D geometry usually improves performance as the depth test can run before the fragment shader, allowing the GPU to avoid running the fragment shader for occluded pixels. This depends on two things to work:

 - If the shader writes to gl_FragDepth, the shader implicitly has to run before the depth test as it determines the value used in the comparison. This has VERY significant performance implications.
 - If discard; is used, the early depth test's functionality is severely limited, as it cannot update the value in the depth buffer until the shader has executed, or it would write depth for discarded pixels. This again can have performance implications, but usually not as severe ones: an early depth test can still be run against previously drawn geometry, potentially avoiding shader execution, but it has to be much more conservative.

You should never write to gl_FragDepth if you can avoid it, since it disables so many important optimizations. If your geometry is flat, then simply outputting the depth from the vertex shader will give you the same result but allow all the optimizations to work as expected. If you for some reason do need non-linear per-pixel depth, there are still things you can do to improve performance. If you are able to calculate the minimum depth (the depth value closest to the camera), you can output that as a conservative depth value from the vertex shader. You can then specify in the fragment shader how you will modify the depth value of gl_FragDepth, which allows the GPU to run a conservative depth test against the hardware-computed depth (the one you output from the vertex shader). You always want to modify the depth in the OPPOSITE way that you're testing it. Example:

 - You use GL_LESS for depth testing and the depth is cleared to 1.0.
 - You output the MINIMUM depth that the polygon can possibly have from the vertex shader.
 - In the fragment shader, you specify that the depth value will always be GREATER than the hardware computed value using
layout (depth_greater) out float gl_FragDepth;

This will allow your GPU to run a conservative depth test using the hardware-computed depth value, at least giving the GPU a chance (similar to when discard; is used) of culling things before running the fragment shader. This feature requires hardware support, but GL_ARB_conservative_depth is available as an extension on all OGL3 GPUs, even Intel, plus OGL2 Nvidia GPUs. Additionally, it can be queried in the GLSL shader and enabled if available, and it does no harm if it isn't available (as long as you also skip computing the minimum depth in the vertex shader).
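The query-and-enable part lives entirely in the shader source; roughly this prologue, written here as the Java string you'd prepend to the fragment shader (a sketch):

// The GLSL preprocessor defines GL_ARB_conservative_depth as 1 when the
// driver supports the extension, so the layout qualifier compiles in only
// where it actually helps:
String fragmentShaderPrologue =
      "#version 330 core\n" +
      "#extension GL_ARB_conservative_depth : enable\n" +
      "#ifdef GL_ARB_conservative_depth\n" +
      "layout (depth_greater) out float gl_FragDepth;\n" +
      "#endif\n";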

Clearing the depth buffer to 0.0 would cause nothing to ever pass the depth test if you use standard GL_LESS depth testing. I'd strongly suggest using GL_LESS and clearing to 1.0 instead, as that is the standard way of using a depth buffer, which in some cases could be faster in hardware.

If you could specify some more information about your use case, I could give you better advice and more information.
27  Game Development / Newbie & Debugging Questions / Re: Multiple shaders or if statements? on: 2016-09-28 02:28:13
I don't have any performance problems at the moment, but I'm thinking about the problem with further development in mind. For example, if I implement bump mapping and material maps (specular maps, gloss maps, etc.), it will multiply the number of shaders for each combination. Thinking even further, when I get to the stage where I want to implement my idea for skeletal animations, it would lead to even more shaders. At that stage, it would be a lot of work to add more light types or make other improvements, because I would have to change tons of shaders.
I'm using deferred shading, so for me the bottleneck is almost always the write to the massive G-buffer (4 render targets!). Due to this, I can get away with always doing normal mapping and always reading all 4 optional texture maps I support. The texture units and ALU cores are otherwise simply idle waiting for the ROP writes, so doing some extra reads from empty 4x4 textures won't affect performance at all. Doing too-small draw calls in general is also very bad for GPU performance, small being something like under ~100 triangles.

Normal mapping doesn't actually have that much overhead, so you should probably be able to get away with just using a pass-through 4x4 texture and always sampling it.
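Creating such a pass-through texture is trivial: a tiny texture holding the flat tangent-space normal (0, 0, 1), i.e. RGB = (128, 128, 255). A sketch with LWJGL-style GL calls:

// 4x4 RGB8 texture filled with the "do nothing" tangent-space normal.
ByteBuffer data = BufferUtils.createByteBuffer(4 * 4 * 3);
for (int i = 0; i < 4 * 4; i++) {
   data.put((byte) 128).put((byte) 128).put((byte) 255);
}
data.flip();

int passThroughNormalMap = glGenTextures();
glBindTexture(GL_TEXTURE_2D, passThroughNormalMap);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, 4, 4, 0, GL_RGB, GL_UNSIGNED_BYTE, data);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);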
28  Game Development / Newbie & Debugging Questions / Re: Multiple shaders or if statements? on: 2016-09-26 18:57:14
It all depends on what you're doing.

First of all, the choice between many shaders and a few uber-shaders depends on where your bottleneck is. Each shader bind has a fairly big CPU cost, so if you have 1000 shader switches per frame, this could easily be your bottleneck. In this case, switching to an uber-shader will improve performance, as it reduces the CPU load by a lot in exchange for some GPU overhead. If your GPU is already the bottleneck, then increasing the GPU cost to eliminate a couple of shader switches will just pile more work onto the GPU, reducing performance.

Some shader tips:
 - An if-statement is not inherently slow in shaders. It all depends on divergence and the size of the if/else blocks. If all shader invocations in a group (= all vertex shader invocations in a certain batch, or all fragment shader invocations in a certain pixel area) take the same path in the if-statement (either all true or all false), then the if-statement will be cheap and only one of the two paths will be executed. If the shader invocations diverge, both sides have to be executed for all invocations, as the group runs in lockstep. In the end, none of this matters much if the if and/or else blocks don't contain a lot of code.

 - Simple conditional assignments, like
x = condition ? a : b;
generally compile to conditional assignment instructions that don't require any branching at all. You can use this to your advantage.
29  Java Game APIs & Engines / Java Sound & OpenAL / Re: Should sound be on its own thread? on: 2016-09-26 17:41:28
In addition to the points that people have brought up here, you should never place disk access in your game loop. Disk access is both slow and unpredictable. For normal hard drives, the access time is on the order of 10 ms. In addition, the disk is shared with all the other processes on your computer, so if your anti-virus software decides that there's no better time than right now to start a full virus scan, a 0.25-second sound effect can suddenly take 1 second to load.
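The usual fix is to preload everything up front, or to hand loading off to a background thread and only play sounds once they've finished loading. A rough sketch (SoundData and audioPlayer are made-up names; uses java.util.concurrent):

ExecutorService loader = Executors.newSingleThreadExecutor();
Map<String, Future<SoundData>> sounds = new HashMap<>();

void preload(String name) {
   sounds.put(name, loader.submit(() -> SoundData.readFromDisk(name)));
}

// Called from the game loop: never blocks on the disk.
void play(String name) {
   Future<SoundData> f = sounds.get(name);
   try {
      if (f != null && f.isDone()) {
         audioPlayer.play(f.get()); // get() won't block, since isDone() is true
      }
   } catch (Exception e) {
      e.printStackTrace();
   }
}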
30  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-09-26 03:57:56
Accidentally threw together this scene while messing with a new terrain system, so I decided to pose the guy up a bit and take a screenshot.

DoF is still broken due to massive register usage on Nvidia, causing it to be extremely slow. It also isn't compatible with like 95% of my other post-processing at the moment, causing ghosting and artifacts in motion. For still shots it looks good though, as you can see.