  What I did today  (Read 3430938 times)
Offline theagentd
« Reply #2250 - Posted 2015-06-02 19:50:44 »

I will double check my results tomorrow and compare it more extensively.
Of course I just had to go and get a small idea while trying to sleep, and surprisingly it had a huge impact.

In the full resolve shader, for each color sample (5 from the current frame, 5 from the previous frame = 10 total color samples) I run the following code to sum up the color of all SRAA samples:
//float weight = reprojection weight based on motion vector length = 1.0 for the current frame and 0.0-1.0 for the previous frame

   for(int i = 0; i < SRAA_SAMPLES; i++){
      float w = weight * float(ids[i] == color.a);
      sraaSamples[i] += vec4(color.rgb, 1) * w;
   }

The idea here is to avoid the branch by casting the boolean ID matching result to a float, which converts true to 1.0 and false to 0.0, exactly what I want to multiply the weight by. When looking at the assembly code on both AMD and Nvidia hardware, it was a mess. On AMD, the code was riddled with what seemed like unnecessary MOVs that simply moved stuff around for no reason, and it also used ~25 registers. On Nvidia, it used a massive 36 vec4 registers, while optimally it should be able to get by with fewer than 10. The compiler clearly didn't do a good job there. Both the Nvidia and AMD code looked massively reorganized: the compilers had clearly reordered the instructions a lot, which seemed to be what bumped up the register requirements and introduced the MOVs.

Register count is a funny quirk about shaders. Shader cores rely on the ability to hide latency to stay busy and get optimal throughput. Basically, when a shader core hits a texture read that'll take some time to finish, the entire work group immediately switches to something else so that it doesn't have to sit idle. For that reason, each shader core has (compared to a CPU) an unusually large number of registers so that it can fit many invocations of shaders at once and work on whichever isn't blocked. If your shader uses a lot of registers, it limits the number of invocations that can be loaded at the same time, which can reduce performance in very unpredictable ways. Even worse, the register count is almost impossible to predict as it depends completely on the compiler. Anyway, since the main bottleneck of the full resolve shader seemed to be ALU performance (possibly amplified by the high register count) and that loop contained pretty much the only ALU operations in the entire shader, it seemed to be worth trying some different things in an attempt to speed it up.
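
To put rough numbers on the register argument, here is a small Java sketch (plain CPU-side arithmetic, not GPU code). The register file size of 65536 and the names `RegisterOccupancy` and `maxResidentInvocations` are assumptions made up for illustration; real hardware allocates registers in fixed granularities and has other limits, so treat the result as a trend and not as a prediction.

```java
/** Back-of-the-envelope occupancy estimate; all hardware numbers are hypothetical. */
final class RegisterOccupancy {

    /** How many shader invocations fit in the register file at the same time. */
    static int maxResidentInvocations(int registerFileSize, int registersPerInvocation) {
        if (registersPerInvocation <= 0) {
            throw new IllegalArgumentException("registersPerInvocation must be positive");
        }
        return registerFileSize / registersPerInvocation; // integer division rounds down
    }

    public static void main(String[] args) {
        int registerFileSize = 65536; // assumed size, not taken from any specific GPU
        for (int regs : new int[] {10, 19, 37}) {
            System.out.println(regs + " registers -> "
                    + maxResidentInvocations(registerFileSize, regs) + " resident invocations");
        }
    }
}
```

Under these assumptions, roughly halving the register count roughly doubles the number of invocations available for hiding latency.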

Out of curiosity, I tried simply rewriting the code like this:
   for(int i = 0; i < SRAA_SAMPLES; i++){
      sraaSamples[i] += vec4(color.rgb, 1) * weight * float(ids[i] == color.a);
   }

In raw instruction count, this is actually slower. The original code multiplied together two floats (weight*float(...)), then multiplied this new float by a vec3 (color.rgb*w) for a total of 4 instructions. This new one-liner is actually a lot more instructions. First we do color.rgb*weight which is 3 instructions. 1*weight is simply optimized away. After that, we do <temp vec4>*float(...), which is another 4 instructions, for a total of 7. The funny part is that according to the AMD Shader Analyzer, this slower code should perform 50% faster! The assembly looks completely different, but in the end it didn't matter. Tests on both Nvidia and AMD hardware showed that performance was pretty much identical. The Shader Analyzer is most likely just using an older quirkier GLSL compiler than my live hardware.

I decided to try getting rid of my clever boolean-->float cast optimization and use if() statements instead, just to see what would happen. The idea was that if the driver is smart enough, it'll do the same thing that I did but maybe better optimized as it's free to make more liberal optimizations. Plugging the code into ShaderAnalyzer showed a grim future. Throughput was predicted to drop to half the original value, but at least the assembly resembled the original source code more. On Nvidia, the assembly even had the exact same structure as the source code which was a bit cool at least. I ran the code, expecting the shader to slow down to a crawl as I was doing 80 branches per pixel...
   for(int i = 0; i < SRAA_SAMPLES; i++){
      if(ids[i] == color.a){
         sraaSamples[i] += vec4(color.rgb, 1) * weight;
      }
   }

BAM! 34-50% better performance on Nvidia and ~10% better performance on AMD! Why is it faster? Seems to be register count. The register count on Nvidia at 8xAA dropped from 37 to 19, and I assume something similar happened to AMD.
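
For anyone who wants to convince themselves that the cast and the branch accumulate the same values, here is a CPU-side Java model of the loop. It is only a sketch: `SraaAccumulate`, the array layout (rgb plus ID packed in the 4th component) and the sample values are assumptions for illustration, and it says nothing about how a GLSL compiler schedules either version.

```java
import java.util.Arrays;

/** CPU-side model of two of the accumulation variants; not GPU code. */
final class SraaAccumulate {

    /** Cast variant: w = weight * float(match), then scale vec4(color.rgb, 1) by w. */
    static void castToFloat(float[][] sums, float[] ids, float[] rgbId, float weight) {
        for (int i = 0; i < ids.length; i++) {
            float w = weight * (ids[i] == rgbId[3] ? 1f : 0f);
            sums[i][0] += rgbId[0] * w;
            sums[i][1] += rgbId[1] * w;
            sums[i][2] += rgbId[2] * w;
            sums[i][3] += w; // the "1" in vec4(color.rgb, 1) accumulates the total weight
        }
    }

    /** Branch variant: skip samples whose ID does not match. */
    static void branch(float[][] sums, float[] ids, float[] rgbId, float weight) {
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] == rgbId[3]) {
                sums[i][0] += rgbId[0] * weight;
                sums[i][1] += rgbId[1] * weight;
                sums[i][2] += rgbId[2] * weight;
                sums[i][3] += weight;
            }
        }
    }

    public static void main(String[] args) {
        float[] ids = {1f, 2f, 1f, 3f};        // hypothetical per-sample IDs
        float[] color = {0.5f, 0.25f, 1f, 1f}; // rgb + ID in the 4th component
        float[][] a = new float[ids.length][4];
        float[][] b = new float[ids.length][4];
        castToFloat(a, ids, color, 0.75f);
        branch(b, ids, color, 0.75f);
        System.out.println("same sums: " + Arrays.deepEquals(a, b));
    }
}
```

Since both versions produce the same sums for these inputs, the performance difference has to come from how the compiler schedules the code and allocates registers, not from the arithmetic.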

Here's the performance summary. Note that the GTX 770 is a much faster card, so the difference between AMD and Nvidia isn't relevant.
Quote
Nvidia GTX 770 @ 4x AA
  float w: 0.922
  weight*float(): 1.024
  if(): 0.688

Nvidia GTX 770 @ 8x AA
  float w: 1.693
  weight*float(): 1.627
  if(): 1.126


AMD HD7790 @ 4x AA
  float w: 1.936
  weight*float(): 1.928
  if(): 1.734


AMD HD7790 @ 8x AA
  float w: 3.050
  weight*float(): 3.010
  if(): 2.784
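
As a sanity check on the percentages, the numbers above can be turned into speedups with a few lines of Java. This assumes the values are timings where lower is better (the unit isn't stated); `ResolveSpeedup` and `percentFaster` are names made up for this sketch.

```java
/** Converts before/after timings into a percentage improvement. */
final class ResolveSpeedup {

    /** Percent more throughput when a timing drops from {@code before} to {@code after}. */
    static double percentFaster(double before, double after) {
        return (before / after - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // "float w" timing versus "if()" timing, copied from the summary above.
        System.out.printf("GTX 770 @ 4x: %.1f%%%n", percentFaster(0.922, 0.688));
        System.out.printf("GTX 770 @ 8x: %.1f%%%n", percentFaster(1.693, 1.126));
        System.out.printf("HD7790  @ 4x: %.1f%%%n", percentFaster(1.936, 1.734));
        System.out.printf("HD7790  @ 8x: %.1f%%%n", percentFaster(3.050, 2.784));
    }
}
```

That works out to roughly 34% and 50% on the GTX 770 and roughly 12% and 10% on the HD7790.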

The if-statement seems to be faster in every case EXCEPT 16xAA on Nvidia hardware. In that specific case, performance with the if-statements just drops.
Quote
if()
  2x  133 FPS
  4x  127 FPS
  8x  112 FPS
  16x 49 FPS

float()
  2x  132 FPS
  4x  123 FPS
  8x  105 FPS
  16x 83 FPS

What does all this mean? Well, before the new optimizations the stencil mask always had a positive or no impact. With these optimizations, 2xAA is actually a tiny tiny bit slower. 8xAA still gets a small boost from the stencil mask, and in special cases like when you're looking up into the sky the stencil mask obviously works wonders. For now, I think I will stick to having it disabled though.


Fun fact: AMD doesn't support 16xMSAA so I can't test 16x on AMD. Nvidia actually doesn't support 16x MSAA either, they just give you 2x2 OGSSAA + 4xMSAA. It's even possible to force 32xAA through the Nvidia control panel which is 2x2 OGSSAA + 8xMSAA. If you're really crazy, you can get 64x SSAA by turning on SGSSAA and running 2x2 OGSSAA + 8xSGSSAA + 2x SLI supersampling. I'm not sure if it's supported, but you could theoretically get 128x supersampling with 4 GPUs.
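
The sample counts in the fun fact are just products of the individual factors. A tiny Java sketch of that arithmetic, with `AaSampleCount` and `samples` as made-up names:

```java
/** Multiplies out the anti-aliasing factors mentioned above. */
final class AaSampleCount {

    /** Ordered-grid factor (width x height) times per-pixel samples times GPU count. */
    static int samples(int gridX, int gridY, int perPixelSamples, int gpus) {
        return gridX * gridY * perPixelSamples * gpus;
    }

    public static void main(String[] args) {
        System.out.println("2x2 OGSSAA + 4xMSAA          = " + samples(2, 2, 4, 1) + "x");
        System.out.println("2x2 OGSSAA + 8xMSAA          = " + samples(2, 2, 8, 1) + "x");
        System.out.println("2x2 OGSSAA + 8xSGSSAA, 2 GPUs = " + samples(2, 2, 8, 2) + "x");
        System.out.println("2x2 OGSSAA + 8xSGSSAA, 4 GPUs = " + samples(2, 2, 8, 4) + "x");
    }
}
```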


EDIT:
Bonus chart! This is the result of the cumulative optimizations that I've talked about in my last 3 posts:

Myomyomyo.
Offline pitbuller
« Reply #2251 - Posted 2015-06-02 21:08:48 »

Theagentd, could you post more of that shader code? If there is a way to avoid that temporal array, it should be a lot easier on register pressure.
Offline theagentd
« Reply #2252 - Posted 2015-06-02 21:24:18 »

Quote from pitbuller:
  Theagentd, could you post more of that shader code? If there is a way to avoid that temporal array, it should be a lot easier on register pressure.
Temporal array? You mean the vec4 sraaSamples[SRAA_SAMPLES] array?

Myomyomyo.
Offline pitbuller
« Reply #2253 - Posted 2015-06-02 21:43:33 »

Quote from theagentd:
  Temporal array? You mean the vec4 sraaSamples[SRAA_SAMPLES] array?
Yes. I meant just that.
Offline theagentd
« Reply #2254 - Posted 2015-06-03 00:05:45 »

Quote from pitbuller:
  Quote from theagentd:
    Temporal array? You mean the vec4 sraaSamples[SRAA_SAMPLES] array?
  Yes. I meant just that.

The code is a bit messy right now, but the gist of it is that it's a nested loop. On one hand, you have 10 color samples with different weights, and on the other hand you have N SRAA samples (2, 4 or 8), so we basically have 10*N iterations in total. This can be implemented in two ways. Either the SRAA samples are the inner loop (what I have now) or the color samples are the inner loop (what I originally had). The data for the inner loop needs to be sampled before the nested loop runs since the inner loop is run multiple times, and we can't afford doing 10+10*N samples instead of 10+N.

If the SRAA samples are the inner loop, it means we first read the N SRAA IDs (just 1 float) at the start. We also need to allocate a color and a total weight to divide by at the end, so in the end we need (4+1)*n values stored. For 2, 4 and 8 samples, that's 10, 20 and 40 values stored. So, for each color sample (the outer loop), we add the color multiplied by its weight to all SRAA samples that it matches (the inner loop that I posted in my previous post).

If the color samples are the inner loop, we need all 10 color samples in memory (1 vec4 each for packed RGB+ID), meaning we need a constant 40 values stored. For each SRAA sample (the outer loop), we loop over all the color samples (the inner loop), check which color samples match and sum them up multiplied by their relevant weights.

To visualize all this with a simple code example...
//Version 1:

for(int i = 0; i < n; i++){
    vec4 outerData = texture(..., i); //n samples

    for(int j = 0; j < m; j++){
        vec4 innerData = texture(..., j); //n*m samples, BAD!

        //compare data...
    }
}
//Total of n + n*m samples


//Version 2

//Cache inner loop data
vec4 innerLoopData[m]; //one cached entry per inner loop iteration
for(int j = 0; j < m; j++){
    innerLoopData[j] = texture(..., j); //m samples
}

for(int i = 0; i < n; i++){
    vec4 outerData = texture(..., i); //n samples

    for(int j = 0; j < m; j++){
        vec4 innerData = innerLoopData[j];

        //compare data...
    }
}
//Total of n + m samples


As you can see, if you want to do only n+m samples, you need to cache the data of the inner loop. Choosing the N SRAA samples as the inner loop uses less memory, as N*5 <= 40 (N = 2, 4, 8). Like I said, I used to do it the other way around, but since I gained ~2-3x better performance by doing it the current way, I obviously decided to stick with it.
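
The bookkeeping from the last few paragraphs can be written down as a few lines of Java. This is only the arithmetic from the post ((4+1)*N values versus 10 vec4s, and n + n*m versus n + m texture reads); `LoopOrderCost` and its method names are made up for this sketch.

```java
/** Storage and texture-read counts for the two loop orders described above. */
final class LoopOrderCost {

    /** SRAA samples as the inner loop: a vec4 sum plus one ID float per SRAA sample. */
    static int valuesWithSraaInner(int sraaSamples) {
        return (4 + 1) * sraaSamples;
    }

    /** Color samples as the inner loop: one packed vec4 (RGB + ID) per color sample. */
    static int valuesWithColorInner(int colorSamples) {
        return 4 * colorSamples;
    }

    /** Version 1: the inner data is re-read on every outer iteration. */
    static int uncachedReads(int n, int m) {
        return n + n * m;
    }

    /** Version 2: the inner data is read once and cached. */
    static int cachedReads(int n, int m) {
        return n + m;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 4, 8}) {
            System.out.println(n + " SRAA samples: " + valuesWithSraaInner(n)
                    + " values vs " + valuesWithColorInner(10) + " values");
        }
        System.out.println("reads, 10 color x 8 SRAA: uncached " + uncachedReads(10, 8)
                + ", cached " + cachedReads(10, 8));
    }
}
```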

Myomyomyo.
Offline ClaasJG

« Reply #2255 - Posted 2015-06-03 08:01:01 »

@theagentd
I am one of those guys that likes backends so I don't really care about opengl and shaders and stuff, but I am a fan of your explanations.
I enjoy reading your posts and the way you share the thoughts you had about the things you show.

-ClaasJG

My english has to be tweaked. Please show me my mistakes.
Offline theagentd
« Reply #2256 - Posted 2015-06-03 13:46:08 »

Quote from ClaasJG:
  @theagentd
  I am one of those guys that likes backends so I don't really care about opengl and shaders and stuff, but I am a fan of your explanations.
  I enjoy reading your posts and the way you share the thoughts you had about the things you show.

  -ClaasJG
Thanks, that's nice to hear. =D

Myomyomyo.
Offline ryukujinishi

« Reply #2257 - Posted 2015-06-03 18:58:17 »

Been trying to improve my pixel art, today's attempt was a side-view of myself. Turned out okay, still much room for improvement though.

Offline wessles
« Reply #2258 - Posted 2015-06-04 00:17:05 »

I finally gave in to popular demand and ported RFLEX to Android using libGDX. Currently the game and menus are done; all that's left to port to GDX is the editor (which is pretty much done) and the level select (haven't even started :-\).

Reactions among friends have ranged from "wow, that's cool!" to "where do I get it?". I am very happy with it.

Video of standard gameplay.
GIF of desktop gameplay.

I also got two new musicians in addition to Ashedragon:
  • ForeverBound (soundcloud) - made the song "Stereo Madness" for Geometry Dash.
  • Adhenoid (soundcloud) - does lots of different types of music.

Things are going quite well :D

-wes
Offline Opiop
« Reply #2259 - Posted 2015-06-04 01:41:04 »

Wessles, that CRT effect is amazing. Good work man!
Offline theagentd
« Reply #2260 - Posted 2015-06-04 03:40:18 »

Improved my motion blur quite a bit. It turned out that the shader's fast blur path sometimes kicked in at unexpected times because its threshold was too low, so now it only kicks in if the fast path would yield very nearly the same result. That made it a bit slower in some cases, but in most cases it has no significant impact on performance.

In addition, I found a way of renormalizing the weights based on the maximum motion vector spread of the pixels processed. This helped in cases where a single fast moving pixel could cause an entire 3x3 area of tiles to get a very high dominant motion vector, effectively corrupting the motion blur of slow-moving objects around it and showing an ugly reduction in blurriness for nearby pixels. This was most visible when the camera was moving and a foreground object was moving fast on the screen, causing the slower moving background around it to get a reduced amount of motion blur. Although my fix isn't perfect, it's a LOT better than nothing, and together with the fast path fixes it really improved the quality, stability and continuity of the motion blur. I really like the result.

In the following picture, the old and new techniques are compared. The problem here is that the foreground objects are moving much faster than the distant tall wall, and the tiles around the foreground object don't compute the motion blur for the background walls correctly. The left side is how the blur looks with my renormalization. The green circles highlight areas that it improved. The right side uses the old version, and the red circles show the artifacts that are fixed by the renormalization.


Tomorrow I will look into just one more thing. Since my anti-aliasing is applied before the motion blur (the guy in the middle that isn't moving looks great), the motion blur doesn't get anti-aliased. If you look closely at the motion blur at the top of the big white thing to the left, the top edge looks quite aliased. I think that with a clever AA system I can offset the blur a tiny bit to get some anti-aliasing on the motion blur edges as well.

Myomyomyo.
Offline BurntPizza

« Reply #2261 - Posted 2015-06-04 03:45:22 »

@theagentd Really looking forward to seeing it in all its glory, especially knowing bits of the technical narrative behind it.
Thanks for these great WIDT posts.
Offline theagentd
« Reply #2262 - Posted 2015-06-04 03:50:57 »

Quote from BurntPizza:
  @theagentd Really looking forward to seeing it in all its glory, especially knowing bits of the technical narrative behind it.
  Thanks for these great WIDT posts.
Glad you like them. It's quite rewarding for me to write down my progress like this as well, as it forces me to think through and justify my decisions. It helps me look at it from a different perspective. Hopefully it's also useful/interesting to others. >___<

If you're interested in the motion blur I have, it's based first on this paper: http://graphics.cs.williams.edu/papers/MotionBlurI3D12/, with the improvements described in these slides: http://advances.realtimerendering.com/s2014/#_NEXT_GENERATION_POST. For Advanced Warfare, they did a lot of cool enhancements that improved the motion blur and the reconstruction of the background. I'm looking into the depth of field from those slides as well for cutscenes; we'll see how that goes.

Myomyomyo.
Offline PocketCrafter7

« Reply #2263 - Posted 2015-06-04 18:07:47 »

Just creating small prototypes to learn new things ;D


Nothing is difficult in this world. It is just how you look at it.
Offline Slyth2727
« Reply #2264 - Posted 2015-06-04 18:40:25 »

Got a 17 second 3x3 solve. Finally sub 20...
Offline Opiop
« Reply #2265 - Posted 2015-06-04 18:46:41 »

All these great projects are making me really want to make a game. Now what should I make...
Offline theagentd
« Reply #2266 - Posted 2015-06-04 19:50:21 »

Did some motion blur anti-aliasing experiments, but nothing worked out... Trying to use the previous frame failed miserably due to horrible ghosting, and the subpixel nudging idea didn't help at all. I've run out of ideas for now, but at least the current problems are hard to spot in motion, which luckily is when the motion blur obviously kicks in. Still, once you've seen the artifacts you can't unsee them... >___<

Myomyomyo.
Offline thedanisaur

« Reply #2267 - Posted 2015-06-04 20:11:51 »

@theagentd sorry, haven't followed the whole discussion, but why can't you do the motion blurring before AA?

Every village needs an idiot 8)
Offline ags1

« Reply #2268 - Posted 2015-06-04 20:27:55 »

Switched to Java 8 because I want default methods for my entities. Yum... multiple inheritance, kinda!
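
For readers who haven't used them, this is roughly what default methods on entity interfaces look like. It's a minimal sketch, not ags1's actual code: `Positioned`, `Damageable` and `Monster` are hypothetical names. Note that interfaces can't declare instance fields, so the state still has to live in the implementing class.

```java
/** Behaviour mixed in through a default method; state comes from the implementor. */
interface Positioned {
    float x();
    float y();

    default float distanceTo(Positioned other) {
        float dx = other.x() - x();
        float dy = other.y() - y();
        return (float) Math.sqrt(dx * dx + dy * dy);
    }
}

interface Damageable {
    int health();

    default boolean isAlive() {
        return health() > 0;
    }
}

/** One class picks up behaviour from both interfaces: "multiple inheritance, kinda". */
final class Monster implements Positioned, Damageable {
    private final float x;
    private final float y;
    private final int health;

    Monster(float x, float y, int health) {
        this.x = x;
        this.y = y;
        this.health = health;
    }

    @Override public float x() { return x; }
    @Override public float y() { return y; }
    @Override public int health() { return health; }

    public static void main(String[] args) {
        Monster a = new Monster(0f, 0f, 10);
        Monster b = new Monster(3f, 4f, 0);
        System.out.println("distance = " + a.distanceTo(b) + ", b alive = " + b.isAlive());
    }
}
```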

Offline DarkCart

« Reply #2269 - Posted 2015-06-04 20:39:07 »

Checked out this 1042-page monster from the public library today


The darkest of carts.
Offline SauronWatchesYou

« Reply #2270 - Posted 2015-06-04 20:49:04 »

Today I played around with the Android SDK/XML a little and had a blast ;D

Definitely think this is the type of field I want to go into (mobile/tablet development)!

Quote from Opiop:
  All these great projects are making me really want to make a game. Now what should I make...
Makes me want to start another game alongside my current one :(

Hey, you! Back to work
Offline theagentd
« Reply #2271 - Posted 2015-06-04 20:57:14 »

Quote from thedanisaur:
  @theagentd sorry, haven't followed the whole discussion, but why can't you do the motion blurring before AA?
To properly do motion blur where sharp objects aren't blurred by motion blur, you need to take the depth of each pixel into consideration. If a sharp foreground is overlapping an unsharp background, the sharp foreground should stay sharp and the background should only blur the background, etc. Anti-aliasing basically averages together color values. If you anti-alias the silhouette of a foreground object you will in some way mix in the color of the background to get a smoother image. That means that the depth information you had is no longer relevant since each pixel is now a weighted average of samples that originally had different depth values, and we can't tell them apart anymore since they're just a single color now. The result is that the silhouette will "smear" since you can't get correct depth ordering.
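
A toy example of why the resolved pixel no longer has a usable depth, written as a small Java sketch. The single-channel colors, the depth values and the name `ResolveLosesDepth` are all made up for illustration, and the resolve is assumed to be a plain box filter.

```java
/** Toy single-channel MSAA resolve; sample values are made up for illustration. */
final class ResolveLosesDepth {

    /** Box-filter resolve: the pixel becomes the plain average of its samples. */
    static float resolve(float[] samples) {
        float sum = 0f;
        for (float s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        float[] color = {1f, 1f, 0f, 0f};         // 2 foreground (white), 2 background (black)
        float[] depth = {0.2f, 0.2f, 0.9f, 0.9f}; // matching per-sample depths
        System.out.println("resolved color = " + resolve(color));
        System.out.println("averaged depth = " + resolve(depth));
    }
}
```

The resolved color is a 50/50 mix of two surfaces, and neither original depth (nor their average) describes it correctly, which is why the depth ordering breaks on the silhouette.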

EDIT: Oh, I misread your question. I AM doing it before anti-aliasing, but if you motion blur the aliased picture the aliasing can sometimes get more obvious. A small, bright flickering pixel is no longer just a pixel, it's a flickering 30 pixels long line.

EDIT: Also, as my anti-aliasing only works on triangle edges and I can't do temporal anti-aliasing of motion transparent stuff like motion blurred edges without getting ghosting, the motion blur basically has to be added on top of the anti-aliased picture, reintroducing the aliasing.

Myomyomyo.
Offline philfrei
« Reply #2272 - Posted 2015-06-04 21:16:32 »

Today, for the first time, I played a sound on an Android emulator while making use of a Java audio library that I wrote.

Leading up to this (over last 6 months):
 - building new PC from parts that is able to handle Android dev;
 - installing Android Studio and emulator in Linux;
 - working through various books and tutorials to learn the basics of Android programming;
 - getting a tutorial that has a "simple synth" in Android working (https://audioprograming.wordpress.com/2012/10/18/a-simple-synth-in-android-step-by-step-guide-using-the-java-sdk/);
 - refactoring my audio mixer (in Java) to separate out the platform-specific parts (yesterday mostly), testing that it still works in the Java context;
 - importing the jar to the Android project and running a simple synth written in Java, running as a track on my audio mixer.

Sounded as clear as a bell. Next up, maybe I will test getting the FM synth bell working (the one used in Hexara) as well as the Event Handling parts. I'm a bit concerned that one of the classes I'm using (ConcurrentSkipListSet) may not be implemented in Android.

music and music apps: http://adonax.com
Offline ags1

« Reply #2273 - Posted 2015-06-04 22:23:05 »

Quote from ags1:
  Switched to Java 8 because I want default methods for my entities. Yum... multiple inheritance, kinda!

Learned that default methods really aren't a good idea for this. Going back to using simple composition.
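
The post doesn't say what went wrong, but for comparison, a plain composition version could look something like this sketch. `Health` and `Player` are hypothetical names; the point is only that the component object owns its own state, which an interface with default methods cannot do.

```java
/** A small component that owns its own state. */
final class Health {
    private int points;

    Health(int points) {
        this.points = points;
    }

    void damage(int amount) {
        points = Math.max(0, points - amount); // never drops below zero
    }

    boolean isAlive() {
        return points > 0;
    }
}

/** The entity is composed of components instead of inheriting behaviour. */
final class Player {
    private final Health health = new Health(100);

    Health health() {
        return health;
    }

    public static void main(String[] args) {
        Player player = new Player();
        player.health().damage(30);
        System.out.println("alive after 30 damage: " + player.health().isAlive());
    }
}
```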

Offline Opiop
« Reply #2274 - Posted 2015-06-04 23:36:13 »

Quote from ags1:
  Quote from ags1:
    Switched to Java 8 because I want default methods for my entities. Yum... multiple inheritance, kinda!

  Learned that default methods really aren't a good idea for this. Going back to using simple composition.
I had a quiet chuckle.                   
Offline BurntPizza

« Reply #2275 - Posted 2015-06-05 01:28:31 »

I like Java 8 interfaces because you can finally have static methods in them. You would think that, with Java's class/namespace conflation, that would have happened earlier.
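
A minimal sketch of what that looks like, with `Vec2` and its methods as made-up names: the factory and helper live directly on the interface, so no separate utility class is needed.

```java
/** Interface with static helper methods (allowed since Java 8). */
interface Vec2 {
    float x();
    float y();

    /** Static factory on the interface itself. */
    static Vec2 of(float x, float y) {
        return new Vec2() {
            @Override public float x() { return x; }
            @Override public float y() { return y; }
        };
    }

    /** Static utility that would otherwise need a separate helper class. */
    static float dot(Vec2 a, Vec2 b) {
        return a.x() * b.x() + a.y() * b.y();
    }
}

final class Vec2Demo {
    public static void main(String[] args) {
        System.out.println(Vec2.dot(Vec2.of(1f, 2f), Vec2.of(3f, 4f)));
    }
}
```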
Offline Opiop
« Reply #2276 - Posted 2015-06-05 01:37:04 »

At work a couple days ago I found a great opportunity for static interface methods, but it wasn't to be because our projects are still running Java 6. I love dusty old enterprise software, it's so fascinatingly horrible to manage!
Offline BurntPizza

« Reply #2277 - Posted 2015-06-05 01:41:12 »

If you just need a place for methods that doesn't have to be an interface I usually use a variant-less enum:

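public enum Namespace {
    ; // no constants, so the enum can never be instantiated

    // Placeholder utility method; the name and body are only an illustration.
    public static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }
}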


Quote from Opiop:
  I love dusty old enterprise software, it's so fascinatingly horrible to manage!

Be glad you aren't one of the poor schmucks stuck on 1.4.2 or whatever.
Offline ClaasJG

« Reply #2278 - Posted 2015-06-05 06:04:17 »

As long as nobody uses lambdas for recursion in methods everything is fine...

-ClaasJG

Quote
   private static void printUserDirContent(){
      @SuppressWarnings("unchecked")
      Consumer<Path>[] printer = new Consumer[1];
      printer[0] = p -> {
         System.out.println(p);
         if (Files.isReadable(p) && Files.isDirectory(p) && !Files.isSymbolicLink(p))
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(p)){
               ds.forEach(printer[0]);
            } catch (IOException e) {
               e.printStackTrace();
            }
      };
      printer[0].accept(Paths.get(System.getProperty("user.dir")));
   }

My english has to be tweaked. Please show me my mistakes.
Offline BurntPizza

« Reply #2279 - Posted 2015-06-05 06:11:06 »

Well that's just an infinite recursion, so of course it's not fine. It would work (although still be a bit weird) if it wasn't.
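
For reference, the self-reference trick from ClaasJG's snippet (a lambda calling itself through a one-element array) can be shown on something smaller than a directory walk. This is only a sketch with factorial as a stand-in; `RecursiveLambda` and its method names are made up.

```java
import java.util.function.IntUnaryOperator;

/** Lambda recursion through a holder array, next to the ordinary named-method version. */
final class RecursiveLambda {

    /** The lambda cannot name itself, so it reaches itself through self[0]. */
    static int factorialViaLambda(int n) {
        IntUnaryOperator[] self = new IntUnaryOperator[1];
        self[0] = k -> k <= 1 ? 1 : k * self[0].applyAsInt(k - 1);
        return self[0].applyAsInt(n);
    }

    /** The same recursion as a plain method, which is usually easier to read. */
    static int factorial(int n) {
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(factorialViaLambda(5) + " " + factorial(5));
    }
}
```

Both versions stop because of the base case `k <= 1`; without a terminating condition either form would recurse until the stack overflows.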