  Improving draw performance for constantly changing image  (Read 5216 times)
Offline EdTorbett

Senior Newbie


Exp: 12 years



« Posted 2014-02-11 12:39:36 »

Hi,

EDIT: TL;DR:
I'm using ffmpeg to decode video, so I get all the benefits of hardware acceleration that it provides.
I get decoded RGB frame data from ffmpeg from each frame and put it into a Raster bound to a BufferedImage.
I render the BufferedImage. This bit is slow.


I'm implementing a Java-based media player that's hooked into the ffmpeg media library. I'm simultaneously decoding multiple video streams and displaying them in a number of lightweight java components across 4 monitors. However, I'm encountering performance issues when rendering the decoded video to screen. The video decoding part is negligible here (approx. 10% CPU use on 1 core) so I'm not going to discuss that here. The issue arises in the swing painting code, which blocks deep within Java's Direct2D code while scaling/drawing the image itself, so I hope someone might be able to help me here.

I currently receive the image as a series of ints in packed 0rgb format (this is flexible, so I can use whatever colour model and packing gives best performance). The data is written directly into an image raster that I've instantiated in the following manner:

        // Backing store for one frame: lineStride ints per row, packed 0RGB.
        imageBuffer = new DataBufferInt(lineStride * height);
        // Wrap the buffer in a raster with red, green and blue band masks.
        imageRaster = Raster.createPackedRaster(imageBuffer, width, height, lineStride, new int[] {0xff0000, 0xff00, 0xff}, null);
        // Matching colour model with no alpha mask.
        ColorModel colourModel = new DirectColorModel(32, 0xff0000, 0xff00, 0xff);
        currentFrame = new BufferedImage(colourModel, imageRaster, true, null);
        currentFrame.setAccelerationPriority(1.0f);

This code is only called once per video player and the raster and bufferedImage re-used for each subsequent frame.
Each time a new frame is decoded, the data is copied into the raster and repaint(30) called (to trigger an asynchronous swing repaint).
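
For readers following along, here is a minimal sketch of that per-frame update, assuming each decoded frame arrives as an int[] of packed 0RGB pixels. The class name FrameSink, the method copyFrame and the srcStride parameter are hypothetical; only the raster and image construction mirror the snippet above.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.DataBufferInt;
import java.awt.image.DirectColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Hypothetical holder for the reusable raster and image (illustrative names).
final class FrameSink {
    final int width;
    final int height;
    final int lineStride;
    final DataBufferInt imageBuffer;
    final BufferedImage currentFrame;

    FrameSink(int width, int height, int lineStride) {
        this.width = width;
        this.height = height;
        this.lineStride = lineStride;
        // Same construction as in the post: one int per pixel, R/G/B masks, no alpha.
        imageBuffer = new DataBufferInt(lineStride * height);
        WritableRaster raster = Raster.createPackedRaster(
                imageBuffer, width, height, lineStride,
                new int[] {0xff0000, 0xff00, 0xff}, null);
        ColorModel colourModel = new DirectColorModel(32, 0xff0000, 0xff00, 0xff);
        currentFrame = new BufferedImage(colourModel, raster, true, null);
    }

    // Copy one decoded frame (srcStride ints per row) into the raster's backing array.
    void copyFrame(int[] src, int srcStride) {
        int[] dst = imageBuffer.getData();
        for (int y = 0; y < height; y++) {
            System.arraycopy(src, y * srcStride, dst, y * lineStride, width);
        }
        // A Swing component would call repaint(30) at this point.
    }
}
```

The row-by-row copy only matters when the source and destination strides differ; with equal strides a single System.arraycopy over the whole frame would do the same job.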

I am currently hitting 100% CPU usage on a Core i5 with the following scenario:

Four 1280x1024 monitors are connected to the PC.
Four 720x576 videos are being decoded at 25FPS (this is handled by ffmpeg and is not the contributing factor to the performance issues)
Each monitor is displaying a single JFrame without window decorations, scaled to fit the entire monitor
The JFrame's paintComponent method is overridden with the paint code as follows:

        graphics.setColor(Color.BLACK);
        graphics.fillRect(0, 0, getWidth(), getHeight());

        graphics.drawImage(currentFrame,
                        0, 0, getWidth(), getHeight(),
                        0, 0, currentFrame.getWidth(), currentFrame.getHeight(),
                        null);

For comparison, if I use four full-screen media players (such as VLC or ffplay), each playing one of the streams, the total CPU usage is around 30%.

I have experimented with various options (such as disabling Direct3D, etc) but without any real gains and mostly losses/unexpected behaviour. I have also considered using full-screen exclusive mode but would like to know whether this is likely to help before I try that route.

Does anyone have any suggestions? I'm happy to elaborate on any areas of the code that may be relevant.
Offline Roquen
« Reply #1 - Posted 2014-02-11 14:29:26 »

You can at least lose the clear.
Offline EdTorbett

« Reply #2 - Posted 2014-02-11 14:56:20 »

Acknowledged. I have removed it and re-tested but it gives no observable improvement to performance.

Offline TeamworkGuy2

Junior Devvie


Medals: 10



« Reply #3 - Posted 2014-02-11 15:24:51 »

Have you tried profiling the program while it is running to see which method calls are taking the longest?

The most basic way to do this is with a lot of System.nanoTime() calls and some print statements to discover how long each chunk of code takes to run.  Be careful, this is fairly messy.  If you do try this, don't put timing print statements around a method call that itself contains timing print statements.  Depending on the JVM, print statements can be very slow, so the time spent in the child method's prints is counted against the parent call, which then appears much slower than it really is until you remove the child's print statements.
(from personal experience trying to profile a program this way)
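
One way to sidestep the slow-print problem is to accumulate the timings and print only once at the end. The following is a rough sketch of that idea, not a replacement for a real profiler; SectionTimer and its methods are made-up names, and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sums System.nanoTime() deltas per named section.
final class SectionTimer {
    // name -> {total nanoseconds, number of calls}
    private final Map<String, long[]> totals = new LinkedHashMap<>();

    // Run a section and add its elapsed time to the total; nothing is printed here.
    void time(String name, Runnable section) {
        long start = System.nanoTime();
        try {
            section.run();
        } finally {
            long[] t = totals.computeIfAbsent(name, k -> new long[2]);
            t[0] += System.nanoTime() - start;
            t[1]++;
        }
    }

    long calls(String name) {
        long[] t = totals.get(name);
        return t == null ? 0 : t[1];
    }

    long nanos(String name) {
        long[] t = totals.get(name);
        return t == null ? 0 : t[0];
    }

    // Print all totals once, after the measured run is over.
    void report() {
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            long[] t = e.getValue();
            System.out.printf("%s: %d calls, %.3f ms total%n", e.getKey(), t[1], t[0] / 1e6);
        }
    }
}
```

Usage would look like timer.time("paint", () -> paintFrame()) around each chunk of interest, where paintFrame is a placeholder for your own method, with a single timer.report() on shutdown.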

I would recommend VisualVM for profiling, although simply starting your Java app from the command line with the -Xprof switch can sometimes be helpful.

The -Xprof switch will keep track of which methods are running every 'tick' (depending on VM, around 60 ticks per second) and when you close the program you will get a printout to the command line of how many 'ticks' each method was running for.

The most powerful option is a profiler like VisualVM. VisualVM can be tricky to get running and navigate through, but once you figure it out, it's a very useful tool for tracking memory and CPU usage of a Java program.
Here's a link to VisualVM's documentation page (there's a link to download it in the page's menu bar).  There are a number of links on the page including an introduction page and troubleshooting page: http://visualvm.java.net/docindex.html.
Offline trollwarrior1
« Reply #4 - Posted 2014-02-11 15:30:39 »

The problem is probably "receiving and sending each image". You should be sending a lot of images to GPU instead of 1 at a time. I have no idea how to implement that, but that is my guess for your problem :)
Offline Roquen
« Reply #5 - Posted 2014-02-11 15:37:39 »

In an ideal world you use multiple buffers and have the codec directly perform the color conversion into the target...massive reduction in memory motion.
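
A rough sketch of the "multiple buffers" part, written as a triple-buffer hand-off between a decoder thread and a paint thread. It assumes frames live in Java-owned int[] arrays; having the codec write into them directly (for example through JNI) is not shown. FrameBuffers and its method names are hypothetical.

```java
// Hypothetical triple buffer: the decoder owns `writing`, the painter owns
// `reading`, and `ready` is handed over under the lock.
final class FrameBuffers {
    private int[] writing;
    private int[] ready;
    private int[] reading;
    private boolean fresh; // true when `ready` holds a frame the painter has not seen

    FrameBuffers(int pixels) {
        writing = new int[pixels];
        ready = new int[pixels];
        reading = new int[pixels];
    }

    // Decoder thread only: the buffer that may be filled right now.
    int[] writeBuffer() {
        return writing;
    }

    // Decoder thread: publish the filled buffer and pick up a spare one.
    synchronized void publish() {
        int[] t = ready;
        ready = writing;
        writing = t;
        fresh = true;
    }

    // Paint thread: switch to the newest published frame, if there is one.
    synchronized int[] acquire() {
        if (fresh) {
            int[] t = ready;
            ready = reading;
            reading = t;
            fresh = false;
        }
        return reading;
    }
}
```

With three buffers the decoder never writes into the array the painter is currently reading, so neither side has to wait for the other beyond the brief swap.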
Offline EdTorbett

« Reply #6 - Posted 2014-02-11 15:38:52 »

I did previously state this, but perhaps I didn't make this clear enough in my original post:

It's blocking within the D3D blitting part of the java2D code.

I've already profiled the application (many times!) and here's a typical trace (using sampling as full instrumentation slows everything down to a massive crawl):



Switching off D3D does give a small performance improvement:



Switching on openGL stops the application working completely.
Offline EdTorbett

« Reply #7 - Posted 2014-02-11 15:42:21 »

In an ideal world you use multiple buffers and have the codec directly perform the color conversion into the target...massive reduction in memory motion.

Yes - I am in the process of porting the Java code directly to C++ and thinking about using the SDL libraries or similar to do the rendering directly in YUV420p encoding. However, I'm definitely not a C++ engineer, so it's taking a while and has lots of snags - the media decoding alone (let alone the rendering) is hugely complex and not easy to port. Plus it has to run on Windows, which is just plain horrible for C++ (though MinGW is quite nice).

Offline EdTorbett

« Reply #8 - Posted 2014-02-11 15:44:58 »

The problem is probably "receiving and sending each image". You should be sending a lot of images to GPU instead of 1 at a time. I have no idea how to implement that, but that is my guess for your problem :)

Unfortunately it's a video stream, and one of the main selling points is that it's capable of (almost) zero latency. While I can buffer frames, I'm never buffering more than about 10, and often there's no buffering at all.
Offline Roquen
« Reply #9 - Posted 2014-02-11 15:54:21 »

If you're completely nuts you could move the heavy lifting to the GPU.  At the far extreme, the CPU pretty much only does the entropy decoding on the video side.  A heck of a lot of work though...unless someone else has already done it (and supported codec(s) dependent).
Offline EdTorbett

« Reply #10 - Posted 2014-02-11 15:57:11 »

If you're completely nuts you could move the heavy lifting to the GPU.  At the far extreme, the CPU pretty much only does the entropy decoding on the video side.  A heck of a lot of work though...unless someone else has already done it (and supported codec(s) dependent).

Forgive me if I'm misinterpreting you here, but are you saying I re-implement all possible video codecs I could encounter to run entirely on the GPU?
Offline CommanderKeith
« Reply #11 - Posted 2014-02-11 15:57:31 »

Unfortunately the performance of java2D is not great when it comes to blitting rotated images. This might also be the case for scaled images too, I can't remember.
Use the java2d trace option and let us know what the output is. It's better than using the profiler.
http://www.oracle.com/technetwork/java/javase/java2d-142140.html#gcrus

Most people doing demanding rendering on this forum use OpenGL. OpenGL is faster but less reliable, since it depends on OpenGL drivers, which can be flaky.
When I use Java2D it's usually only for painting in a window or applet that is a fraction of the screen size, since that's the only way I can achieve a reasonable frame rate on slow computers.
Since you're blitting to 4 monitors I think you will have to switch to OpenGL to achieve reasonable performance. I don't think Java2D can be tweaked to achieve 60 fps on 4 monitors, with or without a video card, since many Java2D operations (such as image rotation) are not hardware accelerated.

Cheers,
Keith

Offline Roquen
« Reply #12 - Posted 2014-02-11 16:10:33 »

Forgive me if I'm misinterpreting you here, but are you saying I re-implement all possible video codecs I could encounter to run entirely on the GPU?
I have no idea what you're doing...I'm just tossing out wild duck ideas.  In the simplest case you could send the 3 color channel buffers to the GPU and perform color conversion there.  I'm not saying this is reasonable in your case.
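
To make the colour conversion step concrete, here is a CPU-side sketch of the per-pixel arithmetic that a GPU shader would otherwise perform on the three planes. It assumes planar 4:2:0 input and full-range BT.601 coefficients; the real stream may use limited range or a different matrix, so treat the constants as an assumption. YuvToRgb and its methods are hypothetical names.

```java
// Hypothetical reference conversion (assumes full-range BT.601, planar 4:2:0).
final class YuvToRgb {
    static int clamp(int v) {
        return v < 0 ? 0 : (v > 255 ? 255 : v);
    }

    // Convert one Y/U/V sample (each 0..255) to a packed 0RGB int.
    static int toRgb(int y, int u, int v) {
        double d = u - 128;
        double e = v - 128;
        int r = clamp((int) Math.round(y + 1.402 * e));
        int g = clamp((int) Math.round(y - 0.344136 * d - 0.714136 * e));
        int b = clamp((int) Math.round(y + 1.772 * d));
        return (r << 16) | (g << 8) | b;
    }

    // Convert a whole frame; each U and V sample covers a 2x2 block of luma samples.
    static void convert420(byte[] yPlane, byte[] uPlane, byte[] vPlane,
                           int width, int height, int[] out) {
        int chromaWidth = (width + 1) / 2;
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                int y = yPlane[row * width + col] & 0xff;
                int c = (row / 2) * chromaWidth + col / 2;
                out[row * width + col] = toRgb(y, uPlane[c] & 0xff, vPlane[c] & 0xff);
            }
        }
    }
}
```

On a GPU the same three multiply-adds would run once per fragment, with the planes bound as separate single-channel textures.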
Offline EdTorbett

« Reply #13 - Posted 2014-02-11 16:10:53 »

Output from adding -Dsun.java2d.trace=count:

19 calls to sun.java2d.d3d.D3DSwToTextureBlit::Blit(IntArgb, SrcNoEa, "D3D Texture")
36066 calls to sun.java2d.d3d.D3DMaskFill::MaskFill(AnyColor, SrcOver, "D3D Surface")
5004 calls to D3DFillRect
4255 calls to D3DDrawGlyphs
4503 calls to sun.java2d.d3d.D3DTextureToSurfaceBlit::Blit("D3D Texture", AnyAlpha, "D3D Surface")
31994 calls to sun.java2d.d3d.D3DMaskFill::MaskFill(OpaqueColor, SrcNoEa, "D3D Surface")
941 calls to sun.java2d.d3d.D3DSwToSurfaceScale::ScaledBlit(IntRgb, AnyAlpha, "D3D Surface")
942 calls to sun.java2d.d3d.D3DRTTSurfaceToSurfaceBlit::Blit("D3D Surface (render-to-texture)", AnyAlpha, "D3D Surface")
19 calls to sun.java2d.d3d.D3DSwToSurfaceBlit::Blit(IntArgb, AnyAlpha, "D3D Surface")
83743 total calls to 9 different primitives


With  -Dsun.java2d.d3d=false as well:

2718 calls to sun.java2d.loops.MaskFill::FillAAPgram(AnyColor, Src, IntRgb)
1361 calls to sun.java2d.windows.GDIBlitLoops::Blit(IntRgb, SrcNoEa, "GDI")
12420 calls to sun.java2d.loops.MaskBlit::MaskBlit(IntArgb, SrcOver, IntRgb)
22574 calls to sun.java2d.loops.MaskFill::MaskFill(AnyColor, SrcOver, IntRgb)
1359 calls to sun.java2d.loops.ScaledBlit::ScaledBlit(IntRgb, SrcNoEa, IntRgb)
15660 calls to sun.java2d.loops.MaskFill::FillAAPgram(AnyColor, SrcOver, IntRgb)
12420 calls to sun.java2d.loops.Blit$GeneralMaskBlit::Blit(IntArgb, SrcOver, IntRgb)
3 calls to sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
4083 calls to sun.java2d.loops.DrawGlyphListAA::DrawGlyphListAA(AnyColor, SrcNoEa, IntRgb)
72598 total calls to 9 different primitives


Hope this helps.
Offline EdTorbett

« Reply #14 - Posted 2014-02-11 16:15:44 »

I have no idea what you're doing...I'm just tossing out wild duck ideas.  In the simplest case you could send the 3 color channel buffers to the GPU and perform color conversion there.  I'm not saying this is reasonable in your case.

Appreciated. I thought I'd explained the situation quite clearly, but TL;DR:

I'm using ffmpeg to decode video, so I get all the benefits of hardware acceleration that it provides.
I get decoded RGB frame data from ffmpeg from each frame and put it into a Raster bound to a BufferedImage.
I render the BufferedImage.
Slowness intensifies.
Offline EdTorbett

« Reply #15 - Posted 2014-02-11 16:20:46 »

Oh, my previous traces had quite a lot of extra stuff in them (debug overlay text etc) that for simplicity's sake I've commented out. Here it is with literally just the image render:

With D3D enabled:

862 calls to sun.java2d.d3d.D3DSwToSurfaceScale::ScaledBlit(IntRgb, AnyAlpha, "D3D Surface")
16 calls to sun.java2d.d3d.D3DMaskFill::MaskFill(AnyColor, SrcOver, "D3D Surface")
863 calls to sun.java2d.d3d.D3DRTTSurfaceToSurfaceBlit::Blit("D3D Surface (render-to-texture)", AnyAlpha, "D3D Surface")
5 calls to D3DFillRect
1 call to D3DDrawGlyphs
1747 total calls to 5 different primitives

With D3D disabled:

627 calls to sun.java2d.loops.ScaledBlit::ScaledBlit(IntRgb, SrcNoEa, IntRgb)
629 calls to sun.java2d.windows.GDIBlitLoops::Blit(IntRgb, SrcNoEa, "GDI")
31 calls to sun.java2d.loops.MaskFill::MaskFill(AnyColor, SrcOver, IntRgb)
3 calls to sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
1290 total calls to 4 different primitives

Please note they weren't run for exactly the same amounts of time, so numbers may vary.
Offline jonjava
« Reply #16 - Posted 2014-02-11 17:20:58 »

Since you're blitting to 4 monitors I think you will have to switch to opengl

Which was done by Riven a while ago iirc:

http://www.java-gaming.org/topics/java-media-player/27100/view.html

Swing is swing. If the issue here seems to be the speed of java2d drawing and scaling an image - there may not be much to be done and the performance will vary a lot between systems.

Offline ctomni231

JGO Wizard


Medals: 99
Projects: 1
Exp: 7 years


Not a glitch. Just have a lil' pixelexia...


« Reply #17 - Posted 2014-02-11 19:29:28 »

Java has a hard time playing an animated GIF without frame reduction, let alone converting each frame into pixels and then displaying it. I think you'll definitely have to look into doing the heavy lifting via the GPU. When it comes to reliable frames, Java can't be trusted to run perfectly on all platforms.  :-\

Offline EdTorbett

« Reply #18 - Posted 2014-02-11 19:48:53 »

Java has a hard time playing an animated GIF without frame reduction, let alone converting each frame into pixels and then displaying it. I think you'll definitely have to look into doing the heavy lifting via the GPU. When it comes to reliable frames, Java can't be trusted to run perfectly on all platforms.  :-\
I'm doing the heavy lifting (the media decoding) in ffmpeg, which makes full use of SSSE3 CPU extensions for doing this kind of thing. All I need Java to do is to display each frame without consuming 100% CPU. Cross-platform isn't an issue; I will only be running the Java version on Windows as it's tied in to ffmpeg via JNI. I'm working on a C++ port which I'm sure will solve the CPU issue and be cross-platform, but that's some time away yet...
Offline EdTorbett

« Reply #19 - Posted 2014-02-11 20:01:57 »


Now that is a nice project.

I'll probably drop a message or two on there suggesting some improvements, as it would be possible to remove the MJPEG dependency altogether with the right output format.

I think I have some old code based on LWJGL lying around that can draw images to the screen and respond to mouse clicks that I'll probably refresh my memory on and make something of similar nature. The annoying part is that I'll have to re-implement all of the Swing components that I overlay onto the video in openGL (these aren't included in the benchmarking above, they were the first thing I suspected and removed when testing the performance), unless anyone can suggest otherwise?
Offline ctomni231

« Reply #20 - Posted 2014-02-11 20:11:25 »

Did you try Thread.sleep() to reduce the CPU time? There really is no other way to prevent the 100% CPU except by forcing the CPU to rest... (or using a RESTful system :P )

Offline EdTorbett

« Reply #21 - Posted 2014-02-11 20:16:46 »

Did you try Thread.sleep() to reduce the CPU time? There really is no other way to prevent the 100% CPU except by forcing the CPU to rest... (or using a RESTful system :P )
I use a combination of Object.wait()s, PriorityBlockingQueue()s and Thread.sleep()s as appropriate. The CPU usage isn't a constant 100% - it varies depending on work done and unfortunately, Java2D is doing more work than it needs to  :(

You also refer to RESTful - are you referring to the web service architecture? Because if so, I'm afraid I don't see the relevance. Even web frameworks gotta sleep (or wait, or select...)
Offline jonjava
« Reply #22 - Posted 2014-02-12 01:15:02 »


The annoying part is that I'll have to re-implement all of the Swing components that I overlay onto the video in openGL (these aren't included in the benchmarking above, they were the first thing I suspected and removed when testing the performance), unless anyone can suggest otherwise?

IIRC JOGL works with Swing to some extent, maybe that might help?

http://jogamp.org/jogl/doc/userguide/#overview
https://jogamp.org/wiki/index.php/Using_JOGL_in_AWT_SWT_and_Swing#JOGL_in_Swing

Offline CommanderKeith
« Reply #23 - Posted 2014-02-12 05:11:57 »

Oh, my previous traces had quite a lot of extra stuff in them (debug overlay text etc) that for simplicity's sake I've commented out. Here it is with literally just the image render:

With D3D enabled:

862 calls to sun.java2d.d3d.D3DSwToSurfaceScale::ScaledBlit(IntRgb, AnyAlpha, "D3D Surface")
16 calls to sun.java2d.d3d.D3DMaskFill::MaskFill(AnyColor, SrcOver, "D3D Surface")
863 calls to sun.java2d.d3d.D3DRTTSurfaceToSurfaceBlit::Blit("D3D Surface (render-to-texture)", AnyAlpha, "D3D Surface")
5 calls to D3DFillRect
1 call to D3DDrawGlyphs
1747 total calls to 5 different primitives

With D3D disabled:

627 calls to sun.java2d.loops.ScaledBlit::ScaledBlit(IntRgb, SrcNoEa, IntRgb)
629 calls to sun.java2d.windows.GDIBlitLoops::Blit(IntRgb, SrcNoEa, "GDI")
31 calls to sun.java2d.loops.MaskFill::MaskFill(AnyColor, SrcOver, IntRgb)
3 calls to sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
1290 total calls to 4 different primitives

Please note they weren't run for exactly the same amounts of time, so numbers may vary.

sun.java2d.loops operations are the software blitting, and in your output for D3D it looks like none of that is going on which is good.
But in your D3D output there is a lot of:
862 calls to sun.java2d.d3d.D3DSwToSurfaceScale::ScaledBlit(IntRgb, AnyAlpha, "D3D Surface")
"Sw" stands for software I believe, so this may be the slow operation that is bogging down the CPU and frame rate.
I am not an expert on java2D and most of my knowledge is just from experimenting with different things and some reading, but I have found that VolatileImages are a little faster than BufferedImages. So you could try switching to them.
Another thing, fullscreen exclusive mode with D3D on and triple buffering may result in much faster speeds since page flipping is likely to be enabled.
Another thing, there is a java2d class that is supposed to give details of graphics capabilities but it is buggy and won't actually tell you if page flipping is possible or not.
Unfortunately Java2D has not been worked on for some time and all of Oracle/Sun's energy has been put into JavaFX, though there is little to show for it.
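
For anyone who wants to try the VolatileImage idea, this is a minimal sketch of the usual validate/redraw loop, with the scaling done into the VolatileImage so the on-screen blit is unscaled. Whether it is actually faster here is untested, since every new video frame still has to be uploaded from system memory. VolatileScaler and render are hypothetical names.

```java
import java.awt.Graphics2D;
import java.awt.GraphicsConfiguration;
import java.awt.image.BufferedImage;
import java.awt.image.VolatileImage;

// Hypothetical helper that keeps one VolatileImage at the target size.
final class VolatileScaler {
    private VolatileImage scaled;

    // Draw `frame` scaled to w x h into the VolatileImage, recreating it when
    // the size changed or the surface is incompatible with `gc`.
    VolatileImage render(GraphicsConfiguration gc, BufferedImage frame, int w, int h) {
        do {
            if (scaled == null
                    || scaled.getWidth() != w
                    || scaled.getHeight() != h
                    || scaled.validate(gc) == VolatileImage.IMAGE_INCOMPATIBLE) {
                if (scaled != null) {
                    scaled.flush();
                }
                scaled = gc.createCompatibleVolatileImage(w, h);
            }
            Graphics2D g = scaled.createGraphics();
            try {
                g.drawImage(frame, 0, 0, w, h, null);
            } finally {
                g.dispose();
            }
        } while (scaled.contentsLost()); // surface was lost mid-draw: redo it
        return scaled;
    }
}
```

Inside paintComponent this would be used roughly as graphics.drawImage(scaler.render(getGraphicsConfiguration(), currentFrame, getWidth(), getHeight()), 0, 0, null), where scaler is a field holding one VolatileScaler per component.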

Offline EdTorbett

« Reply #24 - Posted 2014-02-12 10:20:16 »

I've replaced my earlier post containing Dropbox links to the profiling results with the actual images to increase visibility, as I assume people haven't noticed them.

Essentially, the problem isn't in any one command such as the surface scale/blit in d3d; it's in the mandatory buffer flush immediately following. Turning off d3d and using the software blitting is actually faster for displaying the image alone as it's not doing this extremely expensive operation, for whatever reason. Unfortunately the gains on image rendering are lost in the general reduction in performance with d3d switched off.
Offline CommanderKeith
« Reply #25 - Posted 2014-02-12 14:26:14 »

Java2d graphics pipelines are a part of the VM and are not configurable beyond the few vm args you already know about.

Quote
Each time a new frame is decoded, the data is copied into the raster and repaint(30) called (to trigger an asynchronous swing repaint)
Perhaps using active rendering might work better, but then again the difference may be negligible.

Why don't you try switching to OpenGL and see how it goes?
