Java-Gaming.org Hi !
Featured games (83)
games approved by the League of Dukes
Games in Showcase (513)
Games in Android Showcase (120)
games submitted by our members
Games in WIP (577)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1] 2
  ignore  |  Print  
  GC Bomb - CMS GC starves high priority system threads  (Read 19066 times)
0 Members and 1 Guest are viewing this topic.
Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Posted 2005-07-26 01:06:26 »

Not game related, but it might interest you just the same.

My company makes video encoding stuff.  We have our own capture cards and a fancy app that does video encoding and braocasting using various codecs.  The UI is written in Java. It runs on Windows and the same UI code provides a remote interface to our encoders on Linux and Mac.  My test were done on 1.5.0_01, since 1.5.0_03 or _04 will crash hard in jvm.dll if you let it run for 30 hours or so.

Recently while testing we found that the hardware was dropping frames of video because the software wasn't emptying the hardware's FIFO fast enough.  This happened when the Concurrent Mark Sweep collector did a full sweep.   There is a driver thread that runs at the highest windows priority (31) that is responsible for emptying the hardware FIFO.  That thread is running in the System process, quite separate from the UI application.

After much experimenting I found that the 'sweep' phase of the CMS collector would run for a bit over 200 ms.  During that time there was a lot of thread context switching and it managed to starve the high priority driver thread for long enough (100ms) that the hardware FIFO overflowed.

Incidentally, my code during a test does nothing but query some stats from another process and display them on a Swing UI.  It has one indeterminant progress bar and several JLables that update once a second.  As far as I know I generate no new objects that should live more than one second during this process. Most of the time the app is idle.  A polling thread wakes up every second and collects some stats via JNI.

I switched  to the openGL pipeline and the problem went away.  Checking the GC activity showed that with the default graphics pipeline on windows or with nodraw=true the CMS sweep was as I said 200ms+.  Simply switching to the OpenGL pipeline and changing nothing else in the application the time taken for the CMS sweep phase was reduced by a factor of 10, though it happened a bit more frequently.  Lowering the max heap size (to 20MB) also made the problem go away, by increasing the freqency and shortening the duration of the CMS sweep phase.

Because the OpenGL pipeline was not stable on all the hardware we needed to support, we searched for another solution and found it with the -XX:+CMSIncrementalMode option (used in addition to the -XX:+UseConcMarkSweepGC that we were already using).

What I find most disturbing about this is that the GC thread could possibly starve a thread in the system process that is set to the highest possible priority.  I have many questions:

What priority does the GC thread run at?
What about the finalizer thread?
Why do any objects accumulate in the tenured generation at all?  (Every object I make is small and lives for at most one second.  The young generation should easily deal with them.)
My thread does little more than while(true) { lable.setText("frame "+x); Thread.sleep(1000); }, though with a few more JLabels, a hidden JTable, and a progress bar animating.

Next I have to figure out why jvm.dll always crashes at the same spot after running for a couple days... though for now 1.5.0_01 will save me from that.

Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #1 - Posted 2005-07-26 09:07:48 »

I noticed that something in 1.5 starves the entire system making the mouse judder - unheard of in Windows XP before. But no-one saw fit to investigate the problem despite it being very severe. Was hoping Azeem or Jeff would pick it up...

Cas Smiley

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #2 - Posted 2005-07-26 14:26:04 »

That's very interesting Cas,  It just so happens that many years ago when I was programming a time-sensitive thing on Windows NT, I had a process that ran at the highest priority in a user-mode thread and I found that an animated mouse pointer would block my thread and totally hose my code.  The solution was to write a driver and run my code in an interrupt (it needed to do time-sensitive access to the serial port to talk to high end video controllers and respond to their requests in a timely manner).

So this lead me to believe that the code controlling the mouse pointer runs at an extremely high priority, likely part of an ISR, even when it is doing trivial things like making a stupid pink dinosaur walk.   The fact that the JVM might somehow block such high-priority code is very disturbing.

My first guess was that finalizers involved with freeing direct draw surfaces were making the video driver steal CPU in some high-priority driver code.  But I experienced the same problem with -Dsun.java2d.noddraw=true

Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #3 - Posted 2005-07-26 14:42:54 »

I should have also stated that I'm running this stuff on a dual processor machine.  As I understand it the CMS collector uses a single thread, my application thread is mostly blocked anyway, AWT/Swing will have a timer thread to animated the indeterminate progress bar, the driver thread will be emptying the hardware FIFO, and there will be an encoder process that is pushing the video frames through some codec (MPEG2 or WM9) on one or two threads.

During the encoding process the CPU usage is around 21%, plenty of head room - until the GC kicks in and kills everything.

Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #4 - Posted 2005-07-26 15:24:17 »

The interesting thing for me here is that this occurs during Eclipse development as well - which hasn't got a whiff of AWT in it, it's all SWT. But most telling of all is that no such juddering and jerkiness occurs whatsoever under 1.4.2.

The thing is, this is such an obvious big issue and so immediately, annoyingly noticeable, I'm not sure why no-one at Sun has noticed it and started to undo it?

Cas Smiley

Offline Raghar

Junior Duke




Ue ni taete 'ru hitomi ni kono mi wa dou utsuru


« Reply #5 - Posted 2005-07-26 20:51:28 »

I still wait for Cas detailed writing about messing with his system to get some idea what caused that problem.

What I listened about it, it looks like something in DirectX pipeline.
Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #6 - Posted 2005-07-27 10:17:00 »

I have no idea how to pinpoint the bit that's causing the lock but Scott's discovery that enabling the OpenGL pipeline is intriguing. I suspected DirectX but had up till now discounted it because the problem manifested in Eclipse which doesn't use AWT. Also, no-one seems to have reported the problem on Linux...

Cas Smiley

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #7 - Posted 2005-07-27 13:15:23 »

Ah.. but I think the OpenGL pipeline thing is a red herring.   I emailed Chris Campbell about that and he explained that the openGL pipeline simply generated less garbage due to the way it handed off more operations to OpenGL.  My problem occurs with even with -Dsun.java2d.noddraw=true which should mean that only GDI calls are used.

Chris wrote:
Quote
I think the reason you see more frequent/smaller GCs with the OGL pipeline is that it is able to perform many operations directly via OpenGL with few intermediate operations.  For example, if your application does a lot of bilinear image scaling, the OGL pipeline can handle this directly, whereas the DDraw or GDI pipelines can only handle this using our software loops.  Those software loops might require conversion from one image format to another, and therefore temporary intermediate images may be created frequently.  More intermediate images means increased burden on GC.

Note that this is just one example of a situation where your app might create more garbage with one pipeline versus another.  Have you used the -Dsun.java2d.trace flag?
http://java.sun.com/j2se/1.5.0/docs/guide/2d/flags.html#trace

I would suggest that you try running your app with this flag, first with OGL, then with the other pipelines, and then compare the logs.  If you see lots of primitives with "General" in their name, then that's a good indication that an operation is using intermediate surfaces, and possibly creating more garbage.

I think the sweep phase of the collection might still be blocking high-priority threads in an odd way, but simply for shorter times due to the characteristics of the garbage generated.  And of course even GDI calls go through the graphics driver in some form eventually - but I'm afraid I don't know enough about things at that level.

I also recently found this: http://java.sun.com/performance/reference/whitepapers/5.0_performance.html#2.10
which mentions:
Quote
The Concurrent Low Pause Garbage Collector, also known as CMS, has had several improvements for J2SE 5.0. Garbage collection pauses are shorter because of better use of parallelism during collection periods during which application threads are stopped.

This paper :http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.%20The%20Concurrent%20Low%20Pause%20Collector%7Coutline
was also quite helpful, but still leaves some unanswered questions.  Such as when do the finalizers run?  Is there any way to get profiling info about how much time is taken by the finalizer thread?

Here is an example of a collection that starves a priority 31 thread in the System process for 100ms or more:
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
1526.756: [GC 1526.756: [ParNew: 3968K->0K(4032K), 0.0159318 secs] 16703K->12797K(21260K), 0.0160274 secs]
1538.213: [GC 1538.214: [ParNew: 3968K->0K(4032K), 0.0105305 secs] 16765K->12860K(21260K), 0.0106204 secs]
1549.802: [GC 1549.802: [ParNew: 3968K->0K(4032K), 0.0151089 secs] 16828K->12923K(21260K), 0.0152276 secs]
1561.191: [GC 1561.191: [ParNew: 3968K->0K(4032K), 0.0156131 secs] 16891K->12984K(21260K), 0.0157176 secs]
1572.565: [GC 1572.565: [ParNew: 3968K->0K(4032K), 0.0154554 secs] 16952K->13046K(21260K), 0.0155757 secs]
1583.973: [GC 1583.973: [ParNew: 3968K->0K(4032K), 0.0122170 secs] 17014K->13107K(21260K), 0.0123115 secs]
1595.531: [GC 1595.531: [ParNew: 3968K->0K(4032K), 0.0162585 secs] 17075K->13168K(21260K), 0.0163687 secs]
1606.968: [GC 1606.968: [ParNew: 3968K->0K(4032K), 0.0158047 secs] 17136K->13229K(21260K), 0.0159176 secs]
1606.984: [GC [1 CMS-initial-mark: 13229K(17228K)] 13229K(21260K), 0.0008632 secs]
1606.985: [CMS-concurrent-mark-start]
1607.077: [CMS-concurrent-mark: 0.092/0.092 secs]
1607.077: [CMS-concurrent-preclean-start]
1607.077: [CMS-concurrent-preclean: 0.000/0.000 secs]
1607.092: [GC[YG occupancy: 86 K (4032 K)]1607.092: [Rescan (parallel) , 0.0009427 secs]1607.093: [weak refs processing, 0.0109421 secs] [1 CMS-remark: 13229K(17228K)] 13315K(21260K), 0.0161694 secs]
1607.123: [CMS-concurrent-sweep-start]
1607.308: [CMS-concurrent-sweep: 0.184/0.184 secs]
1607.308: [CMS-concurrent-reset-start]
1607.321: [CMS-concurrent-reset: 0.013/0.013 secs]
Encoding stopped on instance 0
1862.099: [GC 1862.099: [ParNew: 3968K->0K(4032K), 0.0171288 secs] 14714K->10755K(21944K), 0.0172192 secs]
2160.773: [GC 2160.773: [ParNew: 3968K->0K(4032K), 0.0169685 secs] 14723K->10756K(21944K), 0.0170589 secs]

The "encoding stopped" message if from my app realizing that we aborted a video capture because the capture hardware was forced to drop frames.
I'm not sure what the typical timeslice is on WinXP, but I would expect that in 184ms that a priority 31 thread in the system process would get a couple shots at the CPU.  It doesn't need to do much to setup a DMA operation to empty the FIFO of our hardware so it is quite surprising that the hardware FIFO overflowed.  The FIFO can buffer 100ms of video frames

Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #8 - Posted 2005-07-27 13:48:51 »

I've had eclipse juddering so much under 1.5 that the Windows event queue filled up and beeped at me when I've moved the mouse - gnarly.

What is at issue here is not thread priorities! The JVM process runs at Normal priority - I can see this in the task manager - and I think, but can't remember for sure, that none of the threads it can produce can interrupt the "ring 0" stuff like the mouse pointer, which is a realtime priority.

So what is actually happening, I believe, is that either new() or GC may be holding a kernel mode lock somewhere that has everything else waiting for it to finish. The actual throughput of the system is not changed particularly but responsiveness is through the floor as the JVM holds this lock preventing extremely high priority tasks from executing. I suspect it's some kernel mode call that's doing it because userland tasks are completely at the mercy of the kernel and don't normally have the ability to block the kernel unless they perform kernel operations. This is one reason why OpenGL drivers tend to be far stabler than D3D ones, because GL mostly lives in userland whereas DirectX is nastily mixed up into the kernel. Well, that's how it was last time I read up on it.

The GL pipeline is a total red herring - didn't have any effect on Eclipse, as I suspected - AWT doesn't have anything to do with it. However Eclipse generates a hell of a lot of garbage.

There is one other possibility is that JIT compilation is also holding on to this kernel mode lock too. Eclipse does an awful lot of recompiling and the problem is worse when I run Eclipse with the -server switch.

Cas Smiley

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #9 - Posted 2005-07-27 16:01:55 »

Well a process priority and the thread priorities within the process are slightly different things.  The thread priority of GC could be set high even though the base priority of the process is normal.  But I tend to agree with you that it is some interaction with the kernel that is causing this.  But what sort of system-wide lock would be held by the JVM?

I went so far as to wonder if it was the interaction with the kernel memory manager.  When getting more RAM for the java heap, or releasing RAM when it shrinks the heap there will be page tables and stuff that need to be updated in kernel mode.  But I just can't see there being a significant amount of kernel interaction caused by something like that.

Cas, have you tried  -XX:+UseConcMarkSweepGC with -XX:+CMSIncrementalMode to fix the issues you see with Eclipse?  (I have never seen similar issues when I run Eclipse on my XP box.)

Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #10 - Posted 2005-07-27 16:16:12 »

I too suspect it's something to do with a kernel mode memory lock of some sort. Those compiler switches just make eclipse bomb out before it finishes loading :/ And the mouse still jerks while it's loading.

Cas Smiley

Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #11 - Posted 2005-07-28 21:37:40 »

I'm absolutely certain it's GC now - I had an OOME today but as the heap gets completely filled and the GC goes into overdrive the system is almost brought to a standstill. I can't even get at the Task Manager, and the mouse judders like an old 2CV struggling up a hill.

Cas Smiley

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #12 - Posted 2005-07-28 23:50:52 »

Moved from the other thread...
Id be curious if its reproducible in 1.6  (mustang)


The GC activity is significantly different using the same options with Mustang b43.

check this out,  I've been trying to put together a test case...
Here it is running on 1.5.0_03
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
C:\DevJava\BugReports>"c:\Program Files\Java\jre1.5.0_03\bin\java" -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps  -cp . JVMCrash
0.000: [GC 0.000: [ParNew: 3967K->0K(4032K), 0.0200582 secs] 3967K->1066K(16320K), 0.0205862 secs]
10.541: [GC 10.541: [ParNew: 3968K->0K(4032K), 0.0104868 secs] 5034K->1395K(16320K), 0.0109198 secs]
32.334: [GC 32.335: [ParNew: 3968K->0K(4032K), 0.0151047 secs] 5363K->1522K(16320K), 0.0155206 secs]
54.148: [GC 54.149: [ParNew: 3968K->0K(4032K), 0.0140430 secs] 5490K->1663K(16320K), 0.0145384 secs]
75.734: [GC 75.735: [ParNew: 3968K->0K(4032K), 0.0071965 secs] 5631K->1823K(16320K), 0.0077193 secs]
97.746: [GC 97.747: [ParNew: 3968K->0K(4032K), 0.0114971 secs] 5791K->1940K(16320K), 0.0119415 secs]
119.838: [GC 119.839: [ParNew: 3968K->0K(4032K), 0.0141531 secs] 5908K->2056K(16320K), 0.0146701 secs]
141.901: [GC 141.901: [ParNew: 3968K->0K(4032K), 0.0155667 secs] 6024K->2171K(16320K), 0.0161009 secs]
163.354: [GC 163.355: [ParNew: 3968K->0K(4032K), 0.0158057 secs] 6139K->2378K(16320K), 0.0163267 secs]


Here is the exact same code running on Mustang b43:
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
C:\DevJava\BugReports>java  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps  -cp . JVMCrash
0.000: [GC 0.000: [ParNew: 3968K->0K(4032K), 0.0174371 secs] 3968K->953K(16320K), 0.0181594 secs]
28.058: [GC 28.058: [ParNew: 3968K->0K(4032K), 0.0151061 secs] 4921K->1110K(16320K), 0.0154193 secs]
69.326: [GC 69.327: [ParNew: 3968K->0K(4032K), 0.0135772 secs] 5078K->1113K(16320K), 0.0139347 secs]
110.623: [GC 110.624: [ParNew: 3968K->0K(4032K), 0.0148840 secs] 5081K->1114K(16320K), 0.0152321 secs]
151.814: [GC 151.814: [ParNew: 3968K->0K(4032K), 0.0131889 secs] 5082K->1115K(16320K), 0.0135931 secs]
193.205: [GC 193.206: [ParNew: 3968K->0K(4032K), 0.0138703 secs] 5083K->1117K(16320K), 0.0142879 secs]
234.612: [GC 234.613: [ParNew: 3968K->0K(4032K), 0.0150744 secs] 5085K->1118K(16320K), 0.0155725 secs]
273.818: [GC 273.819: [ParNew: 3968K->0K(4032K), 0.0131767 secs] 5086K->1120K(16320K), 0.0137323 secs]
312.053: [GC 312.054: [ParNew: 3968K->0K(4032K), 0.0141546 secs] 5088K->1122K(16320K), 0.0146340 secs]
350.398: [GC 350.398: [ParNew: 3968K->0K(4032K), 0.0151758 secs] 5090K->1123K(16320K), 0.0156526 secs]

Notice the frequency of the collections, and the rate at which the second set of numbers around the "->" grows.
This test generates no objects that should live longer than about one second.  At least not in the test code itself.  I think in the JRE code some long-lived objects are accumulatng... maybe some weak map or something?
1  

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #13 - Posted 2005-07-29 00:21:04 »

Jeff,

The example above only shows collections of the eden space.  They are not a problem in terms ofthe time they take.

The thing to note is the frequency of collections and the amount of garbage promoted to the next generation.  I may not be 100% accurate with my guesses as to what the output means but I think I have the general idea right.

The ###->### represents the before and after size of the objects *somewhere*.  For the first set of numbers this seems to correspond to the eden space - once it is consumed the surviving objects are moved out, so there is nothing left and the usage grows fromn 0 again.

The second set of #-># appears to be related to the next generation.  I think it might be called the 'survivor' space.. and here we can see it accumulating objects.

The main thing to point out is that taking the same code and running it on 5.0 and 6.0 you see a huge difference in the amount of objects that survive.  It a factor of 100 or more larger for the 1.5.0_03 VM compared to Mustang b43.

Because the longer-lived objects grow over hundred times faster in 1.5 this triggers bigger sweeps of the next generation.

This looks like:
1237.204: [GC 1237.204: [ParNew: 3968K->0K(4032K), 0.0165371 secs] 11910K->8058K(16320K), 0.0168901 secs]
1258.767: [GC 1258.767: [ParNew: 3968K->0K(4032K), 0.0171561 secs] 12026K->8170K(16320K), 0.0177815 secs]
1280.768: [GC 1280.768: [ParNew: 3968K->0K(4032K), 0.0165995 secs] 12138K->8285K(16320K), 0.0170013 secs]
1280.786: [GC [1 CMS-initial-mark: 8285K(12288K)] 8285K(16320K), 0.0004433 secs]

1280.786: [CMS-concurrent-mark-start]
1280.839: [CMS-concurrent-mark: 0.053/0.053 secs]
1280.840: [CMS-concurrent-preclean-start]
1280.840: [CMS-concurrent-preclean: 0.000/0.000 secs]
1280.845: [GC[YG occupancy: 83 K (4032 K)]1280.846: [Rescan (parallel) , 0.00053
82 secs]1280.847: [weak refs processing, 0.0123378 secs] [1 CMS-remark: 8285K(12
288K)] 8368K(16320K), 0.0182686 secs]
1280.878: [CMS-concurrent-sweep-start]
1281.052: [CMS-concurrent-sweep: 0.171/0.171 secs]
1281.052: [CMS-concurrent-reset-start]
1281.056: [CMS-concurrent-reset: 0.004/0.004 secs]
1302.538: [GC 1302.539: [ParNew: 3968K->0K(4032K), 0.0135049 secs] 8193K->4340K(16320K), 0.0139591 secs]
1324.754: [GC 1324.755: [ParNew: 3968K->0K(4032K), 0.0174210 secs] 8308K->4459K(16320K), 0.0183872 secs]


It is the CMS sweep phase that runs solid on all processors for 170-250 ms that causes very bad things to happen.

Guess what!  I found what is causing the >100x difference in long lived objects between Java 5 and Java 6. It is an indeterminate JProgressBar, running with the Win XP look and feel.  Apparently the Java 5 implementation has some serious performance issues.   I've taken various GC measurements as well as the output of -Dsun.java2d.trace=count and passed in on to Chris C of Java2D accelerated pipeline fame.  He was helping me track down some of the issues that I observed with the garbage and pausing that varied so widely based on the rendering pipeline used.

The big mysteries that remain:
1- Why does painting a silly progress bar generate ANY long-lived objects, since the Progress bar repaints several times a second, what could have references that last more than a fraction of a second? The ParNew collections happen several seconds apart, I suppose that whatever drawing surface happens to be 'current' must get promoted to the next stage, even though it becomes garbage an instant later.
2- Why does CSM-sweep lock out ultra-hihg priority threads in other processes, even if it does want all the CPU it can get.  It should be running at a "normal" priority.

Offline Jeff

JGO Coder




Got any cats?


« Reply #14 - Posted 2005-07-29 04:02:16 »

I MIGHT have a suggestion for #1.

Without looking at the code, this is just a wild guess but...

There are a few things that can force a short l;ived object to persist past the eden and into the next generation(s).

The chief among these is attaching anything to it that requires special GC handling.  This includes finalizers,
and all the Reference types (weak, soft, phantom).

Maybe there is some object  being rapidly created and disposed that, in teh 1.5 library, has one of these atached to
it?

JK

Got a question about Java and game programming?  Just new to the Java Game Development Community?  Try my FAQ.  Its likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #15 - Posted 2005-07-29 10:17:28 »

The actual time spent GC'ing isn't really the issue though, it's the fact that the whole system stops - completely - while it's happening, not just the VM!

Cas Smiley

Offline Raghar

Junior Duke




Ue ni taete 'ru hitomi ni kono mi wa dou utsuru


« Reply #16 - Posted 2005-07-29 18:28:47 »

Cas could you change Eclipse into metal look and feel? IIRC there is version without SWT.
Could you write an application that would generate the same shudder like Eclipse? How looks JRockit? How works Linux VM on the same computer? Try to set Eclipse to low priority. IIRC only things that are really high priority areA recompiler and GC threads, but it could change its behaviour somehow.
Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #17 - Posted 2005-07-30 11:32:31 »

Hey, this isn't just Eclipse, it's all stuff running under Java 5. Eclipse just generates plenty of garbage. I don't have much of a problem with my games, for example, because I generate very little garbage and mostly the gameplay hides these machine pauses anyway.

Cas Smiley

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #18 - Posted 2005-07-30 16:06:05 »

The chief among these is attaching anything to it that requires special GC handling.  This includes finalizers,
and all the Reference types (weak, soft, phantom).

Maybe there is some object  being rapidly created and disposed that, in teh 1.5 library, has one of these atached to
it?

Well it seems that the weak refs are what do it then, notice this message in the CMS phase " [weak refs processing, 0.0123378 secs]"

I had suspected weak refs might cause objects to linger around a bit, it makes sense really.

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #19 - Posted 2005-07-30 16:10:13 »

Armed with this knowledge of how to generate garbage that gets promoted to the older generation using weak references, perhaps now it won't be very hard to write a very simple test case to hand over to Sun so they can see the effect of the full-sweep GC on the rest of the system. 

This issue is very important to resolve for client-side Java.  I imagine Sun does a lot of their GC tests server-side with huge heaps where they just want the GC to go as fast as possible and they aren't paying any attention to how it affects other processes.

Offline trembovetski

Senior Duke




If only I knew what I'm talking about!


« Reply #20 - Posted 2005-08-01 01:13:48 »

The garbage that's being generated by the JProgressBar in windows l&f is likely
to be the image cache that components in win l&f use when rendering.

I'm not entirely sure how it works in swing but they try to cache some of the
pieces of the componets depending on its state. Think of it this way: the L&F
code asks windows to render a button to an offscreen image, and then uses that
image to display the button. This image is cached. There are several states of the button
(rolled over, disabled, etc), so there are multiple images per component. But when a
button is resized, some of those images become invalid and has to be thrown away..

Since the progress bar is constantly changing they're probaby generating and throwing away a
 lot of images.

Now those images' native resources are disposed of using reference queue which
is ran on a special Disposer thread (you can see it in the stack trace as Java2D Disposer).
This is done to avoid the use of finalizers and speed up the disposal.
Note that this disposer thread is a high priority thread, which may caused the starvation of
other threads if plenty of garbage is generated and the thread is busy getting rid of it.

[added] Also, if those images were accelerated (that is, cached in vram which they're likely
to be), then the disposal may involve DirectDraw/Direct3D, wihch means some global system-wide
locks may be taken by the DirectX, which could worsen the situation..

I've pointed swing folks at this thread, may be they can shed more light on why the jprogressbar's
caching scheme is so suboptimal in this case.
Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #21 - Posted 2005-08-01 10:08:00 »

OK, I've tried several microbenchmarks to hammer the GC, including SoftReferences, WeakReferences, finalizers, and even BufferedImages and calling getGraphics and failing to dispose of it.

In all cases the CPU shoots up to 100% as expected but the jerking does not occur. This means that it's not a trivial GC issue, it's the interaction of the GC with some other part of the system which is causing the problem. It might possibly be something to do with the JIT.

All this Swing stuff and AWT etc. are red herrings, as I say, Eclipse makes no use of any of it whatsoever.

Cas Smiley

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #22 - Posted 2005-08-01 20:56:32 »

Now those images' native resources are disposed of using reference queue which
is ran on a special Disposer thread (you can see it in the stack trace as Java2D Disposer).
This is done to avoid the use of finalizers and speed up the disposal.
Note that this disposer thread is a high priority thread, which may caused the starvation of
other threads if plenty of garbage is generated and the thread is busy getting rid of it.

[added] Also, if those images were accelerated (that is, cached in vram which they're likely
to be), then the disposal may involve DirectDraw/Direct3D, wihch means some global system-wide
locks may be taken by the DirectX, which could worsen the situation..

I think the thread priority of the Java2D Disposer needs to be reworked.  It appears to be having a catestrophic affect on the system.  I've selected the CMS collector to reduce pause times, but I'm guessing that during the sweep phase of the CMS collector the Java2D Disposer is signalled and very nasty stuff happens that bring down the system for 100+ms.

Why must he Java2D disposer run at a high priority?  Better yet, can I hack things to lower the priority? Hmmm.

Cas, I think your experiment might have failed because it did not create work for this Java2D disposer.

Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #23 - Posted 2005-08-02 09:26:11 »

I had hoped to - I created tons of RGB BufferedImages and got the graphics out of each one and rendered a black rectangle into it.

But it's still a red herring. SWT doesn't use the Java2D disposer, yet Eclipse is one of the worst sufferers of the jerking.

Cas Smiley

Offline trembovetski

Senior Duke




If only I knew what I'm talking about!


« Reply #24 - Posted 2005-08-02 17:13:15 »

Quote
Why must he Java2D disposer run at a high priority?  Better yet, can I hack things to lower the priority? Hmmm.

Because otherwise a thread which creates tons of images will starve the collector and we'll get OOE.
This is a bug the introduction of the Disposer had addressed.

If you can build the mustang workspace, you can easily experiment with the disposer.
The file is j2se/src/share/classes/sun/java2d/Disposer.java .
You don't need to rebulid the whole workspace, though: you can just hack it, compile the file
to some directory and prepend it to the bootclasspath.

Dmitri
Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #25 - Posted 2005-08-04 13:31:05 »

PROBLEM SOLVED

(Would appreciate if you could escalate this for me Trembo/Jeff)

I got suspicious about the whole thread priority issue so I looked at javaw.exe through PView on NT, and lo and behold, there were several time critical priority 15 threads running!!! DUH!!! I'm not sure what nincompoop decided that messing around with the thread priority mappings between JDK major releases was in any way wise at all.

Now it occurs to me the very same problem manifested itself to me when my games stopped responding to mouse and keyboard input on some systems. I used to run the main loop at Thread.MAX_PRIORITY but on Java 5.0 the main loop was mapped into time-critical priority - so DirectX never got to poll the bloody keyboard and mouse!

The fix is simple: remap the offending Java priorities to more reasonable NT ones:

-XX:JavaPriority10_To_OSPriority=10 -XX:JavaPriority9_To_OSPriority=9

Please patch this in the next minor update of 1.5.0 as it has such terrible and wide-ranging mysterious consequences.

Eclipse now runs better than it has ever, ever done - it feels utterly smooth and native now. I'm flabberghasted at the transformation from powerful-but-clunky-and-jerky IDE into butter-smooth ultraIDE.

Cas Smiley

Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #26 - Posted 2005-08-04 13:58:09 »

After a little research what I find interesting is that priority 15 is in a different priority class!

On Win32:

NORMAL is priority 8.

For a Win32 process the thread priorities can only be adjusted +/- 2 within a priority class.  So theoretically if java.exe is running as a normal priority application the max thread priority would usually be 10 anyway.  That would correspond to setting the thread to a relative priority of HIGHEST within a NORMAL priority class.

It seems though that the JRE is using the TIME_CRITICAL setting to rail the priority up to the highest possible for non-realtime processes (15).  (You need to have admin rights to go above 15 and run as a realtime process.)  Interestingly enough, java at priority 15 was able to starve our driver thread at priority 31... that either says something about the quality of the windows scheduler, or there are icky things happening with driver calls and/or system-wide locks in the java threads.

The first thing to suspect of course whenever an application causes stuff to starve/freeze is the graphics drivers.  They are notorious for being very unfriendly to the system.  We are having a hell of a time finding systems that don't lock up after a few hours.  Even with noddraw=true the graphics drivers will get stuck in an infinite loop occasionally.   What is quite cool is that I got a real error message from Windows the last time this happened.  It actually said in plain english (after a reboot) that the nVidia driver (nv_disp) was stuck in an infinite loop.

See http://emea.windowsitpro.com/Windows/Article/ArticleID/302/302.html


Offline Raghar

Junior Duke




Ue ni taete 'ru hitomi ni kono mi wa dou utsuru


« Reply #27 - Posted 2005-08-04 17:56:40 »

Congratulation Cas.

BTW how  Eclipse behaved when set into lower priority?
Offline princec

JGO Kernel


Medals: 407
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #28 - Posted 2005-08-04 20:18:15 »

Due to the craziness that is NTs priorities, if you set java.exe to, say, idle priority, you get threads at priority 2 and 4 etc. right the way up to 15 Smiley It just spreads them out more.

Cas Smiley

Offline trembovetski

Senior Duke




If only I knew what I'm talking about!


« Reply #29 - Posted 2005-08-04 21:17:19 »

Quote
(Would appreciate if you could escalate this for me Trembo/Jeff)

Please file a bug and include all this information..
I've asked vm folks to look at the thread, but a bug is always better.

Dmitri
Pages: [1] 2
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

Longarmx (52 views)
2014-10-17 03:59:02

Norakomi (43 views)
2014-10-16 15:22:06

Norakomi (33 views)
2014-10-16 15:20:20

lcass (38 views)
2014-10-15 16:18:58

TehJavaDev (68 views)
2014-10-14 00:39:48

TehJavaDev (68 views)
2014-10-14 00:35:47

TehJavaDev (60 views)
2014-10-14 00:32:37

BurntPizza (73 views)
2014-10-11 23:24:42

BurntPizza (45 views)
2014-10-11 23:10:45

BurntPizza (86 views)
2014-10-11 22:30:10
Understanding relations between setOrigin, setScale and setPosition in libGdx
by mbabuskov
2014-10-09 22:35:00

Definite guide to supporting multiple device resolutions on Android (2014)
by mbabuskov
2014-10-02 22:36:02

List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!