Massive internal JEmalloc/Nvidia driver memory leak?
Offline theagentd
« Posted 2016-01-04 11:51:35 »

I think I just found a massive bug in JEmalloc that is fairly complicated to trigger but reproduces 100% consistently for me.

I'm allocating "pages" of vertex attribute data each frame and deallocating them again at the end of each frame. I have this little test program comparing my old and new rendering systems, and if I start off at a specific one and switch to another one, memory usage quickly rises until I'm out of my 16 GB of RAM (takes around 15 seconds). The Java heap usage is constant, as nothing is being allocated after the rendering loop has started. Since the memory usage goes far above my 2 GB Java heap size limit, it must be JEmalloc.
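A minimal sketch of how the "heap is constant, so it must be native" check can be made from inside the program, using only the standard java.lang.management API. The class name JvmMemorySnapshot is hypothetical and not part of the test program. Note that these numbers cover only memory the JVM accounts for (heap, non-heap, NIO direct buffers); memory obtained through jemalloc or allocated by a driver is not included, so if all three stay flat while the process keeps growing, the growth is happening in native code.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper: snapshots the memory the JVM itself can account for.
final class JvmMemorySnapshot {
    final long heapUsed;          // bytes currently used on the Java heap
    final long nonHeapUsed;       // JVM-managed non-heap memory (metaspace, code cache, ...)
    final long directBufferBytes; // bytes held by NIO direct ByteBuffers

    private JvmMemorySnapshot(long heapUsed, long nonHeapUsed, long directBufferBytes) {
        this.heapUsed = heapUsed;
        this.nonHeapUsed = nonHeapUsed;
        this.directBufferBytes = directBufferBytes;
    }

    static JvmMemorySnapshot take() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long direct = 0;
        // The "direct" pool tracks ByteBuffer.allocateDirect(), not raw malloc calls.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                direct = pool.getMemoryUsed();
            }
        }
        return new JvmMemorySnapshot(
                memory.getHeapMemoryUsage().getUsed(),
                memory.getNonHeapMemoryUsage().getUsed(),
                direct);
    }

    @Override
    public String toString() {
        return "heap=" + heapUsed + " B, nonHeap=" + nonHeapUsed
                + " B, directBuffers=" + directBufferBytes + " B";
    }

    public static void main(String[] args) throws InterruptedException {
        // Print one snapshot per second; compare against the process size in the OS task manager.
        for (int i = 0; i < 5; i++) {
            System.out.println(take());
            Thread.sleep(1000);
        }
    }
}
```

Calling JvmMemorySnapshot.take() once per second from the render loop and comparing it with the process size shown by the OS should be enough to tell the two kinds of growth apart.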

I tried enabling the debug allocator which prints memory leaks on program exit. This is the result of a run where it runs until it crashes. Note that the OutOfMemoryError is thrown by my own code when je_malloc() returns 0.
Quote
Exception in thread "main" java.lang.OutOfMemoryError: failed to allocate memory
   at engine.util.gl.gfx.SmartByteBuffer.reserve(SmartByteBuffer.java:51)
   at engine.util.gl.gfx.Test2DShaderProgramAdvanced.render(Test2DShaderProgramAdvanced.java:32)
   at engine.test.GLUtilTest.main(GLUtilTest.java:220)
[LWJGL] 16 bytes leaked, thread 1 (main), address: 0x3E1010
   at org.lwjgl.system.libffi.Closure.<init>(Closure.java:132)
   at org.lwjgl.system.libffi.Closure$Void.<init>(Closure.java:285)
   at org.lwjgl.glfw.GLFWErrorCallback.<init>(GLFWErrorCallback.java:41)
   at engine.util.gl.glfw.WindowManager$1.<init>(WindowManager.java:31)
   at engine.util.gl.glfw.WindowManager.<init>(WindowManager.java:31)
   at engine.test.GLUtilTest.main(GLUtilTest.java:58)
[LWJGL] 16 bytes leaked, thread 1 (main), address: 0x3E1020
   at org.lwjgl.system.libffi.Closure.<init>(Closure.java:132)
   at org.lwjgl.system.libffi.Closure$Void.<init>(Closure.java:285)
   at org.lwjgl.glfw.GLFWWindowPosCallback.<init>(GLFWWindowPosCallback.java:35)
   at engine.util.gl.glfw.Window$1.<init>(Window.java:56)
   at engine.util.gl.glfw.Window.<init>(Window.java:56)
   at engine.util.gl.glfw.WindowManager.createWindow(WindowManager.java:186)
   at engine.util.gl.glfw.WindowManager.createWindow(WindowManager.java:154)
   at engine.test.GLUtilTest.main(GLUtilTest.java:60)
[LWJGL] 16 bytes leaked, thread 1 (main), address: 0x3E1030
   at org.lwjgl.system.libffi.Closure.<init>(Closure.java:132)
   at org.lwjgl.system.libffi.Closure$Void.<init>(Closure.java:285)
   at org.lwjgl.glfw.GLFWFramebufferSizeCallback.<init>(GLFWFramebufferSizeCallback.java:35)
   at engine.util.gl.glfw.Window$2.<init>(Window.java:74)
   at engine.util.gl.glfw.Window.<init>(Window.java:74)
   at engine.util.gl.glfw.WindowManager.createWindow(WindowManager.java:186)
   at engine.util.gl.glfw.WindowManager.createWindow(WindowManager.java:154)
   at engine.test.GLUtilTest.main(GLUtilTest.java:60)
[LWJGL] 16 bytes leaked, thread 1 (main), address: 0x3E1040
   at org.lwjgl.system.libffi.Closure.<init>(Closure.java:132)
   at org.lwjgl.system.libffi.Closure$Void.<init>(Closure.java:285)
   at org.lwjgl.opengl.GLDebugMessageARBCallback.<init>(GLDebugMessageARBCallback.java:33)
   at engine.util.gl.debug.DebugCallbackHandler.<init>(DebugCallbackHandler.java:9)
   at engine.test.GLUtilTest.main(GLUtilTest.java:63)
[LWJGL] 16 bytes leaked, thread 1 (main), address: 0x3E1050
   at org.lwjgl.system.libffi.Closure.<init>(Closure.java:132)
   at org.lwjgl.system.libffi.Closure$Void.<init>(Closure.java:285)
   at org.lwjgl.glfw.GLFWKeyCallback.<init>(GLFWKeyCallback.java:35)
   at engine.test.GLUtilTest$1.<init>(GLUtilTest.java:81)
   at engine.test.GLUtilTest.main(GLUtilTest.java:81)

In other words, the only memory I'm leaking is internal memory allocated by the GLFW callbacks (I didn't clean up my windows and callbacks properly since it died from an exception escaping main()).

 - I can only reproduce it when I start the program at a specific render test and switch to another one. Both use JEmalloc in the exact same way, and I only get leaks when using them in this specific order.
 - If I start on a different renderer I can't reproduce it.
 - If I start it on the first renderer and switch to the second one, then back to the first one again, the increase permanently stops even if I switch back to the one that triggered the leak before.
 - No leaks are reported by the debug allocator.

Will do more investigations and post the program so you guys can reproduce it soon.

Myomyomyo.
Offline theagentd
« Reply #1 - Posted 2016-01-04 12:42:08 »

I've eliminated all memory leaks (properly dispose window callbacks, error callbacks, etc). Memory usage is still rising.

New observations:

 - Memory usage slowly returns to normal after switching back to the first system.
 - After the leak starts, switching to good old glBegin()-glEnd() rendering makes the leak continue. This could be an Nvidia driver memory management bug. Could JEmalloc be interfering with it maybe?

Myomyomyo.
Offline SHC
« Reply #2 - Posted 2016-01-04 14:43:12 »

I have also seen the memory usage thing with LWJGL3, but I thought it might be something with my program. In my case, it went up 1 MB every 30-50 seconds, and once memory usage reached ~200 MB it dropped back to 2 or 3 MB, so it might be the garbage collector doing its work.

Offline Spasi
« Reply #3 - Posted 2016-01-04 14:57:32 »

Quote
Note that the OutOfMemoryError is thrown by my own code when je_malloc() returns 0.

Do you use jemalloc directly? The debug allocator only tracks memory allocated via MemoryUtil (memAlloc(), memFree(), etc). Another advantage of using MemoryUtil is that you can easily switch to a different allocator with the Configuration.MEMORY_ALLOCATOR option.
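To illustrate why direct allocator calls are invisible to a debug allocator, here is a toy sketch in plain Java. It is not LWJGL's implementation: TrackingAllocator and its methods are hypothetical names, and the "addresses" are just counters rather than real native memory. The idea is that the wrapper records a stack trace for every allocation that goes through it and reports whatever was never freed, so anything allocated by bypassing the wrapper never shows up in the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a leak-tracking allocator wrapper (hypothetical, not LWJGL code).
final class TrackingAllocator {
    // One record per live allocation: its size and where it was allocated.
    private static final class Allocation {
        final int size;
        final Throwable site;

        Allocation(int size, Throwable site) {
            this.size = size;
            this.site = site;
        }
    }

    private final Map<Long, Allocation> live = new LinkedHashMap<>();
    private long nextAddress = 0x1000; // fake addresses; no native memory is touched

    long alloc(int size) {
        long address = nextAddress;
        nextAddress += Math.max(size, 1);
        // Capturing a Throwable records the caller's stack trace for the leak report.
        live.put(address, new Allocation(size, new Throwable("allocation site")));
        return address;
    }

    void free(long address) {
        if (live.remove(address) == null) {
            throw new IllegalStateException("free of unknown address 0x" + Long.toHexString(address));
        }
    }

    // Prints every allocation that was never freed and returns how many there were.
    int reportLeaks() {
        for (Map.Entry<Long, Allocation> entry : live.entrySet()) {
            System.err.println(entry.getValue().size + " bytes leaked, address: 0x"
                    + Long.toHexString(entry.getKey()).toUpperCase());
            entry.getValue().site.printStackTrace();
        }
        return live.size();
    }

    public static void main(String[] args) {
        TrackingAllocator allocator = new TrackingAllocator();
        long a = allocator.alloc(16);
        allocator.alloc(64); // never freed, so it appears in the report
        allocator.free(a);
        System.out.println("leaks: " + allocator.reportLeaks());
    }
}
```

Memory obtained from a different source (in this thread, calling jemalloc directly or anything the driver allocates internally) never enters the map, which is consistent with a leak report that stays empty while the process keeps growing.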
Offline Spasi
« Reply #4 - Posted 2016-01-04 15:41:52 »

Could you please try this jemalloc build? It includes this fix, which sounds relevant to what you're describing.
Offline theagentd
« Reply #5 - Posted 2016-01-04 16:52:29 »

Ah, snap. I am using JEmalloc directly. Will try MemoryUtil.

Sadly there was a change of plans and I won't have access to a computer for a few days. I feel a bit bad for crying wolf and then ditching you when you come to help... ._.

@SHC: In my case I'm seeing ~1GB of memory allocated off-heap per second, with the Java heap staying constant.

Myomyomyo.
Offline theagentd
« Reply #6 - Posted 2016-01-05 16:44:18 »

Double change of plans! I had a chance to test it today.

I replaced all JEmalloc calls with MemoryUtil. I see the memory allocated each frame totalling around 20 MB at worst, but nothing else. If I switch to glBegin()-glEnd() I get no reported memory leaks at all, while memory usage goes up to 14 GB in a few seconds and then it crashes.


EDIT: The updated version of JEmalloc changes nothing. The memory seems to be allocated by the Nvidia driver.

EDIT2: hs_err log when out-of-memory process crash occurred during glBegin()-glEnd(). Seems like the compiler thread of Java crashed the program: http://www.java-gaming.org/?action=pastebin&id=1400

Myomyomyo.
Offline KaiHH

« Reply #7 - Posted 2016-01-05 17:15:18 »

Stupid question: Do you actually call glEnd() or more importantly swap buffers?
I experienced something like this a while ago: When I did not swap buffers (but did proper glBegin/glEnd) then the driver would buffer up / delay rendering commands until it crashed. I do not know whether that was due to going out of memory, though.
Offline theagentd
« Reply #8 - Posted 2016-01-05 17:25:31 »

Quote
Stupid question: Do you actually call glEnd() or more importantly swap buffers?
I experienced something like this a while ago: When I did not swap buffers (but did proper glBegin/glEnd) then the driver would buffer up / delay rendering commands until it crashed. I do not know whether that was due to going out of memory, though.

No, it happens completely without glBegin()-glEnd() as well, but thanks for asking. Everything is rendering properly, but while memory is leaking CPU performance is inhibited.

 - VRAM usage is constant.
 - No new buffers are created after the first 6 frames, and they're permanently mapped as persistent VBOs, so no memory leaking there (confirmed with my leakage detector as well).
 - JIT Compiler thread does not seem to do anything wrong when the crash occurs. It simply always triggers the out of memory crash for some reason.

EDIT:
 - Has a tendency to permanently break Aero on Windows...... -___-

This is most likely an Nvidia driver bug. I'm betting it's some kind of heuristic going AWOL.

Myomyomyo.
Offline Spasi
« Reply #9 - Posted 2016-01-05 18:06:57 »

Did you try Configuration.MEMORY_ALLOCATOR.set("system") to make sure this has nothing to do with jemalloc?
Offline basil_

« Reply #10 - Posted 2016-01-05 18:08:44 »

does it happen if ...

you use only host-memory ? no mapped buffers at all, just passing the host-pointers.

i think you're right. to me, it sounds like the driver is reallocating/moving/optimizing the shit out of the VBO's, which is pretty much open to everything.

btw, did you check ARBDebugOutput ? at least for me it does tell about the optimisations (most time).
Offline thedanisaur

« Reply #11 - Posted 2016-01-05 18:16:55 »

Have you tried rolling back the driver yet?

Every village needs an idiot
Offline theagentd
« Reply #12 - Posted 2016-01-06 11:51:25 »

Quote
Did you try Configuration.MEMORY_ALLOCATOR.set("system") to make sure this has nothing to do with jemalloc?

Confirmed, the leak still happens with this set as the first line in main().

@basil_
I actually had those debug messages disabled since they're so annoying, but I reenabled them. The only thing I see when I create and map my buffers is:
Quote
[LWJGL] ARB_debug_output message
   ID: 131185
   Source: API
   Type: OTHER
   Severity: DEBUG
   Message: Buffer detailed info: Buffer object 6 (bound to GL_UNIFORM_BUFFER_EXT, usage hint is GL_DYNAMIC_DRAW) will use SYSTEM HEAP memory as the source for buffer object operations.
   Stack trace:
java.lang.Exception: Stack trace
   at java.lang.Thread.dumpStack(Thread.java:1329)
   at engine.util.gl.debug.DebugCallbackHandler.invoke(DebugCallbackHandler.java:87)
   at org.lwjgl.opengl.GLDebugMessageARBCallback.callback(GLDebugMessageARBCallback.java:43)
   at org.lwjgl.system.JNI.callIPPIP(Native Method)
   at org.lwjgl.opengl.GL30.nglMapBufferRange(GL30.java:1751)
   at org.lwjgl.opengl.GL30.glMapBufferRange(GL30.java:1778)
   at engine.util.gl.buffer.simple.PersistentMappedBuffer.setCapacity(PersistentMappedBuffer.java:68)
   at engine.util.gl.buffer.simple.PersistentMappedBuffer.mapUnsafe(PersistentMappedBuffer.java:49)
   at engine.util.gl.gfx.GLGraphics.uploadFrameData(GLGraphics.java:115)
   at engine.test.GLUtilTest.main(GLUtilTest.java:228)
[LWJGL] ARB_debug_output message
   ID: 131185
   Source: API
   Type: OTHER
   Severity: DEBUG
   Message: Buffer detailed info: Buffer object 6 (bound to GL_UNIFORM_BUFFER_EXT, usage hint is GL_DYNAMIC_DRAW) has been mapped WRITE_ONLY in SYSTEM HEAP memory (fast).
   Stack trace:
java.lang.Exception: Stack trace
   at java.lang.Thread.dumpStack(Thread.java:1329)
   at engine.util.gl.debug.DebugCallbackHandler.invoke(DebugCallbackHandler.java:87)
   at org.lwjgl.opengl.GLDebugMessageARBCallback.callback(GLDebugMessageARBCallback.java:43)
   at org.lwjgl.system.JNI.callIPPIP(Native Method)
   at org.lwjgl.opengl.GL30.nglMapBufferRange(GL30.java:1751)
   at org.lwjgl.opengl.GL30.glMapBufferRange(GL30.java:1778)
   at engine.util.gl.buffer.simple.PersistentMappedBuffer.setCapacity(PersistentMappedBuffer.java:68)
   at engine.util.gl.buffer.simple.PersistentMappedBuffer.mapUnsafe(PersistentMappedBuffer.java:49)
   at engine.util.gl.gfx.GLGraphics.uploadFrameData(GLGraphics.java:115)
   at engine.test.GLUtilTest.main(GLUtilTest.java:228)

Nothing unexpected is printed when the memory leak starts and nothing at all is printed when the leak ends.


@thedanisaur
I have tried updating to the latest Nvidia drivers, but the problem remains (I had the previous version). I'll try to downgrade when I get the time.

Myomyomyo.
Offline Icecore
« Reply #13 - Posted 2016-01-06 14:22:34 »

Um...
(2 days)

Any source code? ^^
Quote
@SHC: In my case I'm seeing ~1GB of memory allocated off-heap per second, with the Java heap staying constant.

Or at least a saved heap dump (or analyze the heap yourself with any Java profiler and look for strange numbers).

Try to localize the bug to 1-2 "clean" source files for profiling and reproduction.

P.S. The stack trace can't help here: it shows the thread's last position, not the call that causes the bug (memory leak).

P.P.S. Sorry, I read the topic quickly and may have missed something. Without source code it's like guessing what I see outside my window right now.

Last known State: Reassembled in Cyberspace
End Transmission....
..
.
Journey began Now)
Offline thedanisaur

« Reply #14 - Posted 2016-01-06 17:26:14 »

Yeah the rollback is going to be the important one in seeing if it's a driver bug, assuming you updated and then the bug appeared (and they didn't notice on the latest release).

Edit: I really don't think this is a driver bug. If you don't call genBuffers() every frame, there's no way that memory usage should increase due to mapping existing buffers. Once they're mapped the graphics card doesn't care what happens in RAM, so it wouldn't hold onto it. It's far more likely that something is being missed with jemalloc, or rather how it's being used.

Every village needs an idiot
Offline theagentd
« Reply #15 - Posted 2016-01-08 22:12:30 »

Sorry for the long response time. Thanks for your responses.

@thedanisaur
The thing is that once the leak is triggered, memory usage continues to rise even after switching to simple glBegin()-glEnd() rendering. In that case memory usage increases to around 14 GB, at which point it stops, i.e. the driver "gracefully" handles the fact that it cannot allocate more memory. The next time the Java compiler thread tries to compile something, its memory allocation will fail and crash the process. I have confirmed that I do not use JEmalloc directly at any point in the program, so it can't be JEmalloc, and since the debug allocator isn't saying I'm leaking, it's probably not me. It's not OpenGL buffers, since I have code that scans buffers with glIsBuffer() and also queries the buffer size to calculate the total amount of memory allocated for buffers, which remains constant. Since the bug is maintained by simply calling glBegin()-glVertex2f()-glEnd() a few thousand times per frame followed by swapping buffers and handling GLFW events, the logical conclusion is that the driver is doing something.

I'm using the debugger in Eclipse to step through this loop one iteration at a time:
    for(int i = 0; i < numPoints; i++){
       Vector2f v = points[i];
       glBegin(GL_POINTS);
       glVertex2f(v.x, v.y);
       glEnd();
    }

And every 250 iterations or so the memory usage goes up by roughly 64 KB, it seems. Here's an outline:

>Start program at setting 0.
>Switch to other renderer that writes a lot of uniform buffer data each frame. Bug is now activated.
>At this point, every single OpenGL call seems to leak memory no matter which renderer is used besides 0.
>Switching back to 0 causes the memory leak to permanently stop until the program is restarted, and memory usage slowly falls to the original value (dropping by ~100 MB/s until stabilizing at ~110 MB).
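As a rough cross-check of the "~64 KB per 250 iterations" observation, the arithmetic can be written out. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, 64 KB is taken as 64 * 1024 bytes, and the ~1 GB/s growth rate mentioned earlier in the thread was reported for a different renderer, so the result only indicates an order of magnitude. With those assumptions the leak works out to roughly 262 bytes per glBegin()-glEnd() iteration, and about 4 million such iterations per second would be needed to grow at 1 GiB/s.

```java
// Back-of-the-envelope leak-rate estimate (hypothetical helper, illustrative numbers).
final class LeakRateEstimate {
    // Average bytes leaked per call, given a growth step observed every N calls.
    static double bytesPerCall(long bytesPerStep, long callsPerStep) {
        return (double) bytesPerStep / callsPerStep;
    }

    // Calls per second needed to sustain a given growth rate at that leak size.
    static double callsPerSecond(double bytesPerSecond, double bytesPerCall) {
        return bytesPerSecond / bytesPerCall;
    }

    public static void main(String[] args) {
        double perCall = bytesPerCall(64L * 1024, 250); // ~64 KB every ~250 iterations
        double calls = callsPerSecond(1L << 30, perCall); // growth of 1 GiB per second
        System.out.printf("%.1f bytes per call, %.0f calls per second%n", perCall, calls);
    }
}
```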



 - I've closed all programs that I believe could interfere, still happens.
 - Once again confirmed my memory usage is constant and that buffers are being reused each frame.
 - Threaded Optimization does not matter.
 - Unsynchronized or persistently mapped buffers make no difference.
 - While the bug is in effect FPS is reduced. When the bug is permanently ended by switching back to 0 FPS rises by around 40% for all renderers.

@Icecore and everyone else
I really don't know where to proceed from here on. Is there any way to trace which library/.dll-file is allocating all this memory to confirm that it's the driver? I'm not sure a driver rollback makes sense as this happened before I updated to the latest driver too (so confirmed for current and previous drivers). I am however gonna throw together an executable and go run it on my Intel GPU PC.




EDIT: CONFIRMED! DOES NOT HAPPEN ON AN INTEL GPU! Memory usage remained at a beautifully constant 65 MB even when I tried to trigger the bug. Also, Intel GPUs have a much better uniform buffer offset alignment of 16 instead of Nvidia and AMD's 256, which means a lot less wasted memory on padding.
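To make the padding point concrete, here is a small sketch of how a uniform block's start offset gets rounded up to the required alignment. The helper names and the 80-byte block size are hypothetical; in a real program the alignment would come from querying GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, and the helper assumes it is a power of two, as with the values 16 and 256 mentioned above.

```java
// Hypothetical helper showing how much memory offset alignment can waste per uniform block.
final class UniformAlignment {
    // Rounds size up to the next multiple of a power-of-two alignment.
    static int alignUp(int size, int alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a positive power of two");
        }
        return (size + alignment - 1) & -alignment;
    }

    // Bytes of padding added after a block of the given size.
    static int padding(int size, int alignment) {
        return alignUp(size, alignment) - size;
    }

    public static void main(String[] args) {
        int blockSize = 80; // illustrative uniform block size in bytes
        for (int alignment : new int[] {16, 256}) {
            System.out.println("alignment " + alignment + ": stride "
                    + alignUp(blockSize, alignment) + " bytes, padding "
                    + padding(blockSize, alignment) + " bytes");
        }
    }
}
```

For a block that small, an alignment of 256 means most of each slot is padding, while an alignment of 16 wastes nothing in this example.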

Myomyomyo.
Offline Icecore
« Reply #16 - Posted 2016-01-09 01:12:29 »

Quote
I really don't know where to proceed from here on.
- Create the smallest source file you can that reproduces it
- Post it and wait to see whether other people get the same result
- Send the source to the developers of the library causing the bug (LWJGL?) and wait for a response
- Send the source to the GPU driver vendor

Quote
Is there any way to trace which library/.dll-file is allocating all this memory to confirm that it's the driver?
You can get the Java library source code and trace it in Eclipse; for a DLL you can use Visual Studio
(to reverse the library source).

Quote
And every 250 iterations or so the memory usage goes up by ~64 kbs

To trace this, take measurements at different places in the code:
http://stackoverflow.com/questions/1058991/how-to-monitor-java-memory-usage

Also add, before each frame,
System.gc();
and wait(1) to give the VM time to clear memory.

P.S.
Quote
Switch to other renderer
I have no idea why you do this ^^
This looks like a bug in itself (multiple render contexts look unstable to me).

P.P.S. I think I understand what is happening: when you switch render context
and write data to the first context, the data is written to a buffer (which waits until you switch back to send it to the GPU).
Because you are not switching back, the buffer grows.

If I'm right, the problem may be earlier,
when creating the second render context or switching to it.
An error may occur during this process that you forget to catch,
and using the context afterwards produces unusual behavior
(or maybe it's a mistake in the OpenGL wrapper library, which should catch the error).

Last known State: Reassembled in Cyberspace
End Transmission....
..
.
Journey began Now)
Offline HeroesGraveDev

« Reply #17 - Posted 2016-01-09 03:13:46 »

Quote
@Icecore and everyone else
I really don't know where to proceed from here on. Is there any way to trace which library/.dll-file is allocating all this memory to confirm that it's the driver?

On linux there's valgrind for debugging these sorts of problems, but assuming you still haven't got a linux box/VM up and running (also the bug might be a platform-specific thing anyway), that's not exactly helpful.

A quick search on the internet turns up https://github.com/dynamorio/drmemory, which at least has the memory-leak checking aspect.
