JAVA IS SLOW!  (Read 10881 times)
rreyelts « Reply #30 - Posted 2004-01-14 13:01:27 »

Quote
Aha, now if the problem is scaling geometrically in Java and not in C, it points to something else that's scaling wrongly, rather than your direct port. Could it be GC?


I'm not exactly sure how the performance changes with the size of the data set (not enough data set sizes to test on yet), but I do know that it sure as heck isn't linear. It's possible that it could be GC. I really should just take the time to run the profiler again.

Quote
Ask yourself this though before losing more hair - would you rather have an implementation with guaranteed safe GC and null pointer detection that runs slower, or a super-fast but entirely unsafe version of the algorithm?


That's like asking me if I wouldn't prefer bubble sort over a merge-quicksort because a bubble sort is easier to write. Maybe if I were only dealing with 100 objects. I think we would have written it in assembly if we could have gotten the same performance improvements. (I don't believe rewriting the app in assembly would really make a discernible difference, but the point holds.)

In general, yes, we prefer Java over C++. C++ just doesn't scale well over more than a few senior engineers who are well-versed in the language.

Quote
As this is a serverside task your doing IIRC, you're probably better off trading performance for reliability and investing your money in faster processors for the server. Programmer time is very expensive indeed compared to CPU upgrades!


First, it would cost every single one of our customers more money for faster servers, which outweighs the one-time software engineering cost. Second, you won't find processors much faster than 2.4 GHz. Third, the difference in memory speed is even smaller. Fourth, none of it makes any difference anyway, because small linear hardware speedups don't make up for geometric software slowdowns. The Java version doesn't perform fast enough on even the smallest data sets.

Quote
This is, of course, no real excuse for the algorithm being so slow in Java. What parameters are you running your benchmark with?


It's not really a benchmark anymore - it's a small application (that depends heavily upon the Fibonacci heap). Depending upon the data set size, we run it with the max heap set anywhere from 512M to 1.7G, with -server set. We tried using different min heap sizes, but that doesn't really affect anything. We tried using the incremental garbage collector for kicks, but that gave us performance that was, urmmm... garbage.

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Mark Thornton « Reply #31 - Posted 2004-01-14 18:19:07 »

Quote

Telling me to implement a d-heap is just skirting the issue and is utterly pointless as that data structure doesn't give me the algorithmic performance characteristics I need.

Runtime O() values aren't always helpful if the constant factors are sufficiently different. What you need is an algorithm that performs well in practice. Given princec's benchmark example, with the string creation taken out of the loop, I get 4.5s on my machine for your code. A completely straightforward leftist tree, written in 30 minutes from a textbook, takes 2.4s for the same test (and gives the same results). The leftist tree code, including the benchmark, comes to 121 lines compared with 503 for the Fibonacci heap. I have done no performance tuning on it at all.

The fib heap should give an amortised O(1) time for insert vs O(log n) for the leftist tree. Both have O(log n) for deleteMin. The fib heap also has O(1) for decreaseKey.
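For reference, a minimal leftist tree looks roughly like this (a from-memory sketch, not the exact version I benchmarked; the class and field names are my own):

final class LeftistTree {
    private static final class Node {
        final long key;
        int rank;            // null-path length
        Node left, right;
        Node(long key) { this.key = key; this.rank = 1; }
    }

    private Node root;

    // Merge two heaps; the root with the smaller key wins.
    private static Node merge(Node a, Node b) {
        if (a == null) return b;
        if (b == null) return a;
        if (b.key < a.key) { Node t = a; a = b; b = t; }
        a.right = merge(a.right, b);
        // Leftist invariant: the left child's rank is never smaller.
        if (a.left == null || a.left.rank < a.right.rank) {
            Node t = a.left; a.left = a.right; a.right = t;
        }
        a.rank = (a.right == null ? 0 : a.right.rank) + 1;
        return a;
    }

    void insert(long key) { root = merge(root, new Node(key)); } // O(log n)

    long deleteMin() {                                           // O(log n)
        long min = root.key;                 // assumes a non-empty heap
        root = merge(root.left, root.right);
        return min;
    }

    boolean isEmpty() { return root == null; }
}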

The next thing to note is that the benchmark is not representative of any practical use of such trees. In path finding, the maximum size of the tree is often about sqrt(n), where n is the total number of nodes inserted during the algorithm. The decreaseKey operation, in my experience, doesn't happen often enough to spend much time optimising it. My usual problem is computing minimum cost paths in road networks containing up to 2.5 million nodes and 3.3 million links. My latest Java code for this is an order of magnitude faster than the C++ code which preceded it. Of course I changed the algorithm, but this was assisted by the ease of writing in Java, and the algorithm actually makes use of the behaviour of the garbage collector. It would be difficult to achieve the performance without a garbage collector (or an unlimited memory budget)!

So there you have it; C++ is REALLY REALLY SLOW!!! :-)
Mark Thornton « Reply #32 - Posted 2004-01-14 18:34:28 »

Incidentally, the size of the node object is quite a bit smaller for the leftist tree than for the Fibonacci heap. Running with the -server option, the times are 1.6s for the leftist tree vs about 3s for the fib heap. However, the timing for the fib heap was much more variable than that for the leftist tree, possibly because of the extra data being allocated (more GC operations).

I also put a loop around princec's benchmark and ran it quite a few times to make sure the JIT was properly warmed up.

The following is an adequate description of a leftist tree.

http://www.dgp.toronto.edu/people/JamesStewart/378notes/10leftist/
rreyelts « Reply #33 - Posted 2004-01-14 19:00:26 »

Quote

Runtime O() values aren't always helpful if the constant factors are sufficiently different.


No, really? *rolls eyes* I guess that's why we use mergequickinsertsort instead of mergesort. *sigh*

Quote
The fib heap should give an amortised O(1) time for insert vs O(log n) for the leftist tree. Both have O(log n) for deleteMin. The fib heap also has O(1) for decreaseKey.


Yes indeed - those are the performance characteristics for a fib heap.

Quote
Next thing to note is that the benchmark is not representative of any practical use of such trees.


Yes, and the "application" which I mentioned in my last post is quite representative of a practical use of such trees. It is the application which we are benchmarking now, and not just the fibonacci heap.

Quote
The decreaseKey operation in my experience doesn't happen often enough to spend much time optimising.


I agree that decreaseKeys are less important for _our particular application_, but that doesn't negate the fact that they _are faster_ and that binary heaps have O(log n) costs to pay on inserts.

In general, whether decreaseKey is important for you depends upon exactly which algorithm you are implementing. Standard Dijkstra's (regardless of the type of queue involved) requires an inordinate number of decreaseKeys, which was precisely one of the reasons the Fibonacci heap was invented.

Quote
My latest Java code for this is an order of magnitude faster than the C++ code which preceded. Of course I changed the algorithm, but this was assisted by the ease of writing in Java, and the algorithm actually makes use of the behaviour of the garbage collector. It would be difficult to achieve the performance without a garbage collector (or an unlimited memory budget)!


I'd love to know what you mean by "taking advantage" of the garbage collector. I can guarantee you that I can write a C++ application that spends fewer CPU cycles in memory management than your Java application. In fact, these kinds of applications just scream aloud for manual memory management.

Quote
So there you have it; C++ is REALLY REALLY SLOW!!! :-)


Now you're just plain being silly, Mark. What you're telling me is to use a slower data structure because current Java VMs aren't capable of handling a faster one. Yes, that certainly encourages me to stick with Java.

Look folks, I'm not trolling here. I just want fast-running Java code, and it's not happening. Mark managed to eke out a constructive comment, which was that he finds the current VMs aren't good with Fibonacci heaps (for whatever reason that may be). So I'll try out a 2-heap with the Java version.

Thanks Mark.

God bless,
-Toby Reyelts

Mark Thornton « Reply #34 - Posted 2004-01-14 19:28:25 »

Quote

Standard Dijkstra's (regardless of the type of queue involved) requires an inordinate number of decreaseKeys, which was precisely one of the reasons the Fibonacci heap was invented.

It depends on the data --- on my data the number of decreaseKeys is small relative to the number of inserts.

Quote

I'd love to know what you mean by "taking advantage" of the garbage collector.

Many of the paths are very short, thus only a small part of the total graph need be scanned. However, you still have to initialise all the nodes (or at least appear to). I use special labels which can be distinguished as belonging to a previous computation. I end up with an unknown number of references to these and just leave it to the garbage collector to dispose of them when they are no longer relevant. This is something that a good garbage collector can do very efficiently. The simplest solution in C++ would be to tolerate a slow leak, or to do reference counting, which is nasty and not all that fast.
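In outline it is something like this (names invented for illustration; the real code is more involved):

final class Label {
    final int run;     // which search created this label
    double cost;
    Label(int run, double cost) { this.run = run; this.cost = cost; }
}

final class Vertex {
    Label label;       // may still reference a label from an earlier search
}

final class Search {
    private int currentRun;

    void begin() { currentRun++; }   // 'initialises' every vertex in O(1)

    double costOf(Vertex v) {
        Label l = v.label;
        // A label left over from a previous run means 'unreached'. Overwriting
        // the reference in setCost makes the stale object unreachable, and the
        // GC reclaims it with no explicit per-vertex reset pass.
        return (l == null || l.run != currentRun) ? Double.POSITIVE_INFINITY : l.cost;
    }

    void setCost(Vertex v, double cost) {
        Label l = v.label;
        if (l == null || l.run != currentRun) v.label = new Label(currentRun, cost);
        else l.cost = cost;
    }
}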

Quote

Now you're just plain being silly, Mark. What you're trying to tell me is to use a slower data structure, because current Java VMs aren't capable of handling a faster data structure. Yes, that certainly encourages me to stick with Java.

Well, there may be a better way to implement a fib heap in Java, but the complexity of the algorithm makes it more difficult to determine where the performance problem really lies. You might want to try using a red/black tree for comparison as well. Sometimes it is easier to determine the best implementation if you have a selection of competing alternatives --- they may offer a hint as to what is going wrong with the one that ought to be best according to theory.

Best of luck,
Mark
crystalsquid « Reply #35 - Posted 2004-01-14 19:42:30 »

As a C++ programmer who has recently moved to Java, there are two aspects of your code that I would happily use in C++ but would avoid like the plague in Java (perhaps wrongly, but...), and they are:

assert() - lovely function, but with no preprocessor it's still gonna give you a hit at runtime.

Inlining - or the complete lack of it. I suspect the asserts and all your lovely accessor functions would not be removed, and if I recall correctly, they incur what amounts to a virtual function overhead (jump to pointer from table) that is gonna screw up the branch prediction on the CPU.

Now in optimised C++, aggressive inlining would make all of these go away, and I would expect a 4x speed-up over the non-inlined version.

That array allocation also scares me a bit. Are you sure clearing a static array is really slower than allocating a new one?!? How about arraycopy'ing a zeroed array over the static one - rumour has it that arraycopy is nice and fast.
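Something like this is what I have in mind (untested; the class and SIZE are placeholders, just to show the three options):

import java.util.Arrays;

class ClearingOptions {
    static final int SIZE = 1 << 16;
    static final int[] scratch = new int[SIZE];   // the reused 'static' array
    static final int[] zeros = new int[SIZE];     // never written to, stays zero

    static int[] allocateFresh() {
        return new int[SIZE];                     // the VM hands back zeroed memory
    }

    static void clearWithFill() {
        Arrays.fill(scratch, 0);                  // explicit clear in place
    }

    static void clearWithArraycopy() {
        System.arraycopy(zeros, 0, scratch, 0, SIZE); // bulk-copy zeroes over it
    }
}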

Java dudes - feel free to shoot my paranoid fantasies down in flames :)

- Dom
princec « Reply #36 - Posted 2004-01-14 20:18:49 »

The server VM completely optimises asserts away, and performs excellent inlining of code - better than C++, from what I can tell. Even the virtual function calls get eliminated, ISTR.

The client VM is nowhere near as hot :(

Array allocation is extremely cheap in Java, nearly the same speed as stack allocation in C++. The GC is similarly very fast, as the allocated arrays never make it out of Eden.

Have you tried -XX:MaxInlineSize=128 (or better) with -server?
(And use -XX:CompileThreshold=1000 to shorten the time it takes to get benchmarking results)
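For example, something like this (heap size and class name are placeholders):

java -server -Xmx512m -XX:MaxInlineSize=128 -XX:CompileThreshold=1000 FibHeapBenchmark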

Cas :)

rreyelts « Reply #37 - Posted 2004-01-14 20:20:44 »

Quote

It depends on the data --- on my data the number of decreaseKeys is small relative to the number of inserts.

Yes, that is true too. Dense networks make for lots of decreaseKeys.

Quote
Many of the paths are very short, thus only a small part of the total graph need be scanned. However, you still have to initialise all the nodes (or at least appear to). I use special labels which can be distinguished as belonging to a previous computation. I end up with an unknown number of references to these and just leave it to the garbage collector to dispose of them when they are no longer relevant. This is something that a good garbage collector can do very efficiently. The simplest solution in C++ would be to tolerate a slow leak, or to do reference counting, which is nasty and not all that fast.


I don't scan the entire graph either. The only dynamic allocation I perform per route is a small constant (about 100 allocations of perhaps a few KB total). I allocate all other references from pools and bulk-release them when I'm done. Allocation and deallocation are almost exactly free.

Quote
Well, there may be a better way to implement a fib heap in Java, but the complexity of the algorithm makes it more difficult to determine where the performance problem really lies.


Part of the benefit of implementing the fib heap in C++ is that the allocation of the fib heap nodes is no longer a cost. Unfortunately, pooling/recycling large numbers of small objects in Java typically slows an application down, whereas in C++ it can be used to remove the cost of allocation and deallocation almost entirely.

God bless,
-Toby Reyelts

Jeff « Reply #38 - Posted 2004-01-14 21:10:03 »

Yep, all of the above is true :)

Don't have much to add except to say the last two posters have it 100% correct :)

Got a question about Java and game programming? Just new to the Java Game Development Community? Try my FAQ. It's likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Mark Thornton « Reply #39 - Posted 2004-01-15 05:51:10 »

Quote
int largestDegree = ( int ) Math.floor( Math.log( size ) / LOG_PHI );


I don't suppose there is any performance problem in this bit? I haven't tested it; it is just one of the items that caught my eye as worth checking.
Mark Thornton « Reply #40 - Posted 2004-01-15 05:55:48 »

Quote

I allocate all other references from pools and bulk-release them when I'm done. Allocation and deallocation are almost exactly free.

I wrote two versions of my code, one recycling node objects via a pool (quite easy to do) and the other just leaving it to the GC. The performance gain from using the pool was unmeasurable (smaller than the variation in run time) and came at the expense of extra lines of code and reduced clarity.
This is one problem with the benchmark: the nodes live long enough to be copied out of Eden, whereas in real applications they don't last that long.

Mark Thornton « Reply #41 - Posted 2004-01-15 07:26:22 »

My copy of "Introduction to Algorithms" (CLR) has this to say about Fibonacci heaps after noting their use in the asymptotically fastest algorithms for paths on dense graphs.

Quote
From a practical point of view, however, the constant factors and programming complexity of Fibonacci heaps make them less desirable than ordinary binary (or k-ary) heaps for most applications. Thus Fibonacci heaps are predominantly of theoretical interest.

Mark Thornton « Reply #42 - Posted 2004-01-15 17:05:47 »

Quote

int largestDegree = ( int ) Math.floor( Math.log( size ) / LOG_PHI );

Changing the division to a multiplication by the precalculated reciprocal saves about 3%. Replacing the calculation with a really crude integer approximation saves around 10%. The algorithm works for any value at least as large as the one calculated as shown, so the constant 28 will work for all heap sizes up to 1,000,000.
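In code, the three variants are roughly as follows (the percentages are from my runs above; LOG_PHI is as in the original, and the method names are my own):

static final double LOG_PHI = Math.log((1.0 + Math.sqrt(5.0)) / 2.0);
static final double INV_LOG_PHI = 1.0 / LOG_PHI;   // precalculated reciprocal

// 1. Original: a division in the hot path.
static int degreeDivide(int size) {
    return (int) Math.floor(Math.log(size) / LOG_PHI);
}

// 2. Multiply by the precalculated reciprocal (about 3% saved).
static int degreeMultiply(int size) {
    return (int) Math.floor(Math.log(size) * INV_LOG_PHI);
}

// 3. Crude constant upper bound (about 10% saved): any value at least as
// large as the exact answer works, and 28 covers all sizes up to 1,000,000.
static int degreeConstant(int size) {
    return 28;
}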
Jeff « Reply #43 - Posted 2004-01-15 19:58:18 »

Toby,

I'd challenge you to write an app in C++ that has a large number of short-lived objects and does better with manual memory management than HotSpot.

Eden allocation consists of a single pointer increment. Eden usage is tracked with some very, very clever data structures. If Eden contains only short-lived objects whose lives are over at the time of collection, deallocation of ALL of them is again one pointer change.

Mark's point, which is VERY legitimate, is that by pooling your objects you keep them alive. By keeping them alive you DEFEAT Eden and end up creating a lot MORE work for the GC in moving them out of Eden and into long-lived storage.

HotSpot GC is optimized for short-lived objects because
(1) if you write good object-oriented code then you WILL create a LOT of these short-lived objects, and
(2) long-lived objects are, by definition, an issue much less often just because of their life cycle.

Could you equal HotSpot's allocation? Sure, by writing it all over again - it's all in C after all. But it's very tricky, very highly evolved C code.

This is part of the problem with modern VMs. They are much more sophisticated than people realize, and thus people tend to imagine they are working in very different ways than they really are. This can lead you to bad conclusions if you are not very careful.

Oh, and Mark's dead on, though terse, about Order, if I understand him. Order is a scalability measure. By definition, the larger N is, the more the order OF N matters. The smaller the N, the more your constants affect the equation.

gregorypierce « Reply #44 - Posted 2004-01-15 20:24:59 »

Quote
Oh, and Mark's dead on, though terse, about Order, if I understand him. Order is a scalability measure. By definition, the larger N is, the more the order OF N matters. The smaller the N, the more your constants affect the equation.


Then why doesn't Sun create a performance white paper, post it on java.net, and put an end to the insanity once and for all? People complained that the PS2 was slow until Sony showed them how to make it fast. If people are making the wrong assumptions, then an expert (i.e. Sun) needs to tell them which assumptions are valid, which are invalid, and which are the worst-case things. The hardware manufacturers do it over and over again every generation, as do the video card manufacturers. I haven't seen a modern performance white paper come out of Sun in almost two generations of the JVM.

You can't blame people for doing things wrong if they've never been told the RIGHT way to do things. You can't expect people to go and figure it out for themselves. If the optimal path is not obvious, expect people to do something not optimal.

That's just my 0.02.

http://www.gregorypierce.com

She builds, she builds oh man
When she links, she links I go crazy
Cause she looks like good code but she's really a hack
I think I'll run upstairs and grab a snack!
rreyelts « Reply #45 - Posted 2004-01-15 20:47:25 »

Quote
Toby,
I'd challenge you to write an app in C++ that has a large number of short-lived objects and does better with manual memory management than HotSpot.

Eden usage is tracked with some very, very clever data structures. If Eden contains only short-lived objects whose lives are over at the time of collection, deallocation of ALL of them is again one pointer change.


Yep. I'm pretty much doing that right now. It costs a pointer increment to allocate and costs me nothing to deallocate. Of course, I don't have to spend cycles tracking the lifetimes of my objects, because I just know what they are. I can also guarantee that this optimization occurs where I need it to occur, whereas HotSpot can't seem to figure it out. Maybe my objects are too long-lived compared to Mark's?

Quote
Mark's point, which is VERY legitimate, is that by pooling your objects you keep them alive.


I don't pool them in the Java code, but I do in my C++ code. That was my point. The objects aren't allocated in a single function and then thrown away, like normal stack objects. They are pretty much around for the duration of the graph traversal - because they have to be. HotSpot probably isn't smart enough to figure out how to optimize that allocation - though I can't say for sure, because I haven't profiled the Java version of the app yet. I just know it runs _really_ slowly compared to the C++ version.

Quote
Could you equal HotSpot's allocation? Sure, by writing it all over again - it's all in C after all. But it's very tricky, very highly evolved C code.


Well, I wrote the "very tricky, very highly evolved C++ code" - It's about seventy lines including comments and took me about half an hour.

Quote
This is part of the problem with modern VMs. They are much more sophisticated than people realize, and thus people tend to imagine they are working in very different ways than they really are. This can lead you to bad conclusions if you are not very careful.


You're totally misunderstanding what's been said. Read my comments again. I specifically state that I'm not using pooling in Java code, because pooling works against the virtual machine (in the case of large numbers of small objects). I explicitly state that I use it in C++, because I can do it for free.

Quote
Oh, and Mark's dead on, though terse, about Order, if I understand him. Order is a scalability measure. By definition, the larger N is, the more the order OF N matters. The smaller the N, the more your constants affect the equation.


Again - you aren't listening. Read my posts again. What do you think I'm saying when I talk about the difference between mergesort and quickmergeinsertsort? Believe me, when working with nearly 100 million objects, I'm quite aware of the importance of both constant factors and algorithmic complexity.

I don't have the time to say anything more right now, but I'll get to the rest (Mark's posts included) later.

God bless,
-Toby Reyelts

Jeff « Reply #46 - Posted 2004-01-15 21:03:14 »

I think we all missed the point that you weren't pooling in Java. Apologies there.

As for the "trickey" code, sure it becomes a  lot easier if
(a) its sepcific to one data structure
and
(b) you never really have to worry abotu garbage.

The point about HotSpot is that it gets the same kind of performance transparently on ALL short-lived objects, even ones that aren't numerous enough to make a pool make any sense, AND it still allows for automated garbage collection.

I would say this: by the time you start building C pools, you are doing C optimization. This legitimizes any and all optimization of the Java code to Java's quirks in return.

And in the end, I've always maintained that *over-all* today's Java and C++ show equivalent execution performance for equivalently (not identically) optimized code. This includes choosing the proper algorithms for the platforms.

There are some tasks and algorithms that Java is naturally more suited to and will kick C++'s butt at (e.g. recursive stuff). There are some kinds of algorithms that C is naturally better at (e.g. complex access into arrays) and will always come out ahead on.

Over-all, though, on complex problems Java today gives about the same results for a lot less work. I think that's quite an accomplishment.

So in conclusion, your problem is that you asked the wrong question. The right question is not "why the heck isn't Java as fast as C on this algorithm designed for C" but instead "can someone produce the same results to the same or better degree of accuracy in the same time in Java?"

The second sounds like an interesting coding challenge for someone here with free cycles :)

rreyelts « Reply #47 - Posted 2004-01-15 21:06:54 »

Oh, and though I don't have the time, I just feel compelled to do some bitching about the overhead of object headers. In C++ I can instantiate 70 million objects just fine. Try that in Java, and you'll hear the whooshing sound of 560M of pure object header overhead (eight bytes apiece) flying by your head. No doubt the objects will also be aligned, so you can be sure to hear hundreds more megabytes of space go whooshing by in the form of padding.

What's the _only_ technique to combat this? Allocate everything as freaking arrays:

float node_x[]; // x values
float node_y[]; // y values

int[] link_to_node; // index of to node
int[] link_from_node; // index of from node

int[] node_link_indices; // indices of the links for individual nodes

and so on...

Now you have to use some annoying flyweight pattern all throughout your code. Plus you'll be suffering array access penalties just about everywhere. *sigh*
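The flyweight ends up looking something like this (illustrative names only, not our actual code):

final class NodeView {
    private final float[] x, y;   // the parallel arrays above
    private int index;            // which node this view currently points at

    NodeView(float[] x, float[] y) { this.x = x; this.y = y; }

    NodeView at(int i) { index = i; return this; }  // re-point the view, no allocation
    float x() { return x[index]; }
    float y() { return y[index]; }
    void setX(float v) { x[index] = v; }
    void setY(float v) { y[index] = v; }
}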

The C++ code is about 10x more readable than the Java code. It's rare that you'll hear me make a statement like that.

On the other hand, I can happily say that I'm enjoying the new JNI direct byte buffer facilities in JDK1.4. That was a stroke of pure brilliance.

God bless,
-Toby Reyelts

Jeff « Reply #48 - Posted 2004-01-15 21:09:01 »

I hear Shawn and Cas saying "Structures!" right about now....

rreyelts « Reply #49 - Posted 2004-01-15 21:31:35 »

Argghhh... must respond....

Quote
As for the "trickey" code, sure it becomes a  lot easier if
(a) its sepcific to one data structure
and
(b) you never really have to worry abotu garbage.


It's actually for two data structures and would work for any others. (Welcome to the marvels of templates!)

Quote
The point about HotSpot is that it gets the same kind of performance transparently on ALL short-lived objects, even ones that aren't numerous enough to make a pool make any sense, AND it still allows for automated garbage collection.


On the other hand, stack allocation is trivial in C++, and in Java you just have to hope the VM is smart enough to handle it - or just plain squirm when it doesn't.

Quote
I would say this: by the time you start building C pools, you are doing C optimization. This legitimizes any and all optimization of the Java code to Java's quirks in return.


Absolutely. I started this post asking what could be done to make the _Java_ Fibonacci heap fast.

Quote
And in the end, I've always maintained that *over-all* today's Java and C++ show equivalent execution performance for equivalently (not identically) optimized code. This includes choosing the proper algorithms for the platforms.


Sometimes you can't "choose" your algorithms, because the fastest known algorithms only have one basic implementation.

Quote
So in conclusion, your problem is that you asked the wrong question. The right question is not "why the heck isn't Java as fast as C on this algorithm designed for C" but instead "can someone produce the same results to the same or better degree of accuracy in the same time in Java?"


That Fibonacci heap was not designed for C. I wrote it in Java _first_ and then _ported_ it to C++. The question was, "how can you make this faster - at all?"

Now the question is really, "why is the entire application slower by orders of magnitude" (the application, by the way, was also first written in Java, then ported to C++), "and what can be done to possibly make it faster?"

Since I can't just post thousands of lines of code (and people probably wouldn't like that anyway), I'm going to have to resort to posting profiler results.

God bless,
-Toby Reyelts

blahblahblahh « Reply #50 - Posted 2004-01-15 21:34:09 »

Quote

Then why doesn't Sun create a performance white paper, post it on java.net, and put an end to the insanity once and for all?


Indeed; this is the same reason I've often requested someone (anyone, although Sun staff working on the VM would likely be a lot better at it than anyone else) do a regularly updated series on this (it was one of my suggestions for articles way back when...).

"regular" needs to be at least once per major Java version, and usually once per minor version too (lots of cases of performance "bugs" being fixed in minor version updates...)

malloc will be first against the wall when the revolution comes...
blahblahblahh « Reply #51 - Posted 2004-01-15 21:43:24 »

Quote
Oh, and though I don't have the time, I just feel compelled to do some bitching about the overhead of object headers. In C++ I can instantiate 70 million objects just fine. Try that in Java, and you'll hear the whooshing sound of 560M of pure object header overhead flying by your head.


I. Feel. Your. Pain.

I did an academic project on evolving AI for computer games via Genetic Programming (which is like genetic algorithms, except that it evolves parse trees; this makes objects a much more natural data structure than the packed arrays that are trivial for GAs) - and I made the twin mistakes of using Java and objects.

I had over 384MB RAM (this was about 4 years ago; maybe I had more), I stripped down the OS to about 40MB total virtual memory, and still had to allocate many hundreds of MB of additional swap space just to get a half-way decent evolution... Sigh. 50,000 parse trees of arbitrary complexity implemented with one object per node is not a clever idea in Java :), especially if you are iterating over all of them several times in a row; Java + OS also showed no signs of predictive paging on the iterations, which hurt too.

Jeff « Reply #52 - Posted 2004-01-15 21:47:17 »

Quote
Absolutely. I started this post asking what could be done to make the _Java_ Fibonacci heap fast.


Which is not QUITE the right question. I'm not entirely sure if the point here is a Fibonacci generator, which there are many approaches to, or a heap manager. You'll please excuse me, as I don't have the cycles to dig in and figure the code out myself right now.

If the point is a heap allocator, then my question is "why?" at all, when Java does your memory management for you. If the point is a Fibonacci series as output, then the question is how to get that output as fast as this approach is in C, not how to make this particular approach fast in Java. Subtle but important difference.

Quote

Sometimes you can't "choose" your algorithms, because the fastest known algorithms only have one basic implementation.


I'd beg to differ, in that you can always choose your data structures. However, I will also grant that there are certain micro-tasks that C, by nature of its insecurity, will be faster, if scarier, at. There is a reason why we still distribute a C MPEG decoder with JMF. Optimal C in that case is about 10% faster than optimal Java.

Quote

That Fibonacci heap was not designed for C. I wrote it in Java _first_ and then _ported_ it to C++. The question was, "how can you make this faster - at all?"


Then why does it only have a pool in C? I'm confused again, I'm afraid.

Quote

Now the question is really, "why is the entire application slower by orders of magnitude" (the application, by the way, was also first written in Java, then ported to C++), "and what can be done to possibly make it faster?"


Ah, a good question. What are you using this for?
I'm assuming you already know it's a hot spot, so what is its purpose in your code? It may be that a totally different solution to the problem is warranted.

rreyelts « Reply #53 - Posted 2004-01-15 23:04:55 »

Quote
Which is not QUITE the right question.


No, it is the right question. I'm writing a component that calculates shortest paths. If you read the literature, you'll see that a Fibonacci heap is a type of priority queue which changes the algorithmic complexity of Dijkstra's (the only known algorithm that is guaranteed to find the shortest path in a graph) from O( v^2 ) to O( e + v log v ). This is the fastest known algorithm for finding the shortest paths in a generic network. There are other implementations of Dijkstra's (aside from the classic one) that can be used based on the sparseness/denseness of the network, maximum path limits, and other specifics (i.e. a radix-heap based Dijkstra's, Dial's, etc.). We've investigated all of these and have our algorithm tweaked to work best with our particular problem and data set. The priority queue is critical, as the bulk of a path-finding algorithm's time is spent accessing the priority queue.
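To make the queue's role concrete, the generic shape is something like this (the interface and graph types here are invented for illustration; this is not our production code):

interface MinQueue {
    void insert(int v, double key);        // fib heap: O(1) amortised
    int deleteMin();                       // fib heap: O(log v) amortised
    void decreaseKey(int v, double key);   // fib heap: O(1) amortised
    boolean isEmpty();
}

final class Edge {
    final int to;
    final double weight;
    Edge(int to, double weight) { this.to = to; this.weight = weight; }
}

final class Dijkstra {
    // adj[u] lists the edges leaving u; returns the distance to every vertex.
    static double[] shortestPaths(Edge[][] adj, int source, MinQueue q) {
        double[] dist = new double[adj.length];
        java.util.Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0.0;
        q.insert(source, 0.0);
        while (!q.isEmpty()) {
            int u = q.deleteMin();                 // at most v deleteMins: O(v log v)
            for (int i = 0; i < adj[u].length; i++) {
                Edge e = adj[u][i];                // each edge relaxed once: O(e) total
                double d = dist[u] + e.weight;
                if (d < dist[e.to]) {
                    if (dist[e.to] == Double.POSITIVE_INFINITY) q.insert(e.to, d);
                    else q.decreaseKey(e.to, d);   // O(1) amortised with a fib heap,
                    dist[e.to] = d;                // O(log v) with a binary heap
                }
            }
        }
        return dist;
    }
}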

Quote
I'd beg to differ, in that you can always choose your data structures.


Not if you want the ones that are the fastest. Need O( 1 ) insert and O( 1 ) lookup on a sparse dataset? What choices do you have?

Quote
Then why does it only have a pool in C? I'm confused again, I'm afraid.


Because in the port, I changed it from being "optimal in Java" to "optimal in C++". While the Fibonacci heap was almost exactly a line-for-line port, the pathing component had a few minor changes. The addition of pooling and the use of real objects instead of groups of arrays were the largest changes I made.

Quote
Ah, a good question. What are you using this for? I'm assuming you already know it's a hot spot, so what is its purpose in your code? It may be that a totally different solution to the problem is warranted.


I think I've already answered this.

God bless,
-Toby Reyelts

swpalmer « Reply #54 - Posted 2004-01-16 02:17:30 »

Quote
However, I will also grant that there are certain micro-tasks that C, by nature of its insecurity, will be faster, if scarier, at. There is a reason why we still distribute a C MPEG decoder with JMF. Optimal C in that case is about 10% faster than optimal Java.


This is a beef of mine. C/C++ is still the absolute wrong way to do an MPEG (or JPEG) decoder; it will be orders of magnitude slower than it should be. It is just a REALLY, REALLY bad choice.

These are the exact things that MMX/SSE2 - SIMD in general - are in the processor for.

Using generic C codecs may get you a degree of portability, but the cost is HUGE on the performance side.

I suggested to some Sun guys many months ago that they use Intel's (once free) JPEG library. I happen to have an app that really needs good JPEG-decoding performance, and due to bugs in ImageIO that don't allow me to decode successive JPEG images into the same image buffer without leaking memory, plus the general non-workingness of JMF, I was forced to use the general Toolkit image loading, which has pathetic performance.

Jeff « Reply #55 - Posted 2004-01-16 02:36:57 »

Quote

Not if you want the ones that are the fastest. Need O( 1 ) insert and O( 1 ) lookup on a sparse dataset? What choices do you have?

Hmm. It may be you are working with far larger data sets than I'm used to. My first inclination for sparse data set insertion/lookup is a hashtable. My second thought is a B+ tree, but that's because any time I've worked with such a large data set it has been disk-based.

My third thought is a commercial in-memory database product like TimesTen, letting someone else solve the problem, but that's because I have that option.

If you've profiled, what is it in the algorithm that's slowing you down? Are you having to implement a heap on an array? I could see access there being expensive. There might be caching strategies depending on the expected pattern of access....

What's the profiler telling you about the sticking points?

Mark Thornton « Reply #56 - Posted 2004-01-16 06:29:00 »

Quote

Then why doesn't Sun create a performance white paper, post it on java.net, and put an end to the insanity once and for all?


http://java.sun.com/j2se/1.4.2/1.4.2_whitepaper.html

http://java.sun.com/docs/performance/

http://www.javaperformancetuning.com/

Mark Thornton « Reply #57 - Posted 2004-01-16 07:07:44 »

Quote
Oh, and though I don't have the time, I just feel compelled to do some bitching about the overhead of object headers. In C++ I can instantiate 70 million objects just fine. Try that in Java, and you'll hear the whooshing sound of 560M of pure object header overhead flying by your head.

I know exactly what you mean. I did notice that Blackdown have an RC1 version of 1.4.2 for AMD64 on Linux available (and it uses HotSpot). Of course, the object headers would then be even bigger, but at least the address space limit would be much further out.
My own partial solution is to reduce the number of objects, but without going all the way to arrays. Classically, to represent a road junction where U-turns are prohibited, you end up with lots of vertices and arcs --- many more than appear as visible objects on the ground. I combine these to give a single node object with more complex behaviour, plus link objects which each represent two (or more) arcs of the classical representation. The end result is that, despite the object headers, the typical data usage is less than the previous C++ implementation's.
By the way, would you mind saying what use you are making of the path calculations? Also, how many of the intermediate vertex labels are needed once the path calculation is complete? I replace the labels which aren't wanted at the end by a reference to a single dummy label (with -infinite cost) as soon as the vertex has been scanned and is thus no longer needed. This helps keep the lifetimes of most such objects low.

I gather that your problem is dense. Sometimes it is possible to temporarily discard most of the arcs and solve the remaining sparse problem. If you can then prove that the result is optimal for the original problem, you are done; otherwise you have to add back a few more arcs and re-solve. In the case I investigated in my PhD thesis, doing all this was still faster than solving the original dense problem.

The advantage I find with Java is that I can get a lot more algorithm experiments done in the time available, and thus am more likely to end up with a better algorithm than if I were still working in C++. Even an area as well studied as shortest path algorithms still has room for improved algorithms in special cases, especially if you have the freedom to slightly rephrase the question.
gregorypierce « Reply #58 - Posted 2004-01-16 18:38:12 »

Perhaps I should be more specific: a white paper that contains code that is performant and code that isn't, and then lists why, as a function of how the JVM works. There are a wide variety of myths with respect to Java performance out there that just won't die.

Mark Thornton « Reply #59 - Posted 2004-01-16 18:48:47 »

Quote
There are a wide variety of myths with respect to Java performance out there that just won't die.

Unfortunately, there are people who continue to propagate some of these myths regardless of any evidence you offer them. Apart from these trolls, there are many others who inexplicably don't even bother to look in the obvious places for information, despite it being free.
So while more information on how to write effective code is always welcome, don't expect it to have much impact on the propagation of myths. These myths are harder to kill than the demons in any game.