Java-Gaming.org    
Featured games (81)
games approved by the League of Dukes
Games in Showcase (499)
Games in Android Showcase (118)
games submitted by our members
Games in WIP (568)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: 1 2 [3]
  ignore  |  Print  
  Programming to the cache  (Read 22162 times)
0 Members and 1 Guest are viewing this topic.
Offline erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #60 - Posted 2008-09-24 09:27:17 »

Not really proof but for a piece of code I had, I replaced x*2 with x<<1 and it decreased the cpu usage a noticeable amount. Something like 10-15%.

To be honest, I wouldn't be surprised if the decrease of CPU usage you were seeing had a different cause.

On a phone or something I could believe it could make a tiny difference, but on a modern PC with a decent JRE...  Undecided
In any case, in a microbenchmark both ways are exactly as fast.

I do write things like <<1 sometimes because it *looks* faster, so it kind of hints the reader that this piece of code is very performance critical.
Otherwise, I've never seen an actual difference on modern PCs.


Online Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #61 - Posted 2008-09-24 09:46:45 »

Well, I think (as according to the JIT sourcecode) POT multiplications (of constants) are indeed replaced by bit-shifting, but there is this other problem: the 'sweepspot' of the JIT that one has to touch / potentially touches / probably misses.


I';ve seen cases like this:

Code Listing A
1  
2  
3  
4  
arr[p] = a[p] * b[p] + c[p];
arr[p+1] = a[p+1] * b[p+1] + c[p+1];
arr[p+2] = a[p+2] * b[p+2] + c[p+2];
arr[p+3] = a[p+3] * b[p+3] + c[p+3];


Code Listing B
1  
2  
3  
4  
arr[p+0] = a[p+0] * b[p+0] + c[p+0];
arr[p+1] = a[p+1] * b[p+1] + c[p+1];
arr[p+2] = a[p+2] * b[p+2] + c[p+2];
arr[p+3] = a[p+3] * b[p+3] + c[p+3];



As you can see, B should be slightly slower, or equally fast, if the JIT removes the redundant addition by zero.

Yet in...
_that_very_specific_case_with_those_very_specific_other_in_a_certain_order_compiled_classes_
the addition of +0 in the sourcecode DOUBLED performance - in a real app, not a silly microbenchmark.


The JIT just took a different optimalisation path...!

And please don't try to say I'm wrong, because in your benchmark the change has no impact. I bet it has no impact, simply because it doesn't have the same condition as:
_that_very_specific_case_with_those_very_specific_other_in_a_certain_order_compiled_classes_

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #62 - Posted 2008-09-24 10:29:50 »

You've given this example before as proof of your 'bump the JVM in a random way and strange performance consequences may happen' theory. The thing is, if changes like that can change performance that much, it might as well have been the other way around, right?  Smiley

When you gave this example earlier, I've tested this in JEmu2 (where there are a lot of things like that in unrolled loops and such) but I guess I never hit that very specific case... Believe me, in JEmu2 i've tried MANY strange things to improve performance (JEmu2 is very sensitive to micro optimizations) and rarely did I encounter such things and usually the optimizations that did make a difference made logical sense.

I'm not saying you're wrong because I *have* seen strange performance jumps/drops, but in my experience they don't happen often and you can never optimize in anticipation of that. In the end it's still a matter of profiling -> optimizing -> profiling etc.
But, you're right if you say that it might be good to keep these things in the back of your head when you're optimizing your code. It's also good to keep in mind that you might be optimizing for a specific JVM on a specific platform when things like '+0' makes such a difference.

If you take this back to the *2 versus <<1 thing, if what zammbi was seeing was caused by hitting some strange JIT sweetspot, then my point remains that it was not necessarily caused by the fact that <<1 is faster than *1, and this sweetspot might as well have been hit by writing *2 instead of <<1.

Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline zammbi

JGO Coder


Medals: 4



« Reply #63 - Posted 2008-09-24 10:32:41 »

To be honest, I wouldn't be surprised if the decrease of CPU usage you were seeing had a different cause.

Oh it was <<10, my mistake. It was also a method that was called a lot per second, haven't checked if this is still the case though.

Current project - Rename and Sort
Online princec

JGO Kernel


Medals: 392
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #64 - Posted 2008-09-24 11:03:55 »

Here's a tidbit... the P4 no longer has a barrel shifter so anything more than << 1 takes more cycles these days, as I understand it.

Cas Smiley

Offline DzzD
« Reply #65 - Posted 2008-09-24 12:16:05 »

Quote
I know I'm late coming to this thread, but: Wha...? Since when? Proof?

hum, why Huh this does not have to be proof, it as always been the case, it is true in C, C++ and certainly in all language...

on PC, shift by one is a single machine op, most of the time it is one cpu cycle (if register already loaded)
multiplication require operand 2 to be loaded and the machine op for multiplication ususally take more than one cpu cycle

on a pc shift will become something like that :
MOV EAX,value
SHL (one cpu cycle)

multiplication something like that
MOV EAX,value
MOV EBX,2
MUL EBX (2/3 cpu cycle)



Offline erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #66 - Posted 2008-09-24 12:38:43 »

but:
Quote
(as according to the JIT sourcecode) POT multiplications (of constants) are indeed replaced by bit-shifting

Also, from what I understand SHL doesn't execute in 1 cycle but rather in 6, which is due to the removal of the barrel shifter Cas talked about (it used to execute in 1 cycle on 486).

Anyway, in asm the probably fastest way to do multiplication by 2 is using a simple addition (well, it used to be the fastest way in the old days but it has been a long time since I last did asm).

Offline Markus_Persson

JGO Wizard


Medals: 15
Projects: 19


Mojang Specifications


« Reply #67 - Posted 2008-09-24 13:26:39 »

Here's a tidbit... the P4 no longer has a barrel shifter so anything more than << 1 takes more cycles these days, as I understand it.

Cas Smiley

AMD seems to still have them. Or at least the one I have at home.

Play Minecraft!
Offline DzzD
« Reply #68 - Posted 2008-09-24 15:10:54 »

Quote
Anyway, in asm the probably fastest way to do multiplication by 2 is using a simple addition (well, it used to be the fastest way in the old days but it has been a long time since I last did asm).

so, three methods, anybody motivated to benchmark that :

n<<=1
n*=2
n+=n

Online Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #69 - Posted 2008-09-24 15:31:16 »

microbenchmarks are nearly impossible in java
in reallife cases, performance is already impossible to predict
further, it'll depend on CPU, OS, and JVM.

Lets not waste time on it

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #70 - Posted 2008-09-25 09:45:51 »

microbenchmarks are nearly impossible in java
in reallife cases, performance is already impossible to predict
further, it'll depend on CPU, OS, and JVM.

Lets not waste time on it

Well, "<<1 is faster than *2" was your idea to begin with  Wink
You're right of course that it's almost impossible to measure this accurately and that the results might very well be totally different in real case scenarios, but you're forgetting that benchmarks are fun  Smiley

On my machine at work (Dual core P4 @3.2GHz, 1GB, Java 1.6.0_03-b05):
-client
2515ms : val *= 2;
2515ms : val <<= 1;
3765ms : val += val;

-server
1969ms : val *= 2;
1968ms : val <<= 1;
1891ms : val += val;

For what it's worth (if anything):
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
public class Test {

    public static void main(String[] args) {
        int vResult = 0;
        long ms1 = Long.MAX_VALUE;
        long ms2 = Long.MAX_VALUE;
        long ms3 = Long.MAX_VALUE;

        for (int i = 0; i < 10; i++) {
            long start = System.currentTimeMillis();
            vResult += test1();
            long ms = System.currentTimeMillis() - start;
            if (ms < ms1)
                ms1 = ms;
            System.out.println(ms + "ms : val *= 2;");

            start = System.currentTimeMillis();
            vResult += test2();
            ms = System.currentTimeMillis() - start;
            if (ms < ms2)
                ms2 = ms;
            System.out.println(ms + "ms : val <<= 1;");

            start = System.currentTimeMillis();
            vResult += test3();
            ms = System.currentTimeMillis() - start;
            if (ms < ms3)
                ms3 = ms;
            System.out.println(ms + "ms : val += val;");
            System.out.println(vResult);
        }
        System.out.println("--------------------------------------------------------");
        System.out.println(ms1 + "ms : val *= 2;");
        System.out.println(ms2 + "ms : val <<= 1;");
        System.out.println(ms3 + "ms : val += val;");

    }

    /**
     * @return
     */

    private static int test1() {
        int val = 1;
        for (int ii = 0; ii < 2000000000; ii++) {
            val *= 2;
            val |= (ii & 1);
        }
        return val;
    }

    /**
     * @return
     */

    private static int test2() {
        int val = 1;
        for (int ii = 0; ii < 2000000000; ii++) {
            val <<= 1;
            val |= (ii & 1);
        }
        return val;
    }

    /**
     * @return
     */

    private static int test3() {
        int val = 1;
        for (int ii = 0; ii < 2000000000; ii++) {
            val += val;
            val |= (ii & 1);
        }
        return val;
    }

}


The last figures are the fastest times that were measured.
I did "val |= (ii & 1);" to prevent val to be shifted into zero, and thus to prevent the (server) JVM to do any clever optimization with that.

Offline DzzD
« Reply #71 - Posted 2008-09-25 15:52:07 »

Quote
Well, "<<1 is faster than *2" was your idea to begin with 
You're right of course that it's almost impossible to measure this accurately and that the results might very well be totally different in real case scenarios, but you're forgetting that benchmarks are fun 


cool, yup dont know really why but I agree with that "bench are fun", thanks for the bench code, I'll post mine results.

EDIT:

Quote
2515ms : val *= 2;
2515ms : val <<= 1;

and

Quote
1969ms : val *= 2;
1968ms : val <<= 1;

very curious results, maybe they are both translated into the same op code



Offline ewjordan

Junior Member





« Reply #72 - Posted 2008-09-25 17:24:44 »


very curious results, maybe they are both translated into the same op code

Earlier in this thread I posted a reference to the function in Hotspot that does this replacement; the question, then, is less whether it can happen than "In what situations does the Jit skip optimizations?". That's something I still don't have much idea about, other than that for very long methods it will stop optimizing at some point. Does anyone know the cutoff, though? I couldn't find the right debug VM flag to see for myself...
Online Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #73 - Posted 2008-09-25 19:26:29 »

It's more that methods are inlined, and inlined, and inlined and inlined, and eventually something triggers the end of inlining.

That trigger is way more important than the cut-off in a single method, because nobody in their right mind write such long methods anyway.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #74 - Posted 2008-09-25 21:50:54 »

On my 1.8GHz Athlon at home:

-server
3515ms : val *= 2;
3531ms : val <<= 1;
3281ms : val += val;

-client
4578ms : val *= 2;
4593ms : val <<= 1;
5547ms : val += val;

Kind of interesting to see the difference in behaviour of val+=val between -client and -server, both on Intel and AMD... Especially on the client, this isn't quite 'noise level'...

It's more that methods are inlined, and inlined, and inlined and inlined, and eventually something triggers the end of inlining.

That trigger is way more important than the cut-off in a single method, because nobody in their right mind write such long methods anyway.

I could imagine that the amount of inlining might depend on the CPU/instruction cache size you're running on?

Online Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #75 - Posted 2008-09-25 22:17:07 »

A better indicator of * vs << would be *8 and << 3 (or so) because of what Cas said.



If you want to investigate (which is 'fun' right!) just monitor the very un-informative output of -XprintCompilation (or something similar, I bet you guys know). After a while you see the patterns, and how whole methods are removed / optimized away all together... now when you think of it, it's a miracle we still have those perfect stacktraces in Java, as the CPU certainly doesn't have those methods in its stack!



About my previous comment on 'please let's not waste time': I'm just getting tired of analyzing JIT performance. It's not only very dependant on the slightest changes in your sourcecode, it's also simply non-deterministic!


You can run some benchmark like 8 times (8 times launching the JVM => 8 processes), and get 2 totally different groups of results, like 8s, 8s, 13s, 8s, 13s, 13s, 8s, 8s. So... I'm certainly not writing anymore microbenchmarks for a while, only when I can get like more than factor 2 performance gain (in the general case), it's worth messing with your precious time.


Measuring performance differences between float[] and FloatBuffer is even worse! Both have very different cache characteristics! running the benchmarks on float[432], float[433], float[434] can have VERY different results. Which are in turn not comparable to directs FloatBuffers with capacities: 423, 433, 434.
In general the calculations/sec are very much like a max(maxPerformance, 1/dataSize + sqrt(dataSize)), with MAJOR spikes down (up to factor 3 slower) at different intervals of dataSize for float[] and FloatBuffer, where in average float[] is 10-20% faster, depending on dataSize.


So whenever you think your FloatBuffer performance is bigger than your float[] performance (by a large margin), you probably just hit a down-spike on your float[]. Try increasing the float[] length by 1 (!), or on another CPU, or other memory, or wait for full moon and light some candles, it helps! See? Way too much variables to get any meaningful results in 1 lifetime.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #76 - Posted 2008-09-25 23:40:55 »

A better indicator of * vs << would be *8 and << 3 (or so) because of what Cas said.

Yes, in fact I already tried that (<<10 versus *1024 to test zammbi's case). Exactly the same result...
Actually I couldn't find anything about that more shifting would make the shift slower.

Quote
About my previous comment on 'please let's not waste time': I'm just getting tired of analyzing JIT performance. It's not only very dependant on the slightest changes in your sourcecode, it's also simply non-deterministic!

Hmmm, are you sure it's not just the PC you're testing on that has some funky background tasks going on or something? Performance test results are usually quite consistent IME, even though I don't always understand the results...
I do notice some strange hiccups sometimes, like running a performance test the 2nd time giving different results, but I'm not seeing that very often and I usually just blame my PC and try again... I also found that killing eclipse can sometimes make benchmark test results more consistent if things behave erratically.

Offline DzzD
« Reply #77 - Posted 2008-09-26 01:14:16 »

Quote
Measuring performance differences between float[] and FloatBuffer is even worse! Both have very different cache characteristics! running the benchmarks on float[432], float[433], float[434] can have VERY different results. Which are in turn not comparable to directs FloatBuffers with capacities: 423, 433, 434.
In general the calculations/sec are very much like a max(maxPerformance, 1/dataSize + sqrt(dataSize)), with MAJOR spikes down (up to factor 3 slower) at different intervals of dataSize for float[] and FloatBuffer, where in average float[] is 10-20% faster, depending on dataSize.
I really dont understand how a FloatBuffer  could be faster than a simple float ?? I mean you access a float directly (except bounds checks...., it is only a memory read, FloatBuffer requiere a method call that make things slower , no ?


about bounds checks and their slowness (and a bottleneck when making computation on a big array), I was thinking that a good addition to the java language could be something like the foreach statement that way array acces will be cntrolled by the JVM and bounds check will no have to be performed, that would be cool....

1  
2  
3  
4  
5  
int b[]=new int[1024*1024];
foreach(int c in b)
{
 c++;
}

 Roll Eyes

<<1
maybe <<1 will stay as fast or faster than *2 as it is a very simple bit op and very simple bit op are "hard-wired ?", maybe in some case and when JIT have enought time it replace *2 by <<1 ? ==> same results ?

Offline zammbi

JGO Coder


Medals: 4



« Reply #78 - Posted 2008-09-26 02:49:32 »

Quote
Yes, in fact I already tried that (<<10 versus *1024 to test zammbi's case). Exactly the same result...
Actually I couldn't find anything about that more shifting would make the shift slower.
Well one other thing is to test if the results are the same as a AMD. Since princec said that the P4 hasn't got bit shifting and your using a P4.
But really I don't think its something to worry about...

Current project - Rename and Sort
Offline ewjordan

Junior Member





« Reply #79 - Posted 2008-09-26 06:36:56 »

You can run some benchmark like 8 times (8 times launching the JVM => 8 processes), and get 2 totally different groups of results, like 8s, 8s, 13s, 8s, 13s, 13s, 8s, 8s. So... I'm certainly not writing anymore microbenchmarks for a while, only when I can get like more than factor 2 performance gain (in the general case), it's worth messing with your precious time.
I'd tend to agree, anything less than a factor of 2 is not worth worrying greatly about or arguing over, especially in Java.  Which brings us back to the original topic of this thread, if you'll pardon my re-hijacking of the discussion: cache misses, which if minimized can result in greater speed improvements than a mere factor of 2.  The question was essentially, do people notice the same significant speedups in Java as in C++ due to programming cache friendly algorithms?  This is pretty significant, because if not, then it would actually be better to stick with naive random access algorithms and let the JVM do what it does rather than re-casting problems and potentially throwing more ops at the processor for negligible (or even negative) speed gain.

The facts as I see them at the moment are:

1) Nobody here seems to have happened across a situation in the field where a predictable (i.e. caused by your choice of algorithm, not some JVM quirk) cache effect showed a major speedup in a program, otherwise they would have said something.  Right? Smiley

2) The simple interleaved vs. separate array speed tests I ran, flawed as they might have been in Java, did not exhibit anywhere near the 5x speed discrepancy that I saw in C++, and furthermore, all the Java versions beat out the cache-unfriendly C++ version by a significant factor.

So it's at least plausible (though by no means certain or proven) that in Java, cache optimizations are already largely carried out by the JVM, and this is a less fruitful avenue for optimization than it is in languages that give you direct memory control.

My original purpose was to predict whether it would actually help or not to restructure my physics engine to make contact solving cache friendly - the bottleneck in the engine is almost always the contact solver, which (at least in C++) tends to be dominated by cache misses because it solves islands of bodies based on contact graph connectivity instead of array index, and thus jumps all over the body array, looping several times doing random accesses all over the place.  It's possible to copy away all the relevant variables to an interleaved array at the beginning of the solve and put them in a "nice" order, then iterate, and then copy them back.  This is more work for the processor, but usually in C++ it actually goes a lot faster because almost every access hits the cache whereas naively you miss very often.

I originally though it would be a lot of work to set up the engine this way, but now it actually looks like it will be fairly simple.  So I'll just try the damn thing and see if it helps or not, that should give a much clearer real life answer than any micro-benchmark I can come up with, so we'll see...
Online Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #80 - Posted 2008-09-26 07:36:28 »

I really dont understand how a FloatBuffer  could be faster than a simple float ?? I mean you access a float directly (except bounds checks...., it is only a memory read, FloatBuffer requiere a method call that make things slower , no ?

No, the JVM does some very smart optimisations on natively ordered direct buffers.

about bounds checks and their slowness (and a bottleneck when making computation on a big array), I was thinking that a good addition to the java language could be something like the foreach statement that way array acces will be cntrolled by the JVM and bounds check will no have to be performed, that would be cool....

1  
2  
3  
4  
5  
int b[]=new int[1024*1024];
foreach(int c in b)
{
 c++;
}

 Roll Eyes

Simple loops and slightly more complex loops have not suffered from bound-checks for ages now. Make a benchmark in C, and you'll get exactly the same performance as in Java, as long as SIMD is not enabled.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline jezek2
« Reply #81 - Posted 2008-09-26 08:00:35 »

My original purpose was to predict whether it would actually help or not to restructure my physics engine to make contact solving cache friendly - the bottleneck in the engine is almost always the contact solver, which (at least in C++) tends to be dominated by cache misses because it solves islands of bodies based on contact graph connectivity instead of array index, and thus jumps all over the body array, looping several times doing random accesses all over the place.  It's possible to copy away all the relevant variables to an interleaved array at the beginning of the solve and put them in a "nice" order, then iterate, and then copy them back.  This is more work for the processor, but usually in C++ it actually goes a lot faster because almost every access hits the cache whereas naively you miss very often.

I've just tried it with JBullet. It supports both cache friendly and unfriendly accesses, here are results:

- client VM: cache friendly: ~30 fps, unfriendly: ~22 fps
- server VM: cache friendly: ~50 fps, unfriendly: ~42 fps

I've used BasicDemo which has stack of 125 boxes. To test yourself, add this code after creating DiscreteDynamicsWorld:

1  
dynamicsWorld.getSolverInfo().solverMode &= ~SolverMode.SOLVER_CACHE_FRIENDLY;


Also remember to press 'd' button to disable deactivation when testing.
Offline xinaesthetic

Senior Member


Medals: 1



« Reply #82 - Posted 2008-09-26 08:03:11 »

about bounds checks and their slowness (and a bottleneck when making computation on a big array), I was thinking that a good addition to the java language could be something like the foreach statement that way array acces will be cntrolled by the JVM and bounds check will no have to be performed, that would be cool....

1  
2  
3  
4  
5  
int b[]=new int[1024*1024];
foreach(int c in b)
{
 c++;
}

There is a for each statement, the syntax looks like
1  
2  
3  
4  
5  
int b[]=new int[1024*1024];
for(int c : b)
{
 c++;
}

but it's no use for altering primitive types as in the example because of the lack of pointers to primitives.  I'd have assumed everyone here knew about this, but I'm slightly surprised you wrote what you did in that case.
Online Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #83 - Posted 2008-09-26 08:31:50 »

Remember DzzD writes for Java 1.1 IIRC.

Everybody else knew :p

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline DzzD
« Reply #84 - Posted 2008-09-26 09:46:24 »

Remember DzzD writes for Java 1.1 IIRC.

Everybody else knew :p
hehe, true, I am not really new-features&shortcut friend... basically I will never learn the full Java API spec as well as in all other languages, I am not a Java coder, just a coder!

Quote
...but it's no use for altering primitive types as in the example because of the lack of pointers to primitives...
I am afraid that new bachelor will knew all high levels features without never learn about base. people say it is good for conception etc... but I think that the main reason that new software requiere so much memory and cpu even if they could have been made faster in "smaller configuration" computer. not sure but the sentence above make me thing about that a little... can you explain what you mean exactly, how to you think Java modify an int, what difference do you see between c [ x ] and *c= ?


Quote
Simple loops and slightly more complex loops have not suffered from bound-checks for ages now. Make a benchmark in C, and you'll get exactly the same performance as in Java, as long as SIMD is not enabled.

nop, Java bounds checks are bottleneck, perform any smart benchmark and the result will show that random Java array acces is at least two time (up to 5 times for random acces) slower than a direct memory acces in another language as C or even in Basic... I will try to take some times and make benchmark between ASM/C/C++/Java on PC and I can already give you results :
1 - ASM
2 - C/C++
3 - Java

Online Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #85 - Posted 2008-09-26 10:23:26 »

You showed a foreach loop: that is linear access, not random access.

You should read my post in that context. Wink

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline DzzD
« Reply #86 - Posted 2008-09-26 10:38:44 »

You showed a foreach loop: that is linear access, not random access.

You should read my post in that context. Wink

ok , I have to stop being "demagogic" sometimes Wink, anyway I will try even if sequential acces are quit fast in Java too.

@xinaesthetic

Quote
1  
2  
int b[]=new int[1024*1024];
foreach(int c in b)

maybe I haven't explain enought when I post above sample, sry , this was an idea only interresting for primitive, I mean this way the JIT know you wont go away bounds as you have no acces to the array index and then it will be able to perform direct memory access without bounds check. it could have been

1  
2  
forEach (int c in b[100-300])
c++

with forEach able to throw a ForEachBoundsCheckException and ForEachTypeException, but only check bounds once.




Offline xinaesthetic

Senior Member


Medals: 1



« Reply #87 - Posted 2008-09-26 11:08:33 »

hehe, true, I am not really new-features&shortcut friend... basically I will never learn the full Java API spec as well as in all other languages, I am not a Java coder, just a coder!
Fair enough, but that doesn't quite justify proposing new syntax that has already been there for some time... anyway, nothing against you for not realising.
 
can you explain what you mean exactly, how to you think Java modify an int, what difference do you see between c [ x ] and *c= ?
Well, put it this way.
1  
2  
3  
int[] c = new int[8];
for (int i : c) i = 10; // won't alter the value in the array.
for (int i : c) System.out.println("i = " + i); // all default to zero

The only way the value of the primitive in the array will change, is if you explicitly assign a value to the slot in the array, which effectively nullifies the usefulness of the for each loop in many circumstances.

It also has more far-reaching effects. You essentially can't easily pass a reference to a primitive value you want to change and it's a PITA IMHO.
Offline xinaesthetic

Senior Member


Medals: 1



« Reply #88 - Posted 2008-09-26 11:27:04 »

it could have been

1  
2  
forEach (int c in b[100-300])
c++

with forEach able to throw a ForEachBoundsCheckException and ForEachTypeException, but only check bounds once.
That could be useful.
Online Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #89 - Posted 2008-09-26 15:20:36 »

Again, foreach (linear array access) already has 1 bound-check, so there is nothing to gain there.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Pages: 1 2 [3]
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

Riven (9 views)
2014-10-02 14:36:20

Pippogeek (41 views)
2014-09-24 16:13:29

Pippogeek (32 views)
2014-09-24 16:12:22

Pippogeek (22 views)
2014-09-24 16:12:06

Grunnt (47 views)
2014-09-23 14:38:19

radar3301 (30 views)
2014-09-21 23:33:17

BurntPizza (65 views)
2014-09-21 02:42:18

BurntPizza (37 views)
2014-09-21 01:30:30

moogie (44 views)
2014-09-21 00:26:15

UprightPath (53 views)
2014-09-20 20:14:06
List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06

List of Learning Resources
by SilverTiger
2014-07-31 11:54:12

HotSpot Options
by dleskov
2014-07-08 01:59:08
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!