Java-Gaming.org    
  Bound's checks and struct's  (Read 13615 times)
Online Riven
« League of Dukes »

JGO Overlord


Medals: 743
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #30 - Posted 2005-10-21 14:57:16 »

Hehe, I've got the feeling I'm being watched :)



Okay, here we go:

I had it like this:

private final int a0, a1, a2, a3, a4, a5;

ClassB()
{
   a0 = nextOffset + 0;
   a1 = nextOffset + 1;
   a2 = nextOffset + 2;
   // etc...
}

buffer.get(a0);
buffer.get(a1);
buffer.get(a2);
// etc...



Now it's:

private final int off;

ClassB()
{
   off = nextOffset;
}

buffer.get(off+0);
buffer.get(off+1);
buffer.get(off+2);
// etc...




So the second option was much faster: the performance factor (buffer-backed time divided by plain-field time) dropped from 2.7 to 1.3.

It also saves RAM, so there are no tradeoffs...
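The second layout can be sketched roughly as follows. This is only an illustration, not the real ClassB from the benchmark: the class name `SingleBaseOffset`, the use of a `FloatBuffer`, and the three accessors are assumptions made for the example.

```java
import java.nio.FloatBuffer;

// Hypothetical illustration; the thread's real ClassB is not shown in full.
final class SingleBaseOffset {
    private final FloatBuffer buffer;
    private final int off; // the only stored offset; slots are off+0, off+1, off+2

    SingleBaseOffset(FloatBuffer buffer, int nextOffset) {
        this.buffer = buffer;
        this.off = nextOffset;
    }

    // Each accessor adds a compile-time constant to the single stored offset.
    float a() { return buffer.get(off); }
    float b() { return buffer.get(off + 1); }
    float c() { return buffer.get(off + 2); }

    void a(float v) { buffer.put(off, v); }
    void b(float v) { buffer.put(off + 1, v); }
    void c(float v) { buffer.put(off + 2, v); }

    public static void main(String[] args) {
        FloatBuffer fb = FloatBuffer.allocate(6);
        SingleBaseOffset first = new SingleBaseOffset(fb, 0);
        SingleBaseOffset second = new SingleBaseOffset(fb, 3);
        first.a(1f);
        first.b(2f);
        first.c(first.a() + first.b());
        second.a(10f);
        System.out.println(first.c() + " " + second.a());
    }
}
```

With one `int` per object instead of one per field, the object is smaller, which is presumably where the RAM saving comes from.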

Offline Mark Thornton

Senior Member





« Reply #31 - Posted 2005-10-21 15:04:48 »

That is bizarre. An apparently trivial change makes the performance change by a factor of 2. It doesn't give me confidence that the code would work as expected on another JVM (e.g. one by IBM or Apple).
Online Riven

« Reply #32 - Posted 2005-10-21 15:08:03 »

Write once, profile everywhere

Offline abies

Senior Member





« Reply #33 - Posted 2005-10-21 15:54:39 »

After rearranging the code a bit (putting the loops inside their own methods, so it is easier to see what is happening) and simplifying the loops (HotSpot was doing everything in registers after the first few memory accesses, and it was again too much code to check), I have come up with the following:

  private static void testB(int classes) {
    for(int i = 0; i < loop; i++) {
      ClazzB b = cB[i % classes];
      b.c(b.a() + b.b());
      b.z(b.x() + b.y());  
    }
  }

  private static void testA(int classes) {
    for(int i = 0; i < loop; i++) {
      ClazzA a = cA[i % classes];
      a.c = a.a + a.b;
      a.z = a.x + a.y;
    }
  }


and the generated code for both methods (inner loop only, the rest is not important) is:

testA
080   B6: #   B10 B7 <- B5 B8    Loop: B6-B8 inner stride: not constant  Freq: 7.51066
080      MOV    EAX,ESI
082      MOV    ECX,[ESP + #4]
086      CDQ
   IDIV   ECX
0a0      CMPu   EDX,EBX
0a2      Jge,us B10  P=0.000001 C=-1.000000
0a2
0a4   B7: #   B11 B8 <- B6  Freq: 7.51065
0a4      MOV    EBP,[EDI + #12 + EDX << #2]
0a8      MOV    EDX,[EBP + #8]
0ab      NullCheck EBP
0ab
0ab   B8: #   B6 B9 <- B7  Freq: 7.51065
0ab      MOVSS  XMM0a,[EBP + #24]
0b0      MOVSS  XMM2a,[EBP + #20]
0b5      MOV    ECX,[EBP + #12]
0b8      ADDSS  XMM2a,XMM0a
0bc      MOVSS  [EBP + #28],XMM2a
0c1      ADD    EDX,ECX
0c3      MOV    [EBP + #16],EDX
0c6      INC    ESI
0c7      CMP    ESI,#1048576
0cd      Jlt,s  B6  P=1.000000 C=7.509333
0cd


and testB

1b0   B18: #   B35 B19 <- B17 B32    Loop: B18-B32 inner stride: not constant  Freq: 14.1767
1b0      MOV    EAX,EBX
1b2      MOV    ECX,[ESP + #28]
1b6      CDQ
   IDIV   ECX
1d0      CMPu   EDX,[ESP + #36]
1d4      Jge,u  B35  P=0.000001 C=-1.000000
1d4
1da   B19: #   B34 B20 <- B18  Freq: 14.1766
1da      MOV    EDI,[ESP + #32]
1de      MOV    ECX,[EDI + #12 + EDX << #2]
1e2      MOV    [ESP + #8],ECX
1e6      MOV    ECX,[ECX + #8]
1e9      NullCheck ECX
1e9
1e9   B20: #   B64 B21 <- B19  Freq: 14.1766
1e9      TEST   ECX,ECX
1eb      Jlt    B64  P=0.000000 C=6.667333
1eb
1f1   B21: #   B62 B22 <- B20  Freq: 6.66733
1f1      CMP    ECX,[ESP + #60]
1f5      Jge    B62  P=0.000000 C=6.667333
1f5
1fb   B22: #   B59 B23 <- B21  Freq: 6.66733
1fb      MOV    ESI,ECX
1fd      INC    ESI
1fe      MOV    EDI,ECX
200      SHL    EDI,#2
203      MOV    EBP,[ESP + #4]
207      ADD    EBP,EDI
209      MOV    EDX,[EBP]
20c      TEST   ESI,ESI
20e      Jlt    B59  P=0.000000 C=6.667333
20e
214   B23: #   B57 B24 <- B22  Freq: 6.66733
214      CMP    ESI,[ESP + #60]
218      Jge    B57  P=0.000000 C=6.667333
218
21e   B24: #   B54 B25 <- B23  Freq: 6.66733
21e      MOV    EAX,[EBP + #4]
221      ADD    EAX,EDX
223      MOV    EDX,ECX
225      ADD    EDX,#2
228      TEST   EDX,EDX
22a      Jlt    B54  P=0.000000 C=6.667333
22a
230   B25: #   B52 B26 <- B24  Freq: 6.66733
230      CMP    EDX,[ESP + #60]
234      Jge    B52  P=0.000000 C=6.667333
234
23a   B26: #   B49 B27 <- B25  Freq: 6.66733
23a      MOV    [EBP + #8],EAX
23d      MOV    EAX,ECX
23f      ADD    EAX,#3
242      TEST   EAX,EAX
244      Jlt    B49  P=0.000000 C=6.667333
244
24a   B27: #   B47 B28 <- B26  Freq: 6.66733
24a      CMP    EAX,[ESP + #56]
24e      Jge    B47  P=0.000000 C=6.667333
24e
254   B28: #   B44 B29 <- B27  Freq: 6.66733
254      MOV    EBP,[ESP + #0]
257      ADD    EBP,EDI
259      MOVSS  XMM0a,[EBP + #12]
25e      MOV    EAX,ECX
260      ADD    EAX,#4
263      TEST   EAX,EAX
265      Jlt    B44  P=0.000000 C=6.667333
265
26b   B29: #   B42 B30 <- B28  Freq: 6.66733
26b      CMP    EAX,[ESP + #56]
26f      Jge    B42  P=0.000000 C=6.667333
26f
275   B30: #   B39 B31 <- B29  Freq: 6.66733
275      MOVSS  XMM2a,[EBP + #16]
27a      ADDSS  XMM2a,XMM0a
27e      ADD    ECX,#5
281      TEST   ECX,ECX
283      Jlt,s  B39  P=0.000000 C=6.667333
283
285   B31: #   B37 B32 <- B30  Freq: 6.66733
285      CMP    ECX,[ESP + #56]
289      Jge,s  B37  P=0.000000 C=6.667333
289
28b   B32: #   B18 B33 <- B31  Freq: 6.66733
28b      MOVSS  [EBP + #20],XMM2a
290      INC    EBX
291      CMP    EBX,#1048576
297      Jlt    B18   # Loop end  P=1.000000 C=7.509333


I'm quite surprised that the speed difference is only about 1.4 (on my AMD machine with the short loop).

Artur Biesiadowski
Offline abies

Senior Member





« Reply #34 - Posted 2005-10-21 15:58:29 »

Ok, I think I have found the problem. Replace the loops with
  private static void testB(int classes) {
    for(int i = 0; i < loop; i++) {
      ClazzB b = cB[i&0x2f];
      b.c(b.a() + b.b());
      b.z(b.x() + b.y());  
    }
  }

  private static void testA(int classes) {
    for(int i = 0; i < loop; i++) {
      ClazzA a = cA[i&0x2f];
      a.c = a.a + a.b;
      a.z = a.x + a.y;
    }
  }


We were basically measuring speed of CDQ IDIV ECX, not the field access.

Now, for the short loop, difference is 3.64 times.


Edit:
Still, there was something wrong. After changing test methods to
private static void testB(int classes) {
   
    for(int i = 0; i < loop; i++) {
      ClazzB b = cB[0];
      b.c(b.a() + b.b());
      b.z(b.x() + b.y());  
      b = cB[1];
      b.c(b.a() + b.b());
      b.z(b.x() + b.y());  
      b = cB[2];
      b.c(b.a() + b.b());
      b.z(b.x() + b.y());  
      b = cB[3];
      b.c(b.a() + b.b());
      b.z(b.x() + b.y());  
    }
  }

  private static void testA(int classes) {

    for(int i = 0; i < loop; i++) {
      ClazzA a = cA[0];
      a.c = a.a + a.b;
      a.z = a.x + a.y;
      a = cA[1];
      a.c = a.a + a.b;
      a.z = a.x + a.y;
      a = cA[2];
      a.c = a.a + a.b;
      a.z = a.x + a.y;
      a = cA[3];
      a.c = a.a + a.b;
      a.z = a.x + a.y;
    }
  }


the difference is 11-13 times, depending on the JVM (5.0 or 6.0).

Artur Biesiadowski
Online Riven

« Reply #35 - Posted 2005-10-21 16:13:48 »

Quote
After rearranging the code a bit (putting the loops inside their own methods, so it is easier to see what is happening) and simplifying the loops (HotSpot was doing everything in registers after the first few memory accesses, and it was again too much code to check), I have come up with the following

After I simplified the loops, I get a performance factor of 1.14.

Note that you use "& 0x2F", which is 47, while there are 64 elements in the array, so it must be "& 0x3F".
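The difference between the two masks can be checked with a few lines of code. This is a minimal sketch, assuming a 64-element array as in the benchmark; the class and method names are made up for the example.

```java
// Sketch: for a power-of-two length n, (i & (n - 1)) equals (i % n) for i >= 0.
// 0x3F is 63 (all six low bits set); 0x2F is 47 (bit 4 is clear).
final class IndexMaskDemo {
    /** Counts how many distinct slots of a 64-element array a mask can reach. */
    static int reachableSlots(int mask) {
        boolean[] seen = new boolean[64];
        int count = 0;
        for (int i = 0; i < 1024; i++) {
            int idx = i & mask;
            if (!seen[idx]) {
                seen[idx] = true;
                count++;
            }
        }
        return count;
    }

    /** Returns true if masking gives the same index as the modulo for 0 <= i < 1024. */
    static boolean maskMatchesModulo(int mask, int n) {
        for (int i = 0; i < 1024; i++) {
            if ((i & mask) != (i % n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("0x2F reaches " + reachableSlots(0x2F) + " slots");
        System.out.println("0x3F reaches " + reachableSlots(0x3F) + " slots");
        System.out.println("0x3F == % 64: " + maskMatchesModulo(0x3F, 64));
    }
}
```

Because 0x2F has only five bits set, it can produce just 32 distinct indices, so half of the 64 objects are never touched.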


I already PMed you, but you haven't replied yet, so could you please tell me how to get that native code printed? Thanks.

Online Riven

« Reply #36 - Posted 2005-10-21 16:17:12 »

On my system:

               for (int i = 0; i < loop; i++)
               {
                  ClazzB b = cB[i % classes];

                  b.c(b.a() + b.b());
                  b.b(b.c() + b.a());
                  b.a(b.b() - b.c());

                  b.z(b.x() + b.y());
                  b.y(b.z() + b.x());
                  b.x(b.y() - b.z());
               }


Factor = 1.14


               for (int i = 0; i < loop; i++)
               {
                  ClazzB b = cB[i & 0x3F];

                  b.c(b.a() + b.b());
                  b.b(b.c() + b.a());
                  b.a(b.b() - b.c());

                  b.z(b.x() + b.y());
                  b.y(b.z() + b.x());
                  b.x(b.y() - b.z());
               }


Factor = 1.70


So it's :
ClazzB b = cB[i & 0x3F];         ---> 1.70
ClazzB b = cB[i % classes];  ---> 1.14



It seems we're benchmarking the wrong stuff here (the managing code)

I'll rewrite the benchmark!
         

Online Riven

« Reply #37 - Posted 2005-10-21 16:26:45 »

         int classes = 64;
         
         ClazzA[] cA = new ClazzA[classes];
         ClazzB[] cB = new ClazzB[classes];

         for (int i = 0; i < classes; i++)
         {
            cA[i] = new ClazzA();
            cB[i] = new ClazzB();
         }

         //

         int loop = 1024 * 128;

         for (int p = 0; p < 8; p++)
         {
            long tA = 0;
            long tB = 0;

            for (int k = 0; k < 4; k++)
            {
               // calc
               long t0A = System.nanoTime();
               for (int i = 0; i < loop; i++)
               {
                  ClazzA a = cA[i & 0x3F];
                  for (int j = 0; j < 64; j++)
                     comp(a);
               }
               long t1A = System.nanoTime();

               long t0B = System.nanoTime();
               for (int i = 0; i < loop; i++)
               {
                  ClazzB b = cB[i & 0x3F];
                  for (int j = 0; j < 64; j++)
                     comp(b);
               }
               long t1B = System.nanoTime();

               tA += ((t1A - t0A) / 1000000);
               tB += ((t1B - t0B) / 1000000);
            }

            System.out.println("Performance factor: " + (float) tB / tA + " (Time: " + (tA + tB) + "ms)");
         }



   private static final void comp(ClazzA a)
   {
      a.c = a.a + a.b;
      a.b = a.c + a.a;
      a.a = a.b - a.c;

      a.z = a.x + a.y;
      a.y = a.z + a.x;
      a.x = a.y - a.z;
   }



   private static final void comp(ClazzB b)
   {
      b.c(b.a() + b.b());
      b.b(b.c() + b.a());
      b.a(b.b() - b.c());

      b.z(b.x() + b.y());
      b.y(b.z() + b.x());
      b.x(b.y() - b.z());
   }


Performance factor: 1.3216169 (Time: 1321ms)
Performance factor: 1.2138836 (Time: 1180ms)
Performance factor: 1.5396290 (Time: 1506ms)
Performance factor: 1.2973485 (Time: 1213ms)
Performance factor: 1.3528184 (Time: 1127ms)
Performance factor: 1.3636364 (Time: 1118ms)
Performance factor: 1.3215767 (Time: 1119ms)
Performance factor: 1.3859276 (Time: 1119ms)

Offline Mark Thornton

Senior Member





« Reply #38 - Posted 2005-10-21 16:27:54 »

The perils of micro benchmarks!
Online Riven

« Reply #39 - Posted 2005-10-21 16:29:13 »

Yup, that's why I posted the benchmark, so that others could point out the weaknesses. Very useful :)

Offline abies

Senior Member





« Reply #40 - Posted 2005-10-21 16:52:04 »

Quote
I already PMed you, but you haven't replied yet, so could you please tell me how to get that native code printed? Thanks.

PMs used to pop up here on login, didn't they? Sorry, I had not noticed the new one; the information is hidden at the bottom of the main page...

As for printing out native code, download the debug 6.0 JVM and call it with
-XX:+PrintOptoAssembly
on the command line.

For your current code, I'm getting a factor of 1.1 on the first few iterations, then some compilation kicks in and the ratio changes to 2.2 and stays there.
Try to access more than one object in the same method (3-4 of them) - you will get a much worse ratio.

The problem is that there seems to be a certain kind of operation in very simple cases which gets totally optimized by HotSpot (ratio of 1.1-1.3). But with anything more complicated we are back in let's-call-a-method mode, which gives a ratio of 10+.
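Since the ratio shifts once compilation kicks in, one common precaution is to run the code untimed for a while before measuring. The helper below is a generic sketch of that idea and not part of either benchmark in this thread; the class name `WarmupHarness`, the method name `bestOfNanos`, and the round counts are all assumptions.

```java
// Sketch of a warm-up-then-measure helper (hypothetical names, not from the thread).
final class WarmupHarness {
    static volatile float sink; // written by the task so the work is not trivially dead

    /** Runs the task untimed warmupRounds times, then returns the best of timedRounds in ns. */
    static long bestOfNanos(Runnable task, int warmupRounds, int timedRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            task.run(); // give the JIT a chance to compile the hot code first
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < timedRounds; i++) {
            long t0 = System.nanoTime();
            task.run();
            long dt = System.nanoTime() - t0;
            if (dt < best) {
                best = dt;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] data = new float[1 << 16];
        Runnable task = () -> {
            float sum = 0f;
            for (int i = 0; i < data.length; i++) {
                sum += data[i] + i;
            }
            sink = sum;
        };
        System.out.println("best: " + bestOfNanos(task, 20, 10) + " ns");
    }
}
```

A warm-up phase does not remove the other pitfalls discussed here (for example, timing the index computation instead of the field access); it only keeps interpreter and early-compilation time out of the numbers.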

Artur Biesiadowski
Online Riven

« Reply #41 - Posted 2005-10-21 17:00:20 »

I'll make a real-world example:

An array of 3d vectors (float) multiplied by a 4x4 matrix...
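As a rough idea of what such a kernel looks like on plain arrays, here is a minimal sketch. It is not the benchmark code that was eventually posted; the row-major matrix layout, the implicit w = 1, and the names `TransformSketch` and `transform` are assumptions for illustration.

```java
// Sketch: transform packed xyz triples by a 4x4 matrix (hypothetical helper).
final class TransformSketch {
    /**
     * Transforms the vectors in place. The matrix m has 16 floats in row-major
     * order; each vector is treated as (x, y, z, 1) and the resulting w is dropped.
     */
    static void transform(float[] m, float[] xyz) {
        for (int i = 0; i + 2 < xyz.length; i += 3) {
            float x = xyz[i];
            float y = xyz[i + 1];
            float z = xyz[i + 2];
            xyz[i]     = m[0] * x + m[1] * y + m[2]  * z + m[3];
            xyz[i + 1] = m[4] * x + m[5] * y + m[6]  * z + m[7];
            xyz[i + 2] = m[8] * x + m[9] * y + m[10] * z + m[11];
        }
    }

    public static void main(String[] args) {
        // Translation by (1, 2, 3).
        float[] m = {
            1, 0, 0, 1,
            0, 1, 0, 2,
            0, 0, 1, 3,
            0, 0, 0, 1
        };
        float[] v = {1, 1, 1};
        transform(m, v);
        System.out.println(v[0] + " " + v[1] + " " + v[2]);
    }
}
```

The field, buffer, and unsafe variants would do the same arithmetic and differ only in how the twelve matrix elements and three vector components are read and written.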

Offline abies

Senior Member





« Reply #42 - Posted 2005-10-21 18:07:03 »

Here is a piece of code which computes the centers of triangles in a big array.

Results are 4.6s for the Buffer-based version and 2.8s for the Unsafe-based one.

import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

import sun.misc.Unsafe;

class BVertex {

  private final FloatBuffer fb;
  private int off;

  public BVertex(FloatBuffer fb) {
    this.fb = fb;
  }

  public void position(int vIndex) {
    off = vIndex*3;
  }

  public final float x() { return fb.get(off); }
  public final float y() { return fb.get(off + 1); }
  public final float z() { return fb.get(off + 2); }

  public final void x(float x) { fb.put(off, x); }
  public final void y(float y) { fb.put(off + 1, y); }
  public final void z(float z) { fb.put(off + 2, z);}

}

class UVertex {
  static Unsafe unsafe;
  static Field addressHack;
  static {
    try {
      ByteBuffer bb = ByteBuffer.allocateDirect(1);
      Field unsafeHack = bb.getClass().getDeclaredField("unsafe");
      unsafeHack.setAccessible(true);
      unsafe = (Unsafe) unsafeHack.get(bb);

      addressHack = Buffer.class.getDeclaredField("address");
      addressHack.setAccessible(true);
    } catch (Exception exc) {
      exc.printStackTrace();
    }
  }

  private long base;
  private int offset;


  public UVertex(FloatBuffer fb) {
    try {
      base = addressHack.getLong(fb);
    } catch (Exception exc) {
      exc.printStackTrace();
      throw new InternalError();
    }
  }

  public void position(int vIndex) {
    offset = vIndex*12;
  }

  public final float x() { return unsafe.getFloat(base+offset); }
  public final float y() { return unsafe.getFloat(base+offset + 4); }
  public final float z() { return unsafe.getFloat(base+offset + 8); }

  public final void x(float x) { unsafe.putFloat(base+offset, x); }
  public final void y(float y) { unsafe.putFloat(base+offset+4, y); }
  public final void z(float z) { unsafe.putFloat(base+offset+8, z); }

}

public class VertexTest {

  static final int TRIANGLES_COUNT = 10000;
  static final FloatBuffer triangles = ByteBuffer.allocateDirect(
      TRIANGLES_COUNT * 3 * 3 * 4).order(ByteOrder.nativeOrder())
      .asFloatBuffer();
  static final FloatBuffer center = ByteBuffer.allocateDirect(
      TRIANGLES_COUNT * 3 * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();

  public static void main(String[] argv) {

    for (int i = 0; i < TRIANGLES_COUNT; i++) {
      triangles.put(i, i);
    }

    for (int i = 0; i < 10; i++) {
      mainX();
    }
  }

  private static void mainX() {
    long start = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
      computeBufferCenters();
    }
    System.out.println("Buffer-based " + (System.currentTimeMillis() - start)
        + "ms");

    start = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
      computeUnsafeCenters();
    }
    System.out.println("Unsafe-based " + (System.currentTimeMillis() - start)
        + "ms");
  }

  private static void computeUnsafeCenters() {
    UVertex a = new UVertex(triangles);
    UVertex b = new UVertex(triangles);
    UVertex c = new UVertex(triangles);
    UVertex d = new UVertex(center);

    for (int i = 0; i < TRIANGLES_COUNT; i++) {
      int tstart = i * 3;
      a.position(tstart);
      b.position(tstart + 1);
      c.position(tstart + 2);

      d.position(i);

      d.x((a.x() + b.x() + c.x()) / 3);
      d.y((a.y() + b.y() + c.y()) / 3);
      d.z((a.z() + b.z() + c.z()) / 3);
    }
  }

  private static void computeBufferCenters() {
    BVertex a = new BVertex(triangles);
    BVertex b = new BVertex(triangles);
    BVertex c = new BVertex(triangles);
    BVertex d = new BVertex(center);

    for (int i = 0; i < TRIANGLES_COUNT; i++) {
      int tstart = i * 3;
      a.position(tstart);
      b.position(tstart + 1);
      c.position(tstart + 2);

      d.position(i);

      d.x((a.x() + b.x() + c.x()) / 3);
      d.y((a.y() + b.y() + c.y()) / 3);
      d.z((a.z() + b.z() + c.z()) / 3);
    }
  }
}

Artur Biesiadowski
Online Riven

« Reply #43 - Posted 2005-10-21 18:22:11 »

Woah, you're making a direct buffer of 1 byte, then accessing N bytes from it... :D

Not very neat, huh? :) Meanwhile I'm working on the next version...

Offline abies

Senior Member





« Reply #44 - Posted 2005-10-21 18:28:38 »

Quote
Woah, you're making a direct buffer of 1 byte, then accessing N bytes from it...

ByteBuffer bb = ByteBuffer.allocateDirect(1);
This is a local variable, used only to get a field reference for reflection. I could probably just use ByteBuffer.class instead of bb.getClass(), but I wanted to be sure that I resolve it against the real class of the direct buffer.

The real buffers used in the test are 'triangles' and 'center'.

Artur Biesiadowski
Online Riven

« Reply #45 - Posted 2005-10-21 18:44:10 »

Hm.... hm.... hmmmmmm........ :o :D

This benchmark is comparable to a real-world situation.

Benchmark does this:
init 1024 3d vectors
init random 4x4 matrices

for(1024)
   for(field vectors)
      transform by field-matrix

for(1024)
   for(buffer vectors)
      transform by buffer-matrix

for(1024)
   for(unsafe buffer vectors)
      transform by unsafe-buffer-matrix




Performance factor field vs. buffer: 2.4653847
Performance factor field vs. unsafe: 0.60384613


Using unsafe buffers is roughly 66% faster than using fields..!
(1.0 / 0.604) - 1.0 ≈ 0.66
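The arithmetic behind that percentage, as a small sketch: a time ratio r (unsafe time divided by field time) below 1 corresponds to a speedup of 1/r - 1. With the unrounded factor printed above, the result comes out slightly below the figure obtained from the rounded 0.6. The class and method names here are made up for the example.

```java
// Sketch: convert a time ratio into a "percent faster" figure.
final class SpeedupMath {
    /** timeRatio = candidate time / baseline time; returns how much faster the candidate is, in percent. */
    static double percentFaster(double timeRatio) {
        return (1.0 / timeRatio - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("rounded ratio 0.6:       %.1f%%%n", percentFaster(0.6));
        System.out.printf("measured ratio 0.6038..: %.1f%%%n", percentFaster(0.60384613));
    }
}
```

The same formula applied to the buffer result (ratio 2.465) gives a negative value, that is, a slowdown.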

Now these objects can be used to speedup any math operation. We'll need a good API for it so that it's guaranteed Safe and anyone can use it.

Unfortunately this only works for the Sun Server VM. (sun.misc.Unsafe)


source-code: structs
source-code: bench

I've written code that will generate the struct source code automatically, for:
1. fields
2. buffers
3. unsafe buffers

Offline abies

Senior Member





« Reply #46 - Posted 2005-10-21 21:46:58 »

One note - in my previous benchmark, it was the 'position' method which made the major difference. Without it, the unsafe version was 3-4 times faster! Unfortunately, it is needed in order to reuse the same structure wrapper for various positions in the same buffer. Having one object per entry in the native array is not possible (memory, GC hit).

Artur Biesiadowski
Online Riven

« Reply #47 - Posted 2005-10-21 21:51:50 »

My latest implementation is:

class VecC
{
   public static final int SIZEOF = 12;

   private static final Unsafe access = StructUtil.getAccess();

   private final long base;



   public VecC(ByteBuffer bb)
   {
      if (bb.remaining() < SIZEOF)
         throw new IllegalStateException("Not enough bytes remaining in buffer: " + bb.remaining() + "/" + SIZEOF);
      if (bb.order() != ByteOrder.nativeOrder())
         throw new IllegalStateException("ByteBuffer must be in native order");

      int pos = bb.position();
      base = StructUtil.getBase(bb) + pos;
      bb.position(pos + SIZEOF);
   }



   public final void x(float x)   {      access.putFloat(base + 0L, x);   }
   public final void y(float y)   {      access.putFloat(base + 4L, y);   }
   public final void z(float z)   {      access.putFloat(base + 8L, z);   }
   public final float x()   {      return access.getFloat(base + 0L);   }
   public final float y()   {      return access.getFloat(base + 4L);   }
   public final float z()   {      return access.getFloat(base + 8L);   }
}



No need to use position at all outside the constructor...

~~~~~~~~~~~~~~~~~~~~

Edit: okay, I looked through your code, and noticed you didn't mean ByteBuffer.position().

You basically use that method to move along the data and 'land' wherever you like, to re-use objects... I don't know if that's such a good idea... Think about this:

MappedObject mo = new MappedObject(...)
mo.doSomething();

// "mo" points to other data now
// this is not like anything in java

float x = mo.x();


void doSomething(MappedObject obj)
{
    obj.position(...);
}


My implementation is 100% safe, as long as the ByteBuffer is floating around. With MappedObject.position(...) you can wreak havoc and cause native crashes. You can't allow that to happen, ever. Checking the input there (throwing exceptions) will disable inlining, which is kinda slow compared to non-struct classes.

Online Riven

« Reply #48 - Posted 2005-10-21 22:13:44 »

Quote
Having one object per entry in native array is not possible (memory, gc hit).

Eden-heap GC is lightning fast. Since Java 1.4 / 1.5, tiny objects (1 long) are not really a problem anymore, especially on the Server VM. (OK, >1000/s is troublesome.)

Offline abies

Senior Member





« Reply #49 - Posted 2005-10-21 22:22:09 »

We are talking here about a million objects per second. I can imagine such a structure being used to fill out vertex data inside OpenGL buffers - thousands of dynamic triangles each frame, giving 100-1000 thousand triangles per second. You certainly don't want to allocate anything to access a single vertex. One allocation per array of vertices is probably acceptable, but maybe even switching NIO buffers is not too far-fetched.

Artur Biesiadowski
Online Riven

« Reply #50 - Posted 2005-10-22 02:53:23 »

I'm working on an API that supports both my and your kind of structs... will post tomorrow, it's getting late :-/

Here are the javadocs

Offline AndersDahlberg

Junior Member





« Reply #51 - Posted 2005-10-22 13:10:58 »

http://lwjgl.org/forum/viewtopic.php?t=955

Online Riven

« Reply #52 - Posted 2005-10-22 13:19:16 »

First of all, you're doing this:

ByteBuffer.getFloat(), which is very, very slow
FloatBuffer.get() is still about 1.5-3x slower than class field access

The best performance you get with Javassist is 25-50% slower than 'normal' code. I have to say it's a nice transparent architecture, but if you need raw performance, it's unacceptable, and when you see that unsafe.getFloat() is about 15-20% *faster* than class field access, the choice is easy. You'll lose the transparency, and have to change your code from fields to method calls, but I think that's worth it, for the die-hards.


Second, you're using Lists in your benchmark, which will significantly influence performance, not to mention Math.sqrt() calls in tight loops, which are very heavy, and Random.nextFloat(). Do you really want to measure that too?

Offline princec

JGO Kernel


Medals: 342
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #53 - Posted 2005-10-22 14:01:53 »

Enough of all this crazy hackery! I don't get why there is such opposition to the ultra-simple idea that MappedObjects are.

1. Abstract class in java.nio containing final reference to a ByteBuffer
2. Primitive fields mapped IN ORDER DECLARED, of size specified by the Java specs, no need for annotations
3. Reference fields held in heap section.
4. JVM is free to detect classes extending MappedObject and may either optimise directly into machine code, or rewrite bytecode to provide similar but less efficient access by proxy.

Why MappedObjects?

1. Clean, clear code, with no annotations, no caveats, no getters and setters to pollute OOP designs
2. High performance: Bounds check performed only ONCE on a setPosition() call
3. High performance: no need to create or destroy any objects, just use one and slide it around the buffer
4. High performance: no read-modify-write operations, it's all direct in memory access
5. Provides all the benefits of a C-struct but fits seamlessly in with Java's object-oriented paradigm and behaves just like any other reference type by virtue of being a real object on the heap
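Points 2 and 3 of this list can be approximated today at library level, roughly as below. This is only a sketch of the API shape: the proposed MappedObject would be handled by the JVM itself, whereas here each `FloatBuffer.get` still performs its own range check. The class name `SlidingVec3` and its methods are assumptions, not part of any proposal in this thread.

```java
import java.nio.FloatBuffer;

// Hypothetical sliding view: one object that is moved across the buffer.
final class SlidingVec3 {
    private static final int STRIDE = 3; // floats per element
    private final FloatBuffer fb;
    private int off;

    SlidingVec3(FloatBuffer fb) {
        this.fb = fb;
    }

    /** Moves the view to element index; the range is validated once, here. */
    SlidingVec3 position(int index) {
        if (index < 0 || (long) index * STRIDE + STRIDE > fb.limit()) {
            throw new IndexOutOfBoundsException("element " + index);
        }
        off = index * STRIDE;
        return this;
    }

    float x() { return fb.get(off); }
    float y() { return fb.get(off + 1); }
    float z() { return fb.get(off + 2); }

    void x(float v) { fb.put(off, v); }
    void y(float v) { fb.put(off + 1, v); }
    void z(float v) { fb.put(off + 2, v); }

    public static void main(String[] args) {
        FloatBuffer fb = FloatBuffer.allocate(6);
        SlidingVec3 v = new SlidingVec3(fb);
        for (int i = 0; i < 2; i++) {
            v.position(i); // no allocation per element
            v.x(i);
            v.y(i + 0.5f);
            v.z(-i);
        }
        System.out.println(fb.get(3) + " " + fb.get(4) + " " + fb.get(5));
    }
}
```

The trade-off raised earlier in the thread still applies: any code holding a reference to the view sees it point at different data after another caller repositions it.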

Why NOT MappedObjects?

1. NIH
2. Inertia at doing anything vaguely out-of-the-box
3. Misunderstanding about what OOP actually is
4. No idea why it's needed

Somebody give me a sound, concrete reason why MappedObjects, as I have described here, do not do everything we need to get us clean, clear, concise, fast, object-oriented, easily implemented, side-effect-free, interfacing with native data.

Cas :)

Online Riven

« Reply #54 - Posted 2005-10-22 14:15:43 »

Of course! When Mapped Objects are in Java, I'll ditch my so-called hackery immediately. It has quite a few disadvantages. No doubt about it.

But... we haven't got Mapped Objects.

What's the ETA? No one knows!

Offline abies

Senior Member





« Reply #55 - Posted 2005-10-22 14:17:47 »

Are you sure you will never need an annotation? What about accessing data which is coming from the network - some fields can be in a different endianness. I agree that the default behaviour should not require any annotations - but allowing them for not-so-trivial cases could simplify usage a lot.
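To make the endianness concern concrete, here is a small sketch of a record whose two fields use different byte orders. The layout (a big-endian id followed by a little-endian value) and the names `MixedEndianRecord`, `readId`, and `readValue` are invented for the example; a mapped-struct mechanism would need some per-field hint, such as an annotation, to express the same thing.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical record: bytes 0-3 hold an id in big-endian (network) order,
// bytes 4-7 hold a value in little-endian order.
final class MixedEndianRecord {
    static int readId(ByteBuffer bb) {
        // duplicate() shares the content, so changing its order leaves bb untouched
        return bb.duplicate().order(ByteOrder.BIG_ENDIAN).getInt(0);
    }

    static int readValue(ByteBuffer bb) {
        return bb.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(4);
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap(new byte[] {0, 0, 0, 1, 2, 0, 0, 0});
        System.out.println(readId(bb) + " " + readValue(bb));
    }
}
```

A single buffer-wide byte order, as assumed by the struct classes above, cannot describe this layout on its own.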

You can get all of the benefits of your MappedObject with my idea of bytecode weaving - with the single exception of having to install a transformation class in the classloader/on startup. It would even allow you to use field access directly, as it would be changed silently to use accessors. And it has a major benefit - it could be used out of the box, right now, without forcing a particular construct on the rest of the world, which is more concerned about JSP v7.0 than native access to resources.

As far as hackery is concerned... it would all be invisible from the client's point of view; only the library implementation would have to use a few magic hacks. I vaguely recall a certain library passing native pointers as ints/longs between calls and doing direct pointer arithmetic on them ;)

Artur Biesiadowski
Offline Mark Thornton

Senior Member





« Reply #56 - Posted 2005-10-22 14:30:18 »

Quote
Enough of all this crazy hackery! I don't get why there is such opposition to the ultra-simple idea that MappedObjects are.

1. Abstract class in java.nio containing final reference to a ByteBuffer
2. Primitive fields mapped IN ORDER DECLARED, of size specified by the Java specs, no need for annotations
3. Reference fields held in heap section.
4. JVM is free to detect classes extending MappedObject and may either optimise directly into machine code, or rewrite bytecode to provide similar but less efficient access by proxy.

Because it is a change in the language specification. Especially after the less than ecstatic reception of recent changes, I think the chances of such a change being accepted are slim.

You might still want annotations if you wanted to be able to represent C structs that had been packed to boundaries other than 1 byte; i.e. where padding has been inserted to maintain appropriate alignment.
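As a rough illustration of why packing matters, the sketch below computes field offsets for the same struct under two packing settings. StructLayout is a hypothetical helper, not part of any proposal in this thread, and it assumes each primitive's alignment equals its size, capped by the pack value; real compilers and ABIs may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that mimics a C compiler's struct layout rules.
final class StructLayout {
    private final int pack;   // maximum alignment, similar in spirit to #pragma pack(n)
    private final Map<String, Integer> offsets = new LinkedHashMap<>();
    private int cursor = 0;   // next free byte
    private int maxAlign = 1; // largest alignment used so far

    StructLayout(int pack) {
        this.pack = pack;
    }

    // Append a primitive field of the given size in bytes, in declaration order.
    StructLayout field(String name, int size) {
        int align = Math.min(size, pack);
        cursor = roundUp(cursor, align); // insert padding if needed
        offsets.put(name, cursor);
        cursor += size;
        maxAlign = Math.max(maxAlign, align);
        return this;
    }

    int offsetOf(String name) {
        return offsets.get(name);
    }

    // Total size, including tail padding so that arrays of the struct stay aligned.
    int sizeOf() {
        return roundUp(cursor, maxAlign);
    }

    private static int roundUp(int value, int align) {
        return (value + align - 1) / align * align;
    }
}
```

For a struct of one byte, one int and one short, a pack value of 1 yields offsets 0, 1, 5 and a size of 7, while a pack value of 4 yields offsets 0, 4, 8 and a size of 12. A mapping that simply lays fields out in declaration order would only match the first case.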
Online Riven
« League of Dukes »

JGO Overlord


Medals: 743
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #57 - Posted 2005-10-22 14:52:38 »

You can get all of the benefits of your MappedObject with my idea of bytecode weaving - with the single exception of having to install the transformation class in the classloader or at startup. It would even allow you to use field access directly, as it would be silently changed to use accessors.

Do you know how the sliding-window * feature, that you thought was absolutely required, would be implemented with bytecode transformation? I'm very curious.

* using one object and moving it along the data

Offline abies

Senior Member





« Reply #58 - Posted 2005-10-22 15:44:28 »

Do you know how the sliding-window * feature, that you thought was absolutely required, would be implemented with bytecode transformation? I'm very curious.
public class Vertex3f extends MemoryMappedObject {
  public float x;
  public float y;
  public float z;
 
  public Vertex3f(ByteBuffer bb) {
    super(bb);
  }

  // .. some Vertex3f specific methods
}

where MemoryMappedObject would implement the position/sliding method, normally visible, without any tricks.

The bytecode weaver would do the following:
1) If something extends MemoryMappedObject, remove the public fields and create the correct getters/setters with whatever magic is needed inside (depending on implementation), probably also passing SIZEOF as an extra argument to the super constructor
2) If something accesses any field of a MemoryMappedObject, convert getfield/putfield to getter/setter calls.
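As a rough illustration of the two steps above, this is approximately what the weaver's output for Vertex3f could look like if it were written by hand. The body of MemoryMappedObject, the position(int) method name and the SIZEOF constant are assumptions made for this sketch; only the ByteBuffer calls are real API.

```java
import java.nio.ByteBuffer;

// Assumed shape of the base class; the thread only gives its name.
abstract class MemoryMappedObject {
    protected final ByteBuffer buffer;
    private final int sizeof; // bytes per element, supplied by the woven subclass
    protected int base;       // byte offset of the element currently under the window

    protected MemoryMappedObject(ByteBuffer buffer, int sizeof) {
        this.buffer = buffer;
        this.sizeof = sizeof;
    }

    // Slide the window: the same object now views element number `index`.
    public final void position(int index) {
        base = index * sizeof;
    }
}

// Hand-written equivalent of step 1: public fields removed, accessors generated.
final class Vertex3f extends MemoryMappedObject {
    static final int SIZEOF = 3 * Float.BYTES;

    Vertex3f(ByteBuffer bb) {
        super(bb, SIZEOF); // SIZEOF passed as the extra constructor argument
    }

    public float getX() { return buffer.getFloat(base); }
    public float getY() { return buffer.getFloat(base + 4); }
    public float getZ() { return buffer.getFloat(base + 8); }

    public void setX(float v) { buffer.putFloat(base, v); }
    public void setY(float v) { buffer.putFloat(base + 4, v); }
    public void setZ(float v) { buffer.putFloat(base + 8, v); }
}
```

Step 2 would then turn a client statement such as `v.x = 1f` into `v.setX(1f)`, so one Vertex3f instance can be moved along the buffer with position(i) while the client code still reads like plain field access.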

On top of that, I could imagine a few extra properties/annotations
a) possibility of explicitly giving the sizeof parameter (passed to the super constructor, or in an annotation which would be woven into the constructor call) - for easy alignment
b) specifying an explicit offset for a particular field
c) specifying the endianness of a particular field
d) (optionally, not sure about that, especially about multiple dimensions) possibility to denote arrays of values, like

public class Matrix4f extends MemoryMappedObject {
  @Dimension({4, 4})
  public float[][] data;
}

with calls like matrix.data[x][y] being automatically converted to matrix.getData(x*4+y)
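A minimal sketch of that index flattening, written by hand rather than woven. The get/set helper names and the row-major layout are assumptions for this example; getData(int) is the accessor name used above.

```java
import java.nio.ByteBuffer;

// Hand-written stand-in for the woven Matrix4f: a 4x4 float matrix stored flat.
final class Matrix4f {
    private static final int COLS = 4;
    private final ByteBuffer buffer;

    Matrix4f(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    // Flat accessors the weaver would generate for the `data` field.
    float getData(int flatIndex) {
        return buffer.getFloat(flatIndex * Float.BYTES);
    }

    void setData(int flatIndex, float value) {
        buffer.putFloat(flatIndex * Float.BYTES, value);
    }

    // What a read of matrix.data[x][y] would be rewritten to.
    float get(int x, int y) {
        return getData(x * COLS + y);
    }

    // What a write to matrix.data[x][y] would be rewritten to.
    void set(int x, int y, float value) {
        setData(x * COLS + y, value);
    }
}
```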



With enough magic, it could even work on stuff like

@Alignment(128)
public class VertexData extends MemoryMappedObject {
  @Offset(16) public Color4f rgba;
  @Offset(32) public Vector3f position;
  @Offset(48) public Vector3f normal;
  public float m1,z2,f3;
}

With Color4f and Vector3f being other MMOs, expanded inline for VertexData. Alignment/Offset can be replaced with padding elements, so they are not required - I'm just throwing ideas around.

Artur Biesiadowski
Offline Linuxhippy

Senior Member


Medals: 1


Java games rock!


« Reply #59 - Posted 2005-10-22 16:40:13 »

So why do you want to do this task in Java at all? Add such optimizations to Java and we are where we already were with C++.