The thriller in Manilla, JNA vs JNI using OpenCL
AI Guy (Senior Newbie)
« Posted 2010-03-08 22:55:18 »

I personally would not make my OpenCL bindings decision based on this, but I think we all know this is not about OpenCL, rather OpenGL.  OpenCL should be discussed in its own thread.  It is just a proxy here. Michael, sorry if I am forcing your hand too early or messing up your schedule. Just say so if you wish to postpone.  As long as you run both sides yourself on the same hardware, you do not have to worry about distributing anything earlier than you had planned.

The JNA side is using OpenCL4Java.  This is what JavaCL is built on.  We have 2 names so we knew what we were referring to when talking.  I am probably the only person who uses OpenCL4Java directly.

I ran this on a Snow Leopard MacBook with 2 GPUs & an Intel(R) Core(TM)2 Duo CPU P8800 @ 2.66GHz.  I tried to lay out some rules, listed right in the source; if they can be improved, do so.  I am not the referee, and my schedule is pretty tight.

When I did a run of 1000 loops (each loop is 5 calls), I got an avg msec per loop of 0.19629501.  When I did 1 million, I got 0.018008577.  The million-loop figure is probably the better one if you are doing 1000 OpenGL calls for just 1 frame, but play with it.
- - - - - - - - - - - - - - - - -
package whatever;
import java.nio.*;
import com.sun.jna.*;
import com.sun.jna.ptr.*;
import com.ochafik.lang.jnaerator.runtime.NativeSize;
import com.ochafik.lang.jnaerator.runtime.NativeSizeByReference;
import com.nativelibs4java.opencl.library.*;

/**
 *  JNA version, using OpenCL4Java(the low level bindings for JavaCL).  Add this jar to project to run.
 *  http://nativelibs4java.sourceforge.net/maven/com/nativelibs4java/opencl4java/1.0-SNAPSHOT/opencl4java-1.0-SNAPSHOT-shaded.jar
 *
 *  Goal: test call overhead of JNI vs JNA.  OpenCL has dev info calls which are
 *  short in duration.  They DO NOT touch GPU's.  The type of data returned can be found
 *  by running http://nativelibs4java.sourceforge.net/webstart/OpenCL/HardwareReport.jnlp
 *
 *  Not every possible query performed, only one of each return type.  Too much work for
 *  all.  Control using LOOP_COUNT.
 *
 *  Turn on the clock only after Platform, dev created.
 *
 *  Rules:
 *     - platform must be NVidia 195 or 196 if Windows.  Win7 64-bit if possible.
 *     - Do not even bother to create a context or command queue.
 *     - The avg time/loop should be compared on exact same hardware.  The value itself is
 *       NOT important, only the difference in values.
 *     - MUST "look" at value, since this could be different.
 *     - MUST include any methods which one would reasonably need to call inside the
 *       loop.  e.g. getPointer() methods for JNA
 *     - assigning the return code is required, but the actual checking can be commented out
 *
 */
public class JNIvsJNAviaOpenCL{
    static int LOOP_COUNT = 1000000; // 1M
    static float NANOS_PER_MILLI = 1000000F;

    public static void main(String[] argv){
        // get platform, usually only one, unless mixing NVidia & ATI GPU's
        OpenCLLibrary.cl_platform_id[] platformArray = new OpenCLLibrary.cl_platform_id[1];
        int err = OpenCLLibrary.INSTANCE.clGetPlatformIDs(1, platformArray, null);
        if (err != OpenCLLibrary.CL_SUCCESS)
            throw new RuntimeException("failed to get platform " + err);

        // get any device, the device itself not important, but need to do queries against something
        OpenCLLibrary.cl_device_id[] deviceArray = new OpenCLLibrary.cl_device_id[1];
        err = OpenCLLibrary.INSTANCE.clGetDeviceIDs(platformArray[0], OpenCLLibrary.CL_DEVICE_TYPE_ALL, 1, deviceArray, null);
        if (err != OpenCLLibrary.CL_SUCCESS)
            throw new RuntimeException("failed to get device " + err);

        // assorted vars declared outside the loop
        OpenCLLibrary.cl_device_id dev = deviceArray[0];  // do not want to index dev array every call
        long cummTime = 0L;
        long start;

        NativeSize szInt = new NativeSize(Native.LONG_SIZE);
        IntByReference valInt = new IntByReference();
        int lookedAtInt;

        NativeSize szLong = new NativeSize(8);
        LongByReference valLong = new LongByReference();
        long lookedAtLong;

        NativeSize szSizeT = new NativeSize(8);
        NativeSizeByReference valSizeT = new NativeSizeByReference();
        long lookedAtSizeT;

        NativeSize szString = new NativeSize();
        NativeSizeByReference nCharBuf = new NativeSizeByReference();
        ByteBuffer valStringBuf;
        int length;
        String lookedAtString;

        long force_JVM_to_do = 0;

        for(int i = 0; i < LOOP_COUNT; i++){
            start = System.nanoTime();

            // int based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_VENDOR_ID, szInt, valInt.getPointer(), null);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException("failed int query " + err);
            lookedAtInt = valInt.getValue();

            // long based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_MAX_MEM_ALLOC_SIZE, szLong, valLong.getPointer(), null);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException("failed long query " + err);
            lookedAtLong = valLong.getValue();

            // size_t based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_IMAGE2D_MAX_WIDTH, szSizeT, valSizeT.getPointer(), null);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException("failed tsize query " + err);
            lookedAtSizeT = valSizeT.getValue().longValue();

            // string based info queries  (2 calls, first to find out size; 2nd to get)
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DRIVER_VERSION, szString, null, nCharBuf);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException(ErrorDesc.getErrorDesc(err));

            length = nCharBuf.getValue().intValue();
            szString.setValue(length);
            valStringBuf = NIO_Utils.getByteBuffer(length);

            // call again to get the actual value
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DRIVER_VERSION, szString, Native.getDirectBufferPointer(valStringBuf), null);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException("failed string query " + err);
//            else
                lookedAtString = NIO_Utils.toString(valStringBuf);

            cummTime += System.nanoTime() - start;
            force_JVM_to_do += lookedAtInt - lookedAtLong + lookedAtSizeT - lookedAtString.length();
        }


        System.out.println("Avg ms per loop: " + (cummTime/(LOOP_COUNT * NANOS_PER_MILLI)));
        System.out.println("ignore:  " + force_JVM_to_do);

    }
}
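
One plausible reason the 1000-loop average is about ten times the 1M-loop average is that a short run is dominated by one-time costs (class loading, JIT compilation, first-call library setup). The following is only a sketch of a harness that runs an untimed warm-up phase first and then reads the clock once per batch rather than once per loop. `WarmupBench`, `avgMillisPerLoop` and the stand-in workload are made-up names for illustration; the real loop body would be the five `clGetDeviceInfo` calls above.

```java
import java.util.function.LongSupplier;

final class WarmupBench {
    // Accumulated workload results, published so the JIT cannot discard the calls.
    static volatile long sink;

    /** Runs the workload untimed for warmup iterations, then times measured iterations as one batch. */
    static double avgMillisPerLoop(LongSupplier workload, int warmup, int measured) {
        long acc = 0;
        for (int i = 0; i < warmup; i++) {
            acc += workload.getAsLong();   // untimed: gives the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            acc += workload.getAsLong();   // timed: one clock read per batch, not per loop
        }
        long elapsedNanos = System.nanoTime() - start;
        sink = acc;
        return elapsedNanos / (measured * 1_000_000.0);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the native calls; replace with the real loop body.
        LongSupplier standIn = () -> Long.bitCount(System.nanoTime());
        double avg = avgMillisPerLoop(standIn, 100_000, 1_000_000);
        System.out.println("Avg ms per loop: " + avg);
        System.out.println("ignore:  " + sink);
    }
}
```

Timing the whole batch keeps the cost of `System.nanoTime()` itself out of the per-loop figure, which matters when the loop body only takes a few microseconds.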


bienator (Senior Devvie)
« Reply #1 - Posted 2010-03-09 02:22:39 »

I like the title ;) At least someone who doesn't take technology discussions religiously.


I had to comment the last few lines out of your test since you forgot to include NIO_Utils.
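
For anyone who wants to run the full test, here is a guess at what the missing helper might look like. The original `NIO_Utils` was never posted, so this is purely an assumption based on how it is called: `getByteBuffer(length)` is taken to return a direct buffer that native code can write into, and `toString(buffer)` is taken to decode a NUL-terminated C string. The class name `NioUtilsSketch` and the sample bytes in `main` are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class NioUtilsSketch {
    /** Allocates a direct buffer in native byte order, suitable for passing to native code. */
    static ByteBuffer getByteBuffer(int length) {
        return ByteBuffer.allocateDirect(length).order(ByteOrder.nativeOrder());
    }

    /** Decodes the bytes up to (not including) the first NUL, or the whole buffer if there is none. */
    static String toString(ByteBuffer buf) {
        int n = 0;
        while (n < buf.limit() && buf.get(n) != 0) {
            n++;
        }
        byte[] bytes = new byte[n];
        for (int i = 0; i < n; i++) {
            bytes[i] = buf.get(i);   // absolute reads, so the buffer position is untouched
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Simulate what a native call would leave behind: text followed by a NUL terminator.
        byte[] sample = "driver-x\0".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = getByteBuffer(sample.length);
        for (int i = 0; i < sample.length; i++) {
            buf.put(i, sample[i]);
        }
        System.out.println(toString(buf));
    }
}
```

Note that allocating a fresh direct buffer on every loop is itself fairly expensive, so whichever helper is used, it becomes part of what the benchmark measures.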

Well, I used the high-level API since it's already late (and I had to change the rules since I am on Linux 64 / GTX 295):
package whatever;

import com.mbien.opencl.CLContext;
import com.mbien.opencl.CLDevice;
import com.mbien.opencl.CLPlatform;

/**
 * @author mbien
 */

public class JOCLHLBench {

    static int LOOP_COUNT = 1000000; // 1M
    static float NANOS_PER_MILLI = 1000000F;

    public static void main(String[] args) {

        //init
        CLContext context = CLContext.create(CLPlatform.getDefault().listCLDevices()[0]);
        CLDevice device = context.getDevices()[0];

        long cummTime = 0L;
        long start;
        long force_JVM_to_do = 0;

        long lookedAtInt;
        long lookedAtLong;
        long lookedAtSizeT;
        String lookedAtString;

        for(int i = 0; i < LOOP_COUNT; i++){
            start = System.nanoTime();

            // int based info queries
            lookedAtInt = device.getVendorID(); // sorry, but this is a long in my case :)

            // long based info queries
            lookedAtLong = device.getMaxMemAllocSize();

            // size_t based info queries
            lookedAtSizeT = device.getMaxImage2dWidth();

            // string based info queries  (2 calls, hidden in HL API)
            lookedAtString = device.getDriverVersion();

            cummTime += System.nanoTime() - start;
            force_JVM_to_do += lookedAtInt - lookedAtLong + lookedAtSizeT - lookedAtString.length();
        }

        System.out.println("Avg ms per loop: " + (cummTime/(LOOP_COUNT * NANOS_PER_MILLI)));
        System.out.println("ignore:  " + force_JVM_to_do);

        //deinit
        context.release();
    }
   
}


your code again:
package whatever;

import java.nio.*;
import com.sun.jna.*;
import com.sun.jna.ptr.*;
import com.ochafik.lang.jnaerator.runtime.NativeSize;
import com.ochafik.lang.jnaerator.runtime.NativeSizeByReference;
import com.nativelibs4java.opencl.library.*;

/**
 *  JNA version, using OpenCL4Java(the low level bindings for JavaCL).  Add this jar to project to run.
 *  http://nativelibs4java.sourceforge.net/maven/com/nativelibs4java/opencl4java/1.0-SNAPSHOT/opencl4java-1.0-SNAPSHOT-shaded.jar
 *
 *  Goal: test call overhead of JNI vs JNA.  OpenCL has dev info calls which are
 *  short in duration.  They DO NOT touch GPU's.  The type of data returned can be found
 *  by running http://nativelibs4java.sourceforge.net/webstart/OpenCL/HardwareReport.jnlp
 *
 *  Not every possible query performed, only one of each return type.  Too much work for
 *  all.  Control using LOOP_COUNT.
 *
 *  Turn on the clock only after Platform, dev created.
 *
 *  Rules:
 *     - platform must be NVidia 195 or 196 if Windows.  Win7 64-bit if possible.
 *     - Do not even bother to create a context or command queue.
 *     - The avg time/loop should be compared on exact same hardware.  The value itself is
 *       NOT important, only the difference in values.
 *     - MUST "look" at value, since this could be different.
 *     - MUST include any methods which one would reasonably need to call inside the
 *       loop.  e.g. getPointer() methods for JNA
 *     - assigning the return code is required, but the actual checking can be commented out
 *
 */

public class JNIvsJNAviaOpenCL{
    static int LOOP_COUNT = 1000000; // 1M
    static float NANOS_PER_MILLI = 1000000F;

    public static void main(String[] argv){
        // get platform, usually only one, unless mixing NVidia & ATI GPU's
        OpenCLLibrary.cl_platform_id[] platformArray = new OpenCLLibrary.cl_platform_id[1];
        int err = OpenCLLibrary.INSTANCE.clGetPlatformIDs(1, platformArray, null);
        if (err != OpenCLLibrary.CL_SUCCESS)
            throw new RuntimeException("failed to get platform " + err);

        // get any device, the device itself not important, but need to do queries against something
        OpenCLLibrary.cl_device_id[] deviceArray = new OpenCLLibrary.cl_device_id[1];
        err = OpenCLLibrary.INSTANCE.clGetDeviceIDs(platformArray[0], OpenCLLibrary.CL_DEVICE_TYPE_ALL, 1, deviceArray, null);
        if (err != OpenCLLibrary.CL_SUCCESS)
            throw new RuntimeException("failed to get device " + err);

        // assorted vars declared outside the loop
        OpenCLLibrary.cl_device_id dev = deviceArray[0];  // do not want to index dev array every call
        long cummTime = 0L;
        long start;

        NativeSize szInt = new NativeSize(Native.LONG_SIZE);
        IntByReference valInt = new IntByReference();
        int lookedAtInt;

        NativeSize szLong = new NativeSize(8);
        LongByReference valLong = new LongByReference();
        long lookedAtLong;

        NativeSize szSizeT = new NativeSize(8);
        NativeSizeByReference valSizeT = new NativeSizeByReference();
        long lookedAtSizeT;

        NativeSize szString = new NativeSize();
        NativeSizeByReference nCharBuf = new NativeSizeByReference();
        ByteBuffer valStringBuf;
        int length;
        String lookedAtString;

        long force_JVM_to_do = 0;

        for(int i = 0; i < LOOP_COUNT; i++){
            start = System.nanoTime();

            // int based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_VENDOR_ID, szInt, valInt.getPointer(), null);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException("failed int query " + err);
            lookedAtInt = valInt.getValue();

            // long based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_MAX_MEM_ALLOC_SIZE, szLong, valLong.getPointer(), null);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException("failed long query " + err);
            lookedAtLong = valLong.getValue();

            // size_t based info queries
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DEVICE_IMAGE2D_MAX_WIDTH, szSizeT, valSizeT.getPointer(), null);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException("failed tsize query " + err);
            lookedAtSizeT = valSizeT.getValue().longValue();

            // string based info queries  (2 calls, first to find out size; 2nd to get)
            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DRIVER_VERSION, szString, null, nCharBuf);
//            if (err != OpenCLLibrary.CL_SUCCESS)
//                throw new RuntimeException(ErrorDesc.getErrorDesc(err));

            length = nCharBuf.getValue().intValue();
            szString.setValue(length);
//            valStringBuf = NIO_Utils.getByteBuffer(length);
//
//            // call again to get the actual value
//            err = OpenCLLibrary.INSTANCE.clGetDeviceInfo(dev, OpenCLLibrary.CL_DRIVER_VERSION, szString, Native.getDirectBufferPointer(valStringBuf), null);
////            if (err != OpenCLLibrary.CL_SUCCESS)
////                throw new RuntimeException("failed string query " + err);
////            else
//                lookedAtString = NIO_Utils.toString(valStringBuf);

            cummTime += System.nanoTime() - start;
            force_JVM_to_do += lookedAtInt - lookedAtLong + lookedAtSizeT /*- lookedAtString.length()*/;
        }


        System.out.println("Avg ms per loop: " + (cummTime/(LOOP_COUNT * NANOS_PER_MILLI)));
        System.out.println("ignore:  " + force_JVM_to_do);

    }
}


results:

OpenCL4Java:

Avg ms per loop: 0.022599893
ignore:  -234688290000000

JOCL, high level:
Avg ms per loop: 0.012990374
ignore:  5343065332166419968

(the "ignore" values are different since the JNA version makes one less call)

No guarantees, maybe I forgot something in the hurry... it's already late in Germany.
Thanks for providing the test case!

olivier.chafik (Innocent Bystander)
« Reply #2 - Posted 2010-03-09 08:40:48 »

Hi all,

I'm the author of JavaCL (a.k.a OpenCL4Java).
This benchmark is actually excellent news for JavaCL's performance, because... it doesn't even use the fastest JNA mapping mode!

Indeed, JNA has two mapping modes (see https://jna.dev.java.net/) :
- interface mode (dynamic and slow because it's reflection-intensive), currently used by OpenCL4Java
- direct native mode (native methods are directly bound to native function callbacks, pretty much as in JNI).

The direct native mode can be up to 10 times faster than the interface mode, but it has some (overridable) limitations that made me decide against it for the pre-1.0 final version of JavaCL / OpenCL4Java.
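
To give a feel for why the two modes differ, here is a plain-Java analogy with no JNA and no native code involved. As far as I know, JNA's interface mode hands out a `java.lang.reflect.Proxy`, so every call goes through an `InvocationHandler` that has to work out which function was meant and box the arguments and result. The sketch compares such a proxy against a direct call. `DeviceInfo`, `Direct`, `viaProxy` and the constant 7 are made up for illustration; the timings it prints are not JNA timings and will vary by JVM.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class DispatchSketch {
    /** Hypothetical stand-in for a bound native library interface. */
    interface DeviceInfo {
        int vendorId();
    }

    /** Direct implementation: the JVM can call (and inline) this without any lookup. */
    static final class Direct implements DeviceInfo {
        public int vendorId() {
            return 7;
        }
    }

    /** Wraps the target in a dynamic proxy that resolves each call by method name. */
    static DeviceInfo viaProxy(DeviceInfo target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("vendorId")) {
                return target.vendorId();   // result is boxed to Integer on the way out
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return (DeviceInfo) Proxy.newProxyInstance(
                DeviceInfo.class.getClassLoader(), new Class<?>[] {DeviceInfo.class}, handler);
    }

    /** Times loops calls and checks the sum so the calls cannot be optimized away. */
    static long timeNanos(DeviceInfo d, int loops) {
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            sum += d.vendorId();
        }
        long elapsed = System.nanoTime() - start;
        if (sum != 7L * loops) {
            throw new AssertionError("unexpected sum " + sum);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int loops = 1_000_000;
        DeviceInfo direct = new Direct();
        DeviceInfo proxied = viaProxy(direct);
        System.out.println("direct ns: " + timeNanos(direct, loops));
        System.out.println("proxy  ns: " + timeNanos(proxied, loops));
    }
}
```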

Your post will obviously make me release a "direct-enabled" OpenCL4Java binding sooner, so stay tuned :-)
Cheers
--
Olivier
nsigma
« Reply #3 - Posted 2010-03-09 11:19:51 »

Quote from olivier.chafik: "This benchmark is actually excellent news for JavaCL's performance, because... it doesn't even use the fastest JNA mapping mode!"
Well, I always knew JNA was going to end up as Joe Frazier in such a raw match-up, but I had assumed those figures were using direct mapping.  That's better performance than I expected from interface mode.  I'd like to see some robust benchmarks between JNI and JNA, but the other thing that interests me is seeing some "real" apps benchmarked using the two bindings.  I'm more interested in the point (if any) at which the extra overhead becomes statistically irrelevant.
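
As a rough back-of-the-envelope version of that question, the figures posted earlier in the thread can be turned into a per-frame cost. This is only a sketch under loud assumptions: the JNA run is taken as 4 native calls per loop (the second string call was commented out), the JOCL run as 5, and the workload as 1000 calls per frame at 60 frames per second. The class and method names are made up.

```java
final class OverheadBudget {
    /** Average cost of one native call, given the measured average per loop. */
    static double perCallMillis(double avgMillisPerLoop, int callsPerLoop) {
        return avgMillisPerLoop / callsPerLoop;
    }

    /** Extra milliseconds per frame spent in the slower binding. */
    static double extraMillisPerFrame(double slowPerCall, double fastPerCall, int callsPerFrame) {
        return (slowPerCall - fastPerCall) * callsPerFrame;
    }

    /** Fraction of the frame budget (1000 ms / fps) that the extra time consumes. */
    static double shareOfFrame(double extraMillis, double framesPerSecond) {
        return extraMillis / (1000.0 / framesPerSecond);
    }

    public static void main(String[] args) {
        // Averages as posted in this thread; calls-per-loop counts are assumptions.
        double jnaPerCall = perCallMillis(0.022599893, 4);
        double jniPerCall = perCallMillis(0.012990374, 5);
        double extra = extraMillisPerFrame(jnaPerCall, jniPerCall, 1000);
        System.out.println("extra ms per frame: " + extra);
        System.out.println("share of 60 fps frame: " + shareOfFrame(extra, 60.0));
    }
}
```

Under those assumptions the difference works out to roughly 3 ms per frame, or a bit under a fifth of a 60 fps frame. An app making only a few dozen native calls per frame would see a difference well under a tenth of a millisecond.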

@ Olivier - there was some discussion of direct vs interface mapping at the end of the "Catch 22 for JOGL" thread, in case you haven't seen it.  OT - The JNAJack binding I mentioned there (it's at http://code.google.com/p/java-audio-utils/) used an early version of JNAerator to create most of the low level binding.  Thanks, great tool!  Must try again and update with direct mapping mode.

Best wishes,

Neil
