  Fastest texture generating algorithm  (Read 2665 times)
tusaki
« Posted 2005-05-27 08:18:30 »

This algorithm produces a texture from a given image.

The result ends up both in texImage and in a texture-compatible buffer, which can be fed to glTexSubImage2D, glTexImage2D or gluBuild2DMipmaps.

Basically, for any given image I, it generates an ARGB image Y, which is an ARGB version of I scaled up to the next power of 2 in each dimension. In my tests it is about 16 times faster than a similar graphics.drawImage() scaling operation, even with all the "dd" performance flags set on the VM. It uses fixed-point arithmetic to achieve maximum speed.
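
To make the fixed-point idea concrete, here is a minimal sketch (not the poster's code; the class and method names are made up for illustration). The source/destination size ratio is stored as an integer with 16 fractional bits, an accumulator is advanced by that amount for every destination pixel, and `>> 16` drops the fraction to get the source coordinate.

```java
import java.util.Arrays;

final class FixedPointSketch {
    /** Nearest-neighbour source index for each destination index, using 16.16 fixed point. */
    static int[] sourceIndices(int srcSize, int dstSize) {
        // Ratio srcSize/dstSize with 16 fractional bits (same formula as in the post).
        int scaleUnit = (int) (((double) srcSize / (double) dstSize) * 65536);
        int[] result = new int[dstSize];
        int offset = 0; // accumulated source position in 16.16 fixed point
        for (int d = 0; d < dstSize; d++) {
            result[d] = offset >> 16; // drop the fractional bits
            offset += scaleUnit;      // one integer add per pixel, no floating point
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 source pixels stretched over 4 destination pixels
        System.out.println(Arrays.toString(sourceIndices(3, 4))); // prints [0, 0, 1, 2]
    }
}
```

Because the scale factor is truncated towards zero, the accumulated position never runs past the last source pixel.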

I needed this because I was rendering video to a texture, and the Java library algorithms were too slow.

In case you are wondering what the fastest way to update a texture in GL is: it is glTexSubImage2D, and you can use it in combination with the algorithm like this:

gl.glTexSubImage2D(GL.GL_TEXTURE_2D, 0, 0, 0, textureWidth, textureHeight, GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, textureCompatibleBuffer);


Anyway, I hope it is useful to you, and if you find other optimizations I haven't thought of, please share them with us here.

You will need the "get2Fold" algorithm, as mentioned in this post:
http://www.java-gaming.org/cgi-bin/JGNetForums/YaBB.cgi?board=share;action=display;num=1117186907
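
The linked post is not reproduced in this thread, so the following is only a sketch of what such a helper could look like. It assumes get2Fold returns the smallest power of two that is greater than or equal to its argument, which matches the "large enough to hold the image" comment in the code below. The class name and the overflow guard are additions for illustration.

```java
final class PowerOfTwoSketch {
    /** Smallest power of two that is >= value (assumed behaviour of get2Fold). */
    static int get2Fold(int value) {
        if (value > (1 << 30)) {
            // 2^30 is the largest power of two an int can hold; beyond it the loop would overflow.
            throw new IllegalArgumentException("value too large: " + value);
        }
        int result = 1;
        while (result < value) {
            result <<= 1; // double until the value fits
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(get2Fold(640)); // prints 1024
        System.out.println(get2Fold(512)); // prints 512
    }
}
```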

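// imports needed by the code below
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.ComponentSampleModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Hashtable;

// variables used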

private BufferedImage       image;
private WritableRaster       raster;
private BufferedImage       texImage;
private byte[]                  iBuffer;
private byte[]                  tBuffer;
private ByteBuffer            textureCompatibleBuffer;
private int[]                   bankOffsets;
private int                        textureWidth = -1;
private int                        textureHeight = -1;
private int                        xScaleUnit;
private int                        yScaleUnit;
private ComponentSampleModel imageSampleModel;
private int                   scanLineStride;
private int                        pixelStride;

public static final ColorModel glAlphaColorModel =
      new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_sRGB),
        new int[] {8,8,8,8},
        true,
        false,
        ComponentColorModel.TRANSLUCENT,
        DataBuffer.TYPE_BYTE);

// start code

// only generate the raster and the buffer the first time
if(raster == null) {
      // generate a texture compatible image size, large enough to hold the image
      textureWidth = get2Fold(image.getWidth());
      textureHeight = get2Fold(image.getHeight());
     
      // generate an interleaved byte-based ARGB raster and image
      raster = Raster.createInterleavedRaster(DataBuffer.TYPE_BYTE,this.textureWidth,this.textureHeight,4,null);
      texImage = new BufferedImage(glAlphaColorModel,raster,false,new Hashtable());
   
      // get references to the backing arrays of the [t]exture buffer
      // and the [i]mage buffer
      iBuffer = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
      tBuffer= ((DataBufferByte) texImage.getRaster().getDataBuffer()).getData();
     
      // get information on how the image is stored in the buffer
      // assign the field (a local declaration here would shadow it and leave the field null)
      imageSampleModel = (ComponentSampleModel) image.getSampleModel();
      scanLineStride = imageSampleModel.getScanlineStride();
      pixelStride = imageSampleModel.getPixelStride();
      bankOffsets = imageSampleModel.getBandOffsets();
     
      // compute 16.16 fixed-point scale factors (16 fractional bits),
      // which let us calculate, for a given x or y on the texture,
      // where the source pixel on the image is
      xScaleUnit = (int) (((double) image.getWidth() / (double) textureWidth) * 65536);
      yScaleUnit = (int) (((double) image.getHeight() / (double) textureHeight) * 65536);
           
      // generate a bytebuffer to store the resulting image
      textureCompatibleBuffer = ByteBuffer.allocateDirect(tBuffer.length);
      textureCompatibleBuffer.order(ByteOrder.nativeOrder());
}      


int adr = 0; // the address in the texture buffer
int xOffset = 0; // the x coordinate in the image
int bufferOffset = 0; // the final buffer offset in the image
int yOffset = 0; // the y coordinate in the image
int yBufferOffset = 0; // a temp value containing the start of the scanline in the image
if(bankOffsets.length > 3) { // RGBA or ABGR images
      for(int y=0; y<textureHeight; y++) {
            xOffset = 0;

            yBufferOffset = (yOffset >> 16) * scanLineStride;
           
            for(int x=0; x<textureWidth; x++) {                  
                  bufferOffset = yBufferOffset  + (xOffset >> 16) * pixelStride;

                  tBuffer[adr++] = iBuffer[bufferOffset + bankOffsets[0]];
                  tBuffer[adr++] = iBuffer[bufferOffset + bankOffsets[1]];
                  tBuffer[adr++] = iBuffer[bufferOffset + bankOffsets[2]];
                  tBuffer[adr++] = iBuffer[bufferOffset + bankOffsets[3]];
     
                  xOffset += xScaleUnit;
            }
            yOffset += yScaleUnit;
      }
} else { // RGB or BGR images
      for(int y=0; y<textureHeight; y++) {
            xOffset = 0;

            yBufferOffset = (yOffset >> 16) * scanLineStride;
           
            for(int x=0; x<textureWidth; x++) {                  
                  bufferOffset = yBufferOffset  + (xOffset >> 16) * pixelStride;

                  tBuffer[adr++] = iBuffer[bufferOffset + bankOffsets[0]];
                  tBuffer[adr++] = iBuffer[bufferOffset + bankOffsets[1]];
                  tBuffer[adr++] = iBuffer[bufferOffset + bankOffsets[2]];
                  tBuffer[adr++] = -1; // -1 signed = 255 unsigned
                 
                  xOffset += xScaleUnit;
            }
            yOffset += yScaleUnit;
      }
}
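textureCompatibleBuffer.rewind();
textureCompatibleBuffer.put(tBuffer, 0, tBuffer.length);
textureCompatibleBuffer.rewind(); // reset the position so bindings that honour it read from the start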
kevglass
« Reply #1 - Posted 2005-05-27 09:07:14 »

Thanks for the code!  Nice work.

However, if you're looking for performance it might be better to write specific functions and take some of the branches out. The RGB vs RGBA branch for instance.

Kev
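
One way to read this suggestion is to pick the pixel format once and then call a dedicated loop, so the inner code contains no format test. The sketch below is an illustration under simplifying assumptions: the source is tightly packed in R,G,B(,A) byte order with no band offsets or scanline padding, unlike the general code above. All names are hypothetical.

```java
final class SplitLoopsSketch {
    /** Nearest-neighbour scale of a tightly packed RGBA source into an RGBA destination. */
    static void scaleRgba(byte[] src, int srcW, int srcH, byte[] dst, int dstW, int dstH) {
        int xStep = (int) (((double) srcW / dstW) * 65536); // 16.16 fixed point
        int yStep = (int) (((double) srcH / dstH) * 65536);
        int adr = 0;
        int yOffset = 0;
        for (int y = 0; y < dstH; y++) {
            int rowStart = (yOffset >> 16) * srcW * 4;
            int xOffset = 0;
            for (int x = 0; x < dstW; x++) {
                int s = rowStart + (xOffset >> 16) * 4;
                dst[adr] = src[s];
                dst[adr + 1] = src[s + 1];
                dst[adr + 2] = src[s + 2];
                dst[adr + 3] = src[s + 3];
                adr += 4;
                xOffset += xStep;
            }
            yOffset += yStep;
        }
    }

    /** Same as scaleRgba, but for a 3-byte RGB source; alpha is written as fully opaque. */
    static void scaleRgb(byte[] src, int srcW, int srcH, byte[] dst, int dstW, int dstH) {
        int xStep = (int) (((double) srcW / dstW) * 65536);
        int yStep = (int) (((double) srcH / dstH) * 65536);
        int adr = 0;
        int yOffset = 0;
        for (int y = 0; y < dstH; y++) {
            int rowStart = (yOffset >> 16) * srcW * 3;
            int xOffset = 0;
            for (int x = 0; x < dstW; x++) {
                int s = rowStart + (xOffset >> 16) * 3;
                dst[adr] = src[s];
                dst[adr + 1] = src[s + 1];
                dst[adr + 2] = src[s + 2];
                dst[adr + 3] = (byte) 0xFF; // opaque alpha
                adr += 4;
                xOffset += xStep;
            }
            yOffset += yStep;
        }
    }
}
```

The caller would choose between the two methods once per image, for example based on the number of bands in the sample model.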

tusaki
« Reply #2 - Posted 2005-05-27 09:44:50 »

Thank you for your input.

I separated the initialization and took out the (if/then ARGB) branches, but the speed increase is barely noticeable. The results of my test program:

5000x

with branches
run 1
19578 milliseconds
run 2
19953 milliseconds

without branches
run 1
19531 milliseconds
run 2
19640 milliseconds
Riven
« Reply #3 - Posted 2005-05-27 10:23:40 »

If you want to increase your performance bigtime:

Compare these three ways of doing the same thing:

A (your array-filler)
            srcIndex = 0;
            dstIndex = 0;

            for (int i = 0; i < count; i++)
            {
               dst[dstIndex++] = src[srcIndex++];
               dst[dstIndex++] = src[srcIndex++];
               dst[dstIndex++] = src[srcIndex++];
               dst[dstIndex++] = src[srcIndex++];
            }



B (slightly adjusted, avoiding integer-increments)
            srcIndex = 0;
            dstIndex = 0;

            for (int i = 0; i < count; i++)
            {
               dst[dstIndex] = src[srcIndex];
               dst[dstIndex + 1] = src[srcIndex + 1];
               dst[dstIndex + 2] = src[srcIndex + 2];
               dst[dstIndex + 3] = src[srcIndex + 3];
               dstIndex += 4;
               srcIndex += 4;
            }


C (as B, but unrolled loop)
            srcIndex = 0;
            dstIndex = 0;

            for (int i = 0; i < count; i+=4)
            {
               dst[dstIndex] = src[srcIndex];
               dst[dstIndex + 1] = src[srcIndex + 1];
               dst[dstIndex + 2] = src[srcIndex + 2];
               dst[dstIndex + 3] = src[srcIndex + 3];
               dst[dstIndex + 4] = src[srcIndex + 4];
               dst[dstIndex + 5] = src[srcIndex + 5];
               dst[dstIndex + 6] = src[srcIndex + 6];
               dst[dstIndex + 7] = src[srcIndex + 7];
               dst[dstIndex + 8] = src[srcIndex + 8];
               dst[dstIndex + 9] = src[srcIndex + 9];
               dst[dstIndex + 10] = src[srcIndex + 10];
               dst[dstIndex + 11] = src[srcIndex + 11];
               dst[dstIndex + 12] = src[srcIndex + 12];
               dst[dstIndex + 13] = src[srcIndex + 13];
               dst[dstIndex + 14] = src[srcIndex + 14];
               dst[dstIndex + 15] = src[srcIndex + 15];
               dstIndex += 16;
               srcIndex += 16;
            }


Benchmark (Client VM)
  A: 1978.8ms
  B: 1179.7ms
  C: 0973.7ms

Benchmark (Server VM)
  A: 1525.5ms
  B: 0918.3ms
  C: 0718.7ms

Interesting, huh?

Seems like even the ServerVM cannot optimize it to the same (native) code.

Option C is a bit messy (and risky!), but the difference between A and B is huge, and doesn't take a lot of effort to implement.
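
For anyone who wants to repeat the A versus B comparison, here is a minimal harness. It is a naive sketch rather than a rigorous benchmark: the array size, the repetition counts and the use of System.nanoTime are all assumptions, and the class name is made up. Timings depend heavily on the JVM and the hardware, so the numbers quoted above may not be reproduced.

```java
import java.util.Random;

final class CopyBenchmarkSketch {
    /** Variant A: post-increment on every array access. */
    static void copyA(byte[] src, byte[] dst, int count) {
        int srcIndex = 0;
        int dstIndex = 0;
        for (int i = 0; i < count; i++) {
            dst[dstIndex++] = src[srcIndex++];
            dst[dstIndex++] = src[srcIndex++];
            dst[dstIndex++] = src[srcIndex++];
            dst[dstIndex++] = src[srcIndex++];
        }
    }

    /** Variant B: constant offsets, one index update per group of four bytes. */
    static void copyB(byte[] src, byte[] dst, int count) {
        int srcIndex = 0;
        int dstIndex = 0;
        for (int i = 0; i < count; i++) {
            dst[dstIndex] = src[srcIndex];
            dst[dstIndex + 1] = src[srcIndex + 1];
            dst[dstIndex + 2] = src[srcIndex + 2];
            dst[dstIndex + 3] = src[srcIndex + 3];
            dstIndex += 4;
            srcIndex += 4;
        }
    }

    public static void main(String[] args) {
        int count = 1 << 20; // number of 4-byte pixels (assumed size)
        byte[] src = new byte[count * 4];
        byte[] dst = new byte[count * 4];
        new Random(42).nextBytes(src);
        for (int round = 0; round < 5; round++) { // early rounds double as JIT warm-up
            long t0 = System.nanoTime();
            for (int r = 0; r < 100; r++) {
                copyA(src, dst, count);
            }
            long t1 = System.nanoTime();
            for (int r = 0; r < 100; r++) {
                copyB(src, dst, count);
            }
            long t2 = System.nanoTime();
            System.out.printf("round %d  A: %.1f ms  B: %.1f ms%n",
                    round, (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
    }
}
```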

tusaki
« Reply #4 - Posted 2005-05-27 10:37:12 »

Quote
If you want to increase your performance bigtime:

Compare these three ways of doing the same thing:

Benchmark
  A: 1978.8ms
  B: 1179.7ms
  C: 0973.7ms

Interesting, huh?
Very!

I also experimented with creating a lookup array of integers, basically pre-calculating the offsets in buffer b. However, this method was somehow SLOWER than calculating the offsets on the fly.

I'll try to tweak the copy loop based on the examples you have given me.

Could you explain why using "src++" (example A) is slower than using "src+0, src+1, src+2, src+3" followed by "src += 4" (example B)? Very odd.
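
For reference, the lookup-array experiment described above could be structured roughly as follows. This is a sketch with hypothetical names, not the code that was actually measured: the byte offset within a scanline is precomputed once for each destination x, and the copy loop reads the table instead of shifting and multiplying.

```java
final class OffsetTableSketch {
    /** Byte offset within a source scanline for each destination x (16.16 fixed point). */
    static int[] buildXOffsets(int srcW, int dstW, int pixelStride) {
        int xScaleUnit = (int) (((double) srcW / (double) dstW) * 65536);
        int[] table = new int[dstW];
        int xOffset = 0;
        for (int x = 0; x < dstW; x++) {
            table[x] = (xOffset >> 16) * pixelStride;
            xOffset += xScaleUnit;
        }
        return table;
    }

    /** Copies one RGBA scanline, looking up each source offset in the precomputed table. */
    static void copyRow(byte[] src, int srcRowStart, byte[] dst, int dstRowStart, int[] xOffsets) {
        int adr = dstRowStart;
        for (int x = 0; x < xOffsets.length; x++) {
            int s = srcRowStart + xOffsets[x]; // table read replaces the shift and multiply
            dst[adr++] = src[s];
            dst[adr++] = src[s + 1];
            dst[adr++] = src[s + 2];
            dst[adr++] = src[s + 3];
        }
    }
}
```

The table is built once per image size and reused for every scanline, so it can be timed against the on-the-fly version on the same input.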
Riven
« Reply #5 - Posted 2005-05-27 10:45:24 »

I think... (that's a big fat disclaimer...)


index++
index++
index++
index++


performs four increments, each of which reads [index] and writes it back. They might be expensive and, since each one depends on the previous one, they cannot be executed in parallel (does a non-HT P4 do anything in parallel anyway?)

whereas

index + 0
index + 1
index + 2
index + 3

index +=4;


has only one assignment to [index], and the four accesses could be done in parallel (again, I have no clue whether things like this can be done in parallel by my non-HT Intel CPU)

Alan_W
« Reply #6 - Posted 2005-06-02 21:48:04 »

Very interesting indeed. I did something similar but used short buffers (16-bit colour) so I could write the full colour info with one instruction. I have also done this with integer buffers for ARGB. I don't have the code to hand, but it was something like:

BufferedImage img = new BufferedImage(??);
DataBufferUShort db = (DataBufferUShort)img.getRaster().getDataBuffer();
short[] buffer = db.getData();


I wonder how this compares to the above.

Time flies like a bird. Fruit flies like a banana.
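
The integer-buffer variant mentioned in the last post could look roughly like this. It is a sketch that assumes an image of type BufferedImage.TYPE_INT_ARGB; the constructor arguments of the 16-bit version are not given in the post and are not guessed here. The class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

final class PackedPixelSketch {
    /** Fills an image by writing one packed ARGB int per pixel into its backing array. */
    static BufferedImage filled(int width, int height, int argb) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        // For TYPE_INT_ARGB the raster is backed by a DataBufferInt with one int per pixel.
        int[] pixels = ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = argb; // alpha, red, green and blue written in a single store
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = filled(4, 2, 0xFF102030);
        System.out.println(Integer.toHexString(img.getRGB(3, 1))); // prints ff102030
    }
}
```

Compared with the byte-wise loops earlier in the thread, this does one array store per pixel instead of four. Whether that is faster in practice would need measuring.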