  Java OpenGL Math Library (JOML)  (Read 218783 times)
Riven
« Reply #180 - Posted 2015-07-23 09:16:45 »

Quote:
My vote goes to incrementing position for consistency with NIO.
NIO allows for both absolute and relative I/O. Absolute puts on buffers do not increment the position.


I'm with Spasi on this one: I never liked NIO buffers being stateful. These buffers were meant to allow access to off-heap memory, but they slapped a 'design philosophy' on top that is supposedly convenient. We should (IMHO) think of buffers like primitive arrays, but they designed it like a Stack/List hybrid. As a result it took Sun roughly a decade to get NIO performance in the same ballpark as primitive arrays, sometimes. The generated ASM is still a mess, rather inefficient (compared to primitive array access), but most of this overhead is hidden by memory latency.

Anyhoo, I'm getting offtopic, so let me end with a loaded question: who here would be in favor of stateful arrays? Kiss
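For readers following the relative/absolute distinction: in java.nio, a relative put(float) writes at the current position and advances it, while an absolute put(int, float) writes at the given index and leaves the position untouched. A minimal sketch using only standard FloatBuffer calls:

```java
import java.nio.FloatBuffer;

public class RelativeVsAbsolutePut {
    /** Two relative puts followed by one absolute put into a fresh buffer. */
    static FloatBuffer fill() {
        FloatBuffer fb = FloatBuffer.allocate(8);
        fb.put(1f).put(2f); // relative: writes indices 0 and 1, position advances 0 -> 2
        fb.put(5, 9f);      // absolute: writes index 5, position stays at 2
        return fb;
    }

    public static void main(String[] args) {
        FloatBuffer fb = fill();
        System.out.println("position = " + fb.position()); // 2
        System.out.println("fb.get(5) = " + fb.get(5));    // 9.0 (absolute get, position unchanged)
    }
}
```

This hidden position state is exactly what makes a buffer behave differently from a primitive array, where every access is "absolute".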

Hi, appreciate more people! Σ ♥ = ¾
KaiHH
« Reply #181 - Posted 2015-07-23 10:58:48 »

Personally, I am with both @ra4king and @Spasi here.
I can see the use case of putting multiple subsequent matrices in a single UBO (for example) without manually incrementing the buffer position (for convenience), and also putting a single matrix in a buffer without then having to reset the position (also for convenience).
In fact, putting a series of matrices in a single UBO for uploading to OpenGL was the first use case I thought of when designing the get(ByteBuffer) method in Matrix4f. I also had the same kind of discussion with @Spasi before, when tinkering with how to make the design play nicely with LWJGL and its users by not incrementing the buffer position, which allows reusing a single ByteBuffer for many uploads without ever touching the buffer position.
Another important aspect that, of course, drew the design in a certain direction was compatibility/alignment with existing APIs such as NIO, which, as you said, does increment the buffer position (in its relative put/get operations).
This alignment was another reason I initially wanted the get() operation to behave like NIO's relative get() operation.
Whether NIO's design is flawed or not, I don't know and won't allow myself to judge.
So in the end, it is a matter of which use case is more likely, and which one we want to support with the least client code and the most comfort and convenience.
We could of course, like NIO, have both relative and absolute operations. But this would make the API more complicated by providing two different methods with two different semantics.
It is a hard decision to make.
Riven
« Reply #182 - Posted 2015-07-23 13:04:08 »

Quote:
We could of course, like NIO, have both relative and absolute operations. But this would make the API more complicated by providing two different methods with two different semantics.
It is a hard decision to make.

I'm not so sure this 'complicates' the API. Choice is good™. It also wouldn't grow the codebase much, as the relative I/O would piggyback on the absolute I/O functionality.


If only we had structs, and this whole point was moot...
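The idea that relative I/O can piggyback on absolute I/O can be sketched as follows. `Mat2f` is a hypothetical 2x2 stand-in (not a JOML class), used only to keep the example short: the relative `get(FloatBuffer)` delegates to the absolute `get(int, FloatBuffer)` and then advances the position by the number of floats written, mirroring NIO's own relative/absolute pair.

```java
import java.nio.FloatBuffer;

/** Hypothetical 2x2 column-major matrix (mCR = column C, row R), for illustration only. */
public class Mat2f {
    float m00 = 1, m01 = 0, m10 = 0, m11 = 1; // identity

    /** Absolute: writes 4 floats starting at index; the buffer position is not changed. */
    public FloatBuffer get(int index, FloatBuffer buffer) {
        buffer.put(index,     m00);
        buffer.put(index + 1, m01);
        buffer.put(index + 2, m10);
        buffer.put(index + 3, m11);
        return buffer;
    }

    /** Relative: writes at the current position, then advances it by 4 (NIO-style). */
    public FloatBuffer get(FloatBuffer buffer) {
        get(buffer.position(), buffer);
        buffer.position(buffer.position() + 4);
        return buffer;
    }

    public static void main(String[] args) {
        // Single matrix: absolute write, no position bookkeeping needed afterwards.
        FloatBuffer single = FloatBuffer.allocate(4);
        new Mat2f().get(0, single);
        System.out.println(single.position()); // 0

        // Many matrices: relative writes, then one flip() before handing the buffer off.
        Mat2f[] ms = { new Mat2f(), new Mat2f(), new Mat2f() };
        FloatBuffer many = FloatBuffer.allocate(4 * ms.length);
        for (Mat2f m : ms) {
            m.get(many);
        }
        many.flip();
        System.out.println(many.remaining()); // 12
    }
}
```

The relative variant adds only two lines on top of the absolute one, which supports the point that offering both need not grow the codebase much.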

KaiHH
« Reply #183 - Posted 2015-07-23 13:46:46 »

Okay. So, if for the majority of people it is good to have a relative get(ByteBuffer) and an absolute get(int, ByteBuffer), then I'd be fine with that, since it would be more in line with what NIO itself does today.
So people wanting to upload a single matrix to OpenGL would then do:
Matrix4f m = ...;
FloatBuffer fb = ...;
m.get(0, fb);

and people wanting to load a bunch of matrices would do:
Matrix4f[] ms = ...;
FloatBuffer fb = ...;
for (Matrix4f m : ms) {
  m.get(fb);
}
fb.flip();

I would be fine with that. The price to pay however would be different semantics between JOML and LWJGL.
princec
« Reply #184 - Posted 2015-07-23 14:50:39 »

The original LWJGL vecmath library was just a temporary bodge that sort of got used a bit more than expected...

Cas Smiley

KaiHH
« Reply #185 - Posted 2015-07-23 15:04:40 »

Hm.. could you explain how that relates to the current debate over having relative vs. absolute NIO methods?
I am assuming that, since LWJGL's vecmath library used relative put/get operations, you are wishing that LWJGL hadn't used relative operations but absolute ones instead. So you would be in favour of the absolute methods?
Spasi
« Reply #186 - Posted 2015-07-23 15:35:26 »

If you're looking for more trouble, you could also add "terminal" versions of the various methods that accept a Matrix4f dest. Using theagentd's earlier example:

matrix.translationRotateScale(...).mul(...).get(directBuffer); // get is the terminal operation
matrix.translationRotateScale(...).mul(..., directBuffer); // now mul is the terminal operation

The result of mul is stored in the buffer directly, which eliminates a copy and should result in better performance.
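To illustrate the "terminal operation" idea: instead of computing the product into a matrix and then copying it into the buffer with get(), the multiply writes its result straight into the buffer. `Mat2fTerminal` and its `mul(Mat2fTerminal, FloatBuffer)` overload are hypothetical (a 2x2 stand-in, not JOML's API), and whether the saved copy is measurable in practice would need benchmarking.

```java
import java.nio.FloatBuffer;

/** Hypothetical 2x2 column-major matrix (mCR = column C, row R). */
public class Mat2fTerminal {
    float m00, m01, m10, m11;

    Mat2fTerminal(float m00, float m01, float m10, float m11) {
        this.m00 = m00; this.m01 = m01; this.m10 = m10; this.m11 = m11;
    }

    /**
     * Computes this * right and stores it directly at the buffer's current position
     * (column-major), advancing the position by 4. No intermediate matrix is created.
     */
    FloatBuffer mul(Mat2fTerminal right, FloatBuffer dest) {
        int i = dest.position();
        dest.put(i,     m00 * right.m00 + m10 * right.m01); // column 0, row 0
        dest.put(i + 1, m01 * right.m00 + m11 * right.m01); // column 0, row 1
        dest.put(i + 2, m00 * right.m10 + m10 * right.m11); // column 1, row 0
        dest.put(i + 3, m01 * right.m10 + m11 * right.m11); // column 1, row 1
        dest.position(i + 4);
        return dest;
    }

    public static void main(String[] args) {
        Mat2fTerminal scale = new Mat2fTerminal(2, 0, 0, 3);  // scale x by 2, y by 3
        Mat2fTerminal rot90 = new Mat2fTerminal(0, 1, -1, 0); // 90-degree counter-clockwise rotation
        FloatBuffer fb = FloatBuffer.allocate(4);
        scale.mul(rot90, fb); // product goes straight into the buffer
        fb.flip();
        System.out.println(fb.get(0) + " " + fb.get(1) + " " + fb.get(2) + " " + fb.get(3));
    }
}
```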
princec
« Reply #187 - Posted 2015-07-23 15:49:51 »

Quote:
Hm.. could you explain how that relates to the current debate over having relative vs. absolute NIO methods?
I am assuming that, since LWJGL's vecmath library used relative put/get operations, you are wishing that LWJGL hadn't used relative operations but absolute ones instead. So you would be in favour of the absolute methods?

All I'm saying is that basing your API design decisions on what LWJGL programmers are used to is not really a sound foundation Smiley

Cas Smiley

KaiHH
« Reply #188 - Posted 2015-07-23 16:10:54 »

Quote:
If you're looking for more trouble, you could also add "terminal" versions of the various methods that accept a Matrix4f dest.

The hell? What makes you think I was looking for trouble either for me as implementor or for users using JOML? As I see it, the current debate is also not so much about performance as it is about convenience and meeting a user's expectations.
Regarding the last point, it certainly is a good decision to base certain design decisions on those of widely and successfully used libraries to ease adoption for users.
And I was not referring to LWJGL 2's vecmath either, but to LWJGL's use of NIO buffers throughout its API.
Spasi
« Reply #189 - Posted 2015-07-23 16:21:03 »

Quote:
The hell? What makes you think I was looking for trouble either for me as implementor or for users using JOML? As I see it, the current debate is also not so much about performance as it is about convenience and meeting a user's expectations.

Hey, I was trying to be funny. Tongue

I know how hard it is being bombarded with requests and suggestions when doing open-source work. I contributed to that with one more suggestion, which is more stuff for you to consider (=trouble). It had nothing to do with the current debate.
KaiHH
« Reply #190 - Posted 2015-07-23 16:46:41 »

okay... I was like 'what is he proposing there?'  Grin
No, everything's cool.  Cool
Riven
« Reply #191 - Posted 2015-07-23 16:58:32 »

As a design consideration, it might be a good idea to consider keeping all this I/O stuff out of the core classes, to keep them lean and mean. What are your thoughts on hijacking cylab's idea and moving all NIO related functions to their own, dedicated class. (no, I'm not suggesting involving the Collections API Pointing) At one point you might add a dedicated class for primitive array I/O without having to worry about bloating your core classes once again.

Just throwing it out there Smiley

KaiHH
« Reply #192 - Posted 2015-07-23 17:17:48 »

Quote:
Just throwing it out there Smiley

Oh nooo. Yet another one. I cannot stand it anymore.  Grin

But yes. I would think about what it means for JOML to be lean and mean by estimating how "central" the aspect of putting a matrix in a NIO buffer actually is for JOML. This would raise the question of how likely JOML is going to be used without a Java/OpenGL binding, in such a way that having NIO buffer methods in the matrix classes would actually "pollute" those classes. That I honestly cannot tell. But I guesstimate: not very likely.
Then I would also contrast that with how likely it is that a JOML matrix will need to be converted into other targets, for which I also see no use case currently, given my very limited use-case-estimating abilities. Smiley
So, I would currently like to leave the NIO buffer conversion methods in the matrix classes, because as I see it, that supports the anticipated use case of JOML being used with LWJGL/JOGL.
ra4king
« Reply #193 - Posted 2015-07-23 23:57:05 »

Oh I like Riven's idea: a separate NIO Buffers class that deals with Buffer -> Vector/Matrix and Vector/Matrix -> Buffer. I support this idea, and we can then have both relative and absolute methods there for reading/writing.

After giving this much thought, I realized how annoying the code would be with a separate class. Calling a get(...) on the instance itself is much more elegant, at the cost of "bloat"... however....

Quote:
This would raise the question of how likely JOML is going to be used without a Java/OpenGL binding, in such a way that having NIO buffer methods in the matrix classes would actually "pollute" those classes. That I honestly cannot tell. But I guesstimate: not very likely.
...
So, I would currently like to leave the NIO buffer conversion methods in the matrix classes, because as I see it, that supports the anticipated use case of JOML being used with LWJGL/JOGL.

... Kai is right. This library is meant for use with OpenGL in Java, and the two major bindings, LWJGL and JOGL, both use NIO buffers, so buffer I/O should be integrated into the core math classes.

ra4king
« Reply #194 - Posted 2015-07-24 00:19:45 »

Quote:
One flip() after filling my buffer is better than continuously setting the position, and it is especially less error-prone in case I add another get(buffer) in there in complex code and forget to adjust the position increment properly.

There's probably not a right answer here and comes down to personal opinion. My personal opinion is:

- Having to deal with flip() means having to mentally track two variables; position and limit. This is more complex than dealing with position() only.
- Having a method call mutate its arguments (i.e. changing a buffer's current position) is fundamentally bad practice and bad API design. I don't know about you, but I can reason better about method calls that are free of side effects, where any mutations to my objects are explicit.
- Last but not least, I've spent 12 years of my life reading posts on the LWJGL forum about people forgetting to .flip() a buffer.
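The forgotten-flip() problem in concrete terms: after relative puts, the position sits at the end of the written data and the limit is still the capacity, so anything that consumes the range from position to limit sees the unwritten tail instead of the data. flip() sets limit = position and position = 0. A minimal sketch with standard FloatBuffer calls:

```java
import java.nio.FloatBuffer;

public class FlipPitfall {
    /** Writes 3 floats into a buffer of capacity 16, optionally flipping it afterwards. */
    static FloatBuffer fill(boolean flip) {
        FloatBuffer fb = FloatBuffer.allocate(16);
        fb.put(1f).put(2f).put(3f); // position = 3, limit = 16
        if (flip) {
            fb.flip();              // limit = 3, position = 0
        }
        return fb;
    }

    public static void main(String[] args) {
        // A consumer reading [position, limit) would see 13 unwritten floats here...
        System.out.println("without flip: remaining = " + fill(false).remaining()); // 13
        // ...and exactly the 3 written floats here.
        System.out.println("with flip:    remaining = " + fill(true).remaining());  // 3
    }
}
```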

I understand your points and see where you're coming from, but I'm coming from the standpoint of how NIO was supposed to be used (ignoring Riven's distaste for it Grin). Either way, providing both relative and absolute methods would be best in my opinion.

Quote:
Ah, you misunderstood me, since you actually described my intentions exactly. I did notice that this.m00 is read after dest.m00 has been set, which is why the conditional is there: to make sure dest is not written before this is read again, in case the destination is the same as either operand. I wasn't arguing for the inlining; I was arguing that avoiding a function call by adding a branch is not worth it, as the function call will most likely be inlined by HotSpot anyway.

I must still be misunderstanding you. As I explained above, you cannot use a method call there as that would lead to different semantics (and defeat the optimization).

Ahhh, I missed that part. Hmm, different semantics I agree, but how would the optimization be defeated? this.m00 is still read 3 times with the method call, so that could be optimized just as well. Or did you refer to the optimization of fewer CPU registers being used, as all the values wouldn't need to be stored in registers until they are all evaluated?

Spasi
« Reply #195 - Posted 2015-07-24 09:05:35 »

Quote:
I understand your points and see where you're coming from, but I'm coming from the standpoint of how NIO was supposed to be used (ignoring Riven's distaste for it Grin).

NIO made so many new things possible when it was released (including LWJGL). But it also is just a plain bad API. I mean no disrespect to the people that wrote it, they did what they had to do at the time. I mostly blame the platform/industry that makes it so hard to fix/replace standard APIs.

There are many high-profile developers that are looking for a NIO replacement these days. Look for the sun.misc.Unsafe drama that's been unfolding recently (it's being hidden away in Java 9, without a complete replacement) and you'll find several references to why NIO needs a redesign.

Quote:
Or did you refer to the optimization of fewer CPU registers being used, as all the values wouldn't need to be stored in registers until they are all evaluated?

This. More info:

The optimized code (without the method call) needs a couple of registers to do the whole thing. But it's not massively faster than the alias-sensitive call. One would think that the method call pushes the arguments to 16 registers or the stack. But it's much better than that. The JIT does a fantastic job with reordering operations, such that it works correctly (when argument = dest) and also the registers required are as few as possible. IIRC it does the job in 5 registers, last time I checked. This is also the reason that using Unsafe (e.g. via LibStruct) does not result in better performance*: the JVM never reorders Unsafe accesses and the resulting code is suboptimal.

* pure CPU performance, ignoring the better cache utilization that LibStruct enables
KaiHH
« Reply #196 - Posted 2015-07-24 11:36:44 »

joml-2d
Since I observed that most of the games JGO people make are in 2D, I thought: why not make a variant of JOML that is specifically geared and optimized for 2D? It's in the making and called joml-2d.
I know 2D is just a special case of 3D, in which one direction is projected away so that everything is expressible as a linear combination of the other two base axes.
However, 2D math really only needs 2 classes, Matrix3f and Vector2f, to represent all affine 2D transformations: translation, rotation, scaling and shearing.
This gives a performance gain and is more memory-efficient, since only 3x3 matrices have to be uploaded as mat3 uniforms instead of 4x4 matrices with one axis (e.g. Z) projected away.
Some of you might always decompose the different affine transformations into some "Vector2f translation", "float rotationAngle" and "Vector2f scaling" parts. With joml-2d I am planning to do this like JOML by having a single representation of a transformation, namely the matrix, and then methods on that matrix to set/get the different transformation properties, as well as to "apply" new transformations to existing ones.
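As a rough illustration of why a 3x3 matrix is enough: translation, rotation and scaling compose into a single affine matrix, and a point is transformed with an implied homogeneous coordinate of 1. `Affine2f` and its methods are hypothetical (not the joml-2d API); this sketch stores only the six non-constant entries, since the last row of an affine 3x3 matrix is always (0, 0, 1). Each method post-multiplies, so the most recently added transformation is applied to the point first.

```java
/** Hypothetical 2D affine transform: a column-major 3x3 matrix whose last row is (0, 0, 1). */
public class Affine2f {
    // Column 0 = transformed X axis, column 1 = transformed Y axis, column 2 = translation.
    float m00 = 1, m01 = 0, m10 = 0, m11 = 1, m20 = 0, m21 = 0;

    /** this = this * translation(x, y). */
    Affine2f translate(float x, float y) {
        m20 += m00 * x + m10 * y;
        m21 += m01 * x + m11 * y;
        return this;
    }

    /** this = this * rotation(angle), counter-clockwise, in radians. */
    Affine2f rotate(float angle) {
        float c = (float) Math.cos(angle), s = (float) Math.sin(angle);
        float n00 = m00 * c + m10 * s, n01 = m01 * c + m11 * s;
        float n10 = m10 * c - m00 * s, n11 = m11 * c - m01 * s;
        m00 = n00; m01 = n01; m10 = n10; m11 = n11;
        return this;
    }

    /** this = this * scaling(sx, sy). */
    Affine2f scale(float sx, float sy) {
        m00 *= sx; m01 *= sx;
        m10 *= sy; m11 *= sy;
        return this;
    }

    /** Transforms the point (x, y, 1) and writes the result into dest[0..1]. */
    float[] transformPosition(float x, float y, float[] dest) {
        dest[0] = m00 * x + m10 * y + m20;
        dest[1] = m01 * x + m11 * y + m21;
        return dest;
    }

    public static void main(String[] args) {
        // Scale by 2, then move to (10, 5): the local point (1, 1) lands at (12, 7).
        float[] p = new Affine2f().translate(10, 5).scale(2, 2).transformPosition(1, 1, new float[2]);
        System.out.println(p[0] + ", " + p[1]); // 12.0, 7.0
    }
}
```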
ra4king
« Reply #197 - Posted 2015-07-25 21:25:51 »

Quote:
I understand your points and see where you're coming from, but I'm coming from the standpoint of how NIO was supposed to be used (ignoring Riven's distaste for it Grin).

NIO made so many new things possible when it was released (including LWJGL). But it also is just a plain bad API. I mean no disrespect to the people that wrote it, they did what they had to do at the time. I mostly blame the platform/industry that makes it so hard to fix/replace standard APIs.

There are many high-profile developers that are looking for a NIO replacement these days. Look for the sun.misc.Unsafe drama that's been unfolding recently (it's being hidden away in Java 9, without a complete replacement) and you'll find several references to why NIO needs a redesign.

I really don't think NIO has that bad of an API design though.... but yeah I've been following the Unsafe drama. I hope Oracle puts in a suitable replacement if they do remove it.

Quote:
Or did you refer to the optimization of fewer CPU registers being used, as all the values wouldn't need to be stored in registers until they are all evaluated?

This. More info:

The optimized code (without the method call) needs a couple of registers to do the whole thing. But it's not massively faster than the alias-sensitive call. One would think that the method call pushes the arguments to 16 registers or the stack. But it's much better than that. The JIT does a fantastic job with reordering operations, such that it works correctly (when argument = dest) and also the registers required are as few as possible. IIRC it does the job in 5 registers, last time I checked. This is also the reason that using Unsafe (e.g. via LibStruct) does not result in better performance*: the JVM never reorders Unsafe accesses and the resulting code is suboptimal.

* pure CPU performance, ignoring the better cache utilization that LibStruct enables

Ahhh this makes sense. 5 registers is insane though, I'm curious how it does it!


@KaiHH
Nice idea with the JOML-2D branch.

Now about the earlier discussion with absolute vs relative, should we go ahead with having get(buffer)/set(buffer) be relative while get(index, buffer)/set(index, buffer) be absolute? I'm working on those changes and will submit a pull request if that's the final decision.

KaiHH
« Reply #198 - Posted 2015-07-25 22:38:35 »

Quote:
Ahhh this makes sense. 5 registers is insane though, I'm curious how it does it!

Inlining, simple good old liveness analysis and instruction reordering. See here for liveness analysis: http://www.cs.colostate.edu/~mstrout/CS553/slides/lecture03.pdf.
Once the liveness of the variables is known, the JIT can allocate registers. It only needs at most 'n' registers, where 'n' is the maximum number of variables that are alive at any given moment.
Liveness analysis also outputs "gaps", i.e. instructions where a variable is actually not alive (not needed).
So before actually doing register allocation, other instructions can be reordered before the first or after the last usage of these identified live variables to allow those registers to become free for those instructions.
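The register-count argument can be made concrete with a toy backward liveness pass over straight-line code. This is a textbook sketch, not how HotSpot's JIT is actually implemented, and the `Insn` type and variable names are made up for illustration. Walking backwards, live = (live - def) + uses; for straight-line code, the peak size of the live set is the minimum number of registers needed. The two orderings compute the same (a + b) * (c + d), but moving the loads of c and d after the first add lowers the peak from 4 to 3.

```java
import java.util.*;

public class Liveness {
    /** One straight-line instruction: def = op(uses...). A "load" has a def and no uses. */
    static final class Insn {
        final String def;
        final List<String> uses;
        Insn(String def, String... uses) { this.def = def; this.uses = Arrays.asList(uses); }
    }

    /** Backward liveness pass; returns the peak number of simultaneously live variables. */
    static int maxLive(List<Insn> code, Set<String> liveAtExit) {
        Set<String> live = new HashSet<>(liveAtExit);
        int max = live.size();
        for (int i = code.size() - 1; i >= 0; i--) {
            Insn insn = code.get(i);
            if (insn.def != null) live.remove(insn.def); // not live before its definition
            live.addAll(insn.uses);                      // operands are live before this insn
            max = Math.max(max, live.size());
        }
        return max;
    }

    /** All four loads first, then the arithmetic. */
    static List<Insn> allLoadsFirst() {
        return Arrays.asList(
            new Insn("a"), new Insn("b"), new Insn("c"), new Insn("d"),
            new Insn("t1", "a", "b"),
            new Insn("t2", "c", "d"),
            new Insn("r", "t1", "t2"));
    }

    /** Same computation, reordered so a and b die before c and d are loaded. */
    static List<Insn> reordered() {
        return Arrays.asList(
            new Insn("a"), new Insn("b"), new Insn("t1", "a", "b"),
            new Insn("c"), new Insn("d"), new Insn("t2", "c", "d"),
            new Insn("r", "t1", "t2"));
    }

    public static void main(String[] args) {
        Set<String> exit = Collections.singleton("r");
        System.out.println(maxLive(allLoadsFirst(), exit)); // 4
        System.out.println(maxLive(reordered(), exit));     // 3
    }
}
```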

Quote:
@KaiHH
Nice idea with the JOML-2D branch.

Thanks! Smiley

Quote:
Now about the earlier discussion with absolute vs relative, should we go ahead with having get(buffer)/set(buffer) be relative while get(index, buffer)/set(index, buffer) be absolute? I'm working on those changes and will submit a pull request if that's the final decision.

I'm not quite sure if there is a final decision, or if there can be any.
ra4king
« Reply #199 - Posted 2015-07-26 07:38:07 »

Quote:
Ahhh this makes sense. 5 registers is insane though, I'm curious how it does it!
Inlining, simple good old liveness analysis and instruction reordering. See here for liveness analysis: http://www.cs.colostate.edu/~mstrout/CS553/slides/lecture03.pdf.
Once the liveness of the variables is known, the JIT can allocate registers. It only needs at most 'n' registers, where 'n' is the maximum number of variables that are alive at any given moment.
Liveness analysis also outputs "gaps", i.e. instructions where a variable is actually not alive (not needed).
So before actually doing register allocation, other instructions can be reordered before the first or after the last usage of these identified live variables to allow those registers to become free for those instructions.

Oh wow I need to remember to take some Data-Flow Analysis and Compiler classes! That was incredibly interesting, thanks for the link!

Quote:
@KaiHH
Nice idea with the JOML-2D branch.
Thanks! Smiley

Quote:
Now about the earlier discussion with absolute vs relative, should we go ahead with having get(buffer)/set(buffer) be relative while get(index, buffer)/set(index, buffer) be absolute? I'm working on those changes and will submit a pull request if that's the final decision.
I'm not quite sure if there is a final decision, or if there can be any.

The last thing mentioned in the topic was you showing the example API and sounding decided. I'll add the missing Byte/Float/DoubleBuffer methods and wait for a decision Cheesy

Roquen
« Reply #200 - Posted 2015-07-30 08:58:25 »

To repeat myself for the n-th time: understanding the basics of compilers is uber-useful, and in ways that aren't necessarily obvious.
theagentd
« Reply #201 - Posted 2015-07-31 20:21:55 »

I've lightly started experimenting with converting WSW and Insomnia to JOML. I will edit this post as I find features I'm missing. KaiHH, you may want to add me on Skype so we can chat there (same username as here).

 - Matrix3f/d.set(Matrix4f/d): copies rotation of 4D matrix.

 - Matrix4f/d.getTranslation(Vector3f/d): opposite of setTranslation(). Stores result in provided vector.

 - Matrix4f/d.getScale(Vector3f/d): gets the scale of the X, Y and Z axes. Stores result in provided vector.

 - Vector3f/d.distanceSquared(Vector3f/d): calculates the squared distance between two points.

 - Vector3f/d.lerp(Vector3f/d, float/double alpha): does linear interpolation between two vectors.

 - Vector3f/d.project(Matrix4f/d): same as mul(Matrix4f/d), but also calculates resulting w value and divides by it at the end. Proposed implementation:
    public Vector3d prj(Matrix4d mat, Vector3d dest) {
        // w component of mat * (x, y, z, 1); the result is divided by it (perspective divide)
        double w = mat.m03 * x + mat.m13 * y + mat.m23 * z + mat.m33;
        if (this != dest) {
            dest.x = (mat.m00 * x + mat.m10 * y + mat.m20 * z + mat.m30) / w;
            dest.y = (mat.m01 * x + mat.m11 * y + mat.m21 * z + mat.m31) / w;
            dest.z = (mat.m02 * x + mat.m12 * y + mat.m22 * z + mat.m32) / w;
        } else {
            // dest == this: compute all three components before overwriting x, y and z
            dest.set((mat.m00 * x + mat.m10 * y + mat.m20 * z + mat.m30) / w,
                     (mat.m01 * x + mat.m11 * y + mat.m21 * z + mat.m31) / w,
                     (mat.m02 * x + mat.m12 * y + mat.m22 * z + mat.m32) / w);
        }
        return this;
    }


<<<FATAL BUG>>>: Vector3d.mul(Matrix4d mat, Vector3d dest) DOES NOT ADD THE TRANSLATION! It does NOT assume that w=1.0 despite the JavaDoc saying so! Fixed version:
    public Vector3d mul(Matrix4d mat, Vector3d dest) {
        if (this != dest) {
            dest.x = mat.m00 * x + mat.m10 * y + mat.m20 * z + mat.m30;
            dest.y = mat.m01 * x + mat.m11 * y + mat.m21 * z + mat.m31;
            dest.z = mat.m02 * x + mat.m12 * y + mat.m22 * z + mat.m32;
        } else {
            dest.set(mat.m00 * x + mat.m10 * y + mat.m20 * z + mat.m30,
                     mat.m01 * x + mat.m11 * y + mat.m21 * z + mat.m31,
                     mat.m02 * x + mat.m12 * y + mat.m22 * z + mat.m32);
        }
        return this;
    }


 - Vector3f/d.setLength(float/double): scales the vector to the given length (i.e. multiplies it by lengthArgument/sqrt(x*x+y*y+z*z)).

 - Matrix*f/d.add(Matrix*f/d): sums up each element in the matrices.
 - Matrix*f/d.sub(Matrix*f/d): you get it.
 - Matrix*f/d.fma(Matrix*f/d): multiply and add version (EXTREMELY useful for skeleton animation).

 - Matrix*f/d.scale(Vector*f/d): scale function with a vector parameter.
 - Matrix*f/d.scaling(Vector*f/d): scaling function with a vector parameter.

 - Matrix*f/d.get() functions that work on ByteBuffers instead of Float/DoubleBuffers.

 - Vector3f.mul(Matrix3/4d): to multiply float vectors by double matrices.
 - Vector3f.project(Matrix4d): to project float vectors by double matrices.

 - A function to normalize the rotation part of a 3D or 4D matrix. This is useful since, after generating a normal matrix, the axes are probably not unit length.

<<<FATAL BUG>>>: Matrix4d.rotate() resets the scale of the matrix somehow. I can't figure out how, but it does.
System.out.println(new Matrix4d().scaling(0.0000001).rotate(0, 0, 1, 0));
/*
WRONG:
  1,000E+0  0,000E+0  0,000E+0  0,000E+0
  0,000E+0  1,000E+0  0,000E+0  0,000E+0
  0,000E+0  0,000E+0  1,000E+0  0,000E+0
  0,000E+0  0,000E+0  0,000E+0  1,000E+0
*/


System.out.println(new Matrix4d().rotation(0, 0, 1, 0).scale(0.0000001));
/*
CORRECT:
  1,000E-7  0,000E+0  0,000E+0  0,000E+0
  0,000E+0  1,000E-7  0,000E+0  0,000E+0
  0,000E+0  0,000E+0  1,000E-7  0,000E+0
  0,000E+0  0,000E+0  0,000E+0  1,000E+0
*/



 - Vector3f/d.rotate(Matrix4f/d): Multiplies a vector by only the rotational part of a 4D matrix (= assume w=0) (= what mul() does now due to the bug).

<<<FATAL BUG>>>: The Matrix4d.normal(Matrix3d dest) fast path produces incorrect results. Commenting out the fast path causes the function to calculate correct normal matrices.

<<<FATAL BUG>>>: The Quaternionf.slerp() tends to produce NaN every now and then. I replaced the implementation with that of LibGDX and my performance sky-rocketed. You DEFINITELY need a better implementation.
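A common source of NaN in slerp is acos() receiving a dot product that rounding has pushed slightly outside [-1, 1], or a division by sin(theta) when theta is close to zero. A minimal sketch of a guarded slerp, assuming a hypothetical `Quat` class with public x, y, z, w fields (this is neither JOML's nor LibGDX's code): it takes the shorter arc by flipping the sign, falls back to normalized linear interpolation when the quaternions are nearly parallel (0.9995 is an arbitrary threshold), and computes into locals so that dest may alias this or q.

```java
/** Hypothetical unit quaternion with public fields; for illustration only. */
public class Quat {
    public float x, y, z, w;

    public Quat(float x, float y, float z, float w) { this.x = x; this.y = y; this.z = z; this.w = w; }

    /** Spherical interpolation from this to q by alpha in [0, 1]; the result is written into dest. */
    public Quat slerp(Quat q, float alpha, Quat dest) {
        float dot = x * q.x + y * q.y + z * q.z + w * q.w;
        float sign = 1f;
        if (dot < 0f) {          // q and -q represent the same rotation: take the shorter arc
            dot = -dot;
            sign = -1f;
        }
        float s0, s1;
        if (dot > 0.9995f) {
            // Nearly parallel: sin(theta) ~ 0, so use linear weights and renormalize below.
            s0 = 1f - alpha;
            s1 = alpha;
        } else {
            // dot is in [0, 0.9995] here, so acos gets a valid argument and sin(theta) > 0.
            double theta = Math.acos(dot);
            double sinTheta = Math.sin(theta);
            s0 = (float) (Math.sin((1 - alpha) * theta) / sinTheta);
            s1 = (float) (Math.sin(alpha * theta) / sinTheta);
        }
        s1 *= sign;
        float rx = s0 * x + s1 * q.x, ry = s0 * y + s1 * q.y;
        float rz = s0 * z + s1 * q.z, rw = s0 * w + s1 * q.w;
        float invLen = (float) (1.0 / Math.sqrt(rx * rx + ry * ry + rz * rz + rw * rw));
        dest.x = rx * invLen; dest.y = ry * invLen; dest.z = rz * invLen; dest.w = rw * invLen;
        return dest;
    }
}
```

Interpolating halfway between the identity and a 90-degree rotation about Z should give a 45-degree rotation about Z, i.e. z = sin(22.5°) and w = cos(22.5°); interpolating a quaternion with itself returns it unchanged instead of NaN.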

Myomyomyo.
theagentd
« Reply #202 - Posted 2015-08-01 04:39:13 »

Preliminary performance results:

 - LibGDX: 48 FPS
 - JOML: 68 FPS

Skeleton animation seems to be almost exactly twice as fast. The JOML version uses slerp() from LibGDX. Once I integrate the frustum culling optimizations JOML has, I should be able to squeeze out a tiny bit more performance.

KaiHH
« Reply #203 - Posted 2015-08-02 11:59:28 »

Thanks for your detailed feature requests!

So far the following has been implemented:
- Matrix3.<init> and set() taking Matrix4 - many thanks to @ra4king for contribution!
- Matrix4.getTranslation()
- Vector3.distanceSquared()
- Vector3.lerp()
- Vector3.mulProject() - I did not want to name it "project" since that can be confused with Matrix4.project, which also takes viewport into account
- Matrix.scale/scaling with vector parameter
- Missing Matrix.get(ByteBuffer) methods
- Matrix4.normalize3x3()
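For reference, normalizing the upper-left 3x3 part amounts to rescaling each of the three basis columns to unit length, leaving the translation column and the last row alone. A minimal sketch on a plain column-major float[16] (an assumption for brevity; this is not JOML's Matrix4f code). Note that it does not re-orthogonalize axes that are skewed relative to each other.

```java
import java.util.Arrays;

/** Sketch: rescales the three basis columns of a column-major 4x4 matrix to unit length. */
public class Normalize3x3 {
    static float[] normalize3x3(float[] m) {
        for (int col = 0; col < 3; col++) {
            int o = col * 4; // column 'col' occupies m[o] .. m[o + 3]
            float len = (float) Math.sqrt(m[o] * m[o] + m[o + 1] * m[o + 1] + m[o + 2] * m[o + 2]);
            if (len != 0f) { // leave degenerate (zero-length) axes untouched
                m[o] /= len;
                m[o + 1] /= len;
                m[o + 2] /= len;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // Scale (2, 3, 4) plus translation (5, 6, 7): the axes become unit length, the translation stays.
        float[] m = {
            2, 0, 0, 0,
            0, 3, 0, 0,
            0, 0, 4, 0,
            5, 6, 7, 1 };
        System.out.println(Arrays.toString(normalize3x3(m)));
    }
}
```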

The following has been fixed (thanks for spotting!):
- Vector3.mul(Matrix4)
- Matrix4d.rotate()
- Matrix4.normal(Matrix3) should be fixed now. Fast path should be even faster now. Smiley

The following has been improved:
- I rewrote the implementation of Quaternion.slerp(). Please try.

For the following I would like to propose "workarounds":
- Vector3.setLength(factor) - Please use v.normalize().mul(factor)

For the following I have further questions:
- Matrix.add/sub/fma - I originally removed those methods from JOML since I did not see any geometrical meaning to component-wise summing/diffing two matrices. Maybe you could explain why you need those. Probably you want to use those to compose the translation of two matrices (one of which is otherwise zero everywhere else) together?
theagentd
« Reply #204 - Posted 2015-08-02 13:54:07 »

Vector3.setLength(factor) would be faster than v.normalize().mul(factor). In GLSL-speak: (vector.xyz / length * factor) is 6 instructions. vector.xyz * (factor/length) is only 4. It's a small detail, I guess. Something that would be more useful would be a Vector3*.clampLength(float maximum), which makes sure that the length of the vector isn't above the given value. I use that for clamping motion vectors and velocity vectors to a maximum length.
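The two vector helpers described here could look like this. `Vec3`, `setLength` and `clampLength` are hypothetical (the post only proposes them); the point is that the length is turned into a single scale factor once, so each component costs one multiply rather than a divide plus a multiply, and clamping compares squared lengths so the common "already short enough" case needs no sqrt.

```java
/** Hypothetical minimal 3D vector used to sketch setLength/clampLength. */
public class Vec3 {
    public float x, y, z;

    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    public float length() { return (float) Math.sqrt(x * x + y * y + z * z); }

    /** Scales this vector to newLength: one divide for the factor, then three multiplies. */
    public Vec3 setLength(float newLength) {
        float factor = newLength / length(); // a zero vector yields NaN components
        x *= factor; y *= factor; z *= factor;
        return this;
    }

    /** Shortens this vector to maxLength if it is longer; otherwise leaves it unchanged. */
    public Vec3 clampLength(float maxLength) {
        float lenSq = x * x + y * y + z * z;
        if (lenSq > maxLength * maxLength) { // squared comparison avoids sqrt in the common case
            float factor = maxLength / (float) Math.sqrt(lenSq);
            x *= factor; y *= factor; z *= factor;
        }
        return this;
    }
}
```

For example, new Vec3(3, 4, 0).setLength(10) gives (6, 8, 0), and clamping a velocity of (3, 4, 0) to a maximum length of 2.5 gives (1.5, 2, 0).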

Matrix.fma() is probably not as useful as it was with LibGDX, but it's still my preferred way of doing skeleton animation. In skeleton animation you have up to 4 bone matrices and 4 bone weights. With some GLSL syntax to make it clearer what's happening:
Vector3f inputPosition = ...;
Vector3f result = new Vector3f();
for(int i = 0; i < boneCount; i++){
    result += (boneMatrix[i] * inputPosition) * boneWeight[i];
}

The boneMatrix*inputPosition can be done using a 4x3 multiplication. We do 3 dot products (3 muls + 2 adds) plus add a translation, which equals 6*3=18 instructions per 4x3 matrix multiplication. We also multiply by the bone weight, another 3 muls, and add the result to the result vector, for a total of 18+3+3 = 24 instructions per bone. That's a total of 96 instructions per vertex.

Another way of calculating this is to do something like this:
Vector3f inputPosition = ...;
Matrix4f skinningMatrix = new Matrix4f().zero();
for(int i = 0; i < boneCount; i++){
    skinningMatrix += boneMatrix[i] * boneWeight[i]; //Scalar multiply of matrix
}
Vector3f result = skinningMatrix * inputPosition;

This translates to 16 muls with the bone weight plus 16 adds to the skinning matrix per bone plus 18 instructions for the final multiply at the end, for a total of 146 instructions. Why would you do this? Normals. If you also have normals, the first one needs to do another 4 matrix multiplies for a total of 8. In the second one, you can reuse the calculated skinning matrix for both positions and normals, so in total you do 4 matrix fma()s and 2 bone multiplies, which is faster (192 vs 164 instructions). In addition, as you know LibGDX was not as fast as JOML so this actually translated to a really big performance increase.
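Both skinning approaches give the same position because the transform is linear: the sum of w_i * (M_i * p) equals (sum of w_i * M_i) * p. A small sketch checking this with 4x3 affine bone matrices stored as column-major float[12] (the layout and method names are assumptions for illustration; `fma4x3` here only mirrors the idea of the proposed matrix fma):

```java
import java.util.Arrays;

/** Sketch: blend transformed positions vs. blend the bone matrices first (linear blend skinning). */
public class Skinning {
    /** Transforms point p (implied w = 1) by a column-major 4x3 affine matrix m[0..11]. */
    static float[] transform(float[] m, float[] p) {
        return new float[] {
            m[0] * p[0] + m[3] * p[1] + m[6] * p[2] + m[9],
            m[1] * p[0] + m[4] * p[1] + m[7] * p[2] + m[10],
            m[2] * p[0] + m[5] * p[1] + m[8] * p[2] + m[11] };
    }

    /** acc += other * factor over all 12 entries. */
    static void fma4x3(float[] acc, float[] other, float factor) {
        for (int i = 0; i < 12; i++) acc[i] += other[i] * factor;
    }

    /** Technique 1: transform by every bone, then blend the transformed positions. */
    static float[] blendResults(float[][] bones, float[] weights, float[] p) {
        float[] r = new float[3];
        for (int b = 0; b < bones.length; b++) {
            float[] t = transform(bones[b], p);
            for (int k = 0; k < 3; k++) r[k] += t[k] * weights[b];
        }
        return r;
    }

    /** Technique 2: blend the bone matrices into one skinning matrix, then transform once. */
    static float[] blendMatrices(float[][] bones, float[] weights, float[] p) {
        float[] skin = new float[12]; // starts as the zero matrix
        for (int b = 0; b < bones.length; b++) fma4x3(skin, bones[b], weights[b]);
        return transform(skin, p);    // skin is computed once per vertex and can be reused
    }

    public static void main(String[] args) {
        float[][] bones = {
            { 1, 0, 0,  0, 1, 0,  0, 0, 1,  2, 0, 0 },   // translate by (2, 0, 0)
            { 2, 0, 0,  0, 2, 0,  0, 0, 2,  0, 0, 0 } }; // uniform scale by 2
        float[] weights = { 0.25f, 0.75f };
        float[] p = { 1, 1, 1 };
        System.out.println(Arrays.toString(blendResults(bones, weights, p)));  // [2.25, 1.75, 1.75]
        System.out.println(Arrays.toString(blendMatrices(bones, weights, p))); // [2.25, 1.75, 1.75]
    }
}
```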

Fun fact: GPUs actually have a MULADD (fma) instruction that does a multiply and an add in the same clock cycle. A matrix4x3 * vec4(x, y, z, 1.0) multiply therefore only takes 9 instructions. The same works for fma'ing matrices together, as the weight multiplication can be done together with the summing. In addition, I use a matrix4x3 for the skinning matrix. On a GPU, this means that (with normal transformation) the first technique takes 96 instructions vs. 66 instructions for the second technique.

KaiHH
« Reply #205 - Posted 2015-08-02 14:15:33 »

So, if I understand you correctly, you would like to have a method like the following?
Matrix4f fma4x3(Matrix4f other, float factor) {
  m00 += other.m00 * factor;
  m01 += other.m01 * factor;
  m02 += other.m02 * factor;
  m10 += other.m10 * factor;
  ...
  m32 += other.m32 * factor;
  return this;
}


EDIT:

Matrix4.add/.add4x3, .sub/.sub4x3, .mulComponentWise/.mul4x3ComponentWise and .fma4x3 are implemented (with usual 'this' and 'dest' variants).
Regarding naming, the component-wise mul method is the only one being a bit of an "outsider", because proper matrix multiplication with "mul" and the same signature already existed.
theagentd
« Reply #206 - Posted 2015-08-02 14:18:29 »

Yep! That looks perfect! Functions to add, subtract and multiply values per element would be useful too, both normal and 4x3 versions.

theagentd
« Reply #207 - Posted 2015-08-05 23:32:51 »

Sorry, been busy with WSW's release. I will pick this up again once things have calmed down, but it'll probably take a week or two. We haven't abandoned it; the decision to switch over is still made.

KaiHH
« Reply #208 - Posted 2015-08-06 07:13:09 »

Yeah, no worries. Smiley
Get to it whenever you have time. I'm on vacation for the next two weeks, so I too won't be able to do anything on it during that time.
Roquen
« Reply #209 - Posted 2015-08-06 08:41:16 »

I'm bored (raining..boo), so some comments:

* toString - I assume you want an exact representation - {Float/Double}.toHexString
* rescaling by multiple divides: multiply by the reciprocal instead. The extra rounding step isn't a worry.
* likewise for methods over single types which promote to doubles...do you really care about the extra rounding steps?
* compositing of operations which are disallowed transforms for the compiler.
* arccos and arcsin are always removable.
* you might want to let the various inliners do the work for you. Smaller classfiles, lower load/link/verify times, and the first compile will happen faster (assuming more than one variant is actually used in a project). Examples: the various normalize() and normalize(..) variants.
* although ugly, static methods which manipulate values in a float/double array at offsets.
* 2d vectors - maybe abuse the math and include complex number functions (1).
* slerp/nlerp - let the user pass in the closest pair (forget checking the dot and branching).
* nlerp - renormalize instead of normalize.
* branching to ensure input != output for most types is a model based on the days when hardware had few registers. This is no longer the case (e.g. Haswell has 168 AVX registers, Sandy Bridge 144). So instead compute results into local vars and store them to dest.

1) I can quickly bang out some missing functionality...but minimally I won't have the free time to do so for at least 4 weeks.
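The last point above, sketched: compute every component into locals first, then store to dest, so the method stays correct when dest == this without an alias check or branch. `Vec2` and its `mul(...)` signature are hypothetical stand-ins (the 2x2 matrix is passed as four column-major floats for brevity); whether this beats the branching version on a given JVM would need measuring.

```java
/** Hypothetical 2D vector used to show alias-safe writes without a (this == dest) branch. */
public class Vec2 {
    public float x, y;

    public Vec2(float x, float y) { this.x = x; this.y = y; }

    /** dest = M * this, where M has columns (m00, m01) and (m10, m11). Safe if dest == this. */
    public Vec2 mul(float m00, float m01, float m10, float m11, Vec2 dest) {
        // Read all inputs into locals before writing anything to dest.
        float rx = m00 * x + m10 * y;
        float ry = m01 * x + m11 * y;
        dest.x = rx;
        dest.y = ry;
        return dest;
    }
}
```

With v = (1, 2), v.mul(0, 1, -1, 0, v) (a 90-degree rotation) yields (-2, 1). Writing dest.x before computing the y component would instead read the already-overwritten x and yield (-2, -2).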