  Java OpenGL Math Library (JOML)  (Read 216449 times)
Offline theagentd
« Reply #60 - Posted 2015-06-25 19:58:17 »

> Personally prefer degrees over radians, just find them easier to wrap my head round when thinking about rotations. Guess it ultimately boils down to personal preference.
Radians is what Java uses. Anything else is a visualization. If you want to use degrees with Java, you should use Math.toRadians(degrees) IMO.

> This is so that OpenGL programmers comfortable with the legacy matrix stack immediately feel at home when using JOML when they decide to switch from OpenGL's legacy matrix stack to doing the math in Java with JOML.
Well, either have OpenGL programmers feel comfortable or Java programmers feel comfortable. In my opinion, Java programmers are the bigger group here, especially since most tutorials nowadays completely skip legacy OpenGL.
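
For illustration, a minimal sketch of the radians-everywhere convention: the math routine takes radians like java.lang.Math does, and a caller who thinks in degrees converts once at the call site with Math.toRadians. The class and method names are made up for this example and are not part of JOML.

```java
final class AngleSketch {
    /** Rotates the 2D point (x, y) about the origin; the angle is in radians. */
    static double[] rotate(double x, double y, double angleRadians) {
        double c = Math.cos(angleRadians);
        double s = Math.sin(angleRadians);
        return new double[] { c * x - s * y, s * x + c * y };
    }

    public static void main(String[] args) {
        // Degrees only appear at the call site and are converted immediately.
        double[] p = rotate(1.0, 0.0, Math.toRadians(90.0));
        System.out.println(p[0] + ", " + p[1]);
    }
}
```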

Myomyomyo.
Offline Roquen

JGO Kernel


Medals: 518



« Reply #61 - Posted 2015-06-26 13:29:11 »

If anyone feels like implementing any quaternion functions...then please start a thread.  Because virtually every paper on the subject is full of shit and I've never seen a publicly available library implement anything remotely useful.  For real...I'm totally not joking.

Offline KaiHH

JGO Kernel


Medals: 798



« Reply #62 - Posted 2015-06-26 13:57:07 »

Thanks for your feedback!
As said earlier, I would like for anyone having any feature requests, enhancements or bugs, to post them as issues on GitHub. I would like to close that whole topic on JGO now.
Thanks again to all people that did provide valuable and constructive input to the development of JOML!
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #63 - Posted 2015-07-08 08:26:48 »

Just one note: JOML now uses radians consistently everywhere. See this issue.
Offline theagentd
« Reply #64 - Posted 2015-07-08 13:34:53 »

Awesome!

Myomyomyo.
Offline gouessej
« Reply #65 - Posted 2015-07-09 13:17:06 »

Hi

It would be interesting to run those tests on your library to compare its performance to others:
http://lessthanoptimal.github.io/Java-Matrix-Benchmark/

There are already numerous libraries similar to JOML.

Julien Gouesse | Personal blog | Website | Jogamp
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #66 - Posted 2015-07-09 13:27:33 »

It looks like those are all general-purpose linear algebra libraries, with functions to solve linear systems of equations, doing QR- and LU-decompositions and such, which JOML does not feature.
JOML is a special-purpose library for 4x4 and 3x3 single and double-precision floating point matrices with a limited set of functions operating on them that are generally useful in 3D applications.
In that regard, JOML is rather comparable to javax.vecmath or the math classes provided in libGDX.
Offline gouessej
« Reply #67 - Posted 2015-07-09 20:46:38 »

In my humble opinion, a benchmark would still be welcome.

Julien Gouesse | Personal blog | Website | Jogamp
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #68 - Posted 2015-07-09 21:06:43 »

Well, I would certainly be happy if someone conducts one. :)

But of course I do some testing myself all the time and so far I expect every (non-trivial) method in JOML to beat the counterpart in libGDX by some factor > 2 (which is not hard actually) and sometimes even by orders of magnitude.

The latter is especially the case for methods in libGDX's Frustum class, because most methods in JOML are very cache- and inline-friendly and have low register pressure and contain no method invocations themselves.

Especially the now heavily optimized, inlined and unrolled Matrix4.isPointInsideFrustum() and isSphereInsideFrustum() methods can handle about 50 million! invocations in under 6 milliseconds for both cases where the point/sphere is inside the frustum and where it is not.

With isAabInsideFrustum() the numbers are about 100 milliseconds for 5 million invocations for boxes that intersect the frustum and about 40 milliseconds for boxes that do not.
Currently, that method is a modified implementation of "2.4 Basic intersection test" from this site.
Offline theagentd
« Reply #69 - Posted 2015-07-09 23:09:59 »

I'm trying to write a small benchmark, but I'm missing the create-matrix-from-translation+orientation+scale function I need for my skeleton animation. Without it I'll have to manually construct the matrix which will be much slower.

Myomyomyo.
Offline theagentd
« Reply #70 - Posted 2015-07-10 03:16:42 »

Some initial benchmarks:

Test | LibGDX | JOML
Construct matrix from translation+quat+scale | 77 248k bps | 65 133k bps
Full bone construction (mul) | 12 660k bps | 25 146k bps
Full bone construction (mul4x3) | 12 660k bps | 29 842k bps
Construct matrix + invert | 11 431k bps | 15 610k bps
*bps = Bones per second

The lack of an optimized function to create a matrix from a translation, orientation and scale makes LibGDX slightly faster at that test, but once you throw in a multiply with a bind-pose matrix to construct a "full" bone JOML wins easily, being ~2x faster. The optimized mul4x3() function of JOML gets us even further, giving us 2.36x better performance than LibGDX. That's really surprising considering LibGDX actually has native code for accelerating stuff like this. Seems like the overhead outweighs the gains there. In the matrix inversion test, JOML is 1.37x faster. My guess is that most of these gains come from the fact that LibGDX matrices store their values in an array while JOML uses normal variables. That's one less cache miss and apparently less overhead when accessing each matrix element.

The results were identical between the two libraries as far as I could tell.

I'm not a big fan of the way JOML prints floats in its toString() methods. It always shows them in scientific (power-of-10) notation, which makes it hard to get an overview at a glance.

Gonna compare the frustum culling code you have with the one I've written myself and see if there are any improvements there tomorrow. I suspect there are.

Further suggestions:
 - That matrix construction function would also be useful for getting up to par with LibGDX in the first test.
 - I see you made a multiply-and-add function (fma())! Awesome! It only takes in two vectors though, so one that takes in a float for the multiplier would be nice, fma(Vector3, float).
 - The arguments of fma() could be given better names. They're currently (v1, v2), which says nothing about what they do. May I suggest "add" and "multiplier" for example?
 - Scalar version of Vector*.add() and sub() would be nice too. I have at least one place in my code that does that.
 - Many functions in Quaternion also implicitly normalize the quaternion. The weirdest one is invert(), but many others do too. I believe the user should be in charge of normalizing quaternions and providing normalized inputs to functions that need it.
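
To make the fma() and scalar add()/sub() suggestions concrete, here is a rough sketch on a stand-alone vector class. Vec3Sketch, the parameter names and the exact semantics (this += a * b) are assumptions for illustration, not JOML's actual signatures.

```java
final class Vec3Sketch {
    float x, y, z;

    Vec3Sketch(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    /** this += a * b, component-wise, with two vector operands. */
    Vec3Sketch fma(Vec3Sketch a, Vec3Sketch b) {
        x += a.x * b.x; y += a.y * b.y; z += a.z * b.z;
        return this;
    }

    /** this += multiplier * v, with a scalar multiplier (the requested overload). */
    Vec3Sketch fma(float multiplier, Vec3Sketch v) {
        x += multiplier * v.x; y += multiplier * v.y; z += multiplier * v.z;
        return this;
    }

    /** Adds the same scalar to every component. */
    Vec3Sketch add(float s) { x += s; y += s; z += s; return this; }

    /** Subtracts the same scalar from every component. */
    Vec3Sketch sub(float s) { x -= s; y -= s; z -= s; return this; }
}
```

A typical use of the scalar overload would be an integration step such as position.fma(deltaTime, velocity).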

Questions:
 - A number of quaternion functions seem to internally use doubles. Is there a reasoning behind that?

Myomyomyo.
Offline Riven
Administrator

« JGO Overlord »


Medals: 1371
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #71 - Posted 2015-07-10 07:34:34 »

Hotspot regularly hoists fields into local variables. Once that has happened, it will optimize localvar access aggressively, converting lots of raw memory-operations into purely register-operations. This 'hoisting' is not performed as often (in Hotspot) for array operations, as their access patterns are harder to analyze and predict. My guess is that that causes the majority of the difference between conceptually similar operations in LibGDX and JOML. The additional indirection is most likely hidden behind main-memory/cache latency, which is orders of magnitude greater.

TL;DR: try to get Hotspot to keep intermediary results in registers by using instance fields and (therefore) local variables, as long as the cache thrashing caused by objects does not become the bottleneck.
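
To show the two storage layouts side by side, here is a small sketch that uses 2x2 matrices to stay short. Both classes are hypothetical and only mirror the array-backed versus field-backed idea; they are not code from LibGDX or JOML, and the sketch makes no performance claim of its own.

```java
final class MatrixLayoutSketch {
    /** Elements live in an array; index = column * 2 + row. */
    static final class ArrayMat2 {
        final float[] m = new float[4];

        ArrayMat2 set(float m00, float m01, float m10, float m11) {
            m[0] = m00; m[1] = m01; m[2] = m10; m[3] = m11;
            return this;
        }

        /** dest = this * right; every element access goes through the array. */
        ArrayMat2 mul(ArrayMat2 right, ArrayMat2 dest) {
            float[] a = m, b = right.m;
            return dest.set(a[0] * b[0] + a[2] * b[1],
                            a[1] * b[0] + a[3] * b[1],
                            a[0] * b[2] + a[2] * b[3],
                            a[1] * b[2] + a[3] * b[3]);
        }
    }

    /** Elements are plain instance fields named mCR (column C, row R). */
    static final class FieldMat2 {
        float m00, m01, m10, m11;

        FieldMat2 set(float m00, float m01, float m10, float m11) {
            this.m00 = m00; this.m01 = m01; this.m10 = m10; this.m11 = m11;
            return this;
        }

        /** dest = this * right; same arithmetic, but on fields only. */
        FieldMat2 mul(FieldMat2 right, FieldMat2 dest) {
            return dest.set(m00 * right.m00 + m10 * right.m01,
                            m01 * right.m00 + m11 * right.m01,
                            m00 * right.m10 + m10 * right.m11,
                            m01 * right.m10 + m11 * right.m11);
        }
    }
}
```

The arithmetic is identical in both versions; only where the operands come from differs, which is the part the JIT treats differently according to the explanation above.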

Offline KaiHH

JGO Kernel


Medals: 798



« Reply #72 - Posted 2015-07-10 08:34:01 »

Matrix4.translationRotateScale() is in.

It's the same as doing translation().rotate().scale(), just in a single method where I built the method by doing the whole three transformations first and then reducing operations that were known to produce ones or zeroes.

(edit: thanks for suggesting to add this little method! It's about 60x faster than doing the three steps manually. 90 million invocations with some translation, rotation and scaling now only take 40 milliseconds... compared to 2.4 seconds with the manual approach.)
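
For readers who want to see what "doing the three transformations first and then reducing" ends up as, here is a sketch of a combined translation * rotation * scale builder. It is an independent reconstruction, not JOML's source: the class name, the method name composeTrs and the float[16] column-major return type are assumptions, and the quaternion is assumed to be normalized by the caller.

```java
final class TrsSketch {
    /**
     * Builds T * R * S directly as a column-major 4x4 matrix
     * (array index = column * 4 + row), skipping all products
     * that are known to be zero or one.
     */
    static float[] composeTrs(float tx, float ty, float tz,
                              float qx, float qy, float qz, float qw,
                              float sx, float sy, float sz) {
        float xx = qx * qx, yy = qy * qy, zz = qz * qz;
        float xy = qx * qy, xz = qx * qz, yz = qy * qz;
        float wx = qw * qx, wy = qw * qy, wz = qw * qz;
        return new float[] {
            // column 0: rotated x axis, scaled by sx
            (1f - 2f * (yy + zz)) * sx, 2f * (xy + wz) * sx, 2f * (xz - wy) * sx, 0f,
            // column 1: rotated y axis, scaled by sy
            2f * (xy - wz) * sy, (1f - 2f * (xx + zz)) * sy, 2f * (yz + wx) * sy, 0f,
            // column 2: rotated z axis, scaled by sz
            2f * (xz + wy) * sz, 2f * (yz - wx) * sz, (1f - 2f * (xx + yy)) * sz, 0f,
            // column 3: translation
            tx, ty, tz, 1f
        };
    }
}
```

With a 90 degree rotation about z, a scale of (2, 3, 4) and a translation of (10, 20, 30), the point (1, 0, 0) is first scaled to (2, 0, 0), rotated to (0, 2, 0) and then moved to (10, 22, 30), which matches column 0 plus column 3 of the result.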

And as Riven pointed out, the only reason why JOML is slightly faster is likely due to that field/register optimization. JOML doesn't do anything special or clever there, because there is simply no other way that you can implement 4x4 matrix multiplication or inversion differently than what JOML or libGDX or any other library does, on the arithmetics side. :)

However, I do believe that with the is*InsideFrustum() methods, JOML took the fastest possible road (not needing to build a Frustum class or a Plane class or normalizing the plane 'normals', or building the planes out of NDC-unprojected points, etc.), while still being generic (coping with arbitrary matrices) and not making use of temporal coherency (that would be the next step in optimization, also proposed by the paper I implemented the algorithm from).
But that optimization falls more in the realm of a real game engine.
So, make sure to use the latest HEAD when benchmarking JOML on these functions.
Offline Roquen

JGO Kernel


Medals: 518



« Reply #73 - Posted 2015-07-10 08:47:17 »

Implicitly normalizing quaternions is a bad idea.
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #74 - Posted 2015-07-10 12:28:06 »

The groundwork for temporal coherency caching of AABB-frustum intersection tests is in now!

There is an additional method `isAabInsideFrustumMasked()` which takes a bitmask of the 6 possible planes to check against the box.
It linearly scales in the number of active planes in that bitmask.

Now, the semantics of `isAabInsideFrustum()` have changed: instead of returning just true or false, it returns the index of the first tested plane that culled the box.
This index can now serve as the plane mask (applied via 1<<index) into `isAabInsideFrustumMasked()` which will then only check if that plane "still" culls the box.
Ogre also does something like this and it has the potential to dramatically speed up AABB-frustum culling when applied in a game.
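
A rough sketch of how such a plane index and plane mask could work together. This is not JOML's implementation: the class, the method names and the fixed plane set (the cube [-1, 1]^3, i.e. the frustum of an identity matrix) are assumptions chosen to keep the example self-contained.

```java
final class MaskedAabbCullSketch {
    // Six planes (a, b, c, d); a point is on the inner side when a*x + b*y + c*z + d >= 0.
    static final float[][] PLANES = {
        { 1, 0, 0, 1 }, { -1, 0, 0, 1 },
        { 0, 1, 0, 1 }, { 0, -1, 0, 1 },
        { 0, 0, 1, 1 }, { 0, 0, -1, 1 },
    };
    static final int ALL_PLANES = 0x3F;

    /**
     * box = { minX, minY, minZ, maxX, maxY, maxZ }.
     * Returns the index of the first plane selected by mask that culls the box,
     * or -1 if none of the selected planes culls it.
     */
    static int cullingPlane(float[] box, int mask) {
        for (int i = 0; i < 6; i++) {
            if ((mask & (1 << i)) == 0) continue; // plane not selected
            float[] p = PLANES[i];
            // Only the box corner furthest along the plane normal needs testing.
            float x = p[0] >= 0 ? box[3] : box[0];
            float y = p[1] >= 0 ? box[4] : box[1];
            float z = p[2] >= 0 ? box[5] : box[2];
            if (p[0] * x + p[1] * y + p[2] * z + p[3] < 0) return i;
        }
        return -1;
    }

    /** Temporal coherency: first retry the plane that culled this box last frame. */
    static int cullingPlaneCached(float[] box, int lastPlane) {
        if (lastPlane < 0) return cullingPlane(box, ALL_PLANES);
        int hit = cullingPlane(box, 1 << lastPlane);
        if (hit >= 0) return hit; // still culled: one plane test instead of up to six
        return cullingPlane(box, ALL_PLANES & ~(1 << lastPlane));
    }
}
```

The caller would store the returned index per object and pass it back in on the next frame; an object that stays outside the same plane then costs a single plane test.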
Offline theagentd
« Reply #75 - Posted 2015-07-10 13:15:02 »

> Hotspot regularly hoists fields into local variables. Once that has happened, it will optimize localvar access aggressively, converting lots of raw memory-operations into purely register-operations. This 'hoisting' is not performed as often (in Hotspot) for array operations, as their access patterns are harder to analyze and predict. My guess is that that causes the majority of the difference between conceptually similar operations in LibGDX and JOML. The additional indirection is most likely hidden behind main-memory/cache latency, which is orders of magnitude greater.
>
> TL;DR: try to get Hotspot to keep intermediary results in registers by using instance fields and (therefore) local variables, as long as the cache thrashing caused by objects does not become the bottleneck.
I intentionally work on a public static array of 4096 translations, scales, etc to get around that, since that's more like what I do in practice. That being said, it seemed much easier for JOML to get stack allocation than LibGDX. Before I made them static variables, JOML could easily be 10x faster than LibGDX.

> And as Riven pointed out, the only reason why JOML is slightly faster is likely due to that field/register optimization. JOML doesn't do anything special or clever there, because there is simply no other way that you can implement 4x4 matrix multiplication or inversion differently than what JOML or libGDX or any other library does, on the arithmetics side. :)
I'm fairly sure that because LibGDX's matrices rely on an internal 16-element array to hold its matrix elements, it gets some overhead from that. Hotspot can most likely optimize JOML's function better.

It'd be nice if you could give me some feedback on my other suggestions and questions as well.

Myomyomyo.
Offline theagentd
« Reply #76 - Posted 2015-07-10 14:17:13 »

Just took a minute to try out the new translationRotateScale() method.
 - The name is inconsistent. translationRotationScale() would be more in line with the other method names that set the matrix to a given value.
 - The arguments are weird. `float tx, float ty, float tz, Quaternionf quat, float sx, float sy, float sz` should be `float tx, float ty, float tz, float qx, float qy, float qz, float qw, float sx, float sy, float sz`, and you should add another convenience method that takes in Vector3f and Quaternionf arguments instead: `Vector3f translation, Quaternionf rotation, Vector3f scale`.

Myomyomyo.
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #77 - Posted 2015-07-10 14:26:20 »

translationRotateScale was chosen because it reflects the intermediate operations translation().rotate().scale() very nicely, and thus can be thought of as a condensed form of calling these three methods, which I find quite nice to read.
And if in the future there are more condensed methods for other combinations, they can follow that scheme, too. So people switching from the "chain" of intermediate operations to the "condensed" form just need to erase those dots. :)
I agree that the method should use all-primitives and all-objects overloads, though.
Offline theagentd
« Reply #78 - Posted 2015-07-10 14:52:15 »

New results using the new translationRotateScale()!
Test | LibGDX | JOML | Speedup
Construct matrix from translation+quat+scale | 78 310k bps | 86 339k bps | 10.3%
Full bone construction (mul4x3) | 12 171k bps | 32 110k bps | 163.8%
Construct matrix + invert | 11 386k bps | 16 633k bps | 46.1%

LibGDX's matrix multiply was slower today... >___> Anyway, the gains are real. Looking at the source code of the LibGDX version of translationRotateScale(), the math done is identical. The only difference is that LibGDX keeps the matrix elements in an array while JOML uses simple fields, and that alone apparently gives JOML a 10.3% boost in performance. Real nice.

Myomyomyo.
Offline theagentd
« Reply #79 - Posted 2015-07-10 16:53:23 »

Culling results:
Test | My culler | % visible | JOML | % visible
Point culling | 67 010k | 1.243% | 68 326k | 1.243%
Sphere culling | 65 962k | 1.6414% | 66 992k | 10.3498%
AABB culling | 40 193k | 1.483% | 48 327k | 98.517006%

Well, it's a tiny bit faster than mine, but also extremely prone to false positives for sphere and AABB culling.

I'm starting to doubt the usefulness of this. Many of the values can be precomputed for better performance in a separate Culler class. The improved performance comes from unrolling the plane loop (I store each plane as a small Plane object), and you can get even further if you precompute the planes I think. In addition, you often want to do a distance based test first to eliminate 90% of all points, and when I enable that my culler actually wins easily. I also believe that to correctly cull volumes (spheres/AABBs) you need to have normalized planes or you'll get both false positives and negatives.

EDIT: Spheres had a diameter of 2 and AABBs a side of 2.

Myomyomyo.
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #80 - Posted 2015-07-10 17:02:47 »

Could you please provide an example of your test fixture or a frustum setup where the sphere and the AABB culling provide a wrong result? Thanks a lot for all that testing!
EDIT: Oh, yes you are totally right with the plane equations needing to be normalized when doing calculations with distances! I just reproduced it with a single plane equation and it totally did not work. :)
Yes, in that case it is not really useful when the plane normals would always be recomputed for each invocation. A cached normal in a Plane class would be way better then! Thanks for spotting!
EDIT2: I'll fix the methods so that they all work in the same "measure." They may not be the fastest methods then, but maybe still useful as some "convenience" methods.
Offline theagentd
« Reply #81 - Posted 2015-07-10 17:42:15 »

If possible, keep the planes as field variables. Manually unrolling the loop did have an impact as you can see. Simply creating a FrustumCuller class that you can create from/update with a matrix (or more than one matrix) would be useful. That's what I have. Another really important thing is distance-based culling as well. Here's my FrustumCuller class for reference: http://www.java-gaming.org/?action=pastebin&id=1307. Scavenge what you want from it, but like I said, making the planes field variables instead of Plane objects does have a noticeable impact.

Note that the ability to pass in a max distance is extremely useful in many cases, for example when culling objects with a maximum render distance. Another vital feature for me is the ability to push planes a certain distance, which is useful when rendering shadow maps. If you perfectly fit a shadow map to the frustum, you may still want to render things outside the frustum using GL_DEPTH_CLAMP to make sure they still cast shadows without wasting precision in the shadow map.
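
The two features described above can be sketched as follows. This is not the FrustumCuller class from the pastebin link; the class and method names are hypothetical, and pushing a plane by adding to d assumes a unit-length plane normal that points into the frustum.

```java
final class DistanceCullSketch {
    /** Cheap early-out: true if the point is further from the camera than maxDistance. */
    static boolean beyondMaxDistance(float camX, float camY, float camZ,
                                     float x, float y, float z, float maxDistance) {
        float dx = x - camX, dy = y - camY, dz = z - camZ;
        // Compare squared lengths to avoid a square root.
        return dx * dx + dy * dy + dz * dz > maxDistance * maxDistance;
    }

    /**
     * Returns a copy of plane (a, b, c, d) moved outward by distance.
     * Only valid if (a, b, c) has unit length, so that
     * a*x + b*y + c*z + d is a signed distance.
     */
    static float[] push(float[] plane, float distance) {
        return new float[] { plane[0], plane[1], plane[2], plane[3] + distance };
    }

    /** True if the point lies on the inner side of the plane. */
    static boolean isPointInside(float[] plane, float x, float y, float z) {
        return plane[0] * x + plane[1] * y + plane[2] * z + plane[3] >= 0f;
    }
}
```

For the shadow map case, pushing a side plane outward keeps casters that sit just outside the fitted frustum from being culled.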

Myomyomyo.
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #82 - Posted 2015-07-10 17:48:02 »

Okay, thanks!
I fixed the Matrix4.isSphereInsideFrustum() at least. From what I see, it now works for unnormalized plane equations, too, albeit of course a lot slower due to Math.sqrt().
Could you check it again for correctness?

Also, I cannot seem to find an issue with the AABB test. It's literally implemented by the algorithm described in http://www.cescg.org/CESCG-2002/DSykoraJJelinek/ (section 2.4).
And it also does not make use of distance measures.
Note that this method now returns -1 (as declared by the JavaDocs) if the box is inside the frustum and a value greater than or equal to 0 for the plane index that culled the box if it does not intersect the frustum.
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #83 - Posted 2015-07-10 17:56:18 »

> Note that this method now returns -1 (as declared by the JavaDocs) if the box is inside the frustum and a value greater than or equal to 0 for the plane index that culled the box if it does not intersect the frustum.

Haha! That could be the reason why your percentage measure of JOML's AABB test is actually the opposite of your algorithm's :D 98.517% and 1.483% add up to 100%. :)
Could you invert the measure of JOML's test?
Offline pitbuller
« Reply #84 - Posted 2015-07-10 19:26:11 »

Planes don't actually need to be normalized if you never ask actual distance from the plane.
http://iquilezles.org/www/articles/frustum/frustum.htm
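
A small sketch of why both statements in this thread hold: the sign of the plane equation does not change when the plane is scaled, so point tests work on unnormalized planes, while a sphere test compares against the radius and therefore needs a true distance. The class and method names are hypothetical.

```java
final class PlaneNormalizationSketch {
    /** Value of plane (a, b, c, d) at a point; a true signed distance only if |(a, b, c)| == 1. */
    static float eval(float[] p, float x, float y, float z) {
        return p[0] * x + p[1] * y + p[2] * z + p[3];
    }

    /** Returns a copy of the plane scaled so that its normal has unit length. */
    static float[] normalize(float[] p) {
        float invLen = (float) (1.0 / Math.sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]));
        return new float[] { p[0] * invLen, p[1] * invLen, p[2] * invLen, p[3] * invLen };
    }

    /** Only the sign matters, so this works for any positive scaling of the plane. */
    static boolean isPointInside(float[] p, float x, float y, float z) {
        return eval(p, x, y, z) >= 0f;
    }

    /** Compares against the radius, so the plane must be normalized. */
    static boolean isSphereNotOutside(float[] p, float x, float y, float z, float radius) {
        return eval(p, x, y, z) >= -radius;
    }
}
```

For the plane x >= 0 stored as (10, 0, 0, 0), a sphere of radius 1 centered at x = -0.5 touches the inner side, but the unnormalized test evaluates to -5 and rejects it; after normalize() the value is -0.5 and the sphere is kept.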
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #85 - Posted 2015-07-10 19:42:06 »

Cool, thanks for that link, pitbuller.
Another thing: I just built a simple Culler class storing the plane normals as four Vector4f instances and then doing the frustum culling methods on them. Turned out okay.
Then just for the heck I eliminated those four Vector4f instances and stored the 16 float values as instance fields. That alone consistently brought a speedup of roughly 17%...
Could 4 additional GC marks and class pointers really have thrashed the L1 cache...? Unbelievable :)
Offline Riven
Administrator

« JGO Overlord »


Medals: 1371
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #86 - Posted 2015-07-10 19:48:40 »

This is most likely not related to cache thrashing. Get yourself a debug JDK build and dump the (x86) ASM. That 17% diff most likely reflects completely different native code, not just a few changes.

Offline KaiHH

JGO Kernel


Medals: 798



« Reply #87 - Posted 2015-07-10 19:52:56 »

Yeah, I am already using hsdis-amd64.dll to show the generated code and once cheered at first sight as I saw that HotSpot was emitting SSE code, but then this was just scalar code... so not really SIMD.
I sooo wish for a (possibly internal) API exposing SSE intrinsics as Java methods. As vector types maybe float[], which are internally converted to SSE vector types.
Offline theagentd
« Reply #88 - Posted 2015-07-10 19:56:25 »

Sphere culling has been fixed in your latest release and produces identical culling results to my own code. The AABB culling difference was indeed an inverted result on my end. All three methods now cull the exact same amount as my code. Point culling is ~1.5% faster than my code, but sphere culling is only ~58% as fast as my code. Your AABB culling code is ~20% faster than mine though.

Myomyomyo.
Offline KaiHH

JGO Kernel


Medals: 798



« Reply #89 - Posted 2015-07-10 19:59:33 »

Thanks for checking again!
Yes, that sphere culling code is now really really bad :) , as with non-cached plane normals the plane equations have to be renormalized on every invocation. :)
You might give the new Culler class a try, which has an identical interface but caches the plane normals now on creation and updates it in its set(Matrix4f) method.
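
A minimal sketch of the caching idea, assuming a column-major float[16] view-projection matrix and OpenGL-style clip space (-w <= x, y, z <= w). It is not the Culler class from JOML; the names are hypothetical, and the planes are kept in a flat array for brevity even though this thread measured plain fields to be faster.

```java
final class CachedPlaneCullerSketch {
    // 6 planes x (a, b, c, d), each with a unit-length normal pointing inward.
    private final float[] planes = new float[24];

    /** Extracts and normalizes the six frustum planes once per matrix update. */
    CachedPlaneCullerSketch set(float[] m) {
        for (int i = 0; i < 6; i++) {
            int row = i >> 1;                      // 0: x, 1: y, 2: z
            float sign = (i & 1) == 0 ? 1f : -1f;  // even: row3 + row, odd: row3 - row
            float a = m[3] + sign * m[row];
            float b = m[7] + sign * m[4 + row];
            float c = m[11] + sign * m[8 + row];
            float d = m[15] + sign * m[12 + row];
            float invLen = (float) (1.0 / Math.sqrt(a * a + b * b + c * c));
            planes[i * 4] = a * invLen;
            planes[i * 4 + 1] = b * invLen;
            planes[i * 4 + 2] = c * invLen;
            planes[i * 4 + 3] = d * invLen;
        }
        return this;
    }

    /** False if the sphere lies completely outside at least one plane; no sqrt per call. */
    boolean isSphereInside(float x, float y, float z, float radius) {
        for (int i = 0; i < 24; i += 4) {
            float dist = planes[i] * x + planes[i + 1] * y + planes[i + 2] * z + planes[i + 3];
            if (dist < -radius) return false;
        }
        return true;
    }
}
```

The square roots now happen six times per set() call instead of on every sphere test, which is the trade-off described in the post above.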