Smoothing Algorithm Question
Offline Wiki Duke

« Posted 2013-05-28 02:57:32 »

COULD SOMEONE MOVE THIS TO THE GENERAL DISCUSSION SECTION. @_@

Hello! This is a question for all y'all who have more experience in graphics and algorithms than I do. I'm working on a project (for work, and I did get permission to ask around) that requires smoothing a large set of data.

The basics are that there is a data set, a 3D matrix filled with 'slowness' values, which is basically 'when moving through this cell, how many units do you move per time step?' Initially, these values are filled in with an interpolated gradient from some min at the top to some max at the bottom. Then a function is applied to the data set repeatedly: it takes the data set and another set of inputs, and updates certain cells in the matrix to new values. The updated data set is then fed back into the algorithm again and again until it converges or some maximum number of iterations is reached.

This is where the major problems begin. For the most part, the function expects that there is some degree of smoothness to the data set, i.e. there is some finite, but unknown, allowed difference between values in adjacent cells for which the algorithm will produce meaningful data. This is because there is no guarantee that each application of the function will touch the same cells, only that it should touch cells close to those touched in the prior step. Because the starting estimate (the gradient) is so bad, the updates result in large discrepancies between updated and un-updated cells. To combat this, a filtering operation, namely a smoothing, is applied to the entire data set before it is fed back into the algorithm. Currently, the smoothing operation is a sliding windowed mean with a window size that changes as the iterations continue and the estimate becomes more exact. Typically it starts out with a window the size of the data set and a slide of 1 (in each direction).

For the size of data set we typically work on, this takes a large portion of the time (think a data set that's about 100x50x50, with a sliding windowed mean of 100x50x50 and a slide of 1 in each of the (x,y,z) directions). This has two major problems: first, it takes a long time to perform this operation, and second, at least in early iterations there is a 'large' contribution to the value of cells from those that are relatively far away.


So, while it would be nice to figure out a better initial data set, i.e. one that results in smaller discrepancies between early applications of the function, I've been asked to work on making the filtering itself more efficient. So mainly, I'm asking about various filters: whether anyone has resources on using them in 3D (as in smoothing across depth instead of just height and width), anything on how to figure out when applying a filter will result in little to no change (i.e. when I can exclude a cell from having the filter applied to it), and things like that?
Offline Roquen
« Reply #1 - Posted 2013-05-28 10:09:25 »

Are you describing some fixed problem...or is this a description of how you think you want to address whatever your problem is? If the problem is fixed, then your description is too vague.
Offline UprightPath
« Reply #2 - Posted 2013-05-28 10:44:19 »

Quote from: Roquen
Are you describing some fixed problem...or is this a description of how you think you want to address whatever your problem is? If the problem is fixed, then your description is too vague.

I posted it in the wrong place accidentally. That's what the big bold part at the top was for.

Offline Roquen
« Reply #3 - Posted 2013-05-28 12:12:55 »

I don't understand...the bold says to move the topic, not delete it.
Offline Jono
« Reply #4 - Posted 2013-05-28 15:19:09 »

As the 100x50x50 block slides around you don't need to recalculate all the values. A simple optimisation is to only calculate the next 50x50 slice that it is shifting into, then update the previous mean.

If you can afford to have a duplicate of the data to write to then you can do better again. Just calculate the mean for the whole block once, for the very first position. Then replace the remaining cells with their contribution to the change in mean (the difference between their value and one 50 or 100 places distant, depending on their axis). Then you should be able to walk the block around using only 50 (or 100) calculations per step.
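
A per-axis variant of that incremental update can be sketched like this (a sketch only, not code from the project; the method name and the clamping at the edges are my own choices). Keep a running sum per (y,z) column and, as the window slides one cell along x, add the cell entering the window and subtract the one leaving it:

// Sliding-window mean of radius r along the x axis, updated incrementally.
static double[][][] slidingMeanX(double[][][] v, int r) {
    int nx = v.length, ny = v[0].length, nz = v[0][0].length;
    double[][][] out = new double[nx][ny][nz];
    for (int y = 0; y < ny; y++) {
        for (int z = 0; z < nz; z++) {
            double sum = 0;
            int count = 0;
            for (int x = 0; x <= Math.min(r, nx - 1); x++) { sum += v[x][y][z]; count++; }
            out[0][y][z] = sum / count;
            for (int x = 1; x < nx; x++) {
                int enter = x + r, leave = x - r - 1;        // cells entering/leaving the window
                if (enter < nx) { sum += v[enter][y][z]; count++; }
                if (leave >= 0) { sum -= v[leave][y][z]; count--; }
                out[x][y][z] = sum / count;
            }
        }
    }
    return out;
}

Running the same pass along y and then z gives the full 3D box mean, and each output cell then costs a constant number of operations regardless of window size.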
Offline philfrei
« Reply #5 - Posted 2013-05-29 00:37:08 »

Does a "sliding windowed mean" use each value in the window equally? I bet there is a nifty convolution filter algorithm that would work, where the "distant" values in the window are weighted a lot less than the center value. Unfortunately I don't have a link to specific convolution filters--this is an area I've only just touched on.

Another thing that might be interesting, check out the possibility of doing your interpolation via the "simplex" shape for 3D--an equilateral tetrahedron [edit: "a slightly skewed tetrahedron, six of which make a cube that has been squashed along its main diagonal."] The math for this is explained in the following excellent article by Stefan Gustavson:

http://staffwww.itn.liu.se/~stegu/simplexnoise/simplexnoise.pdf

The benefit of a simplex form is that one is interpolating from only four points, not eight. This reduction could potentially do a lot to make a basic spatial setup more processing efficient.
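
One way to get that distance-based weighting is a kernel whose weights fall off with distance from the center and sum to 1. A minimal sketch of a 1D tent (triangle) kernel, purely as an illustration (the helper name is mine); the same profile can be applied along each axis in turn, or combined into a full 3D kernel as w(x)·w(y)·w(z):

// Builds a normalized 1D tent kernel of the given radius.
static double[] tentKernel(int radius) {
    double[] k = new double[2 * radius + 1];
    double sum = 0;
    for (int i = -radius; i <= radius; i++) {
        k[i + radius] = radius + 1 - Math.abs(i);   // weight falls off linearly with distance
        sum += k[i + radius];
    }
    for (int i = 0; i < k.length; i++) k[i] /= sum; // normalize so the overall level is preserved
    return k;
}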

"Greetings my friends! We are all interested in the future, for that is where you and I are going to spend the rest of our lives!" -- The Amazing Criswell
Offline UprightPath
« Reply #6 - Posted 2013-05-29 04:20:02 »

Quote from: Jono
As the 100x50x50 block slides around you don't need to recalculate all the values. A simple optimisation is to only calculate the next 50x50 slice that it is shifting into, then update the previous mean.

If you can afford to have a duplicate of the data to write to then you can do better again. Just calculate the mean for the whole block once, for the very first position. Then replace the remaining cells with their contribution to the change in mean (the difference between their value and one 50 or 100 places distant, depending on their axis). Then you should be able to walk the block around using only 50 (or 100) calculations per step.

Quote from: philfrei
Does a "sliding windowed mean" use each value in the window equally? I bet there is a nifty convolution filter algorithm that would work, where the "distant" values in the window are weighted a lot less than the center value. Unfortunately I don't have a link to specific convolution filters--this is an area I've only just touched on.

Another thing that might be interesting, check out the possibility of doing your interpolation via the "simplex" shape for 3D--an equilateral tetrahedron [edit: "a slightly skewed tetrahedron, six of which make a cube that has been squashed along its main diagonal."] The math for this is explained in the following excellent article by Stefan Gustavson:
Yeah, I think I did a terrible job of explaining this (I blame coffee and the annoyance of waiting for Linux to update and then dealing with my laptop not liking it). Ahem!


Sliding windowed mean might not be the correct term for what's going on here. Basically, during the smoothing step, the average of some number of cells around each cell (for some rectangular prism of size X,Y,Z centered on the cell) becomes that cell's value when it's fed back into the function. The contribution is currently the same for each cell, which means that corners/edges/sides of the prism have the same contribution as a value that's directly adjacent. We're already trying to think of a better way to do that (such as using a different contribution coefficient based on distance). There's another optimization in the works where we compute the sums for rectangles from the origin, but that doesn't help so much with variable contribution coefficients.
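
For the "sums for rectangles from the origin" idea, here's a rough sketch of a 3D summed-volume table (prefix sums), with names of my own choosing; once the table is built, any box sum, and therefore any uniform box mean, costs a constant number of lookups:

// Prefix sums: s[x][y][z] = sum of v over [0..x-1] x [0..y-1] x [0..z-1].
static double[][][] prefixSums(double[][][] v) {
    int nx = v.length, ny = v[0].length, nz = v[0][0].length;
    double[][][] s = new double[nx + 1][ny + 1][nz + 1];
    for (int x = 1; x <= nx; x++)
        for (int y = 1; y <= ny; y++)
            for (int z = 1; z <= nz; z++)
                s[x][y][z] = v[x - 1][y - 1][z - 1]
                           + s[x - 1][y][z] + s[x][y - 1][z] + s[x][y][z - 1]
                           - s[x - 1][y - 1][z] - s[x - 1][y][z - 1] - s[x][y - 1][z - 1]
                           + s[x - 1][y - 1][z - 1];
    return s;
}

// Sum of v over the inclusive box [x0..x1] x [y0..y1] x [z0..z1] (0-based indices).
static double boxSum(double[][][] s, int x0, int x1, int y0, int y1, int z0, int z1) {
    return s[x1 + 1][y1 + 1][z1 + 1]
         - s[x0][y1 + 1][z1 + 1] - s[x1 + 1][y0][z1 + 1] - s[x1 + 1][y1 + 1][z0]
         + s[x0][y0][z1 + 1] + s[x0][y1 + 1][z0] + s[x1 + 1][y0][z0]
         - s[x0][y0][z0];
}

Dividing by the number of in-range cells gives the clamped box mean for any window size, but as noted it only works for uniform weights.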

For the initial data set we're just using a simple linear interpolation from Zmin to Zmax, with each Z plane having a uniform value across its surface. I think that there's a polynomial version, but it still causes the values across a given Z-plane to be uniform.


So, I guess I should explain why (beyond the need for speed) optimizing the smoothing step is important. The initial data set that we use can either result in an eventual convergence of values (after some number of iterations the average change in values after the smoothing step is smaller than some number) or the data set ends up exploding (the average starts increasing or stays above that number). If it explodes, we'd like to know early on (with as small a number of iterations as possible) so that we can tweak the initial data set and try again. And again, until we get something that might be worth the time it took.

So, currently we're doing the heavy-duty, brute-force smoothing that I described above. But if we can get a smoothing operation that might take longer than a single brute-force pass, but less than two or three, and results in a data set that's something like what we'd get after two or three, or more, brute-force passes, we'd like to use that. The biggest problem there is that until we actually implement and test the smoother we can't really tell whether it results in something similar to the brute-force method, something that converges quicker for our starting data set, or just something that gives us junk.

Offline Roquen
« Reply #7 - Posted 2013-05-29 07:40:12 »

Again...this sounds like something that you think you want to do rather than a real problem statement.  What do you really want to do?
Offline UprightPath
« Reply #8 - Posted 2013-05-29 10:51:40 »

Yes, I'm probably complicating it with the explanation of why. What I was wondering is whether anyone has any suggestions about smoothing/blurring filters which can efficiently be extended into the third dimension?

Offline Roquen
« Reply #9 - Posted 2013-05-29 11:31:20 »

The real problem is: you're not defining the real problem.  You can't really expect help getting to a useful solution when the problem statement is effectively unknown.

So the best I can do is this: you have some 3D data set in which you want a smooth value at some sample point.  If by smooth you mean first-order continuous, then one solution is cubic Hermite interpolation...if second-order continuous, then a quintic Hermite interpolation.  Both are cheap and easy.
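
For reference, the standard blending functions for those two cases look like this (the usual smoothstep/fade forms; the surrounding method names are just for illustration):

// Cubic Hermite blend: first-derivative continuous at the endpoints.
static double cubicBlend(double t)   { return t * t * (3.0 - 2.0 * t); }

// Quintic Hermite blend: first- and second-derivative continuous at the endpoints.
static double quinticBlend(double t) { return t * t * t * (t * (t * 6.0 - 15.0) + 10.0); }

// Interpolate between two neighbouring samples a and b, with t in [0, 1].
static double smoothSample(double a, double b, double t) {
    double w = quinticBlend(t);   // or cubicBlend(t) for the cheaper C1 version
    return a + (b - a) * w;
}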
Offline UprightPath
« Reply #10 - Posted 2013-05-29 12:10:40 »

Sorry, to me the problem is well stated (But then I think it's a problem of me knowing what I mean, but sucking at explaining it).

I have a large 3D matrix filled with discrete values that I would like to have the following characteristics at the end of each iteration of my algorithm:
1) For any two adjacent cells in this matrix, the absolute difference between values will be less than some threshold (which I do not know).
2) For any straight-line sample of cells, it is highly unlikely (another threshold I don't know) that there is going to be some nice curve that will fit the data points.

The problem is that the first step in the algorithm updates a subset of the cells of the matrix with new values which are (hopefully) closer to the result I want. The new values are not always close to the old values, especially in early iterations. As such, the updated cells might break characteristic 1 and lines of cells containing multiple updated cells might break characteristic 2.

So, the second, very naive step, is to attempt to smooth/blur these changes so that a major change in a single cell is spread out across several. This is the step that I'm asking for help with.

Offline Roquen
« Reply #11 - Posted 2013-05-29 13:40:56 »

We're having faulty communication.  You keep describing parts of an algorithm...not the root problem itself.  So let's pretend like this was working.  What would this working thing do?
Offline UprightPath
« Reply #12 - Posted 2013-05-29 23:13:50 »

Yes, I am describing an algorithm. That is because I was tasked with solving only a small part of the problem rather than the whole thing. And while proposing a better way of solving the entire problem isn't a bad thing, it's outside of the scope of my work.

The big problem (of which I'm working on a sub-problem) is that of refining a discrete tomography model. From a starting model of density values (the 3D matrix of discrete points), ray segments (which typically describe a curve) are traced from known start and end points and the time it would take is calculated. The calculated time is compared to the 'real' time, and the cells touched by the ray trace then have their densities updated so that, in theory, the next time that start-end set is traced the result will be closer to the actual value. This is done for many sets of start-end points, and when there is a conflict between adjustments it's just averaged out before it's applied. What this actually means is that often the next time the ray is traced it won't follow the exact same path, which is expected unless the model is close to the real thing.

The ray segment tracing expects there to be some degree of 'smoothness' in the discrete model. If not, when the trace passes from a high to a low or a low to a high value, the next segment is shot off at a much steeper angle than we want. Since the update step only updates cells which are intersected by the tracing, it can leave behind places where there is a large difference between adjacent cells.

So, we're currently treating the discrete model as an image, but one described by (X,Y,Z). Using this abstraction the densities become grayscale values where the min of the set is white and the max is black and the rest are spread out uniformly through the possible values. We'd like to keep the black pixels from being up against the white ones. So, we apply a blurring filter to the image to try to make the black pixels lighter, the white ones darker and then smooth the transition between them.

This blurring filter is what I'm asking for advice about.

Offline Jono
« Reply #13 - Posted 2013-05-30 09:08:34 »

If you use a rectangular filter (i.e. all cells are equally weighted) then just calculating the differences as the block shifts should be the easiest to program and reasonably efficient.

If you're doing a non-rectangular filter then philfrei's suggestion of performing it as a convolution is probably the right way. This requires a 3D Fourier transform, but a quick Google gives http://www.mathworks.com/matlabcentral/fileexchange/24504

Which should run in Octave if you don't have Matlab.
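
One more thought: if the kernel happens to be separable, which box, tent, and Gaussian kernels all are, the 3D convolution can also be done as three 1D passes, one per axis, with no FFT needed; the cost then grows with kernel width rather than kernel volume. A rough sketch, assuming a normalized odd-length 1D kernel and clamping at the edges (names are mine):

// Applies a 1D kernel along the x axis; run the same pass along y and z
// (offsetting the other indices) to complete the separable 3D filter.
static double[][][] convolveX(double[][][] v, double[] kernel) {
    int nx = v.length, ny = v[0].length, nz = v[0][0].length;
    int r = kernel.length / 2;                       // kernel assumed odd-length, summing to 1
    double[][][] out = new double[nx][ny][nz];
    for (int x = 0; x < nx; x++)
        for (int y = 0; y < ny; y++)
            for (int z = 0; z < nz; z++) {
                double acc = 0;
                for (int k = -r; k <= r; k++) {
                    int xi = Math.min(nx - 1, Math.max(0, x + k));   // clamp at the borders
                    acc += kernel[k + r] * v[xi][y][z];
                }
                out[x][y][z] = acc;
            }
    return out;
}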
Offline UprightPath
« Reply #14 - Posted 2013-05-30 10:01:37 »

We're doing the rectangular filter right now (and it's being done through a Fortran program), but due to the poor quality of the starting data set (it looks very, very little like we want it to) we're using a large rectangular window. This allows us to spread out the changes from the initial few iterations to create a map that looks something like what we want.

I know this is sort of confusing the issue since I said we were just using it to smooth out the changes. But for at least the first few iterations it's more of a propagating operation. However, we're looking at trying to do this propagation through other types of filters, since this one really ends up overwriting a lot of what happens in prior steps (due to the large window). So what we'd like is several different filters at different points, and I'm looking for other filters that we can use that might provide us with other results.

Offline Roquen
« Reply #15 - Posted 2013-05-30 10:06:47 »

Excellent.  See I had it in my head that this was game related so that was throwing me off.  Okay, you have a set of experiments (input sets) from which you're building a 3D data set (output set) of impedance/index-of-refraction values (or something physically similar).  And you're attempting to iteratively converge on an output data set that matches the experiments.

I know absolutely zero about this subject matter, so take that in to account for my comments.

My first thought is that the medical imaging folks work on problems like this so I'd take a peek at that literature to start with.

Also it seems like you could use principal component analysis to generate an initial data set.

Another thought is to start with a smaller output data set, build up a solution, expand the size and repeat.

@convolution:  insanely expensive.

...oops, a reply...So you're currently using a Haar wavelet...is this what we mean by rectangular filter here?
Offline Jono
« Reply #16 - Posted 2013-05-30 12:59:00 »

I think we both mean rectangular function though in 3 dimensions. As a filter in the frequency domain that's ... sinc (thanks wikipedia!). This is also the Haar scaling function, so the roughly opposite effect of the Haar wavelet that turns discontinuities into high value spikes.

I'm still only thinking smoothing, but since the filter is compact you can process the data in overlapping pieces and ignore the edge values. I'd guess this would be in "run overnight" territory even for very large data sets, much less for less than a billion cells.


Offline UprightPath
« Reply #17 - Posted 2013-05-30 13:30:49 »

Quote from: Roquen
Excellent.  See I had it in my head that this was game related so that was throwing me off.  Okay, you have a set of experiments (input sets) from which you're building a 3D data set (output set) of impedance/index-of-refraction values (or something physically similar).  And you're attempting to iteratively converge on an output data set that matches the experiments.

Sorry Roquen, that's why I didn't post this in game questions (I meant to post it in General Discussion or somewhere similar). It's exactly what you just described.

Also, I think that my use of 'smoothing function' has led to a bit of confusion. Edge blurring would probably be a better description of it.

I might suggest the idea of applying the method to a smaller grid for a couple of iterations, then using a stretched/expanded version of it as the initial data set for the main run (might help tell whether a given set of parameters will help).

The Principal Component Analysis might or might not work, but from my quick reading I'm not sure that it will (Since there is a high dependence between values). But then, it's something I haven't dealt with before so I could be flat out wrong about that assumption.

By rectangular, I mean that we're taking the average of a rectangular region and assigning that value to the center cell (for cases where the region extends outside of the data set, we only take the average of the part inside). At the start, we use a region that's about the size of the data set, which is why it destroys small artifacts and can lead to a very slow convergence, if it converges at all.

And the whole Haar wavelet scaling thing? I have no idea. I haven't heard anyone refer to it as that, but then again the people I'm working with (well, my employer) have mainly been working on the mentioned ray-tracing algorithm, and the only mention I've heard of other filters is that they wanted to look into using a Gaussian blur method and a tent filter.
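
For what it's worth, both of those can be approximated with the rectangular mean we already have: running a small box mean twice gives a tent profile, and three or more passes get close to a Gaussian. A 1D sketch of the idea (my own naming, clamped at the edges):

// One pass: box mean. Two passes: tent. Three or more: roughly Gaussian.
static double[] boxMean1D(double[] v, int r) {
    double[] out = new double[v.length];
    for (int i = 0; i < v.length; i++) {
        double sum = 0;
        int count = 0;
        for (int k = -r; k <= r; k++) {
            int j = i + k;
            if (j >= 0 && j < v.length) { sum += v[j]; count++; }
        }
        out[i] = sum / count;
    }
    return out;
}

static double[] approxGaussian1D(double[] v, int r, int passes) {
    double[] cur = v;
    for (int p = 0; p < passes; p++) cur = boxMean1D(cur, r);
    return cur;
}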

Quote from: Jono
I'm still only thinking smoothing, but since the filter is compact you can process the data in overlapping pieces and ignore the edge values. I'd guess this would be in "run overnight" territory even for very large data sets, much less for less than a billion cells.

From what I understand, a single early iteration is taking somewhere on the order of 20-30 minutes, but it takes several iterations before we can tell whether the starting data leads to a convergence. Since a lot of it is just number crunching (it's run off a script that just chugs away at it until it's all done), we're hoping that we might be able to find a way to either make our current filter run faster (optimize it or parallelize it) or find a different filter, or a different method of reconciling the results of the update function with the prior data set, that results in a better data set after fewer iterations (even if each iteration ends up taking longer).

Offline Roquen
« Reply #18 - Posted 2013-05-30 13:38:13 »

One quick option that comes to mind is to use fuzzy or greyscale mathematical morphology "close" on the data.
Offline Roquen
« Reply #19 - Posted 2013-05-30 15:49:50 »

I'm going to toss in a couple of comments that I intended to make two posts back but forgot about.

1) By "discrete" I'm assuming that the values are integers, perhaps even bytes, which map to some range of refraction values.
2) Although the data set isn't small, the individual extent in each dimension is.

These two together are leaning my thinking toward non-standard low-pass filtering techniques.  Besides, there are plenty of resources on standard methods.  The first leans my brain toward integer-to-integer transforms, and the second toward the fact that it's probably undesirable to have a wide filter....even 5 is 10% of an extent of 50.  I might be way off base on both counts.

Second disclaimer: I'm super rusty on implementation of signal processing, but my comments have made the assumption that the reader has some experience.  Like when I mentioned Haar I meant a Haar low-pass filter, and Jono made the logical assumption that I was talking about a Haar transform bank...and I was really thinking more along the lines of the S-transform (I think that's the right name).  Likewise for morphological "close"...descriptions are probably going to be a pair of transforms, where I'm thinking along the lines of "fake it".
Offline UprightPath
« Reply #20 - Posted 2013-05-31 00:49:41 »

Quote from: Roquen
I'm going to toss in a couple of comments that I intended to make two posts back but forgot about.

1) By "discrete" I'm assuming that the values are integers, perhaps even bytes, which map to some range of refraction values.
2) Although the data set isn't small, the individual extent in each dimension is.

1) Nope, I was just using the word to help express the fact that while the values are related, and you could probably fit a function to them, they're to be taken as individual discrete points in the grid. The values are positive, non-zero floating-point values (unsure of the precision).
2) This is somewhat true, at least for our real-world test set. We typically run it at about 96x26x66 or something like that.

My only knowledge about signal processing is having written a parallel FFT. Which is to say I honestly don't know much about anything that you're currently talking about in that area.

Offline Jono
« Reply #21 - Posted 2013-05-31 08:55:53 »

One last shot on the basic sliding rectangle. Instead of smoothing the whole volume just apply the function when a cell's value is requested. Because the rays will be continuous you can still use some of the simple optimisations and slide the rectangle down the ray.

This would require three copies of the data though: the original, the sparse smoothed version, and the result.

The efficiency would then depend on how many cells are typically hit by rays between smoothing operations.
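
A minimal sketch of that on-demand approach, assuming a clamped box mean and a cache for the sparse smoothed version (all class and method names here are mine, purely illustrative):

import java.util.HashMap;
import java.util.Map;

// Smooths a cell only when a ray trace actually asks for it, and caches the result.
class LazyBoxSmoother {
    private final double[][][] original;                        // unsmoothed model (read-only)
    private final Map<Long, Double> smoothed = new HashMap<>(); // sparse smoothed version
    private final int rx, ry, rz;                               // window radii per axis

    LazyBoxSmoother(double[][][] original, int rx, int ry, int rz) {
        this.original = original;
        this.rx = rx; this.ry = ry; this.rz = rz;
    }

    double value(int x, int y, int z) {
        long key = ((long) x << 40) | ((long) y << 20) | z;     // fine while extents stay below 2^20
        Double cached = smoothed.get(key);
        if (cached != null) return cached;
        double mean = clampedBoxMean(x, y, z);
        smoothed.put(key, mean);
        return mean;
    }

    private double clampedBoxMean(int x, int y, int z) {
        int nx = original.length, ny = original[0].length, nz = original[0][0].length;
        double sum = 0;
        int count = 0;
        for (int i = Math.max(0, x - rx); i <= Math.min(nx - 1, x + rx); i++)
            for (int j = Math.max(0, y - ry); j <= Math.min(ny - 1, y + ry); j++)
                for (int k = Math.max(0, z - rz); k <= Math.min(nz - 1, z + rz); k++) {
                    sum += original[i][j][k];
                    count++;
                }
        return sum / count;
    }
}

Whether this wins would then depend, as noted, on how many cells the rays actually touch between smoothing operations.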
Offline Roquen
« Reply #22 - Posted 2013-05-31 11:32:34 »

Okay: not being an integer set means the S-transform and related are not interesting.  My thought on "close" is that for early iterations a classic low-pass is likely to be helpful in convergence as well as preventing too great of an impedance change.  However, when parts of the data set are close it will end up being counterproductive.  A 1D example of a close is to visualize the signal as a height field, then roll a circle of some given radius over the surface of the field.  If there are any sample points below it, the sample's value is increased to where it would touch the surface.  A circle is just an example structuring element...so it removes holes that are smaller than the structuring element.  (Don't bother looking at the wikipedia page.)
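
For reference, a grayscale close with a flat (window-shaped) structuring element is just a running max followed by a running min; a 1D sketch with my own helpers, not necessarily what a "fake it" version would look like:

// Grayscale close with a flat structuring element of radius r:
// dilation (running max) followed by erosion (running min).
static double[] close1D(double[] v, int r) {
    return erode1D(dilate1D(v, r), r);
}

static double[] dilate1D(double[] v, int r) {
    double[] out = new double[v.length];
    for (int i = 0; i < v.length; i++) {
        double max = Double.NEGATIVE_INFINITY;
        for (int k = Math.max(0, i - r); k <= Math.min(v.length - 1, i + r); k++)
            max = Math.max(max, v[k]);
        out[i] = max;
    }
    return out;
}

static double[] erode1D(double[] v, int r) {
    double[] out = new double[v.length];
    for (int i = 0; i < v.length; i++) {
        double min = Double.POSITIVE_INFINITY;
        for (int k = Math.max(0, i - r); k <= Math.min(v.length - 1, i + r); k++)
            min = Math.min(min, v[k]);
        out[i] = min;
    }
    return out;
}

This fills in narrow dips that are smaller than the structuring element while leaving broad features alone, which matches the rolling-circle picture.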