Okay, so you guys don't like the NetBeans profiler. Can you suggest another one that is good and free? I would run the test again with it. I'm just using the NetBeans one because it's easy to use and integrates with my project, and it has worked fine for everything I've used it for.
I found this post with profiling results for the C version of Theora, and they look similar to mine:
http://osdir.com/ml/multimedia.ogg.theora.devel/2004-02/msg00078.html
The method that's taking all the time... is the method that calls the iDCT (hence DCT in the name).
No, you're wrong. Here's the method ReconInterHalfPixel2:
    public static final void ReconInterHalfPixel2(short[] ReconPtr, int idx1,
            short[] RefPtr1, int idx2, short[] RefPtr2, int idx3,
            short[] ChangePtr, int LineStep) {
        int coff = 0, roff1 = idx1, roff2 = idx2, roff3 = idx3, i;

        for (i = 0; i < 8; i++) {
            ReconPtr[roff1+0] = clamp255(((RefPtr1[roff2+0] + RefPtr2[roff3+0]) >> 1) + ChangePtr[coff++]);
            ReconPtr[roff1+1] = clamp255(((RefPtr1[roff2+1] + RefPtr2[roff3+1]) >> 1) + ChangePtr[coff++]);
            ReconPtr[roff1+2] = clamp255(((RefPtr1[roff2+2] + RefPtr2[roff3+2]) >> 1) + ChangePtr[coff++]);
            ReconPtr[roff1+3] = clamp255(((RefPtr1[roff2+3] + RefPtr2[roff3+3]) >> 1) + ChangePtr[coff++]);
            ReconPtr[roff1+4] = clamp255(((RefPtr1[roff2+4] + RefPtr2[roff3+4]) >> 1) + ChangePtr[coff++]);
            ReconPtr[roff1+5] = clamp255(((RefPtr1[roff2+5] + RefPtr2[roff3+5]) >> 1) + ChangePtr[coff++]);
            ReconPtr[roff1+6] = clamp255(((RefPtr1[roff2+6] + RefPtr2[roff3+6]) >> 1) + ChangePtr[coff++]);
            ReconPtr[roff1+7] = clamp255(((RefPtr1[roff2+7] + RefPtr2[roff3+7]) >> 1) + ChangePtr[coff++]);
            roff1 += LineStep;
            roff2 += LineStep;
            roff3 += LineStep;
        }
    }
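Rolled back up into a loop, the arithmetic is easier to see: for each of the 8 rows of an 8x8 block it averages two reference pixels, adds the residual from ChangePtr, and clamps the result. There is no transform anywhere in it. The sketch below is only an illustration: the class name HalfPixelSketch is hypothetical, and clamp255 is assumed to simply saturate to 0..255, since its real definition isn't shown in this thread.

```java
public class HalfPixelSketch {

    // Hypothetical helper: saturate a value to the 0..255 range.
    // The real Jheora clamp255 is not shown in this thread.
    static short clamp255(int v) {
        return (short) (v < 0 ? 0 : (v > 255 ? 255 : v));
    }

    // Rolled-up equivalent of the unrolled ReconInterHalfPixel2 above:
    // for each of 8 rows, average two reference pixels, add the residual, clamp.
    static void reconInterHalfPixel2(short[] recon, int idx1,
                                     short[] ref1, int idx2,
                                     short[] ref2, int idx3,
                                     short[] change, int lineStep) {
        int coff = 0;
        for (int row = 0; row < 8; row++) {
            for (int col = 0; col < 8; col++) {
                int avg = (ref1[idx2 + col] + ref2[idx3 + col]) >> 1;
                recon[idx1 + col] = clamp255(avg + change[coff++]);
            }
            // Advance all three row offsets by one line.
            idx1 += lineStep;
            idx2 += lineStep;
            idx3 += lineStep;
        }
    }

    public static void main(String[] args) {
        short[] ref1 = new short[64];
        short[] ref2 = new short[64];
        short[] change = new short[64];
        short[] recon = new short[64];
        java.util.Arrays.fill(ref1, (short) 100);
        java.util.Arrays.fill(ref2, (short) 201);
        change[0] = 10;    // (100 + 201) >> 1 = 150, + 10 = 160
        change[1] = 200;   // 150 + 200 = 350, clamped to 255
        change[2] = -200;  // 150 - 200 = -50, clamped to 0
        reconInterHalfPixel2(recon, 0, ref1, 0, ref2, 0, change, 8);
        System.out.println(recon[0] + " " + recon[1] + " " + recon[2] + " " + recon[3]);
        // prints "160 255 0 150"
    }
}
```

Per output pixel that is two loads, an add, a shift, another add and a clamp, so its cost scales with how many half-pixel blocks a video has rather than with any DCT work.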
It doesn't look like a DCT to me.
I found where it was used in ExpandBlock and the comment says this:
/* Fractional pixel reconstruction. */
/* Note that we only use two pixels per reconstruction even for
the diagonal. */
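The comment is about half-pixel motion compensation: when a motion vector points between pixels, the prediction is interpolated from neighbouring reference pixels. A full bilinear interpolation at a diagonal half-pixel position would average four pixels. Per the comment, the decoder averages only two. The sketch below contrasts the two approaches. The class and method names are hypothetical, and using the diagonal pair (y, x) and (y + 1, x + 1) for the two-pixel case is an assumption for illustration, not something the comment states.

```java
public class DiagonalHalfPixel {

    // Full bilinear interpolation at a diagonal half-pixel position:
    // rounded average of the four surrounding pixels.
    static int fourPixel(int[][] img, int y, int x) {
        return (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1] + 2) >> 2;
    }

    // Two-pixel approximation as described by the comment: only two samples
    // are averaged even for the diagonal (which two is assumed here).
    // Like ReconInterHalfPixel2, it truncates with >> 1 instead of rounding.
    static int twoPixel(int[][] img, int y, int x) {
        return (img[y][x] + img[y + 1][x + 1]) >> 1;
    }

    public static void main(String[] args) {
        int[][] img = {
            {10, 50},
            {30, 90}
        };
        System.out.println(fourPixel(img, 0, 0)); // (10 + 50 + 30 + 90 + 2) >> 2 = 45
        System.out.println(twoPixel(img, 0, 0));  // (10 + 90) >> 1 = 50
    }
}
```

The two-pixel form does half the loads and fewer adds per pixel, at the cost of a slightly less accurate prediction on diagonals.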
I just talked to some of the Theora guys on IRC. They suggested that profiling the full Cortado player would give pretty messy results, since it's multithreaded with a bunch of complicated locks.
Okay, but I'm not using Cortado itself. I'm not allowed to, since it's under the GPL. I'm only using the Jheora decoder that comes with it. Also, I profiled with the video decode function as the root method, so even if those locks were there, their effects would not be included in the results.
EDIT: Okay, I asked a friend to profile an HD 720p video on his Mac using the YourKit Java Profiler. Here are the results:
YUVConv - 24%
loadFrame - 12%
LoopFilter (deblocking) - 12%
ReconInterHalfPixel2 - 9%
ReconInter - 7%
IDct1 - 3%
IDct10 - 3%
IDctSlow - 3%
He's on a Mac, and the JVM on the Mac is probably not as good at optimizing as Sun's, which might explain the differences. Also, as you said, maybe it's the profiler. I don't have YourKit, but I'm going to get it tomorrow and test this again.
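Since the two profilers disagree, one crude cross-check is to time the decode call directly with System.nanoTime after letting the JIT warm up. Profiler overhead can shift the relative percentages, and a harness like this avoids that, although it only gives total time, not a per-method breakdown. The decodeFrame method below is a hypothetical stand-in workload. Replace it with the actual Jheora decode call you profiled.

```java
public class DecodeTimer {

    // Runs the task `warmup` times so the JIT can compile hot paths,
    // then returns the average wall-clock time per run in nanoseconds.
    // `runs` must be greater than zero.
    static long averageNanos(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / runs;
    }

    // Written to a volatile field so the JIT cannot discard the work.
    static volatile int sink;

    // Hypothetical stand-in for decoding one frame; replace with the real
    // Jheora decode call when measuring.
    static void decodeFrame() {
        int acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += (i * 31) >> 1;
        }
        sink = acc;
    }

    public static void main(String[] args) {
        long avg = averageNanos(DecodeTimer::decodeFrame, 50, 200);
        System.out.printf("average per frame: %.3f ms%n", avg / 1_000_000.0);
    }
}
```

Timing the same clip this way on both machines would show whether the Mac JVM is actually slower overall, independent of what either profiler reports.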