  CottAGE goes LWJGL/JWS  (Read 11324 times)
Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #60 - Posted 2004-07-27 22:08:49 »

Quote
Yes, that would be *really* interesting although I wouldn't know where to start :D


Cool. Can you IM me? I'm on Yahoo as rreyelts.

Quote
I did have a look at a site which explains how to do static binary translation (so the original machine code is translated to java, and removing dead flags and such), and that seems to me the way to speed up CPU emulation the most, but I'm not sure (not knowing much about dynamic recompilation if anything)...


I think static retargeting would probably be ideal, but I think it's probably easier to do dynamic recompilation - it may also be more amenable to some of the tricks certain programs play.

Quote
When I look at the profiler output, CPU emulation is quite expensive (although it's only ~5-15% on my PC, which is quite a fast PC), but video emulation is even more expensive in many games... I guess because of lots of array access.


Hmm... It sounds like you're not offloading any work onto the graphics processor. I guess that makes sense for the earlier programs, but I bet that the burden shifts in later systems/programs where you can discern graphics primitives calls.

Quote
If I knew how to do 'dynarec' or static binary translation, I could have a go at emulating a 68000.


You mean cause it would run too slow otherwise?

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Online erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #61 - Posted 2004-07-27 23:03:04 »

I sent you a PM.

Quote
I think static retargeting would probably be ideal, but I think it's probably easier to do dynamic recompilation - it may also be more amenable to some of the tricks certain programs play.


Yeah, I think you're probably right.

Quote
Hmm... It sounds like you're not offloading any work onto the graphics processor. I guess that makes sense for the earlier programs, but I bet that the burden shifts in later systems/programs where you can discern graphics primitives calls.


All graphics emulation is done in software on the CPU. Only the final frame is rendered using OpenGL. I think there's a big speed gain to be had there as well, although the current method might be more accurate.

Quote

You mean cause it would run too slow otherwise?


Well, look at it this way: the 68k has a *huge* number of instructions, and it would not make sense to handle every single instruction in one huge switch/case. It might make more sense to do it more programmatically, and use dynamic recompilation to get some performance out of it.
I've seen one 68k emulator done in Java (used in an Atari ST emulator), and that one was done by generating Java source with huge switch/cases. The source of that generated 68k emulator was about 4 MB...

Offline swpalmer

JGO Coder




Where's the Kaboom?


« Reply #62 - Posted 2004-07-28 02:39:32 »

Quote
All graphics emulation is done in software on the CPU. Only the final frame is rendered using OpenGL. I think there's a big speed gain to be had there as well, although the current method might be more accurate.


The big advantage for graphics would be in emulating hardware sprites with OpenGL quads or something.  You could also emulate hardware scrolling registers and that sort of thing.  Even hardware playfield layers like the Amiga had.


For 68k emulation the trick would be to decompose the instruction word. The instructions have a fairly regular structure, so you could do a much smaller switch on the bits that define the instruction type, then extract the register IDs from the other bits, etc.
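As a rough illustration of the decomposition described above, here is a minimal sketch in Java. The class and method names are hypothetical and only a handful of opcode groups are listed; it simply shows how the top four bits of a 16-bit instruction word can select the group while the remaining bits yield register and addressing-mode fields.

```java
// Hypothetical helper, for illustration only: splits a 16-bit 68k
// instruction word into the fields most instruction groups share.
final class M68kDecode {
    /** Bits 15-12: the "opcode line" that selects the instruction group. */
    static int line(int word)   { return (word >>> 12) & 0xF; }
    /** Bits 11-9: usually a register number. */
    static int reg(int word)    { return (word >>> 9) & 0x7; }
    /** Bits 8-6: operation mode / size for many instructions. */
    static int opMode(int word) { return (word >>> 6) & 0x7; }
    /** Bits 5-3: effective-address mode. */
    static int eaMode(int word) { return (word >>> 3) & 0x7; }
    /** Bits 2-0: effective-address register. */
    static int eaReg(int word)  { return word & 0x7; }

    /** A deliberately small switch on the group bits only (subset of groups). */
    static String group(int word) {
        switch (line(word)) {
            case 0x1: case 0x2: case 0x3: return "MOVE";
            case 0x7: return "MOVEQ";
            case 0x9: return "SUB";
            case 0xD: return "ADD";
            default:  return "other";
        }
    }

    public static void main(String[] args) {
        int word = 0xD041; // ADD.W D1,D0 in the standard 68000 encoding
        System.out.println(group(word) + " reg=" + reg(word)
                + " opMode=" + opMode(word)
                + " eaMode=" + eaMode(word) + " eaReg=" + eaReg(word));
    }
}
```

The switch has at most 16 cases instead of one per opcode; the per-instruction differences are carried by the extracted fields.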

Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #63 - Posted 2004-07-28 03:07:17 »

Quote
The big advantage for graphics would be in emulating hardware sprites with OpenGL quads or something.  You could also emulate hardware scrolling registers and that sort of thing.  Even hardware playfield layers like the Amiga had.


Exactly.

Quote
For 68k emulation the trick would be to decompose the instruction word. The instructions have a fairly regular structure, so you could do a much smaller switch on the bits that define the instruction type, then extract the register IDs from the other bits, etc.


The Z80 instruction set is a farking nightmare in that regard. No consistency whatsoever from an opcode point of view. Anyway, I'm still not sure why people don't dispatch via an array of interfaces instead, e.g.:

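// One handler object per opcode.
interface OpcodeHandler {
    void execute();
}

// Indexed directly by opcode value.
OpcodeHandler[] opcodeHandlers = new OpcodeHandler[] {
    new Op0Handler(),
    new Op1Handler(),
    new Op2Handler(),
    ...
};

// Interpreter loop: fetch an opcode, then dispatch through the table.
public void execute() {
    while ( true ) {
        int opcode = readOp();
        opcodeHandlers[ opcode ].execute();
    }
}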

Erik had to already pull all of the inline code out of the switch statements into functions (it performed worse), so I can't see how an array dereference + virtual lookup is going to be so much more expensive than a switch. Especially considering that the dynamic recompiler is going to strive to minimize these loop iterations anyway.
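To make the comparison concrete, here is a small self-contained sketch of both dispatch styles on a toy CPU. The four opcodes (HALT, INC, DEC, DOUBLE) and all names are invented for illustration and have nothing to do with the real Z80 or with the CottAGE sources; the only point is that the two loops are interchangeable, so they can be profiled against each other.

```java
// Toy interpreter showing handler-array dispatch next to switch dispatch.
final class DispatchDemo {
    interface OpcodeHandler { void execute(DispatchDemo cpu); }

    int acc;            // toy accumulator
    int pc;             // index into 'program'
    boolean halted;
    final int[] program;

    // Hypothetical opcodes: 0 = HALT, 1 = INC, 2 = DEC, 3 = DOUBLE
    static final OpcodeHandler[] HANDLERS = {
        cpu -> cpu.halted = true,
        cpu -> cpu.acc++,
        cpu -> cpu.acc--,
        cpu -> cpu.acc *= 2,
    };

    DispatchDemo(int[] program) { this.program = program; }

    /** Array dereference plus interface call per instruction. */
    int runWithTable() {
        reset();
        while (!halted) HANDLERS[program[pc++]].execute(this);
        return acc;
    }

    /** The same semantics written as a switch. */
    int runWithSwitch() {
        reset();
        while (!halted) {
            switch (program[pc++]) {
                case 0: halted = true; break;
                case 1: acc++; break;
                case 2: acc--; break;
                case 3: acc *= 2; break;
                default: throw new IllegalStateException("bad opcode");
            }
        }
        return acc;
    }

    private void reset() { acc = 0; pc = 0; halted = false; }

    public static void main(String[] args) {
        DispatchDemo cpu = new DispatchDemo(new int[] { 1, 1, 3, 2, 0 });
        System.out.println(cpu.runWithTable() + " " + cpu.runWithSwitch());
    }
}
```

Which loop is faster depends on the VM and on how much work each handler does, so this is something to measure rather than assume.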

God bless,
-Toby Reyelts

Offline swpalmer

JGO Coder




Where's the Kaboom?


« Reply #64 - Posted 2004-07-28 03:16:01 »

Quote
...so I can't see how an array dereference + virtual lookup is going to be so much more expensive than a switch.


Theoretically, the switch could be implemented internally by the compiler as a lookup anyway.  It might even be able to optimize the case for no match so that it overlaps with the time that would have been taken to do the array bounds check in your example.

Practically speaking this is worth profiling in a real-world case on different VMs.  And you can of course disassemble the bytecode to see what javac does with it.  Maybe it isn't so easy to optimize with the JIT compiler.
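For anyone who wants to try the disassembly suggestion, a minimal sketch follows. The class name is made up. As far as I know, javac normally emits a tableswitch for dense, contiguous case labels and a lookupswitch for widely spaced ones, but that is compiler behaviour rather than a guarantee, so check the actual output.

```java
// Two switches with different label density, for inspection with javap.
final class SwitchShapes {
    // Dense, contiguous labels: typically compiled to a tableswitch.
    static int dense(int op) {
        switch (op) {
            case 0: return 10;
            case 1: return 11;
            case 2: return 12;
            case 3: return 13;
            default: return -1;
        }
    }

    // Widely spaced labels: typically compiled to a lookupswitch,
    // which searches a sorted list of keys instead of indexing a table.
    static int sparse(int op) {
        switch (op) {
            case 1: return 10;
            case 1000: return 11;
            case 1000000: return 12;
            default: return -1;
        }
    }
}
```

Compile it and run `javap -c SwitchShapes` to see which instruction each method uses. What the JIT then does with that bytecode is a separate question that only profiling answers.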

Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #65 - Posted 2004-07-28 03:42:56 »

Quote
Theoretically, the switch could be implemented internally by the compiler as a lookup anyway.

I don't know what you mean. The instruction is tableswitch. It is indeed a table lookup. It's part of the VM spec. The JIT implementation should be straightforward. The point is that you still pay a branch penalty (which is what really hurts modern processors), and you still pay a function call penalty, because you end up outlining all of the code anyway.

One of the greatest benefits of dynarec is that you don't pay a branch penalty per instruction anymore.

God bless,
-Toby Reyelts

Online erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #66 - Posted 2004-07-28 08:12:06 »

Quote
Erik had to already pull all of the inline code out of the switch statements into functions (it performed worse)

Yeah, manual inlining was a stupid thing to do, although on the MSVM it did get faster that way, and pulling the inlined code back out caused a performance hit on that VM.

Quote
For 68k emulation the trick would be to decompose the instruction word. The instructions have a fairly regular structure, so you could do a much smaller switch on the bits that define the instruction type, then extract the register IDs from the other bits, etc.

Yes, that is the smarter but slower way to do it. I started a 68k emulator that way once, but I couldn't get it fast enough on the machine I had at the time.

Quote
The big advantage for graphics would be in emulating hardware sprites with OpenGL quads or something.  You could also emulate hardware scrolling registers and that sort of thing.  Even hardware playfield layers like the Amiga had.


Yes, a major speed gain would be possible.
The thing is that when you use filtering, you can see the edges of the sprites/characters, which doesn't look very good.
I.e. when 2 sprites or characters are directly next to each other, the pixels on the outer edge of one quad are not filtered with the neighbouring pixels from the other quad, so you see a sharp line between them.
So I think that for the best-looking results, I should not render the quads directly, but blit them to another texture first (one texture per video layer). It's all doable, but rewriting the current renderer that way is not really trivial.
If only bounds check removal were implemented (in the client VM), that would give a huge performance boost as well.

Online erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #67 - Posted 2004-07-28 09:58:10 »

Quote

Anyway, I'm still not sure why people don't dispatch via an array of interfaces instead, e.g.:


The 1st version of the 6809 emulator was written like that. We went for a switch and it got faster. The performance gain was worth the uglier code.

Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #68 - Posted 2004-07-28 15:23:02 »

Quote
If only bounds check removal were implemented (in the client VM), that would give a huge performance boost as well.

Why can't people run with server? Is this not an option for a webstarted app?

God bless,
-Toby Reyelts

Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #69 - Posted 2004-07-28 15:26:00 »

Quote
The 1st version of the 6809 emulator was written like that. We went for a switch and it got faster. The performance gain was worth the uglier code.

Was that before or after you outlined all of your case code? I have a feeling you might not see much difference with all of the code outlined now.

God bless,
-Toby Reyelts

Offline swpalmer

JGO Coder




Where's the Kaboom?


« Reply #70 - Posted 2004-07-28 15:34:35 »

Quote

I don't know what you mean. The instruction is tableswitch. It is indeed a table lookup.

Well I don't know much about the VM spec/bytecodes.. but I was assuming that a table lookup, and by that I meant using the switch value as an index into a table of function pointers, would be a bad idea in some cases, as the switch values could be very sparse and the table size would need to be 4 gig.

Quote
The point is that you still pay a branch penalty (which is what really hurts modern processors), and you still pay a function call penalty, because you end up outlining all of the code anyway.

So you are saying that there are effectively two branches in the switch case?  One in the switch decision and another that is part of the function call? (the call being unconditional, so not nearly as bad for the processor's instruction pipeline)
So the theory is that the array lookup only has the one branch, for the call, which is always taken. If I have that right, I was supposing that the JIT of the 'tableswitch' would end up producing machine code that effectively did the same thing. Given the nature of the switch statement, it could be a special case that the JIT compiler is able to convert into the native equivalent of the array of interfaces.


Quote
One of the greatest benefits of dynarec is that you don't pay a branch penalty per instruction anymore.

Yep, obviously JIT compiling is the proof that it works.  The emulator after all is nothing more than a VM with the virtual machine opcodes (byte code) being instruction codes of an actual processor. I guess that makes it an NVM (non-virtual machine)?

The scary thing is dealing with the practice of self-modifying code in an environment with dynamic re-compilation.  Something that was fairly common in that day.  I guess arcade machine ROM based programs would have less of that, so you are at an advantage there.

Online erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #71 - Posted 2004-07-28 16:18:59 »

Quote
I guess that makes it an NVM (non-virtual machine)?

:-)
Well, the machine is still virtual as it's software. The only difference is that the machine also exists for real.

Quote
I guess arcade machine ROM based programs would have less of that, so you are at an advantage there.

Yes, you could simply only JIT the code which is in ROM and always interpret code in RAM.
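A minimal sketch of that policy, assuming a purely hypothetical memory map in which ROM occupies 0x0000-0x7FFF and everything above it is RAM (real arcade boards differ, so the constant is a placeholder):

```java
// Decides per address whether code may be recompiled or must be interpreted.
final class ExecPolicy {
    // Hypothetical boundary: addresses below this are ROM, the rest is RAM.
    static final int ROM_END = 0x8000;

    /** True if the address lies in ROM, where code cannot modify itself. */
    static boolean canRecompile(int address) {
        int a = address & 0xFFFF;   // Z80 addresses are 16 bits wide
        return a < ROM_END;
    }
}
```

The interpreter would call `canRecompile(pc)` before handing a block to the recompiler and fall back to plain interpretation for anything in RAM.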

Offline fbi

Senior Newbie





« Reply #72 - Posted 2004-07-29 07:06:29 »

Dear Erikd,
1943 simply rocks! ;D
We spent some time in my lab to re-play it ;)
Very cool job 8)

PhD Student@Virtual ArtificiaL Intelligent Systems (VALIS) Laboratory

Fatti non foste a viver come bruti...ma per seguir vertute e canoscenza.
Offline princec

JGO Kernel


Medals: 343
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #73 - Posted 2004-07-29 10:22:15 »

Sounds like it will all get far too complex for a very very minimal gain...

Cas :)

Online erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #74 - Posted 2004-07-29 21:02:11 »

Well, with dynarec you will get rid of loads of branches, and I think HotSpot will have a much easier job removing dead flags and doing other optimizations. I think it will make a difference, although the difference might not be as big as when doing it in C (static binary translation in C with dead flag removal has been reported to run as much as 25-30x faster than interpretation).
For the Z80 emulation, it might not be too important on recent PCs, since a Z80 doesn't run at high clock speeds anyway. But once it works, the same technique can be applied to 68k emulation and there it will matter.
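For readers unfamiliar with "dead flag removal": within a straight-line block, an instruction's flag computation is wasted if a later instruction overwrites the flags before anything reads them. The sketch below is a generic backward liveness pass over a made-up instruction description (the `Insn` type and the example mnemonics are assumptions, not taken from any real emulator). It conservatively treats the flags as live at the end of the block.

```java
// Backward pass that marks which flag writes in a block are actually needed.
final class DeadFlags {
    /** Toy instruction description: does it read and/or write the flags? */
    static final class Insn {
        final String name;
        final boolean readsFlags;
        final boolean writesFlags;
        Insn(String name, boolean readsFlags, boolean writesFlags) {
            this.name = name;
            this.readsFlags = readsFlags;
            this.writesFlags = writesFlags;
        }
    }

    /** Returns, per instruction, whether its flag update must be emitted. */
    static boolean[] neededFlagWrites(Insn[] block) {
        boolean[] needed = new boolean[block.length];
        boolean live = true; // conservative: flags may be read after the block
        for (int i = block.length - 1; i >= 0; i--) {
            Insn in = block[i];
            if (in.writesFlags) {
                needed[i] = live;   // only needed if someone reads it later
                live = false;       // earlier writes are hidden by this one
            }
            if (in.readsFlags) {
                live = true;        // this instruction consumes earlier flags
            }
        }
        return needed;
    }
}
```

For a block such as ADD, ADD, ADC, LD, ADD (where only ADC reads the flags), the pass keeps the flag update of the second ADD and of the final ADD and drops the rest.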

There are of course more optimizations possible in other areas (mainly rendering), but I'm looking at that as well.

Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #75 - Posted 2004-07-30 02:17:43 »

Quote
Sounds like it will all get far too complex for a very very minimal gain...

I don't think you know what you're talking about Cas. You translate X number of impossible to optimize interpreter loops into a single optimizable function. That should be a huge win. I could easily see it running 10x faster. As far as it being "far too complex", I already almost have a working prototype ready.

God bless,
-Toby Reyelts

Offline princec

JGO Kernel


Medals: 343
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #76 - Posted 2004-07-30 10:03:12 »

I sure don't know :) But for the 8bit chips it's certainly a wasted effort... it ain't going to run any faster than 60Hz.

Cas :)

Online erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #77 - Posted 2004-07-30 11:19:26 »

Well, there are arcade machines which run as many as 5 Z80s simultaneously. If we can keep the system requirements down, that would be welcome.
And once we (well, Toby that is :-)) get it working, the same technique can be applied to faster CPUs, and there it will surely be important.

Offline princec

JGO Kernel


Medals: 343
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #78 - Posted 2004-07-30 13:16:55 »

Cool. Who can explain the techniques in layman's terms? I've always been massively interested in this stuff. I wrote my first VM over 20 years ago!

Cas :)

Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #79 - Posted 2004-07-30 17:05:32 »

Quote
Cool. Who can explain the techniques in layman's terms?


Immediately after the ROM is loaded, I walk the Z80 machine instructions looking for a window of instructions that is uninterrupted by jumps. Once I have that window established, I generate Java bytecode that is equivalent to what the interpreter would run, given the same sequence of instructions. I then replace the first machine instruction of that window in ROM with a "dynarecTrap instruction", which has an opcode that falls outside the valid set of Z80 opcodes. The instruction contains a pointer to the Java bytecode I just generated. So, when the interpreter is running, if it sees a dynarecTrap, it runs the corresponding Java bytecode, instead of looping over the series of machine instructions as it normally would have.

This should optimize really well, because it takes a set of instructions that would normally have required a dynamic branch (for each instruction) to execute, and instead places them back to back, where you can, all of a sudden, perform local optimizations like inlining, dead flag detection, redundant instruction collapsing (i.e. multiple updates to the PC or cycle counter), etc...
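The scheme can be sketched on a toy machine. Everything below is hypothetical: the one-byte opcodes are invented (they are not Z80 encodings), and instead of generating real Java bytecode the "compiled" block is just a precomputed net effect. What it does show is the window scan, the trap patched over the first opcode, and the PC and cycle counter being updated once per block rather than once per instruction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of "scan a jump-free window, fuse it, patch in a trap".
final class ToyDynarec {
    // Invented one-byte opcodes; JMP takes a one-byte target operand.
    static final int NOP = 0x00, INC = 0x01, DEC = 0x02, JMP = 0x10, TRAP = 0xFF;

    final int[] rom;
    final List<int[]> blocks = new ArrayList<>(); // each entry: {accDelta, length}
    final int[] trapBlock;                        // address -> index into 'blocks'
    int acc, pc;
    long cycles;

    ToyDynarec(int[] rom) {
        this.rom = rom.clone();
        this.trapBlock = new int[rom.length];
    }

    /** Fuses the straight-line run starting at 'start' and patches a trap there. */
    void compileWindow(int start) {
        int delta = 0, end = start;
        while (end < rom.length && rom[end] != JMP && rom[end] != TRAP) {
            if (rom[end] == INC) delta++;
            else if (rom[end] == DEC) delta--;
            end++;
        }
        if (end - start < 2) return;              // too short to be worth it
        blocks.add(new int[] { delta, end - start });
        trapBlock[start] = blocks.size() - 1;
        rom[start] = TRAP;
    }

    /** Executes one instruction, or one whole fused block when a trap is hit. */
    void step() {
        int op = rom[pc];
        if (op == TRAP) {
            int[] b = blocks.get(trapBlock[pc]);
            acc += b[0];          // collapsed effect of the whole window
            pc += b[1];           // single PC update
            cycles += b[1];       // single cycle-counter update (1 per insn)
            return;
        }
        pc++;
        cycles++;
        if (op == INC) acc++;
        else if (op == DEC) acc--;
        else if (op == JMP) pc = rom[pc];         // operand holds the target
    }
}
```

With a ROM of INC, INC, DEC, INC, JMP 0, four interpreted steps and one step after `compileWindow(0)` leave the machine in the same state; the compiled path just gets there with one dispatch instead of four.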

God bless,
-Toby Reyelts

Offline princec

JGO Kernel


Medals: 343
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #80 - Posted 2004-07-30 22:24:55 »

Ahhh, I get you. Sounds like it could be incredibly fast indeed.

Cas :)

Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #81 - Posted 2004-07-31 04:23:58 »

Quote
Ahhh, I get you. Sounds like it could be incredibly fast indeed.

Well, it turns out I was a little dumb. I learned that there is no way to distinguish between program code and data in the ROM unless I actually execute the code and follow the jumps. *sigh*

It's the same principle I outlined before - it's just that the work now has to be deferred until runtime, like Hotspot, instead of load time.

This is farking buttloads easier for something like JET, that can know which classfile data is actually bytecode instructions.

God bless,
-Toby Reyelts

Offline princec

JGO Kernel


Medals: 343
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #82 - Posted 2004-07-31 14:24:38 »

Maybe you can flag 256 byte or 1k pages in memory as being recently written to and upon attempting to execute code in such a page, run it through the compiler?
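A minimal sketch of that page-flagging idea, using 256-byte pages over a 64 KB address space. The class and method names are made up, and the actual recompiler call is left as a comment:

```java
// Tracks which 256-byte pages were written since they were last compiled.
final class PagedMemory {
    static final int PAGE_SHIFT = 8;                         // 256-byte pages
    final byte[] mem = new byte[0x10000];                    // 64 KB
    final boolean[] dirty = new boolean[0x10000 >> PAGE_SHIFT];
    final boolean[] compiled = new boolean[0x10000 >> PAGE_SHIFT];

    /** Every emulated memory write goes through here and flags its page. */
    void write(int addr, int value) {
        addr &= 0xFFFF;
        mem[addr] = (byte) value;
        dirty[addr >> PAGE_SHIFT] = true;
    }

    /** Call before executing at 'addr'; true means the page was (re)compiled. */
    boolean prepareExecute(int addr) {
        int page = (addr & 0xFFFF) >> PAGE_SHIFT;
        if (compiled[page] && !dirty[page]) return false;    // still valid
        // A real implementation would run the recompiler on this page here.
        compiled[page] = true;
        dirty[page] = false;
        return true;
    }
}
```

The cost is one extra array store per emulated write, which is the trade-off to weigh against handling self-modifying code correctly.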

Cas :)

Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #83 - Posted 2004-08-10 15:32:48 »

Quote
Maybe you can flag 256 byte or 1k pages in memory as being recently written to and upon attempting to execute code in such a page, run it through the compiler?


I think there's some miscommunication.

The point I'm trying to make is that I can't translate the machine code unless I can follow the jumps. This is because the bytes immediately following a jump may be anything - data/garbage/whatever. Once I hit that, there's no way to continue. I considered trying to statically follow the jumps, but this seems nearly impossible for half the jumps, which do things like jump to an address stored in a register. (You can't even do clever stuff, like look for a CALL before a RET, because the crazy programmer may have done something insane like do a RET without having ever executed a CALL). So the only recourse I have is to track instructions as they are executed.
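One way to picture "tracking instructions as they are executed" is the sketch below. It is not the prototype discussed in this thread; the class, the threshold value and the hook names are all assumptions. The interpreter reports every opcode fetch and every taken jump, so the recompiler only ever sees addresses that have been proven to contain code.

```java
import java.util.BitSet;

// Records which addresses have actually been executed, and how often
// each address has been reached by a jump.
final class CodeTracker {
    static final int HOT = 3;                     // hypothetical threshold
    final BitSet executed = new BitSet(0x10000);  // addresses known to be code
    final int[] entryHits = new int[0x10000];     // jump-target counters

    /** Called by the interpreter whenever it fetches an opcode at 'pc'. */
    void onFetch(int pc) {
        executed.set(pc & 0xFFFF);
    }

    /** Called on a taken jump; true exactly when the target becomes hot. */
    boolean onJump(int target) {
        return ++entryHits[target & 0xFFFF] == HOT;
    }

    boolean isKnownCode(int addr) {
        return executed.get(addr & 0xFFFF);
    }
}
```

When `onJump` returns true, the target is a confirmed block entry point and could be handed to the recompiler, much as HotSpot waits for a method to become hot before compiling it.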

This is annoying compared to something like a Java class file, where there is no difficulty in understanding what parts are data and what parts are bytecode. It gets even nicer than that, because Java mandates that you do have to be able to statically analyse where a program can jump to.

God bless,
-Toby Reyelts
