Yes, I can tell you the grotty details if you like...
Those are the ones I prefer, so please, yes.
Ahhh... it's a big story. I'll take shortcuts. You can ask for more details.
Background: it's a desktop or server environment, so pre-emptive multi-threading is the norm, not cooperative scheduling. Blocking native I/O. Precise & moving GC (a conservative non-moving GC allows for faster JNI calls). Many of these steps do not apply in an embedded system, or one with cooperative suspension, or in single-threaded applications, or on a single CPU, or when the native compiler is known & trusted, or when the native call promises not to block (or is at least statically known to block or not), etc...
Since a native call might block or otherwise take a long time, and since other threads are running and allocating, a GC must be allowed while the thread is in native code. A moving GC makes the mutators more efficient - but there's no way to find & move the pointers held by the native code/compiler. So you don't hand out any raw heap pointers, you hand out Handles.
The native code is never allowed to touch a heap pointer, lest an ill-timed GC move things the native code is also touching, crashing the native code. Upon return from the native code a GC *may* be in progress. The regular Java code threads have been stopped for the GC, but the threads in native code have not - so they have to do a lock-like thing to ensure that a GC is not in progress (and doesn't start up). Basically, you must CAS on exit from the native call or you have a race condition: a GC might think all Java threads have been stopped & start moving objects, while a thread returning from a native (where GC is ok) into Java code (where GC is not) starts touching objects.
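To make the handle idea concrete, here is a minimal toy sketch in Java. It is not HotSpot code: the names ToyHeap, makeHandle, resolve and movingCollect are made up for illustration, and an "address" is just an array index. The point it tries to show is that the "native" side only ever holds a handle, so the collector can relocate the object and patch one table slot.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical names): "native" code sees handles, never heap addresses. */
final class HandleDemo {
    /** Simulated heap: an object's "address" is its index in the cells array. */
    static final class ToyHeap {
        private String[] cells = new String[8];
        private int top = 0;
        // Handle table: slot h holds the current address of the object behind handle h.
        private final List<Integer> handleTable = new ArrayList<>();

        /** Bump-allocate a cell and return its raw "address". */
        int allocate(String payload) {
            cells[top] = payload;
            return top++;
        }

        /** Wrap a raw address in a handle that is safe to give to native code. */
        int makeHandle(int address) {
            handleTable.add(address);
            return handleTable.size() - 1;
        }

        /** Follow the handle to whatever address the object has right now. */
        String resolve(int handle) {
            return cells[handleTable.get(handle)];
        }

        int addressOf(int handle) {
            return handleTable.get(handle);
        }

        /**
         * "Moving GC": copy each handled object to a fresh space and patch its
         * handle slot. Unhandled cells are treated as garbage. For simplicity
         * this assumes no two handles refer to the same object.
         */
        void movingCollect() {
            String[] toSpace = new String[cells.length];
            int next = 0;
            for (int h = 0; h < handleTable.size(); h++) {
                toSpace[next] = cells[handleTable.get(h)];
                handleTable.set(h, next++);
            }
            cells = toSpace;
            top = next;
        }
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        heap.allocate("garbage");                 // lands at address 0
        int handle = heap.makeHandle(heap.allocate("live")); // address 1
        heap.movingCollect();                     // "live" moves to address 0
        System.out.println(heap.resolve(handle)); // prints "live"
    }
}
```

A raw address held across movingCollect() would now point at the wrong cell; the handle still resolves correctly because only the table was patched.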
Also, you have to be able to find all the objects - including those passed as arguments on the stack and handlized. Generally this is done by maintaining a mapping between PCs and which registers hold oops - but HotSpot doesn't know where the native compiler will put the JNI call's PC. So we need the return PC (return from the native code back into the wrapper code) jammed down before we act "as if" we're in native code and allow a GC. Same for the stack pointer.
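As a rough illustration of that PC-to-oops mapping, the sketch below models it as a table from return PC to the set of slots holding oops, plus a per-thread "anchor" the wrapper fills in before the call. All names here (OOP_MAPS, FrameAnchor, publish, rootsFor) are assumptions for the example, and slots stand in for registers and stack locations; the real HotSpot data structures are more involved.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical names) of looking up GC roots by return PC. */
final class OopMapDemo {
    /** Table the JIT would build: return PC -> slots that hold oops at that PC. */
    static final Map<Long, BitSet> OOP_MAPS = new HashMap<>();

    /** Per-thread record written by the wrapper before it calls native code. */
    static final class FrameAnchor {
        volatile long lastJavaPc; // return PC back into the wrapper
        volatile long lastJavaSp; // 0 means "not published yet"

        /** Store the PC first, then the SP; the SP store is what publishes the frame. */
        void publish(long pc, long sp) {
            lastJavaPc = pc;
            lastJavaSp = sp;
        }
    }

    static void registerCallSite(long returnPc, int... oopSlots) {
        BitSet bits = new BitSet();
        for (int slot : oopSlots) {
            bits.set(slot);
        }
        OOP_MAPS.put(returnPc, bits);
    }

    /** What a GC might do for a thread in native: find its roots through the anchor. */
    static BitSet rootsFor(FrameAnchor anchor) {
        if (anchor.lastJavaSp == 0) {
            throw new IllegalStateException("frame not published, cannot walk it");
        }
        BitSet map = OOP_MAPS.get(anchor.lastJavaPc);
        if (map == null) {
            throw new IllegalStateException("no oop map for this PC");
        }
        return map;
    }

    public static void main(String[] args) {
        registerCallSite(0x1000L, 2, 5);      // slots 2 and 5 hold oops at this PC
        FrameAnchor anchor = new FrameAnchor();
        anchor.publish(0x1000L, 0x7fff0000L); // made-up PC and SP values
        System.out.println(rootsFor(anchor)); // prints "{2, 5}"
    }
}
```

The collector never needs to know where the native compiler put its own call instruction; it only needs the PC and SP the wrapper stored.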
Finally, argument calling conventions between JIT'd code and native code are usually different - the JIT doesn't need var-args support, or legacy calling-convention support, or a JNIEnv. Putting it together we get:
push a frame
store any objects passed in registers down (prior to handlizing)
if a sync method, lock
make a copy of the arguments:
    add the JNIEnv argument
    copy the rest from JIT convention to native convention (generally reg-reg moves on a RISC, stack-stack moves on X86)
wrap a handle around objects (requires a null-test/branch)
store the return-pc somewhere (generally thread-local storage)
maybe memory fence, in case GC is somewhat concurrent
store the sp nearby - which also allows a GC, and requires the objects & PC be coherent in memory
do the native call
reclaim the GC lock (generally a CAS) or block
if sync, unlock
if returning object, de-handlize
pop frame
return
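The sequence above can be sketched as a toy Java model. This is only an illustration of the ordering and of the "CAS on the way back" idea, not how HotSpot implements it: the state names, gcTryClaim, gcRelease and tryReturnToJava are invented here, the steps with no Java-level equivalent are just recorded in a trace list, and the sketch spins where a real VM would block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntBinaryOperator;

/** Toy model (hypothetical names) of a native-call wrapper and its GC handshake. */
final class NativeWrapperSketch {
    static final int IN_JAVA = 0;    // GC must stop this thread before moving objects
    static final int IN_NATIVE = 1;  // GC may run; the thread only holds handles
    static final int GC_CLAIMED = 2; // GC running; the thread may not re-enter Java

    final AtomicInteger state = new AtomicInteger(IN_JAVA);
    final List<String> trace = new ArrayList<>();

    /** GC side: claim a thread that is in native so it cannot slip back into Java. */
    boolean gcTryClaim() {
        return state.compareAndSet(IN_NATIVE, GC_CLAIMED);
    }

    /** GC side: the collection is done, let the thread return. */
    void gcRelease() {
        state.compareAndSet(GC_CLAIMED, IN_NATIVE);
    }

    /** Thread side: one attempt at "reclaim the GC lock"; fails while a GC holds it. */
    boolean tryReturnToJava() {
        return state.compareAndSet(IN_NATIVE, IN_JAVA);
    }

    int callNative(IntBinaryOperator nativeBody, int a, int b, boolean sync) {
        trace.add("push frame");
        trace.add("spill register oops");
        if (sync) trace.add("lock");
        trace.add("copy args: add JNIEnv, shuffle, handlize");
        trace.add("store return pc");
        trace.add("store sp");
        state.set(IN_NATIVE); // volatile store: from here on a GC is allowed

        int result = nativeBody.applyAsInt(a, b); // do the native call

        while (!tryReturnToJava()) {
            Thread.onSpinWait(); // a real VM would block here, not spin
        }
        trace.add("reclaimed GC lock");
        if (sync) trace.add("unlock");
        trace.add("pop frame"); // an int result needs no de-handlizing
        return result;
    }

    public static void main(String[] args) {
        NativeWrapperSketch wrapper = new NativeWrapperSketch();
        int sum = wrapper.callNative((x, y) -> x + y, 2, 3, true);
        System.out.println(sum); // prints "5"
        wrapper.trace.forEach(System.out::println);
    }
}
```

If gcTryClaim() succeeds while the thread is inside the native body, tryReturnToJava() keeps failing until gcRelease() runs, which is the race the CAS is there to close.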
Clear as mud, I hope?