Refine the fast JIT frontend translation of opcodes br_if and br_table:
- for br_if, there is no need to clear the JIT frame after handling the
  new basic block, so the registers of the current basic block can be
  re-used
- for br_table, there is no need to create a new basic block to jump to
  when there are no parameters/results to copy to the new block; just
  jump to the existing basic block
Move jit spill cache to the end of interp frame to reduce footprint
Fix codegen float compare issue: the source registers should not be overwritten
Fix the integer overflow check in float to int conversion
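For illustration, the kind of range check this refers to can be sketched in C as below. This is a minimal sketch of a checked f64 to i32 (signed) truncation following the WebAssembly trapping rules, not the code touched by this change; `checked_trunc_f64_to_i32` is a hypothetical name.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a WAMR function): checked f64 -> i32 signed
 * truncation. Returns false where WebAssembly requires a trap. */
static bool
checked_trunc_f64_to_i32(double value, int32_t *out)
{
    /* NaN has no integer value. */
    if (isnan(value))
        return false;

    /* After truncation toward zero the result must fit in
     * [-2^31, 2^31 - 1], so the input must lie strictly inside
     * (-2^31 - 1, 2^31). Both bounds are exactly representable
     * as doubles; infinities are rejected by the same test. */
    if (value <= -2147483649.0 || value >= 2147483648.0)
        return false;

    /* In range, so the C conversion is well defined. */
    *out = (int32_t)value;
    return true;
}
```

Checking the floating-point value before converting avoids relying on the result of an out-of-range conversion, which is undefined in C.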
Unify the float compare
Fix get_global issue
- use native functions to do f.eq and f.ne
- only use ZF=0 and CF=0 to do f.lt and f.gt
- only use CF=0 to do f.le and f.ge
comiss and setCC could be used in place of comiss and jmpCC
Now able to pass f32_cmp and f64_cmp
```
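cmp_eq:
    xor eax, eax        ; clear the result register
    ucomisd xmm0, xmm1  ; unordered (NaN): ZF=PF=CF=1; equal: ZF=1, PF=0
    mov edx, 0          ; value to select when the operands differ
    setnp al            ; al = 1 if ordered (PF=0)
    cmovne eax, edx     ; not equal (ZF=0): result = 0
    ret
cmp_ne:
    xor eax, eax        ; clear the result register
    ucomisd xmm0, xmm1  ; same flag settings as in cmp_eq
    mov edx, 1          ; value to select when the operands differ
    setp al             ; al = 1 if unordered (PF=1)
    cmovne eax, edx     ; not equal (ZF=0): result = 1
    ret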
```
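The flag rules for the float compares can be modelled in plain C to see why they handle NaN correctly. The sketch below is illustrative only: `CmpFlags` and the `f64_*` helpers are hypothetical names, and deriving lt/le by swapping the operands is an assumption about how the "ZF=0 and CF=0" and "CF=0" conditions are applied, not something stated in this change.

```c
#include <math.h>
#include <stdbool.h>

/* Flags as set by (u)comisd a, b. Illustrative model only. */
typedef struct {
    bool zf, pf, cf;
} CmpFlags;

static CmpFlags
ucomisd_flags(double a, double b)
{
    CmpFlags f = { false, false, false }; /* a > b: ZF=0, PF=0, CF=0 */

    if (isnan(a) || isnan(b))
        f.zf = f.pf = f.cf = true;        /* unordered */
    else if (a < b)
        f.cf = true;                      /* less than */
    else if (!(a > b))
        f.zf = true;                      /* equal */
    return f;
}

/* "above": ZF=0 and CF=0. False for NaN because CF=1 when unordered. */
static bool
f64_gt(double a, double b)
{
    CmpFlags f = ucomisd_flags(a, b);
    return !f.zf && !f.cf;
}

/* Assumption: lt is gt with the operands swapped. */
static bool
f64_lt(double a, double b)
{
    return f64_gt(b, a);
}

/* "above or equal": CF=0. False for NaN for the same reason. */
static bool
f64_ge(double a, double b)
{
    return !ucomisd_flags(a, b).cf;
}

/* Assumption: le is ge with the operands swapped. */
static bool
f64_le(double a, double b)
{
    return f64_ge(b, a);
}

/* eq needs ZF=1 and PF=0; ne is its negation (ZF=0 or PF=1). */
static bool
f64_eq(double a, double b)
{
    CmpFlags f = ucomisd_flags(a, b);
    return f.zf && !f.pf;
}

static bool
f64_ne(double a, double b)
{
    return !f64_eq(a, b);
}
```

Because an unordered compare sets CF=1, conditions that require CF=0 are automatically false for NaN, so no separate parity check is needed for lt, gt, le and ge. Only eq and ne have to look at PF, which matches the two routines in the listing.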
If one or more operands of an instruction lock specific hardware
registers in the IR phase, such as EAX and EDX for DIV or ECX for SHIFT,
two known problem cases arise.
case 1: allocate VOID
`SHRU i250,i249,i3`. If pr_3 was allocated to vr_249 first, the incoming
allocation of vr_3 spills out `vr_249` and clears the `vr->hreg` value
of vr_249. When the allocation result is applied in the FOREACH
in L732, a NULL will be assigned.
case 2: unexpected spill out
`DIV_U i1,i1,i44`. If the allocation of vr_44 needs to spill out one
hardware register, there is a chance that `hr_4` will be selected.
If that happens, codegen will operate on EDX and overwrite the value of
vr_44.
The reason `hr_4` can be spilled out is a hidden bug: the information
in both `rc->hreg[]` and `rc->vreg` can be transferred from one block
to the next. This means that even if no vr is bound to a hr in the
current block, the hr may still be treated as busy because of
information left over from previous blocks.
Workarounds for the two cases:
- Add `MOV LOCKED_hr LOCKED_hr` just after the instruction. It prevents
  case 1
- Add `MOV LOCKED_hr LOCKED_hr` just before the instruction. It prevents
  case 2
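The placement of the two self-moves can be sketched on a toy instruction list. This is a simplified model, assuming a flat array of instructions; `ToyInsn`, `ToyBlock`, `toy_insert` and `toy_pin_locked_reg` are hypothetical names and do not correspond to the actual JIT IR data structures.

```c
#include <stddef.h>
#include <string.h>

enum { TOY_MAX_INSNS = 16 };

/* Toy instruction: an opcode name and up to three operand names. */
typedef struct {
    const char *op;
    const char *opnd[3];
} ToyInsn;

typedef struct {
    ToyInsn insns[TOY_MAX_INSNS];
    size_t count;
} ToyBlock;

/* Insert insn at index pos, shifting later instructions down by one. */
static int
toy_insert(ToyBlock *block, size_t pos, ToyInsn insn)
{
    if (block->count >= TOY_MAX_INSNS || pos > block->count)
        return -1;
    memmove(&block->insns[pos + 1], &block->insns[pos],
            (block->count - pos) * sizeof(ToyInsn));
    block->insns[pos] = insn;
    block->count++;
    return 0;
}

/* Surround the instruction at index pos with `MOV hr, hr`:
 * the copy after it corresponds to the case 1 workaround,
 * the copy before it to the case 2 workaround. */
static int
toy_pin_locked_reg(ToyBlock *block, size_t pos, const char *locked_hr)
{
    ToyInsn pin = { "MOV", { locked_hr, locked_hr, NULL } };

    if (pos >= block->count || block->count + 2 > TOY_MAX_INSNS)
        return -1;
    /* Insert the trailing move first so pos still names the original. */
    if (toy_insert(block, pos + 1, pin) != 0)
        return -1;
    return toy_insert(block, pos, pin);
}
```

Applied to a block containing only `DIV_U i1,i1,i44` with `edx` as the locked register, this yields the sequence `MOV edx,edx`, `DIV_U i1,i1,i44`, `MOV edx,edx`.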
Implement bitwise 64-bit operations in codegen
Fix and refine shift IRs
Zero local variables
Remove ref-type/bulk-memory macros
Implement set aux stack
Refine clear mem registers
Translate WASM_OP_CALL into JIT IR in the frontend, and translate
JIT_OP_CALLBC and JIT_OP_CALLNATIVE in the backend.
For calling wasm native API, simply call wasm_interp_call_func_native
to reduce the complexity.
Also fix some issues in the wasm loader, frontend, register allocator,
and codegen.