Update build wasm app document, add how to set buildflags for Rust
project to reduce the footprint.
Clear Windows warnings and a shadow warning in aot_emit_numberic.c
Refine the generated LLVM IRs at the beginning of each LLVM AOT/JIT function
to fasten the LLVM IR optimization:
- Only create argv_buf if there are func calls in this function
- Only create native stack bound if stack bound check is enabled
- Only create aux stack info if there is opcode set_global_aux_stack
- Only create native symbol if indirect_mode is enabled
- Only create memory info if there are memory operations
- Only create func_type_indexes if there is opcode call_indirect
Add a new options to control the native stack hw bound check feature:
- Besides the original option `cmake -DWAMR_DISABLE_HW_BOUND_CHECK=1/0`,
add a new option `cmake -DWAMR_DISABLE_STACK_HW_BOUND_CHECK=1/0`
- When the linear memory hw bound check is disabled, the stack hw bound check
will be disabled automatically, no matter what the input option is
- When the linear memory hw bound check is enabled, the stack hw bound check
is enabled/disabled according to the value of input option
- Besides the original option `--bounds-checks=1/0`, add a new option
`--stack-bounds-checks=1/0` for wamrc
Refer to: https://github.com/bytecodealliance/wasm-micro-runtime/issues/1677
Currently we initialize and destroy LLVM environment in aot_create_comp_context
and aot_destroy_comp_context, which are called in wasm_module_load/unload,
and the latter may be invoked multiple times, which leads to duplicated LLVM
initialization/destroy and may result in unexpected behaviors.
Move the LLVM init/destroy into runtime init/destroy to resolve the issue.
Add macro WASM_ENABLE_WORD_ALING_READ to enable reading
1/2/4 and n bytes data from vram buffer, which requires 4-byte addr
alignment reading.
Eliminate XIP AOT relocations related to the below ones:
i32_div_u, f32_min, f32_max, f32_ceil, f32_floor, f32_trunc, f32_rint
The general optimizations may create some intrinsic function calls
like llvm.memset, so we move indirect mode optimization after them
to remove these function calls at last.
Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
Translate call_indirect opcode by calling wasm functions with Fast JIT IRs instead of
calling jit_call_indirect runtime API, so as to improve the performance.
Translate call native function process with Fast JIT IRs to validate each pointer argument
and convert it into native address, and then call the native function directly instead
of calling jit_invoke_native runtime API, so as to improve the performance.
Refactor LLVM JIT for some purposes:
- To simplify the source code of JIT compilation
- To simplify the JIT modes
- To align with LLVM latest changes
- To prepare for the Multi-tier JIT compilation, refer to #1302
The changes mainly include:
- Remove the MCJIT mode, replace it with ORC JIT eager mode
- Remove the LLVM legacy pass manager (only keep the LLVM new pass manager)
- Change the lazy mode's LLVM module/function binding:
change each function in an individual LLVM module into all functions in a single LLVM module
- Upgraded ORC JIT to ORCv2 JIT to enable lazy compilation
Refer to #1468
Refactor the layout of interpreter and AOT module instance:
- Unify the interp/AOT module instance, use the same WASMModuleInstance/
WASMMemoryInstance/WASMTableInstance data structures for both interpreter
and AOT
- Make the offset of most fields the same in module instance for both interpreter
and AOT, append memory instance structure, global data and table instances to
the end of module instance for interpreter mode (like AOT mode)
- For extra fields in WASM module instance, use WASMModuleInstanceExtra to
create a field `e` for interpreter
- Change the LLVM JIT module instance creating process, LLVM JIT uses the WASM
module and module instance same as interpreter/Fast-JIT mode. So that Fast JIT
and LLVM JIT can access the same data structures, and make it possible to
implement the Multi-tier JIT (tier-up from Fast JIT to LLVM JIT) in the future
- Unify some APIs: merge some APIs for module instance and memory instance's
related operations (only implement one copy)
Note that the AOT ABI is same, the AOT file format, AOT relocation types, how AOT
code accesses the AOT module instance and so on are kept unchanged.
Refer to:
https://github.com/bytecodealliance/wasm-micro-runtime/issues/1384
Lookup table for i32.const and i64.const for xtensa XIP
Lookup const offset from table for load/store opcodes for xtensa XIP
Fill capability flags for xtensa XIP
Enable lower switch pass for xtensa XIP
Since legacy binding for loop unswitch pass was removed and we can't get
it back. Implement its equivalent in `aot_llvm_extra.cpp` and use it in
`aot_compiler.c`.
Follow up to #1183.
Fix the issue reported in #1282.
When i32/i64 rotate (rotl/rotr) with 0, the LLVM IRs translated are:
left<<0 | left>>64 and left >>0 | left<<64
The value of left >> 64 and left <<64 in LLVM are treated as poison,
which causes invalid result when executing the aot function.
Directly return left when right is 0 to fix the issue.
Enable aot compiler and jit based on llvm-14.0 and llvm-15.0git,
replace LLVMBuildLoad/LLVMBuildInBoundsGEP/LLVMBuildCall with
LLVMBuildLoad2/LLVMBuildInBoundsGEP2/LLVMBuildCall2, and pass
them with related types, so as to meet the requirements of opaque
pointers.
And fix several compilation errors for llvm-14.0/15.0git.
Most spec cases and standalone cases are tested.
Support integrating 3rd-party toolchain llc compiler or asm compiler
into wamrc by setting environment variable WAMRC_LLC_COMPILER
or WAMRC_ASM_COMPILER, wamrc will use these tools to generate
object file from LLVM IR firstly, and then refactor the object file into
aot file.
Automatically dump memory/performance profiling data in
wasm_application_execute_main and wasm_application_execute_func when
the related feature is enabled.
And remove unused aot_compile_wasm_file func declaration in aot_compiler.h.
wasm_c_api.c: add more checks, fix LOG_WARNING invalid specifier
aot_emit_aot_file: fix strncpy max size length to copy
posix.c: fix potential socket not close issue
wasm-c-api samples: add return value checks for fseek/ftell
cJSON.c: remove dead code
module_wasm_app.c: add return value check for wasm_runtime_call_wasm
aot_runtime.c: add return value check for aot_get_default_memory
aot_runtime.c: add return value check before calling wasm app malloc/free func
wasm_runtime_common.c: fix dead code warning in wasm_runtime_load_from_sections
aot_emit_memory.c: fix potential integer overflow issue
wasm_runtime.c: remove dead code in memory_instantiate, add assertion for globals
samples simple/gui/littlevgl: fix fields of struct sigaction initialization issue
host-tool: add return value check for sendto
Fix the symbol resolving failure with recent version of wamrc:
```
AOT module load failed: resolve symbol .Lswitch.table.aot _func#82.2 failed
```
Replace the relocations for such symbols with .rodata section.
Refine opcode br_table for classic interpreter as there may be a lot of
leb128 decoding when the br count is big:
1. Use the bytecode itself to store the decoded leb br depths if each
decoded depth can be stored with one byte
2. Create br_table cache to store the decode leb br depths if the decoded
depth cannot be stored with one byte
After the optimization, the class interpreter can access the br depths array
with index, no need to decode the leb128 again.
And fix function record_fast_op() return value unchecked issue in source
debugging feature.
Add aot relocation for ".rodata.str" symbol to support more cases
Fix some coding style issues
Fix aot block/value stack destroy issue
Refine classic/fast interpreter codes
Clear compile warning of libc_builtin_wrapper.c in 32-bit platform
Fix handle OP_TABLE_COPY issue
Fix loader handle OP_BLOCK/IF/LOOP issue if type_index is larger than 256
Fix loader handle OP_GET_GLOBAL, allow to change to GET_GLOBAL_64 for
aot compiler similiar to handling OP_SET_GLOBAL
Refine loader handle OP_GET/SET/TEE_LOCAL, disable changing opcode when
source debugging is enabled, so as no need to record the change of opcode
Refine wasm_interp_interp_frame_size to reduce the wasm operand stack usage
Signed-off-by: Wenyong Huang <wenyong.huang@intel.com>
When calling native function from AOT code, current implementation is to return
back to runtime to call aot_invoke_native, which calls wasm_runtime_invoke_native
and the latter calls assembly code. We did it before as there may be pointer and
string arguments to check and convert if the native function's registered signature
has character '*' and '$'.
As the built-in native function's signatures can be gotten in compilation time, we
check the pointer/string arguments and convert them into native address in AOT
code, and then invoke the native function directly, so as to improve performance.
Use LLVM new pass manager for wamrc to replace the legacy pass manger,
so as to gain better performance and reduce the compilation time.
Reference links:
- https://llvm.org/docs/NewPassManager.html
- https://blog.llvm.org/posts/2021-03-26-the-new-pass-manager
And add an option to use the legacy pm mode when building wamrc:
cmake .. -DWAMR_BUILD_LLVM_LEGACY_PM=1
For JIT mode, keep it unchanged as it only runs several function passes and
using new pass manager will increase the compilation time.
And refactor the codes of applying LLVM passes.
Refactor LLVM Orc JIT to actually enable the lazy compilation and speedup
the launching process:
https://llvm.org/docs/ORCv2.html#laziness
Main modifications:
- Create LLVM module for each wasm function, wrap it with thread safe module
so that the modules can be compiled parallelly
- Lookup function from aot module instance's func_ptrs but not directly call the
function to decouple the module relationship
- Compile the function when it is first called and hasn't been compiled
- Create threads to pre-compile the WASM functions parallelly when loading
- Set Lazy JIT as default, update document and build/test scripts
Currently when calling wasm_runtime_call_wasm() to invoke wasm function
with externref type argument from runtime embedder, developer needs to
use wasm_externref_obj2ref() to convert externref obj into an internal ref
index firstly, which is not convenient to developer.
To align with GC feature in which all the references passed to
wasm_runtime_call_wasm() can be object pointers directly, we change the
interface of wasm_runtime_call_wasm() to allow to pass object pointer
directly for the externref argument, and refactor the related codes, update
the related samples and the document.
Put Vectorize passes before GVN/LICM passes as normally the former
gains more performance improvement and the latter might break the
optimizations for the former. Can improve performance of several
sightglass cases.
And don't check exception throw after calling an AOT function if it is
and recursive call, similar to handing of Spec tail call opcode.
Fix some issues on MacOS platform
- Enable libc-wasi by default
- Set target abi to "gnu" if it is not set for wamrc to avoid generating
object file of unsupported Mach-O format
- Set `<vendor>-<sys>` info according to target abi for wamrc to support
generating AOT file for other OSs but not current host
- Set cpu name if arch/abi/cpu are not set to avoid checking SIMD
capability failed
- Set size level to 1 for MacOS/Windows platform to avoid relocation type
unsupported warning
- Clear posix_memmap.c compiling warning
- Fix spec case test script issues, enable test spec cases on MacOS
Signed-off-by: Wenyong Huang <wenyong.huang@intel.com>
Use `PRIxxx` related macros to format the output strings so as to clear
compile warnings, e.g. PRIu32, PRId32, PRIX32, PRIX64 and so on.
And add the related macro definitions in platform_common.h if they
are not defined, as some compilers might not support them.