Enhance the GC subtyping checks:
- Fix issues in the type equivalence check
- Enable the recursive type subtyping check
- Add a equivalence type flag in defined types of aot file, if there is an
equivalence type before, just set it true and re-use the previous type
- Normalize the defined types for interpreter and AOT
- Enable spec test case type-equivalence.wast and type-subtyping.wast,
and enable some commented cases
- Enable set WAMR_BUILD_SANITIZER from cmake variable
- Add new API wasm_runtime_load_ex() in wasm_export.h
and wasm_module_new_ex in wasm_c_api.h
- Put aot_create_perf_map() into a separated file aot_perf_map.c
- In perf.map, function names include user specified module name
- Enhance the script to help flamegraph generations
Fix the warnings and issues reported:
- in Windows platform
- by CodeQL static code analyzing
- by Coverity static code analyzing
And update CodeQL script to build exception handling and memory features.
This reverts commit 0e8d949440.
Because it doesn't make much sense anymore after we disabled debug info
processing on C++ functions in:
"aot debug: process lldb_function_to_function_dbi only for C".
The symbols in windows 32-bit may start with '_' and can not be found
when resolving the relocations to them. This PR ignores the underscore
when handling the relocation name of AOT_FUNC_INTERNAL_PREFIX, and
redirect the relocation with name "_aot_stack_sizes" to the relocation with
name ".aot_stack_sizes" (the name of the data section created).
ps.
https://github.com/bytecodealliance/wasm-micro-runtime/issues/3216
The stack profiler `aot_func#xxx` calls the wrapped function of `aot_func_internal#xxx`
by using symbol reference, but in some platform like xtensa, it’s translated into a native
long call, which needs to resolve the indirect address by relocation and breaks the XIP
feature which requires the eliminating of relocation.
The solution is to change the symbol reference into an indirect call through the lookup
table, the code will be like this:
```llvm
call_wrapped_func: ; preds = %stack_bound_check_block
%func_addr1 = getelementptr inbounds ptr, ptr %func_ptrs_ptr, i32 75
%func_tmp2 = load ptr, ptr %func_addr1, align 4
tail call void %func_tmp2(ptr %exec_env)
ret void
```
Implement the GC (Garbage Collection) feature for interpreter mode,
AOT mode and LLVM-JIT mode, and support most features of the latest
spec proposal, and also enable the stringref feature.
Use `cmake -DWAMR_BUILD_GC=1/0` to enable/disable the feature,
and `wamrc --enable-gc` to generate the AOT file with GC supported.
And update the AOT file version from 2 to 3 since there are many AOT
ABI breaks, including the changes of AOT file format, the changes of
AOT module/memory instance layouts, the AOT runtime APIs for the
AOT code to invoke and so on.
This increases the chance to use "short" calls.
Assumptions:
- LLVM preserves the order of functions in a module
- The wrapper function are smaller than the wrapped functions
- The target CPU has "short" PC-relative variation of call/jmp instructions
and they are preferrable over the "long" ones.
A motivation:
- To avoid some relocations for XIP, I want to use xtensa PC-relative
call instructions, which can only reach ~512KB.
When the original wasm contains multiple compilation units, the current
logic uses the first one for everything. This commit tries to use a bit more
appropriate ones.
The content in custom name section is changed after loaded since the strings
are adjusted with '\0' appended, the emitted AOT file then cannot be loaded.
The PR disables changing the content for AOT compiler to resolve it.
And disable emitting custom name section for `wamrc --enable-dump-call-stack`,
instead, use `wamrc --emit-custom-sections=name` to emit it.
Allow to invoke the quick call entry wasm_runtime_quick_invoke_c_api_import to
call the wasm-c-api import functions to speedup the calling process, which reduces
the data copying.
Use `wamrc --invoke-c-api-import` to generate the optimized AOT code, and set
`jit_options->quick_invoke_c_api_import` true in wasm_engine_new when LLVM JIT
is enabled.
And refactor the original perf support
- use WAMR_BUILD_LINUX_PERF as the cmake compilation control
- use WASM_ENABLE_LINUX_PERF as the compiler macro
- use `wamrc --enable-linux-perf` to generate aot file which contains fp operations
- use `iwasm --enable-linux-perf` to create perf map for `perf record`
The popped reachable block may be if block whose else branch hasn't been
translated, and should push the params for the else block if there are.
And use LLVMDisposeMessage to free memory allocated in is_win_platform.
Currently, `data.drop` instruction is implemented by directly modifying the
underlying module. It breaks use cases where you have multiple instances
sharing a single loaded module. `elem.drop` has the same problem too.
This PR fixes the issue by keeping track of which data/elem segments have
been dropped by using bitmaps for each module instances separately, and
add a sample to demonstrate the issue and make the CI run it.
Also add a missing check of dropped elements to the fast-jit `table.init`.
Fixes: https://github.com/bytecodealliance/wasm-micro-runtime/issues/2735
Fixes: https://github.com/bytecodealliance/wasm-micro-runtime/issues/2772
Error is reported when executing `wamrc --target=thumb -o <aot_file> <wasm_file>`:
```
LLVM ERROR: failed to perform tail call elimination on a call site marked musttail
Aborted (core dumped)
```
Set `abi` to "gnu" for the bare-metal target when `abi` is NULL,
or the below `bh_assert` and `bh_memcpy` may deference a NULL
pointer. Error is reported when running wamrc compiled with
`cmake .. -DCMAKE_BUILD_TYPE=Debug`:
```
core/iwasm/compilation/aot_llvm.c:2584:13: runtime error:
null pointer passed as argument 1, which is declared to never be null
```
Set the vendor-sys of bare-metal targets to "-unknown-none-",
and currently only add "thumbxxx" to the bare-metal target list.
Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
- Fix potential invalid push param phis and add incoming phis to a un-existed basic block
- Fix potential invalid shift count int rotl/rotr opcodes
- Resize memory_data_size to UINT32_MAX if it is 4G when hw bound check is enabled
- Fix negative linear memory offset is used for 64-bit target it is const and larger than INT32_MAX
When doing more investigations related to this PR:
https://github.com/bytecodealliance/wasm-micro-runtime/pull/2619
We found that in some scenarios the constant might not be directly
available to the LLVM IR builder, e.g.:
```
(func $const_ret (result i32)
i32.const -5
)
(func $foo
(i32.shr_u (i32.const -1) (call $const_ret))
(i32.const 31)
)
```
In that case, the right parameter to `i32.shr_u` is not constant, therefore
the `SHIFT_COUNT_MASK` isn't applied. However, when the optimization
is enabled (`--opt-level` is 2 or 3), the optimization passes resolve the
call into constant, and that constant is poisoned, causing the compiler to
resolve the whole function to an exception.
According to the description of `buildPerModuleDefaultPipeline()` and
`buildLTOPreLinkDefaultPipeline()`, it is not allowed to call them with `O0` level.
Use `buildO0DefaultPipeline` instead when the opt-level is 0.
The LLVM zext IR may be inserted after the terminator of a basic block
when popping the arguments of a wasm block. Change to insert the
zext IR before the terminator of the basic block to resolve the issue.
Reported in #2620.
According to the specification,
- fNxM_pmin/max returns v1 or v2 based on flt(v1,v2) result
- fNxM_min/max returns +/-NaN, +/-Inf, v1 or v2 based on more than
flt(v1,v2) result
Fixes issue #2561.
Only when the value kind is LLVMConstantIntValueKind and the value
is not undef and not poison can we extract the value of a constant int.
Fixes#2557 and #2559.
Adapt API usage to new interfaces where applicable, including LLVM function
usage, obsoleted llvm::Optional type and removal of unavailable headers.
Know issues:
- AOT static PGO isn't enabled
- LLVM JIT may run failed due to llvm_orc_registerEHFrameSectionWrapper
isn't linked into iwasm
Unaligned store v128 value to the AOT function argument of the pointer for
the extra return value may cause segmentation fault.
Fix the issue reported in #2556.
When AOT out of bound linear memory access or stack overflow occurs, the call stack of
AOT functions cannot be unwound currently, so from the exception handler, runtime
cannot jump back into the place that calls the AOT function.
We temporarily skip the current instruction and let AOT code continue to run and return
to caller as soon as possible. And use the zydis library the decode the current instruction
to get its size.
And remove using RtlAddFunctionTable to register the AOT functions since it doesn't work
currently.
AOT relocation to aot_func_internal#n is generated by wamrc --bounds-checks=1.
Resolve the issue by applying the relocation in the compilation stage by wamrc and
don't generate these relocations in the AOT file.
Fixes#2471.
- Fix windows wamrc link error: aot_generate_tempfile_name undefined.
- Clear windows compile warnings.
- And rename folder `samples/bh_atomic` and `samples/mem_allocator` to
`samples/bh-atomic` and `samples/mem-allocator`.
The old method may not work for some cases. This PR iterates over all instructions
in the function, looking for memcpy, memmove and memset instructions, putting
them into a set, and finally expands them into a loop one by one.
And move this LLVM Pass after building the pipe line of pass builder to ensure that
the memcpy/memmove/memset instrinsics are generated before applying the pass.
Fix some build errors when building wamrc with LLVM-13, reported in #2311
Fix some build warnings when building wamrc with LLVM-16:
```
core/iwasm/compilation/aot_llvm_extra2.cpp:26:26: warning:
‘llvm::None’ is deprecated: Use std::nullopt instead. [-Wdeprecated-declarations]
26 | return llvm::None;
```
Fix a maybe-uninitialized compile warning:
```
core/iwasm/compilation/aot_llvm.c:413:9: warning:
‘update_top_block’ may be used uninitialized in this function [-Wmaybe-uninitialized]
413 | LLVMPositionBuilderAtEnd(b, update_top_block);
```
Move the native stack overflow check from the caller to the callee because the
former doesn't work for call_indirect and imported functions.
Make the stack usage estimation more accurate. Instead of making a guess from
the number of wasm locals in the function, use the LLVM's idea of the stack size
of each MachineFunction. The former is inaccurate because a) it doesn't reflect
optimization passes, and b) wasm locals are not the only reason to use stack.
To use the post-compilation stack usage information without requiring 2-pass
compilation or machine-code imm rewriting, introduce a global array to store
stack consumption of each functions:
For JIT, use a custom IRCompiler with an extra pass to fill the array.
For AOT, use `clang -fstack-usage` equivalent because we support external llc.
Re-implement function call stack usage estimation to reflect the real calling
conventions better. (aot_estimate_stack_usage_for_function_call)
Re-implement stack estimation logic (--enable-memory-profiling) based on the new
machinery.
Discussions: #2105.
LLVM PGO (Profile-Guided Optimization) allows the compiler to better optimize code
for how it actually runs. This PR implements the AOT static PGO, and is tested on
Linux x86-64 and x86-32. The basic steps are:
1. Use `wamrc --enable-llvm-pgo -o <aot_file_of_pgo> <wasm_file>`
to generate an instrumented aot file.
2. Compile iwasm with `cmake -DWAMR_BUILD_STATIC_PGO=1` and run
`iwasm --gen-prof-file=<raw_profile_file> <aot_file_of_pgo>`
to generate the raw profile file.
3. Run `llvm-profdata merge -output=<profile_file> <raw_profile_file>`
to merge the raw profile file into the profile file.
4. Run `wamrc --use-prof-file=<profile_file> -o <aot_file> <wasm_file>`
to generate the optimized aot file.
5. Run the optimized aot_file: `iwasm <aot_file>`.
The test scripts are also added for each benchmark, run `test_pgo.sh` under
each benchmark's folder to test the AOT static pgo.
Segue is an optimization technology which uses x86 segment register to store
the WebAssembly linear memory base address, so as to remove most of the cost
of SFI (Software-based Fault Isolation) base addition and free up a general
purpose register, by this way it may:
- Improve the performance of JIT/AOT
- Reduce the footprint of JIT/AOT, the JIT/AOT code generated is smaller
- Reduce the compilation time of JIT/AOT
This PR uses the x86-64 GS segment register to apply the optimization, currently
it supports linux and linux-sgx platforms on x86-64 target. By default it is disabled,
developer can use the option below to enable it for wamrc and iwasm(with LLVM
JIT enabled):
```bash
wamrc --enable-segue=[<flags>] -o output_file wasm_file
iwasm --enable-segue=[<flags>] wasm_file [args...]
```
`flags` can be:
i32.load, i64.load, f32.load, f64.load, v128.load,
i32.store, i64.store, f32.store, f64.store, v128.store
Use comma to separate them, e.g. `--enable-segue=i32.load,i64.store`,
and `--enable-segue` means all flags are added.
Acknowledgement:
Many thanks to Intel Labs, UC San Diego and UT Austin teams for introducing this
technology and the great support and guidance!
Signed-off-by: Wenyong Huang <wenyong.huang@intel.com>
Co-authored-by: Vahldiek-oberwagner, Anjo Lucas <anjo.lucas.vahldiek-oberwagner@intel.com>
- Translate all the opcodes of threads spec proposal for Fast JIT
- Add the atomic flag for Fast JIT load/store IRs to support atomic load/store
- Add new atomic related Fast JIT IRs and translate them in the codegen
- Add suspend_flags check in branch opcodes and before/after call function
- Modify CI to enable Fast JIT multi-threading test
Co-authored-by: TianlongLiang <tianlong.liang@intel.com>
In LLVM AOT/JIT compiler, only need to check the suspend_flags when memory is
a shared memory since the shared memory must be enabled for multi-threading,
so as not to impact the performance in non-multi-threading memory mode. Also
refine the LLVM IRs to check the suspend_flags.
And fix an issue of multi-tier jit for multi-threading, the instance of the child thread
should be removed from the instance list before it is de-instantiated.
- Implement atomic.fence to ensure a proper memory synchronization order
- Destroy exec_env_singleton first in wasm/aot deinstantiation
- Change terminate other threads to wait for other threads in
wasm_exec_env_destroy
- Fix detach thread in thread_manager_start_routine
- Fix duplicated lock cluster->lock in wasm_cluster_cancel_thread
- Add lib-pthread and lib-wasi-threads compilation to Windows CI
Raising "wasi proc exit" exception, spreading it to other threads and then
clearing it in all threads may result in unexpected behavior: the sub thread
may end first, handle the "wasi proc exit" exception and clear exceptions
of other threads, including the main thread. And when main thread's
exception is cleared, it may continue to run and throw "unreachable"
exception. This also leads to some assertion failed.
Ignore exception spreading for "wasi proc exit" and don't clear exception
of other threads to resolve the issue.
And add suspend flag check after atomic wait since the atomic wait may
be notified by other thread when exception occurs.