LLVM PGO (Profile-Guided Optimization) allows the compiler to better optimize code
for how it actually runs. This PR implements AOT static PGO, tested on
Linux x86-64 and x86-32. The basic steps are:
1. Use `wamrc --enable-llvm-pgo -o <aot_file_of_pgo> <wasm_file>`
to generate an instrumented aot file.
2. Compile iwasm with `cmake -DWAMR_BUILD_STATIC_PGO=1` and run
`iwasm --gen-prof-file=<raw_profile_file> <aot_file_of_pgo>`
to generate the raw profile file.
3. Run `llvm-profdata merge -output=<profile_file> <raw_profile_file>`
to merge the raw profile file into the profile file.
4. Run `wamrc --use-prof-file=<profile_file> -o <aot_file> <wasm_file>`
to generate the optimized aot file.
5. Run the optimized aot_file: `iwasm <aot_file>`.
Test scripts are also added for each benchmark; run `test_pgo.sh` under
each benchmark's folder to test AOT static PGO.
Segue is an optimization technique that uses an x86 segment register to store
the WebAssembly linear memory base address, which removes most of the cost
of the SFI (Software-based Fault Isolation) base addition and frees up a
general-purpose register. This may:
- Improve the performance of JIT/AOT
- Reduce the footprint of JIT/AOT, the JIT/AOT code generated is smaller
- Reduce the compilation time of JIT/AOT
This PR uses the x86-64 GS segment register to apply the optimization. It currently
supports the linux and linux-sgx platforms on the x86-64 target. It is disabled by default;
developers can use the options below to enable it for wamrc and iwasm (with LLVM
JIT enabled):
```bash
wamrc --enable-segue=[<flags>] -o output_file wasm_file
iwasm --enable-segue=[<flags>] wasm_file [args...]
```
`flags` can be:
i32.load, i64.load, f32.load, f64.load, v128.load,
i32.store, i64.store, f32.store, f64.store, v128.store
Separate them with commas, e.g. `--enable-segue=i32.load,i64.store`;
a bare `--enable-segue` enables all flags.
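As an illustration of the flag syntax above, here is a minimal sketch of how a comma-separated flag list could be turned into a bitmask. The function name `parse_segue_flags`, the bit assignment and the "0 means error" convention are assumptions made for this example, not the parser used by wamrc or iwasm.

```c
#include <stdint.h>
#include <string.h>

/* Flag names as listed in the commit message; bit i stands for entry i. */
static const char *const segue_flag_names[] = {
    "i32.load",  "i64.load",  "f32.load",  "f64.load",  "v128.load",
    "i32.store", "i64.store", "f32.store", "f64.store", "v128.store"
};
enum { SEGUE_FLAG_COUNT = 10 };
#define SEGUE_ALL ((1u << SEGUE_FLAG_COUNT) - 1u)

/* Hypothetical helper: returns a bitmask of the requested flags.
   NULL or "" (a bare --enable-segue) selects all flags.
   Returns 0 if an unknown or empty flag name is found. */
static uint32_t parse_segue_flags(const char *list)
{
    if (!list || !*list)
        return SEGUE_ALL;

    uint32_t mask = 0;
    while (*list) {
        size_t len = strcspn(list, ","); /* length of the current token */
        int i;
        for (i = 0; i < SEGUE_FLAG_COUNT; i++) {
            if (strlen(segue_flag_names[i]) == len
                && strncmp(list, segue_flag_names[i], len) == 0) {
                mask |= 1u << i;
                break;
            }
        }
        if (i == SEGUE_FLAG_COUNT)
            return 0; /* token did not match any known flag */
        list += len;
        if (*list == ',')
            list++; /* skip the separator */
    }
    return mask;
}
```

With this bit layout, `i32.load,i64.store` sets bit 0 and bit 6.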
Acknowledgement:
Many thanks to Intel Labs, UC San Diego and UT Austin teams for introducing this
technology and the great support and guidance!
Signed-off-by: Wenyong Huang <wenyong.huang@intel.com>
Co-authored-by: Vahldiek-oberwagner, Anjo Lucas <anjo.lucas.vahldiek-oberwagner@intel.com>
Add nightly (UTC time) checks with asan and ubsan, and also move the gcc-4.8 build
to the nightly run since we don't need to run it with every PR.
Co-authored-by: Maksim Litskevich <makslit@amazon.co.uk>
On some platforms WAMR is compiled with `CONFIG_HAS_CLOCK_NANOSLEEP=1`
although `clock_nanosleep` is not available on the platform, which causes a compilation error.
Add a check for the macro `DISABLE_CLOCK_NANOSLEEP` to resolve the issue:
`CONFIG_HAS_CLOCK_NANOSLEEP` takes effect only when `DISABLE_CLOCK_NANOSLEEP` isn't defined.
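The guard can be pictured roughly as follows. This is a sketch only: the helper `sleep_for_ns`, the intermediate macro `USE_CLOCK_NANOSLEEP` and the `nanosleep` fallback are assumptions for illustration and are not taken from the WAMR sources.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Assumed default for this sketch; in WAMR the value comes from the
   platform configuration. */
#ifndef CONFIG_HAS_CLOCK_NANOSLEEP
#define CONFIG_HAS_CLOCK_NANOSLEEP 1
#endif

/* The opt-out wins: the feature macro only counts when
   DISABLE_CLOCK_NANOSLEEP is not defined. */
#if CONFIG_HAS_CLOCK_NANOSLEEP && !defined(DISABLE_CLOCK_NANOSLEEP)
#define USE_CLOCK_NANOSLEEP 1
#else
#define USE_CLOCK_NANOSLEEP 0
#endif

/* Hypothetical helper: sleeps for ns nanoseconds, returns 0 on success. */
static int sleep_for_ns(long ns)
{
    struct timespec ts = { .tv_sec = ns / 1000000000L,
                           .tv_nsec = ns % 1000000000L };
#if USE_CLOCK_NANOSLEEP
    return clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
#else
    return nanosleep(&ts, NULL); /* fallback when the call is unavailable */
#endif
}
```

Building with `-DDISABLE_CLOCK_NANOSLEEP` selects the fallback branch even though `CONFIG_HAS_CLOCK_NANOSLEEP` is 1.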
Add VX delegation as an external delegation of TFLite, so that several NPU/GPU
(from VeriSilicon, NXP, Amlogic) can be controlled via WASI-NN.
The test code works with the x86 simulator.
Fix issue reported in #2172: wasm-c-api `wasm_func_call` may use a wrong exec_env
when multi-threading is enabled, with error "invalid exec env" reported
Fix issue reported in #2149: main instance's `c_api_func_imports` are not passed to
the counterpart of new thread's instance in wasi-threads mode
Fix issue of invalid size calculated to copy `c_api_func_imports` in pthread mode
And refactor the code to use `wasm_cluster_dup_c_api_imports` to copy the
`c_api_func_imports` to new thread for wasi-threads mode and pthread mode.
Currently, if a thread is spawned and raises an exception after the main thread
has finished, iwasm returns with success instead of returning 1 (i.e. error).
Since wasm_runtime_get_wasi_exit_code waits for all threads to finish and only
returns the wasi exit code, this PR performs the exception check again and
returns error if an exception was raised.
Since the TensorFlow library is already installed in many cases (especially on
embedded systems), move the installation code to find_package.
According to the 1999 ISO C standard (C99), size_t is an unsigned integer type of
at least 16 bits (see sections 7.17 and 7.18.3); it may be uint32 on 32-bit platforms:
https://en.cppreference.com/w/cpp/types/size_t
Calling the function `size_t min(size_t, size_t)` with two uint64 arguments may
therefore truncate the arguments and return an invalid result.
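The truncation can be reproduced on any host with the sketch below. `uint32_t` stands in for a 32-bit `size_t`, and the helper names `min_narrow`, `min_u64` and `buggy_min` are made up for this example.

```c
#include <stdint.h>

/* Stand-in for `size_t min(size_t, size_t)` on a platform where size_t is
   32 bits wide; uint32_t makes the effect visible on 64-bit hosts too. */
static uint32_t min_narrow(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Width-preserving variant for 64-bit operands. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Passes 64-bit values to the narrow helper. The casts are written out
   here; without them the compiler performs the same conversion silently,
   keeping only the low 32 bits of each argument. */
static uint64_t buggy_min(uint64_t a, uint64_t b)
{
    return min_narrow((uint32_t)a, (uint32_t)b);
}
```

For example, with `a = 0x100000001` and `b = 2`, `buggy_min` compares 1 with 2 and returns 1, while `min_u64` returns the correct value 2.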
Co-authored-by: Georgii Rylov <godjan@amazon.co.uk>
- Translate all the opcodes of threads spec proposal for Fast JIT
- Add the atomic flag for Fast JIT load/store IRs to support atomic load/store
- Add new atomic related Fast JIT IRs and translate them in the codegen
- Add suspend_flags check in branch opcodes and before/after call function
- Modify CI to enable Fast JIT multi-threading test
Co-authored-by: TianlongLiang <tianlong.liang@intel.com>
In the LLVM AOT/JIT compiler, check the suspend_flags only when the memory is
a shared memory, since shared memory must be enabled for multi-threading;
this avoids impacting performance in non-multi-threading memory mode. Also
refine the LLVM IRs that check the suspend_flags.
And fix an issue of multi-tier JIT for multi-threading: the instance of the child thread
should be removed from the instance list before it is de-instantiated.
In multi-threading mode, load the memory data size at each memory access
boundary check, since it may be changed by other threads when the memory
grows.
Also use `memory->memory_data_size` instead of
`memory->num_bytes_per_page * memory->cur_page_count` to simplify
the code.
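A rough, single-threaded sketch of the boundary check follows. The `Memory` struct is a reduced, hypothetical layout, and `access_in_bounds` and `grow` are illustrative names; a real multi-threaded runtime would also need the size read to be properly synchronized, which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced memory descriptor for this sketch. */
typedef struct {
    uint32_t num_bytes_per_page;
    uint32_t cur_page_count;
    uint64_t memory_data_size; /* kept equal to page size * page count */
} Memory;

/* Reads the size at the time of the check instead of relying on a value
   cached earlier, so a grow by another thread is taken into account. */
static bool access_in_bounds(const Memory *m, uint64_t offset, uint32_t bytes)
{
    uint64_t size = m->memory_data_size;
    /* Written this way so that offset + bytes cannot overflow. */
    return bytes <= size && offset <= size - bytes;
}

/* Growing updates the page count and the precomputed size together. */
static void grow(Memory *m, uint32_t pages)
{
    m->cur_page_count += pages;
    m->memory_data_size =
        (uint64_t)m->num_bytes_per_page * m->cur_page_count;
}
```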
When a ref.func opcode refers to a function whose index is not smaller than
the current function's index, the destination func should be forward-declared: it is
declared in the table element segments or in the export list.
In multi-threading, this line will eventually call `wasm_cluster_wait_for_all_except_self`:
`DEINIT_VEC(store->instances, wasm_instance_vec_delete)`
As the threads are joining they can call `wasm_interp_dump_call_stack` which tries to
use the module frames but they were already freed by this line:
`DEINIT_VEC(store->modules, wasm_module_vec_delete)`
This PR swaps the order in which these are deleted so that modules are deleted after the instances.
Co-authored-by: Andrew Chambers <ncham@amazon.com>
Try using existing exec_env to execute wasm app's malloc/free func and
execute post instantiation functions. Create a new exec_env only when
no existing exec_env was found.
POLLRDNORM/POLLWRNORM may not be defined in uClibc, so replace them
with the equivalent POLLIN/POLLOUT.
Refer to https://www.man7.org/linux/man-pages/man2/poll.2.html
POLLRDNORM Equivalent to POLLIN
POLLWRNORM Equivalent to POLLOUT
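A small sketch of polling with only POLLIN/POLLOUT, which are available wherever `poll` is. The helper names `wait_readable` and `wait_writable` are hypothetical and are not part of libc-wasi.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout and
   -1 on error. POLLIN is used where POLLRDNORM might otherwise appear. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Same idea for writability, with POLLOUT in place of POLLWRNORM. */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLOUT)) ? 1 : 0;
}
```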
Signed-off-by: Thomas Devoogdt <thomas.devoogdt@barco.com>
Update wasi-libc version to resolve the hang issue when running wasi-threads cases.
Implement custom sync primitives as a counterpart of `pthread_barrier_wait` to
attempt to replace pthread sync primitives since they seem to cause data races
when running with the thread sanitizer.
Use pre-created exec_env for instantiation and module_malloc/free,
use the same exec_env of the current thread to avoid potential
unexpected behavior.
And remove the unnecessary shared_mem_lock in wasm_module_free,
which may cause a deadlock.
Use the shared memory's shared_mem_lock to lock the whole atomic.wait and
atomic.notify processes, and use it for os_cond_reltimedwait and os_cond_notify,
so that each process as a whole is actually atomic. The original implementation
accessed the wait address under shared_mem_lock but used wait_node->wait_lock
for os_cond_reltimedwait, so the check and the wait were not one atomic
operation.
Also remove the now unnecessary wait_map_lock and wait_lock, since the whole
processes are already protected by shared_mem_lock.
`wasi-sdk-20` pre-release can be used to avoid building `wasi-libc` to enable threads.
It's not possible to use `wasi-sdk-20` pre-release on Ubuntu 20.04 because of
incompatibility with the glibc version:
```bash
/opt/wasi-sdk/bin/clang: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found
(required by /opt/wasi-sdk/bin/clang)
```
- Remove notify_stale_threads_on_exception and make atomic.wait
interruptible by waiting repeatedly and checking every second,
like the implementation of poll_oneoff in libc-wasi
- Wait for all other threads to exit and then get the wasi exit_code to avoid
getting an invalid value
- Inherit the suspend_flags of the parent thread when creating a new thread so
that the terminated flag is also set for the new thread
- Fix wasi-threads test case update_shared_data_and_alloc_heap
- Add "Lib wasi-threads enabled" prompt for cmake
- Fix aot get exception, use aot_copy_exception instead
Fix a data race for test main_proc_exit_wait.c from #1963.
And fix the atomic_wait logic, which was wrong before:
- thread 1 starts executing the wasm instruction wasm_atomic_wait
but hasn't yet started waiting on the condition variable
- the main thread calls proc_exit and notifies all the threads that are already
waiting on the condition variable
As a result, thread 1 then hangs waiting on the condition variable.
Now whether proc_exit was already called is checked atomically.
In the WASI thread test modified in this PR, malloc was used in multiple threads
without a lock. But wasi-libc implementation of malloc is not thread-safe.
Remove restrictions:
- Only 1 WASM app at a time
- Only 1 model at a time
- `graph` and `graph-execution-context` are ignored
Refer to previous document:
e8d718096d/core/iwasm/libraries/wasi-nn/README.md
- Implement atomic.fence to ensure a proper memory synchronization order
- Destroy exec_env_singleton first in wasm/aot deinstantiation
- Change terminate other threads to wait for other threads in
wasm_exec_env_destroy
- Fix detach thread in thread_manager_start_routine
- Fix duplicated lock cluster->lock in wasm_cluster_cancel_thread
- Add lib-pthread and lib-wasi-threads compilation to Windows CI
In wasm_cluster_create_thread, the new_exec_env is added into the cluster's
exec_env list before the thread is created, so other threads can access the
fields of new_exec_env once the cluster->lock is unlocked, while the
new_exec_env's handle is set later inside the thread routine. This may result
in the new_exec_env's handle being accessed by other threads before it is valid.
- CMakeLists.txt: add lib_export.h to install list
- Fast JIT: enlarge spill cache size to enable several standalone cases
when hw bound check is disabled
- Thread manager: wasm_cluster_exit_thread may destroy an invalid
exec_env->module_inst when exec_env was destroyed before
- samples/socket-api: fix failure to run timeout_client.wasm
- Enhance the CI build of wasi-libc and the sample/wasm-c-api-imports CMakeLists.txt
Support collecting code coverage with the wamr-test-suites script by using
the lcov and genhtml tools, e.g.:
cd tests/wamr-test-suites
./test_wamr.sh -s spec -b -P -C
The default code coverage and html files are generated at:
tests/wamr-test-suites/workspace/wamr.lcov
tests/wamr-test-suites/workspace/wamr-lcov.zip
And update wamr-test-suites scripts to support testing GC spec cases to
avoid frequent synchronization conflicts between branch main and dev/gc.
Raising "wasi proc exit" exception, spreading it to other threads and then
clearing it in all threads may result in unexpected behavior: the sub thread
may end first, handle the "wasi proc exit" exception and clear exceptions
of other threads, including the main thread. And when main thread's
exception is cleared, it may continue to run and throw "unreachable"
exception. This also leads to some assertion failures.
Ignore exception spreading for "wasi proc exit" and don't clear exception
of other threads to resolve the issue.
And add suspend flag check after atomic wait since the atomic wait may
be notified by other thread when exception occurs.
Fix issues in the libc-wasi `poll_oneoff` when thread manager is enabled:
- The exception of a thread may be cleared when other thread runs into
`proc_exit` and then calls `clear_wasi_proc_exit_exception`, so should not
use `wasm_runtime_get_exception` to check whether an exception was
thrown, use `wasm_cluster_is_thread_terminated` instead
- A previous PR split a single poll_oneoff wait into many shorter polls so that
the exception can be checked without a long wait, but if all events
returned by one poll are still waiting events, we need to continue to
wait rather than return directly.
Follow-up on #1951. Tested with multiple timeout values, with and without
interruption and measured the time spent sleeping.
- Use execute_post_instantiate_functions to call start, _initialize,
__post_instantiate, __wasm_call_ctors functions after instantiation
- Always call start function for both main instance and sub instance
- Only call _initialize and __post_instantiate for main instance
- Only call __wasm_call_ctors for main instance and when bulk memory
is enabled and wasi import functions are not found
- When hw bound check is enabled, use the existing exec_env_tls
to call func for sub instance, and switch exec_env_tls's module inst
to current module inst to avoid checking failure and using the wrong
module inst
Add shared memory lock when accessing the address to atomic wait/notify
inside linear memory to resolve its data race issue.
And statically initialize the goto table of interpreter labels to resolve the
data race issue of accessing the table.
The problem was found by a `Golang + WAMR (as CGO)` wrapped by EGO
in SGX Enclave.
`fstat()` in EGO returns dummy values:
- EGO uses a `mount` configuration to define the mount points that apply
the host file system presented to the Enclave.
- EGO has a different programming model: the entire application runs inside
the enclave. Manual ECALLs/OCALLs by application code are neither
required nor possible.
Add platform ego and add macro control for the return value checking of
`fd_determine_type_rights` in libc-wasi to resolve the issue.
The function has been there for a long time. While what it does looks a bit unsafe,
as it calls a function which may not be explicitly exported wasm-wise, it's
useful and widely used when implementing callback-taking APIs, including
our pthread_create implementation.
Destroy the child thread's exec_env before destroying its module instance, and
do both while holding the cluster's lock to avoid a possible data race: if exec_env
is removed from the cluster's exec_env_list and destroyed later, the main thread
may not wait for it and may start to destroy the wasm runtime, and the destruction
of the sub thread's exec_env may then free, overread or overwrite a destroyed or
re-initialized resource.
And fix an issue in wasm_cluster_cancel_thread.
The start/initialize functions of wasi module are to do some initialization work
during instantiation, which should be only called one time in the instantiation
of main instance. For example, they may initialize the data in linear memory,
if the data is changed later by the main instance, and re-initialized again by
the child instance, unexpected behaviors may occur.
And clear a shadow warning in classic interpreter.
Multiple threads generated from the same module should use the same
lock to protect the atomic operations.
Before this PR, each thread used a different lock to protect atomic
operations (e.g. atomic add), making the lock ineffective.
Fix #1958.
Add APIs to help prepare the imports for the wasm-c-api `wasm_instance_new`:
- wasm_importtype_is_linked
- wasm_runtime_is_import_func_linked
- wasm_runtime_is_import_global_linked
- wasm_extern_new_empty
For wasm-c-api, developers may use `wasm_module_imports` to get the import
type info, check whether an import func/global is linked with the above APIs,
and skip the linking of an import func/global with `wasm_extern_new_empty`.
The sample `wasm-c-api-import` is added and the document is updated.
When de-instantiating the wasm module instance, remove it from the module's
instance list before freeing func_ptrs and fast_jit_func_ptrs of the instance, to avoid
accessing the freed memory in the JIT backend compilation threads.
Enable setting running mode when executing a wasm bytecode file
- Four running modes are supported: interpreter, fast-jit, llvm-jit and multi-tier-jit
- Add APIs to set/get the default running mode of the runtime
- Add APIs to set/get the running mode of a wasm module instance
- Add running mode options for iwasm command line tool
And add size/opt level options for LLVM JIT
The definitions of `enum WASMExceptionID` differ between the compilation of wamrc and the
compilation of Fast JIT, since the latter enables the Fast JIT macro while the former doesn't.
As a result, an exception ID in an AOT file generated by wamrc may differ from the one in an
iwasm binary compiled with Fast JIT enabled, which may result in unexpected behavior.
Remove the macro control to resolve it.
Change an error to a warning when checking wasi abi compatibility in the loader, for the Rust case below:
#[no_mangle]
pub extern "C" fn main() {
    println!("foo");
}
When it is compiled with `cargo build --target wasm32-wasi`, a wasm file is generated with wasi apis imported
and a "void main(void)" function exported.
Other runtimes, e.g. wasmtime, allow loading it and executing the main function with the `--invoke` option.
- Split logic in several dockers
  - runtime: wasi-nn-cpu and wasi-nn-nvidia-gpu.
  - compilation: wasi-nn-compile. Prepares the testing wasm and generates the TFLite models.
- Implement GPU support for TFLite with OpenCL.
- Reorganize the library structure
- Use the latest version of `wasi-nn` wit (Oct 25, 2022):
0f77c48ec1/wasi-nn.wit.md
- Split logic that converts WASM structs to native structs in a separate file
- Simplify addition of new frameworks
This syscall doesn't need to allocate the stack or TLS; the application is expected
to do that instead. E.g. WASI-libc already does this for `pthread_create`.
Also fix some of the examples to allocate memory for the stack and to not use the stack before
the stack pointer is set to a correct value.
Because the stack grows from high addresses towards low addresses, the value
returned by malloc is the lowest address of the stack region, i.e. the limit the
stack grows towards, not the top of the stack. The top of the stack is the end of
the allocated space (i.e. the address returned by malloc + cluster size).
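The address arithmetic can be sketched as below. The helper name `stack_top` and the 16-byte alignment are assumptions for this example; the commit message itself only states that the top is the address returned by malloc plus the size.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns the initial stack pointer for a block of
   `size` bytes starting at `base`. For a downward-growing stack this is the
   end of the block, not `base`. Rounding down to 16 bytes is an assumption
   of this sketch. */
static uintptr_t stack_top(void *base, size_t size)
{
    return ((uintptr_t)base + size) & ~(uintptr_t)15;
}
```

Passing `base` itself as the stack pointer would make the first push write below the allocation.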
Refer to #1790.
The original CI didn't actually run wasi test suite for x86-32 since the `TEST_ON_X86_32=true`
isn't written into $GITHUB_ENV.
And refine the error output when linking an import global fails.
According to the [WASI thread specification](https://github.com/WebAssembly/wasi-threads/pull/16),
some thread identifiers are reserved and should not be used. In fact, only IDs between `1` and
`0x1FFFFFFF` are valid.
The thread ID allocator has been moved to a separate class to avoid polluting the
`lib_wasi_threads_wrapper` logic.
Use import_function_count rather than import_count to calculate
the func_index in handle_name_section when the custom name section
feature is enabled.
And clear the compile warnings of mini loader.
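The reason for the func_index fix is that the WebAssembly function index space lists imported functions first and module-defined functions after them, while import_count also covers imported tables, memories and globals. A minimal sketch with a hypothetical, reduced `Module` struct and an illustrative helper name:

```c
#include <stdint.h>

/* Hypothetical, reduced module layout for this sketch. */
typedef struct {
    uint32_t import_function_count; /* imported functions only */
    uint32_t import_count;          /* all imports, of every kind */
    uint32_t function_count;        /* functions defined in the module */
} Module;

/* Maps a func_index from the name section to a slot in the array of
   defined functions. Returns -1 if the index names an imported function
   or is out of range. */
static int64_t defined_func_slot(const Module *m, uint32_t func_index)
{
    if (func_index < m->import_function_count)
        return -1;
    uint32_t slot = func_index - m->import_function_count;
    return slot < m->function_count ? (int64_t)slot : -1;
}
```

Subtracting import_count instead would shift every name by the number of non-function imports.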
Support modes:
- run a commander module only
- run a reactor module only
- run a commander module and one or more reactor modules together;
the commander propagates WASIArguments to the reactors
Implement 2-level Multi-tier JIT engine: tier-up from Fast JIT to LLVM JIT to
get quick cold startup by Fast JIT and better performance by gradually
switching to LLVM JIT when the LLVM JIT functions are compiled by the
backend threads.
Refer to:
https://github.com/bytecodealliance/wasm-micro-runtime/issues/1302
Allow adding watchpoints to variables for source debugging. For instance:
`watchpoint set variable var`
will pause WAMR execution when the address at var is written to.
Read/write watchpoints can also be set by passing r/w flags. This will pause
execution when the address at var is read:
`watchpoint set variable -w read var`
Add two linked lists for read/write watchpoints. When the debug message
handler receives a watchpoint request, it adds to or removes from one or both of these
lists. In the interpreter, when an address is read from or stored to, check whether
the address is in these lists. If so, throw a sigtrap and suspend the process.
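The list handling described above might look roughly like this. The `Watchpoint` struct and the `watch_*` helpers are hypothetical names for this sketch; raising the trap and suspending the process are left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One node of a watchpoint list; one list is kept for reads, one for writes. */
typedef struct Watchpoint {
    uint64_t addr;
    uint64_t length;
    struct Watchpoint *next;
} Watchpoint;

/* Pushes a new watchpoint at the head of the list. */
static bool watch_add(Watchpoint **list, uint64_t addr, uint64_t length)
{
    Watchpoint *wp = malloc(sizeof(*wp));
    if (!wp)
        return false;
    wp->addr = addr;
    wp->length = length;
    wp->next = *list;
    *list = wp;
    return true;
}

/* Removes the first watchpoint that starts at addr, if any. */
static bool watch_remove(Watchpoint **list, uint64_t addr)
{
    for (Watchpoint **p = list; *p; p = &(*p)->next) {
        if ((*p)->addr == addr) {
            Watchpoint *dead = *p;
            *p = dead->next;
            free(dead);
            return true;
        }
    }
    return false;
}

/* Called on each load (read list) or store (write list): true if the
   accessed range [addr, addr + bytes) overlaps a watched range. */
static bool watch_hit(const Watchpoint *list, uint64_t addr, uint64_t bytes)
{
    for (; list; list = list->next) {
        if (addr < list->addr + list->length && list->addr < addr + bytes)
            return true;
    }
    return false;
}
```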
When a wasm module is instantiated more than once with wasm_instance_new,
the function import info of the previous instantiation may be overwritten by
the later instantiation, which may cause unexpected behavior.
Store the function import info into the module instance to fix the issue.
This PR allows reusing thread ids once they are released. That is done by using
a stack data structure to keep track of the used ids.
When a thread is created, it takes an available identifier from the stack. When
the thread exits, it returns the id to the stack of available identifiers.
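A minimal sketch of such an allocator follows. It keeps released ids on a stack and hands out fresh ids from a counter when the stack is empty; that split, the `TidAllocator` type and the function names are assumptions for this example and not necessarily how the PR implements it. The valid range 1 to 0x1FFFFFFF is the one quoted from the WASI thread specification elsewhere in this log. Locking is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { TID_MIN = 1, TID_MAX = 0x1FFFFFFF };

typedef struct {
    int32_t *free_ids;   /* stack of released ids */
    size_t count;        /* number of ids currently on the stack */
    size_t capacity;     /* allocated slots in free_ids */
    int32_t next_unused; /* next id that was never handed out */
} TidAllocator;

static void tid_init(TidAllocator *a)
{
    a->free_ids = NULL;
    a->count = 0;
    a->capacity = 0;
    a->next_unused = TID_MIN;
}

/* Returns an id in [TID_MIN, TID_MAX], or -1 if none is available. */
static int32_t tid_get(TidAllocator *a)
{
    if (a->count > 0)
        return a->free_ids[--a->count]; /* reuse the most recently released id */
    if (a->next_unused > TID_MAX)
        return -1;
    return a->next_unused++;
}

/* Pushes a released id back on the stack; returns 0 on success. */
static int tid_release(TidAllocator *a, int32_t tid)
{
    if (tid < TID_MIN || tid > TID_MAX)
        return -1;
    if (a->count == a->capacity) {
        size_t cap = a->capacity ? a->capacity * 2 : 8;
        int32_t *p = realloc(a->free_ids, cap * sizeof(*p));
        if (!p)
            return -1;
        a->free_ids = p;
        a->capacity = cap;
    }
    a->free_ids[a->count++] = tid;
    return 0;
}

static void tid_destroy(TidAllocator *a)
{
    free(a->free_ids);
    a->free_ids = NULL;
    a->count = a->capacity = 0;
}
```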
For now this implementation uses the thread manager.
It is not clear whether the thread manager is needed in this case. Another syscall will likely be added in the future (for pthread_exit), which may require some kind of thread management; with that in mind, we keep the thread manager for now and will refactor this later if needed.
Use sha256 to hash the binary file content. If the incoming wasm binary has been
cached before, wasm_module_new() simply returns the existing module.
Use -DWAMR_BUILD_WASM_CACHE=0/1 to control the feature.
OpenSSL 1.1.1 is required if the feature is enabled.
Record the store number of current thread with struct thread_local_stores
or tls thread_local_stores_num to fix the issue:
- Only call wasm_runtime_init_thread_env() in the first wasm_store_new of
current thread
- Only call wasm_runtime_destroy_thread_env() in the last wasm_store_delete
of current thread
And remove the unused store list in the engine.
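The first-in/last-out rule can be sketched with a thread-local counter. `init_thread_env` and `destroy_thread_env` below are hypothetical stand-ins for wasm_runtime_init_thread_env() and wasm_runtime_destroy_thread_env() that only count calls, and `store_new`/`store_delete` are simplified placeholders for the wasm-c-api functions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the runtime's thread env hooks; they only
   count how often they are called. */
static int env_inits, env_destroys;

static bool init_thread_env(void)
{
    env_inits++;
    return true;
}

static void destroy_thread_env(void)
{
    env_destroys++;
}

/* Number of live stores created by the current thread. */
static _Thread_local unsigned thread_local_stores_num;

/* Initializes the thread env only for the first store of this thread. */
static bool store_new(void)
{
    if (thread_local_stores_num == 0 && !init_thread_env())
        return false;
    thread_local_stores_num++;
    return true;
}

/* Destroys the thread env only when the last store of this thread goes away. */
static void store_delete(void)
{
    if (thread_local_stores_num == 0)
        return; /* nothing to delete */
    if (--thread_local_stores_num == 0)
        destroy_thread_env();
}
```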