As a part of stress-testing we want to ensure that mutex implementation is working
correctly and protecting shared resource to be allocated from other threads when
mutex is locked.
This test covers the most common situations that happen when some program uses
mutexes like locks from various threads, locks from the same thread etc.
When AOT out of bound linear memory access or stack overflow occurs, the call stack of
AOT functions cannot be unwound currently, so from the exception handler, runtime
cannot jump back into the place that calls the AOT function.
We temporarily skip the current instruction and let AOT code continue to run and return
to caller as soon as possible. And use the zydis library the decode the current instruction
to get its size.
And remove using RtlAddFunctionTable to register the AOT functions since it doesn't work
currently.
AOT relocation to aot_func_internal#n is generated by wamrc --bounds-checks=1.
Resolve the issue by applying the relocation in the compilation stage by wamrc and
don't generate these relocations in the AOT file.
Fixes#2471.
We need to apply some bug fixes that were merged to wasi-libc because wasi-sdk-20
is about half a year old.
It is a temporary solution and the code will be removed when wasi-sdk 21 is released.
- Fix windows wamrc link error: aot_generate_tempfile_name undefined.
- Clear windows compile warnings.
- And rename folder `samples/bh_atomic` and `samples/mem_allocator` to
`samples/bh-atomic` and `samples/mem-allocator`.
## Context
Some native libraries may want to explicitly delete an externref object without
waiting for the module instance to be deleted.
In addition, it may want to add a cleanup function.
## Proposed Changes
Implement:
* `wasm_externref_objdel` to explicitly delete an externeref'd object.
* `wasm_externref_set_cleanup` to set a cleanup function that is called when
the externref'd object is deleted.
In macro bh_memcpy_s, bh_memcy_wa and bh_memmove_s, no need to do extra check
for length is zero or not because it was already done inside of the functions called.
- Inherit shared memory from the parent instance, instead of
trying to look it up by the underlying module. The old method
works correctly only when every cluster uses different module.
- Use reference count in WASMMemoryInstance/AOTMemoryInstance
to mark whether the memory is shared or not
- Retire WASMSharedMemNode
- For atomic opcode implementations in the interpreters, use
a global lock for now
- Update the internal API users
(wasi-threads, lib-pthread, wasm_runtime_spawn_thread)
Fixes https://github.com/bytecodealliance/wasm-micro-runtime/issues/1962
- Avoid destroying module instance repeatedly in pthread_exit_wrapper and
wasm_thread_cluster_exit.
- Wait enough time in pthread_join_wrapper for target thread to exit and
destroy its resources.
We need to make a test that runs longer than the tests we had before to check
some problems that might happen after running for some time (e.g. memory
corruption or something else).
Tests were failing because the right permissions were not provided to iwasm.
Also, test failures didn't trigger build failure due to typo - also fixed in this change.
In addition to that, this PR fixes a few issues with the test itself:
* the `server_init_complete` was not reset early enough causing the client to occasionally
assume the server started even though it didn't yet
* set `SO_REUSEADDR` on the server socket so the port can be reused shortly after
closing the previous socket
* defined receive-send-receive sequence from server to make sure server is alive at the
time of sending message
The old method may not work for some cases. This PR iterates over all instructions
in the function, looking for memcpy, memmove and memset instructions, putting
them into a set, and finally expands them into a loop one by one.
And move this LLVM Pass after building the pipe line of pass builder to ensure that
the memcpy/memmove/memset instrinsics are generated before applying the pass.
And return ENOSYS. We do that so we can at least compile the code on CI.
We'll be gradually enabling more and more functions.
Also, enabled `proc_raise()` for windows.
* disable translations of errno codes that aren't defined on Windows
* undef `min()` macro if it is defined to not conflict with the `min()` function we define
* implement `shed_yield` wasi call
* disable some of the features in the config for windows by default
There is no standard `realpath` function in the C/C++ standard libraries for Windows,
use `_fullpath` function instead to get absolute path of a directory.
We have observed a significant performance degradation after merging
https://github.com/bytecodealliance/wasm-micro-runtime/pull/1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
esp32-s3's instruction memory and data memory can be accessed through mutual mirroring way,
so we define a new feature named as WASM_MEM_DUAL_BUS_MIRROR.
Build wasi-libc library on Windows since libuv may be not supported. This PR is a first step
to make it working, but there's still a number of changes to get it fully working.
Allow to use `cmake -DWAMR_CONFIGURABLE_BOUNDS_CHECKS=1` to
build iwasm, and then run `iwasm --disable-bounds-checks` to disable the
memory access boundary checks.
And add two APIs:
`wasm_runtime_set_bounds_checks` and `wasm_runtime_is_bounds_checks_enabled`
Calling `__wasi_sock_addr_resolve` syscall causes native stack overflow.
Given this is a standard function available in WAMR, we should have at least
the default stack size large enough to handle this case.
The socket tests were updated so they also run in separate thread, but
the simple retro program is:
```C
void *th(void *p)
{
struct addrinfo *res;
getaddrinfo("amazon.com", NULL, NULL, &res);
return NULL;
}
int main(int argc, char **argv)
{
pthread_t pt;
pthread_create(&pt, NULL, th, NULL);
pthread_join(pt, NULL);
return 0;
}
```
## Context
Currently, WAMR supports compiling iwasm with flag `WAMR_BUILD_WASI_NN`.
However, there are scenarios where the user might prefer having it as a shared library.
## Proposed Changes
Decouple wasi-nn context management by internally managing the context given
a module instance reference.
Fix some build errors when building wamrc with LLVM-13, reported in #2311
Fix some build warnings when building wamrc with LLVM-16:
```
core/iwasm/compilation/aot_llvm_extra2.cpp:26:26: warning:
‘llvm::None’ is deprecated: Use std::nullopt instead. [-Wdeprecated-declarations]
26 | return llvm::None;
```
Fix a maybe-uninitialized compile warning:
```
core/iwasm/compilation/aot_llvm.c:413:9: warning:
‘update_top_block’ may be used uninitialized in this function [-Wmaybe-uninitialized]
413 | LLVMPositionBuilderAtEnd(b, update_top_block);
```
## Context
Path to models use `/assets` for testing inside docker. While testing directly from
the repo we are forced to use soft-links or modify the paths.
## Proposed Changes
Use relative path and adjust docker volumes in docs.
Major changes:
- Public headers inside `wasi-nn/include`
- Put cmake files in `cmake` folder
- Make linux iwasm link with `${WASI_NN_LIBS}` so iwasm can enable wasi-nn
This PR attempts to search for the system libuv and use it if found instead of
downloading it. As reported in #1831, this is needed because some tools
build in a sandbox and clear the extra sources.
Move the native stack overflow check from the caller to the callee because the
former doesn't work for call_indirect and imported functions.
Make the stack usage estimation more accurate. Instead of making a guess from
the number of wasm locals in the function, use the LLVM's idea of the stack size
of each MachineFunction. The former is inaccurate because a) it doesn't reflect
optimization passes, and b) wasm locals are not the only reason to use stack.
To use the post-compilation stack usage information without requiring 2-pass
compilation or machine-code imm rewriting, introduce a global array to store
stack consumption of each functions:
For JIT, use a custom IRCompiler with an extra pass to fill the array.
For AOT, use `clang -fstack-usage` equivalent because we support external llc.
Re-implement function call stack usage estimation to reflect the real calling
conventions better. (aot_estimate_stack_usage_for_function_call)
Re-implement stack estimation logic (--enable-memory-profiling) based on the new
machinery.
Discussions: #2105.
Compilation in strict mode fails with
```
wasm_micro_runtime/core/shared/platform/android/platform_init.c:122:30:
error: declaration of 'struct epoll_event` will not be visible outside of this
function [-Werror,-Wvisibility]
epoll_pwait(int epfd, struct epoll_event *events, int maxevents, int timeout,
^
1 error generated.
```
Co-authored-by: Misha Gridnev <gridman@google.com>
Writing GS segment register is not allowed on linux-sgx since it is used as
the base address of thread data in 64-bit hw mode. Reported in #2252.
Disable writing it and disable segue optimization for linux-sgx platform.
LLVM PGO (Profile-Guided Optimization) allows the compiler to better optimize code
for how it actually runs. This PR implements the AOT static PGO, and is tested on
Linux x86-64 and x86-32. The basic steps are:
1. Use `wamrc --enable-llvm-pgo -o <aot_file_of_pgo> <wasm_file>`
to generate an instrumented aot file.
2. Compile iwasm with `cmake -DWAMR_BUILD_STATIC_PGO=1` and run
`iwasm --gen-prof-file=<raw_profile_file> <aot_file_of_pgo>`
to generate the raw profile file.
3. Run `llvm-profdata merge -output=<profile_file> <raw_profile_file>`
to merge the raw profile file into the profile file.
4. Run `wamrc --use-prof-file=<profile_file> -o <aot_file> <wasm_file>`
to generate the optimized aot file.
5. Run the optimized aot_file: `iwasm <aot_file>`.
The test scripts are also added for each benchmark, run `test_pgo.sh` under
each benchmark's folder to test the AOT static pgo.
Segue is an optimization technology which uses x86 segment register to store
the WebAssembly linear memory base address, so as to remove most of the cost
of SFI (Software-based Fault Isolation) base addition and free up a general
purpose register, by this way it may:
- Improve the performance of JIT/AOT
- Reduce the footprint of JIT/AOT, the JIT/AOT code generated is smaller
- Reduce the compilation time of JIT/AOT
This PR uses the x86-64 GS segment register to apply the optimization, currently
it supports linux and linux-sgx platforms on x86-64 target. By default it is disabled,
developer can use the option below to enable it for wamrc and iwasm(with LLVM
JIT enabled):
```bash
wamrc --enable-segue=[<flags>] -o output_file wasm_file
iwasm --enable-segue=[<flags>] wasm_file [args...]
```
`flags` can be:
i32.load, i64.load, f32.load, f64.load, v128.load,
i32.store, i64.store, f32.store, f64.store, v128.store
Use comma to separate them, e.g. `--enable-segue=i32.load,i64.store`,
and `--enable-segue` means all flags are added.
Acknowledgement:
Many thanks to Intel Labs, UC San Diego and UT Austin teams for introducing this
technology and the great support and guidance!
Signed-off-by: Wenyong Huang <wenyong.huang@intel.com>
Co-authored-by: Vahldiek-oberwagner, Anjo Lucas <anjo.lucas.vahldiek-oberwagner@intel.com>
Add nightly (UTC time) checks with asan and ubsan, and also put gcc-4.8 build
to nightly run since we don't need to run it with every PR.
Co-authored-by: Maksim Litskevich <makslit@amazon.co.uk>
For some platforms WAMR gets compiled with `CONFIG_HAS_CLOCK_NANOSLEEP=1`,
while `clock_nanosleep` is not present at the platform, which causes compilation error.
Add check for macro `DISABLE_CLOCK_NANOSLEEP` to resolve the issue, only when
the macro isn't defined can the macro `CONFIG_HAS_CLOCK_NANOSLEEP` take effect.
Add VX delegation as an external delegation of TFLite, so that several NPU/GPU
(from VeriSilicon, NXP, Amlogic) can be controlled via WASI-NN.
Test Code can work with the X86 simulator.
Fix issue reported in #2172: wasm-c-api `wasm_func_call` may use a wrong exec_env
when multi-threading is enabled, with error "invalid exec env" reported
Fix issue reported in #2149: main instance's `c_api_func_imports` are not passed to
the counterpart of new thread's instance in wasi-threads mode
Fix issue of invalid size calculated to copy `c_api_func_imports` in pthread mode
And refactor the code to use `wasm_cluster_dup_c_api_imports` to copy the
`c_api_func_imports` to new thread for wasi-threads mode and pthread mode.
Currently, if a thread is spawned and raises an exception after the main thread
has finished, iwasm returns with success instead of returning 1 (i.e. error).
Since wasm_runtime_get_wasi_exit_code waits for all threads to finish and only
returns the wasi exit code, this PR performs the exception check again and
returns error if an exception was raised.
Since the Tensorflow library is already installed in many cases(especially in the
case of the embedded system), move the installation code to find_package.
According to the 1999 ISO C standard (C99), size_t is an unsigned integer type of
at least 16 bit (see sections 7.17 and 7.18.3), it may be uint32 in 32-bit platforms:
https://en.cppreference.com/w/cpp/types/size_t
Calling function `size_t min(size_t, size_t)` with two uint64 arguments may get
invalid result.
Co-authored-by: Georgii Rylov <godjan@amazon.co.uk>
Make `hmu_tree_node` struct packed and add 4 padding bytes before `kfc_tree_root_buf`
field in `gc_heap_struct` struct to ensure the `left/right/parent` fields in `hmu_tree_node`
are 8-byte aligned on the 64-bit target which doesn't support unaligned memory access.
Fix the issue reported in #2136.
- Translate all the opcodes of threads spec proposal for Fast JIT
- Add the atomic flag for Fast JIT load/store IRs to support atomic load/store
- Add new atomic related Fast JIT IRs and translate them in the codegen
- Add suspend_flags check in branch opcodes and before/after call function
- Modify CI to enable Fast JIT multi-threading test
Co-authored-by: TianlongLiang <tianlong.liang@intel.com>
In LLVM AOT/JIT compiler, only need to check the suspend_flags when memory is
a shared memory since the shared memory must be enabled for multi-threading,
so as not to impact the performance in non-multi-threading memory mode. Also
refine the LLVM IRs to check the suspend_flags.
And fix an issue of multi-tier jit for multi-threading, the instance of the child thread
should be removed from the instance list before it is de-instantiated.
In #1928 we added support for GCC 4.8 but we don't continuously test if it's
working. This PR added a GitHub actions job to test compilation on GCC 4.8
for interpreters and Fast JIT (LLVM JIT/AOT might be added in the future).
The compilation is done using ubuntu 14.04 image as that's the simplest way
to get GCC 4.8 compiler. The job only compiles the code but does not run any
tests.
Load memory data size in each time memory access boundary check in
multi-threading mode since it may be changed by other threads when
memory growing.
And use `memory->memory_data_size` instead of
`memory->num_bytes_per_page * memory->cur_page_count` to refine
the code.
When ref.func opcode refers to a function whose function index no smaller than
current function, the destination func should be forward-declared: it is declared
in the table element segments, or is declared in the export list.
In multi-threading, this line will eventually call `wasm_cluster_wait_for_all_except_self`:
`DEINIT_VEC(store->instances, wasm_instance_vec_delete)`
As the threads are joining they can call `wasm_interp_dump_call_stack` which tries to
use the module frames but they were already freed by this line:
`DEINIT_VEC(store->modules, wasm_module_vec_delete)`
This PR swaps the order that these are deleted so module is deleted after the instances.
Co-authored-by: Andrew Chambers <ncham@amazon.com>
Try using existing exec_env to execute wasm app's malloc/free func and
execute post instantiation functions. Create a new exec_env only when
no existing exec_env was found.
In some cases, the memory address of some variables may have 4 least significant
bytes set to zero. Because we cast the pointer to int, we look only at 4 least
significant bytes; the assertion may fail because 4 least significant bytes are 0.
Change bh_assert implementation to cast the assert expr to int64_t and it works
well with 64-bit architectures.
POLLRDNORM/POLLWRNORM may be not defined in uClibc, so replace them
with the equivalent POLLIN/POLLOUT.
Refer to https://www.man7.org/linux/man-pages/man2/poll.2.html
POLLRDNORM Equivalent to POLLIN
POLLWRNORM Equivalent to POLLOUT
Signed-off-by: Thomas Devoogdt <thomas.devoogdt@barco.com>
Update wasi-libc version to resolve the hang issue when running wasi-threads cases.
Implement custom sync primitives as a counterpart of `pthread_barrier_wait` to
attempt to replace pthread sync primitives since they seem to cause data races
when running with the thread sanitizer.
Use pre-created exec_env for instantiation and module_malloc/free,
use the same exec_env of the current thread to avoid potential
unexpected behavior.
And remove unnecessary shared_mem_lock in wasm_module_free,
which may cause dead lock.
Use the shared memory's shared_mem_lock to lock the whole atomic.wait and
atomic.notify processes, and use it for os_cond_reltimedwait and os_cond_notify,
so as to make the whole processes actual atomic operations:
the original implementation accesses the wait address with shared_mem_lock
and uses wait_node->wait_lock for os_cond_reltimedwait, which is not an atomic
operation.
And remove the unnecessary wait_map_lock and wait_lock, since the whole
processes are already locked by shared_mem_lock.
`wasi-sdk-20` pre-release can be used to avoid building `wasi-libc` to enable threads.
It's not possible to use `wasi-sdk-20` pre-release on Ubuntu 20.04 because of
incompatibility with the glibc version:
```bash
/opt/wasi-sdk/bin/clang: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found
(required by /opt/wasi-sdk/bin/clang)
```
- Remove notify_stale_threads_on_exception and change atomic.wait
to be interruptible by keep waiting and checking every one second,
like the implementation of poll_oneoff in libc-wasi
- Wait all other threads exit and then get wasi exit_code to avoid
getting invalid value
- Inherit suspend_flags of parent thread while creating new thread to
avoid terminated flag isn't set for new thread
- Fix wasi-threads test case update_shared_data_and_alloc_heap
- Add "Lib wasi-threads enabled" prompt for cmake
- Fix aot get exception, use aot_copy_exception instead
Fix a data race for test main_proc_exit_wait.c from #1963.
And fix atomic_wait logic that was wrong before:
- a thread 1 started executing wasm instruction wasm_atomic_wait
but hasn't reached waiting on condition variable
- a main thread calls proc_exit and notifies all the threads that reached
waiting on condition variable
Which leads to thread 1 hang on waiting on condition variable after that
Now it's atomically checked whether proc_exit was already called.
In the WASI thread test modified in this PR, malloc was used in multiple threads
without a lock. But wasi-libc implementation of malloc is not thread-safe.
Remove restrictions:
- Only 1 WASM app at a time
- Only 1 model at a time
- `graph` and `graph-execution-context` are ignored
Refer to previous document:
e8d718096d/core/iwasm/libraries/wasi-nn/README.md
- Implement atomic.fence to ensure a proper memory synchronization order
- Destroy exec_env_singleton first in wasm/aot deinstantiation
- Change terminate other threads to wait for other threads in
wasm_exec_env_destroy
- Fix detach thread in thread_manager_start_routine
- Fix duplicated lock cluster->lock in wasm_cluster_cancel_thread
- Add lib-pthread and lib-wasi-threads compilation to Windows CI