Commit Graph

99 Commits

Author SHA1 Message Date
liang.he
99bbad8cdb
perf profiling: Adjust the calculation of execution time (#3089) 2024-01-26 18:06:21 +08:00
Wenyong Huang
313ce8cb61
Fix memory/table segment checks in memory.init/table.init (#3081)
According to the wasm core spec, the checks for the table segments in
`table.init` opcode are similar to the checks for `memory.init` opcode:
- The size of a passive segment is shrunk to zero after `data.drop`
  (or `elem.drop`) opcode is executed, and the segment can be used to do
  `memory.init` (or `table.init`) again
- The `memory.init` only traps when `s+n > len(data.data)` or `d+n > len(mem.data)`
  and `table.init` only traps when `s+n > len(elem.elem)` or `d+n > len(tab.elem)`
- The active segment can also be used to do `memory.init` (or `table.init`),
  while it behaves like a dropped passive segment

https://github.com/WebAssembly/bulk-memory-operations/blob/master/proposals/bulk-memory-operations/Overview.md
```
Segments can also be shrunk to size zero by using the following new instructions:
- data.drop: discard the data in an data segment
- elem.drop: discard the data in an element segment

An active segment is equivalent to a passive segment, but with an implicit
memory.init followed by a data.drop (or table.init followed by a elem.drop)
that is prepended to the module's start function.
```
ps.
https://webassembly.github.io/spec/core/bikeshed/#-hrefsyntax-instr-memorymathsfmemoryinitx%E2%91%A0
https://webassembly.github.io/spec/core/bikeshed/#-hrefsyntax-instr-tablemathsftableinitxy%E2%91%A0
https://github.com/bytecodealliance/wasm-micro-runtime/issues/3020
2024-01-26 09:45:59 +08:00
liang.he
5c8b8a17a6
Enhancements on wasm function execution time statistic (#2985)
Enhance the statistic of wasm function execution time, or the performance
profiling feature:
- Add os_time_thread_cputime_us() to get the cputime of a thread,
  and use it to calculate the execution time of a wasm function
- Support the statistic of the children execution time of a function,
  and dump it in wasm_runtime_dump_perf_profiling
- Expose two APIs:
  wasm_runtime_sum_wasm_exec_time
  wasm_runtime_get_wasm_func_exec_time

And rename os_time_get_boot_microsecond to os_time_get_boot_us.
2024-01-17 09:51:54 +08:00
YAMAMOTO Takashi
18529253d8
interpreter: Simplify memory.grow a bit (#2899) 2023-12-12 20:24:51 +08:00
Yage Hu
ef0cd22119
Fix memory size not updating after growing in interpreter (#2898)
This commit fixes linear memory size not updating after growing.
This causes `memory.fill` to throw an exception after `memory.grow`.
2023-12-12 08:36:59 +08:00
Maks Litskevich
63696ba603
Fix typo in CI config and suppress STORE_U8 in TSAN (#2802)
This typo prevented sanitizers to work in the CI.
2023-12-11 09:16:30 +08:00
Enrico Loparco
0455071fc1
Access linear memory size atomically (#2834)
Fixes: https://github.com/bytecodealliance/wasm-micro-runtime/issues/2804
2023-11-29 20:27:17 +08:00
Huang Qi
0b29904f26
Fix configurable bounds checks typo (#2809) 2023-11-21 17:32:45 +08:00
TianlongLiang
a57e70016a
Fix memory.init opcode issue in fast-interp (#2798)
Fix fast interpreter didn't throw OOB exception correctly in some scenarios.
Reported in #2797.
2023-11-20 16:25:43 +08:00
YAMAMOTO Takashi
562a5dd1b6
Fix data/elem drop (#2747)
Currently, `data.drop` instruction is implemented by directly modifying the
underlying module. It breaks use cases where you have multiple instances
sharing a single loaded module. `elem.drop` has the same problem too.

This PR  fixes the issue by keeping track of which data/elem segments have
been dropped by using bitmaps for each module instances separately, and
add a sample to demonstrate the issue and make the CI run it.

Also add a missing check of dropped elements to the fast-jit `table.init`.

Fixes: https://github.com/bytecodealliance/wasm-micro-runtime/issues/2735
Fixes: https://github.com/bytecodealliance/wasm-micro-runtime/issues/2772
2023-11-18 08:50:16 +08:00
YAMAMOTO Takashi
24c4d256b3
Grab cluster->lock when modifying exec_env->module_inst (#2685)
Fixes: https://github.com/bytecodealliance/wasm-micro-runtime/issues/2680

And when switching back to the original module_inst, propagate exception if any.

cf. https://github.com/bytecodealliance/wasm-micro-runtime/issues/2512
2023-11-09 18:56:02 +08:00
Wenyong Huang
d6bba13e86
Fix fast-interp "pre-compiled label offset out of range" issue (#2659)
When labels-as-values is enabled in a target which doesn't support
unaligned address access, 16-bit offset is used to store the relative
offset between two opcode labels. But it is a little small and the loader
may report "pre-compiled label offset out of range" error.

Emitting 32-bit data instead to resolve the issue: emit label address in
32-bit target and emit 32-bit relative offset in 64-bit target.

See also:
https://github.com/bytecodealliance/wasm-micro-runtime/issues/2635
2023-10-24 10:47:17 +08:00
Enrico Loparco
00539620e9
Improve stack trace dump and fix coding guideline CI (#2599)
Avoid the stack traces getting mixed up together when multi-threading is enabled
by using exception_lock/unlock in dumping the call stacks.

And remove duplicated call stack dump in wasm_application.c.

Also update coding guideline CI to fix the clang-format-12 not found issue.
2023-09-29 10:52:54 +08:00
YAMAMOTO Takashi
51714c41c0
Introduce WASMModuleInstanceExtraCommon (#2429)
Move the common parts of WASMModuleInstanceExtra and
AOTModuleInstanceExtra into the new structure.
2023-08-08 09:35:29 +08:00
YAMAMOTO Takashi
91592429f4
Fix memory sharing (#2415)
- Inherit shared memory from the parent instance, instead of
  trying to look it up by the underlying module. The old method
  works correctly only when every cluster uses different module.
- Use reference count in WASMMemoryInstance/AOTMemoryInstance
  to mark whether the memory is shared or not
- Retire WASMSharedMemNode
- For atomic opcode implementations in the interpreters, use
  a global lock for now
- Update the internal API users
  (wasi-threads, lib-pthread, wasm_runtime_spawn_thread)

Fixes https://github.com/bytecodealliance/wasm-micro-runtime/issues/1962
2023-08-04 10:18:13 +08:00
Wenyong Huang
59b2099b68
Fix some check issues on table operations (#2392)
Fix some check issues on table.init, table.fill and table.copy, and unify the check method
for all running modes.
Fix issue #2390 and #2096.
2023-07-27 21:53:48 +08:00
Marcin Kolny
0f4edf9735
Implement suspend flags as atomic variable (#2361)
We have observed a significant performance degradation after merging
https://github.com/bytecodealliance/wasm-micro-runtime/pull/1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
2023-07-21 08:27:09 +08:00
YAMAMOTO Takashi
228a3bed53
Fix unused warnings on disable_bounds_checks (#2347) 2023-07-06 15:31:22 +08:00
Huang Qi
18092f86cc
Make memory access boundary check behavior configurable (#2289)
Allow to use `cmake -DWAMR_CONFIGURABLE_BOUNDS_CHECKS=1` to
build iwasm, and then run `iwasm --disable-bounds-checks` to disable the
memory access boundary checks.

And add two APIs:
`wasm_runtime_set_bounds_checks` and `wasm_runtime_is_bounds_checks_enabled`
2023-07-04 16:21:30 +08:00
Wenyong Huang
76be848ec3
Implement the segue optimization for LLVM AOT/JIT (#2230)
Segue is an optimization technology which uses x86 segment register to store
the WebAssembly linear memory base address, so as to remove most of the cost
of SFI (Software-based Fault Isolation) base addition and free up a general
purpose register, by this way it may:
- Improve the performance of JIT/AOT
- Reduce the footprint of JIT/AOT, the JIT/AOT code generated is smaller
- Reduce the compilation time of JIT/AOT

This PR uses the x86-64 GS segment register to apply the optimization, currently
it supports linux and linux-sgx platforms on x86-64 target. By default it is disabled,
developer can use the option below to enable it for wamrc and iwasm(with LLVM
JIT enabled):
```bash
wamrc --enable-segue=[<flags>] -o output_file wasm_file
iwasm --enable-segue=[<flags>] wasm_file [args...]
```
`flags` can be:
    i32.load, i64.load, f32.load, f64.load, v128.load,
    i32.store, i64.store, f32.store, f64.store, v128.store
Use comma to separate them, e.g. `--enable-segue=i32.load,i64.store`,
and `--enable-segue` means all flags are added.

Acknowledgement:
Many thanks to Intel Labs, UC San Diego and UT Austin teams for introducing this
technology and the great support and guidance!

Signed-off-by: Wenyong Huang <wenyong.huang@intel.com>
Co-authored-by: Vahldiek-oberwagner, Anjo Lucas <anjo.lucas.vahldiek-oberwagner@intel.com>
2023-05-26 10:13:33 +08:00
Wenyong Huang
1e5f206464
Fix compile warnings on windows platform (#2208) 2023-05-15 13:48:48 +08:00
Wenyong Huang
5fc48e3584
Fix interpreter read linear memory size for multi-threading (#2088)
Load memory data size in each time memory access boundary check in
multi-threading mode since it may be changed by other threads when
memory growing.

And use `memory->memory_data_size` instead of
`memory->num_bytes_per_page * memory->cur_page_count` to refine
the code.
2023-04-04 09:05:52 +08:00
Wenyong Huang
49d439a3bc
Fix/Simplify the atomic.wait/nofity implementations (#2044)
Use the shared memory's shared_mem_lock to lock the whole atomic.wait and
atomic.notify processes, and use it for os_cond_reltimedwait and os_cond_notify,
so as to make the whole processes actual atomic operations:
the original implementation accesses the wait address with shared_mem_lock
and uses wait_node->wait_lock for os_cond_reltimedwait, which is not an atomic
operation.

And remove the unnecessary wait_map_lock and wait_lock, since the whole
processes are already locked by shared_mem_lock.
2023-03-23 09:21:16 +08:00
Wenyong Huang
f279ba84ee
Fix multi-threading issues (#2013)
- Implement atomic.fence to ensure a proper memory synchronization order
- Destroy exec_env_singleton first in wasm/aot deinstantiation
- Change terminate other threads to wait for other threads in
  wasm_exec_env_destroy
- Fix detach thread in thread_manager_start_routine
- Fix duplicated lock cluster->lock in wasm_cluster_cancel_thread
- Add lib-pthread and lib-wasi-threads compilation to Windows CI
2023-03-08 10:57:22 +08:00
Enrico Loparco
e8d718096d
Add/reorganize locks for thread synchronization (#1995)
Attempt to fix data races when using threads.
- Protect access (from multiple threads) to exception and memory
- Fix shared memory lock usage
2023-03-04 08:15:26 +08:00
Enrico Loparco
52e26e59cf
Add lock to protect the operations of accessing exec env (#1991)
Data race may occur when accessing exec_env's fields, e.g. suspend_flags
and handle. Add lock `exec_env->wait_lock` for them to resolve the issue.
2023-02-27 19:53:41 +08:00
Wenyong Huang
38c67b3f48
thread-mgr: Fix spread "wasi proc exit" exception and atomic.wait issues (#1988)
Raising "wasi proc exit" exception, spreading it to other threads and then
clearing it in all threads may result in unexpected behavior: the sub thread
may end first, handle the "wasi proc exit" exception and clear exceptions
of other threads, including the main thread. And when main thread's
exception is cleared, it may continue to run and throw "unreachable"
exception. This also leads to some assertion failed.

Ignore exception spreading for "wasi proc exit" and don't clear exception
of other threads to resolve the issue.

And add suspend flag check after atomic wait since the atomic wait may
be notified by other thread when exception occurs.
2023-02-24 20:05:39 +08:00
Enrico Loparco
216dc43ab4
Use shared memory lock for threads generated from same module (#1960)
Multiple threads generated from the same module should use the same
lock to protect the atomic operations.

Before this PR, each thread used a different lock to protect atomic
operations (e.g. atomic add), making the lock ineffective.

Fix #1958.
2023-02-16 11:54:19 +08:00
YAMAMOTO Takashi
7d3b2a8773
Make memory profiling show native stack usage (#1917) 2023-02-01 11:52:15 +08:00
Martin Klang
622cdbefd6
Prevent undefined behavior from c_api_func_imports == NULL (#1883)
The module instance's c_api_func_imports may be NULL under some circumstances,
add checks before accessing it.
2023-01-14 07:52:39 +08:00
Wenyong Huang
9d52960e4d
Fix wasm-c-api import func link issue in wasm_instance_new (#1787)
When a wasm module is duplicated instantiated with wasm_instance_new,
the function import info of the previous instantiation may be overwritten by
the later instantiation, which may cause unexpected behavior.

Store the function import info into the module instance to fix the issue.
2022-12-07 16:43:04 +08:00
Wenyong Huang
da7117a092
Refine the stack frame size check in interpreter (#1730)
Limit max_stack_cell_num/max_csp_num to be no larger than UINT16_MAX,
and don't check all_cell_num in interpreter again.

And refine some codes in interpreter.
2022-11-22 15:32:48 +08:00
Wenyong Huang
440bbea81e
Fix interp/fast-jit float min/max issues (#1733) 2022-11-22 09:14:20 +08:00
Xu Jun
032b9aa74b
Fix issue of restoring wasm operand stack (#1721) 2022-11-18 18:51:13 +08:00
Wenyong Huang
7fd37190e8
Add control for the native stack check with hardware trap (#1682)
Add a new options to control the native stack hw bound check feature:
- Besides the original option `cmake -DWAMR_DISABLE_HW_BOUND_CHECK=1/0`,
  add a new option `cmake -DWAMR_DISABLE_STACK_HW_BOUND_CHECK=1/0`
- When the linear memory hw bound check is disabled, the stack hw bound check
   will be disabled automatically, no matter what the input option is
- When the linear memory hw bound check is enabled, the stack hw bound check
  is enabled/disabled according to the value of input option
- Besides the original option `--bounds-checks=1/0`, add a new option
  `--stack-bounds-checks=1/0` for wamrc

Refer to: https://github.com/bytecodealliance/wasm-micro-runtime/issues/1677
2022-11-07 18:26:33 +08:00
Wenyong Huang
a182926a73
Refactor interpreter/AOT module instance layout (#1559)
Refactor the layout of interpreter and AOT module instance:
- Unify the interp/AOT module instance, use the same WASMModuleInstance/
  WASMMemoryInstance/WASMTableInstance data structures for both interpreter
  and AOT
- Make the offset of most fields the same in module instance for both interpreter
  and AOT, append memory instance structure, global data and table instances to
  the end of module instance for interpreter mode (like AOT mode)
- For extra fields in WASM module instance, use WASMModuleInstanceExtra to
  create a field `e` for interpreter
- Change the LLVM JIT module instance creating process, LLVM JIT uses the WASM
  module and module instance same as interpreter/Fast-JIT mode. So that Fast JIT
  and LLVM JIT can access the same data structures, and make it possible to
  implement the Multi-tier JIT (tier-up from Fast JIT to LLVM JIT) in the future
- Unify some APIs: merge some APIs for module instance and memory instance's
  related operations (only implement one copy)

Note that the AOT ABI is same, the AOT file format, AOT relocation types, how AOT
code accesses the AOT module instance and so on are kept unchanged.

Refer to:
https://github.com/bytecodealliance/wasm-micro-runtime/issues/1384
2022-10-18 10:59:28 +08:00
Xu Jun
826cf4f8e1
Fix threads spec test issues (#1586) 2022-10-13 13:53:09 +08:00
Wenyong Huang
ab929c20a3
Add check for code section size, fix interp float operations (#1480)
And enable classic interpreter instead fast interpreter when llvm jit is enabled,
so as to fix the issue that llvm jit cannot handle opcode drop_64/select_64.
2022-09-14 19:49:18 +08:00
FromLiQg
88bb4f3c81
Normalize wasm types (#1378)
Normalize wasm types, for the two wasm types, if their parameter types
and result types are the same, we only save one copy, so as to reduce
the footprint and simplify the type comparison in opcode CALL_INDIRECT.

And fix issue in interpreter globals_instantiate, and remove used codes.
2022-08-18 17:52:02 +08:00
Xu Jun
4b00432c1a
Fix dump call stack issue in interpreter (#1358)
Fix dump call stack issue in interpreter introduced by hw bound check:
the call stack isn't dumped if the exception is thrown and caught by
signal handler.
And restore the wasm stack frame to the original status after calling a
wasm function.
2022-08-08 11:15:30 +08:00
Wenyong Huang
dd62b32b20
Fix interp hw bound check issues (#1322)
Fix build script to enable hw bound check for interpreter when
AOT is disabled, so as to enable spec cases test for interp with
hw bound check. And fix the issues found.
2022-07-23 20:39:01 +08:00
Wenyong Huang
fd5030e02e
Implement interpreter hw bound check (#1309)
Implement boundary check with hardware trap for interpreter on
64-bit platforms:
- To improve the performance of interpreter and Fast JIT
- To prepare for multi-tier compilation for the feature

Linux/MacOS/Windows 64-bit are enabled.
2022-07-22 11:05:40 +08:00
liang.he
0f6e5a55a4
Fix sub module's aux stack info not synchronized to main module issue (#1279)
Sub module's auxiliary stack boundary and bottom may be different from
main module's counterpart, so when calling sub module, its aux stack info
should be gotten and set to exec_env firstly, or aux stack overflow and out
of bounds memory access exception may be thrown when calling sub
module's function.
Fix the issue reported in PR #1278.
2022-07-11 19:42:29 +08:00
Xu Jun
471cac4719
Enable dump call stack to a buffer (#1244)
Enable dump call stack to a buffer, use API
`wasm_runtime_get_call_stack_buf_size` to get the required buffer size
and use API
`wasm_runtime_dump_call_stack_to_buf` to dump call stack to a buffer
2022-06-25 21:38:43 +08:00
Wenyong Huang
d7a2888b18
Fix Windows compilation warnings (#1171)
And update the Zephyr platform sample help, add arc target into usage list.
2022-05-16 09:12:58 +08:00
Xu Jun
98431225f2
Store import function pointer in module instance (#1130)
Fix the issue reported by #1118 , use this approach since it avoids copying
unnecessary static information into instance and reduces the footprint.
2022-04-27 20:21:51 +08:00
Wenyong Huang
adaaf348ed
Refine opcode br_table for classic interpreter (#1112)
Refine opcode br_table for classic interpreter as there may be a lot of
leb128 decoding when the br count is big:
1. Use the bytecode itself to store the decoded leb br depths if each
    decoded depth can be stored with one byte
2. Create br_table cache to store the decode leb br depths if the decoded
    depth cannot be stored with one byte
After the optimization, the class interpreter can access the br depths array
with index, no need to decode the leb128 again.

And fix function record_fast_op() return value unchecked issue in source
debugging feature.
2022-04-23 19:15:55 +08:00
YAMAMOTO Takashi
91222e1e44
interpreter: Fix an UBSan complaint in word_copy (#1106)
Fix an UBSan complaint introduced by recent change by adding more checks
to word_copy:
```
wasm_interp_fast.c:792:9: runtime error: applying zero offset to null pointer
```
2022-04-21 12:21:37 +08:00
Wenyong Huang
d6e781af28
Add more operand stack overflow checks for fast-interp (#1104)
And clear some compile warnings on Windows
2022-04-20 16:19:12 +08:00
Wenyong Huang
d4758d7380
Refine codes and fix several issues (#1094)
Add aot relocation for ".rodata.str" symbol to support more cases
Fix some coding style issues
Fix aot block/value stack destroy issue
Refine classic/fast interpreter codes
Clear compile warning of libc_builtin_wrapper.c in 32-bit platform
2022-04-18 17:33:30 +08:00