Multiple threads generated from the same module must share a single lock to protect atomic operations. Before this PR, each thread used its own lock for atomic operations (e.g. atomic add), so the lock never excluded any other thread and concurrent read-modify-write sequences could interleave. Fixes #1958.
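Below is a minimal sketch of the difference between per-thread locks and one module-wide lock. It does not reproduce the project's actual code generation. `Module`, `atomic_add`, `launch`, `shared_lock` and `run_counter` are all hypothetical names chosen for illustration. Atomic add is emulated as a read-modify-write guarded by a lock. `shared_lock=False` mimics the pre-fix behavior, where every thread receives a fresh lock.

```python
import threading
from typing import Callable, List


def atomic_add(lock: threading.Lock, buf: List[int], idx: int, val: int) -> int:
    """Hypothetical emulated atomic add: a read-modify-write guarded by `lock`."""
    with lock:
        old = buf[idx]
        buf[idx] = old + val
        return old


class Module:
    """Hypothetical stand-in for a module that spawns worker threads."""

    def __init__(self) -> None:
        # Fixed behavior: one lock owned by the module, shared by all its threads.
        self.atomic_lock = threading.Lock()

    def launch(self, kernel: Callable[[threading.Lock], None], num_threads: int,
               shared_lock: bool = True) -> None:
        threads = []
        for _ in range(num_threads):
            # shared_lock=False reproduces the bug: each thread gets its own lock,
            # so no two threads ever contend and their read-modify-writes can interleave.
            lock = self.atomic_lock if shared_lock else threading.Lock()
            threads.append(threading.Thread(target=kernel, args=(lock,)))
        for t in threads:
            t.start()
        for t in threads:
            t.join()


def run_counter(num_threads: int = 8, iters: int = 1000, shared_lock: bool = True) -> int:
    """Each thread atomically adds 1 to a shared counter `iters` times."""
    module = Module()
    counter = [0]

    def kernel(lock: threading.Lock) -> None:
        for _ in range(iters):
            atomic_add(lock, counter, 0, 1)

    module.launch(kernel, num_threads, shared_lock=shared_lock)
    return counter[0]


if __name__ == "__main__":
    # With the shared lock the result is always num_threads * iters.
    print(run_counter())  # prints 8000
```

With the shared lock the total is always `num_threads * iters`. With per-thread locks the total can only come out lower because of lost updates. Whether that happens on a given run depends on thread scheduling, so the bug may not show up every time.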