In my last post, I described the sync/async problem and explained that the new JavaScript Promise Integration (JSPI) web standard allows us to solve it. I also described the new Pyodide APIs which use JSPI to block for the resolution of awaitables.
This post is focused on implementation details for a C program (e.g., CPython)
wanting to use JSPI. In this post, we will focus on C programs written for the
wasm32-unknown-unknown target. Such programs cannot use libc or most other C
libraries and so are very limited. In the next post, I will discuss the
additional problems that one needs to solve in order to use JSPI with the
wasm32-unknown-emscripten target (where in particular we may use libc).
Thanks to Antonio Cuni, Guy Bedford, and Gyeongjae Choi for feedback on drafts. Thanks also to my current employer Cloudflare and my former employer Anaconda for paying me to work on this.
Using JSPI in a simple C program
All of these examples are designed to work with Node.js 24. Node must be started
with --experimental-wasm-jspi. (Unfortunately, Node 24 was released just before
the JSPI feature gate was removed from V8.)
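Because the flag is easy to forget, it can help to fail early with a readable message instead of a confusing TypeError later on. The following is a minimal sketch and not part of the original examples: hasJspi() and requireJspi() are hypothetical helper names, and the only things it relies on are the WebAssembly.Suspending constructor and the WebAssembly.promising() function that JSPI adds.

```javascript
// Sketch: detect JSPI before trying to use it.
// `hasJspi` and `requireJspi` are hypothetical helper names.
function hasJspi() {
  return (
    typeof WebAssembly === "object" &&
    typeof WebAssembly.Suspending === "function" &&
    typeof WebAssembly.promising === "function"
  );
}

function requireJspi() {
  if (!hasJspi()) {
    throw new Error(
      "JSPI is not available. On Node 24, start node with --experimental-wasm-jspi.",
    );
  }
}

console.log("JSPI available:", hasJspi());
```

Calling requireJspi() once at the top of a script would then report a missing flag before any module is instantiated.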
We’ll start with a basic example of the JSPI API. The full example is here.
Suppose we have an async JavaScript function:
async function awaitFakeFetch(x) {
  console.log("JS: fetching (fake)...");
  await sleep(1000); // simulate a slow "fetch" request
  console.log("JS: fetched (fake)");
  return x + 1;
}
which we want to call from WebAssembly. From C’s perspective, awaitFakeFetch()
will return an int. We use it in the following C function. In Pyodide, the
function might use Python’s C API to call a synchronous Python function that
wants to do async I/O further down the stack.
WASM_EXPORT("fakePyFunc")
void fakePyFunc(int x) {
  logString("About to call awaitFakeFetch");
  int res = awaitFakeFetch(x);
  logString("Got result:");
  logInt(res);
}
We can’t use libc functions like printf because we are compiling to the
wasm32-unknown-unknown target. Instead we use custom logString() and logInt()
functions which we will need to define in JavaScript.
We compile and link this C file with clang as follows:
clang -I../include -target wasm32-unknown-unknown -Wl,--no-entry -nostdlib -O2 \
    basic.c -o basic.wasm
To instantiate the WebAssembly module, we need JavaScript definitions for the
imports logInt(), logString(), and awaitFakeFetch().
function logInt(x) {
  console.log("C:", x);
}
function logString(ptr) {
  let endPtr = ptr;
  // Locate null byte
  while (HEAP[endPtr]) {
    endPtr++;
  }
  console.log("C:", new TextDecoder().decode(HEAP.slice(ptr, endPtr)));
}
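The string-reading part of logString() can also be written as a standalone helper that takes the memory view as an argument, which makes it easy to try without a WebAssembly module. This is only a sketch: readCString() is a hypothetical name, and fakeHeap stands in for the HEAP view of instance.exports.memory.buffer.

```javascript
// Hypothetical helper: decode the NUL-terminated UTF-8 string at `ptr`.
// `heap` is a Uint8Array view of linear memory, passed in explicitly.
const decoder = new TextDecoder();

function readCString(heap, ptr) {
  // indexOf() returns -1 if there is no NUL byte at or after ptr.
  let end = heap.indexOf(0, ptr);
  if (end === -1) {
    end = heap.length;
  }
  // subarray() creates a view, so nothing is copied before decoding.
  return decoder.decode(heap.subarray(ptr, end));
}

// Stand-in for WebAssembly memory so the sketch runs on its own.
const fakeHeap = new Uint8Array(32);
new TextEncoder().encodeInto("Got result:", fakeHeap.subarray(8));

console.log("C:", readCString(fakeHeap, 8)); // prints "C: Got result:"
```

Passing the view explicitly also matters if the module's memory can grow: growing detaches the old ArrayBuffer, so a view created once at startup would need to be recreated.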
And we need to make an imports object and instantiate the WebAssembly module.
To make awaitFakeFetch() into a suspending import, we wrap it with
WebAssembly.Suspending().
import { readFileSync } from "node:fs";

const imports = {
  env: {
    logInt,
    logString,
    awaitFakeFetch: new WebAssembly.Suspending(awaitFakeFetch),
  },
};
export const { instance } = await WebAssembly.instantiate(
  readFileSync("basic.wasm"),
  imports,
);
const HEAP = new Uint8Array(instance.exports.memory.buffer);
To call fakePyFunc(), we need to wrap the export in WebAssembly.promising():
export const fakePyFunc = WebAssembly.promising(
  instance.exports.fakePyFunc,
);
And now we call fakePyFunc(3) and it will log:
C: About to call awaitFakeFetch
JS: fetching (fake)...
<pauses for 1000 ms>
JS: fetched (fake)
C: Got result:
C: 4
If you have Node v24 and clang, you can try it out for yourself by cloning
http://github.com/hoodmane/jspi-blog-examples/, cding into the 2-basic-example
directory, and running:
./build.sh
./basic.mjs
All the other examples can be run in the same way.
A separate function to suspend
We can separate the function that blocks from the original promise-returning function. This lets us schedule multiple promises from C and only later block for them.
Suppose now we have two async operations, say fakeAsyncHttpRequest() and
fakeAsyncDbQuery(). They will return promises. In C, the return type will be
__externref_t, which is an opaque reference to a JavaScript object. The only
operations allowed on such values are assignment and passing them to and from
functions. Attempting to add them, dereference them, take their address, use
them as struct fields, or pass them as arguments to a varargs function will all
result in compile errors. The only thing we can do with these __externref_t
promises is call awaitInt() to suspend for the integers they resolve to.
WASM_EXPORT("fakePyFunc")
void fakePyFunc(int x) {
  logString("Call fakeAsyncHttpRequest");
  __externref_t promiseHttpRequest = fakeAsyncHttpRequest(x);
  logString("Call fakeAsyncDbQuery");
  __externref_t promiseDbQuery = fakeAsyncDbQuery(x);
  logString("Suspending for promiseHttpRequest");
  int res1 = awaitInt(promiseHttpRequest);
  logString("-- got res1:");
  logInt(res1);
  logString("Suspending for promiseDbQuery");
  int res2 = awaitInt(promiseDbQuery);
  logString("Got res2:");
  logInt(res2);
}
Our JavaScript imports are then:
function awaitInt(promise) {
  // This is just the identity function...
  // We need it so we can wrap it with WebAssembly.Suspending
  return promise;
}
async function fakeAsyncHttpRequest(x) {
  console.log("JS: fakeAsyncHttpRequest: sleeping");
  await sleep(1000);
  console.log("JS: fakeAsyncHttpRequest: slept");
  return x + 1;
}
async function fakeAsyncDbQuery(x) {
  console.log("JS: fakeAsyncDbQuery: sleeping");
  await sleep(2000);
  console.log("JS: fakeAsyncDbQuery: slept");
  return x * x;
}
Only awaitInt() needs to be a Suspending import. The async functions just
return promises (represented as __externref_t in C).
const imports = {
  env: {
    logInt,
    logString,
    fakeAsyncHttpRequest,
    fakeAsyncDbQuery,
    awaitInt: new WebAssembly.Suspending(awaitInt),
  },
};
We need the same boilerplate as before to instantiate the WebAssembly module and
wrap fakePyFunc() with WebAssembly.promising(). And now we can call
fakePyFunc(4) and it will log:
C: Call fakeAsyncHttpRequest
JS: fakeAsyncHttpRequest: sleeping
C: Call fakeAsyncDbQuery
JS: fakeAsyncDbQuery: sleeping
C: Suspending for promiseHttpRequest
JS: fakeAsyncHttpRequest: slept
C: -- got res1:
C: 5
C: Suspending for promiseDbQuery
JS: fakeAsyncDbQuery: slept
C: Got res2:
C: 16
To handle more general promises that don’t necessarily resolve to an int, we
could use an awaitExternRef() function where the return value is an
__externref_t. Then we could use a separate externRefToInt() function to
convert the result to an integer.
This example is here.
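For the awaitExternRef() variant mentioned above, the JavaScript side might look roughly like the following. The two function names come from the text, but the bodies, the validation in externRefToInt(), and the makeImports() helper are assumptions made for illustration.

```javascript
// Sketch of the two imports. The bodies are assumptions, not the
// author's implementation.
function awaitExternRef(promise) {
  // Identity again: wrapping with WebAssembly.Suspending is what makes
  // the call suspend until the promise settles.
  return promise;
}

function externRefToInt(value) {
  // A value returned as an i32 is coerced with ToInt32, which silently
  // turns values such as undefined or NaN into 0. Check first so that
  // mistakes are visible.
  if (typeof value !== "number" || !Number.isInteger(value)) {
    throw new TypeError(`expected an integer, got ${String(value)}`);
  }
  return value | 0;
}

// Hypothetical helper: builds the imports object. Only awaitExternRef
// needs to be a Suspending import. Requires JSPI when called.
function makeImports(otherImports) {
  return {
    env: {
      ...otherImports,
      externRefToInt,
      awaitExternRef: new WebAssembly.Suspending(awaitExternRef),
    },
  };
}

console.log(externRefToInt(16)); // prints 16
```

On the C side, awaitExternRef() would take and return an __externref_t, and externRefToInt() would take an __externref_t and return an int.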
Troubles with reentrancy
JSPI handles switching the native WebAssembly call stack. However, the native WebAssembly stack is opaque – it is not possible to create pointers to data stored on it.
For this reason, Clang implements the C stack using a combination of the native WebAssembly stack and a “spill stack” in linear memory which the WebAssembly VM knows nothing about. Since the spill stack is in WebAssembly linear memory, it is addressable. Any value that we need to take a pointer to will go in the spill stack. JSPI only handles switching the native WebAssembly stack. Unless we handle the spill stack ourselves, it will go out of sync.
For example, consider the following C code:
WASM_IMPORT("sleep")
void sleep(int);
// Escape is a no-op function to ensure that spill stack space is actually
// allocated. Without this, clang will optimize out stack operations.
WASM_IMPORT("escape")
void escape(void* x);
WASM_EXPORT("allocateOnStackAndSleep")
void allocateOnStackAndSleep() {
  // Allocate 4 bytes on stack. (The stack is always required to be aligned to
  // 16 bytes so we'll bump the stack pointer by 16.)
  int x[] = {7};
  // Force the compiler to store x on the spill stack
  escape(x);
  // Let victim allocate its stack space
  sleep(0);
  // Now we will reset the stack pointer in the epilogue
}
The function allocateOnStackAndSleep() will be compiled to code that looks
like the following in the WebAssembly text format:
(func $allocateOnStackAndSleep
  (local $x i32)
  ;; int x[] = {7};
  ;; allocate 16 bytes on the stack
  ;; we only need 4 but the stack pointer must always be aligned to 16
  global.get $__stack_pointer
  i32.const 16
  i32.sub
  ;; store the new stack pointer into x and __stack_pointer
  local.tee $x
  global.set $__stack_pointer
  ;; initialize the array: x[0] = 7
  local.get $x
  i32.const 7
  i32.store offset=0
  ;; Call escape(x);
  local.get $x
  call $escape
  ;; Call sleep(0);
  i32.const 0
  call $sleep
  ;; Epilogue: restore stack pointer
  local.get $x
  i32.const 16
  i32.add
  global.set $__stack_pointer
)
If sleep() stack switches, control returns to JavaScript, which can then call a
second function, victim(), that allocates its own variables on the stack. If
victim() also stack switches, then allocateOnStackAndSleep() can resume and exit
first, resetting the stack pointer and deallocating stack space that victim()
is still using. Calling a third function, overwritesVictimsStack(), that
allocates on the stack after allocateOnStackAndSleep() exits and before
victim() resumes would then overwrite victim()’s stack space.
The victim() function looks as follows:
WASM_EXPORT("victim")
void victim() {
  // Allocate our string on stack below the 16 bytes allocated by
  // `allocateOnStackAndSleep()`
  char x[] = "victim's important string";
  escape(x);
  logStrStr("victim1", x);
  // While we're sleeping, `allocateOnStackAndSleep()` exits and sets the
  // stack pointer above us. Then `overwritesVictimsStack()` writes over our
  // stack space.
  sleep(500);
  // This next line will print a different value!
  logStrStr("victim2", x);
}
All overwritesVictimsStack() needs to do is write data to the stack:
WASM_EXPORT("overwritesVictimsStack")
void overwritesVictimsStack(void) {
  char x[] = "this is a long string and it will write over lots of other stuff!";
  escape(x);
}
Escape is a no-op:
// Does nothing, just forces variables to be allocated on stack
function escape(ptr) { }
As usual, we have to define our imports and compile and instantiate the
WebAssembly module. sleep() is our only suspending import. We need to wrap
allocateOnStackAndSleep() and victim() in WebAssembly.promising() since they
stack switch:
const allocateOnStackAndSleep = WebAssembly.promising(
  instance.exports.allocateOnStackAndSleep,
);
const victim = WebAssembly.promising(instance.exports.victim);
const overwritesVictimsStack = instance.exports.overwritesVictimsStack;
Last, we need to call the functions in the appropriate order:
// allocates 16 bytes on stack
const pResetStack = allocateOnStackAndSleep();
// allocates more data on stack below `allocateOnStackAndSleep()`.
const pVictim = victim();
// Resets stack pointer
await pResetStack;
overwritesVictimsStack();
await pVictim;
Running this prints:
victim1 victim's important string
victim2 l write over lots of other stuff!
This example is here.
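To see why the second line of output starts where it does, the sequence of stack pointer updates can be replayed in plain JavaScript, with no WebAssembly involved. This is a toy model under two stated assumptions: every frame is rounded up to a multiple of 16 bytes, and each array sits at the lowest address of its frame. The frame layout that clang actually produces may differ, and all names below are illustrative.

```javascript
// Toy model of the spill stack: 256 bytes of "linear memory" and a
// stack pointer that grows downward.
const HEAP = new Uint8Array(256);
let stackPointer = HEAP.length;
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Reserve `size` bytes, rounded up to a multiple of 16.
function alloc(size) {
  stackPointer -= Math.ceil(size / 16) * 16;
  return stackPointer;
}

// Reserve space for an ASCII string plus its NUL terminator and copy it in.
function allocCString(str) {
  const ptr = alloc(str.length + 1);
  HEAP.set(encoder.encode(str + "\0"), ptr);
  return ptr;
}

function readCString(ptr) {
  return decoder.decode(HEAP.subarray(ptr, HEAP.indexOf(0, ptr)));
}

// 1. allocateOnStackAndSleep(): `int x[] = {7};`, then it suspends.
const sleeperX = alloc(4); // stack pointer: 256 -> 240

// 2. victim() allocates below that frame, then it suspends too.
const victimX = allocCString("victim's important string"); // 240 -> 208
console.log("victim1", readCString(victimX));

// 3. allocateOnStackAndSleep() resumes. Its epilogue computes x + 16,
//    which moves the stack pointer above victim()'s live frame.
stackPointer = sleeperX + 16; // 208 -> 256

// 4. overwritesVictimsStack() runs to completion.
const spOnEntry = stackPointer;
allocCString(
  "this is a long string and it will write over lots of other stuff!",
); // 256 -> 176, covering addresses 176..241
stackPointer = spOnEntry;

// 5. victim() resumes and reads from address 208, which is now byte 32
//    of the long string.
console.log("victim2", readCString(victimX));
// prints "victim2 l write over lots of other stuff!"
```

In this model, the 66-byte string occupies an 80-byte frame starting at address 176, and victim()'s array starts 32 bytes above that, so victim() ends up reading the tail of the long string.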
The simplest fix for reentrancy
We can fix the problem by redefining sleep() to save the region of stack that
the sleeping thread cares about and restore it when we are done sleeping. We
need to record the top of the stack when each thread enters:
let stackTop;
function promisingExport(func) {
  const promisingFunc = WebAssembly.promising(func);
  return async function(...args) {
    stackTop = stackPointer.value;
    return await promisingFunc(...args);
  };
}
We use this to wrap the exports that stack switch:
const allocateOnStackAndSleep = promisingExport(
  instance.exports.allocateOnStackAndSleep,
);
const victim = promisingExport(instance.exports.victim);
When a thread sleeps we save the stack pointer and the range of stack that we
care about. When the thread is restored, we can restore all this. The new
sleep() function looks as follows:
async function sleep(ms) {
  // Save
  const curStackTop = stackTop;
  const curStackBottom = stackPointer.value;
  const savedMemory = HEAP.slice(curStackBottom, curStackTop);
  // Suspend
  await new Promise(res => setTimeout(res, ms));
  // Restore the stack
  HEAP.subarray(curStackBottom, curStackTop).set(savedMemory);
  stackPointer.value = curStackBottom;
  stackTop = curStackTop;
}
In a more general case, we would run this code around awaitInt() or
awaitExternRef().
This example is here.
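One way to run the same save and restore steps around awaitInt() or awaitExternRef() is to factor them out of sleep() into a wrapper. The sketch below is an assumption about how that could be organized, not the code from the example repository: saveStack(), restoreStack(), withStackSaved(), and the state object are hypothetical names, and state bundles what the code above keeps in the module-level HEAP, stackPointer, and stackTop variables.

```javascript
// Hypothetical helpers factored out of sleep().
//   state.heap:         Uint8Array view of linear memory
//   state.stackPointer: object with a mutable `value` (like a WebAssembly.Global)
//   state.stackTop:     top of the current thread's stack range
function saveStack(state) {
  const bottom = state.stackPointer.value;
  const top = state.stackTop;
  // slice() copies, so later writes to the heap cannot change the snapshot.
  return { bottom, top, memory: state.heap.slice(bottom, top) };
}

function restoreStack(state, saved) {
  state.heap.set(saved.memory, saved.bottom);
  state.stackPointer.value = saved.bottom;
  state.stackTop = saved.top;
}

// Wrap a promise-returning import so that it saves before suspending and
// restores after resuming.
function withStackSaved(state, asyncFunc) {
  return async function (...args) {
    const saved = saveStack(state);
    const result = await asyncFunc(...args);
    restoreStack(state, saved);
    return result;
  };
}

// Stand-in state so the sketch runs without a WebAssembly module.
const state = {
  heap: new Uint8Array(64),
  stackPointer: { value: 48 },
  stackTop: 64,
};
state.heap.fill(1, 48, 64); // this thread's stack data
const saved = saveStack(state);
state.heap.fill(9, 32, 64); // another thread scribbles over it
state.stackPointer.value = 32;
restoreStack(state, saved);
console.log(state.heap[48], state.stackPointer.value); // prints "1 48"
```

With this in place, the import would be registered as something like new WebAssembly.Suspending(withStackSaved(state, awaitInt)). Like sleep() above, the wrapper copies the whole range eagerly.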
This code isn’t efficient because we eagerly copy the stack. If no other task writes over this stack space while we are suspended, then we don’t need to copy it. The most efficient way to do this is to make an object that records the range that our thread cares about and a buffer with just the data that has actually been overwritten. When a thread is restored, it will evict the data of any other threads that care about the stack range and restore any of its own data that has been evicted. The code that handles this is a bit more complicated so I won’t explain it here, but you can find a complete working example here.
Conclusion
Using JSPI as a WebAssembly developer is unfortunately quite hard and there is still limited toolchain support for it. The WebAssembly native stack is automatically switched but separate work must be done to keep the linear memory stack in sync. Using an appropriate data structure we can do this efficiently. This approach also only works with C code that is itself thread safe.
There are substantial further difficulties in integrating JSPI into a real program that uses libc. I will discuss them in my next post.
Despite all these implementation difficulties, the capabilities that JSPI enables are so powerful that it is worth the effort.