In my last post, I described the sync/async problem and explained that the new JavaScript Promise Integration (JSPI) web standard allows us to solve it. I also described the new Pyodide APIs which use JSPI to block for the resolution of awaitables.

This post focuses on the implementation details for a C program (e.g., CPython) that wants to use JSPI. Here we restrict ourselves to C programs written for the wasm32-unknown-unknown target. Such programs cannot use libc or most other C libraries, so they are very limited. In the next post, I will discuss the additional problems one needs to solve to use JSPI with the wasm32-unknown-emscripten target (where, in particular, we may use libc).

Thanks to Antonio Cuni, Guy Bedford, and Gyeongjae Choi for feedback on drafts. Thanks also to my current employer Cloudflare and my former employer Anaconda for paying me to work on this.

Using JSPI in a simple C program

All of these examples are designed to work with Node.js 24. Node must be started with the --experimental-wasm-jspi flag. (Unfortunately, Node 24 was released just before the JSPI feature gate was removed from V8.)
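Before running anything, it can help to confirm that the engine actually exposes JSPI. For example, you may want to check that Node really was started with the flag. The check below uses a hypothetical helper named hasJSPI(). It tests for the two entry points these examples rely on, WebAssembly.Suspending and WebAssembly.promising:

```javascript
// Hypothetical helper: reports whether this engine exposes the JSPI API.
// WebAssembly.Suspending is a constructor and WebAssembly.promising is a
// function, so both have typeof "function" when JSPI is available.
function hasJSPI() {
  return (
    typeof WebAssembly !== "undefined" &&
    typeof WebAssembly.Suspending === "function" &&
    typeof WebAssembly.promising === "function"
  );
}

if (!hasJSPI()) {
  console.error("JSPI not available; try: node --experimental-wasm-jspi ...");
}
```

When the feature gate is off, both properties should simply be missing, so a typeof check is enough. Unlike calling them, it cannot throw.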

We’ll start with a basic example of the JSPI API. The full example is here.
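The JavaScript in these examples calls a sleep() helper that isn't shown. Here is a minimal version. It assumes all sleep() needs to do is return a promise that resolves after ms milliseconds:

```javascript
// Assumed helper: returns a promise that resolves (to undefined) after `ms` ms.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```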

Suppose we have an async JavaScript function:

async function awaitFakeFetch(x) {
  console.log("JS: fetching (fake)...");
  await sleep(1000); // simulate a slow "fetch" request
  console.log("JS: fetched (fake)");
  return x + 1;
}

which we want to call from WebAssembly. From C’s perspective, awaitFakeFetch() will return an int. We use it in the following C function. In Pyodide, the function might use Python’s C API to call a synchronous Python function that wants to do async I/O further down the stack.

WASM_EXPORT("fakePyFunc")
void fakePyFunc(int x) {
  logString("About to call awaitFakeFetch");
  int res = awaitFakeFetch(x);
  logString("Got result:");
  logInt(res);
}

We can’t use libc functions like printf because we are compiling to the wasm32-unknown-unknown target. Instead we use custom logString() and logInt() functions which we will need to define in JavaScript.

We compile and link this C file with clang:

clang -I../include -target wasm32-unknown-unknown -Wl,--no-entry -nostdlib -O2 \
    basic.c -o basic.wasm

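To instantiate the WebAssembly module, we need JavaScript definitions for the imports logInt(), logString(), and awaitFakeFetch(). We already have awaitFakeFetch(). logString() reads the C string out of HEAP, a byte view of the module's memory that we create after instantiation.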

function logInt(x) {
  console.log("C:", x);
}

function logString(ptr) {
  let endPtr = ptr;
  // Locate the terminating null byte
  while (HEAP[endPtr]) {
    endPtr++;
  }
  console.log("C:", new TextDecoder().decode(HEAP.slice(ptr, endPtr)));
}
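Because logString() reads the global HEAP, it is awkward to test on its own. Below is a variant with the hypothetical name readCString(). It takes the byte view as a parameter and decodes through subarray(), which avoids an extra copy. One caveat applies to a long-lived HEAP: if the module's memory grows, memory.buffer is replaced and the old Uint8Array view becomes detached. In that case HEAP should be re-created after growth.

```javascript
// Hypothetical helper: decode the NUL-terminated UTF-8 string starting at
// `ptr` in the byte view `heap` (e.g. new Uint8Array(memory.buffer)).
const decoder = new TextDecoder();

function readCString(heap, ptr) {
  let end = ptr;
  // Scan forward to the terminating NUL byte (or the end of memory).
  while (end < heap.length && heap[end] !== 0) {
    end++;
  }
  // subarray() creates a view, not a copy; decode() reads it directly.
  return decoder.decode(heap.subarray(ptr, end));
}

// logString() could then be written as:
// function logString(ptr) { console.log("C:", readCString(HEAP, ptr)); }
```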

Next we make an imports object and instantiate the WebAssembly module. To make awaitFakeFetch() a suspending import, we wrap it with new WebAssembly.Suspending().

const imports = {
  env: {
    logInt,
    logString,
    awaitFakeFetch: new WebAssembly.Suspending(awaitFakeFetch),
  },
};

import { readFileSync } from "node:fs";

export const { instance } = await WebAssembly.instantiate(
  readFileSync("basic.wasm"),
  imports,
);
// Byte view of linear memory, used by logString()
const HEAP = new Uint8Array(instance.exports.memory.buffer);

To call fakePyFunc(), we need to wrap the export in WebAssembly.promising():

export const fakePyFunc = WebAssembly.promising(
  instance.exports.fakePyFunc,
);

Now calling fakePyFunc(3) returns a promise, and while it runs it logs:

C: About to call awaitFakeFetch
JS: fetching (fake)...
<pauses for 1000 ms>
JS: fetched (fake)
C: Got result:
C: 4

If you have Node v24 and clang, you can try it out yourself. Clone http://github.com/hoodmane/jspi-blog-examples/, change into the 2-basic-example directory, and run:

./build.sh
./basic.mjs

All the other examples can be run in the same way.

A separate function to suspend

We can separate the function that blocks from the original promise-returning function. This lets us schedule multiple promises from C and only later block for them.

Suppose now we have two async operations, say fakeAsyncHttpRequest() and fakeAsyncDbQuery(), that return promises. In C, their return type is __externref_t, an opaque reference to a JavaScript object. The only operations allowed on an __externref_t are assignment and passing it to or returning it from functions. Each of the following results in a compile error:
- adding them
- dereferencing them
- taking their address
- using them as struct fields
- passing them to a varargs function

So the only thing we can do with these promises is pass them to awaitInt(), which suspends until they resolve to integers.

WASM_EXPORT("fakePyFunc")
void fakePyFunc(int x) {
  logString("Call fakeAsyncHttpRequest");
  __externref_t promiseHttpRequest = fakeAsyncHttpRequest(x);
  logString("Call fakeAsyncDbQuery");
  __externref_t promiseDbQuery = fakeAsyncDbQuery(x);

  logString("Suspending for promiseHttpRequest");
  int res1 = awaitInt(promiseHttpRequest);
  logString("-- got res1:");
  logInt(res1);

  logString("Suspending for promiseDbQuery");
  int res2 = awaitInt(promiseDbQuery);
  logString("Got res2:");
  logInt(res2);
}

Our JavaScript imports are then:

function awaitInt(promise) {
  // This is just the identity function...
  // We need it so we can wrap it with WebAssembly.Suspending
  return promise;
}

async function fakeAsyncHttpRequest(x) {
  console.log("JS: fakeAsyncHttpRequest: sleeping");
  await sleep(1000);
  console.log("JS: fakeAsyncHttpRequest: slept");
  return x + 1;
}

async function fakeAsyncDbQuery(x) {
  console.log("JS: fakeAsyncDbQuery: sleeping");
  await sleep(2000);
  console.log("JS: fakeAsyncDbQuery: slept");
  return x * x;
}

Only awaitInt needs to be a suspending import; the async functions just return promises (represented as __externref_t in C).

const imports = {
  env: {
    logInt,
    logString,
    fakeAsyncHttpRequest,
    fakeAsyncDbQuery,
    awaitInt: new WebAssembly.Suspending(awaitInt),
  },
};

We need the same boilerplate as before to instantiate the WebAssembly module and wrap fakePyFunc() with WebAssembly.promising(). And now we can call fakePyFunc(4) and it will log:

C: Call fakeAsyncHttpRequest
JS: fakeAsyncHttpRequest: sleeping
C: Call fakeAsyncDbQuery
JS: fakeAsyncDbQuery: sleeping
C: Suspending for promiseHttpRequest
JS: fakeAsyncHttpRequest: slept
C: -- got res1:
C: 5
C: Suspending for promiseDbQuery
JS: fakeAsyncDbQuery: slept
C: Got res2:
C: 16
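A plain async JavaScript analogue, with no WebAssembly and no JSPI, may make the ordering easier to see. Each awaitInt() call behaves roughly like an await. Both requests are started before the first suspension, so they run concurrently, and the real call should take about 2000 ms in total rather than 3000 ms. The sketch below shortens the delays and records events in an array instead of logging. fakePyFuncModel is a hypothetical name.

```javascript
// Hypothetical model of the C fakePyFunc() in plain async JS. `events`
// collects log lines so the ordering can be inspected afterwards.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fakePyFuncModel(x, events) {
  const httpRequest = async () => {
    events.push("http: sleeping");
    await delay(10);
    events.push("http: slept");
    return x + 1;
  };
  const dbQuery = async () => {
    events.push("db: sleeping");
    await delay(20);
    events.push("db: slept");
    return x * x;
  };

  // Start both operations first; neither has finished yet.
  const promiseHttp = httpRequest();
  const promiseDb = dbQuery();

  // Then "suspend" for each result in turn, like awaitInt().
  const res1 = await promiseHttp;
  events.push(`res1 = ${res1}`);
  const res2 = await promiseDb;
  events.push(`res2 = ${res2}`);
  return [res1, res2];
}
```

Calling fakePyFuncModel(4, events) yields [5, 16]. Both "sleeping" events are recorded before either "slept" event, matching the log above.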

To handle more general promises that don’t necessarily resolve to an int, we could use an awaitExternRef() function where the return value is an __externref_t. Then we could use a separate externRefToInt() function to convert the result to an integer.

This example is here.
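For the more general awaitExternRef()/externRefToInt() approach mentioned above, the JavaScript side might look like the sketch below. Only the two names come from the text. The function bodies, and the choice to throw on non-numbers, are assumptions. On the C side, awaitExternRef() would be declared as returning __externref_t and externRefToInt() as returning int.

```javascript
// Like awaitInt(): an identity function. Wrapped in WebAssembly.Suspending,
// the promise it returns is awaited and the resolved value is handed back to
// C as an __externref_t.
function awaitExternRef(promise) {
  return promise;
}

// Converts a resolved JS value to a C int. Throwing on anything that is not
// a number is a design choice of this sketch; the i32 return type would
// otherwise silently coerce values such as strings or undefined.
function externRefToInt(value) {
  if (typeof value !== "number") {
    throw new TypeError(`expected a number, got ${typeof value}`);
  }
  return value | 0; // truncate to a 32-bit signed integer
}

// Only the suspending function needs the wrapper. Building the imports lazily
// keeps this file loadable on engines without JSPI.
function makeImports() {
  return {
    env: {
      awaitExternRef: new WebAssembly.Suspending(awaitExternRef),
      externRefToInt,
    },
  };
}
```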

Troubles with reentrancy

JSPI handles switching the native WebAssembly call stack. However, the native WebAssembly stack is opaque: it is not possible to create pointers to data stored on it.

For this reason, Clang implements the C stack using a combination of the native WebAssembly stack and a “spill stack” in linear memory, which the WebAssembly VM knows nothing about. The top of the spill stack is tracked by the __stack_pointer global. Since the spill stack lives in linear memory, it is addressable, so any value whose address we take goes on the spill stack. Because JSPI switches only the native stack, the spill stack will get out of sync unless we handle it ourselves.

For example, consider the following C code:

WASM_IMPORT("sleep")
void sleep(int);

// Escape is a no-op function to ensure that spill stack space is actually
// allocated. Without this, clang will optimize out stack operations.
WASM_IMPORT("escape")
void escape(void* x);

WASM_EXPORT("allocateOnStackAndSleep")
void allocateOnStackAndSleep() {
  // Allocate 4 bytes on stack. (The stack is always required to be aligned to
  // 16 bytes so we'll bump the stack pointer by 16.)
  int x[] = {7};
  // Force the compiler to store x on the spill stack
  escape(x);
  // Let victim allocate its stack space
  sleep(0);
  // Now we will reset the stack pointer in the epilogue
}

The function allocateOnStackAndSleep() will be compiled to code that looks like the following in the WebAssembly text format:


  (func $allocateOnStackAndSleep
    (local $x i32)
    ;; int x[] = {7};
    ;; allocate 16 bytes on the stack
    ;; we only need 4 but the stack pointer must always be aligned to 16
    global.get $__stack_pointer
    i32.const 16
    i32.sub
    ;; store the current stack pointer into x and __stack_pointer
    local.tee $x
    global.set $__stack_pointer
    ;; initialize the list: x[0] = 7
    local.get $x
    i32.const 7
    i32.store offset=0
    ;; Call escape(x);
    local.get $x
    call $escape
    ;; Call sleep(0);
    i32.const 0
    call $sleep
    ;; Epilogue: restore stack pointer
    local.get $x
    i32.const 16
    i32.add
    global.set $__stack_pointer
  )
  

Here is how things go wrong:
- If sleep() switches stacks, then while allocateOnStackAndSleep() is suspended we can call another function, victim(), that allocates its own variables on the stack below it.
- If victim() also switches stacks, allocateOnStackAndSleep() can resume and exit. Its epilogue resets the stack pointer above victim()’s frame, deallocating stack space that victim() is still using.
- If we then call a third function, overwritesVictimsStack(), that allocates on the stack after allocateOnStackAndSleep() exits and before victim() resumes, it overwrites victim()’s stack space.

The victim() function looks as follows:

WASM_EXPORT("victim")
void victim() {
  // Allocate our string on stack below the 16 bytes allocated by
  // `allocateOnStackAndSleep()`
  char x[] = "victim's important string";
  escape(x);
  logStrStr("victim1", x);
  // While we're sleeping, `allocateOnStackAndSleep()` exits and sets the
  // stack pointer above us. Then `overwritesVictimsStack()` writes over our
  // stack space.
  sleep(500);
  // This next line will print a different value!
  logStrStr("victim2", x);
}

All overwritesVictimsStack() needs to do is write data to the stack:

WASM_EXPORT("overwritesVictimsStack")
void overwritesVictimsStack(void) {
  char x[] = "this is a long string and it will write over lots of other stuff!";
  escape(x);
}

On the JavaScript side, escape() is a no-op:

// Does nothing, just forces variables to be allocated on stack
function escape(ptr) { }

As usual, we have to define our imports, then compile and instantiate the WebAssembly module. sleep() is our only suspending import. We need to wrap allocateOnStackAndSleep() and victim() in WebAssembly.promising() since they switch stacks:

const allocateOnStackAndSleep = WebAssembly.promising(
  instance.exports.allocateOnStackAndSleep,
);
const victim = WebAssembly.promising(instance.exports.victim);
const overwritesVictimsStack = instance.exports.overwritesVictimsStack;

Last, we need to call the functions in the appropriate order:

// allocates 16 bytes on stack
const pResetStack = allocateOnStackAndSleep();
// allocates more data on stack below `allocateOnStackAndSleep()`.
const pVictim = victim();
// Resets stack pointer
await pResetStack;
overwritesVictimsStack();
await pVictim;

Running this prints:

victim1 victim's important string
victim2 l write over lots of other stuff!

This example is here.
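The interleaving described above can be replayed by hand in plain JavaScript, without WebAssembly, to see exactly which bytes get clobbered. This is a toy model, and every name in it is hypothetical:
- a 64-byte Uint8Array stands in for linear memory
- a local sp stands in for __stack_pointer
- each frame is 16 or 32 bytes

```javascript
// Toy model of the reentrancy bug. The stack grows downward from the end of
// `memory`, as clang's spill stack does from __stack_pointer.
function replayReentrancyBug() {
  const memory = new Uint8Array(64);
  let sp = memory.length;

  // allocateOnStackAndSleep(): prologue reserves 16 bytes, stores x[0] = 7,
  // then suspends in sleep(0).
  const x = sp - 16; // 48
  sp = x;
  memory[x] = 7;

  // victim() starts while the first call is suspended. Its frame lands below
  // x, and it stores its string there before suspending in sleep(500).
  const victimStr = sp - 16; // 32
  sp = victimStr;
  memory.set(new TextEncoder().encode("victim\0"), victimStr);

  // allocateOnStackAndSleep() resumes; its epilogue computes x + 16, which
  // moves sp back above victim's still-live frame.
  sp = x + 16; // 64

  // overwritesVictimsStack() reserves 32 bytes from sp and fills them,
  // covering bytes 32..63, including victim's string.
  const junk = sp - 32;
  memory.fill(0x2a, junk, sp); // '*'

  // What victim() will read after it resumes.
  let end = victimStr;
  while (end < memory.length && memory[end] !== 0) end++;
  return new TextDecoder().decode(memory.subarray(victimStr, end));
}
```

Replaying it returns 32 '*' characters instead of "victim", the same kind of corruption as in the real output.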

The simplest fix for reentrancy

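We can fix the problem by redefining sleep() to save the region of the stack that the sleeping thread cares about and to restore it when the thread is done sleeping. For that, we need to record the top of the stack when each thread enters. In the code below, stackPointer is assumed to be the module’s __stack_pointer global. We assume the module exports it so that JavaScript can read and write it as a WebAssembly.Global: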

let stackTop;
function promisingExport(func) {
  const promisingFunc = WebAssembly.promising(func);
  return async function(...args) {
    stackTop = stackPointer.value;
    return await promisingFunc(...args);
  }
}

We use this to wrap the exports that stack switch:

const allocateOnStackAndSleep = promisingExport(
  instance.exports.allocateOnStackAndSleep,
);
const victim = promisingExport(instance.exports.victim);

When a thread sleeps, we save the stack pointer and the range of the stack that it cares about, and when the thread resumes, we restore them. The new sleep() function looks as follows:

async function sleep(ms) {
  // Save
  const curStackTop = stackTop;
  const curStackBottom = stackPointer.value;
  const savedMemory = HEAP.slice(curStackBottom, curStackTop);
  // Suspend
  await new Promise(res => setTimeout(res, ms));
  // Restore the stack
  HEAP.subarray(curStackBottom, curStackTop).set(savedMemory);
  stackPointer.value = curStackBottom;
  stackTop = curStackTop;
}

In a more general case, we would run this code around awaitInt() or awaitExternRef().

This example is here.
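One way to generalize sleep() is a wrapper with the hypothetical name saveStackAround(). It applies the same save/suspend/restore steps to any async import. This sketch assumes the state can be passed around explicitly rather than through the globals HEAP, stackPointer, and stackTop used above:

```javascript
// Hypothetical generalization of sleep() above. `state` bundles what the post
// keeps in globals: `heap` (a Uint8Array over linear memory), `stackPointer`
// (a WebAssembly.Global, or anything with a `value` property), and `stackTop`.
function saveStackAround(state, asyncFn) {
  return async function (...args) {
    // Save this thread's slice of the linear-memory stack.
    const top = state.stackTop;
    const bottom = state.stackPointer.value;
    const saved = state.heap.slice(bottom, top);
    try {
      // Suspend; other exports may run and reuse this memory meanwhile.
      return await asyncFn(...args);
    } finally {
      // Restore even if the promise rejected, since a rejection is expected
      // to surface in wasm as an exception at the suspended call.
      state.heap.set(saved, bottom);
      state.stackPointer.value = bottom;
      state.stackTop = top;
    }
  };
}

// For example, awaitInt could become:
//   awaitInt: new WebAssembly.Suspending(saveStackAround(state, (p) => p))
```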

The sleep() above isn’t efficient because it eagerly copies the stack. If no other task writes over this stack space while we are suspended, we don’t need to copy it at all. A more efficient approach is to make an object that records the range our thread cares about, plus a buffer holding just the data that has actually been overwritten. When a thread is restored, it evicts the data of any other threads that care about that stack range and restores any of its own data that has been evicted. The code that handles this is a bit more complicated, so I won’t explain it here, but you can find a complete working example here.
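As an illustration only, here is an independent sketch of a lazy scheme. It is not the code from the linked example. It makes these simplifying assumptions:
- memory is a plain Uint8Array
- ranges are tracked byte by byte
- before any wasm code runs with its stack top at some address, every suspended thread's bytes below that address are evicted (copied out), because the running code may grow its stack down over them

With these rules, a thread that suspends and resumes with nothing else running in between copies nothing at all. LazyStackManager and its methods are hypothetical names.

```javascript
// Hypothetical sketch of lazy stack saving for JSPI tasks that share one
// downward-growing linear-memory stack. Bytes are copied out only when some
// other code is about to run over them.
class LazyStackManager {
  constructor(memory) {
    this.memory = memory;       // stands in for HEAP
    this.suspended = new Set(); // records of currently suspended tasks
  }

  // A task suspends with its stack occupying [bottom, top). Nothing is copied.
  suspend(top, bottom) {
    const length = top - bottom;
    const record = {
      top,
      bottom,
      saved: new Uint8Array(length),   // copies of evicted bytes
      evicted: new Uint8Array(length), // 1 = byte lives in `saved`, not memory
    };
    this.suspended.add(record);
    return record;
  }

  // Call before running any wasm code whose stack top is `top`: that code may
  // write anywhere below `top`, so evict other tasks' bytes in that region.
  evictBelow(top) {
    for (const r of this.suspended) {
      const end = Math.min(r.top, top);
      for (let addr = r.bottom; addr < end; addr++) {
        const i = addr - r.bottom;
        if (!r.evicted[i]) {
          r.saved[i] = this.memory[addr];
          r.evicted[i] = 1;
        }
      }
    }
  }

  // A suspended task resumes: protect everyone else, then copy back only the
  // bytes of ours that were evicted. Returns the stack pointer to restore.
  resume(record) {
    this.suspended.delete(record);
    this.evictBelow(record.top);
    for (let i = 0; i < record.evicted.length; i++) {
      if (record.evicted[i]) this.memory[record.bottom + i] = record.saved[i];
    }
    return record.bottom;
  }
}

// Replays the post's interleaving with the manager in place.
function replayWithManager() {
  const memory = new Uint8Array(64);
  const mgr = new LazyStackManager(memory);
  let sp = 64;

  mgr.evictBelow(sp);                   // allocateOnStackAndSleep() enters
  sp -= 16;
  memory[sp] = 7;
  const a = mgr.suspend(64, sp);        // ...and sleeps, owning [48, 64)

  const victimTop = sp;
  mgr.evictBelow(sp);                   // victim() enters at 48
  sp -= 16;
  memory.set(new TextEncoder().encode("victim\0"), sp);
  const v = mgr.suspend(victimTop, sp); // ...and sleeps, owning [32, 48)

  sp = mgr.resume(a);                   // evicts victim's bytes
  sp += 16;                             // allocateOnStackAndSleep() epilogue

  mgr.evictBelow(sp);                   // overwritesVictimsStack() runs
  memory.fill(0x2a, sp - 32, sp);

  sp = mgr.resume(v);                   // victim's bytes are copied back
  let end = sp;
  while (end < memory.length && memory[end] !== 0) end++;
  return new TextDecoder().decode(memory.subarray(sp, end));
}
```

Every export needs the same discipline, including non-suspending ones like overwritesVictimsStack(): each must call evictBelow() with the current stack pointer before it runs. The price of this simple rule is that resuming a task higher on the stack still copies out everything suspended below it. Avoiding that would need finer-grained tracking of which bytes later code actually touches.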

Conclusion

Using JSPI as a WebAssembly developer is unfortunately quite hard, and toolchain support for it is still limited. The native WebAssembly stack is switched automatically, but separate work is needed to keep the linear memory stack in sync. With an appropriate data structure, we can do this efficiently. Even so, this approach only works with C code that is itself thread safe.

There are substantial further difficulties in integrating JSPI into a real program that uses libc. I will discuss them in my next post.

Despite all these implementation difficulties, the capabilities that JSPI enables are so powerful that it is worth the effort.