Pyodide v0.27.0 is out. This release was focused on improving the long-term stability of Pyodide.
Welcome Agriya Khetarpal to the Pyodide team
Agriya Khetarpal has joined as a new maintainer. Agriya has been active in the Scientific Python area and is a contributor to NumPy, SciPy, and scikit-learn. He has already significantly strengthened Pyodide’s support for various Scientific Python packages.
Build System Improvements
Decoupling pyodide-build
from Pyodide runtime
pyodide-build
is a tool that builds Python packages to run in Pyodide.
Previously, the version of pyodide-build
was strongly coupled to the version
of Pyodide, meaning that if you wanted to build a package against a specific
version of Pyodide, you had to use the corresponding version of pyodide-build
.
The problem with this approach was that even if we improved the build system,
downstream users would have to wait for the next Pyodide release to use it.
In this release, we have separated pyodide-build
from the Pyodide runtime.
This allows us to develop and release pyodide-build
independently of Pyodide.
pyodide-build
is now developed in pyodide/pyodide-build.
You can install it with pip install pyodide-build
.
Recent versions of pyodide-build
work with Pyodide 0.26 and higher. For
example, you can use pyodide-build
version 0.29.2 to build packages for use
with Pyodide 0.26.4:
pip install pyodide-build==0.29.2
pyodide xbuildenv install 0.26.4
pyodide build <...>
Wheels for new packages
This release includes about twenty new packages, most notably popular data science packages PyArrow, Polars, and DuckDB.
Previously, we have built all packages in our CI from within the repository. The increasing number of packages with long build times have been a strain on our CI resources. For instance, PyArrow takes tens of minutes to build. Now, several of these packages are built in a separate repository and managed by the package maintainers. In Pyodide 0.28.0, we are planning to unvendor the packages from the Pyodide runtime entirely. This will pave the way to supporting many more packages.
Update to NumPy 2.0
We have updated NumPy to version 2.0.2. This is a major update that required updates to many downstream packages in the scientific Python ecosystem. Please see Ecosystem compatibility with numpy 2.0 for more information.
Performance improvements to foreign function interface
We’ve long prioritized making the foreign function comprehensive and correct over making it fast. At this point we no longer get many bug reports or feature requests for the foreign function interface. However, for some use cases the foreign function interface is inconvienently slow. We made several improvements.
The first improvement we made was to getattr
on a JsProxy
. Each JsProxy
has a Python dictionary in addition to the JavaScript object that it holds. When
someone accesses an attribute on the JsProxy
we have to first look it up on
the dictionary. If we don’t find the attribute on the dictionary, we then look
up the attribute on the JavaScript object. Most attributes are found in the
second lookup on the JavaScript object. On this expected codepath, the failed
Python lookup raises an AttributeError
which we catch. In a successful lookup,
between 43% and 76% of the execution time was spent formatting the message for
the AttributeError
. We now avoid creating the AttributeError
in the first place
which prevents wasting time formatting an error message that we will throw away.
The second improvement we made was to optimize away temporary bound methods. To execute code like
a.f()
Python first looks up a.f
and then calls the result. This is translated into
bytecode like:
LOAD a
LOAD_ATTR f
CALL
The function a.f
receives a
as the first self
argument, so it needs to be
a special bound method object that knows the correct value for the self
argument. However, we only use this object once to call it and then we throw it
away. We can avoid allocating and destroying an object when we call a method by
calling type(a).f(a)
. The LOAD_ATTR
opcode has a special argument that
indicates that the next opcode is going to be CALL
and in that case calls a
method named _PyObject_GetMethod
instead of the typical PyObject_GetAttr
to
perform the attribute lookup. We patched _PyObject_GetMethod
to have special
handling for JsProxy
objects so that we could optimize away the temporary for
JS objects too.
It would be possible to also improve other opcode sequences like
LOAD a
LOAD_ATTR x
LOAD_ATTR y
but in typical Python code, LOAD_ATTR
does need create a temporary so it would
require a more invasive patch to the Python interpreter.
What’s next?
We plan to upgrade to Python 3.13 in our next release. In addition, we intend to make the following changes:
Package unvendoring
We will unvendor the package recipes from the main Pyodide repository. The package index will gain a separate release process from the Pyodide runtime itself. This should have several significant benefits:
- It will reduce CI usage due to rebuilding packages less often.
- People who add a package will not have to wait as long until their package is included in our index.
Upstream work in Emscripten and Python
We are working on
restoring Emscripten to a tier 3 supported target for Python.
As a part of this work, we are upstreaming
a large number of fixes to the Emscripten file system.
For example, it now works to seek on /dev/null
(it
does nothing), symlink support is much better, it will work to stat
a file
descriptor which doesn’t point to a named file, and many file system system
calls have more posix-compliant error handling. Most of these changes have not
made their way into Pyodide yet because we’re still using Emscripten 3.1.58 for
ABI compatibility.
Wasm Exception Handling
Emscripten supports two different stack unwinding ABIs for C++ exceptions, Rust panics, and setjmp/longjmp, a legacy ABI based on JavaScript exception handling, and a newer ABI based on WebAssembly Exception Handling. We use the JavaScript Exception handling but hope to switch to WebAssembly exception handling. This will lead to faster, smaller code, and fewer bugs. It also eliminates a lot of complexity in the stack switching support code – it is impossible to stack switch through JavaScript frames and by default every C++ try block introduces a JavaScript frame. This requires upstream work on the Rust compiler and switching to a custom build of the Rust standard library.
Acknowledgements
Thanks to Agriya Khetarpal, Loïc Estève, and Ralf Gommers for their work helping ensure that packages in the Scientific Python ecosystem are well supported in Pyodide.
Thanks to Joe Marshall and George Stagg for their contributions towards PyArrow and Polars support respectively, and to DuckDB maintainers for getting DuckDB to work in Pyodide.
Additionally, we appreciate the continued support from the Emscripten team.
The following people committed to Pyodide in this release:
Agriya Khetarpal, Andrei V. Plamada, Andrew Moon, Bart Broere, Carlo Piovesan, Castedo Ellerman, Chris Pyles, Christian Clauss, Deepak Cherian, Eli Lamb, Em Zhan, Eric Brown, Gyeongjae Choi, Hanno Rein, Henry Schreiner, Hood Chatham, Ian Thomas, JHM Darbyshire, James J Balamuta, James Lamb, Jiefu7, Joe Marshall, Joel Ostblom, Juniper Tyree, Kellen Malek, Kyle Barron, Loïc Estève, Luiz Irber, M Bussonnier, Maarten Breddels, Marco Edward Gorelli, Marianne Corvellec, Muspi Merol, Myles Scolnick, Nick Altmann, Olivier Grisel, Oscar Benjamin, Péter Gyarmati, Phillip Cloud, Riya Sinha, Szabolcs Dombi, Victor Blomqvist, YISH, Yan Wong, Zsolt Dollenstein, airen1986, chrysn, josephrocca, swnf