Pyodide is a distribution of CPython for WebAssembly which was started by Michael Droettboom at Mozilla. Today, Pyodide is an independent open source project.
Supporting libraries in Pyodide requires packaging all of them, along with their dependencies. Unfortunately, the 32-bit WebAssembly specification limits the amount of memory that can be used, and if packages were to use it all, it would be hard to make some space for datasets. Moreover, it can take a long time to download and compile packages when they are first used within the web browser.
As a step in this direction, we aimed to reduce the size of the Matplotlib library.
The part of the Matplotlib architecture responsible for defining how pixels are displayed on the screen is called a renderer. Fortunately, Matplotlib allows overriding the default implementation with a custom renderer. Currently, the version of Matplotlib shipped with Pyodide includes a wasm build of the Agg renderer, which is based on the Anti-Grain Geometry (AGG) library. It’s the Agg renderer that draws the plot and an image of that plot is pasted (by extracting the underlying buffer data) onto the web document inside a canvas.
However, web browsers already include tools to render graphics such as the <canvas> element and WebGL. Writing a new backend renderer based on these would enable us to reduce the size and memory footprint of the final build and possibly use GPU acceleration along with giving us the ability to use locally installed fonts with web fonts. In a Google Summer of Code project for 2019, we experimented with rendering graphics directly from Matplotlib using the Canvas API.
Implementing the new Renderer
The Matplotlib library has a layered structure (Scripting Layer, Artist Layer, Backend Layer) and implementing a new renderer requires tinkering with the Backend layer – which is the lowest in the stack.
In essence, it requires us to re-implement the functions that define how drawing primitives are rendered:
- draw_path: for drawing lines, curves (both quadratic and cubic Bézier), etc. [when plt.plot(), plt.scatter(), etc. are called]
- draw_image: for rendering images [when plt.imshow() is called, etc.]
- draw_text: for drawing headings, axis markings, etc. along with mathematical text
- draw_markers: for drawing axis ticks
In addition to the above, styles and properties such as join style, cap style, line width and dashes also need to be set.
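To make the draw_path step concrete, here is a minimal sketch of how path segments could be replayed as Canvas path commands. It is not the backend's actual code. The numeric codes mirror the constants on matplotlib.path.Path, the (points, code) segment format is an assumption modelled on what Path.iter_segments() yields, and RecordingContext is a hypothetical stand-in for the browser's 2D context so the sketch runs outside Pyodide.

```python
# Path codes, mirroring the constants on matplotlib.path.Path.
MOVETO, LINETO, CURVE3, CURVE4, CLOSEPOLY = 1, 2, 3, 4, 79


class RecordingContext:
    """Hypothetical stand-in for a CanvasRenderingContext2D; it only logs calls."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any method looked up on the context becomes a recorder for that call.
        def record(*args):
            self.calls.append((name, args))
        return record


def trace_path(ctx, segments):
    """Replay (points, code) segments as Canvas path commands."""
    ctx.beginPath()
    for points, code in segments:
        if code == MOVETO:
            ctx.moveTo(*points)
        elif code == LINETO:
            ctx.lineTo(*points)
        elif code == CURVE3:
            ctx.quadraticCurveTo(*points)  # control point, end point
        elif code == CURVE4:
            ctx.bezierCurveTo(*points)  # two control points, end point
        elif code == CLOSEPOLY:
            ctx.closePath()
    ctx.stroke()


ctx = RecordingContext()
trace_path(ctx, [
    ((0.0, 0.0), MOVETO),
    ((10.0, 0.0), LINETO),
    ((15.0, 5.0, 10.0, 10.0), CURVE3),
    ((0.0, 0.0), CLOSEPOLY),
])
print([name for name, _ in ctx.calls])
```

A real renderer would also have to apply the path's transform, flip the y-axis (Matplotlib's origin is at the bottom left, the canvas origin at the top left) and set the stroke and fill styles before drawing.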
- Rendering Text:
The Matplotlib library provides functionality for looking up fonts that it ships with. Unfortunately, these fonts are not accessible to the web browser to draw onto a canvas. We need to provide a way to load custom fonts that the browser can use. Thus, a simple assignment like the one below doesn’t work.
    from js import document

    canvas_element = document.getElementById("canvas")
    ctx = canvas_element.getContext("2d")
    ctx.font = "30px cmr10"
The cmr10 font is available inside Matplotlib’s virtual file system but is not accessible to the browser. The <canvas> element therefore cannot use it and falls back to a default placeholder font.
To make the font usable, it has to be loaded into the browser through the FontFace API. However, as easy as that sounds, keeping track of when the font has loaded (so as to request a redraw) is not trivial. The FontFace API is asynchronous and relies on the network, while draw_text() in Matplotlib is synchronous, which causes the following two problems:
- Matplotlib can call “draw_text()” multiple times in a single drawing instance, with each call requesting that the “correct” font be loaded. A new request can arrive while a previous asynchronous request for the same font is still fetching from the network. The result is that the same font is loaded several times through queued and concurrent invocations of the FontFace API, which hurts performance.
- Drawing a plot from the Matplotlib library leads to calling “draw_text()” which leads us to using the FontFace API to asynchronously load the font which leads us to request a redraw which further calls “draw_text()” and leads us to calling the FontFace API again which then requests the redraw again…
In short, it’s an infinite recursion of
load font → redraw → load font → redraw → ....
Fortunately, both problems can be fixed with global state that keeps track of which fonts have already been loaded into the browser’s environment. This way, we only load fonts that are not yet present in this state.
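The idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the backend's implementation: FontTracker, needs_load and finish are hypothetical names, the started list stands in for real FontFace API calls, and tracking fonts that are still loading (in addition to those already loaded) is an assumption made here so that concurrent requests are also deduplicated.

```python
class FontTracker:
    """Hypothetical global state: which fonts are loading or already loaded."""

    def __init__(self):
        self.loading = set()
        self.loaded = set()

    def needs_load(self, name):
        """Return True only for the first request of a font not yet known."""
        if name in self.loaded or name in self.loading:
            return False
        self.loading.add(name)
        return True

    def finish(self, name):
        """Record that an asynchronous load completed."""
        self.loading.discard(name)
        self.loaded.add(name)


tracker = FontTracker()
started = []  # stands in for calls to the browser's FontFace API


def draw_text(font_name):
    # Synchronous, like the renderer's draw_text(): it can only start a load.
    if tracker.needs_load(font_name):
        started.append(font_name)


# First draw: three text elements ask for the same font.
for _ in range(3):
    draw_text("cmr10")

# The load completes later and triggers one redraw.
tracker.finish("cmr10")
for _ in range(3):
    draw_text("cmr10")

print(started)  # the font was requested once, and the redraw started no new load
```

Because the redraw finds the font already recorded, it does not start another load, which is what breaks the load font → redraw cycle.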
- Rendering Images and Transparent Pixels:
Images are essentially represented as multi-dimensional arrays and can be manipulated with the NumPy library on the Python side. The <canvas> element, however, needs an ImageData object.
    # im: (N, M, 4) array-like of np.uint8, i.e. an array of RGBA pixels.
    import numpy as np
    from js import document, ImageData
    from pyodide import create_proxy

    # converting numpy array to an ImageData object
    # through Uint8ClampedArray representation
    h, w, d = im.shape
    im = np.ravel(np.uint8(np.reshape(im, (h * w * d, -1)))).tobytes()
    pixels_proxy = create_proxy(im)
    pixels_buf = pixels_proxy.getBuffer("u8clamped")
    img_data = ImageData.new(pixels_buf.data, w, h)

    # rendering the ImageData object onto a canvas element
    canvas_element = document.createElement("canvas")
    canvas_element.width = w
    canvas_element.height = h
    ctx = canvas_element.getContext("2d")
    ctx.putImageData(img_data, 0, 0)

    # clean-up
    pixels_proxy.destroy()
    pixels_buf.release()
However, “putImageData()” has two drawbacks. First, image data that is pasted this way ignores transformations. Even if the canvas has been transformed using operations such as rotate(), scale(), translate() or setTransform(), the resulting image appears exactly as if the canvas had not been transformed at all.
Second, transparency is not blended. If the ImageData object contains transparent pixels and the canvas already has a red background, the transparent pixels replace the red ones. We usually expect transparent pixels to blend with the background, i.e. the canvas should stay red wherever the incoming ImageData object is transparent. Instead, the previous pixels are simply overwritten.
To overcome these two issues, we use the “drawImage()” function in addition to the “putImageData()” function. The “drawImage()” function doesn’t suffer from either issue, but it does not accept an ImageData object as its input; it expects an image source such as another canvas element. Thus, to use the two functions in tandem we need something known as an in-memory canvas.
An in-memory canvas is a normal <canvas> element that is used for off-screen rendering, that is, rendering content to memory rather than to the screen.
To achieve this, we create a <canvas> element but do not attach it to the DOM, so its content is never shown on the screen. Once we have rendered something to the off-screen canvas, its content can be used in another canvas element (which is on-screen and attached to the DOM).
Concretely, putImageData() is used to put the ImageData object into the off-screen canvas. Once that is done, the on-screen canvas uses drawImage() to copy the contents of the off-screen canvas and render them to the screen.
While rendering off-screen first and then transferring the result on-screen using “drawImage()” may seem like overkill, it avoids the two issues with transformations and transparency discussed above.
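The transparency difference can be illustrated per pixel in plain Python. This is only a sketch of the two behaviours, not browser code: put_pixel and draw_pixel are hypothetical helper names, and the blend shown is the standard "source-over" rule, which is the canvas default compositing mode that drawImage() follows.

```python
def put_pixel(dst, src):
    """Overwrite, as putImageData() does: the destination pixel is discarded."""
    return src


def draw_pixel(dst, src):
    """Source-over blend of two RGBA pixels with 8-bit channels."""
    src_a = src[3] / 255
    dst_a = dst[3] / 255
    out_a = src_a + dst_a * (1 - src_a)
    if out_a == 0:
        return (0, 0, 0, 0)
    rgb = tuple(
        round((s * src_a + d * dst_a * (1 - src_a)) / out_a)
        for s, d in zip(src[:3], dst[:3])
    )
    return rgb + (round(out_a * 255),)


red = (255, 0, 0, 255)        # opaque red background already on the canvas
transparent = (0, 0, 0, 0)    # fully transparent incoming pixel

print(put_pixel(red, transparent))   # the red background is lost
print(draw_pixel(red, transparent))  # the red background is preserved
```

With a fully transparent incoming pixel, the overwrite returns the transparent pixel while the blend returns the original red, which matches the behaviour described above.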
With the above major issues fixed, the new backend is ready. Here are some sample plots (taken from the Matplotlib gallery) that are rendered using the new canvas-based backend.
To use the <canvas> backend in your own projects, please use the following statements at the top of your script.
    import matplotlib
    matplotlib.use("module://matplotlib.backends.html5_canvas_backend")
You can find a more complete example of plotting with the Matplotlib WASM backend on JSFiddle.
How Fast is it? and Potential Optimizations
Using the Canvas API to draw the plots live introduces a performance penalty. We assume this is due to iterating over all of the points on a curve in Python, since loops in Python are known to be slow. Further, loading fonts asynchronously over the network can take some time. However, rendering images is as fast as before. Below are the benchmarks for the new canvas-based renderer compared to the default Agg renderer for 5 sample plots in both Firefox and Chrome.
In essence, the new renderer is about 1.25x to 1.5x slower, but that’s the price one pays for reducing the size.
It’s been really gratifying to see all of the cool things that have been created with Pyodide in the few years since its public launch. However, there’s still lots to do to turn this proof-of-concept into a professional tool for everyday data science work. If you’re interested in helping us build that future, come find us on Gitter, GitHub and our mailing list.
About the Author
Madhur Tandon is a Scientific Software Engineer at QuantStack. He completed his Bachelor’s in Computer Science at IIITD. He is active in the open source ecosystem and has previously worked with Elucidata, INCF, TCS Research, Mozilla and Deepnote, among others. Besides core data science and machine learning, he is interested in tools that enhance a data scientist’s workflow and the scientific software ecosystem. This work was initially done as part of the Google Summer of Code program, under the supervision of Michael Droettboom at Mozilla, and was later revived as part of his role at QuantStack, with support from Roman Yurchak. Madhur has also contributed to the JupyterLite project (a Jupyter distribution that runs entirely in the browser and provides a Pyodide-based Python kernel), which was also developed at QuantStack. Finally, he has been a speaker at conferences such as PyData and JupyterCon. Talk to him about his solo trip to SF, his previous experiences, or about books and food. When not working, he is working out!
Huge thanks to Michael Droettboom, Roman Yurchak, Sylvain Corlay and the whole Iodide team (Brendan Colloran, Hamilton Ulmer, William Lachance and others), for making this happen!