What Have I Done?


Fall 2023

libsrcml.js

srcML is a well-known research project that generates abstract syntax tree markup for source code: it labels syntactic entities such as classes, functions, and statements with XML, in a uniform style across multiple languages. Its core is a C/C++ library, libsrcml, which contains a parser and many utility functions for managing "units" (XML files for individual source files) and "archives" (XML files that correspond to multiple source code files). A Python wrapper for the library already exists; it loads the compiled library and calls into it through the ctypes module. My task was to bring libsrcml to the JavaScript ecosystem by compiling it to WebAssembly and writing an idiomatic, object-oriented wrapper around it. The project required working across many levels of abstraction, from low-level C-style memory management up to JSDoc types.

Although libsrcml is written in C++, its interface consists of standalone C functions that operate on pointers to opaque structs. This is very different from a typical JavaScript library, where nested, transparent objects and first-class functions are ubiquitous. Bridging that gap took a different design: units and archives went from being opaque pointers to being dynamic objects that expose the original library's functionality through their own methods.
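To make that design shift concrete, here is a minimal TypeScript sketch. It assumes a hypothetical low-level interface (`RawSrcml`, with methods such as `unitCreate` and `unitParseMemory`) standing in for the compiled exports, plus an in-memory fake so the sketch runs without WebAssembly; none of these names are libsrcml.js's actual API. The wrapper classes `Archive` and `Unit` keep the opaque handles private, turn C-style status codes into exceptions, and guard against use-after-free.

```typescript
// Opaque handles are plain numbers, like pointers into WASM linear memory.
type Ptr = number;

// Hypothetical C-style surface loosely modeled on libsrcml's naming; a real
// build would bind these to the Emscripten module's exports instead.
interface RawSrcml {
  archiveCreate(): Ptr;
  archiveFree(archive: Ptr): void;
  unitCreate(archive: Ptr): Ptr;
  unitSetLanguage(unit: Ptr, language: string): number; // 0 means success
  unitParseMemory(unit: Ptr, source: string): number; // 0 means success
  unitGetSrcml(unit: Ptr): string;
  unitFree(unit: Ptr): void;
}

// In-memory stand-in so the sketch runs without WASM. Its "markup" is a
// placeholder, not real srcML output. liveHandles() counts unfreed handles.
function createFakeRaw(): RawSrcml & { liveHandles(): number } {
  let next = 1;
  const live: Record<number, true> = {};
  const units: Record<number, { language: string; xml: string }> = {};
  return {
    archiveCreate() { live[next] = true; return next++; },
    archiveFree(archive) { delete live[archive]; },
    unitCreate(_archive) {
      units[next] = { language: "", xml: "" };
      live[next] = true;
      return next++;
    },
    unitSetLanguage(unit, language) { units[unit].language = language; return 0; },
    unitParseMemory(unit, source) {
      units[unit].xml = `<unit language="${units[unit].language}">${source}</unit>`;
      return 0;
    },
    unitGetSrcml(unit) { return units[unit].xml; },
    unitFree(unit) { delete units[unit]; delete live[unit]; },
    liveHandles() { return Object.keys(live).length; },
  };
}

// Convert C-style status codes into exceptions, the idiomatic JS error path.
function check(status: number, what: string): void {
  if (status !== 0) throw new Error(`${what} failed with status ${status}`);
}

class Unit {
  private ptr: Ptr | null;
  private readonly raw: RawSrcml;
  constructor(raw: RawSrcml, ptr: Ptr) { this.raw = raw; this.ptr = ptr; }

  // All access goes through here, so use-after-free throws instead of
  // touching memory that may have been reused.
  private handle(): Ptr {
    if (this.ptr === null) throw new Error("srcML unit has already been freed");
    return this.ptr;
  }

  parse(source: string): this {
    check(this.raw.unitParseMemory(this.handle(), source), "parse");
    return this; // allows chaining
  }

  get srcml(): string { return this.raw.unitGetSrcml(this.handle()); }

  free(): void {
    if (this.ptr === null) return; // safe to call twice
    this.raw.unitFree(this.ptr);
    this.ptr = null;
  }
}

class Archive {
  private ptr: Ptr | null;
  private readonly raw: RawSrcml;
  private readonly units: Unit[] = [];
  constructor(raw: RawSrcml) { this.raw = raw; this.ptr = raw.archiveCreate(); }

  createUnit(language: string): Unit {
    if (this.ptr === null) throw new Error("srcML archive has already been freed");
    const ptr = this.raw.unitCreate(this.ptr);
    try {
      check(this.raw.unitSetLanguage(ptr, language), "set language");
    } catch (err) {
      this.raw.unitFree(ptr); // don't leak the handle if setup fails
      throw err;
    }
    const unit = new Unit(this.raw, ptr);
    this.units.push(unit);
    return unit;
  }

  // Design choice for this sketch: freeing an archive frees its units too.
  free(): void {
    if (this.ptr === null) return;
    for (const unit of this.units) unit.free();
    this.raw.archiveFree(this.ptr);
    this.ptr = null;
  }
}

// Usage
const raw = createFakeRaw();
const archive = new Archive(raw);
const unit = archive.createUnit("C++").parse("int x = 1;");
console.log(unit.srcml);
archive.free(); // also frees `unit`
```

The explicit `free()` methods reflect that WASM memory is not garbage-collected. A `FinalizationRegistry` could act as a safety net, but JavaScript makes no guarantee about when (or whether) its callbacks run, so deterministic freeing is the safer default in a sketch like this.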

The build process also posed challenges. Emscripten is the industry standard and the most feature-complete C/C++-to-WASM compiler I could find, but it was inconsistent about emitting CommonJS (CJS) versus ES modules, and its output relied on dynamic-import tricks that bundlers like Vite struggled to understand. I ended up using Docker to provide a portable environment for deterministically compiling libsrcml to WASM and for carefully patching the emitted JavaScript "glue code."
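One way to keep glue-code patching deterministic is sketched below in TypeScript. This is an assumption about how such a step could work, not a description of the actual build: `applyPatches`, the `Patch` shape, and the sample glue are all hypothetical, and a real build would read the emitted file and write it back (for example with Node's `fs` module) inside the Docker container. Requiring each pattern to match exactly once means an Emscripten upgrade that changes the glue, or an accidental second run, fails loudly instead of silently shipping unpatched code.

```typescript
// Hypothetical post-build step: apply literal search-and-replace patches to
// Emscripten's emitted glue code. Each patch must match exactly once.
interface Patch {
  description: string;
  find: string;
  replace: string;
}

function applyPatches(glue: string, patches: Patch[]): string {
  let out = glue;
  for (const patch of patches) {
    if (patch.find === "") throw new Error(`patch "${patch.description}": empty pattern`);
    const matches = out.split(patch.find).length - 1;
    if (matches !== 1) {
      throw new Error(`patch "${patch.description}": expected exactly 1 match, found ${matches}`);
    }
    // A replacer function inserts the text literally, so "$" sequences in
    // `replace` are not treated as special replacement patterns.
    out = out.replace(patch.find, () => patch.replace);
  }
  return out;
}

// Made-up glue snippet for illustration; real Emscripten output differs.
const sampleGlue = 'var wasmUrl = locateFile("libsrcml.wasm");\nexport default Module;';

const patched = applyPatches(sampleGlue, [
  {
    // Hypothetical patch: use the new URL(..., import.meta.url) pattern,
    // which bundlers such as Vite can statically detect as an asset reference.
    description: "resolve the .wasm file relative to this module (hypothetical)",
    find: 'locateFile("libsrcml.wasm")',
    replace: 'new URL("libsrcml.wasm", import.meta.url).href',
  },
]);
console.log(patched);
```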