Data comes in all sizes, shapes and qualities. The process of getting data ready for further analysis is equally crucial and tedious, as many data professionals will confirm.
This process of many names (data wrangling/munging/cleaning) is often performed by an unholy mix of command-line tools, one-shot scripts and whatever is at hand depending on the data formats and computing environment.
I have been intending to share some of the tools I have found useful for this task in a series of blog posts, especially if they are particularly unexpected or lesser-known. I will always try to demonstrate the tool with some common data processing application, and then finally highlight which conditions the tool is most suitable under.
Jupyter/JupyterLab are the de-facto standard notebook environment, especially among Python data scientists (although it was designed from the start to work for multiple languages or kernels as the name hints). The frontend runs in a browser, and setting up the backend often requires a local installation, although some providers will let you spin a backend in the cloud, see: Google Collab or The Binder Project.
JupyterLite is simpler/cleaner solution for simple analysis if sharing is not needed: it is a quite complete Jupyter environment where all its components run in the browser via webassembly compilation. Just visit its Github project page for the details. Following some of the referenced projects and examples is a worthy rabbit hole to enter.
Some things you might not expect from a webassembly solution:
%pip install -q bqplot ipyleaflet
Sometimes you need to perform some not so simple manipulation in an Excel sheet that outgrows pivot tables but is kind of the bread and butter of pandas. Since copying an excel table defaults to a tabbed-separated string, getting a pandas dataframe is as easy as firing up jupyterlite by visiting this page, then getting a python notebook and evaluating this code in the first cell, pasting the excel table between the multi-line string separator:
JupyterLite is an amazing project: as with many webassembly based solutions, we are just starting to see the possibilities. I encourage you to explore it beyond data manipulation because you can easily find other applications for it, from interactive dashboards to authoring diagrams…