2024-03-01

DuckDB Xonsh

DuckDB is a free, open-source, embedded, in-process, relational database management system (DBMS) designed specifically for analytical tasks.

Think of Duck DB as an analytics-optimized replica of SQLite, it combines the simplicity of SQLite with the functionalities of more sophisticated databases.

To begin, as shown in the video below, open your PowerShell and launch the Xonsh shell. Xonsh is a Python-based shell that combines features from both Bash and Python.

To install the DuckDB Python client, use the following command:

   xpip install duckdb

If you want to install a pre-release version, you can use the --pre flag:

   xpip install duckdb --upgrade --pre

Once DuckDB is installed, you can import the Python module into the Xonsh shell with the following import statement:

   import duckdb

DuckDB allows you to read data from CSV files directly into your database. Let's assume that you have a remote CSV file containing the Anscombe Quadrant data. You can fetch it using the read_csv function.

Anscombe’s quartet is a fascinating collection of four datasets, each consisting of eleven (x, y) points. Despite having nearly identical simple descriptive statistics, these datasets exhibit vastly different distributions and appear distinct when graphed.

By applying a suitable SQL filter, it's easy to extract the relevant data points that belong to the first quadrant. And finally, use Seaborn to visualize the x-y dataset for the first quadrant.

CLI

It's also possible to directly use the DuckDb command line binary:

   duckdb -c "SELECT * FROM read_csv('https://tessarinseve.pythonanywhere.com/staticweb/anscombe.csv') where dataset='IV'" | svgbob -o iv.svg

The ASCII table, obtained from the query, serves as the standard input for Svgbob. Svgbob then generates its SVG representation, as demonstrated below.