e68e93e3fcb8053ea3f8d839af22b02495036fdc — Ngô Ngọc Đức Huy 4 months ago 0588bad
Add new post about mayim
1 files changed, 197 insertions(+), 0 deletions(-)

A content/posts/2024-01-05-mayim-sqlite.md
A content/posts/2024-01-05-mayim-sqlite.md => content/posts/2024-01-05-mayim-sqlite.md +197 -0
@@ 0,0 1,197 @@
title: "Querying data from SQLite with Mayim"
date: 2024-01-05
lang: en
categories: [ blog ]
tags: [mayim, sqlite, python]
translationKey: "2024-01-05-mayim-sqlite"

(No resolution post this year huh? Maybe I'll save for lunar new year.)

So I'm dabbling with [Sanic][sanic] as an asynchronous web framework, where I
learned about [Mayim][mayim], which is like a wrapper for SQL connectors to
"hydrate" the data into models to be validated.  Additionally, it also helps
with separating the SQL queries from the code to call those more elegantly.

[sanic]: https://sanic.dev
[mayim]: https://ahopkins.github.io/mayim/

Using Mayim is really simple:

1. Put your SQL queries inside `queries/` folder
2. Declare models for outputs
3. Write an executor class with methods named the same as the SQL files

These methods will execute the SQL queries and return the result as the model

(See their documentation for more info and code samples)

So far so good.  I'm trying out with a small project as a
playground for this, so I don't want a DBMS, so I'm sticking to SQLite.
That turns out to be some (solvable) troubles because there are stuff that's
specific to each SQL implementation.

Most of its example include only one table, to keep the query simple.  For an
actual app, there are usually several tables, with some one-to-many or
many-to-many relations.  The most complicated query I see in the doc is [an
relation.  It was also PostgresQL though, so they're not immediately applicable
for my experiment.  Argh it requires pydantic, another library for this to work
also---I'm expecting to keep the dependencies minimal, but okay.

For this experiment, I have these tables with products, categories, and prices
with both one-to-many and many-to-many relations.  The schema is defined by the
queries below:

CREATE TABLE product (

  product_id INTEGER NOT NULL,
  FOREIGN KEY (product_id) REFERENCES product (id),

CREATE TABLE category (

CREATE TABLE product_category (
  product_id INTEGER NOT NULL,
  category_id INTEGER NOT NULL,
  FOREIGN KEY (product_id) REFERENCES product (id),
  FOREIGN KEY (category_id) REFERENCES category (id)

class Price(BaseModel):
    id: int
    unit: str
    price: int

class Product(BaseModel):
    id: int
    name: str
    prices: list[Price]

Let's try with the one-to-many one first---the product-price relation.

(As for why there are multiple prices for a single product: when one buys in
bulk, the price is usually cheaper.)

Normally, one would query it as:

SELECT product.id, product.name, price, unit
FROM product
JOIN price ON product.id = price.product_id

That doesn't work here: we need price as a list of price objects for each
product.  From the example linked above, we learned that nested object requires
turning to JSON, so let's look at [SQLite's document for JSON
functions][sqlite-json].  `json_group_array()` seems to do what we need here,
and we can use `json_object()` to construct the JSON.

[sqlite-json]: https://sqlite.org/json1.html

SELECT product.id, name, json_group_array(
  json_object('id', price.id, 'price', price, 'unit', unit)
) as prices
FROM product JOIN price
ON product.id = price.product_id
GROUP BY product.id;

This turns out to be simpler than the PostgresQL example!

However, when I ran that:

pydantic_core._pydantic_core.ValidationError: 1 validation error for Product
  Input should be a valid list [type=list_type, input_value='[{"id":1,"price":8000,...}]', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/list_type

It turns out, the SQLite library can only returns string and not JSON.  Maybe
they should support that?  I don't know.  But we can solve that ourselves by
extending pydantic's `BaseModel`.  To save you from inspecting pydantic's
source code, upon instantiation BaseModel takes `**data` as initialization
parameters, and has `model_fields` attributes to store type constraints that we
defined above.  This maps from the field name to an object of type `FieldInfo`,
which contains `annotation`, which is the data type that we need.

Let's customize that to parse the JSON data:

class SQLiteModel(BaseModel):
    """Custom models to parse JSON from SQLite which is returned as string."""
    def __init__(self, /, **data):
        for field, info in self.model_fields.items():
            # To be careful, only try to parse data expected to be list|dict
            if (type(info.annotation) is types.GenericAlias
                    and info.annotation.__origin__ in [list, dict]
                    and type(data[field]) is str):
                data[field] = json.loads(data[field])
                if type(info.annotation) is types.GenericAlias:

That should work now.

Now, let's add categories to the products:

class Product(SQLiteModel):
    id: int
    name: str
    prices: list[Price]
    categories: list[str]

I'm skipping a bit here because categories has only one data field, though I
probably should write full objects as for prices.

SELECT product.id, product.name, json_group_array(category.name) as categories, json_group_array(
  json_object('id', price.id, 'price', price, 'unit', unit)
) as prices
FROM product
JOIN price ON product.id = price.product_id
JOIN product_category ON product.id = product_category.product_id
JOIN category ON category.id = product_category.category_id
GROUP BY product.id;

Running that would result in rows with duplicated categories like:

Product(id=1, name='energy drink', prices=[Price(id=1, unit='bottle', price=12000), Price(id=2, unit='packets', price=70000)], categories=['drink', 'drink']),

This is because I joined three tables at once, and the categories are repeated
the same number of the prices of a product.  If I wrote a full model for
Category, it wouldn't run at all due to duplicated primary key.  To solve this,
simply add `distinct` keyword would work.

So, after this, I found writing bare queries for this small project not too
complicated and the encountered problems quite trivial to solve, but I expect
it to be less so for larger projects, and so appreciate how much
<abbr>ORM</abbr> simplifies the process.  I'll try out some more with this
approach, though.