This is an initial design document. It is meant only to guide exploration during early development, so it omits some topics and is light on detail wherever the design is still exploratory.
The main focus is ensuring that the overall direction is correct, as well as discovering things that will need to be learned or implemented.
IsabellaDB is a chess database. It is meant for analyzing past games, and patterns across them, to uncover interesting facts or to improve as a chess player.
Some things which IsabellaDB will support:
The fundamental functionality of a chess database is to be able to store and retrieve games with various filters. Additional functionality, like analysis, is layered on top of this. As such, game storage and retrieval is the critical core of IsabellaDB.
We'll focus first on storage of games, then retrieval, then indexing.
Each game has a set of metadata associated with it, as well as a list of moves.
The metadata contains:
The game information contains:
There will be a lot of repetition in metadata, so each distinct value will be stored only once in memory, in a separate names store, and the game records will point into that store.
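A minimal sketch of such a names store, assuming a simple string-interning scheme. The names `NameStore`, `NameId`, `intern`, and `resolve` are hypothetical and not part of the design so far.

```rust
use std::collections::HashMap;

/// Hypothetical handle into the names store (assumed, not from the design).
type NameId = u32;

/// Maps metadata strings to small ids so that game records can hold an id
/// instead of a full string. For simplicity this sketch keeps two copies of
/// each distinct string (one per lookup direction).
#[derive(Default)]
struct NameStore {
    names: Vec<String>,           // id -> string
    ids: HashMap<String, NameId>, // string -> id
}

impl NameStore {
    /// Returns the existing id for `name`, or stores it and returns a new id.
    fn intern(&mut self, name: &str) -> NameId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as NameId;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Looks up the string behind an id, if the id was ever handed out.
    fn resolve(&self, id: NameId) -> Option<&str> {
        self.names.get(id as usize).map(|s| s.as_str())
    }
}
```

With this shape, interning the same player name for a second game returns the id that the first game already uses, so the record only grows by the size of the id.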
Each game will be assigned a unique id when it is ingested into the system, and these will be used as the lookup keys from the indexes.
The initial in-memory storage will be pretty basic and will look like:
use std::collections::HashMap;

type GameID = u64;

struct GameStore {
    games: HashMap<GameID, Game>,
    indexes: ...
}
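As a sketch of how ids could be assigned at ingestion, the version below adds a monotonically increasing counter to the store. The `next_id` field, the `ingest` and `get` methods, and the placeholder `Game` record are assumptions for illustration; indexes are left out entirely.

```rust
use std::collections::HashMap;

type GameID = u64;

/// Placeholder game record; the real record holds metadata and moves.
#[derive(Debug, Clone, PartialEq)]
struct Game {
    moves: Vec<String>,
}

struct GameStore {
    games: HashMap<GameID, Game>,
    next_id: GameID, // hypothetical counter, not in the original struct
}

impl GameStore {
    fn new() -> Self {
        GameStore { games: HashMap::new(), next_id: 0 }
    }

    /// Stores the game and returns the unique id assigned to it.
    fn ingest(&mut self, game: Game) -> GameID {
        let id = self.next_id;
        self.next_id += 1;
        self.games.insert(id, game);
        id
    }

    /// Looks a game up by the id that an index would hand back.
    fn get(&self, id: GameID) -> Option<&Game> {
        self.games.get(&id)
    }
}
```

A counter keeps ids dense and cheap to generate. Whether ids need to stay stable across re-ingestion is a separate question that this sketch does not address.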
There are two types of filters that you can do: filtering based on metadata about a game, and filtering based on what happened within the game.
Retrieval will be pretty straightforward. Given a list of filters in the order they should be applied (the query planner determines this order), the matching games can be retrieved with this pseudocode:
# get the games that match the first filter
games = db.retrieve(filters[0])
# narrow the result with each remaining filter
for filter in filters[1:]:
    games = games.filter(filter)
To do the retrieval, the db.retrieve call will use the index that corresponds to that filter and retrieve the records that match it, either as a range or as an exact result list.
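The retrieval loop and the two kinds of index lookup can be sketched in Rust as follows. Everything here is an assumption for illustration: the `Filter` variants, the `Db` type, and the choice of a `HashMap` for exact matches and a `BTreeMap` for range lookups.

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound::{Excluded, Unbounded};

type GameID = u64;

/// Reduced game record holding only the fields this sketch filters on.
struct Game {
    white: String,
    black: String,
    year: u16,
}

/// Hypothetical filters, mirroring the metadata examples in this document.
enum Filter {
    EitherName(String),
    YearAfter(u16),
}

impl Filter {
    /// Checks a single game record against this filter (no index involved).
    fn matches(&self, g: &Game) -> bool {
        match self {
            Filter::EitherName(n) => g.white == *n || g.black == *n,
            Filter::YearAfter(y) => g.year > *y,
        }
    }
}

struct Db {
    games: HashMap<GameID, Game>,
    by_player: HashMap<String, Vec<GameID>>, // exact-match index
    by_year: BTreeMap<u16, Vec<GameID>>,     // ordered index for ranges
}

impl Db {
    fn new() -> Self {
        Db { games: HashMap::new(), by_player: HashMap::new(), by_year: BTreeMap::new() }
    }

    /// Adds a game and updates both indexes.
    fn insert(&mut self, id: GameID, g: Game) {
        self.by_player.entry(g.white.clone()).or_default().push(id);
        if g.black != g.white {
            self.by_player.entry(g.black.clone()).or_default().push(id);
        }
        self.by_year.entry(g.year).or_default().push(id);
        self.games.insert(id, g);
    }

    /// Answers a single filter from its index: an exact list or a range scan.
    fn retrieve(&self, f: &Filter) -> Vec<GameID> {
        match f {
            Filter::EitherName(n) => self.by_player.get(n).cloned().unwrap_or_default(),
            Filter::YearAfter(y) => self
                .by_year
                .range((Excluded(*y), Unbounded))
                .flat_map(|(_, ids)| ids.iter().copied())
                .collect(),
        }
    }

    /// The first filter goes through an index; the rest narrow the result.
    /// With no filters at all, every game id is returned.
    fn search(&self, filters: &[Filter]) -> Vec<GameID> {
        let (first, rest) = match filters.split_first() {
            Some(parts) => parts,
            None => return self.games.keys().copied().collect(),
        };
        let mut ids = self.retrieve(first);
        for f in rest {
            ids.retain(|id| f.matches(&self.games[id]));
        }
        ids
    }
}
```

Only the first filter touches an index here, which is why the order chosen by the query planner matters: the smaller the first result set, the less work the remaining filters do.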
To start, we'll have some fairly simple indexes. Over time, we'll build up more complicated indexes to handle more complicated search patterns.
The initial indexes will be:
Indexing on game results would not add any significant benefit: there are only three classes (win, loss, draw), so a lookup on any one of them would still return a large share of all games.
Eventually, we'll also have indexes based on features of positions, instead of exact position matches. This includes things like the presence (or absence) of pins, certain pieces being on the board or not, etc. Further development of indexes will be driven by the need to support certain types of queries.
Out of scope for the initial design:
A user should be able to submit a query for a set of games which match certain criteria. Here is an example of doing that with a few different filters on metadata:
# Find all the games played by Magnus and Fabi after 2015
(search-games
  (match-metadata
    (either-name "Fabiano Caruana") # matches either player
    (either-name "Magnus Carlsen") # matches either player
    (year > 2015)
  )
)
And here's another example based on positional matching:
# Find games which match the provided position (5 ply of the Italian) with an
# extra filter on metadata
(search-games
  (match-position
    (fen "r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3")
  )
  (match-metadata
    (white-rating > 2300)
    (black-rating > 2300)
  )
)
The query language will be defined in more detail over time. This is just a sample to guide what capabilities it'll need to provide. This will be one of the later pieces developed.
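To give a feel for how little machinery the sample syntax needs, here is a rough sketch of a reader for it. It assumes only what the samples show: parentheses, double-quoted strings, bare atoms such as `>` and `2015`, and `#` comments that run to the end of the line. The `Sexp` type and the function names are hypothetical.

```rust
/// A parsed expression: a bare atom, a quoted string, or a list.
#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String),
    Str(String),
    List(Vec<Sexp>),
}

/// Splits the input into tokens. String tokens keep a leading `"` so the
/// parser can tell them apart from atoms. `#` starts a line comment.
fn tokenize(src: &str) -> Result<Vec<String>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            '(' | ')' => {
                tokens.push(c.to_string());
                chars.next();
            }
            '#' => {
                // Skip everything up to and including the end of the line.
                while let Some(ch) = chars.next() {
                    if ch == '\n' {
                        break;
                    }
                }
            }
            '"' => {
                chars.next(); // consume the opening quote
                let mut s = String::from("\"");
                loop {
                    match chars.next() {
                        Some('"') => break,
                        Some(ch) => s.push(ch),
                        None => return Err("unterminated string".to_string()),
                    }
                }
                tokens.push(s);
            }
            c if c.is_whitespace() => {
                chars.next();
            }
            _ => {
                // Bare atom: read until whitespace or a parenthesis.
                let mut s = String::new();
                while let Some(&ch) = chars.peek() {
                    if ch.is_whitespace() || ch == '(' || ch == ')' {
                        break;
                    }
                    s.push(ch);
                    chars.next();
                }
                tokens.push(s);
            }
        }
    }
    Ok(tokens)
}

/// Recursive-descent parse of one expression starting at `pos`.
fn parse(tokens: &[String], pos: &mut usize) -> Result<Sexp, String> {
    let tok = tokens.get(*pos).ok_or("unexpected end of input")?;
    *pos += 1;
    match tok.as_str() {
        "(" => {
            let mut items = Vec::new();
            loop {
                match tokens.get(*pos).map(|t| t.as_str()) {
                    Some(")") => {
                        *pos += 1;
                        return Ok(Sexp::List(items));
                    }
                    Some(_) => items.push(parse(tokens, pos)?),
                    None => return Err("missing ')'".to_string()),
                }
            }
        }
        ")" => Err("unexpected ')'".to_string()),
        t if t.starts_with('"') => Ok(Sexp::Str(t[1..].to_string())),
        t => Ok(Sexp::Atom(t.to_string())),
    }
}

/// Parses a whole query, rejecting anything left over after it.
fn parse_query(src: &str) -> Result<Sexp, String> {
    let tokens = tokenize(src)?;
    let mut pos = 0;
    let expr = parse(&tokens, &mut pos)?;
    if pos != tokens.len() {
        return Err("trailing tokens after query".to_string());
    }
    Ok(expr)
}
```

Note that `(year > 2015)` parses as a flat three-element list, so interpreting the comparison is left to a later stage.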
When a query is received, it will be parsed (as an S-expression), converted to a sequence of operations that can be run to retrieve the data from the store, and optimized (mostly around ordering of application of filters, and use of indexes).
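One possible shape for the ordering step is sketched below. The heuristic is an assumption, not a decided design: put filters that have an index first, and among those run the one expected to match the fewest games first. The `PlannedFilter` type and its fields are hypothetical, and where the estimates come from is left open.

```rust
/// Hypothetical planner input: a filter plus what the planner knows about it.
#[derive(Debug, Clone, PartialEq)]
struct PlannedFilter {
    name: String,
    indexed: bool,          // can this filter be answered from an index?
    estimated_matches: u64, // rough guess at how many games it matches
}

/// Orders filters so indexed ones come first, and within each group the
/// filter with the smallest estimate runs first. The sort is stable, so
/// filters that tie keep the order the user wrote them in.
fn plan(mut filters: Vec<PlannedFilter>) -> Vec<PlannedFilter> {
    // `!indexed` is false for indexed filters, and false sorts before true.
    filters.sort_by_key(|f| (!f.indexed, f.estimated_matches));
    filters
}
```

Under this heuristic a filter on a player name would typically run before a filter on a year range, since far fewer games match a single player than match a span of years.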
The first phase of development will focus on getting data ingested and basic indexes constructed. Querying will follow in the second phase.
v0.1.0:
v0.2.0: