~jomco/straatnaam

02b954cfba961244b58d5b5ed1af3a3af29b6937 — Remco van 't Veer 1 year, 11 months ago 89c591a
Add presentation

Presented at "Eerlijke WOZ B.V." on 2022-10-12.
A doc/presentation/epsg-28992.png => doc/presentation/epsg-28992.png +0 -0
A doc/presentation/plot-1000x1000.png => doc/presentation/plot-1000x1000.png +0 -0
A doc/presentation/presentation.org => doc/presentation/presentation.org +424 -0
@@ 0,0 1,424 @@
#+TITLE: Basisregistratie Adressen en Gebouwen
#+AUTHOR: Remco van 't Veer
#+DATE: 2022-10-12
#+OPTIONS: ':t ^:nil toc:nil

# Use ox-beamer to export

#+ATTR_LATEX: :height .66\linewidth
[[file:plot-1000x1000.png]]
See also: [[file:doc/plot.sh]]


* Hi, I am Remco

- I like writing software

- I don't like reading documentation


* Disclaimer

- I am not a BAG expert

- I am not a GIS nerd


* What's BAG?

- BRA: Basisregistratie Adressen

  (key register of addresses)

- BGR: Basisgebouwenregistratie

  (key register of buildings)

- BRA + BGR = BAG

- BAG: Basisregistratie Adressen en Gebouwen

  (key register of addresses and buildings)

- LV BAG: Landelijke Voorziening BAG

  (national facility BAG)


* Objects

- NUM: Nummeraanduiding

  (number designation)

- OPR: Openbareruimte

  (public space)

- WPL: Woonplaats

  (city)

- VBO: Verblijfsobject

  (residence)

- PND: Pand

  (building)

- LIG: Ligplaats

  (berth for a houseboat etc.)

- STA: Standplaats

  (pitch of a caravan etc.)


* Model

#+begin_src dot :file "presentation-model.png"
  digraph {
          NUM -> OPR [label="1"];
          OPR -> WPL [label="1"];
          NUM -> WPL [label="0..1"];
          VBO -> NUM [label="1"];
          VBO -> PND [label="1..*"];
          LIG -> NUM [label="1"];
          STA -> NUM [label="1"];
  }
#+end_src

#+ATTR_LATEX: :height .75\linewidth
#+RESULTS:
[[file:presentation-model.png]]


* Nevenadressen

Primary and secondary addresses.

#+begin_src dot :file "presentation-neven.png"
  digraph {
          NUM -> OPR [label="1" color=grey];
          OPR -> WPL [label="1" color=grey];
          NUM -> WPL [label="0..1" color=grey];
          VBO -> NUM [label="1" color=grey];
          VBO -> PND [label="1..*" color=grey];
          LIG -> NUM [label="1" color=grey];
          STA -> NUM [label="1" color=grey];
          VBO -> NUM [label="0..* (neven)" color=red];
          LIG -> NUM [label="0..* (neven)" color=red];
          STA -> NUM [label="0..* (neven)" color=red];
  }
#+end_src

#+ATTR_LATEX: :height .75\linewidth
#+RESULTS:
[[file:presentation-neven.png]]


* Document based

Every record holds the full state of an object, including its validity period.


* Number of records

| Object | All   | Active |
|--------+-------+--------|
| NUM    | 12.0m | 11.5m  |
| OPR    | 343k  | 337k   |
| WPL    | 3.8k  | 3.7k   |
| VBO    | 21.3m | 20.7m  |
| PND    | 29.4m | 20.5m  |
| LIG    | 17.7k | 16.6k  |
| STA    | 50.1k | 44.5k  |
|--------+-------+--------|
|        |       |        |

Active = the record is current (today falls within its validity
period) *AND* it has an "active" status (not revoked, demolished, etc.)
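
A minimal sketch of this rule in Python. The field names (`valid_from`, `valid_to`, `status`) and the set of inactive statuses are assumptions for illustration; the real column names and status values differ per object type.

#+begin_src python
from datetime import date

# Hypothetical status values, not the official BAG status lists.
INACTIVE_STATUSES = {"revoked", "demolished"}

def is_active(record, today):
    """True when the record is current AND has an "active" status."""
    started = record["valid_from"] <= today
    # A valid_to of None is taken to mean an open-ended validity period.
    not_ended = record["valid_to"] is None or today < record["valid_to"]
    return started and not_ended and record["status"] not in INACTIVE_STATUSES
#+end_src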


* Living data

- Bad data:

  #+begin_example
  SELECT COUNT(*)
  FROM verblijfsobject
  WHERE actief AND oppervlakte = 1;
   count
  -------
   12512
  #+end_example

- More bad data:

  #+begin_example
  SELECT COUNT(*)
  FROM pand
  WHERE actief AND oorspronkelijk_bouwjaar = 9999;
   count
  -------
      27
  #+end_example

- Anecdotal: I have lived in my house since 2012, but
  "oorspronkelijk_bouwjaar" in BAG is 2014


* Oversharing

#+begin_example
SELECT COUNT(*)
FROM nummeraanduiding
WHERE actief AND postcode IS NULL;
 count
--------
 983370
#+end_example

- 3199KM 5

# [[https://www.openstreetmap.org/?mlat=51.96420149343773&mlon=3.9626308066261027#map=20/51.96420149343773/3.9626308066261027]]

- 9143NZ 1

# https://www.openstreetmap.org/?mlat=53.45932339542564&mlon=6.055841821360722#map=20/53.45932339542564/6.055841821360722


* History of project

- NLExtract CSV

- Ruby -> Elixir -> Clojure

- NLExtract -> geotoko.nl (€€€)

- Clojure (BAG 1.0)

- Clojure (BAG 2.0)


* Source

- Municipalities record BAG data

- Collected by Kadaster into LV BAG

  (Landelijke Voorziening BAG / national facility BAG)

- BAG Extract downloadable from PDOK

  (Publieke Dienstverlening Op de Kaart)


* Documentation

- Catalogus BAG 2018

  https://www.kadaster.nl/-/bag-catalogus-basisregistraties-adressen-en-gebouwen

- Praktijkhandleiding BAG

  https://imbag.github.io/praktijkhandleiding/

- XSDs are excellent

  https://www.kadaster.nl/zakelijk/producten/adressen-en-gebouwen/bag-2.0-extract


* Code overview

- updater

- importer

- sanity check

- API


* Updater

- Atom feed (updated on the 8th of every month)
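
Checking such a feed could look roughly like the sketch below. It only assumes a standard Atom document; the function name and the idea of taking the first link of each entry are illustrative, not how the updater in this project works.

#+begin_src python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

def latest_entry(feed_xml):
    """Return (updated, href) of the most recently updated entry, or None."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM)
        link = entry.find("atom:link", ATOM)
        href = link.get("href") if link is not None else None
        entries.append((updated, href))
    if not entries:
        return None
    # Timestamps in the same format and UTC offset sort correctly as strings.
    return max(entries, key=lambda e: e[0])
#+end_src

Comparing the returned timestamp with the one of the last import is enough to decide whether a new extract needs downloading.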


* Importer

- src/straatnaam/data.clj

  - updater

  - sanity check

- src/straatnaam/lvbag.clj

  - extract

  - parse

  - SQL


* Zips in a zip

#+begin_example
Archive:  lvbag-extract-nl.zip
   Length      Date    Time   Name
 ---------  ---------- -----  ----
       788  09-08-2022 21:37  Leveringsdocument-BAG-Extract.xml
     50051  09-08-2022 21:37  GEM-WPL-RELATIE-08092022.zip
   1550578  09-08-2022 21:34  9999LIG08092022.zip
  17297535  09-08-2022 21:34  9999NietBag08092022.zip
   3577291  09-08-2022 21:34  9999STA08092022.zip
 983923925  09-08-2022 21:34  9999VBO08092022.zip
  20923408  09-08-2022 21:35  9999WPL08092022.zip
1677454980  09-08-2022 21:35  9999PND08092022.zip
 310004529  09-08-2022 21:37  9999NUM08092022.zip
    114721  09-08-2022 21:37  9999Inactief08092022.zip
  10742024  09-08-2022 21:37  9999OPR08092022.zip
 110996158  09-08-2022 21:37  9999InOnderzoek08092022.zip
 ---------                    -------
3136635988                    12 files
#+end_example


* XML files in the zips

#+begin_example
Archive:  lvbag-extract-nl/9999PND08092022.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
 14795612  09-08-2022 11:41   9999PND08092022-000001.xml
 14668254  09-08-2022 11:41   9999PND08092022-000002.xml
 14715853  09-08-2022 11:41   9999PND08092022-000003.xml
....
 14142790  09-08-2022 18:07   9999PND08092022-002049.xml
 14896460  09-08-2022 18:07   9999PND08092022-002050.xml
 10306002  09-08-2022 18:07   9999PND08092022-002051.xml
---------                     -------
30403929905                     2051 files
#+end_example
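
Walking this zip-in-zip layout can be sketched in Python as follows. The function name is made up for illustration; the project itself does this in Clojure.

#+begin_src python
import io
import zipfile

def iter_xml_members(outer, object_type):
    """Yield (name, bytes) for each XML file in the inner zip of one object type."""
    with zipfile.ZipFile(outer) as outer_zip:
        for info in outer_zip.infolist():
            # Inner archives are named like 9999PND08092022.zip.
            if not info.filename.startswith("9999" + object_type):
                continue
            # Reads the whole inner zip into memory: fine for a sketch, but
            # the real PND archive is well over a gigabyte.
            inner_bytes = io.BytesIO(outer_zip.read(info))
            with zipfile.ZipFile(inner_bytes) as inner_zip:
                for name in inner_zip.namelist():
                    if name.endswith(".xml"):
                        yield name, inner_zip.read(name)
#+end_src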


* XML

- 10,000 records per XML file

- clojure.data.xml/source-seq event stream

- insert into Postgres in batches
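
The same stream-and-batch idea, sketched in Python rather than Clojure. The tag name passed in and the flat "child tag to text" record shape are assumptions; the real BAG XML is namespaced and nested.

#+begin_src python
import io
import xml.etree.ElementTree as ET
from itertools import islice

def iter_records(xml_file, tag):
    """Stream elements named `tag` without building the whole document tree."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the children of records already handled

def batches(items, size):
    """Group an iterator into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
#+end_src

Each batch would then go to the database in a single multi-row insert instead of one statement per record.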


* One table per object type

One-to-many relations (like nevenadres) get link tables, built using
`generate_subscripts` and the Postgres array datatype.

#+begin_example
-- One link row per element of the neven_ids array.
INSERT INTO verblijfsobject_neven
SELECT id, a[i] FROM (
  SELECT
    x.id AS id,
    x.neven_ids AS a,
    generate_subscripts(x.neven_ids, 1) AS i
  FROM verblijfsobject x
) AS s;
#+end_example
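
What the query does, expressed as a small Python sketch for readers less familiar with Postgres arrays. The function name is made up; rows are assumed to be (id, list of neven ids) pairs.

#+begin_src python
def link_rows(rows):
    """Expand (id, [neven_id, ...]) into one (id, neven_id) pair per element."""
    for row_id, neven_ids in rows:
        # Like generate_subscripts, a NULL or empty array yields no rows.
        for neven_id in neven_ids or []:
            yield (row_id, neven_id)
#+end_src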


* Geometry

- Rijksdriehoekscoördinaten aka EPSG:28992

  (national triangulation coordinates, "Amersfoort / RD New")

#+ATTR_LATEX: :height .33\linewidth
[[file:epsg-28992.png]]

- PostGIS to the rescue
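
To get a feel for what the conversion to latitude and longitude involves, here is a sketch in plain Python. It is not what PostGIS does internally: it uses a published polynomial approximation (commonly attributed to Schreutelkamp and Strang van Hees) that is only meant for coordinates within the Netherlands.

#+begin_src python
# Reference point (Amersfoort) in RD meters and WGS84 degrees.
X0, Y0 = 155000.0, 463000.0
PHI0, LAM0 = 52.15517440, 5.38720621

# (p, q, coefficient) terms, in arc seconds, for latitude (K) and longitude (L).
K = [(0, 1, 3235.65389), (2, 0, -32.58297), (0, 2, -0.24750),
     (2, 1, -0.84978), (0, 3, -0.06550), (2, 2, -0.01709),
     (1, 0, -0.00738), (4, 0, 0.00530), (2, 3, -0.00039),
     (4, 1, 0.00033), (1, 1, -0.00012)]
L = [(1, 0, 5260.52916), (1, 1, 105.94684), (1, 2, 2.45656),
     (3, 0, -0.81885), (1, 3, 0.05594), (3, 1, -0.05607),
     (0, 1, 0.01199), (3, 2, -0.00256), (1, 4, 0.00128),
     (0, 2, 0.00022), (2, 0, -0.00022), (5, 0, 0.00026)]

def rd_to_wgs84(x, y):
    """Approximate (latitude, longitude) in degrees for RD coordinates in meters."""
    dx = (x - X0) * 1e-5
    dy = (y - Y0) * 1e-5
    lat = PHI0 + sum(c * dx**p * dy**q for p, q, c in K) / 3600
    lon = LAM0 + sum(c * dx**p * dy**q for p, q, c in L) / 3600
    return lat, lon
#+end_src

Calling `rd_to_wgs84(117021, 483978)` with the x and y of the sample address shown on the "Eerlijke WOZ address" slide should land within about a meter of the lengtegraad and breedtegraad listed there.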


* Sanity checks

- Never trust data from the internet!

- Inserts go into a versioned schema (aka namespace)

- Check record counts and search for some railway stations
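
Such checks could be sketched like this. The function, the thresholds and the `search` callback are all hypothetical; they only illustrate the "count and look up known addresses" idea, not the project's actual checks.

#+begin_src python
def sanity_errors(counts, minimums, lookups, search):
    """Return a list of problems; an empty list means the import looks sane."""
    errors = []
    for table, minimum in minimums.items():
        found = counts.get(table, 0)
        if found < minimum:
            errors.append(f"{table}: expected at least {minimum} rows, got {found}")
    for query in lookups:
        # `search` is any callable returning a truthy value when found.
        if not search(query):
            errors.append(f"no result for {query!r}")
    return errors
#+end_src

Only when the list comes back empty would the freshly imported schema be switched to become the live one.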


* Bag view

| Column               | Type                   |
|----------------------+------------------------|
| id                   | numeric                |
| openbareruimte       | character varying(120) |
| huisnummer           | integer                |
| huisletter           | character varying(1)   |
| huisnummertoevoeging | character varying(4)   |
| postcode             | character varying(6)   |
| woonplaats           | character varying(80)  |
| x                    | double precision       |
| y                    | double precision       |
| gebruiksdoel         | character varying(24)  |
| nevenadres           | boolean                |
| object_type          | character varying(15)  |
| object_id            | numeric                |
| lengtegraad          | double precision       |
| breedtegraad         | double precision       |


* Eerlijke WOZ extras

#+begin_src sql
SELECT
  bag.*, vbo.oppervlakte, pnd.oorspronkelijk_bouwjaar
FROM
  bag
LEFT JOIN
  verblijfsobject vbo
ON
  vbo.id = bag.object_id
LEFT JOIN
  verblijfsobject_pand vbo_pnd
ON
  vbo_pnd.id = bag.object_id
LEFT JOIN
  pand pnd
ON
  vbo_pnd.pand_id = pnd.id
#+end_src


* Eerlijke WOZ address

| id                      |    363200000520973 |
| openbareruimte          | Johan Huizingalaan |
| huisnummer              |                763 |
| huisletter              |                  A |
| huisnummertoevoeging    |                    |
| postcode                |             1066VH |
| woonplaats              |          Amsterdam |
| x                       |             117021 |
| y                       |             483978 |
| gebruiksdoel            |   industriefunctie |
| nevenadres              |                  f |
| object_type             |    verblijfsobject |
| object_id               |    363010001036463 |
| lengtegraad             |  4.829889464736984 |
| breedtegraad            |  52.34240599219023 |
| oppervlakte             |              39570 |
| oorspronkelijk_bouwjaar |               1969 |


* Please contribute

Discussion and patches to: [[mailto:~jomco/public-inbox@lists.sr.ht][~jomco/public-inbox@lists.sr.ht]]


* Questions?

A doc/presentation/presentation.pdf => doc/presentation/presentation.pdf +0 -0