~sbinet/talks

bc7a24f690bf43720ad7e7fddc4111272fec9a80 — Sebastien Binet 2 years ago bd9d711
2021-08-26-atlas-grok: first import
A 2021/2021-08-26-atlas-grok/README.md => 2021/2021-08-26-atlas-grok/README.md +3 -0
@@ 0,0 1,3 @@
# 2021-08-26-atlas-grok

`go-present` link: [slides](https://talks.sbinet.org/2021/2021-08-26-atlas-grok/talk.slide)

A 2021/2021-08-26-atlas-grok/_code/xaod_datamodeltest.cxx => 2021/2021-08-26-atlas-grok/_code/xaod_datamodeltest.cxx +22 -0
@@ 0,0 1,22 @@
// DO NOT EDIT; Automatically generated by grok for DMTest::C_v1

#include "DataModelTestCommon/versions/C_v1.h"
#include "xAODCore/AuxStoreAccessorMacros.h"

namespace DMTest {

AUXSTORE_PRIMITIVE_SETTER_AND_GETTER (C_v1, float, aFloat, setAFloat)

AUXSTORE_PRIMITIVE_SETTER_AND_GETTER (C_v1, int, anInt, setAnInt)

AUXSTORE_PRIMITIVE_SETTER_AND_GETTER (C_v1, unsigned int, pInt, setPInt)

AUXSTORE_PRIMITIVE_SETTER_AND_GETTER (C_v1, float, pFloat, setPFloat)

AUXSTORE_OBJECT_SETTER_AND_GETTER (C_v1, std::vector<int>, pvInt, setPVInt)
AUXSTORE_OBJECT_MOVE (C_v1, std::vector<int>, pvInt, setPVInt)

AUXSTORE_OBJECT_SETTER_AND_GETTER (C_v1, std::vector<float>, pvFloat, setPVFloat)
AUXSTORE_OBJECT_MOVE (C_v1, std::vector<float>, pvFloat, setPVFloat)

} // namespace DMTest

A 2021/2021-08-26-atlas-grok/_code/xaod_datamodeltest.grok => 2021/2021-08-26-atlas-grok/_code/xaod_datamodeltest.grok +35 -0
@@ 0,0 1,35 @@
package athena/Control/DataModelTest/DataModelTestCommon

import {
	athena/Control/AthContainers
	athena/Event/xAOD/xAODCore
}

type {
	# C_v1 is an xAOD AuxElement.
	C_v1 {
		cxx-name {
			DMTest::C_v1
		}

		bases {
			AthContainers.AuxElement
		}

		fields {
			aFloat float32
			anInt  int32
			pInt   uint32
			pFloat float32

			pvInt []int32 {
				getter pvInt
				setter setPVInt
			}

			pvFloat []float32 {
				setter setPVFloat
			}
		}
	}
}

A 2021/2021-08-26-atlas-grok/_code/xaod_datamodeltest.h => 2021/2021-08-26-atlas-grok/_code/xaod_datamodeltest.h +26 -0
@@ 0,0 1,26 @@
// DO NOT EDIT; Automatically generated by grok for DMTest::C_v1

// This file's extension implies that it's C, but it's really -*- C++ -*-.
#ifndef DATAMODELTESTCOMMON_C_V1_H
#define DATAMODELTESTCOMMON_C_V1_H 1

#include <vector>

#include "AthContainers/AuxElement.h"

namespace DMTest {

class C_v1
  : public SG::AuxElement {
public:
  float aFloat() const;
  void setAFloat(float v);
  // ...
  const std::vector<int>& pvInt() const;
  void setPVInt(const std::vector<int>& v);
  void setPVInt(std::vector<int>&& v);
  // ...
}; // C_v1
} // namespace DMTest

#endif // not DATAMODELTESTCOMMON_C_V1_H

A 2021/2021-08-26-atlas-grok/_code/xaod_egamma.grok => 2021/2021-08-26-atlas-grok/_code/xaod_egamma.grok +26 -0
@@ 0,0 1,26 @@
package xAODEgamma

import (
	ath "athena/Control/AthContainers"
	"athena/Control/AthLinks"
	"athena/Event/xAOD/xAODBase"
	calo "athena/Event/xAOD/xAODCaloEvent"
)

type EgammaContainer    EgammaContainer_v1
type EgammaContainer_v1 []ath.DataVector[Egamma_v1]
type Egamma             Egamma_v1

// Egamma_v1 represents an e/gamma object.
//grok:cxx-name xAOD::Egamma_v1
type Egamma_v1 struct {
	xAODBase.IParticle

	caloClusterLinks []AthLinks.ElementLink[calo.CaloClusterContainer]

	pt, eta, phi, m float32
	author          uint16

	EgammaCovarianceMatrix []float32
	// ...
}

A 2021/2021-08-26-atlas-grok/_figs/grok-overview.dia => 2021/2021-08-26-atlas-grok/_figs/grok-overview.dia +0 -0
A 2021/2021-08-26-atlas-grok/_figs/grok-overview.png => 2021/2021-08-26-atlas-grok/_figs/grok-overview.png +0 -0
A 2021/2021-08-26-atlas-grok/talk.slide => 2021/2021-08-26-atlas-grok/talk.slide +225 -0
@@ 0,0 1,225 @@
# grok: groking xAOD
ATLAS Core s/w, 2021-08-26

Sebastien Binet
CNRS/IN2P3/LPC-Clermont
https://github.com/sbinet
@0xbins
sebastien.binet@clermont.in2p3.fr

## Context

The idea of `grok` stemmed from this talk (Scott):

- [event/1023846: EDM evolution — phase 1, pp28-29](https://indico.cern.ch/event/1023846/contributions/4298480/attachments/2219416/3758074/2021-04-01-edm.pdf)

where the idea of accessing `xAOD` data w/o necessarily `ROOT` dictionaries (nor _via_ `PyROOT` in `python`) was expressed.

[Go](https://golang.org), [Rust](https://rust-lang.org) and other languages were mentioned as possible use cases, as well as **GPGPU** offloading.

That talk wondered whether one couldn't:

- generate a C-API for use by other languages via a [FFI](https://en.wikipedia.org/wiki/Foreign_function_interface)
- define and leverage a [Domain Specific Language (DSL)](https://en.wikipedia.org/wiki/Domain-specific_language) to describe `xAOD` classes and generate code.

  ⇒ grok.

## Strategies

At least 2 different strategies are available (I think):

- a) write a DSL that describes `xAOD` classes and generate code out of that DSL; or
- b) write code that inspects a given `xAOD` file (using `ROOT` dictionaries) to generate code (`b.1)`).

A variation of `b)` is to write code that inspects `StoreGate` for `xAOD` data containers and generates code (`b.2)`).

Both `b.1)` and `b.2)` suffer from an additional step (that could lead to skewed definitions) and possible logic wrinkles when trying to reassemble a bunch of container names into a coherent `xAOD` definition (we already kind of did that in the old `CBNT` days...).

_Therefore,_ a DSL seems to me like the preferred strategy.
At least, _a_ strategy that generates code from a common definition layer seems the most workable one.

## Proposal

The current proposal code lives at:

- [gitlab.cern.ch/binet/grok](https://gitlab.cern.ch/binet/grok)

and proposes to:

- Define a DSL (or re-use an already available one: `FlatBuffers`, `ProtoBuf`, `YAML`, ...) to describe `xAOD` classes in `.grok` files.

`grok` files would contain:

- the name of the package targeted to host the `xAOD` class
- the names of the packages that this package depends on
- the names of the `xAOD` class(es) this package defines

## Proposal - II

Each `xAOD` class description would hold:

- the list of variables (name, type) defining that `xAOD` class
- a mechanism to decorate such variables (_e.g.:_ setters/getters customization, `std::move`, ...)
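
A minimal sketch, in `Go`, of how the content of a `.grok` file (package, dependencies, classes, variables and their decorations) might be held in memory once parsed. All the names below (`File`, `Type`, `Field`, `exampleFile`) are hypothetical and are not taken from the `grok` code base; the values are copied by hand from `_code/xaod_datamodeltest.grok`.

```go
package main

import "fmt"

// Field describes one variable of an xAOD class (hypothetical layout).
type Field struct {
	Name   string // variable name, e.g. "pvInt"
	Type   string // grok type, e.g. "[]int32"
	Getter string // optional getter name override ("" means default)
	Setter string // optional setter name override ("" means default)
}

// Type describes one xAOD class.
type Type struct {
	Name    string   // grok name, e.g. "C_v1"
	CxxName string   // C++ name, e.g. "DMTest::C_v1"
	Bases   []string // base classes, as package-qualified grok names
	Fields  []Field
}

// File describes the content of one .grok file.
type File struct {
	Package string   // package hosting the xAOD class(es)
	Imports []string // packages this package depends on
	Types   []Type   // xAOD class(es) defined by this package
}

// exampleFile returns a hand-written equivalent of
// _code/xaod_datamodeltest.grok.
func exampleFile() File {
	return File{
		Package: "athena/Control/DataModelTest/DataModelTestCommon",
		Imports: []string{
			"athena/Control/AthContainers",
			"athena/Event/xAOD/xAODCore",
		},
		Types: []Type{{
			Name:    "C_v1",
			CxxName: "DMTest::C_v1",
			Bases:   []string{"AthContainers.AuxElement"},
			Fields: []Field{
				{Name: "aFloat", Type: "float32"},
				{Name: "anInt", Type: "int32"},
				{Name: "pInt", Type: "uint32"},
				{Name: "pFloat", Type: "float32"},
				{Name: "pvInt", Type: "[]int32", Getter: "pvInt", Setter: "setPVInt"},
				{Name: "pvFloat", Type: "[]float32", Setter: "setPVFloat"},
			},
		}},
	}
}

func main() {
	f := exampleFile()
	fmt.Println("package:", f.Package)
	for _, t := range f.Types {
		fmt.Printf("type %s (C++: %s), %d fields\n", t.Name, t.CxxName, len(t.Fields))
	}
}
```

Keeping the decorations as plain optional strings is only one possible choice; it makes the "no customization" case the zero value.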

`grok` would process these `.grok` files and:

- generate the corresponding `C`, `C++`, `Go`, ... code to create, access and modify `xAOD` data
- (possibly) generate metadata describing the memory layout known at compile-time (à la `NumPy` or `Arrow`)
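
To make the memory-layout metadata idea a bit more concrete, here is a rough sketch of a per-variable descriptor. The `Column` type, the `describe` function and the choice of little-endian `NumPy`-style type strings are assumptions of mine for illustration, not something `grok` does today.

```go
package main

import (
	"fmt"
	"strings"
)

// Column describes how one xAOD variable could be laid out (hypothetical).
type Column struct {
	Name     string
	Dtype    string // NumPy-style type string of one element, e.g. "<f4"
	Size     int    // size of one element, in bytes
	Variable bool   // true for variable-length (slice) fields
}

// builtins maps grok builtin types to their element description.
// Little-endian storage is assumed.
var builtins = map[string]Column{
	"float32": {Dtype: "<f4", Size: 4},
	"int32":   {Dtype: "<i4", Size: 4},
	"uint32":  {Dtype: "<u4", Size: 4},
	"uint16":  {Dtype: "<u2", Size: 2},
}

// describe returns the layout description of a field with the given
// name and grok type. Only builtins and slices of builtins are handled.
func describe(name, typ string) (Column, error) {
	variable := strings.HasPrefix(typ, "[]")
	elem := strings.TrimPrefix(typ, "[]")
	col, ok := builtins[elem]
	if !ok {
		return Column{}, fmt.Errorf("unsupported element type %q", elem)
	}
	col.Name = name
	col.Variable = variable
	return col, nil
}

func main() {
	for _, f := range [][2]string{{"aFloat", "float32"}, {"pvInt", "[]int32"}} {
		col, err := describe(f[0], f[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", col)
	}
}
```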

The generated code would be committed to the repository: clients of `xAOD` packages would not need to know about `grok`.

## Proposal - III

.image _figs/grok-overview.png

## Proposal - IV

- `grok-fmt` reformats `.grok` files the "one true way"
- `grok-build` parses `.grok` files and builds an intermediate representation of `xAOD` types (which could be archived on disk in a binary form if performance requires it)
- `grok-gen-cxx` parses `.grok` files (or reuses the archived form) to generate `C++` API code
- `grok-gen-c` does the same for plain `C`
- `grok-gen-py` generates [cffi](https://cffi.readthedocs.io)-based code, so it could be used by `CPython` and/or `PyPy` (we could also just generate [ctypes](https://docs.python.org/3/library/ctypes.html)-based, CPython-only code)
- etc...

Optionally, all these `grok-xyz` commands could be built into a single "multi command":

```
  $> grok fmt
  $> grok build
  $> grok help
  $> grok gen-cxx
```

## Proposal - V

I volunteer to work on `grok`:

- a [Go](https://golang.org) based command
  - `ATLAS` is already using `Go` quite extensively (`Docker`, `Kubernetes`, ...)
  - `Go` provides a portable environment to access the filesystem
  - faster to launch and run than `Python`
  - access to packages that can ease refactoring code at large
- a command ingesting `.grok` files written in a `DSL` describing `xAOD` classes
- and generating code (first in `C++`, `C+python` and `Go`) to access `xAOD` data

---

(I have some experience with automatically generating code from an `IR` in `Go` for `ctypes/cffi`, see [go-python/gopy](https://github.com/go-python/gopy).)

## Tentative design

## 

.code _code/xaod_egamma.grok

## DSL

The DSL is still in flux.
To avoid investing too much effort in this before possibly being shut down, I haven't implemented the DSL from the previous slide yet.


Instead, the following is _(temporarily)_ used in the `dev` branch of [gitlab/binet/grok](https://gitlab.cern.ch/binet/grok)

.code _code/xaod_datamodeltest.grok /^package/,/^}/

## 

.code _code/xaod_datamodeltest.grok /^type/,/^}/

```go
type {
	# C_v1 is an xAOD AuxElement.
	C_v1 {
		cxx-name {
			DMTest::C_v1
		}

		bases {
			AthContainers.AuxElement
		}

		fields {
			aFloat float32
			anInt  int32

			pvInt []int32 {
				getter pvInt
				setter setPVInt
			}

			pvFloat []float32 {
				setter setPVFloat
			}
		}
	}
}
```

## DSL - II

We need to be able to:

- know the complete package name
  - this could be inferred from the package location in the filesystem
- know the list of packages' dependencies
- know the list of classes used in a given `xAOD` type (and the packages that host them)
- know the mapping of `MyNS::MyClass` into a given `grok xAOD` type (parsing `C++` with all `C++` semantics is hard.)
- (optionally) customize the name of the setters/getters
  (`pvInt -> setPVInt` instead of `setPvInt`)
- enable/disable the generation of `std::move` setter for non-builtins
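
The setter customization above can be sketched as follows: a default name is derived from the field name, and an explicit `setter` decoration wins when present. The function names `defaultSetter` and `setterName` are hypothetical, and field names are assumed to be ASCII identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSetter derives a setter name by upper-casing the first letter
// of the field name: "pvInt" becomes "setPvInt".
// Field names are assumed to be ASCII identifiers.
func defaultSetter(field string) string {
	if field == "" {
		return ""
	}
	return "set" + strings.ToUpper(field[:1]) + field[1:]
}

// setterName returns the override if there is one, and the default
// setter name otherwise.
func setterName(field, override string) string {
	if override != "" {
		return override
	}
	return defaultSetter(field)
}

func main() {
	fmt.Println(setterName("aFloat", ""))        // default name
	fmt.Println(setterName("pvInt", ""))         // default name
	fmt.Println(setterName("pvInt", "setPVInt")) // customized name
}
```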

## Status

## Status

Right now, `grok` is a simple self-contained, statically-compiled, portable [Go](https://golang.org) command that generates the following `.h` and `.cxx` files from the previous `xaod_datamodeltest.grok` file:

- handles includes of `C++ stdlib`
- handles includes of dependent packages
- handles `SG::AuxElement` inheritance (only direct inheritance)
- automatically detects builtin/non-builtin `AUX_VARIABLE` and generates the associated `OBJECT_MOVE` API
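
As an illustration of the builtin/non-builtin detection, here is a simplified sketch (not the actual `grok` implementation) that maps a grok type to a `C++` type and emits the matching `AUXSTORE_xyz` macro lines. The `Field` type and the `cxxType` and `genAccessors` functions are made up for this example, and only a few builtin types are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is a (hypothetical) description of one xAOD variable.
type Field struct {
	Name   string
	Type   string // grok type, e.g. "float32" or "[]int32"
	Setter string // optional setter name override
}

// cxxType maps a grok type to a C++ type and reports whether the
// C++ type is a builtin.
func cxxType(typ string) (string, bool, error) {
	if strings.HasPrefix(typ, "[]") {
		elem, _, err := cxxType(typ[2:])
		if err != nil {
			return "", false, err
		}
		return "std::vector<" + elem + ">", false, nil
	}
	switch typ {
	case "float32":
		return "float", true, nil
	case "int32":
		return "int", true, nil
	case "uint32":
		return "unsigned int", true, nil
	}
	return "", false, fmt.Errorf("unsupported grok type %q", typ)
}

// genAccessors returns the macro lines implementing the accessors of f.
// Non-builtin types also get the move-setter macro.
func genAccessors(class string, f Field) (string, error) {
	if f.Name == "" {
		return "", fmt.Errorf("empty field name")
	}
	cxx, builtin, err := cxxType(f.Type)
	if err != nil {
		return "", err
	}
	setter := f.Setter
	if setter == "" {
		setter = "set" + strings.ToUpper(f.Name[:1]) + f.Name[1:]
	}
	args := fmt.Sprintf("(%s, %s, %s, %s)\n", class, cxx, f.Name, setter)
	if builtin {
		return "AUXSTORE_PRIMITIVE_SETTER_AND_GETTER " + args, nil
	}
	return "AUXSTORE_OBJECT_SETTER_AND_GETTER " + args +
		"AUXSTORE_OBJECT_MOVE " + args, nil
}

func main() {
	fields := []Field{
		{Name: "aFloat", Type: "float32"},
		{Name: "pvInt", Type: "[]int32", Setter: "setPVInt"},
	}
	for _, f := range fields {
		out, err := genAccessors("C_v1", f)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Print(out)
	}
}
```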

(the following was only slightly edited to fit a slide)

## 

.code _code/xaod_datamodeltest.h

## Status

.code _code/xaod_datamodeltest.cxx

## TODO

- migrate to the "real" DSL
- correctly carry over comments from `.grok` files to `.{h,cxx}` files
- define the `xAOD` base classes in `.grok` files
  - `SG::AuxElement`
  - `xAOD::AuxContainerBase`
  - `xAOD::AuxInfoBase`
- (perhaps) define "external" classes (_e.g.:_ `TLorentzVector`, ...)
- assess performance over a set of packages
  - if needed, define an archive format for the result of "compiled" `.grok` files to efficiently store the IR (and avoid importing and re-parsing `.grok` files over and over)
- flesh out the import mechanism (I've cheated for now w/ `AthContainers`)
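
For the archive format item above, one cheap option to prototype with would be `Go`'s standard `encoding/gob` package. This is only a sketch under that assumption: no archive format has been chosen, and the `Field` and `Type` types as well as the `encode` and `decode` functions are placeholders for the real IR.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Field and Type are placeholders for the real intermediate representation.
type Field struct {
	Name string
	Type string
}

type Type struct {
	Name    string
	CxxName string
	Fields  []Field
}

// encode serializes the IR into a binary blob that could be stored on disk.
func encode(types []Type) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(types); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reads back an IR previously serialized with encode.
func decode(data []byte) ([]Type, error) {
	var types []Type
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&types); err != nil {
		return nil, err
	}
	return types, nil
}

func main() {
	ir := []Type{{
		Name:    "C_v1",
		CxxName: "DMTest::C_v1",
		Fields:  []Field{{Name: "aFloat", Type: "float32"}},
	}}
	data, err := encode(ir)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	back, err := decode(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%d bytes, %d type(s), first: %s\n", len(data), len(back), back[0].CxxName)
}
```

A `gob` archive is only readable from `Go`, which is fine as long as `grok` itself is the only consumer of the archived IR.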

## Open issues

- define a rule to decide whether to add the `SG_BASE` and/or `DATAVECTOR_BASE_FIN` decorations (or specify it w/ a grok decoration, like `"cxx-name"` ?)
- test the package/header file mapping some more
- differentiate between types inheriting from `SG::AuxElement`, `xAOD::AuxContainerBase` and `xAOD::AuxInfoBase` (as the generation differs somewhat between these types: `AUX_VARIABLE` vs `AUXSTORE_xyz`)
- devise a mechanism to customize further an `SG::AuxElement` inheriting class:
  - probably have `grok` generate a **`grok::xAOD::Egamma_v1` class** and have people inherit from it to add extra methods, _e.g._ `Egamma_v1::addAuthor` and `{c,setC}ovMatrix`
- mechanism for sharing code (_e.g._ `{Electron,Photon}Aux{,Trig}Container_v3`)
- use cases I didn't think of or don't know about.

## Related issues

If we go with this and are serious about providing access to `xAOD` from non-`C++`, non-`PyROOT` languages, we'll probably need to:

- provide a `C` API to `StoreGate`
  - there are some things that could be ripped off of `Control/StoreGateBindings` but these are very much `PyROOT` oriented
  - probably needed anyway for `grok-gen-{c,py}`
- provide a `C` "framework" to implement `AthAlgorithm` from `C` (that could be leveraged from other languages)
  - probably a `C++` `CAthAlgorithm` inheriting from `::AthAlgorithm` (or a pure `Gaudi` one?) that calls a set of `C` function pointers (`initialize`, `execute`, `finalize`)

But that's for another meeting.