~sbinet/talks

b813daeced6be95a6a4e5e4192641a6f71b76aef — Sebastien Binet 1 year, 10 months ago 7cd42c6
2022-11-29-tio-update: first import

Signed-off-by: Sebastien Binet <binet@cern.ch>
A 2022/2022-11-29-tio-update/README.md => 2022/2022-11-29-tio-update/README.md +3 -0
@@ 0,0 1,3 @@
# 2022-11-29-tio-update

`go-present` link: [slides](https://talks.sbinet.org/2022/2022-11-29-tio-update/talk.slide)

A 2022/2022-11-29-tio-update/_code/read-back-0.sh => 2022/2022-11-29-tio-update/_code/read-back-0.sh +2 -0
@@ 0,0 1,2 @@
2022/10/27 17:20:03 got 82814 HVs from Oracle   (10.577163231s)
2022/10/27 17:41:12 got 82814 HVs from Postgres (21m8.787602459s)

A 2022/2022-11-29-tio-update/_code/read-back-1.sh => 2022/2022-11-29-tio-update/_code/read-back-1.sh +2 -0
@@ 0,0 1,2 @@
2022/11/07 15:58:54 got 84046 HVs from Oracle   (9.565754967s)
2022/11/07 15:58:57 got 84046 HVs from Postgres (3.006473277s)

A 2022/2022-11-29-tio-update/_code/tiodb-mirror-init.sh => 2022/2022-11-29-tio-update/_code/tiodb-mirror-init.sh +18 -0
@@ 0,0 1,18 @@
$> tiodb mirror-init
[...]
tio-db: beg: 2022-10-26 00:00:00 +0200 CEST
tio-db: end: 2022-10-27 00:00:00 +0200 CEST
tio-db: mirroring kind=m...
tio-db: recv 314078 rows...
tio-db: sent 314078 rows...
tio-db: mirroring kind=r...
tio-db: recv 74 rows...
tio-db: sent 74 rows...
tio-db: mirroring kind=i...
tio-db: recv 24556 rows...
tio-db: sent 24556 rows...
tio-db: mirroring kind=v...
tio-db: recv 6144 rows...
tio-db: sent 6144 rows...
tio-db: --> 2m50.442453919s


A 2022/2022-11-29-tio-update/_figs/daqmon-l1evts.png => 2022/2022-11-29-tio-update/_figs/daqmon-l1evts.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-l1rate.png => 2022/2022-11-29-tio-update/_figs/daqmon-l1rate.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-laser-lost-emits.png => 2022/2022-11-29-tio-update/_figs/daqmon-laser-lost-emits.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-laser-main-hover.png => 2022/2022-11-29-tio-update/_figs/daqmon-laser-main-hover.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-laser-main.png => 2022/2022-11-29-tio-update/_figs/daqmon-laser-main.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-laser-tt-recv.png => 2022/2022-11-29-tio-update/_figs/daqmon-laser-tt-recv.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-main-hover.png => 2022/2022-11-29-tio-update/_figs/daqmon-main-hover.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-main.png => 2022/2022-11-29-tio-update/_figs/daqmon-main.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-tile-ctpin.png => 2022/2022-11-29-tio-update/_figs/daqmon-tile-ctpin.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-tile-dqflag.png => 2022/2022-11-29-tio-update/_figs/daqmon-tile-dqflag.png +0 -0
A 2022/2022-11-29-tio-update/_figs/daqmon-tile-eba.png => 2022/2022-11-29-tio-update/_figs/daqmon-tile-eba.png +0 -0
A 2022/2022-11-29-tio-update/_figs/lbc59-pmt.png => 2022/2022-11-29-tio-update/_figs/lbc59-pmt.png +0 -0
A 2022/2022-11-29-tio-update/_figs/ps-dt2018.png => 2022/2022-11-29-tio-update/_figs/ps-dt2018.png +0 -0
A 2022/2022-11-29-tio-update/talk.slide => 2022/2022-11-29-tio-update/talk.slide +238 -0
@@ 0,0 1,238 @@
# New plugins for Tile-In-One
ATLAS LPC
29 Nov 2022

Sebastien Binet
CNRS/IN2P3/LPC-Clermont
https://github.com/sbinet
@0xbins
sebastien.binet@clermont.in2p3.fr

## New (Go) plugins for Tile-In-One

Working (10-20% FTE) on TileCal monitoring duties for LPC-Clermont, 2 prongs:

- [git:tio-hvmon](https://gitlab.cern.ch/tile-in-one/tio-0021), _a.k.a_ `tio-0021`
- [git:tio-daqmon](https://gitlab.cern.ch/tile-in-one/tio-0022), _a.k.a_ `tio-0022`

## Tile-in-One (tio)

## Tile-in-One

Tile-in-One is a collection of small web applications, called plugins,
designed to provide data quality assessment for the ATLAS Tile Calorimeter.

- [tio.cern.ch](https://tio.cern.ch)
- [tio.cern.ch/documentation](https://tio.cern.ch/documentation/index.md)

` `

Basically, an HTTP server plus a reverse proxy in front of a set of dedicated HTTP servers, each serving Tile-related information:

- [Hello world](https://tio.cern.ch/plugin-template/)
- [Conditions web selection](https://tio.cern.ch/cond-web-select/)
- [DQ history](https://tio.cern.ch/dq-history/)
- [DQ notes](https://tio.cern.ch/dq-notes/)
- [DQM3](https://tio.cern.ch/dqm3/)
- [Powercycling](https://tio.cern.ch/powercycling/)
- [Run List](https://tio.cern.ch/run-list/)
- ...
- [Tile DAQ mon](https://tio.cern.ch/daqmon/)

## tio-daqmon

## tio-daqmon

Migrated the (`python`-based) [TileDAQmon](https://gitlab.cern.ch/atlas-tile-online/TileDAQmon) program from D. Calvet to [TiO](https://tio.cern.ch).

` `

`TileDAQmon` monitors the status of the Tile partitions and the LASER, scraping data off [atlasop.cern.ch](https://atlasop.cern.ch), and presenting the aggregated data as a static HTML page (auto-refreshed every `120s`).

- all presented data generated _via_ `acron`
- authentication done with [authzsvc/auth-get-sso-cookie](https://gitlab.cern.ch/authzsvc/tools/auth-get-sso-cookie)

## tio-daqmon

In order to integrate `TileDAQmon` with `TiO`:

- developed a ([Go](https://golang.org)) library to automatically renew Kerberos credentials (via `keytab`)
	- inspired by CERN IT's [authzsvc/auth-get-sso-cookie](https://gitlab.cern.ch/authzsvc/tools/auth-get-sso-cookie)
- needed to query [https://atlasop.cern.ch/info/current/ATLAS/is](https://atlasop.cern.ch/info/current/ATLAS/is)

.image _figs/daqmon-main.png 350 _

.link https://tio.cern.ch/daqmon/

## tio-daqmon

`tio-daqmon` is a complete ([Go](https://golang.org)) `HTTP` server, with multiple end-points:

- `/` displays a `TiO`-compatible (static) web page
- `/tile` displays TileDAQ-related information
- `/laser` displays LASER-related information
- `/plot/<xyz>` displays PNG plots

All the work is performed _"server side"_:

- very lightweight HTML page
- next to no Javascript
- data is collected from `atlasop` every 2 min
- PNGs are re-generated every 2 min
- PNGs are served from a local cache
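
A minimal sketch of the _"render on a timer, serve from a cache"_ pattern described above. This is not the actual `tio-daqmon` code: `plotCache`, `render` and the `l1rate` plot name are hypothetical, and `render` only encodes a blank image where the real plotting would happen. The handler is exercised in-process with `httptest`; a real server would hand `mux` to `http.ListenAndServe` instead.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"time"
)

// plotCache holds pre-rendered PNGs, keyed by plot name (hypothetical type).
type plotCache struct {
	mu   sync.RWMutex
	pngs map[string][]byte
}

func newPlotCache() *plotCache {
	return &plotCache{pngs: make(map[string][]byte)}
}

// get returns the cached PNG for name, if any.
func (c *plotCache) get(name string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	data, ok := c.pngs[name]
	return data, ok
}

// refresh re-renders every named plot and swaps it into the cache.
func (c *plotCache) refresh(names []string) error {
	for _, name := range names {
		data, err := render(name)
		if err != nil {
			return fmt.Errorf("could not render %q: %w", name, err)
		}
		c.mu.Lock()
		c.pngs[name] = data
		c.mu.Unlock()
	}
	return nil
}

// run refreshes the cache every period until stop is closed.
func (c *plotCache) run(names []string, period time.Duration, stop <-chan struct{}) {
	tick := time.NewTicker(period)
	defer tick.Stop()
	for {
		select {
		case <-tick.C:
			_ = c.refresh(names) // a real server would log the error
		case <-stop:
			return
		}
	}
}

// ServeHTTP serves /plot/<name> straight from the cache:
// no plot is rendered while handling a request.
func (c *plotCache) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	data, ok := c.get(strings.TrimPrefix(r.URL.Path, "/plot/"))
	if !ok {
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Type", "image/png")
	_, _ = w.Write(data)
}

// render stands in for the real plotting code: it encodes a blank 4x4 image.
func render(name string) ([]byte, error) {
	var buf bytes.Buffer
	err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, 4, 4)))
	return buf.Bytes(), err
}

func main() {
	names := []string{"l1rate"} // hypothetical plot name
	cache := newPlotCache()
	if err := cache.refresh(names); err != nil {
		panic(err)
	}
	stop := make(chan struct{})
	defer close(stop)
	go cache.run(names, 2*time.Minute, stop)

	mux := http.NewServeMux()
	mux.Handle("/plot/", cache)

	// Exercise the handler once, in-process, instead of listening on a port.
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/plot/l1rate", nil))
	fmt.Println(rec.Code, rec.Header().Get("Content-Type"))
}
```

With this layout the cost of a request does not depend on how expensive a plot is to draw, which is one way to keep the page lightweight.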

## tio-daqmon - Tile

.image _figs/daqmon-main.png 600 _

## tio-daqmon - Tile

.image _figs/daqmon-main-hover.png 600 _

## tio-daqmon - Tile

.image _figs/daqmon-l1evts.png 600 _

## tio-daqmon - Tile

.image _figs/daqmon-l1rate.png 600 _

## tio-daqmon - Tile

.image _figs/daqmon-tile-dqflag.png 600 _

## tio-daqmon - Tile

.image _figs/daqmon-tile-eba.png 600 _

## tio-daqmon - Tile

.image _figs/daqmon-tile-ctpin.png 600 _

## tio-daqmon - LASER

.image _figs/daqmon-laser-main.png 600 _

## tio-daqmon - LASER

.image _figs/daqmon-laser-main-hover.png 600 _

## tio-daqmon - LASER

.image _figs/daqmon-laser-lost-emits.png 600 _

## tio-daqmon - LASER

.image _figs/daqmon-laser-tt-recv.png 600 _

## tio-hvmon

## tio-hvmon

`tio-hvmon` is meant to (eventually) display monitoring information for the HVs.

` `

`tio-hvmon` was the initial work item.

- integrate D. Calvet's code from [HVAnaMon](https://gitlab.cern.ch/atlas-clermont/tile/HV/HVAnaMon) into `TiO`
- `HVAnaMon` was initially written for _"offline"_ analysis of HVs
- improved `HVAnaMon` data storage (about `3x` **smaller**)
- improved `HVAnaMon` data analysis time (about `2x` **faster**)
- `HVAnaMon` needs a bit of re-engineering for a _"quasi-online"_ workload

## HVAnaMon

[`HVAnaMon`](https://gitlab.cern.ch/atlas-clermont/tile/HV/HVAnaMon) is a set of ([Go-based](https://golang.org)) tools to:

- ingest [DDV](https://atlas-ddv.cern.ch/DDV.html) (_DCS Data Viewer_) data
- analyse data over multiple periods and/or modules (`all`, `power` or `plotonly`)
- monitor stability of HV channels
- predict/infer future possible issues
- plan for repairs
- devise strategies in advance when boards come back from CERN to Clermont

.image _figs/lbc59-pmt.png

## tio-hvmon

- `HVAnaMon` worked by pulling data off `atlas-ddv.cern.ch`
- data retrieval too slow for daily monitoring (`~12h` for a single partition)
- current strategy is to directly retrieve data from `ATONR_ADG`, in a similar manner to what [TileDCSDataGrabber.py](https://gitlab.cern.ch/atlas/athena/-/blob/master/TileCalorimeter/TileCoolDcs/python/TileDCSDataGrabber.py) does (thanks Sasha!)
- initial tests with `tio-hvmon-db-pull` reduced data retrieval to less than `1h`
  (for all partitions)

` `

Eventually, this will all land under:

- [https://tio.cern.ch/hvmon]()

## tio-hvmon - tio-db mirror

Doubling down on the initial `tio-hvmon-db-pull` code, I developed a daemon that:

- mirrored the whole `Oracle` database (`ATONR_ADG`) for all partitions since August 1st
- wakes up every hour and gobbles up any new data
- initially, dumped all data into a local `Postgres` database
- after some recurrent troubles with [SELinux](https://en.wikipedia.org/wiki/Security-Enhanced_Linux) (enabled by default on the `tio-xxx` machines), dumped into a _"Database-on-Demand"_ `Postgres` instance instead.

.link https://dbod-user-guide.web.cern.ch/
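
A minimal sketch of the _"wake up and catch up"_ logic, under stated assumptions: `span`, `chunks`, `mirror` and `catchUp` are hypothetical names, the one-day chunk size is only borrowed from the `beg`/`end` window of the `mirror-init` log, and `mirror` just prints the window where the real daemon would read from `Oracle` and write to `Postgres`.

```go
package main

import (
	"fmt"
	"time"
)

// span is a half-open time interval [beg, end).
type span struct{ beg, end time.Time }

// chunks splits [beg, end) into consecutive spans of at most step.
// step must be positive.
func chunks(beg, end time.Time, step time.Duration) []span {
	var out []span
	for beg.Before(end) {
		next := beg.Add(step)
		if next.After(end) {
			next = end
		}
		out = append(out, span{beg, next})
		beg = next
	}
	return out
}

// mirror is a placeholder for the real work: read the rows of the span
// from the source database and write them to the destination.
func mirror(s span) error {
	fmt.Printf("mirroring [%s, %s)\n",
		s.beg.Format(time.RFC3339), s.end.Format(time.RFC3339))
	return nil
}

// catchUp mirrors everything between last and now, one day at a time,
// and returns the new high-water mark.
func catchUp(last, now time.Time) (time.Time, error) {
	for _, s := range chunks(last, now, 24*time.Hour) {
		if err := mirror(s); err != nil {
			return last, err // resume later from the last span that succeeded
		}
		last = s.end
	}
	return last, nil
}

func main() {
	// Illustrative dates only.
	last := time.Date(2022, 8, 1, 0, 0, 0, 0, time.UTC)
	now := time.Date(2022, 8, 3, 12, 0, 0, 0, time.UTC)

	last, err := catchUp(last, now)
	if err != nil {
		panic(err)
	}
	fmt.Println("mirrored up to", last.Format(time.RFC3339))

	// A daemon would then call catchUp(last, time.Now()) again
	// on every tick of a time.NewTicker(time.Hour).
}
```

Tracking a high-water mark means the initial backfill and the hourly wake-up share the same code path: the first run simply has more spans to go through.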

## 

Initial tests were rather encouraging:

.code _code/tiodb-mirror-init.sh

Just under 3 min (`2m50s`) to fetch a day's worth of monitoring data:
- for HVs, for all partitions and all data _"kinds"_
  - monitored, requested, input, voltage

## data consistency

But checking Oracle/Postgres consistency, by reading the data back from Postgres, showed:

.code _code/read-back-0.sh

and the Postgres read-back time only grew worse as more monitoring data was added.

## optimizing Postgres database schema

After a bit of performance debugging and optimizing the database schema/definition:

.code _code/read-back-1.sh

**Good enough for now**
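
To put a number on the improvement, the two Postgres read-back timings from the logs above can be compared directly. The sketch below (the `speedup` helper is a hypothetical name) gives a factor of roughly 420. The two runs did not read exactly the same number of HVs (82814 vs 84046), so this is only an order of magnitude.

```go
package main

import (
	"fmt"
	"time"
)

// speedup parses two durations written in Go's time.Duration syntax
// and returns the ratio before/after.
func speedup(before, after string) (float64, error) {
	b, err := time.ParseDuration(before)
	if err != nil {
		return 0, err
	}
	a, err := time.ParseDuration(after)
	if err != nil {
		return 0, err
	}
	if a <= 0 {
		return 0, fmt.Errorf("invalid duration %q", after)
	}
	return b.Seconds() / a.Seconds(), nil
}

func main() {
	// Postgres read-back timings copied from the two logs above.
	x, err := speedup("21m8.787602459s", "3.006473277s")
	if err != nil {
		panic(err)
	}
	fmt.Printf("Postgres read-back speed-up: ~%.0fx\n", x)
}
```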

## tio-hvmon - TODO

Still need to:

- gain experience with automatic monitoring
	- run every day? every hour? twice a week?
- assess disk storage needs
	- `20GB` of db space available
	- `< 5GB` for 3 months of monitoring data (HV-only)
- wire up `HVAnaMon` backend with `tio-hvmon`
- provide a web output (in addition to PDF+PNG)

.image _figs/ps-dt2018.png 400 _

## Conclusions

[tio/daqmon](https://tio.cern.ch/daqmon) is live and ready.

` `

- ready for feedback and/or (constructive) criticism
  - provide a way to _"go back in time"_ (and serve `daqmon` pages from a given date in time)
- **integrate `tio-daqmon` into the usual DAQ on-call shifters tour?**

---

[tio/hvmon](https://tio-dev.cern.ch/hvmon) is still a _WIP_.

- eventually, should migrate from `tio-dev.cern.ch` to `tio.cern.ch`
- eventually, should also be integrated into the usual DQ shifters tour