~ne02ptzero/libfloat

87cb99dc — Louis Solofrizzo 26 days ago master
log: Apply deep-sleep state from heartbeats / leaders instead of computing locally

The deep-sleep state is now applied from the AEs/heartbeats sent by the
leader, instead of being computed locally by each node. Only the leader
computes it and gossips it to the other nodes via AEs; those nodes
simply apply it. This solves some desynchronized deep-sleep timer issues
seen in production.
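
As an illustration of the flow described above, here is a minimal sketch in C. Every type and function name in it is hypothetical and not taken from libfloat, and it assumes the deep-sleep state can be represented as a small integer level carried in each AE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down types; not libfloat's real structures. */
struct ae_msg {
    uint8_t deep_sleep_level; /* state stamped on the AE by the leader */
};

struct node {
    bool    is_leader;
    uint8_t deep_sleep_level;
};

/* Single entry point for every deep-sleep state change. */
static void deep_sleep_apply(struct node *n, uint8_t level)
{
    n->deep_sleep_level = level;
}

/* Leader side: the locally computed state is applied, then advertised. */
static void leader_prepare_ae(struct node *leader, uint8_t computed_level,
                              struct ae_msg *out)
{
    deep_sleep_apply(leader, computed_level);
    out->deep_sleep_level = leader->deep_sleep_level;
}

/* Follower side: nothing is computed, the leader's value is copied. */
static void follower_on_ae(struct node *follower, const struct ae_msg *in)
{
    if (!follower->is_leader)
        deep_sleep_apply(follower, in->deep_sleep_level);
}
```

With this shape, a follower that drifted to a different level is brought back in line by the next AE it receives.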

I've also reworked the deep-sleep routines a bit, so that they share a
single entry point.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/50314

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Julien Egloff <jegloff@scaleway.com>
Acked-by: Florian Florensa <fflorensa@scaleway.com>

65f9f585 — Louis Solofrizzo 3 months ago
log: Only ERROR log on smaller commit_id when it is problematic

Otherwise, DEBUG it. This should generate fewer logs in production / tests.
Also removes the minimum of 20 entries used to compute optimistic
replication.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/47901

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
77f3b3c9 — Louis Solofrizzo 4 months ago
periodic: Change log level from ERROR to DEBUG on gray-failure checks

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/47503

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
9e7bac3b — Louis Solofrizzo 8 months ago
periodic: Trigger an election when a leader has lost its followers

Also fixes some formatting issues in the latest patch.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/42881

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>

raft: add soft snapshot feature

This patch adds a soft snapshot feature: if no logs are received for
soft_compact_time seconds, a snapshot is made based on the
soft_compact_after_n value.
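
A minimal sketch of such a check, in C. The two field names come from the message above; the struct, the function, and the exact rule are assumptions. In particular, it assumes soft_compact_after_n is a minimum number of uncompacted logs, and that a soft_compact_time of 0 disables the feature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical configuration holder; only the field names come from the patch. */
struct soft_snapshot_conf {
    uint64_t soft_compact_time;    /* seconds without any received log */
    uint64_t soft_compact_after_n; /* assumption: minimum uncompacted logs */
};

/* Returns true when a soft snapshot should be taken under the assumed rule. */
static bool should_soft_snapshot(const struct soft_snapshot_conf *conf,
                                 uint64_t idle_s, uint64_t pending_logs)
{
    if (conf->soft_compact_time == 0) /* assumption: 0 disables the feature */
        return false;
    return idle_s >= conf->soft_compact_time
        && pending_logs >= conf->soft_compact_after_n;
}
```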

Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
27f69f8a — Julien Egloff 9 months ago
log, periodic: Ensure libfloat is resistant to clock drifting

By using relative timers instead of absolute ones.
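
To illustrate the idea, here is a hypothetical countdown timer in C. It is a sketch and not libfloat's implementation: it assumes the timer stores the time remaining and is fed the elapsed time measured between two periodic calls, so it never compares against a wall-clock deadline.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical countdown timer: stores time remaining, not a deadline. */
struct rel_timer {
    uint64_t remaining_ms;
};

static void rel_timer_arm(struct rel_timer *t, uint64_t duration_ms)
{
    t->remaining_ms = duration_ms;
}

/* Consume `elapsed_ms`, as measured by the caller between two periodic
 * calls. Returns true once the timer has run out. */
static bool rel_timer_tick(struct rel_timer *t, uint64_t elapsed_ms)
{
    if (elapsed_ms >= t->remaining_ms)
        t->remaining_ms = 0;
    else
        t->remaining_ms -= elapsed_ms;
    return t->remaining_ms == 0;
}
```

Because only elapsed durations are used, a jump of the system clock does not shorten or extend the timer, provided the caller measures them with a monotonic source such as POSIX clock_gettime(CLOCK_MONOTONIC).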

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/42580

Signed-off-by: Julien Egloff <jegloff@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
Acked-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
a97638c1 — Louis Solofrizzo 10 months ago
log: Missing condition logic on append_entries receive

That can lead to stuck snapshots in specific cases.
I've simplified some logic and added some debug here and there.
I've also fixed the snapshotting logic to account for log id errors
that are off by one or more.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/41678

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
1e1bf074 — Louis Solofrizzo 10 months ago
log: Accept snapshots logs if the term is higher than our snapshot term

This was blocking replication on some clusters in production.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/41159

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>

c45a1f8a — Louis Solofrizzo 10 months ago
compilation: Add a build.zig compilation file

Works as intended. Includes some small fixes in the code to compile with
clang without warnings.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/41085

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>

4c31834f — Louis Solofrizzo 11 months ago
election: Fix follower condition and be less strict on term

Revert "dynamo: Do not discard leader when timeout is reached when using leader-check dynamo like elections"
Fix no_leader count for gray-failures
Add no wake-up if the leader has been recovered from a gray-failure check
Load leader before deep-sleep states in order not to force a wake-up on restarts

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/40645

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
937fe35f — Louis Solofrizzo 11 months ago
dynamo: Do not discard leader when timeout is reached when using leader-check dynamo like elections

Rather, discard the leader when all the nodes respond with a loss and
trigger an election.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/40455

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
Acked-by: Julien Egloff <jegloff@scaleway.com>

4ee65d14 — Louis Solofrizzo 11 months ago
persistent: Also save deep-sleep timer to be persistent across restarts

Useful for rolling releases of the deep-sleep feature

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/40428

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>

68af1737 — Louis Solofrizzo 1 year, 3 days ago
raft: Add optional deep-sleep clusters

This patch adds a new feature: deep-sleep clusters. The idea is simple:
when a certain time (conf.deep_sleep_time) elapses without any logs
written, the timeout thresholds are raised by two, up to a maximum of 4
times. This should lower the heartbeats per second required on large
clusters, but will impact the recovery time of those clusters in case of
a sudden leader loss.
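
The scaling described above can be sketched as follows in C. The function names and the constant are hypothetical. The sketch assumes that "raised by two" means doubled, and that one raise is earned per full deep_sleep_time period without a log write; neither is confirmed by the patch text.

```c
#include <stdint.h>

/* Cap taken from the commit text: at most 4 raises. */
#define DEEP_SLEEP_MAX_LEVEL 4u

/* Number of raises earned after `idle_s` seconds without a log write.
 * Assumption: one raise per full `deep_sleep_time_s` period; 0 disables. */
static unsigned deep_sleep_level(uint64_t idle_s, uint64_t deep_sleep_time_s)
{
    if (deep_sleep_time_s == 0)
        return 0;
    uint64_t level = idle_s / deep_sleep_time_s;
    return level > DEEP_SLEEP_MAX_LEVEL ? DEEP_SLEEP_MAX_LEVEL : (unsigned)level;
}

/* Assumption: "raised by two" means the timeout is doubled at each level. */
static uint64_t deep_sleep_timeout_ms(uint64_t base_ms, unsigned level)
{
    if (level > DEEP_SLEEP_MAX_LEVEL)
        level = DEEP_SLEEP_MAX_LEVEL;
    return base_ms << level;
}
```

Under these assumptions a 150 ms base timeout grows to at most 2400 ms, which is the trade-off the message mentions: fewer heartbeats, slower detection of a lost leader.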

Also added a persistent state for the leadership and the deep sleep
state; this way, it is reloaded on cluster restart, which can avoid
errors on restarts.

Also added a small fix which avoids the cluster-check routines when an
election is already on-going.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/40300

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>

298eb764 — Louis Solofrizzo 1 year, 1 month ago
log: Replace some mallocs with callocs

Prevent garbage reading above.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/39608

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
Acked-by: Florian Florensa <fflorensa@scaleway.com>
9b60f790 — Louis Solofrizzo 1 year, 1 month ago
election: Change the dynamo election trigger to a relative majority

We have seen a (rare) issue in production, where an election is never
triggered if 2 out of 5 nodes are unreachable. That's because a node was
waiting for at least 3 answers (5 - 2), not counting itself, to trigger
an election. This is now fixed, as we wait for a relative majority (5 /
2).
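
The arithmetic can be checked with a short sketch in C. The function names are hypothetical, and the sketch assumes that "5 / 2" is an integer division and that only answers from other nodes are counted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Answers needed from other nodes: integer division, so 5 / 2 == 2. */
static size_t answers_needed(size_t cluster_size)
{
    return cluster_size / 2;
}

/* Peers that can still answer, not counting the local node. */
static size_t reachable_peers(size_t cluster_size, size_t unreachable)
{
    if (cluster_size == 0 || unreachable >= cluster_size - 1)
        return 0;
    return cluster_size - 1 - unreachable;
}

/* True when enough peers can answer to reach the relative majority. */
static bool can_trigger_election(size_t cluster_size, size_t unreachable)
{
    return reachable_peers(cluster_size, unreachable)
        >= answers_needed(cluster_size);
}
```

With 5 nodes and 2 unreachable, only 2 peers can answer: that is below the old threshold of 3, so the election never started, but it meets the new threshold of 2.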

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/39359

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
Acked-by: Julien Egloff <jegloff@scaleway.com>

periodic: force election from followers in case of a total leader loss

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/37524

Acked-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>

Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
37ff9491 — Louis Solofrizzo 1 year, 4 months ago
log: Don't trigger an election when an unknown leader has a lower term

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/37045

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>

c03e69a2 — Louis Solofrizzo 1 year, 4 months ago
election: Add a possible callback when becoming a follower

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/36566

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
Acked-by: Florian Florensa <fflorensa@scaleway.com>

639a13ae — Louis Solofrizzo 1 year, 4 months ago
election: Do not spam elections if one is already ongoing

And more debug here and there

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/36926

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Florian Florensa <fflorensa@scaleway.com>

62693522 — Louis Solofrizzo 1 year, 4 months ago
internals: Remove last_log from the internal context

May cause some bugs, and it is mostly useless anyway. I've reworked
AE response a bit to simplify it. We now do a DB write per log applied,
instead of one write per AE call.

Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/36843

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
