log: Apply deep-sleep state from heartbeats / leaders instead of computing locally
Nodes now apply the deep-sleep state carried by the leader's
AppendEntries (AE) and heartbeat messages, instead of each node
computing it locally. Only the leader computes the state and propagates
it to the other nodes via AEs; those nodes simply apply it. This solves
some issues with desynchronized deep-sleep timers seen in production.
I've also reworked the deep-sleep routines a bit, so that they have a
single entry point.
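A minimal C sketch of the new flow, assuming a simplified node and message layout. Every name below (`struct ae_msg`, `struct node`, `leader_fill_ae`, `follower_apply_ae`, the `deep_sleep_level` field and the idle-time rule) is hypothetical and not libfloat's actual code; it only shows the level being computed in one place, on the leader, and followers copying it from each AE/heartbeat instead of running their own timers.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical AppendEntries/heartbeat payload carrying the deep-sleep state. */
struct ae_msg {
    unsigned long term;
    unsigned int deep_sleep_level; /* 0 = awake */
};

struct node {
    bool is_leader;
    unsigned int deep_sleep_level;
    time_t last_log_written; /* only consulted on the leader */
};

/* Leader only: derive the level from the idle time.
 * Hypothetical rule: one level per `sleep_after` idle seconds, capped. */
static unsigned int leader_compute_deep_sleep(const struct node *leader,
                                              time_t now, time_t sleep_after,
                                              unsigned int max_level)
{
    assert(leader->is_leader && sleep_after > 0);
    if (now <= leader->last_log_written)
        return 0;
    time_t steps = (now - leader->last_log_written) / sleep_after;
    return steps > (time_t)max_level ? max_level : (unsigned int)steps;
}

/* Leader: compute once, then stamp the result into the outgoing AE. */
static void leader_fill_ae(struct node *leader, struct ae_msg *msg, time_t now,
                           time_t sleep_after, unsigned int max_level)
{
    leader->deep_sleep_level =
        leader_compute_deep_sleep(leader, now, sleep_after, max_level);
    msg->deep_sleep_level = leader->deep_sleep_level;
}

/* Follower: no local timer, just copy whatever the leader sent. */
static void follower_apply_ae(struct node *follower, const struct ae_msg *msg)
{
    assert(!follower->is_leader);
    follower->deep_sleep_level = msg->deep_sleep_level;
}
```

Since followers never derive the level themselves, there is no second timer that can drift out of sync with the leader's.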
Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/50314
Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Julien Egloff <jegloff@scaleway.com>
Acked-by: Florian Florensa <fflorensa@scaleway.com>
 ________________________________________
/ Just because they are called           \
| 'forbidden' transitions does not mean  |
| that they are forbidden. They are less |
| allowed than allowed transitions, if   |
| you see what I mean. -- From a Part 2  |
\ Quantum Mechanics lecture.             /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
raft: add soft snapshot feature
This patch adds a soft snapshot feature: if no logs are received for
soft_compact_time seconds, a snapshot is taken based on the
soft_compact_after_n value.
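A sketch of the trigger condition under one possible reading of the description above. The `soft_compact_time` and `soft_compact_after_n` names come from the message; treating `soft_compact_after_n` as the minimum number of logs since the last snapshot, and the `should_soft_snapshot` helper itself, are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Field names from the commit message; semantics of soft_compact_after_n
 * are assumed: minimum number of logs since the last snapshot. */
struct soft_compact_conf {
    time_t soft_compact_time;           /* idle seconds before a soft snapshot */
    unsigned long soft_compact_after_n; /* assumed log-count threshold */
};

/* Hypothetical helper: snapshot only when the node has been idle long
 * enough AND enough logs have piled up since the last snapshot. */
static bool should_soft_snapshot(const struct soft_compact_conf *conf,
                                 time_t now, time_t last_log_received,
                                 unsigned long logs_since_snapshot)
{
    assert(now >= last_log_received);
    bool idle = now - last_log_received >= conf->soft_compact_time;
    return idle && logs_since_snapshot >= conf->soft_compact_after_n;
}
```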
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
log: Accept snapshot logs if the term is higher than our snapshot term
Without this, replication was getting stuck on some clusters in
production.
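A hypothetical sketch of the acceptance check. `struct snapshot_meta` and `accept_snapshot` are illustrative names, and the index comparison merely stands in for whatever rule existed before; only the term comparison reflects this commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot metadata. */
struct snapshot_meta {
    unsigned long index;
    unsigned long term;
};

static bool accept_snapshot(const struct snapshot_meta *local,
                            const struct snapshot_meta *incoming)
{
    /* Rule added by this commit: a higher term always wins. */
    if (incoming->term > local->term)
        return true;
    /* Assumed pre-existing rule: otherwise only accept a newer index. */
    return incoming->index > local->index;
}
```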
Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/41159
Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
 _____________________________________
/ "Nature is very un-American. Nature \
| never hurries." -- William George   |
\ Jordan                              /
 -------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
compilation: Add a build.zig compilation file
Works as intended. Also includes some small code fixes so that the
code compiles with clang without warnings.
Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/41085
Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
 ________________________________________
/ If I had only known, I would have been \
\ a locksmith. -- Albert Einstein        /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
election: Fix follower condition and be less strict on term
Revert "dynamo: Do not discard leader when timeout is reached when using leader-check dynamo like elections"
Fix no_leader count for gray-failures
Do not wake up if the leader has been recovered from a gray-failure check
Load the leader before the deep-sleep state so as not to force a wake-up on restart
Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/40645
Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
dynamo: Do not discard leader when timeout is reached when using leader-check dynamo like elections
Rather, discard the leader when all the nodes respond with a loss and
trigger an election.
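A sketch of the rule described here, using a hypothetical `enum leader_check_answer` and `should_discard_leader` helper (not libfloat's API). Note that the `Revert` line in the later election commit in this log undoes this change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical answers collected during a leader-check round. */
enum leader_check_answer { LEADER_SEEN, LEADER_LOST, NO_ANSWER };

/* Discard the leader (and trigger an election) only if every polled node
 * answered that it lost the leader; a timeout alone is not enough. */
static bool should_discard_leader(const enum leader_check_answer *answers,
                                  size_t n_nodes)
{
    if (n_nodes == 0)
        return false;
    for (size_t i = 0; i < n_nodes; i++) {
        if (answers[i] != LEADER_LOST)
            return false; /* one node still sees it, or stayed silent */
    }
    return true;
}
```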
Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/40455
Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
Acked-by: Julien Egloff <jegloff@scaleway.com>
 ________________________________________
/ "What do you give a man who has        \
| everything?" the pretty teenager asked |
| her mother. "Encouragement, dear," she |
\ replied.                               /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
raft: Add optional deep-sleep clusters
This patch adds a new feature: deep-sleep clusters. The idea is simple:
when a certain time (conf.deep_sleep_time) elapses without any log being
written, the timeout thresholds are doubled, up to a maximum of 4
times. This should lower the number of heartbeats per second required
on large clusters, at the cost of a longer recovery time for those
clusters in case of a sudden leader loss.
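A sketch of the timeout scaling under one assumed reading of the threshold increase described above: each full `deep_sleep_time` period without a written log doubles the base timeout once more, capped at 4 doublings. The `deep_sleep_timeout` helper and its parameters are hypothetical.

```c
#include <assert.h>

#define MAX_DEEP_SLEEP_STEPS 4UL /* "a maximum of 4 times" */

/* Assumed reading: one doubling per deep_sleep_time_s idle seconds,
 * capped at MAX_DEEP_SLEEP_STEPS. A zero period disables deep sleep. */
static unsigned long deep_sleep_timeout(unsigned long base_timeout_ms,
                                        unsigned long idle_s,
                                        unsigned long deep_sleep_time_s)
{
    unsigned long steps = deep_sleep_time_s ? idle_s / deep_sleep_time_s : 0;
    if (steps > MAX_DEEP_SLEEP_STEPS)
        steps = MAX_DEEP_SLEEP_STEPS;
    return base_timeout_ms << steps; /* base * 2^steps */
}
```

Under this reading, a fully asleep cluster waits up to 16 times the base timeout before reacting to a lost leader, which is the recovery-time trade-off mentioned above.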
Also added persistent state for the leadership and the deep-sleep
state, so that both are reloaded on cluster restart, which can avoid
errors on restart.
Also added a small fix that skips the cluster-check routines when an
election is already ongoing.
Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/40300
Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
 ____________________________________
/ Xerox never comes up with anything \
\ original.                          /
 ------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
election: Change the dynamo election trigger to a relative majority
We have seen a rare issue in production where an election is never
triggered if 2 out of 5 nodes are unreachable. That's because a node was
waiting for at least 3 answers (5 - 2), not counting itself, before
triggering an election, while only 2 other nodes could still answer.
This is now fixed, as we wait for a relative majority (5 / 2, i.e. 2
answers with integer division).
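The arithmetic for the 5-node case, as a small C sketch. The helper names are hypothetical, and the old threshold is assumed to be `n - n / 2`, matching the "(5 - 2)" in the message.

```c
#include <assert.h>

/* Assumed old threshold: n - n / 2 answers (3 for n = 5). */
static unsigned int answers_needed_old(unsigned int n_nodes)
{
    return n_nodes - n_nodes / 2;
}

/* New threshold: relative majority, n / 2 answers (2 for n = 5). */
static unsigned int answers_needed_new(unsigned int n_nodes)
{
    return n_nodes / 2;
}

/* Answers a node can still collect: every other node that is reachable. */
static unsigned int reachable_answers(unsigned int n_nodes,
                                      unsigned int unreachable)
{
    assert(unreachable < n_nodes);
    return n_nodes - 1 - unreachable;
}
```

With 2 of 5 nodes down, `reachable_answers(5, 2)` is 2: below the old threshold of 3, so the election never fired, but enough for the new threshold of 2.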
Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/39359
Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
Acked-by: Julien Egloff <jegloff@scaleway.com>
 ______________________________________
/ Girls marry for love. Boys marry     \
| because of a chronic irritation that |
| causes them to gravitate in the      |
| direction of objects with certain    |
| curvilinear properties. -- Ashley    |
\ Montagu                              /
 --------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
log: Don't trigger an election when an unknown leader has a lower term
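A hypothetical sketch of the decision for an AppendEntries coming from a node this node does not recognise as leader. `struct ae_decision` and `on_ae_from_unknown_leader` are illustrative names, and the handling of equal or higher terms is an assumption borrowed from standard Raft (a sender whose term is at least ours is followed); only the lower-term branch reflects this commit.

```c
#include <assert.h>
#include <stdbool.h>

struct ae_decision {
    bool accept;
    bool trigger_election;
};

/* AE received from a node we don't currently know as leader. */
static struct ae_decision on_ae_from_unknown_leader(unsigned long msg_term,
                                                    unsigned long current_term)
{
    struct ae_decision d = { .accept = false, .trigger_election = false };

    if (msg_term < current_term)
        return d; /* stale sender: reject, but do NOT start an election */
    d.accept = true; /* assumed: same or newer term, follow the sender */
    return d;
}
```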
Patch: https://lists.sr.ht/~ne02ptzero/libfloat/patches/37045
Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Acked-by: Patrik Cyvoct <pcyvoct@scaleway.com>
 _________________________________________
/ The only thing we learn from history is \
| that we learn nothing from history. --  |
| Hegel I know guys can't learn from      |
| yesterday ... Hegel must be taking the  |
| long view. -- John Brunner, "Stand on   |
\ Zanzibar"                               /
 -----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||