v1.x RFC: pull based approach #430
Comments
I like the general idea! I agree that a "pull model" is more elegant and probably more flexible and pleasant to maintain. How do you see filesystem I/O working in this model? When libraft decides it's time to persist a log entry or take a snapshot, would that actually be done by the caller (i.e. dqlite)?
Yeah, I feel the same. There are tradeoffs, but they are probably worth it.
I would like this to be similar to When I would not expect |
Could you sketch how the API would work and how dqlite would consume it? From my perspective, it has to be a direct win for dqlite. Ideally (still imo) dqlite will ultimately have only one way of operating (probably disk) and one way of doing things with the raft library, and a very general libraft might not currently fit dqlite's goals.
For simplicity, let's first focus only on the aspects of the API strictly relevant to dqlite as a consumer.
One important thing that I'd like to reiterate is that all this can be done in stages and very gradually, over a relatively long period of time, during which we might refine things as we learn and explore more.
Stage 0 would probably be no change at all for dqlite itself, in how it consumes libraft. Externally libraft would keep the exact same v0.x API that we have right now, but internally libraft could already have a v1.x API that should make
Stage 1 could be dqlite beginning to move away from the push model and adopt the pull model. Basically there are only two v0.x APIs that dqlite (or any other consumer for that matter) depends on:

    int raft_apply(struct raft *r,
                   struct raft_apply *req,
                   const struct raft_buffer bufs[],
                   const unsigned n,
                   raft_apply_cb cb);

and

    struct raft_fsm; /* Definition omitted for brevity */

I'll treat them separately.
Just wanted to highlight that, as I mentioned during our call, this v1 version of the API would avoid the tight application coupling we have in v0 when it comes to taking snapshots: in the sketch above there is no mandated buffer that the application has to use, and no mandated way in which snapshots must be taken and persisted. So dqlite could, for example, write its snapshot to disk incrementally with whatever strategy is most appropriate, or, if the dqlite FSM eventually gets backed by storage, a copy-on-write approach could be used.
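To make the "no mandated buffer" point a bit more concrete, here is a minimal sketch of an application writing its own snapshot in small chunks, at its own pace. Everything in it is hypothetical and lives purely on the application side: `struct app_fsm`, `struct app_snapshot`, `app_snapshot_begin()` and `app_snapshot_write_chunk()` are illustrative names, not libraft or dqlite APIs, and the FSM state is assumed to be a flat byte array only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical application-side state: NOT a libraft or dqlite type. */
struct app_fsm {
    const unsigned char *data; /* Serialized FSM state, owned by the app */
    size_t size;               /* Size of the state in bytes */
};

/* Progress of an in-flight snapshot, also owned by the application. */
struct app_snapshot {
    const struct app_fsm *fsm;
    FILE *out;     /* Destination chosen by the application */
    size_t offset; /* Number of bytes written so far */
};

/* Start a snapshot of the given FSM into the given stream. */
void app_snapshot_begin(struct app_snapshot *s,
                        const struct app_fsm *fsm,
                        FILE *out)
{
    s->fsm = fsm;
    s->out = out;
    s->offset = 0;
}

/* Write at most max_chunk bytes of the snapshot.
 *
 * Returns 1 if more data remains, 0 when the snapshot is complete and -1 on
 * write error. The caller decides when to invoke it again, so snapshot I/O
 * can be interleaved with other work. */
int app_snapshot_write_chunk(struct app_snapshot *s, size_t max_chunk)
{
    size_t left;
    size_t n;
    assert(max_chunk > 0);
    left = s->fsm->size - s->offset;
    n = left < max_chunk ? left : max_chunk;
    if (n > 0 && fwrite(s->fsm->data + s->offset, 1, n, s->out) != n) {
        return -1;
    }
    s->offset += n;
    return s->offset < s->fsm->size ? 1 : 0;
}
```

The application would call `app_snapshot_write_chunk()` repeatedly from its own loop until it returns 0, and only then tell the raft engine that a snapshot at a given index is available. How that notification would look is left out here, since it depends on the actual v1.x API.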
Signed-off-by: Free Ekanayaka <[email protected]>
The v0.x series of libraft has been around for a few years now, and during that time we gained experience around its strengths and weaknesses.
I believe there's now enough data that we could aim at introducing a v1.x series with better long-term design, extensibility and maintainability.
I have a few ideas on this topic which I've been ruminating on for quite some time now, and I'd like to get feedback about them. I'll start by listing the aspects of the current design that I think need to be improved, and then add possible approaches. It's not anything radical, but rather a change in the control flow that should make things more flexible, composable and understandable.
Aspects of v0.x design that could be improved
struct raft_io and struct raft_fsm
The initial goal of the struct raft_io interface was to make it easy to implement the network and storage parts of struct raft, while keeping those concerns hidden from the point of view of struct raft consumers, which would get a reasonably friendly/intuitive API.
The same can be said of struct raft_fsm, which was meant to offer an easy way for struct raft consumers to plug in their application FSM.
However, we observed that there are actually several flavors/styles of FSMs and I/O that we want to support: synchronous vs asynchronous FSM snapshots, in-memory vs disk-based FSMs, chunked snapshots, io_uring-style I/O (see below). All that puts a strain on struct raft_io and struct raft_fsm, which must be made generic enough to accommodate all these various mechanics, and that leads to increased complexity.
push model
The current callback-based style (which I'll call "push model", because we have user's code that gets called by raft, instead of raft code that gets called by the user) was mainly inspired by libuv and other similar asynchronous frameworks in other languages (JavaScript and Python).
That might be fine for writing asynchronous network clients or mostly-stateless web services (with request/response dynamics and state managed by a db). However, it seems to complicate the control flow a bit for complex stateful engines like struct raft.
This is something that was discussed in libuv itself; however, it is harder for them to move away from that because of backward compatibility, see libuv/leps#3 and libuv/libuv#6.
Interestingly, io_uring is instead based on a pull model, which is more flexible and efficient for asynchronous systems, because it gives control to the user and not to the framework.
Possible ideas for v1.x
I think both concerns above can be addressed by moving to a pull model.
Essentially the control flow would be inverted: struct raft would be a pure state machine which does not make any call to external code; instead the user would be responsible for driving struct raft and advancing its state based on external events (network I/O, FSM, user requests). Such a design is actually similar to other raft implementations, like etcd's raft.
The benefit would be to decouple struct raft from any "rigid" struct raft_io or struct raft_fsm interfaces, which would then be in the hands of the consumer and not dictated by libraft (although libraft would still provide conveniences for that). Later down the road it would also make it possible to have a pure and idiomatic io_uring-based implementation (no libuv), which takes full advantage of all the features that the io_uring approach has with respect to epoll and friends (pull/completion-based vs push/readiness-based I/O; see also https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023 from io_uring's author).
The current logic of struct raft wouldn't need to change much, as this change mainly concerns the "border" of struct raft, i.e. its user-facing APIs, which would become pull based instead of push based. I have some ideas about what the pull based API could look like, but first wanted to gather feedback around the high-level approach. I can surely provide concrete API sketches if they help to clarify.
Implementation remarks
It would be possible to work towards v1.x slowly and incrementally, without any abrupt change. Since v1.x would be more flexible and general than v0.x, we could keep supporting v0.x for as long as we want, with some glue/compatibility code.
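To illustrate the inverted control flow in the abstract, here is a toy sketch of a state machine that never calls user code: the user feeds it events and then pulls the work it wants done. This is not a proposed API. Every name (`struct toy_raft`, `toy_step()`, `toy_next_task()` and the event/task enums) is hypothetical, and the logic assumes a single node, so that an entry can be applied as soon as it is persisted, with no replication or quorum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types: none of these names exist in libraft. */

enum toy_event_type {
    TOY_SUBMIT,   /* The user submits a new entry */
    TOY_PERSISTED /* The user reports that an entry reached stable storage */
};

struct toy_event {
    enum toy_event_type type;
    unsigned long index; /* Entry index, only meaningful for TOY_PERSISTED */
};

enum toy_task_type {
    TOY_PERSIST_ENTRY, /* The user should write this entry to disk */
    TOY_APPLY_ENTRY    /* The user should apply this entry to its FSM */
};

struct toy_task {
    enum toy_task_type type;
    unsigned long index;
};

#define TOY_MAX_TASKS 8

struct toy_raft {
    unsigned long last_index;     /* Last entry appended to the log */
    unsigned long last_persisted; /* Last entry known to be on disk */
    struct toy_task tasks[TOY_MAX_TASKS]; /* FIFO of pending tasks */
    size_t n_tasks;
};

void toy_init(struct toy_raft *r)
{
    memset(r, 0, sizeof *r);
}

static void toy_enqueue(struct toy_raft *r,
                        enum toy_task_type type,
                        unsigned long index)
{
    assert(r->n_tasks < TOY_MAX_TASKS);
    r->tasks[r->n_tasks].type = type;
    r->tasks[r->n_tasks].index = index;
    r->n_tasks++;
}

/* Advance the state machine with one external event. This performs no I/O
 * and invokes no callbacks. Returns 0 on success, -1 if the event is
 * rejected. */
int toy_step(struct toy_raft *r, const struct toy_event *e)
{
    if (r->n_tasks == TOY_MAX_TASKS) {
        return -1; /* The user must drain pending tasks first */
    }
    switch (e->type) {
    case TOY_SUBMIT:
        r->last_index++;
        toy_enqueue(r, TOY_PERSIST_ENTRY, r->last_index);
        return 0;
    case TOY_PERSISTED:
        /* Entries must be reported as persisted in order */
        if (e->index != r->last_persisted + 1 || e->index > r->last_index) {
            return -1;
        }
        r->last_persisted = e->index;
        toy_enqueue(r, TOY_APPLY_ENTRY, e->index);
        return 0;
    }
    return -1;
}

/* Pull the next pending task, if any. Returns 1 and fills *out when a task
 * is available, 0 when there is nothing to do. */
int toy_next_task(struct toy_raft *r, struct toy_task *out)
{
    if (r->n_tasks == 0) {
        return 0;
    }
    *out = r->tasks[0];
    r->n_tasks--;
    memmove(&r->tasks[0], &r->tasks[1], r->n_tasks * sizeof r->tasks[0]);
    return 1;
}
```

A consumer loop would call `toy_step()` whenever something happens (a client request arrives, a disk write completes) and then drain `toy_next_task()`, performing each task with whatever I/O mechanism it prefers, be it libuv, io_uring or plain blocking calls. A v0.x compatibility layer could plausibly be built the same way, with glue code that drains tasks and dispatches them to the existing struct raft_io and struct raft_fsm callbacks.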