Bug 2027419
| Summary: | [RFE] Improve the thread model of nbdkit-retry-filter | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | mxie <mxie> |
| Component: | nbdkit | Assignee: | Virtualization Maintenance <virt-maint> |
| Status: | CLOSED MIGRATED | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 9.0 | CC: | chhu, eblake, hongzliu, juzhou, lersek, mzhan, rjones, tyan, tzheng, virt-maint, vwu, xiaodwan |
| Target Milestone: | rc | Keywords: | FutureFeature, MigratedToJIRA, Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-07-07 20:51:36 UTC | Type: | Feature Request |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description mxie@redhat.com 2021-11-29 15:32:03 UTC
Oooh, that's tricky. Consider what happens if we allow parallel requests:
    thread1                                thread2

    enter retry_pwrite
    - call next->pwrite
                                           enter retry_pwrite
                                           - call next->pwrite
    - plugin fails
    - want to restart plugin
                                           - plugin completes (may be success or failure)
We can't close the old connection to the plugin until all pending operations finish, then when we connect to the new plugin, we must replay ALL pending requests that were not successful prior to the reconnect, which includes queuing up any data to be written, and so forth. But just because
Thinking aloud here: we need some sort of gating.

- On entry, every command checks whether the connection is live (proceed immediately and increment a semaphore) or a restart is pending (wait for the connection to come back up).
- On success, or on failure after retries are exhausted, decrement the semaphore and return.
- On failure with retries still permitted, set the pending-restart flag, decrement the semaphore, then block until notified of a successful restart.

It may be easier with a background thread whose job is to wake up when a pending restart has been marked, wait for the semaphore to drop to 0, then perform the restart and notify all threads waiting on the restart. A sketch of that scheme follows.
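A minimal sketch of that gating scheme with a background restart thread might look like the following. It is not nbdkit code: struct gate, do_request() and do_restart() are invented stand-ins for the filter's real state and for calls into the underlying plugin.

/* Minimal sketch of the gating idea above, with a background restart
 * thread.  Not nbdkit code: struct gate, do_request() and do_restart()
 * are invented stand-ins. */
#include <pthread.h>
#include <stdbool.h>

struct gate {
  pthread_mutex_t lock;
  pthread_cond_t cond;          /* broadcast on every state change */
  unsigned in_flight;           /* the "semaphore": commands inside the plugin */
  bool restart_pending;         /* some command failed, reconnect wanted */
  unsigned retries_left;
};

extern int do_request (void *cmd);   /* stand-in for next->pwrite etc. */
extern int do_restart (void);        /* stand-in for reopening the plugin */

/* Every command enters through this wrapper. */
static int
gated_request (struct gate *g, void *cmd)
{
  int r;

  for (;;) {
    pthread_mutex_lock (&g->lock);
    while (g->restart_pending)        /* connection not live: wait */
      pthread_cond_wait (&g->cond, &g->lock);
    g->in_flight++;                   /* "increment semaphore" */
    pthread_mutex_unlock (&g->lock);

    r = do_request (cmd);

    pthread_mutex_lock (&g->lock);
    g->in_flight--;                   /* "decrement semaphore" */
    if (r == 0 || g->retries_left == 0) {
      /* Success, or failure with retries exhausted: return. */
      pthread_cond_broadcast (&g->cond);
      pthread_mutex_unlock (&g->lock);
      return r;
    }
    /* Failure with retries still permitted: request a restart and block
     * until the background thread reports the connection is back. */
    g->restart_pending = true;
    pthread_cond_broadcast (&g->cond);
    while (g->restart_pending)
      pthread_cond_wait (&g->cond, &g->lock);
    pthread_mutex_unlock (&g->lock);
    /* Loop: resubmit the command on the new connection. */
  }
}

/* Background thread: wake up when a restart is pending, wait for the
 * in-flight count to drop to 0, reconnect, notify everyone. */
static void *
restarter (void *arg)
{
  struct gate *g = arg;

  for (;;) {
    pthread_mutex_lock (&g->lock);
    while (!g->restart_pending || g->in_flight > 0)
      pthread_cond_wait (&g->cond, &g->lock);
    if (g->retries_left > 0 && do_restart () == 0)
      g->retries_left--;
    else
      g->retries_left = 0;            /* give up; callers see the error */
    g->restart_pending = false;
    pthread_cond_broadcast (&g->cond);
    pthread_mutex_unlock (&g->lock);
  }
  return NULL;
}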
(In reply to Eric Blake from comment #1)
> We can't close the old connection to the plugin until all pending operations
> finish, then when we connect to the new plugin, we must replay ALL pending
> requests that were not successful prior to the reconnect, which includes
> queuing up any data to be written, and so forth. But just because

Sorry for the incomplete thought. But just because one thread fails does not mean they all will (although it may be likely: if one thread failed because of a dropped connection, the others will likely fail for the same reason, but we only want one restart, not N restarts).

Do we know how the VDDK library behaves when it has queued some requests, those requests are executing asynchronously, and the network connection fails? "nbdkit-vddk-plugin" currently collects requests from all nbdkit threads into a queue, then that queue "fans out" to async VDDK operations. Will all those async operations gracefully fail (report an error in their completion callbacks) when the network breaks? If the VDDK library gets confused in this situation, then we can't wait for all requests after the first failed one to "drain cleanly", so that we can reconnect to the server and resubmit the requests.

It's VDDK, so the answer is unlikely to be good. In this case the TCP connection eventually times out and is closed, and subsequent requests fail. The retry filter gets around this by doing a complete reopen of the VDDK connection (i.e. going through VixDiskLib_ConnectEx and _Open again). What's not clear to me is what happens to the ReadAsync and WriteAsync calls which are in flight at this time, although again, it being VDDK, the behaviour is unlikely to be a sensible one. Let me test this.
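Whether such a drain is even possible is exactly the open question about VDDK above, but the shape of a "let in-flight async requests complete, then reconnect and resubmit the failed ones" helper might look roughly like the sketch below. It is hypothetical and not the real nbdkit-vddk-plugin: struct cmd, submit_async() and reopen_connection() are invented names, and the sketch assumes the library calls the completion callback for every submitted request even after the network breaks.

/* Hypothetical sketch of "drain cleanly, then reconnect".  Not the real
 * nbdkit-vddk-plugin: struct cmd, submit_async() and reopen_connection()
 * are invented, and it assumes every request's completion callback still
 * fires after a network failure -- the open question above. */
#include <pthread.h>
#include <stdbool.h>

struct cmd {
  void *data;
  bool failed;                  /* set by the completion callback */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
static unsigned async_in_flight;
static bool saw_error;

extern int submit_async (struct cmd *c);   /* invented async submission */
extern int reopen_connection (void);       /* invented reconnect path */

/* Count the request before handing it to the async library. */
static int
submit (struct cmd *c)
{
  pthread_mutex_lock (&lock);
  async_in_flight++;
  pthread_mutex_unlock (&lock);
  return submit_async (c);
}

/* Completion callback registered for every request. */
static void
completion_cb (struct cmd *c, int error)
{
  pthread_mutex_lock (&lock);
  c->failed = error != 0;
  if (c->failed)
    saw_error = true;
  if (--async_in_flight == 0)
    pthread_cond_broadcast (&drained);
  pthread_mutex_unlock (&lock);
}

/* Wait for every outstanding request to complete (successfully or not),
 * then reconnect if anything failed.  The caller resubmits the commands
 * whose ->failed flag is set. */
static int
drain_and_reconnect (void)
{
  bool need_reconnect;

  pthread_mutex_lock (&lock);
  while (async_in_flight > 0)
    pthread_cond_wait (&drained, &lock);
  need_reconnect = saw_error;
  saw_error = false;
  pthread_mutex_unlock (&lock);
  return need_reconnect ? reopen_connection () : 0;
}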
Thinking aloud to myself -- this is an approach without a background thread:
enum conn_state {
  CST_GOOD,                     /* connection believed to be working */
  CST_BROKEN,                   /* a request failed; reconnect needed */
  CST_GIVE_UP                   /* retries exhausted; fail everything */
};

struct retry_state {
  enum conn_state cst;
  unsigned reqs_in_flight;      /* requests currently inside the plugin */
  unsigned retry_count;
  pthread_mutex_t mutex;
  pthread_cond_t progress;
};

/* A waiting thread may proceed when the connection is usable, when we
 * have given up, or when the last in-flight request has drained. */
static bool
progress (struct retry_state *rs)
{
  return rs->cst == CST_GOOD ||
         rs->cst == CST_GIVE_UP ||
         rs->reqs_in_flight == 0;
}

static int
process_synch_request (struct retry_state *rs, ...)
{
  int r;
  bool retry = true;

  pthread_mutex_lock (&rs->mutex);
  do {
    bool progress_before_update;

    while (!progress (rs))
      pthread_cond_wait (&rs->progress, &rs->mutex);

    switch (rs->cst) {
    case CST_GOOD:
      rs->reqs_in_flight++;
      /* Drop the lock while the request is being processed. */
      pthread_mutex_unlock (&rs->mutex);
      r = request (...);
      pthread_mutex_lock (&rs->mutex);
      progress_before_update = progress (rs);
      rs->reqs_in_flight--;
      if (r == 0) {
        retry = false;
        rs->cst = CST_GOOD;
        rs->retry_count = 0;
      } else if (rs->cst == CST_GOOD)
        rs->cst = CST_BROKEN;
      if (!progress_before_update && progress (rs))
        pthread_cond_broadcast (&rs->progress);
      break;

    case CST_BROKEN:
      /* Only reached once all in-flight requests have drained. */
      assert (rs->reqs_in_flight == 0);
      if (rs->retry_count < max_retry_count) {
        reconnect (...);
        rs->retry_count++;
        rs->cst = CST_GOOD;
      } else
        rs->cst = CST_GIVE_UP;
      break;

    case CST_GIVE_UP:
      r = -EIO;
      retry = false;
      break;
    }
  } while (retry);
  pthread_mutex_unlock (&rs->mutex);

  return r;
}
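One thing the pseudocode leaves open is what the "..." arguments stand for. A hypothetical way to make that concrete, without changing the state machine, is to pass the operation as a callback plus an opaque argument; all names below (process_synch_request_cb, retry_op, struct pwrite_args, do_pwrite, gated_pwrite) are invented for illustration.

/* Hypothetical glue: adapt the "..." into a callback plus argument, so
 * the same retry_state can gate any operation.  All names invented. */
#include <stdint.h>

typedef int (*retry_op) (void *opaque);

/* Adapted prototype of the state machine above, taking a callback. */
extern int process_synch_request_cb (struct retry_state *rs,
                                     retry_op op, void *opaque);

struct pwrite_args {
  const void *buf;
  uint32_t count;
  uint64_t offset;
};

/* The operation actually sent to the underlying plugin (stub here). */
static int
do_pwrite (void *opaque)
{
  struct pwrite_args *a = opaque;
  /* ... call into the next layer with a->buf, a->count, a->offset ... */
  (void) a;
  return 0;
}

/* What a pwrite entry point would then look like. */
static int
gated_pwrite (struct retry_state *rs,
              const void *buf, uint32_t count, uint64_t offset)
{
  struct pwrite_args a = { buf, count, offset };
  return process_synch_request_cb (rs, do_pwrite, &a);
}

With that shape, one retry_state instance can gate every operation type (pread, pwrite, flush, and so on) without duplicating the retry loop.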
Not a functional difference, just a simplification / clarification, for comment 9:

@@ -23,11 +23,12 @@
       progress_before_update = progress (rs);
       rs->reqs_in_flight--;
+      assert (rs->cst == CST_GOOD || rs->cst == CST_BROKEN);
       if (r == 0) {
         retry = false;
         rs->cst = CST_GOOD;
         rs->retry_count = 0;
-      } else if (rs->cst == CST_GOOD)
+      } else
         rs->cst = CST_BROKEN;
       if (!progress_before_update && progress (rs))

In the mid term, I'd like to take a stab at this; minimally it should be a learning experience for me about nbdkit. Assigning myself the BZ.