Bug 2027419 - [RFE] Improve the thread model of nbdkit-retry-filter
Summary: [RFE] Improve the thread model of nbdkit-retry-filter
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: nbdkit
Version: 9.0
Hardware: x86_64
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Virtualization Maintenance
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-29 15:32 UTC by mxie@redhat.com
Modified: 2023-07-07 20:51 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-07 20:51:36 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments: (none)


Links
System                 ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  RHEL-746         0        None      None    None     2023-07-07 20:51:35 UTC
Red Hat Issue Tracker  RHELPLAN-104166  0        None      None    None     2021-11-29 15:32:34 UTC

Description mxie@redhat.com 2021-11-29 15:32:03 UTC
Description of problem:
[RFE] Improve the thread model of nbdkit-retry-filter

Version-Release number of selected component (if applicable):
nbdkit-1.28.2-2.el9.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Check the thread model of nbdkit-vddk-plugin and nbdkit-retry-filter

# nbdkit vddk --dump-plugin | grep thread
max_thread_model=parallel
thread_model=parallel

# nbdkit vddk --filter=retry --dump-plugin | grep thread
max_thread_model=parallel
thread_model=serialize_requests


Actual results:
The retry filter reduces the thread model of the vddk plugin from parallel to serialize_requests during v2v conversion; for details, see https://bugzilla.redhat.com/show_bug.cgi?id=2018463#c6


Expected results:
As described above: the retry filter should preserve the plugin's parallel thread model.

Additional info:

Comment 1 Eric Blake 2021-11-29 16:03:30 UTC
Oooh, that's tricky.  Consider what happens if we allow parallel requests:
thread1            thread2
enter retry_pwrite
 - call next->pwrite
                   enter retry_pwrite
                     - call next->pwrite
 - plugin fails
 - want to restart plugin
                     - plugin completes (may be success or failure)

We can't close the old connection to the plugin until all pending operations finish, then when we connect to the new plugin, we must replay ALL pending requests that were not successful prior to the reconnect, which includes queuing up any data to be written, and so forth.  But just because 

Thinking aloud here: we need some sort of gating.  On entry, every command checks whether the connection is live (proceed immediately and increment a semaphore) or a restart is pending (wait for the connection to come back up).  On success, or on failure once retries have expired, decrement the semaphore and return.  On failure with retries still permitted, set the pending-restart flag, decrement the semaphore, then block until notified of a successful restart.  It may be easier with a background thread whose job is to wake up when a pending restart has been marked, wait for the semaphore to drop to 0, then perform the restart and notify all threads waiting on the restart.
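
A minimal pthread sketch of that gating, with a mutex-protected in-flight counter standing in for the semaphore.  do_request(), restart_plugin() and retries_left() are hypothetical stand-ins for the filter's real machinery, not nbdkit API, and the retry accounting is simplified:

#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helpers standing in for the filter's real machinery. */
extern int do_request (void);       /* forward one command to the plugin */
extern void restart_plugin (void);  /* close and reopen the plugin */
extern bool retries_left (void);    /* is another retry still permitted? */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;
static unsigned in_flight;          /* commands currently inside the plugin */
static bool pending_restart;        /* set by the first command that fails */

/* Background thread: wait until a restart is pending and all in-flight
 * commands have drained, restart the plugin exactly once, then wake every
 * command blocked at the gate. */
static void *
restart_thread (void *arg)
{
  (void) arg;
  for (;;) {
    pthread_mutex_lock (&lock);
    while (!pending_restart || in_flight > 0)
      pthread_cond_wait (&changed, &lock);
    restart_plugin ();
    pending_restart = false;
    pthread_cond_broadcast (&changed);
    pthread_mutex_unlock (&lock);
  }
  return NULL;
}

/* Every command enters through this gate. */
static int
gated_command (void)
{
  int r;

  for (;;) {
    pthread_mutex_lock (&lock);
    while (pending_restart)         /* restart pending: wait for it to finish */
      pthread_cond_wait (&changed, &lock);
    in_flight++;
    pthread_mutex_unlock (&lock);

    r = do_request ();              /* runs in parallel with other commands */

    pthread_mutex_lock (&lock);
    in_flight--;
    if (r == 0 || !retries_left ()) {
      /* Success, or retries exhausted: hand the result back unchanged. */
      pthread_cond_broadcast (&changed);
      pthread_mutex_unlock (&lock);
      return r;
    }
    /* Failure with retries remaining: request one restart (harmless if
     * another thread already did) and loop back to the gate to retry. */
    pending_restart = true;
    pthread_cond_broadcast (&changed);
    pthread_mutex_unlock (&lock);
  }
}

restart_thread would be started once with pthread_create when the filter initializes; how the retry budget is shared between commands is left open here.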

Comment 2 Eric Blake 2021-11-29 16:05:09 UTC
(In reply to Eric Blake from comment #1)
> We can't close the old connection to the plugin until all pending operations
> finish, then when we connect to the new plugin, we must replay ALL pending
> requests that were not successful prior to the reconnect, which includes
> queuing up any data to be written, and so forth.  But just because 

Sorry for the incomplete thought.

But just because one thread fails does not mean they all will (although it may be likely: if one thread failed because of a dropped connection, the others will likely fail for the same reason, but we only want one restart, not N restarts).

Comment 3 Laszlo Ersek 2021-11-30 07:45:15 UTC
Do we know how the VDDK library behaves when it has queued some requests, those requests are executing asynchronously, and the network connection fails? "nbdkit-vddk-plugin" currently collects requests from all nbdkit threads into a queue, then that queue "fans out" to async vddk operations. Will all those async operations gracefully fail (report error in their completion callbacks) when the network breaks? If the VDDK library gets confused in this situation, then we can't wait for all requests after the first failed one to "drain cleanly", so that we can reconnect to the server and resubmit the requests.

Comment 4 Richard W.M. Jones 2021-11-30 16:49:52 UTC
It being VDDK, the answer is unlikely to be good.  In this case the TCP connection
eventually times out and is closed and subsequent requests fail.  The retry
filter gets around this by doing a complete reopen of the VDDK connection
(ie. going through VixDiskLib_ConnectEx and _Open again).

What's not clear to me is what happens to the ReadAsync and WriteAsync calls
which are in flight at this time, although again, it being VDDK, the behaviour is
unlikely to be a sensible one.

Let me test this.

Comment 9 Laszlo Ersek 2021-12-01 16:34:58 UTC
Thinking aloud to myself -- this is an approach without a background
thread:

enum conn_state {
  CST_GOOD,
  CST_BROKEN,
  CST_GIVE_UP
};

struct retry_state {
  enum conn_state cst;
  unsigned reqs_in_flight;
  unsigned retry_count;
  pthread_mutex_t mutex;
  pthread_cond_t progress;
};

static bool
progress (struct retry_state *rs)
{
  return rs->cst == CST_GOOD ||
         rs->cst == CST_GIVE_UP ||
         rs->reqs_in_flight == 0;
}

static int
process_synch_request (struct retry_state *rs, ...)
{
  int r;
  bool retry = true;

  pthread_mutex_lock (&rs->mutex);

  do {
    bool progress_before_update;

    while (!progress (rs))
      pthread_cond_wait (&rs->progress, &rs->mutex);

    switch (rs->cst) {
    case CST_GOOD:
      rs->reqs_in_flight++;

      pthread_mutex_unlock (&rs->mutex);
      r = request (...);
      pthread_mutex_lock (&rs->mutex);

      progress_before_update = progress (rs);

      rs->reqs_in_flight--;
      if (r == 0) {
        retry = false;
        rs->cst = CST_GOOD;
        rs->retry_count = 0;
      } else if (rs->cst == CST_GOOD)
        rs->cst = CST_BROKEN;

      if (!progress_before_update && progress (rs))
        pthread_cond_broadcast (&rs->progress);

      break;

    case CST_BROKEN:
      assert (rs->reqs_in_flight == 0);
      if (rs->retry_count < max_retry_count) {
        reconnect (...);
        rs->retry_count++;
        rs->cst = CST_GOOD;
      } else
        rs->cst = CST_GIVE_UP;
      break;

    case CST_GIVE_UP:
      r = -EIO;
      retry = false;
      break;
    }
  } while (retry);

  pthread_mutex_unlock (&rs->mutex);

  return r;
}

Comment 10 Laszlo Ersek 2021-12-02 08:58:11 UTC
Not a functional difference, just a simplification / clarification, for comment 9:

@@ -23,11 +23,12 @@
       progress_before_update = progress (rs);
 
       rs->reqs_in_flight--;
+      assert (rs->cst == CST_GOOD || rs->cst == CST_BROKEN);
       if (r == 0) {
         retry = false;
         rs->cst = CST_GOOD;
         rs->retry_count = 0;
-      } else if (rs->cst == CST_GOOD)
+      } else
         rs->cst = CST_BROKEN;
 
       if (!progress_before_update && progress (rs))
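
Applied to the snippet in comment 9, the tail of the CST_GOOD branch then reads:

      progress_before_update = progress (rs);

      rs->reqs_in_flight--;
      assert (rs->cst == CST_GOOD || rs->cst == CST_BROKEN);
      if (r == 0) {
        retry = false;
        rs->cst = CST_GOOD;
        rs->retry_count = 0;
      } else
        rs->cst = CST_BROKEN;

      if (!progress_before_update && progress (rs))
        pthread_cond_broadcast (&rs->progress);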

Comment 11 Laszlo Ersek 2023-03-16 15:11:41 UTC
In the mid-term, I'd like to take a stab at this; at a minimum it should be a learning experience for me about nbdkit. Assigning myself the BZ.

