Bug 2173054 - [abrt] nbdkit-server: raw_send_socket(): nbdkit killed by SIGABRT
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: nbdkit
Version: 9.2
Hardware: x86_64
OS: Unspecified
Target Milestone: rc
Assignee: Eric Blake
QA Contact: mxie@redhat.com
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:363ab534cc1daec2aa5c7dd72fa...
Depends On: 2168629
Blocks:
Reported: 2023-02-23 20:00 UTC by Eric Blake
Modified: 2023-11-07 09:27 UTC (History)

Fixed In Version: nbdkit-1.33.11-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2173047
Environment:
Last Closed: 2023-11-07 08:28:48 UTC
Type: ---
Target Upstream Version:
Embargoed:


Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   RHELPLAN-149854  0        None      None    None     2023-02-23 20:01:31 UTC
Red Hat Product Errata  RHBA-2023:6374   0        None      None    None     2023-11-07 08:29:02 UTC

Description Eric Blake 2023-02-23 20:00:55 UTC
+++ This bug was initially created as a clone of Bug #2173047 +++

Description of problem:
Ran 'make check' in the libnbd project against the system-installed nbdkit.

Version-Release number of selected component:
nbdkit-server-1.32.5-1.fc37

Additional info:
reporter:       libreport-2.17.4
backtrace_rating: 4
cgroup:         0::/user.slice/user-14986.slice/user/app.slice/app-org.gnome.Terminal.slice/vte-spawn-d6fe51c3-4e81-4e8f-8775-2b8c418b0fcc.scope
cmdline:        nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5
crash_function: raw_send_socket
executable:     /usr/sbin/nbdkit
journald_cursor: s=1be4dd20d4854712bab1191d895af0dc;i=4a3ad5;b=c162492451d944938c784f4627ff93a6;m=693923ae92;t=5f5626ed38091;x=72b7f4e687039a6c
kernel:         6.1.11-200.fc37.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            14986

--- Additional comment from Eric Blake on 2023-02-23 12:59:17 MST ---

Reproduced with:
$ nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
...
nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed.

using libnbd-1.15.9-2.fc38.x86_64, nbdkit-1.33.8-1.fc38.x86_64

The libnbd testsuite is silently continuing in spite of the nbdkit assertion failure.

The failure itself is in raw_send_socket(), in an assertion added in commit daef505e

Comment 1 Richard W.M. Jones 2023-02-23 20:04:25 UTC
Unfortunately Monday is the beginning of the exception phase
for RHEL 9.2.  If we had exception+ then we could fix this.

It probably does not affect virt-v2v or any layered products.

But we're still waiting on Eric's analysis of the bug, and
that might change if he thinks it is more serious.

Comment 2 Eric Blake 2023-02-23 20:08:44 UTC
nbdkit: pattern.1: debug: error-inject: pread count=262144 offset=4194304
nbdkit: pattern.1: error: injecting EIO error into pread
nbdkit: pattern.1: debug: sending error reply: Input/output error
nbdkit: pattern.0: debug: pattern: pread count=262144 offset=4456448
nbdkit: pattern.2: error: write data: NBD_CMD_READ: Broken pipe
nbdkit: pattern.2: debug: exiting worker thread pattern.2
nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed.

Looks like nbdcopy is peppering the server with multiple requests, but hanging up early as soon as one request hits EIO.  Other pending requests that do succeed happen to get EPIPE because the client is already gone, and change sock to -1 to reflect this fact, even before we can detect clean shutdown.  Perhaps libnbd can be nicer and send NBD_CMD_DISC after read errors rather than abruptly hanging up, but the server should NOT be crashing.  Fortunately, the crash is only on exit (the use of --exit-with-parent shows that no other client will be trying to connect), rather than during the data-serving phase.  I'm playing with ideas how to patch upstream...
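
The sequence described above can be modelled in a few lines. This is only a toy sketch, not nbdkit's code: the `Connection` class and the `send_asserting`, `send_tolerant`, and `demo` helpers are hypothetical names, the workers run one after another instead of in parallel, and the "tolerant" variant is just one possible way to handle a dead connection, not necessarily what the upstream patch does.

```python
import socket


class Connection:
    """Toy stand-in for per-connection state shared by worker threads."""

    def __init__(self, sock):
        self.sock = sock
        self.fd = sock.fileno()  # plays the role of the integer `sock`


def send_asserting(conn, data):
    """Insist that the fd is still valid before sending, like the failing check."""
    if conn.fd < 0:
        raise AssertionError("sock >= 0")
    try:
        conn.sock.sendall(data)
    except BrokenPipeError:
        conn.fd = -1  # the client is gone; later callers will see -1
        raise


def send_tolerant(conn, data):
    """Treat an already-dead connection as an ordinary send failure."""
    if conn.fd < 0:
        return False
    try:
        conn.sock.sendall(data)
        return True
    except BrokenPipeError:
        conn.fd = -1
        return False


def demo():
    server, client = socket.socketpair()
    client.close()  # the client hangs up while replies are still pending
    conn = Connection(server)
    outcomes = []
    for _ in range(2):  # two workers, each with one successful read to report
        try:
            send_asserting(conn, b"reply")
            outcomes.append("sent")
        except BrokenPipeError:
            outcomes.append("EPIPE")
        except AssertionError:
            outcomes.append("assertion")
    outcomes.append(send_tolerant(conn, b"reply"))
    server.close()
    return outcomes
```

In this model the first worker sees the broken pipe and records the dead connection, the second worker trips the assertion, and the tolerant variant simply reports that the reply could not be sent.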

Comment 3 Eric Blake 2023-02-23 20:30:28 UTC
The crash is more serious when --exit-with-parent is not in use.  If a single server allows parallel clients, any one of the clients can trigger the EPIPE/SIGABRT scenario by hanging up early with large in-flight read requests.  That should tear down only that client's connection, but the SIGABRT instead tears down the entire nbdkit process and denies service to all other currently-connected clients.  I'm not sure if that ranks as a CVE, though: either you are using TLS (so the only client that can trigger the problem already has the same privileges as all other clients that were able to connect, with no privilege escalation boundary), or you are not (at which point there are plenty of other ways for one client to starve others, whether or not we patch this SIGABRT).

Comment 4 Eric Blake 2023-02-23 20:41:42 UTC
Upstream patch proposed:
https://listman.redhat.com/archives/libguestfs/2023-February/030855.html

Comment 5 Eric Blake 2023-02-23 20:44:19 UTC
It may also be worth cloning this bug to libnbd to have nbdcopy gracefully consume ALL pending requests and issue a clean NBD_CMD_DISC, rather than abruptly hanging up on the server on the first EIO, since not all NBD servers might be as graceful as we intend for nbdkit to behave.
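
For reference, a clean disconnect is a single request on the wire. The sketch below packs an NBD_CMD_DISC request using the compact request header from the NBD protocol (magic 0x25609513, command type 2). The `finish_session` helper and its `send` and `wait_for_reply` callables are hypothetical and only illustrate the suggested ordering (collect every pending reply first, then disconnect); this is not libnbd's API.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_DISC = 2


def build_disc_request(cookie):
    """Pack a compact NBD request header asking for a clean disconnect."""
    # Fields, all big-endian: magic (32), command flags (16), type (16),
    # cookie (64), offset (64), length (32).
    return struct.pack(">IHHQQI", NBD_REQUEST_MAGIC, 0, NBD_CMD_DISC, cookie, 0, 0)


def finish_session(send, pending_cookies, wait_for_reply, disc_cookie=0):
    """Hypothetical client shutdown: drain all in-flight requests, then disconnect."""
    for cookie in list(pending_cookies):
        wait_for_reply(cookie)  # consume the reply, whether success or EIO
    send(build_disc_request(disc_cookie))
```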

Comment 6 Eric Blake 2023-02-24 17:15:04 UTC
Laszlo asked me a question which led me to find a potential data corruption bug introduced at the same time as the assertion failure.  It can happen if a second client connects in a narrow window: thread 1 of the first client checks the connection status, thread 2 of the first client kills the connection, and then thread 1 tries to flush its pending output buffer on the stale fd, which now points to the socket allocated by the second client connecting.
https://listman.redhat.com/archives/libguestfs/2023-February/030871.html

The window is rather narrow, so it is hard to argue whether a client could actually intentionally abuse it to the point of corrupting data of a peer client rather than crashing nbdkit with an assertion failure, but this race should be fixed at the same time.
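
The fd reuse behind this race is easy to show in isolation. The following is a minimal single-threaded sketch, assuming a POSIX system where the kernel hands out the lowest free descriptor number; `stale_fd_demo` is a hypothetical helper and does not reproduce nbdkit's threading, only the effect of writing through a remembered integer fd after it has been closed and reallocated.

```python
import os
import socket


def stale_fd_demo(payload=b"reply meant for client 1"):
    # First client: keep only the integer fd, as a worker thread might.
    first, first_peer = socket.socketpair()
    stale_fd = first.fileno()

    # Another thread tears the first connection down.
    first.close()

    # A second client connects; its socket may get the same fd number.
    second, second_peer = socket.socketpair()
    reused = second.fileno() == stale_fd

    received = None
    if reused:
        # The late flush on the stale integer goes to the second client.
        os.write(stale_fd, payload)
        received = second_peer.recv(len(payload))

    for s in (first_peer, second, second_peer):
        s.close()
    return reused, received
```

When the number is reused, the bytes intended for the first client arrive at the second client's peer, which is the corruption scenario described above.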

Comment 9 mxie@redhat.com 2023-03-22 14:43:05 UTC
Reproduced the bug with nbdkit-1.32.5-4.el9.x86_64 and libnbd-1.14.2-1.el9.x86_64

Steps to reproduce:
1. # nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
.....
nbdkit: pattern.1: error: injecting EIO error into pread
nbdkit: pattern.14: debug: error-inject: pread count=262144 offset=4718592
nbdkit: pattern.14: debug: pattern: pread count=262144 offset=4718592
nbdkit: pattern.1: debug: sending error reply: Input/output error
nbdkit: pattern.4: debug: error-inject: pread count=262144 offset=4980736
nbdkit: pattern.4: debug: pattern: pread count=262144 offset=4980736
nbdkit: pattern.3: error: write data: NBD_CMD_READ: Broken pipe
nbdkit: pattern.3: debug: exiting worker thread pattern.3
nbdkit: pattern.0: debug: exiting worker thread pattern.0
nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed.
nbdkit: pattern.14: debug: exiting worker thread pattern.14

Result: The third thread triggers the assertion failure when the nbdkit client hangs up abruptly

Verified the bug with nbdkit-server-1.33.11-1.el9.x86_64 and libnbd-1.15.12-1.el9.x86_64

Steps:
1. # nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
nbdkit: pattern.5: error: write error reply: Bad file descriptor
nbdkit: pattern.5: debug: exiting worker thread pattern.5
nbdkit: pattern.9: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.9: debug: exiting worker thread pattern.9
nbdkit: pattern.10: error: write error reply: Bad file descriptor
nbdkit: pattern.15: debug: exiting worker thread pattern.15
nbdkit: pattern.10: debug: exiting worker thread pattern.10
nbdkit: pattern.11: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.12: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.11: debug: exiting worker thread pattern.11
nbdkit: pattern.12: debug: exiting worker thread pattern.12
nbdkit: pattern[1]: debug: error-inject: finalize
nbdkit: pattern[1]: debug: pattern: finalize
nbdkit: debug: error-inject: cleanup
nbdkit: debug: pattern: cleanup
nbdkit: debug: pattern: unload plugin
nbdkit: debug: error-inject: unload filter

Result: nbdkit can exit gracefully

Comment 11 Richard W.M. Jones 2023-04-12 16:46:38 UTC
This bug is in a confusing state.  Shouldn't it be added to an erratum (automatically)?

Comment 14 mxie@redhat.com 2023-05-05 03:52:19 UTC
Verified the bug with nbdkit-server-1.34.1-1.el9.x86_64 and libnbd-1.16.0-1.el9.x86_64

Steps:
1. # nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
....
nbdkit: pattern.10: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.10: debug: exiting worker thread pattern.10
nbdkit: pattern.12: error: write error reply: Bad file descriptor
nbdkit: pattern.12: debug: exiting worker thread pattern.12
nbdkit: pattern.13: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.13: debug: exiting worker thread pattern.13
nbdkit: pattern.7: error: write error reply: Bad file descriptor
nbdkit: pattern.7: debug: exiting worker thread pattern.7
nbdkit: pattern.8: error: write error reply: Bad file descriptor
nbdkit: pattern.8: debug: exiting worker thread pattern.8
nbdkit: pattern.9: error: write error reply: Bad file descriptor
nbdkit: pattern.9: debug: exiting worker thread pattern.9
nbdkit: pattern[1]: debug: error-inject: finalize
nbdkit: pattern[1]: debug: pattern: finalize
nbdkit: debug: error-inject: cleanup
nbdkit: debug: pattern: cleanup
nbdkit: debug: pattern: unload plugin
nbdkit: debug: error-inject: unload filter


Result: nbdkit can exit gracefully; moving the bug from ON_QA to VERIFIED

Comment 16 errata-xmlrpc 2023-11-07 08:28:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (nbdkit bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6374

