Bug 2173054
Summary: | [abrt] nbdkit-server: raw_send_socket(): nbdkit killed by SIGABRT | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Eric Blake <eblake> |
Component: | nbdkit | Assignee: | Eric Blake <eblake> |
Status: | CLOSED ERRATA | QA Contact: | mxie <mxie> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 9.2 | CC: | eblake, extras-qa, lersek, mxie, rjones, tzheng, virt-maint, vwu, xiaodwan |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Unspecified | | |
URL: | https://retrace.fedoraproject.org/faf/reports/bthash/7ab51c0d92763b4d5af6b7989e05708c45a535e8 | | |
Whiteboard: | abrt_hash:363ab534cc1daec2aa5c7dd72faf2da433da482f;VARIANT_ID=workstation; | | |
Fixed In Version: | nbdkit-1.33.11-1.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | 2173047 | Environment: | |
Last Closed: | 2023-11-07 08:28:48 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 2168629 | | |
Bug Blocks: | | | |

Description
Eric Blake
2023-02-23 20:00:55 UTC
Unfortunately, Monday is the beginning of the exception phase for RHEL 9.2. If we had exception+ then we could fix this. It probably does not affect virt-v2v or any layered products. But we're still waiting on Eric's analysis of the bug, and that might change if he thinks it is more serious.

```
nbdkit: pattern.1: debug: error-inject: pread count=262144 offset=4194304
nbdkit: pattern.1: error: injecting EIO error into pread
nbdkit: pattern.1: debug: sending error reply: Input/output error
nbdkit: pattern.0: debug: pattern: pread count=262144 offset=4456448
nbdkit: pattern.2: error: write data: NBD_CMD_READ: Broken pipe
nbdkit: pattern.2: debug: exiting worker thread pattern.2
nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed.
```

Looks like nbdcopy is peppering the server with multiple requests, but hanging up early as soon as one request hits EIO. Other pending requests that do succeed happen to get EPIPE because the client is already gone, and they change `sock` to -1 to reflect this fact, even before we can detect clean shutdown. Perhaps libnbd can be nicer and send `NBD_CMD_DISC` after read errors rather than abruptly hanging up, but the server should NOT be crashing. Fortunately, the crash is only on exit (the use of `--exit-with-parent` shows that no other client will be trying to connect), rather than during the data-serving phase. I'm playing with ideas for how to patch this upstream...

The crash is more serious when `--exit-with-parent` is not in use. If a single server allows parallel clients, any one of the clients can trigger the EPIPE/SIGABRT scenario by hanging up early with large in-flight read requests. That should only tear down that client's own connection, but the SIGABRT tears down the entire nbdkit process and denies service to all other currently connected clients.
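The crash pattern above (one worker thread gets EPIPE and marks the socket dead, and a sibling thread still sending its reply then reaches `assert (sock >= 0)`) can be illustrated with a small standalone sketch. This is a hypothetical illustration, not the upstream nbdkit patch: `struct conn_sketch`, `conn_sketch_init()`, `send_sketch()` and `conn_sketch_close()` are invented names, and reporting `EBADF` for writes after the connection has been marked dead is an assumption of the sketch (it is merely consistent with the "Bad file descriptor" messages in the verification logs).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection state (not nbdkit's real structure). */
struct conn_sketch {
  pthread_mutex_t write_lock; /* serializes writers and guards sock */
  int sock;                   /* set to -1 once a write hits EPIPE */
  int fd;                     /* real descriptor, closed only at teardown */
};

static void
conn_sketch_init (struct conn_sketch *c, int fd)
{
  assert (fd >= 0);           /* precondition on the caller, not the race */
  pthread_mutex_init (&c->write_lock, NULL);
  c->sock = fd;
  c->fd = fd;
}

/* Send all of buf.  Where the crashing code asserted sock >= 0, this
 * sketch treats sock == -1 as an ordinary error (EBADF), so a worker
 * that loses the race against another thread's EPIPE just fails its
 * reply instead of aborting the whole process. */
static int
send_sketch (struct conn_sketch *c, const void *buf, size_t len)
{
  const char *p = buf;
  int r = 0;

  pthread_mutex_lock (&c->write_lock);
  while (len > 0) {
    if (c->sock < 0) {        /* peer already gone */
      errno = EBADF;
      r = -1;
      break;
    }
    ssize_t n = send (c->sock, p, len, MSG_NOSIGNAL); /* no SIGPIPE */
    if (n == -1) {
      if (errno == EINTR)
        continue;
      if (errno == EPIPE)
        c->sock = -1;         /* mark dead; fd stays open until teardown */
      r = -1;
      break;
    }
    p += n;
    len -= (size_t) n;
  }
  int saved_errno = errno;    /* keep the send error across the unlock */
  pthread_mutex_unlock (&c->write_lock);
  errno = saved_errno;
  return r;
}

/* Teardown, called once no worker thread can still be sending. */
static void
conn_sketch_close (struct conn_sketch *c)
{
  close (c->fd);
  pthread_mutex_destroy (&c->write_lock);
}
```

For example, with a `socketpair()` whose peer end has been closed, the first `send_sketch()` fails with `EPIPE` and marks the connection dead; a second call then fails with `EBADF` rather than aborting. Deferring `close()` to teardown, instead of closing in the error path, also avoids handing the descriptor number back to the kernel while other threads might still use it.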
I'm not sure if that ranks as a CVE, though: either you are using TLS (so the only client that can trigger the problem already has the same privileges as all other clients that were able to connect; there is no privilege escalation boundary), or you are not (at which point there are plenty of other ways for one client to starve others, whether or not we patch this SIGABRT).

Upstream patch proposed: https://listman.redhat.com/archives/libguestfs/2023-February/030855.html

It may also be worth cloning this bug to libnbd to have nbdcopy gracefully consume ALL pending replies and issue a clean `NBD_CMD_DISC`, rather than abruptly hanging up on the server at the first EIO, since not all NBD servers might be as graceful as we intend nbdkit to be.

Laszlo asked me a question which led me to find a potential data corruption bug introduced at the same time as the assertion failure. Thread 1 of the first client checks the connection status; thread 2 of the first client then kills the connection; if a second client connects in that window, thread 1 goes on to flush its pending output buffer on the stale fd, which now refers to the socket allocated for the second client. https://listman.redhat.com/archives/libguestfs/2023-February/030871.html

The window is rather narrow, so it is hard to say whether a client could intentionally abuse it to corrupt a peer client's data, rather than merely crashing nbdkit with an assertion failure, but this race should be fixed at the same time.

Reproduce the bug with nbdkit-1.32.5-4.el9.x86_64 and libnbd-1.14.2-1.el9.x86_64

Steps to reproduce:
1. Run nbdcopy against nbdkit, with the error filter injecting EIO into half of the reads:

```
# nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
.....
nbdkit: pattern.1: error: injecting EIO error into pread
nbdkit: pattern.14: debug: error-inject: pread count=262144 offset=4718592
nbdkit: pattern.14: debug: pattern: pread count=262144 offset=4718592
nbdkit: pattern.1: debug: sending error reply: Input/output error
nbdkit: pattern.4: debug: error-inject: pread count=262144 offset=4980736
nbdkit: pattern.4: debug: pattern: pread count=262144 offset=4980736
nbdkit: pattern.3: error: write data: NBD_CMD_READ: Broken pipe
nbdkit: pattern.3: debug: exiting worker thread pattern.3
nbdkit: pattern.0: debug: exiting worker thread pattern.0
nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed.
nbdkit: pattern.14: debug: exiting worker thread pattern.14
```

Result: a third thread triggers the assertion failure when the client (nbdcopy) hangs up abruptly.

Verify the bug with nbdkit-server-1.33.11-1.el9.x86_64 and libnbd-1.15.12-1.el9.x86_64

Steps:
1. Run the same command:

```
# nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
nbdkit: pattern.5: error: write error reply: Bad file descriptor
nbdkit: pattern.5: debug: exiting worker thread pattern.5
nbdkit: pattern.9: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.9: debug: exiting worker thread pattern.9
nbdkit: pattern.10: error: write error reply: Bad file descriptor
nbdkit: pattern.15: debug: exiting worker thread pattern.15
nbdkit: pattern.10: debug: exiting worker thread pattern.10
nbdkit: pattern.11: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.12: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.11: debug: exiting worker thread pattern.11
nbdkit: pattern.12: debug: exiting worker thread pattern.12
nbdkit: pattern[1]: debug: error-inject: finalize
nbdkit: pattern[1]: debug: pattern: finalize
nbdkit: debug: error-inject: cleanup
nbdkit: debug: pattern: cleanup
nbdkit: debug: pattern: unload plugin
nbdkit: debug: error-inject: unload filter
```

Result: nbdkit can exit gracefully.

This bug is in a confusing state. Shouldn't it be added to an erratum (automatically)?

Verify the bug with nbdkit-server-1.34.1-1.el9.x86_64 and libnbd-1.16.0-1.el9.x86_64

Steps:
1. Run the same command:

```
# nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
....
nbdkit: pattern.10: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.10: debug: exiting worker thread pattern.10
nbdkit: pattern.12: error: write error reply: Bad file descriptor
nbdkit: pattern.12: debug: exiting worker thread pattern.12
nbdkit: pattern.13: error: write reply: NBD_CMD_READ: Bad file descriptor
nbdkit: pattern.13: debug: exiting worker thread pattern.13
nbdkit: pattern.7: error: write error reply: Bad file descriptor
nbdkit: pattern.7: debug: exiting worker thread pattern.7
nbdkit: pattern.8: error: write error reply: Bad file descriptor
nbdkit: pattern.8: debug: exiting worker thread pattern.8
nbdkit: pattern.9: error: write error reply: Bad file descriptor
nbdkit: pattern.9: debug: exiting worker thread pattern.9
nbdkit: pattern[1]: debug: error-inject: finalize
nbdkit: pattern[1]: debug: pattern: finalize
nbdkit: debug: error-inject: cleanup
nbdkit: debug: pattern: cleanup
nbdkit: debug: pattern: unload plugin
nbdkit: debug: error-inject: unload filter
```

Result: nbdkit can exit gracefully; moving the bug from ON_QA to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (nbdkit bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6374
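The data-corruption race that Laszlo's question uncovered depends on descriptor-number reuse: Linux hands out the lowest-numbered free descriptor from `socket()` and `accept()`, so a descriptor closed by one thread can immediately be reissued to a new client while another thread still holds the old number. The hypothetical sketch below (function names invented for illustration) demonstrates that reuse, plus one common mitigation. The mitigation is an assumption here, not necessarily what the upstream patch does: call `shutdown()` instead of `close()` when the connection dies, which keeps the number reserved so a stale writer gets `EPIPE`, and close the descriptor only once every worker thread is done with it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hazard: after close(), the next socket() (or accept()) can return the
 * same descriptor number that a racing thread still holds. */
static bool
fd_number_reused_after_close (void)
{
  int sv[2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) == -1)
    return false;
  int stale = sv[0];                  /* number a worker still holds */
  close (sv[0]);                      /* another thread kills the conn */
  int fresh = socket (AF_UNIX, SOCK_STREAM, 0); /* "second client" */
  bool reused = fresh == stale;       /* stale writes would hit fresh */
  if (fresh >= 0)
    close (fresh);
  close (sv[1]);
  return reused;
}

/* Mitigation sketch: shutdown() instead of close().  The descriptor
 * number stays allocated, so a stale writer fails with EPIPE rather than
 * writing into another client's stream.  Returns the stale writer's
 * errno (0 if the write unexpectedly succeeded, -1 on setup failure). */
static int
stale_write_after_shutdown (void)
{
  int sv[2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) == -1)
    return -1;
  int stale = sv[0];
  shutdown (sv[0], SHUT_RDWR);        /* connection is dead ... */
  int fresh = socket (AF_UNIX, SOCK_STREAM, 0);
  assert (fresh != stale);            /* ... but its number is not reused */
  int err = 0;
  if (send (stale, "x", 1, MSG_NOSIGNAL) == -1) /* MSG_NOSIGNAL: no SIGPIPE */
    err = errno;
  if (fresh >= 0)
    close (fresh);
  close (sv[0]);                      /* final close at teardown */
  close (sv[1]);
  return err;
}
```

In a single-threaded process on Linux, `fd_number_reused_after_close()` returns true, showing how a stale writer would land on the new socket, while `stale_write_after_shutdown()` returns `EPIPE`. The trade-off of this approach is that teardown must know when the last thread has stopped using the descriptor, which usually means reference counting or joining the worker threads before the final `close()`.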