
Bug 1679818

Summary: stratisd panics due to single device failure: "thread 'main' panicked at 'Kernel must return at least 8 values from thin pool status'"
Product: Red Hat Enterprise Linux 8
Reporter: Corey Marthaler <cmarthal>
Component: stratisd
Assignee: mulhern <amulhern>
Status: CLOSED ERRATA
QA Contact: Storage QE <storage-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.0
CC: amulhern, coughlan, rhandlin
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-05 21:04:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Corey Marthaler 2019-02-21 22:59:03 UTC
Description of problem:

[root@hayes-01 ~]# stratis blockdev list
Pool Name  Device Node         Physical Size   State   Tier
my_pool1   /dev/sde1                1.82 TiB  In-use   Data
my_pool2   /dev/sdf1                1.82 TiB  In-use   Data
my_pool2   /dev/sdg1                1.82 TiB  In-use   Data
my_pool2   /dev/sdh1                1.82 TiB  In-use   Data
my_pool3   /dev/sdd1                1.82 TiB  In-use   Data
my_pool4   /dev/sdb1                1.82 TiB  In-use   Data
my_pool4   /dev/sdc1                1.82 TiB  In-use  Cache
my_pool4   /dev/sdi1                1.82 TiB  In-use   Data
my_pool5   /dev/MY_VG/lvol0         1.82 TiB  In-use   Data

[root@hayes-01 ~]# df -h
Filesystem                                                                                       Size  Used Avail Use% Mounted on
/dev/mapper/stratis-1-3354bab7fa444f1294b7fd807127781c-thin-fs-d9fbf3d0e9c44e69bc33e69862efa655  1.0T  7.2G 1017G   1% /mnt/stratis1
/dev/mapper/stratis-1-14b9879b40c2497187b0c76c446804d0-thin-fs-041acd0d91bd4fa79bf10457b6369c24  1.0T  7.2G 1017G   1% /mnt/stratis2
/dev/mapper/stratis-1-8d7300e29a4c46048b00e059efe3a2e3-thin-fs-092e282222244aa0bce3c6b1a56e20e0  1.0T  7.2G 1017G   1% /mnt/stratis3
/dev/mapper/stratis-1-d76c5196f2e8492092a48ef2e2ed89ff-thin-fs-7a10d4f7b3b24d4aadc65cd8bb5612f2  1.0T  7.2G 1017G   1% /mnt/stratis4
/dev/mapper/stratis-1-36a22ef6d12347dfb0d58590e0a19de4-thin-fs-a9606e209a2b4963b3092fc3bc8c7018  1.0T  7.2G 1017G   1% /mnt/stratis5

# Unmounted the fs whose device is about to be failed
[root@hayes-01 ~]# umount /mnt/stratis5



# FAILED /dev/sdj



Feb 21 16:41:07 hayes-01 kernel: XFS (dm-33): Unmounting Filesystem
Feb 21 16:42:02 hayes-01 kernel: sd 0:2:9:0: rejecting I/O to offline device
Feb 21 16:42:02 hayes-01 kernel: print_req_error: 248 callbacks suppressed
Feb 21 16:42:02 hayes-01 kernel: print_req_error: I/O error, dev sdj, sector 2088 flags 0
Feb 21 16:42:02 hayes-01 kernel: sd 0:2:9:0: rejecting I/O to offline device
Feb 21 16:42:02 hayes-01 kernel: print_req_error: I/O error, dev sdj, sector 40 flags 0
Feb 21 16:42:02 hayes-01 kernel: sd 0:2:9:0: rejecting I/O to offline device
Feb 21 16:42:02 hayes-01 kernel: print_req_error: I/O error, dev sdj, sector 2088 flags 0
Feb 21 16:42:02 hayes-01 kernel: sd 0:2:9:0: rejecting I/O to offline device
Feb 21 16:42:02 hayes-01 kernel: print_req_error: I/O error, dev sdj, sector 2088 flags 0
Feb 21 16:42:02 hayes-01 kernel: sd 0:2:9:0: rejecting I/O to offline device
Feb 21 16:42:02 hayes-01 kernel: print_req_error: I/O error, dev sdj, sector 40 flags 0
Feb 21 16:42:14 hayes-01 kernel: sd 0:2:9:0: rejecting I/O to offline device
Feb 21 16:42:14 hayes-01 kernel: print_req_error: I/O error, dev sdj, sector 10280 flags 1
Feb 21 16:42:14 hayes-01 kernel: device-mapper: thin: 253:32: metadata operation 'dm_pool_commit_metadata' failed: error = -5
Feb 21 16:42:14 hayes-01 systemd[1]: Starting dnf makecache...
Feb 21 16:42:14 hayes-01 kernel: device-mapper: thin: 253:32: aborting current metadata transaction
Feb 21 16:42:14 hayes-01 kernel: sd 0:2:9:0: rejecting I/O to offline device
Feb 21 16:42:14 hayes-01 kernel: print_req_error: I/O error, dev sdj, sector 10280 flags 0
Feb 21 16:42:14 hayes-01 kernel: device-mapper: thin: 253:32: failed to abort metadata transaction
Feb 21 16:42:14 hayes-01 kernel: device-mapper: thin: 253:32: switching pool to fail mode
Feb 21 16:42:14 hayes-01 kernel: device-mapper: thin metadata: couldn't read superblock
Feb 21 16:42:14 hayes-01 kernel: device-mapper: thin: 253:32: failed to set 'needs_check' flag in metadata
Feb 21 16:42:14 hayes-01 kernel: device-mapper: thin: 253:32: dm_pool_get_metadata_transaction_id returned -22
Feb 21 16:42:14 hayes-01 stratisd[20221]: thread 'main' panicked at 'Kernel must return at least 8 values from thin pool status', /builddir/build/BUILD/stratisd-1.0.3/vendor/devicemapper/src/thinpooldev.rs:488:9
Feb 21 16:42:14 hayes-01 stratisd[20221]: note: Run with `RUST_BACKTRACE=1` for a backtrace.
Feb 21 16:42:14 hayes-01 systemd[1]: stratisd.service: Main process exited, code=exited, status=101/n/a
Feb 21 16:42:14 hayes-01 systemd[1]: stratisd.service: Failed with result 'exit-code'.

[root@hayes-01 ~]# systemctl status stratisd
● stratisd.service - A daemon that manages a pool of block devices to create flexible file systems
   Loaded: loaded (/usr/lib/systemd/system/stratisd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2019-02-21 16:42:14 CST; 8min ago
     Docs: man:stratisd(8)
  Process: 20221 ExecStart=/usr/libexec/stratisd --debug (code=exited, status=101)
 Main PID: 20221 (code=exited, status=101)

Feb 21 16:35:40 hayes-01.lab.msp.redhat.com stratisd[20221]:         }: 0,
Feb 21 16:35:40 hayes-01.lab.msp.redhat.com stratisd[20221]:         DmNameBuf {
Feb 21 16:35:40 hayes-01.lab.msp.redhat.com stratisd[20221]:             inner: "stratis-1-private-8d7300e29a4c46048b00e059efe3a2e3-flex-thinmeta"
Feb 21 16:35:40 hayes-01.lab.msp.redhat.com stratisd[20221]:         }: 0
Feb 21 16:35:40 hayes-01.lab.msp.redhat.com stratisd[20221]:     }
Feb 21 16:35:40 hayes-01.lab.msp.redhat.com stratisd[20221]: }
Feb 21 16:42:14 hayes-01.lab.msp.redhat.com stratisd[20221]: thread 'main' panicked at 'Kernel must return at least 8 values from thin pool status', /builddir/build/BUILD/stratisd-1.0.3/vendor/devicemapper/src/thinpooldev.rs:488:9
Feb 21 16:42:14 hayes-01.lab.msp.redhat.com stratisd[20221]: note: Run with `RUST_BACKTRACE=1` for a backtrace.
Feb 21 16:42:14 hayes-01.lab.msp.redhat.com systemd[1]: stratisd.service: Main process exited, code=exited, status=101/n/a
Feb 21 16:42:14 hayes-01.lab.msp.redhat.com systemd[1]: stratisd.service: Failed with result 'exit-code'.


[root@hayes-01 ~]# stratis pool list
Execution failure caused by:
Message recipient disconnected from message bus without replying

[root@hayes-01 ~]# stratis fs list
Execution failure caused by:
Could not get owner of name 'org.storage.stratis1': no such name
    which in turn caused:
The name org.storage.stratis1 was not provided by any .service files

Most likely stratis is unable to connect to the stratisd D-Bus service.




Version-Release number of selected component (if applicable):
stratisd-1.0.3-1.el8.x86_64
stratis-cli-1.0.2-1.el8.noarch

Comment 2 mulhern 2019-02-27 21:25:30 UTC
There is a further upstream issue, https://github.com/stratis-storage/devicemapper-rs/issues/431. We have reproduced
the problem and, using the code merged in https://github.com/stratis-storage/devicemapper-rs/pull/417, obtained a bit
more information about what the unexpected value actually is.

Comment 3 mulhern 2019-03-01 13:57:23 UTC
Also, further upstream issue: https://github.com/stratis-storage/stratisd/issues/1148.

Comment 4 mulhern 2019-04-17 13:10:52 UTC
The newest release of devicemapper includes a fix that properly parses the previously unparsed ioctl return value that was causing the assertion error: https://github.com/stratis-storage/devicemapper-rs/issues/431.
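
For illustration only, here is a minimal sketch of the kind of parsing change described above. It is not the actual devicemapper-rs code. It assumes the unexpected status value is the single word `Fail`, which the kernel's thin-pool target reports once a pool has switched to fail mode (as the log in the description shows happening just before the panic). The names `ThinPoolStatus`, `WorkingStatus`, and `parse_thin_pool_status` are chosen for this example, and only a subset of the status fields is modeled.

```rust
/// Subset of the fields a healthy thin pool reports (illustrative only).
#[derive(Debug, PartialEq)]
pub struct WorkingStatus {
    pub transaction_id: u64,
    pub used_meta: u64,
    pub total_meta: u64,
    pub used_data: u64,
    pub total_data: u64,
}

/// A pool is either working or has dropped into fail mode.
#[derive(Debug, PartialEq)]
pub enum ThinPoolStatus {
    Working(WorkingStatus),
    Fail,
}

/// Parse a "<used>/<total>" field into a pair of block counts.
fn parse_ratio(field: &str) -> Result<(u64, u64), String> {
    let (used, total) = field
        .split_once('/')
        .ok_or_else(|| format!("expected <used>/<total>, got {:?}", field))?;
    let used = used
        .parse::<u64>()
        .map_err(|e| format!("bad used count {:?}: {}", used, e))?;
    let total = total
        .parse::<u64>()
        .map_err(|e| format!("bad total count {:?}: {}", total, e))?;
    Ok((used, total))
}

/// Parse the target-specific part of a thin pool status line.
/// Returns an error for malformed input instead of panicking.
pub fn parse_thin_pool_status(line: &str) -> Result<ThinPoolStatus, String> {
    let fields: Vec<&str> = line.split_whitespace().collect();

    // Assumption: a failed pool reports only the word "Fail".
    if fields.len() == 1 && fields[0] == "Fail" {
        return Ok(ThinPoolStatus::Fail);
    }

    // A working pool is expected to report at least 8 values.
    if fields.len() < 8 {
        return Err(format!(
            "expected at least 8 status values, got {}: {:?}",
            fields.len(),
            line
        ));
    }

    let transaction_id = fields[0]
        .parse::<u64>()
        .map_err(|e| format!("bad transaction id {:?}: {}", fields[0], e))?;
    let (used_meta, total_meta) = parse_ratio(fields[1])?;
    let (used_data, total_data) = parse_ratio(fields[2])?;

    Ok(ThinPoolStatus::Working(WorkingStatus {
        transaction_id,
        used_meta,
        total_meta,
        used_data,
        total_data,
    }))
}
```

With this shape, a caller can match on `ThinPoolStatus::Fail` and report the pool as failed, while a short or malformed status line surfaces as an `Err` that the daemon can log rather than a panic that takes down the whole service.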

Comment 5 mulhern 2019-04-17 13:12:14 UTC
The stratisd PR that handles the devicemapper changes has been merged: https://github.com/stratis-storage/stratisd/pull/1461.

Comment 8 Jakub Krysl 2019-07-31 08:13:56 UTC
Tested with stratisd-1.0.4-2.el8.x86_64.

stratisd no longer panics and keeps running, but stratis-cli is unusable. The only action I found that still works is creating a new pool. Anything else, including trying to remove the new pool, ends with:
no total physical size computed for pool with uuid 7386cbb0-b9ab-476b-81d8-55e5818c66b0

This behaviour persists even after restarting stratisd.

Restarting the system, whether the device is reconnected or kept removed, results in stratis-cli showing no issues and all commands working again. If the device is not there, stratisd just complains that it is unable to bring up the pool. If it is there (tested with an iSCSI device, disconnecting and reconnecting it), stratisd brings the pool up and it works.



This seems like another issue hidden under the panic, because the panic is gone and stratisd keeps running. Anne, is that correct?
If so, I can close this one as verified and open a new one for stratis-cli.
Thanks

Comment 9 mulhern 2019-07-31 13:44:29 UTC
Jakub,

You're right, we have eliminated the panic, but that exposes a new problem. We already have a new bz for this problem: https://bugzilla.redhat.com/show_bug.cgi?id=1730493.

Comment 10 Jakub Krysl 2019-07-31 14:04:36 UTC
Anne,

Thanks for the fast response and link. I completely forgot about that BZ...

Setting this to VERIFIED as the panic is fixed.

Comment 13 errata-xmlrpc 2019-11-05 21:04:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3414