Bug 1730493 - Stratis-cli should display blockdev, pool, and filesystem information in sub-optimal conditions and generally have fuller function than it currently does
Summary: Stratis-cli should display blockdev, pool, and filesystem information in sub-optimal conditions and generally have fuller function than it currently does
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: stratis-cli
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: John Baublitz
QA Contact: guazhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-16 21:58 UTC by Dennis Keefe
Modified: 2020-04-28 15:42 UTC (History)
3 users

Fixed In Version: 2.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 15:41:56 UTC
Type: Bug
Target Upstream Version:


Attachments


Links
Red Hat Product Errata RHBA-2020:1634 (last updated 2020-04-28 15:42:06 UTC)

Description Dennis Keefe 2019-07-16 21:58:02 UTC
Description of problem:

If a device in a pool fails, stratis-cli may report only that it could not collect information for that pool. It provides no information about the other block devices, pools, or filesystems that may exist.

Version-Release number of selected component (if applicable):
RHEL 8.1
Stratisd and Stratis-cli 1.0.4

How reproducible:
Using two or more devices, create two Stratis pools and one filesystem, then take one backing device offline. A sketch of one possible setup follows; the transcript after it shows the resulting state.
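
The pools in the transcript are backed by loop devices plus /dev/sda. A minimal, hypothetical sketch of preparing such devices, assuming 1 GiB sparse backing files under /var/tmp (file names and loop numbers are illustrative and may differ on your system):

truncate -s 1G /var/tmp/backing{1..4}   # create sparse 1 GiB backing files
for f in /var/tmp/backing{1..4}; do
    losetup --find "$f"                 # attach each file to a free loop device
done
losetup --list                          # confirm which loop devices were assigned

stratis pool create p1 /dev/loop3 /dev/loop4 /dev/sda
stratis pool create p2 /dev/loop1 /dev/loop2
stratis fs create p1 fs1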

[root@9 ~]# stratis fs
Pool Name  Name  Used      Created            Device           UUID                            
p1         fs1   1.95 GiB  Jul 16 2019 13:30  /stratis/p1/fs1  89a2d0c4e0674c198a10cf2ab1ec8584
[root@9 ~]# stratis pool
Name  Total Physical Size  Total Physical Used
p1                  3 GiB             2.01 GiB
p2                  2 GiB               56 MiB
[root@9 ~]# stratis blockdev
Pool Name  Device Node  Physical Size  State  Tier
p1         /dev/loop3           1 GiB  InUse  Data
p1         /dev/loop4           1 GiB  InUse  Data
p1         /dev/sda             1 GiB  InUse  Data
p2         /dev/loop1           1 GiB  InUse  Data
p2         /dev/loop2           1 GiB  InUse  Data
[root@9 ~]# mount /stratis/p1/fs1 /strat1
[root@9 ~]# rsync --progress --recursive /usr/ /strat1
[root@9 ~]# echo offline > /sys/block/sda/device/state
[root@9 ~]# sync

The sync flushes what is left in the page cache, but /dev/sda is offline, so the writes fail.
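
The device state can be confirmed, and the device brought back online after the test, through the same sysfs attribute (this assumes a SCSI device exposing the standard state attribute; the echo below reverses the offline step):

cat /sys/block/sda/device/state              # prints "offline"
echo running > /sys/block/sda/device/state   # restore the device after the test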

As a result the filesystem and device mapper report problems:

kernel: XFS (dm-5): writeback error on sector 1744885488
kernel: device-mapper: thin: 252:3: switching pool to fail mode
kernel: device-mapper: thin metadata: couldn't read superblock
kernel: device-mapper: thin: 252:3: failed to set 'needs_check' flag in metadata
kernel: device-mapper: thin: 252:3: failed to abort metadata transaction
kernel: XFS (dm-5): writeback error on sector 2013366240
kernel: device-mapper: thin metadata: couldn't read superblock
kernel: device-mapper: thin: 252:3: failed to set 'needs_check' flag in metadata
kernel: device-mapper: thin: 252:3: dm_pool_get_metadata_transaction_id returned -22
stratisd[2223]:  WARN libstratis::engine::strat_engine::thinpool::thinpool: Devicemapper could not obtain the status for devicemapper thinpool device 252:3 belonging to pool with UUID c146db62-cb22-4335-aaf6-156ced6db729
stratisd[2223]: ERROR libstratis::engine::strat_engine::thinpool::thinpool: Thinpool status is fail -> Failed

When queried for filesystem, pool, or blockdev information, stratis-cli returns an execution failure:

[root@9 ~]# stratis fs
Execution failure caused by:
no total physical size computed for pool with uuid c146db62-cb22-4335-aaf6-156ced6db729
[root@c9 ~]# stratis pool
Execution failure caused by:
no total physical size computed for pool with uuid c146db62-cb22-4335-aaf6-156ced6db729
[root@c9 ~]# stratis blockdev
Execution failure caused by:
no total physical size computed for pool with uuid c146db62-cb22-4335-aaf6-156ced6db729

In its logs, stratisd periodically dumps its state, which shows that it knows the pool is in a failed state; it also knows what is configured for each filesystem, pool, and blockdev.

stratisd[2223]:  INFO stratisd: Dump timer expired, dumping state
stratisd[2223]: DEBUG stratisd: Engine state:
...    
                 pool_state: Failed,
...
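
On RHEL, stratisd runs as a systemd service, so this periodic state dump can be pulled from the journal; the grep pattern below simply matches the "Dump timer expired" line shown above:

journalctl -u stratisd -b | grep -A 20 'Dump timer expired'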

Expected results:

Stratis-cli should be able to provide most of the filesystem, pool, and device information even if it detects a problem with one or a few data points. In this case, stratis-cli could also report that the pool has failed.

Comment 1 mulhern 2019-07-17 16:05:00 UTC
Note that the stratis-cli failure to report information for existing pools in good condition is caused by the behavior of the D-Bus layer in stratisd.

So, although the problem is described as one of the CLI, a significant portion of the work must occur in stratisd (as well as in the CLI).

There are some upstream issues and a PR:
https://github.com/stratis-storage/stratisd/issues/1148
https://github.com/stratis-storage/stratisd/issues/1267

https://github.com/stratis-storage/stratisd/pull/1540

Comment 2 mulhern 2019-07-24 16:22:15 UTC
New, omnibus, upstream issue: https://github.com/stratis-storage/project/issues/52.

Comment 3 mulhern 2019-07-31 13:43:46 UTC
Also see: https://bugzilla.redhat.com/show_bug.cgi?id=1679818#c8.

The problem here is that virtually all CLI commands will fail once it becomes impossible to use the D-Bus GetManagedObjects call to obtain information about the current state.
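
For reference, GetManagedObjects is the standard org.freedesktop.DBus.ObjectManager method, and it can be exercised directly with busctl. The bus name and object path below assume stratisd 2.x (stratisd 1.x used org.storage.stratis1):

busctl call org.storage.stratis2 /org/storage/stratis2 \
    org.freedesktop.DBus.ObjectManager GetManagedObjects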

Comment 4 Jakub Krysl 2019-10-02 11:32:58 UTC
Mass migration to Guangwu.

Comment 5 mulhern 2019-10-02 13:01:50 UTC
This is really more ASSIGNED than POST: all the teardown work has been done, but the build-up is waiting to happen.

Comment 6 mulhern 2019-10-21 21:37:36 UTC
Upstream PR: https://github.com/stratis-storage/stratisd/pull/1662

Comment 8 guazhang@redhat.com 2019-11-26 04:07:57 UTC
Hello,

The test passes with the fixed version:

#  stratis fs
Pool Name  Name  Used      Created            Device           UUID                            
p1         fs1   3.16 GiB  Nov 25 2019 22:57  /stratis/p1/fs1  1c345365d52f4f47ba5da617a37a5b67
# stratis pool
Name                     Total Physical
p1     21.83 TiB / 3.23 GiB / 21.83 TiB
p2    16.37 TiB / 65.18 MiB / 16.37 TiB
#  stratis blockdev
Pool Name  Device Node  Physical Size  Tier
p1         /dev/sdb          5.46 TiB  Data
p1         /dev/sdc          5.46 TiB  Data
p1         /dev/sdd          5.46 TiB  Data
p1         /dev/sde          5.46 TiB  Data
p2         /dev/sdf          5.46 TiB  Data
p2         /dev/sdg          5.46 TiB  Data
p2         /dev/sdh          5.46 TiB  Data
 
# mount /stratis/p1/fs1 /strat1
# rsync --progress --recursive /usr/ /strat1
# echo offline > /sys/block/sdb/device/state
# sync

# cat  /sys/block/sdb/device/state
offline


#  stratis blockdev
Pool Name  Device Node  Physical Size  Tier
p1         /dev/sdb          5.46 TiB  Data
p1         /dev/sdc          5.46 TiB  Data
p1         /dev/sdd          5.46 TiB  Data
p1         /dev/sde          5.46 TiB  Data
p2         /dev/sdf          5.46 TiB  Data
p2         /dev/sdg          5.46 TiB  Data
p2         /dev/sdh          5.46 TiB  Data

Comment 10 errata-xmlrpc 2020-04-28 15:41:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1634

