Bug 1596129
| Field | Value |
| --- | --- |
| Summary | dmstats list fails after dmstats delete |
| Product | Red Hat Enterprise Linux 7 |
| Component | lvm2 (sub component: dmsetup) |
| Version | 7.6 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Keywords | Regression |
| Target Milestone | rc |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | lvm2-2.02.180-2.el7 |
| Reporter | Jakub Krysl <jkrysl> |
| Assignee | Bryn M. Reeves <bmr> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | agk, bmr, cluster-qe, cmarthal, heinzm, jbrassow, jkrysl, mcsontos, msnitzer, prajnoha, rhandlin, zkabelac |
| Type | Bug |
| Last Closed | 2018-10-30 11:03:47 UTC |
Description (Jakub Krysl, 2018-06-28 09:52:00 UTC)
There's another report of what appears to be the same behaviour in bug 1591504. At first sight this looks like either a change in the kernel, or in the lower-level device-mapper code: libdm-stats is essentially unchanged in 7.6 vs. 7.5, aside from three minor correctness fixes.

Could you post the output of "dmsetup info -c"? This would show which device is which: the error appears to happen during processing of device dm-0:

    dm info  (253:0) [ opencount flush ]   [16384] (*1)
    dm message  (253:0) [ opencount flush ]  @stats_list dmstats [16384] (*1)
    dm_stats_walk_init: initialised flags to 4000000000000
    starting stats walk with GROUP
    dm_stats_walk_init: initialised flags to f000000000000
    starting stats walk with AREA REGION GROUP SKIP
    <backtrace>
    Command failed.

Which suggests rhel_storageqe--74-root (based on typical minor number assignments during boot). This is a bit odd, as it should be unrelated to the preceding delete operation (and both of the remaining devices appear to have been reported correctly). I'll try installing the kernel you are running on a system here and see if I can get it to reproduce.

    # dmsetup info -c
    Name                     Maj Min Stat Open Targ Event  UUID
    rhel_storageqe--74-home  253   2 L--w    1    1      0 LVM-Os1HfUHxiMDYdf2cGs6ZDkf8nEngz4bV5YcYkwH6hTMjd2O7Untd0nZIB7pQ66Lj
    rhel_storageqe--74-swap  253   1 L--w    2    1      0 LVM-Os1HfUHxiMDYdf2cGs6ZDkf8nEngz4bV7BDwnFCSvOxFrpB2nkd8mv3GH7N8TWyA
    rhel_storageqe--74-root  253   0 L--w    1    1      0 LVM-Os1HfUHxiMDYdf2cGs6ZDkf8nEngz4bVDsHaQtTn7VckqY8BMWV6ugJ8Wxmd5PBS

The problem does not appear to follow the kernel version:

    [root@bmr-rhel7-vm1 ~]# uname -r
    3.10.0-308.el7.x86_64
    [root@bmr-rhel7-vm1 ~]# dmstats create --alldevices --precise
    rhel-lv_test: Created new region with 1 area(s) as region ID 0
    mpatha: Created new region with 1 area(s) as region ID 0
    rhel-swap: Created new region with 1 area(s) as region ID 0
    rhel-root: Created new region with 1 area(s) as region ID 0
    rhel-lv_xfs0: Created new region with 1 area(s) as region ID 0
    [root@bmr-rhel7-vm1 ~]# dmstats list --histogram
    Name         GrpID RgID ObjType RgStart RgSize  #Areas ArSize  #Bins Histogram Bounds
    rhel-lv_test     -    0 region        0   1.00g      1   1.00g     0
    mpatha           -    0 region        0   1.00g      1   1.00g     0
    rhel-swap        -    0 region        0 820.00m      1 820.00m     0
    rhel-root        -    0 region        0  10.71g      1  10.71g     0
    rhel-lv_xfs0     -    0 region        0   3.10g      1   3.10g     0
    [root@bmr-rhel7-vm1 ~]# dmstats delete --allregions rhel-lv_xfs0
    [root@bmr-rhel7-vm1 ~]# dmstats list --histogram
    Name         GrpID RgID ObjType RgStart RgSize  #Areas ArSize  #Bins Histogram Bounds
    rhel-lv_test     -    0 region        0   1.00g      1   1.00g     0
    mpatha           -    0 region        0   1.00g      1   1.00g     0
    rhel-swap        -    0 region        0 820.00m      1 820.00m     0
    rhel-root        -    0 region        0  10.71g      1  10.71g     0

Vs:

    [root@bmr-rhel7-vm1 ~]# uname -r
    3.10.0-915.el7.x86_64
    [root@bmr-rhel7-vm1 ~]# dmstats create --alldevices --precise
    rhel-lv_test: Created new region with 1 area(s) as region ID 0
    mpatha: Created new region with 1 area(s) as region ID 0
    rhel-swap: Created new region with 1 area(s) as region ID 0
    rhel-root: Created new region with 1 area(s) as region ID 0
    rhel-lv_xfs0: Created new region with 1 area(s) as region ID 0
    [root@bmr-rhel7-vm1 ~]# dmstats list --histogram
    Name         GrpID RgID ObjType RgStart RgSize  #Areas ArSize  #Bins Histogram Bounds
    rhel-lv_test     -    0 region        0   1.00g      1   1.00g     0
    mpatha           -    0 region        0   1.00g      1   1.00g     0
    rhel-swap        -    0 region        0 820.00m      1 820.00m     0
    rhel-root        -    0 region        0  10.71g      1  10.71g     0
    rhel-lv_xfs0     -    0 region        0   3.10g      1   3.10g     0
    [root@bmr-rhel7-vm1 ~]# dmstats delete --allregions rhel-lv_xfs0
    [root@bmr-rhel7-vm1 ~]# dmstats list --histogram
    Name         GrpID RgID ObjType RgStart RgSize  #Areas ArSize  #Bins Histogram Bounds
    rhel-lv_test     -    0 region        0   1.00g      1   1.00g     0
    mpatha           -    0 region        0   1.00g      1   1.00g     0
    rhel-swap        -    0 region        0 820.00m      1 820.00m     0
    rhel-root        -    0 region        0  10.71g      1  10.71g     0
    [root@bmr-rhel7-vm1 ~]# dmstats list
    Name         GrpID RgID ObjType RgStart RgSize  #Areas ArSize  ProgID
    rhel-lv_test     -    0 region        0   1.00g      1   1.00g  dmstats
    mpatha           -    0 region        0   1.00g      1   1.00g  dmstats
    rhel-swap        -    0 region        0 820.00m      1 820.00m  dmstats
    rhel-root        -    0 region        0  10.71g      1  10.71g  dmstats
    [root@bmr-rhel7-vm1 ~]# dmstats list -v
    Name         GrpID RgID ObjType RgStart RgSize  #Areas ArID ArStart ArSize  ProgID
    rhel-lv_test     -    0 area          0   1.00g      1    0       0   1.00g dmstats
    mpatha           -    0 area          0   1.00g      1    0       0   1.00g dmstats
    rhel-swap        -    0 area          0 820.00m      1    0       0 820.00m dmstats
    rhel-root        -    0 area          0  10.71g      1    0       0  10.71g dmstats

What commands came before all this? (from comment #0)

    # dmstats create --alldevices --precise
    rhel_storageqe--74-home: Created new region with 1 area(s) as region ID 1
    rhel_storageqe--74-swap: Created new region with 1 area(s) as region ID 1
    rhel_storageqe--74-root: Created new region with 1 area(s) as region ID 1
    # dmstats list --histogram
    Name                    GrpID RgID ObjType RgStart RgSize  #Areas ArSize  #Bins Histogram Bounds
    rhel_storageqe--74-home     -    0 region        0 407.00g      1 407.00g     0
    rhel_storageqe--74-home     -    1 region        0 407.00g      1 407.00g     0
    rhel_storageqe--74-swap     -    0 region        0   7.75g      1   7.75g     0
    rhel_storageqe--74-swap     -    1 region        0   7.75g      1   7.75g     0
    rhel_storageqe--74-root     -    0 region        0  50.00g      1  50.00g     0
    rhel_storageqe--74-root     -    1 region        0  50.00g      1  50.00g     0

This cannot be the start of the operation: the assigned region IDs are all '1', and the list output shows a pre-existing set of (identical) regions already in place.

OK: this only happens on the very latest lvm2 builds. It is not present in any 2.02.177 build, but is present in the 2.02.179 build I just installed. In fact, I'm not certain the create/delete operations are even necessary: after deleting all regions with the previous build, I updated the lvm2 and device-mapper packages and ran a plain "dmstats list":

    # dmstats list
    Command failed.

This turns out to be a regression caused by an unrelated change that exposed a bug in the stats reporting code in dmsetup: previously, the return status of an internal helper function went unchecked. In one case the helper returns the wrong value, indicating an error when none has occurred (a device is found with no stats regions). The commit that exposed this was:

    commit 3f351466f7d2789b1f480cd0e370f978df8eb09b
    Author: Zdenek Kabelac <zkabelac>
    Date:   Mon Mar 12 11:56:54 2018 +0100

        dmsetup: update _display_info

        Handle error code.

I've now pushed a fix to both the master and stable branches to correct this case:

    commit 29b9ccd261be025aaf75e58e5d2547e818ef22c3 (HEAD -> master)
    Author: Bryn M. Reeves <bmr>
    Date:   Thu Jun 28 14:25:30 2018 +0100

        dmsetup: fix error propagation in _display_info_cols()

        Commit 3f35146 added a check on the value returned by the
        _display_info_cols() function:

          1024         if (!_switches[COLS_ARG])
          1025                 _display_info_long(dmt, &info);
          1026         else
          1027                 r = _display_info_cols(dmt, &info);
          1028
          1029         return r;

        This exposes a bug in the dmstats code in _display_info_cols():
        the fact that a device has no regions is explicitly not an error
        (and is documented as such in the code), but since the return code
        is not changed before leaving the function, it is now treated as
        an error, leading to:

          # dmstats list
          Command failed.

        when no regions exist.

        Set the return code to the correct value before returning.

*** Bug 1591504 has been marked as a duplicate of this bug. ***
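To make the failure mode concrete, here is a condensed sketch of the error-propagation pattern described in the commit message. It is not the verbatim lvm2 source: the function name and body are simplified, and it assumes dmsetup's internal convention of returning 1 for success and 0 for failure. Only dm_stats_list() and dm_stats_get_nr_regions() are real libdm calls.

```c
#include <libdevmapper.h>

/* Condensed sketch of the stats path in _display_info_cols(), assuming
 * the dmsetup convention of 1 = success, 0 = failure. */
static int _display_stats_sketch(struct dm_stats *dms)
{
	int r = 0;	/* default to failure */

	/* Fetching the region table can genuinely fail, so r stays 0. */
	if (!dm_stats_list(dms, NULL))
		goto out;

	if (!dm_stats_get_nr_regions(dms)) {
		/* A device with no stats regions is explicitly not an
		 * error. Before the fix, r was left at 0 on this path;
		 * once commit 3f35146 made the caller check the return
		 * value, "dmstats list" reported "Command failed" for any
		 * device without regions. Setting the success value here
		 * is the essence of the one-line fix. */
		r = 1;
		goto out;
	}

	/* ... walk and report each region (omitted) ... */
	r = 1;
out:
	return r;
}
```

The single out: label with r defaulting to failure is a common pattern throughout dmsetup, which is why one missed success assignment could go unnoticed until the caller started checking the result.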
*** Created attachment 1455333 [details]
test log on clean server

(In reply to Bryn M. Reeves from comment #6)
> What commands came before all this? (from comment #0)
> [...]
> This cannot be the start of the operation - the assigned regions are all
> '1', and the list output indicates a pre-existing set of (identical)
> regions already in place.

That output came from manually reproducing the issue on a freshly provisioned server. A reproducer had been running on the server just before the reservation; its logs are in this attachment.

If this is a duplicate of bug 1591504, then it is not yet fixed:

    Jul 24 16:28:08 host-073 qarshd[25667]: Running cmdline: dmstats create --filemap /mnt/takeover/host-073_load

    [root@host-073 ~]# df -h
    /dev/mapper/centipede2-takeover  2.8G   65M  2.8G   3% /mnt/takeover
    [root@host-073 ~]# ls /mnt/takeover/host-073_load
    /mnt/takeover/host-073_load
    [root@host-073 ~]# dmstats list --group
    Command failed.
    Name          GrpID RgID ObjType RgStart RgSize  #Areas ArSize  ProgID
    host-073_load     0    0 group    33.62m 980.00k      1 980.00k dmstats
    [root@host-073 ~]# echo $?
    1

    3.10.0-925.el7.x86_64
    lvm2-2.02.180-1.el7                       BUILT: Fri Jul 20 12:21:35 CDT 2018
    lvm2-libs-2.02.180-1.el7                  BUILT: Fri Jul 20 12:21:35 CDT 2018
    lvm2-cluster-2.02.180-1.el7               BUILT: Fri Jul 20 12:21:35 CDT 2018
    lvm2-python-boom-0.9-4.el7                BUILT: Fri Jul 20 12:23:30 CDT 2018
    cmirror-2.02.180-1.el7                    BUILT: Fri Jul 20 12:21:35 CDT 2018
    device-mapper-1.02.149-1.el7              BUILT: Fri Jul 20 12:21:35 CDT 2018
    device-mapper-libs-1.02.149-1.el7         BUILT: Fri Jul 20 12:21:35 CDT 2018
    device-mapper-event-1.02.149-1.el7        BUILT: Fri Jul 20 12:21:35 CDT 2018
    device-mapper-event-libs-1.02.149-1.el7   BUILT: Fri Jul 20 12:21:35 CDT 2018
    device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 05:07:18 CST 2017

Crap. I think this is just a patching error on my side: it looks as though I committed the fix to master and pushed, then cherry-picked it into 2018-06-01-stable, and did not push. I will rebase the stable branch and get it pushed today, and talk to Marian about how we can get it included in a build.

    $ git checkout 2018-06-01-stable
    Switched to branch '2018-06-01-stable'
    Your branch is ahead of 'origin/2018-06-01-stable' by 1 commit.
      (use "git push" to publish your local commits)

It looks as though Marian cherry-picked and pushed the commit this morning:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=951676a59eb2b3130abb9eec690206665708b0d0
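The failing exit status in the reproducer above is the same defect seen from the shell: a device with an empty region table poisons the whole listing. For reference, the "no regions is not an error" semantics can also be seen from the caller side of the public libdm-stats API. The following is a minimal, hypothetical sketch ("rhel-root" is a placeholder device name and error handling is condensed); it exits 0 whether or not any regions are registered, which is the behaviour dmstats list regains with the fix.

```c
#include <stdio.h>
#include <inttypes.h>
#include <libdevmapper.h>

/* List stats regions for one device, treating an empty region table as
 * success rather than failure. */
int main(void)
{
	struct dm_stats *dms;
	int r = 1;	/* shell convention: 0 = success, 1 = failure */

	/* "dmstats" is the program_id used to tag and filter regions. */
	if (!(dms = dm_stats_create("dmstats")))
		return 1;

	if (!dm_stats_bind_name(dms, "rhel-root"))	/* placeholder device */
		goto out;

	/* Fetch the current region table (the @stats_list message). */
	if (!dm_stats_list(dms, NULL))
		goto out;

	if (!dm_stats_get_nr_regions(dms))
		printf("no regions registered\n");	/* not an error */
	else
		printf("%" PRIu64 " region(s) registered\n",
		       dm_stats_get_nr_regions(dms));

	r = 0;
out:
	dm_stats_destroy(dms);
	return r;
}
```

Built with something like `gcc -o stats-list stats-list.c -ldevmapper` (an assumed invocation), this succeeds on a region-free device instead of propagating a failure the way the unfixed dmsetup did.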
Marking verified in the latest rpms.

    3.10.0-931.el7.x86_64
    lvm2-2.02.180-2.el7                       BUILT: Wed Aug  1 11:22:48 CDT 2018
    lvm2-libs-2.02.180-2.el7                  BUILT: Wed Aug  1 11:22:48 CDT 2018
    lvm2-cluster-2.02.180-2.el7               BUILT: Wed Aug  1 11:22:48 CDT 2018
    lvm2-python-boom-0.9-5.el7                BUILT: Wed Aug  1 11:24:13 CDT 2018
    cmirror-2.02.180-2.el7                    BUILT: Wed Aug  1 11:22:48 CDT 2018
    device-mapper-1.02.149-2.el7              BUILT: Wed Aug  1 11:22:48 CDT 2018
    device-mapper-libs-1.02.149-2.el7         BUILT: Wed Aug  1 11:22:48 CDT 2018
    device-mapper-event-1.02.149-2.el7        BUILT: Wed Aug  1 11:22:48 CDT 2018
    device-mapper-event-libs-1.02.149-2.el7   BUILT: Wed Aug  1 11:22:48 CDT 2018
    device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 05:07:18 CST 2017

    [root@host-087 ~]# ls /mnt/synced_primary_raid1_2legs_1/host-087_load
    /mnt/synced_primary_raid1_2legs_1/host-087_load
    [root@host-087 ~]# dmstats create --filemap /mnt/synced_primary_raid1_2legs_1/host-087_load
    /mnt/synced_primary_raid1_2legs_1/host-087_load: Created new group with 1 region(s) as group ID 0.
    [root@host-087 ~]# dmstats list --histogram
    Name          GrpID RgID ObjType RgStart RgSize  #Areas ArSize  #Bins Histogram Bounds
    host-087_load     0    0 region   29.73m 980.00k      1 980.00k     0
    host-087_load     0    0 group    29.73m 980.00k      1 980.00k     0
    [root@host-087 ~]# dmstats list --group
    Name          GrpID RgID ObjType RgStart RgSize  #Areas ArSize  ProgID
    host-087_load     0    0 group    29.73m 980.00k      1 980.00k dmstats
    [root@host-087 ~]# dmstats list --region
    Name          GrpID RgID ObjType RgStart RgSize  #Areas ArSize  ProgID
    host-087_load     0    0 region   29.73m 980.00k      1 980.00k dmstats

    ### Does "report" ever display anything?
    [root@host-087 ~]# dmstats report
    [root@host-087 ~]# dmstats report --group
    [root@host-087 ~]# dmstats report --region

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3193