Bug 1629889
| Summary: | Device removal leaves garbage on the device while returning "ok", breaking further attempts to "add" the device back | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Valerii Ponomarov <vponomar> |
| Component: | heketi | Assignee: | John Mulligan <jmulligan> |
| Status: | CLOSED ERRATA | QA Contact: | Valerii Ponomarov <vponomar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | cns-3.10 | CC: | akrishna, hchiramm, kramdoss, madam, ndevos, pprakash, rgeorge, rhs-bugs, rtalur, sankarshan, storage-qa-internal, vinug, vponomar |
| Target Milestone: | --- | | |
| Target Release: | OCS 3.11 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | heketi-7.0.0-12.el7rhgs | Doc Type: | Bug Fix |
| Doc Text: | Previously, heketi ignored pvremove and vgremove errors when a device was removed with the device remove commands. Attempting to add the same disk again then failed because it had not been properly cleaned up in the first place. Heketi no longer ignores pvremove and vgremove errors, ensuring that devices are removed correctly and can be re-added to Heketi after removal. Alternatively, you can use the "--force-forget" flag with the device remove command to ignore any such errors and ensure that the same device can be added back to Heketi. (A usage sketch follows immediately after this table.) | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-24 04:51:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1629575 | | |
| Attachments: | | | |
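
To illustrate the behaviour described in the Doc Text, here is a minimal, hedged sketch of the remove/re-add cycle against a fixed heketi (heketi-7.0.0-12.el7rhgs or later). The device ID is a placeholder, the node ID is reused from the logs below, and the exact subcommand that accepts "--force-forget" may differ between heketi-cli versions; treat this as an illustration, not the verified procedure.

```sh
# Hypothetical IDs: look the real ones up with "heketi-cli topology info".
DEVICE_ID="<device-id>"
NODE_ID="8966bb38131b92389d55e09862ca9cce"   # node ID reused from the logs below

# Drain the device and delete it. With the fix, failures from pvremove/vgremove
# are reported instead of being silently ignored, so a reported success means
# the disk was really cleaned.
heketi-cli device disable "$DEVICE_ID"
heketi-cli device remove "$DEVICE_ID"
heketi-cli device delete "$DEVICE_ID"

# If the disk cannot be cleaned (for example, it has already been pulled),
# the Doc Text describes a --force-forget flag that drops the device from
# heketi's DB despite such errors; check "heketi-cli device --help" to see
# which subcommand accepts it in your version.
#   heketi-cli device delete "$DEVICE_ID" --force-forget

# A properly cleaned (or wiped) disk can then be added back.
heketi-cli device add --name /dev/sdd --node "$NODE_ID"
```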
Description
Valerii Ponomarov 2018-09-17 14:56:40 UTC

Created attachment 1484065 [details]: Heketi DB dump
Adding Heketi DB dump

Created attachment 1484066 [details]: Heketi server logs
Adding Heketi server logs
Also, after seeing the error, I was told to try running the "wipefs" command and retrying. It didn't help; results:

```
[root@vp-ansible-v310-ga-master-0 ~]# heketi-cli device add --name /dev/sdd --node 8966bb38131b92389d55e09862ca9cce
Error: Can't open /dev/sdd exclusively.  Mounted filesystem?

[root@vp-ansible-v310-ga-master-0 ~]# ssh root@vp-ansible-v310-ga-app-cns-1
Last login: Mon Sep 17 08:06:55 2018 from vp-ansible-v310-ga-master-0

[root@vp-ansible-v310-ga-app-cns-1 ~]# wipefs /dev/sdd
offset               type
----------------------------------------------------------------
0x218                LVM2_member   [raid]
                     UUID:  gIk6eH-yHuU-py77-8PCo-lbuk-98fI-K9OveQ

[root@vp-ansible-v310-ga-app-cns-1 ~]# wipefs -a /dev/sdd
wipefs: error: /dev/sdd: probing initialization failed: Device or resource busy

[root@vp-ansible-v310-ga-app-cns-1 ~]# pvscan
  PV /dev/sdd    VG vg_68c9ef3f5a2e31d2976565f9f187a6cf   lvm2 [99.87 GiB / <97.85 GiB free]
  PV /dev/sda2   VG rhel_dhcp46-210                       lvm2 [<39.00 GiB / 0 free]
  PV /dev/sdb1   VG docker-vol                            lvm2 [<40.00 GiB / 0 free]
  PV /dev/sdf    VG vg_1517317e9a1fc96c830c9493c49833f9   lvm2 [99.87 GiB / <97.85 GiB free]
  PV /dev/sde    VG vg_ca1fd84ddfbd19783537ea7d61e39f9b   lvm2 [199.87 GiB / <196.84 GiB free]
  Total: 5 [<478.61 GiB] / in use: 5 [<478.61 GiB] / in no VG: 0 [0 ]

[root@vp-ansible-v310-ga-app-cns-1 ~]# wipefs --force --all /dev/sdf
/dev/sdf: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
[root@vp-ansible-v310-ga-app-cns-1 ~]# wipefs --force --all /dev/sdd
/dev/sdd: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31

[root@vp-ansible-v310-ga-app-cns-1 ~]# pvscan
  PV /dev/sda2   VG rhel_dhcp46-210                       lvm2 [<39.00 GiB / 0 free]
  PV /dev/sdb1   VG docker-vol                            lvm2 [<40.00 GiB / 0 free]
  PV /dev/sde    VG vg_ca1fd84ddfbd19783537ea7d61e39f9b   lvm2 [199.87 GiB / <196.84 GiB free]
  Total: 3 [278.86 GiB] / in use: 3 [278.86 GiB] / in no VG: 0 [0 ]

[root@vp-ansible-v310-ga-master-0 ~]# heketi-cli device add --name /dev/sdd --node 8966bb38131b92389d55e09862ca9cce
Error: Can't open /dev/sdd exclusively.  Mounted filesystem?
[root@vp-ansible-v310-ga-master-0 ~]# heketi-cli device add --name /dev/sdf --node 8966bb38131b92389d55e09862ca9cce
Error: Can't open /dev/sdf exclusively.  Mounted filesystem?
```

John Mulligan:
Do you have the cluster in this state such that I could log on to the cluster and reproduce the error condition myself? If not, does the problem go away after you either (a) restart the gluster pod or (b) reboot the node?

Valerii Ponomarov:
John, I didn't reboot either the pod or the node. I still have the cluster used for it; I will send you the cluster credentials by email.

John Mulligan:
It's not so much that Heketi is failing to clean up the node, but rather that the actual resources are still in use by the kernel (device mapper). You can see that certain VGs are missing from the lvm output but are present in the device mapper list.
== vgs inside the container ==

```
sh-4.2# vgs
  VG                                  #PV #LV #SN Attr   VSize   VFree
  docker-vol                            1   1   0 wz--n- <40.00g        0
  rhel_dhcp46-210                       1   2   0 wz--n- <39.00g        0
  vg_ca1fd84ddfbd19783537ea7d61e39f9b   1   4   0 wz--n- 199.87g <196.84g
```

== lvs inside the container ==

```
sh-4.2# lvs
  LV                                     VG                                  Attr       LSize   Pool                                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  dockerlv                               docker-vol                          -wi-ao---- <40.00g
  root                                   rhel_dhcp46-210                     -wi-ao---- <35.00g
  swap                                   rhel_dhcp46-210                     -wi-a-----   4.00g
  brick_83a6d5e8b1072640197ae231f52a48af vg_ca1fd84ddfbd19783537ea7d61e39f9b Vwi-aotz--   1.00g tp_5c46b2de672ea8f555d523b679a4c2d6        1.39
  brick_aea6efb6d0301b9b35b66e3b7821606e vg_ca1fd84ddfbd19783537ea7d61e39f9b Vwi-aotz--   2.00g tp_534b36a164af13c58dd8d4f14368e7f0        0.70
  tp_534b36a164af13c58dd8d4f14368e7f0    vg_ca1fd84ddfbd19783537ea7d61e39f9b twi-aotz--   2.00g                                            0.70   0.33
  tp_5c46b2de672ea8f555d523b679a4c2d6    vg_ca1fd84ddfbd19783537ea7d61e39f9b twi-aotz--   1.00g                                            1.39   0.49
```

== lvs on the node ==

```
[root@vp-ansible-v310-ga-app-cns-1 ~]# lvs
  LV                                     VG                                  Attr       LSize   Pool                                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  dockerlv                               docker-vol                          -wi-ao---- <40.00g
  root                                   rhel_dhcp46-210                     -wi-ao---- <35.00g
  swap                                   rhel_dhcp46-210                     -wi-a-----   4.00g
  brick_83a6d5e8b1072640197ae231f52a48af vg_ca1fd84ddfbd19783537ea7d61e39f9b Vwi-aotz--   1.00g tp_5c46b2de672ea8f555d523b679a4c2d6        1.39
  brick_aea6efb6d0301b9b35b66e3b7821606e vg_ca1fd84ddfbd19783537ea7d61e39f9b Vwi-aotz--   2.00g tp_534b36a164af13c58dd8d4f14368e7f0        0.70
  tp_534b36a164af13c58dd8d4f14368e7f0    vg_ca1fd84ddfbd19783537ea7d61e39f9b twi-aotz--   2.00g                                            0.70   0.33
  tp_5c46b2de672ea8f555d523b679a4c2d6    vg_ca1fd84ddfbd19783537ea7d61e39f9b twi-aotz--   1.00g                                            1.39   0.49
```

== dmsetup ls on the node ==

```
[root@vp-ansible-v310-ga-app-cns-1 ~]# dmsetup ls
vg_1517317e9a1fc96c830c9493c49833f9-tp_ff12ee2104892099503b087db5b8aefd-tpool   (253:10)
vg_1517317e9a1fc96c830c9493c49833f9-tp_ff12ee2104892099503b087db5b8aefd_tdata   (253:9)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-tp_5c46b2de672ea8f555d523b679a4c2d6-tpool   (253:15)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-tp_5c46b2de672ea8f555d523b679a4c2d6_tdata   (253:14)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-brick_83a6d5e8b1072640197ae231f52a48af      (253:17)
vg_1517317e9a1fc96c830c9493c49833f9-tp_ff12ee2104892099503b087db5b8aefd_tmeta   (253:8)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-tp_5c46b2de672ea8f555d523b679a4c2d6_tmeta   (253:13)
vg_68c9ef3f5a2e31d2976565f9f187a6cf-tp_e04cc4729e00e4ec21aa1b028651d599-tpool   (253:5)
vg_68c9ef3f5a2e31d2976565f9f187a6cf-tp_e04cc4729e00e4ec21aa1b028651d599_tdata   (253:4)
vg_68c9ef3f5a2e31d2976565f9f187a6cf-tp_e04cc4729e00e4ec21aa1b028651d599_tmeta   (253:3)
docker--vol-dockerlv                                                            (253:2)
vg_68c9ef3f5a2e31d2976565f9f187a6cf-brick_e04cc4729e00e4ec21aa1b028651d599      (253:7)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-tp_534b36a164af13c58dd8d4f14368e7f0-tpool   (253:20)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-tp_534b36a164af13c58dd8d4f14368e7f0_tdata   (253:19)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-tp_534b36a164af13c58dd8d4f14368e7f0_tmeta   (253:18)
vg_1517317e9a1fc96c830c9493c49833f9-tp_ff12ee2104892099503b087db5b8aefd         (253:11)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-tp_5c46b2de672ea8f555d523b679a4c2d6         (253:16)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-tp_534b36a164af13c58dd8d4f14368e7f0         (253:21)
rhel_dhcp46--210-swap                                                           (253:1)
rhel_dhcp46--210-root                                                           (253:0)
vg_ca1fd84ddfbd19783537ea7d61e39f9b-brick_aea6efb6d0301b9b35b66e3b7821606e      (253:22)
vg_1517317e9a1fc96c830c9493c49833f9-brick_651d564bd98fb500018dadbe35ee60b0      (253:12)
```

I think this is not directly related to Heketi, but rather, the way we're running lvm on the gluster pods. More to come...
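
As a side note on the situation described above, stale device-mapper entries can be torn down by hand so that the kernel releases the disk and wipefs/pvremove can succeed. This is a minimal sketch, not part of the heketi fix, assuming the VG shown above (vg_68c9ef3f5a2e31d2976565f9f187a6cf) holds no data you still need; adjust names to your environment.

```sh
# VG name and disk taken from the output above; both are environment-specific.
VG="vg_68c9ef3f5a2e31d2976565f9f187a6cf"
DISK="/dev/sdd"

# Show which mappings device-mapper still holds for the supposedly removed VG.
dmsetup ls | grep "$VG"

# Remove the stale mappings. Two passes, because the *-tpool target can only
# be removed after the thin LVs sitting on top of it are gone.
for pass in 1 2; do
    for dm in $(dmsetup ls | awk -v vg="$VG" '$1 ~ vg {print $1}'); do
        dmsetup remove "$dm" 2>/dev/null || true
    done
done

# With nothing holding the disk any more, wiping the LVM signature should work.
wipefs --force --all "$DISK"
```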
The problem is that wipefs does not completely remove the LVM metadata from the block devices. When a device is scanned after wipefs, the VolumeGroup and LogicalVolumes may return again. I have seen this in my test environments on occasion too. This is what I do to completely remove everything:

```
# lvremove -y $(cd /dev/mapper ; ls vg_* | sed s,-,/,)
# pvremove --force --force -y /dev/vdb /dev/vdc
# wipefs --force --all /dev/vdb /dev/vdc
# sync
# reboot
```

It might be overdoing it a bit, but it works for me :)

I can confirm that this is a bug in heketi: it did not successfully delete the VG, but removed the device from the DB anyway. It should not have allowed this.

Updated the Doc Text field; kindly verify.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2986
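
Building on the cleanup recipe and the confirmation above, a quick sanity check before handing a disk back to heketi can save a failed re-add. This is a minimal sketch, assuming /dev/sdd and the vg_<id> naming scheme seen in this bug; all names are examples.

```sh
DISK="/dev/sdd"

# The disk should carry no filesystem or LVM signature ...
wipefs "$DISK"                                        # expect no output

# ... no PV referencing it ...
pvs --noheadings -o pv_name,vg_name | grep "$DISK"    # expect no match

# ... and no leftover heketi VG mappings in device-mapper.
dmsetup ls | grep '^vg_'                              # expect no stale vg_<id>-* entries

# Only then re-add it (node ID reused from the logs above).
heketi-cli device add --name "$DISK" --node 8966bb38131b92389d55e09862ca9cce
```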