Bug 1549272

Summary: "Failed to lock logical volume" during raid scrub check after partial activation
Product: Red Hat Enterprise Linux 7
Component: lvm2
Sub component: Mirroring and RAID
Version: 7.5
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Target Milestone: rc
Target Release: ---
Reporter: Corey Marthaler <cmarthal>
Assignee: Heinz Mauelshagen <heinzm>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, zkabelac
Type: Bug
Last Closed: 2021-02-15 07:35:28 UTC
Bug Blocks: 1886597
Attachments: verbose lvchange attempt

Description Corey Marthaler 2018-02-26 21:00:05 UTC
Description of problem:

Scenario leading up to the scrub attempt:

SCENARIO (raid1) - [vgcfgrestore_raid_with_missing_pv]

Create a raid, force remove a leg, and then restore its VG
host-083: lvcreate  --nosync --type raid1 -m 1 -n missing_pv_raid -L 100M raid_sanity
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
Deactivating missing_pv_raid raid

Backup the VG config
host-083 vgcfgbackup -f /tmp/raid_sanity.bkup.10114 raid_sanity

Force removing PV /dev/sda1 (used in this raid)
host-083: 'pvremove -ff --yes /dev/sda1'
  WARNING: PV /dev/sda1 is used by VG raid_sanity.
  WARNING: Wiping physical volume label from /dev/sda1 of volume group "raid_sanity".

Verifying that this VG is now corrupt
  WARNING: Device for PV 5PJqhe-7OLo-1iiv-3VDw-btRs-JmFx-GswaZn not found or rejected by a filter.
  Failed to find physical volume "/dev/sda1".

Attempt to restore the VG back to its original state (should not segfault, BZ 1348327)
host-083 vgcfgrestore -f /tmp/raid_sanity.bkup.10114 raid_sanity
  Couldn't find device with uuid 5PJqhe-7OLo-1iiv-3VDw-btRs-JmFx-GswaZn.
  Cannot restore Volume Group raid_sanity with 1 PVs marked as missing.
  Restore failed.
Checking syslog to see if vgcfgrestore segfaulted

Activating VG in partial readonly mode
host-083 vgchange -ay --partial raid_sanity
  PARTIAL MODE. Incomplete logical volumes will be processed.
  WARNING: Device for PV 5PJqhe-7OLo-1iiv-3VDw-btRs-JmFx-GswaZn not found or rejected by a filter.

Recreating PV using its old uuid
host-083 pvcreate --norestorefile --uuid "5PJqhe-7OLo-1iiv-3VDw-btRs-JmFx-GswaZn" /dev/sda1
  WARNING: Device for PV 5PJqhe-7OLo-1iiv-3VDw-btRs-JmFx-GswaZn not found or rejected by a filter.
Restoring the VG back to its original state
host-083 vgcfgrestore -f /tmp/raid_sanity.bkup.10114 raid_sanity
Reactivating VG
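
For reference, the scenario above condenses into roughly the following sequence. This is a sketch only; the VG/LV names, size, backup path, PV uuid and device paths are the ones from this log and will differ on other systems:

  lvcreate --nosync --type raid1 -m 1 -n missing_pv_raid -L 100M raid_sanity
  lvchange -an raid_sanity/missing_pv_raid                    # deactivate the raid LV before the backup
  vgcfgbackup -f /tmp/raid_sanity.bkup raid_sanity            # back up the VG metadata
  pvremove -ff --yes /dev/sda1                                # force-wipe one raid leg's PV
  vgcfgrestore -f /tmp/raid_sanity.bkup raid_sanity           # expected to fail: 1 PV marked missing
  vgchange -ay --partial raid_sanity                          # partial activation
  pvcreate --norestorefile --uuid "<uuid from the backup>" /dev/sda1   # recreate the PV with its old uuid
  vgcfgrestore -f /tmp/raid_sanity.bkup raid_sanity           # succeeds now that the PV is back
  vgchange -an raid_sanity && vgchange -ay raid_sanity        # reactivate the VG

The lvs output and the scrub attempt below follow directly from this state.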

[root@host-083 ~]# lvs -a -o +devices
  LV                         VG          Attr       LSize    Cpy%Sync Devices
  missing_pv_raid            raid_sanity Rwi-a-r-r- 100.00m  100.00   missing_pv_raid_rimage_0(0),missing_pv_raid_rimage_1(0)
  [missing_pv_raid_rimage_0] raid_sanity Iwi-aor-r- 100.00m           /dev/sda1(1)
  [missing_pv_raid_rimage_1] raid_sanity iwi-aor--- 100.00m           /dev/sdb1(1)
  [missing_pv_raid_rmeta_0]  raid_sanity ewi-aor-r-   4.00m           /dev/sda1(0)
  [missing_pv_raid_rmeta_1]  raid_sanity ewi-aor---   4.00m           /dev/sdb1(0)

[root@host-083 ~]# lvchange --syncaction repair raid_sanity/missing_pv_raid
[root@host-083 ~]# echo $?
0
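
Note that lvchange --syncaction only requests the scrub and returns immediately, which is why the exit status above is 0; the check/repair itself runs in the kernel, and the subsequent attempt to replace the failed leg is made by the dmeventd raid plugin (the lvm[22228] messages below), which is where the failure surfaces. If needed, the scrub state can be polled through the standard lvm2 reporting fields, e.g. (illustrative only, not part of the original test):

  # Request the scrub, then wait until the kernel reports the array idle again.
  lvchange --syncaction repair raid_sanity/missing_pv_raid
  while [ "$(lvs --noheadings -o raid_sync_action raid_sanity/missing_pv_raid | tr -d ' ')" != "idle" ]; do
      sleep 1
  done
  lvs -o name,raid_sync_action,raid_mismatch_count,lv_health_status raid_sanity/missing_pv_raid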

perform raid scrubbing (lvchange --syncaction repair) on raid raid_sanity/missing_pv_raid

Feb 26 14:27:12 host-083 kernel: md: requested-resync of RAID array mdX
Feb 26 14:27:12 host-083 kernel: md: mdX: requested-resync done.
Feb 26 14:27:12 host-083 lvm[22228]: WARNING: Device #0 of raid1 array, raid_sanity-missing_pv_raid, has failed.
Feb 26 14:27:12 host-083 lvm[22228]: WARNING: Disabling lvmetad cache for repair command.
Feb 26 14:27:12 host-083 lvm[22228]: WARNING: Not using lvmetad because of repair.
Feb 26 14:27:13 host-083 kernel: device-mapper: raid: Failed to read superblock of device at position 0
Feb 26 14:27:13 host-083 kernel: device-mapper: raid: Device 1 specified for rebuild; clearing superblock
Feb 26 14:27:13 host-083 kernel: md/raid1:mdX: active with 0 out of 2 mirrors
Feb 26 14:27:13 host-083 kernel: mdX: failed to create bitmap (-5)
Feb 26 14:27:13 host-083 kernel: device-mapper: table: 253:12: raid: Failed to run raid array
Feb 26 14:27:13 host-083 kernel: device-mapper: ioctl: error adding target to table
Feb 26 14:27:13 host-083 lvm[22228]: device-mapper: reload ioctl on  (253:12) failed: Input/output error
Feb 26 14:27:13 host-083 lvm[22228]: Failed to lock logical volume raid_sanity/missing_pv_raid.
Feb 26 14:27:13 host-083 lvm[22228]: Failed to replace faulty devices in raid_sanity/missing_pv_raid.
Feb 26 14:27:13 host-083 lvm[22228]: Repair of RAID device raid_sanity-missing_pv_raid failed.
Feb 26 14:27:13 host-083 lvm[22228]: Failed to process event for raid_sanity-missing_pv_raid.
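
When the automatic repair by dmeventd fails like this, the usual manual follow-ups would be along these lines (none of them verified against this reproducer; shown only for context):

  # Reactivate the VG so the leg whose superblock was cleared can be resynced.
  vgchange -an raid_sanity && vgchange -ay raid_sanity
  # Or reload the raid mapping in place:
  lvchange --refresh raid_sanity/missing_pv_raid
  # Or replace the faulty leg with free space from another PV in the VG:
  lvconvert --repair raid_sanity/missing_pv_raid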


Version-Release number of selected component (if applicable):
3.10.0-854.el7.x86_64

lvm2-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-libs-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-cluster-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-lockd-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
cmirror-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-libs-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-event-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-event-libs-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 05:07:18 CST 2017


How reproducible:
Most times

Comment 2 Corey Marthaler 2018-02-26 21:10:16 UTC
Created attachment 1401029 [details]
verbose lvchange attempt

Comment 3 Corey Marthaler 2020-10-08 17:17:25 UTC
This is an issue in the current RHEL 8.3 as well (this test has been turned off since 2018). We can open a new bug to track and fix it in RHEL 8 (if warranted) and close this issue.

kernel-4.18.0-234.el8.x86_64

[root@hayes-03 ~]# lvs -a -o +devices
  LV                         VG          Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                
  missing_pv_raid            raid_sanity Rwi-a-r-r- 100.00m                                    100.00           missing_pv_raid_rimage_0(0),missing_pv_raid_rimage_1(0)
  [missing_pv_raid_rimage_0] raid_sanity Iwi-aor-r- 100.00m                                                     /dev/sdb1(1)                                           
  [missing_pv_raid_rimage_1] raid_sanity iwi-aor--- 100.00m                                                     /dev/sdc1(1)                                           
  [missing_pv_raid_rmeta_0]  raid_sanity ewi-aor-r-   4.00m                                                     /dev/sdb1(0)                                           
  [missing_pv_raid_rmeta_1]  raid_sanity ewi-aor---   4.00m                                                     /dev/sdc1(0)                                           
[root@hayes-03 ~]# lvchange --syncaction check raid_sanity/missing_pv_raid
[root@hayes-03 ~]# echo $?
0

Oct  8 12:11:00 hayes-03 kernel: md: mdX: data-check done.
Oct  8 12:11:00 hayes-03 lvm[546077]: WARNING: Device #0 of raid1 array, raid_sanity-missing_pv_raid, has failed.
Oct  8 12:11:00 hayes-03 kernel: device-mapper: raid: Failed to read superblock of device at position 0
Oct  8 12:11:00 hayes-03 kernel: device-mapper: raid: Device 1 specified for rebuild; clearing superblock
Oct  8 12:11:00 hayes-03 kernel: md: pers->run() failed ...
Oct  8 12:11:00 hayes-03 kernel: device-mapper: table: 253:6: raid: Failed to run raid array
Oct  8 12:11:00 hayes-03 kernel: device-mapper: ioctl: error adding target to table
Oct  8 12:11:00 hayes-03 lvm[546077]: device-mapper: reload ioctl on  (253:6) failed: Invalid argument
Oct  8 12:11:00 hayes-03 lvm[546077]: Failed to suspend logical volume raid_sanity/missing_pv_raid.
Oct  8 12:11:00 hayes-03 lvm[546077]: Failed to replace faulty devices in raid_sanity/missing_pv_raid.
Oct  8 12:11:00 hayes-03 lvm[546077]: Repair of RAID device raid_sanity-missing_pv_raid failed.
Oct  8 12:11:00 hayes-03 lvm[546077]: Failed to process event for raid_sanity-missing_pv_raid.
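
On the RHEL 8 host the same degraded state can be inspected after the failed check with e.g. (illustrative; these are standard lvm2 reporting fields and dmsetup usage, output will vary):

  lvs -a -o name,attr,lv_health_status,raid_sync_action,raid_mismatch_count raid_sanity
  dmsetup status raid_sanity-missing_pv_raid      # raw dm-raid status of the top-level LV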

Comment 6 RHEL Program Management 2021-02-15 07:35:28 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.