Bug 1269608 - RFE: add command to restore missing lvmlock LV
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: David Teigland
QA Contact: cluster-qe@redhat.com
Keywords: FutureFeature
Depends On:
Blocks:

Reported: 2015-10-07 13:09 EDT by Corey Marthaler
Modified: 2017-10-03 21:21 EDT
CC: 6 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Corey Marthaler 2015-10-07 13:09:16 EDT
Description of problem:
Make sure you don't kill your lvmlock LV. If the PV holding the internal lvmlock LV fails, sanlock lease renewal fails, RAID repair cannot proceed, and the host eventually dies (see the run below).

================================================================================
Iteration 0.2 started at Wed Oct  7 11:24:47 CDT 2015
================================================================================
Scenario kill_primary_synced_raid1_2legs: Kill primary leg of synced 2 leg raid1 volume(s)
 
********* RAID hash info for this scenario *********
* names:              synced_primary_raid1_2legs_1
* sync:               1
* type:               raid1
* -m |-i value:       2
* leg devices:        /dev/sdb1 /dev/sdg1 /dev/sdf1
* spanned legs:        0
* failpv(s):          /dev/sdb1
* failnode(s):        host-113.virt.lab.msp.redhat.com
* lvmetad:            0
* raid fault policy:  warn
******************************************************                                                                                                          
Creating raids(s) on host-113.virt.lab.msp.redhat.com...                                                                                                                              
host-113.virt.lab.msp.redhat.com: lvcreate -aye --type raid1 -m 2 -n synced_primary_raid1_2legs_1 -L 500M black_bird /dev/sdb1:0-2400 /dev/sdg1:0-2400 /dev/sdf1:0-2400               
                                                                                                                                                                                      
Current mirror/raid device structure(s):                                                                                                                                              
  LV                                       Attr       LSize   Cpy%Sync Devices
   [lvmlock]                               -wi-ao---- 256.00m          /dev/sdb1(0)
   synced_primary_raid1_2legs_1            rwi-aor--- 500.00m 5.60     synced_primary_raid1_2legs_1_rimage_0(0),synced_primary_raid1_2legs_1_rimage_1(0),synced_primary_raid1_2legs_1_rimage_2(0)
   [synced_primary_raid1_2legs_1_rimage_0] Iwi-aor--- 500.00m          /dev/sdb1(65)
   [synced_primary_raid1_2legs_1_rimage_1] Iwi-aor--- 500.00m          /dev/sdg1(1)
   [synced_primary_raid1_2legs_1_rimage_2] Iwi-aor--- 500.00m          /dev/sdf1(1)
   [synced_primary_raid1_2legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sdb1(64)
   [synced_primary_raid1_2legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sdg1(0)
   [synced_primary_raid1_2legs_1_rmeta_2]  ewi-aor---   4.00m          /dev/sdf1(0)
   [lvmlock]                               -wi-ao---- 256.00m          /dev/sdc1(0)
                                                                                                                                                                                      
                                                                                                                                                                                      
Waiting until all mirror|raid volumes become fully syncd...                                                                                                                           
   1/1 mirror(s) are fully synced: ( 100.00% )                                                                                                                                        
Sleeping 15 sec                                                                                                                                                                       
 
Creating gfs2 on top of mirror(s) on host-113.virt.lab.msp.redhat.com...
mkfs.gfs2 -J 32M -j 1 -p lock_nolock /dev/black_bird/synced_primary_raid1_2legs_1 -O
Mounting mirrored gfs2 filesystems on host-113.virt.lab.msp.redhat.com...
 
PV=/dev/sdb1
        lvmlock: 2
        synced_primary_raid1_2legs_1_rimage_0: 2
        synced_primary_raid1_2legs_1_rmeta_0: 2
 
Writing verification files (checkit) to mirror(s) on...
        ---- host-113.virt.lab.msp.redhat.com ----
 
 
<start name="host-113.virt.lab.msp.redhat.com_synced_primary_raid1_2legs_1" pid="24939" time="Wed Oct  7 11:25:45 2015" type="cmd" />
Sleeping 15 seconds to get some outsanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- host-113.virt.lab.msp.redhat.com ----
 
 
 
Disabling device sdb on host-113.virt.lab.msp.redhat.com
 
simple pvs failed
 
[machine reboots]
 
 
 
 
 
console:
 
Oct  7 11:26:06 host-113 qarshd[4977]: Running cmdline: echo offline > /sys/block/sdb/device/state
Oct  7 11:26:10 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:10 host-113 kernel: md: super_written gets error=-5, uptodate=0
Oct  7 11:26:10 host-113 kernel: md/raid1:mdX: Disk failure on dm-5, disabling device.#012md/raid1:mdX: Operation continuing on 2 devices.
Oct  7 11:26:10 host-113 lvm[3693]: Device #0 of raid1 array, black_bird-synced_primary_raid1_2legs_1, has failed.
Oct  7 11:26:10 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:10 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:10 host-113 sanlock[2299]: 2015-10-07 11:26:10-0500 883 [2305]: r150 acquire_token disk error -5
Oct  7 11:26:10 host-113 sanlock[2299]: 2015-10-07 11:26:10-0500 883 [2305]: r150 cmd_acquire 3,9,2311 acquire_token -5
Oct  7 11:26:10 host-113 lvmlockd[2311]: 1444235170 S lvm_black_bird R VGLK lock_san acquire error -5
Oct  7 11:26:10 host-113 lvmlockd[2311]: 1444235170 S lvm_black_bird R VGLK res_lock lm error -218
Oct  7 11:26:10 host-113 lvm[3693]: VG black_bird lock skipped: storage errors for sanlock leases
Oct  7 11:26:10 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:10 host-113 lvm[3693]: /dev/sdb1: read failed after 0 of 4096 at 4096: Input/output error
Oct  7 11:26:10 host-113 lvm[3693]: Volume group "black_bird" not found
Oct  7 11:26:10 host-113 lvm[3693]: Cannot process volume group black_bird
Oct  7 11:26:10 host-113 lvm[3693]: Re-scan of RAID device black_bird-synced_primary_raid1_2legs_1 failed.
Oct  7 11:26:10 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:10 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:10 host-113 sanlock[2299]: 2015-10-07 11:26:10-0500 883 [2306]: r151 acquire_token disk error -5
Oct  7 11:26:10 host-113 sanlock[2299]: 2015-10-07 11:26:10-0500 883 [2306]: r151 cmd_acquire 3,9,2311 acquire_token -5
Oct  7 11:26:10 host-113 lvmlockd[2311]: 1444235170 S lvm_black_bird R VGLK lock_san acquire error -5
Oct  7 11:26:10 host-113 lvmlockd[2311]: 1444235170 S lvm_black_bird R VGLK res_lock lm error -218
Oct  7 11:26:10 host-113 lvm[3693]: VG black_bird lock failed: storage errors for sanlock leases
Oct  7 11:26:10 host-113 lvm[3693]: Repair of RAID device black_bird-synced_primary_raid1_2legs_1 failed.
Oct  7 11:26:10 host-113 lvm[3693]: Failed to process event for black_bird-synced_primary_raid1_2legs_1
Oct  7 11:26:11 host-113 systemd: Started qarsh Per-Connection Server (10.15.80.47:34004).
Oct  7 11:26:11 host-113 systemd: Starting qarsh Per-Connection Server (10.15.80.47:34004)...
Oct  7 11:26:11 host-113 qarshd[4983]: Talking to peer ::ffff:10.15.80.47:34004 (IPv6)
Oct  7 11:26:11 host-113 qarshd[4983]: Running cmdline: pvs -a
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:12 host-113 sanlock[2299]: 2015-10-07 11:26:12-0500 885 [2305]: r153 acquire_token disk error -5
Oct  7 11:26:12 host-113 sanlock[2299]: 2015-10-07 11:26:12-0500 885 [2305]: r153 cmd_acquire 3,9,2311 acquire_token -5
Oct  7 11:26:12 host-113 lvmlockd[2311]: 1444235172 S lvm_black_bird R VGLK lock_san acquire error -5
Oct  7 11:26:12 host-113 lvmlockd[2311]: 1444235172 S lvm_black_bird R VGLK res_lock lm error -218
Oct  7 11:26:12 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:21 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:21 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:21 host-113 sanlock[2299]: 2015-10-07 11:26:21-0500 894 [3523]: s2 delta_renew read rv -5 offset 0 /dev/mapper/black_bird-lvmlock
Oct  7 11:26:21 host-113 sanlock[2299]: 2015-10-07 11:26:21-0500 894 [3523]: s2 renewal error -5 delta_length 0 last_success 874
Oct  7 11:26:22 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:22 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:22 host-113 sanlock[2299]: 2015-10-07 11:26:22-0500 895 [3523]: s2 delta_renew read rv -5 offset 0 /dev/mapper/black_bird-lvmlock
Oct  7 11:26:22 host-113 sanlock[2299]: 2015-10-07 11:26:22-0500 895 [3523]: s2 renewal error -5 delta_length 0 last_success 874
Oct  7 11:26:22 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:22 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:22 host-113 sanlock[2299]: 2015-10-07 11:26:22-0500 895 [3523]: s2 delta_renew read rv -5 offset 0 /dev/mapper/black_bird-lvmlock
Oct  7 11:26:22 host-113 sanlock[2299]: 2015-10-07 11:26:22-0500 895 [3523]: s2 renewal error -5 delta_length 0 last_success 874
Oct  7 11:26:23 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:23 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:23 host-113 sanlock[2299]: 2015-10-07 11:26:23-0500 896 [3523]: s2 delta_renew read rv -5 offset 0 /dev/mapper/black_bird-lvmlock
Oct  7 11:26:23 host-113 sanlock[2299]: 2015-10-07 11:26:23-0500 896 [3523]: s2 renewal error -5 delta_length 0 last_success 874
Oct  7 11:26:23 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:23 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:23 host-113 sanlock[2299]: 2015-10-07 11:26:23-0500 896 [3523]: s2 delta_renew read rv -5 offset 0 /dev/mapper/black_bird-lvmlock
Oct  7 11:26:23 host-113 sanlock[2299]: 2015-10-07 11:26:23-0500 896 [3523]: s2 renewal error -5 delta_length 0 last_success 874
Oct  7 11:26:24 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:24 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:26:24 host-113 sanlock[2299]: 2015-10-07 11:26:24-0500 897 [3523]: s2 delta_renew read rv -5 offset 0 /dev/mapper/black_bird-lvmlock
Oct  7 11:26:24 host-113 sanlock[2299]: 2015-10-07 11:26:24-0500 897 [3523]: s2 renewal error -5 delta_length 0 last_success 874
[...]
Oct  7 11:27:54 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:27:54 host-113 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  7 11:27:54 host-113 sanlock[2299]: 2015-10-07 11:27:54-0500 987 [3523]: s2 delta_renew read rv -5 offset 0 /dev/mapper/black_bird-lvmlock
Oct  7 11:27:54 host-113 sanlock[2299]: 2015-10-07 11:27:54-0500 987 [3523]: s2 renewal error -5 delta_length 0 last_success 874

[system dies]

Oct  7 16:29:06 host-113 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="599" x-info="http://www.rsyslog.com"] start
Oct  7 11:29:02 host-113 journal: Runtime journal is using 6.2M (max allowed 49.6M, trying to leave 74.4M free of 490.3M available → current limit 49.6M).
Oct  7 11:29:02 host-113 kernel: Initializing cgroup subsys cpuset
Oct  7 11:29:02 host-113 kernel: Initializing cgroup subsys cpu
Oct  7 11:29:02 host-113 kernel: Initializing cgroup subsys cpuacct
Oct  7 11:29:02 host-113 kernel: Linux version 3.10.0-306.el7.x86_64 (mockbuild@x86-024.build.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Mon Aug 17 16:47:42 EDT 2015
Oct  7 11:29:02 host-113 kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-306.el7.x86_64 root=/dev/mapper/rhel_host--113-root ro crashkernel=auto rd.lvm.lv=rhel_host-113/root rd.lvm.lv=rhel_host-113/swap console=ttyS0,115200 LANG=en_US.UTF-8


Version-Release number of selected component (if applicable):
3.10.0-306.el7.x86_64

lvm2-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-libs-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-cluster-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
Comment 2 David Teigland 2015-10-07 13:48:20 EDT
If the PV that held the lvmlock LV is lost, the steps to recreate that LV should be simpler (or perhaps automatic at some point).  Below are the steps I used previously, but that was when I had other special options to override the lvmlockd locking for recovery.  I'd need to define a new way to override the normal locking for these recovery commands, or define a new command/option that simplifies these steps (and automatically skips the broken locking).

1. Remove the missing device from the VG.

vgreduce --removemissing --force --config VG

2. If step 1 did not remove the lvmlock LV, then do that directly.

lvremove VG/lvmlock

3. Change the lock type to "none", i.e. make it a local VG.

vgchange --lock-type none --force VG

4. VG space is needed to recreate the locks. If there is not enough space, vgextend the VG.

5. Change the lock type back to sanlock. This creates a new internal lvmlock LV and recreates locks.

vgchange --lock-type sanlock VG
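
Taken together, a manual recovery pass might look like the following (an illustrative sketch only; the VG name black_bird is taken from the test run above, and /dev/sdh1 is a hypothetical replacement PV):

# 1. Drop the missing PV (which held the internal lvmlock LV) from the VG.
vgreduce --removemissing --force black_bird

# 2. If the lvmlock LV is still present after step 1, remove it directly.
lvremove black_bird/lvmlock

# 3. Make the VG local so the remaining commands do not need sanlock locks.
vgchange --lock-type none --force black_bird

# 4. Ensure there is free space for a new lvmlock LV; extend the VG if needed.
vgextend black_bird /dev/sdh1    # hypothetical replacement PV

# 5. Switch back to sanlock; this recreates the internal lvmlock LV and its leases.
vgchange --lock-type sanlock black_bird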

Perhaps these steps could all be combined in a new command like:
vgchange --lock-restore VG
