Bug 1445924

Summary: Volume groups not activating due to mdadm devices reporting degraded
Product: Red Hat Enterprise Linux 7
Reporter: John Pittman <jpittman>
Component: lvm2
lvm2 sub component: Default / Unclassified
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: medium
CC: agk, cmarthal, cww, dledford, guazhang, hartsjc, heinzm, jbrassow, jpittman, jshortt, loberman, msnitzer, ncroxon, prajnoha, prockai, thornber, vanhoof, xni, zkabelac
Version: 7.3
Flags: jpittman: needinfo+
Target Milestone: rc
Hardware: All
OS: Linux
Last Closed: 2018-06-06 14:12:00 UTC
Type: Bug
Bug Blocks: 1420851
Attachments: lvmetad udev rules to collect env var debug info

Description John Pittman 2017-04-26 19:28:07 UTC
Description of problem:

Volume groups are failing to activate at boot, seemingly because the mdadm devices are not assembling as expected.  The issue can be worked around by disabling lvmetad in /etc/lvm/lvm.conf and falling back to legacy activation.
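For reference, a minimal sketch of that workaround (assuming the stock RHEL 7 /etc/lvm/lvm.conf layout; socket/service names may differ slightly between minor releases):

# /etc/lvm/lvm.conf, global section
global {
    use_lvmetad = 0
}

# stop and disable the daemon so the legacy direct-scan activation path is used
systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.socket lvm2-lvmetad.service

Note: if lvm.conf is also embedded in the initramfs, the image may need to be rebuilt (e.g. dracut -f) for the setting to take effect during early boot.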

Version-Release number of selected component (if applicable):

mdadm-3.4-14.el7_3.1.x86_64
lvm2-2.02.166-1.el7_3.2.x86_64
device-mapper-1.02.135-1.el7_3.2.x86_64
systemd-219-30.el7_3.6.x86_64

How reproducible:

Seen in a customer environment.  The issue is present on only 1 of several nearly identical nodes, which, together with the effectiveness of the lvmetad workaround, suggests some sort of race.

Actual results:

VG activation fails

Expected results:

The VGs should activate: either the md assembly should complete as expected, or the PV scan should still execute even with a degraded array.

Additional info:

- Checking the devices needed for the 2 volume groups:

[john@host debug_sos]$ grep device etc/lvm/backup/vg0[23]
etc/lvm/backup/vg02:			device = "/dev/md0"	# Hint only
etc/lvm/backup/vg03:			device = "/dev/md2"	# Hint only
etc/lvm/backup/vg03:			device = "/dev/md3"	# Hint only
etc/lvm/backup/vg03:			device = "/dev/md4"	# Hint only
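
(For completeness, a quick cross-check of which block devices LVM actually sees as PVs for these VGs can be done post-boot; generic command, not taken from the sosreport:)

pvs -o pv_name,vg_name | grep -E 'vg0[23]'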

- All devices are available at boot; however, the PV scan is not run:

Apr 25 17:45:42 host kernel: md/raid10:md2: active with 24 out of 24 devices
Apr 25 17:45:42 host kernel: md/raid10:md4: active with 24 out of 24 devices
Apr 25 17:45:42 host kernel: md/raid10:md3: active with 24 out of 24 devices
Apr 25 17:45:42 host kernel: md/raid10:md0: active with 24 out of 24 devices

Apr 25 17:45:42 host kernel: md4: detected capacity change from 0 to 4799449792512
Apr 25 17:45:42 host kernel: md3: detected capacity change from 0 to 4799449792512
Apr 25 17:45:42 host kernel: md0: detected capacity change from 0 to 4799449792512
Apr 25 17:45:42 host kernel: md2: detected capacity change from 0 to 4799449792512

<PV scan did not kick off>

- Presumably this is because systemd thinks the RAID is degraded:

Apr 25 17:46:14 host systemd[1]: mdadm-last-resort changed dead -> start
Apr 25 17:46:14 host systemd[1]: Starting Activate md array even though degraded...
Apr 25 17:46:14 host systemd[1]: About to execute: /usr/sbin/mdadm --run /dev/md2
Apr 25 17:46:14 host systemd[1]: Forked /usr/sbin/mdadm as 3967
Apr 25 17:46:14 host systemd[1]: mdadm-last-resort changed dead -> start
Apr 25 17:46:14 host systemd[1]: Starting Activate md array even though degraded...
Apr 25 17:46:14 host systemd[1]: About to execute: /usr/sbin/mdadm --run /dev/md0
Apr 25 17:46:14 host systemd[1]: Forked /usr/sbin/mdadm as 3968
Apr 25 17:46:14 host systemd[1]: mdadm-last-resort changed dead -> start
Apr 25 17:46:14 host systemd[1]: Starting Activate md array even though degraded...
Apr 25 17:46:14 host systemd[3966]: Executing: /usr/sbin/mdadm --run /dev/md3
Apr 25 17:46:14 host systemd[1]: About to execute: /usr/sbin/mdadm --run /dev/md4
Apr 25 17:46:14 host systemd[1]: Forked /usr/sbin/mdadm as 3969
Apr 25 17:46:14 host systemd[1]: mdadm-last-resort changed dead -> start
Apr 25 17:46:14 host systemd[1]: Starting Activate md array even though degraded...
Apr 25 17:46:14 host systemd[3967]: Executing: /usr/sbin/mdadm --run /dev/md2

Apr 25 17:47:02 host systemd[1]: Job dev-mapper-vg03\x2ddata_lv.device/start timed out.
Apr 25 17:47:02 host systemd[1]: Job dev-mapper-vg03\x2ddata_lv.device/start finished, result=timeout
Apr 25 17:47:02 host systemd[1]: Timed out waiting for device dev-mapper-vg03\x2ddata_lv.device.
Apr 25 17:47:02 host systemd[1]: Job data-influxdb\x2ddata.mount/start finished, result=dependency

Apr 25 17:47:02 host systemd[1]: Timed out waiting for device dev-mapper-vg02\x2dwal_lv.device.
Apr 25 17:47:02 host systemd[1]: Job data-influxdb\x2dwal.mount/start finished, result=dependency
Apr 25 17:47:02 host systemd[1]: Dependency failed for /data/influxdb-wal.
Apr 25 17:47:02 host systemd[1]: Job data-influxdb\x2dwal.mount/start failed with result 'dependency'.
Apr 25 17:47:02 host systemd[1]: Job dev-mapper-vg02\x2dwal_lv.device/start failed with result 'timeout'.

- Post-boot, the devices look fine.  Also, the volume groups can be manually activated.  Setting 'use_lvmetad = 0' in /etc/lvm/lvm.conf works around the issue.
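
(For the record, a sketch of the manual recovery referred to above, using the VG/device names from this report; the pvscan form mirrors what the lvm2-pvscan@.service instance would normally run when lvmetad is enabled:)

# re-register one MD PV with lvmetad and autoactivate, e.g. md2 (major:minor 9:2)
pvscan --cache --activate ay 9:2

# or activate the volume groups directly
vgchange -ay vg02 vg03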

md0 : active raid10 sdj[6] sdaa[18] sdm[8] sdae[21] sdac[20] sdy[17] sda[0] sdag[23] sdab[19] sdaf[22] sdu[14] sdh[5] sds[13] sdk[7] sdg[4] sdn[9] sdx[16] sde[3] sdv[15] sdd[2] sdq[11] sdo[10] sdr[12] sdc[1]
      4686962688 blocks super 1.2 256K chunks 2 near-copies [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU]
      bitmap: 0/35 pages [0KB], 65536KB chunk
md2 : active raid10 sdcb[1] sdcc[2] sdcj[8] sdci[7] sdcm[11] sdbr[15] sdbp[13] sdcu[18] sdcl[10] sdcz[21] sdcg[5] sdcf[4] sdct[16] sddd[23] sdch[6] sdby[0] sdcd[3] sddb[22] sdcv[19] sdbv[17] sdcx[20] sdcq[14] sdck[9] sdcn[12]
      4686962688 blocks super 1.2 256K chunks 2 near-copies [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU]
      bitmap: 0/35 pages [0KB], 65536KB chunk
md3 : active raid10 sddc[10] sdce[4] sdcs[18] sdbo[7] sdbt[0] sddk[21] sdcp[13] sdde[12] sdbn[5] sddj[20] sdcw[6] sdda[9] sddl[22] sddi[19] sdcy[8] sdcr[16] sddh[17] sddm[23] sddf[14] sdbw[1] sddg[15] sdca[3] sdbz[2] sdco[11]
      4686962688 blocks super 1.2 256K chunks 2 near-copies [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU]
      bitmap: 0/35 pages [0KB], 65536KB chunk
md4 : active raid10 sdl[14] sdbu[10] sdp[15] sdbq[4] sdw[17] sdak[21] sdb[11] sddo[6] sdaq[23] sdt[16] sddn[5] sddp[8] sdbk[0] sdz[18] sdi[13] sdf[12] sdbx[3] sdbs[7] sdad[19] sdah[20] sdbm[2] sdan[22] sdbl[1] sddq[9]
      4686962688 blocks super 1.2 256K chunks 2 near-copies [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU]
      bitmap: 0/35 pages [0KB], 65536KB chunk

* Why does mdadm think the array is degraded even though all devices are present?  Is it taking too long to build?
* If we're activating, even though degraded, shouldn't LVM kick off a PV scan afterwards?
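
(For anyone picking this up later: a few generic commands that should show whether the last-resort timer fired and what md itself reports; the md2 instance name is just taken from the logs above:)

cat /proc/mdstat
mdadm --detail /dev/md2
systemctl status mdadm-last-resort@md2.timer mdadm-last-resort@md2.service
systemctl cat mdadm-last-resort@.timer mdadm-last-resort@.service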

If I've opened this under the wrong component, my apologies.  Please feel free to adjust.  Normally I'd open against LVM, but it looks as if the event-driven lvmetad is waiting for an event from mdadm.

Comment 5 XiaoNi 2017-04-29 09:33:51 UTC
(In reply to John Pittman from comment #0)
> Description of problem:
> 
> 
The boot log from the failing system is much larger than the one from a normal boot, and it contains many error messages, not only from md/LVM. It looks like the environment itself is broken. Why are there so many messages?

> Apr 25 17:45:42 host kernel: md/raid10:md2: active with 24 out of 24 devices
> Apr 25 17:45:42 host kernel: md/raid10:md4: active with 24 out of 24 devices
> Apr 25 17:45:42 host kernel: md/raid10:md3: active with 24 out of 24 devices
> Apr 25 17:45:42 host kernel: md/raid10:md0: active with 24 out of 24 devices
> 
> Apr 25 17:45:42 host kernel: md4: detected capacity change from 0 to
> 4799449792512
> Apr 25 17:45:42 host kernel: md3: detected capacity change from 0 to
> 4799449792512
> Apr 25 17:45:42 host kernel: md0: detected capacity change from 0 to
> 4799449792512
> Apr 25 17:45:42 host kernel: md2: detected capacity change from 0 to
> 4799449792512
> 
> <PV scan did not kick off>
> 
> - Assuming it's because systemd thinks raid is degraded
> 
> Apr 25 17:46:14 host systemd[1]: mdadm-last-resort changed dead
> -> start
> Apr 25 17:46:14 host systemd[1]: Starting Activate md array even though
> degraded...
> Apr 25 17:46:14 host systemd[1]: About to execute: /usr/sbin/mdadm --run
> /dev/md2
> Apr 25 17:46:14 host systemd[1]: Forked /usr/sbin/mdadm as 3967
> Apr 25 17:46:14 host systemd[1]: mdadm-last-resort changed dead
> -> start
> Apr 25 17:46:14 host systemd[1]: Starting Activate md array even though
> degraded...
> Apr 25 17:46:14 host systemd[1]: About to execute: /usr/sbin/mdadm --run
> /dev/md0
> Apr 25 17:46:14 host systemd[1]: Forked /usr/sbin/mdadm as 3968
> Apr 25 17:46:14 host systemd[1]: mdadm-last-resort changed dead
> -> start
> Apr 25 17:46:14 host systemd[1]: Starting Activate md array even though
> degraded...
> Apr 25 17:46:14 host systemd[3966]: Executing: /usr/sbin/mdadm --run /dev/md3
> Apr 25 17:46:14 host systemd[1]: About to execute: /usr/sbin/mdadm --run
> /dev/md4
> Apr 25 17:46:14 host systemd[1]: Forked /usr/sbin/mdadm as 3969
> Apr 25 17:46:14 host systemd[1]: mdadm-last-resort changed dead
> -> start
> Apr 25 17:46:14 host systemd[1]: Starting Activate md array even though
> degraded...
> Apr 25 17:46:14 host systemd[3967]: Executing: /usr/sbin/mdadm --run /dev/md2
> 
> Apr 25 17:47:02 host systemd[1]: Job dev-mapper-vg03\x2ddata_lv.device/start
> timed out.
> Apr 25 17:47:02 host systemd[1]: Job dev-mapper-vg03\x2ddata_lv.device/start
> finished, result=timeout
> Apr 25 17:47:02 host systemd[1]: Timed out waiting for device
> dev-mapper-vg03\x2ddata_lv.device.
> Apr 25 17:47:02 host systemd[1]: Job data-influxdb\x2ddata.mount/start
> finished, result=dependency
> 
> Apr 25 17:47:02 host systemd[1]: Timed out waiting for device
> dev-mapper-vg02\x2dwal_lv.device.
> Apr 25 17:47:02 host systemd[1]: Job data-influxdb\x2dwal.mount/start
> finished, result=dependency
> Apr 25 17:47:02 host systemd[1]: Dependency failed for /data/influxdb-wal.
> Apr 25 17:47:02 host systemd[1]: Job data-influxdb\x2dwal.mount/start failed
> with result 'dependency'.
> Apr 25 17:47:02 host systemd[1]: Job dev-mapper-vg02\x2dwal_lv.device/start
> failed with result 'timeout'.

There are also error messages like the following; it looks like the disks themselves have problems. Are they connected over iSCSI? Could there be a network problem?

Apr 25 17:48:07 carf-metrics-influx01 kernel: sd 11:1:47:0: [sdb] tag#0 Sense Key : Recovered Error [current] 
Apr 25 17:48:07 carf-metrics-influx01 kernel: sd 11:1:47:0: [sdb] tag#0 Add. Sense: Defect list not found
Apr 25 17:48:10 carf-metrics-influx01 kernel: sd 11:1:48:0: [sdf] tag#0 Sense Key : Recovered Error [current] 
Apr 25 17:48:10 carf-metrics-influx01 kernel: sd 11:1:48:0: [sdf] tag#0 Add. Sense: Defect list not found
Apr 25 17:48:12 carf-metrics-influx01 kernel: sd 0:0:5:0: [sdh] tag#0 Sense Key : Recovered Error [current] 
Apr 25 17:48:12 carf-metrics-influx01 kernel: sd 0:0:5:0: [sdh] tag#0 Add. Sense: Defect list not found
Apr 25 17:48:14 carf-metrics-influx01 kernel: sd 11:1:49:0: [sdi] tag#0 Sense Key : Recovered Error [current] 
Apr 25 17:48:14 carf-metrics-influx01 kernel: sd 11:1:49:0: [sdi] tag#0 Add. Sense: Defect list not found
Apr 25 17:48:16 carf-metrics-influx01 kernel: sd 0:0:6:0: [sdj] tag#0 Sense Key : Recovered Error [current] 
Apr 25 17:48:16 carf-metrics-influx01 kernel: sd 0:0:6:0: [sdj] tag#0 Add. Sense: Defect list not found
Apr 25 17:48:18 carf-metrics-influx01 kernel: sd 0:0:7:0: [sdk] tag#0 Sense Key : Recovered Error [current] 
Apr 25 17:48:18 carf-metrics-influx01 kernel: sd 0:0:7:0: [sdk] tag#0 Add. Sense: Defect list not found
Apr 25 17:48:20 carf-metrics-influx01 kernel: sd 11:1:50:0: [sdl] tag#0 Sense Key : Recovered Error [current] 
Apr 25 17:48:20 carf-metrics-influx01 kernel: sd 11:1:50:0: [sdl] tag#0 Add. Sense: Defect list not found
Apr 25 17:48:22 carf-metrics-influx01 kernel: sd 0:0:8:0: [sdm] tag#0 Sense Key : Recovered Error [current] 
Apr 25 17:48:22 carf-metrics-influx01 kernel: sd 0:0:8:0: [sdm] tag#0 Add. Sense: Defect list not found
Apr 25 17:48:25 carf-metrics-influx01 kernel: sd 11:1:51:0: [sdp] tag#0 Sense Key : Recovered Error [current] 

> 
> * Why does mdadm think the array is degraded even though all devices are
> present?  Is it taking too long to build?
> * If we're activating, even though degraded, shouldn't LVM kick off a PV
> scan afterwards?

Maybe the devices are offline and md cannot read data from the underlying devices.
In any case, the boot log from the failing system is very messy; there appears to be something wrong with the environment.

Thanks
Xiao

Comment 6 James Hartsock 2017-04-29 13:08:28 UTC
They are SAS-connected drives, attached to the system via cable to the back of the chassis.

While I can clearly see the errors, why do you think the disks are broken when the only issue is LVM activation on the md arrays in the "bad" system?

It seems that whatever the cause/trigger is, it also prevents the mdadm-last-resort@.timer from being canceled (and thus mdadm-last-resort@.service runs).  This is followed by the logical volumes failing to mount, and the system aborts boot asking the admin to correct the problem (which amounts to LV activation && mount).  Is all of this the rest of the errors you mentioned above?

Comment 9 Peter Rajnoha 2017-05-05 09:28:39 UTC
If you're able to reproduce, can you please follow the instructions described in bug #1324028 comment #10? With this, I'll have the full sequence of events as they appeared since boot - this should help me identify why the lvm2-pvscan@major:minor service was not executed.

I can see from the sosreport that the udev db has information about the MD device, but without SYSTEMD_WANTS="lvm2-pvscan@9:2.service" set, even though the device is still properly identified as an LVM2 PV by blkid (here for the md2 device with the 9:2 major:minor pair). So I need to see all the previous udev events and which variables were set there, to check whether at least one of them had SYSTEMD_WANTS="lvm2-pvscan..." set; if not, we can inspect further why. But I think this is exactly the cause of the problem - SYSTEMD_WANTS="lvm2-pvscan..." was not set in any of the events for the MD devices.
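
(One quick way to check what the udev db currently holds for the MD device - generic udevadm query, with the device name and 9:2 pair taken from this report:)

udevadm info --query=property --name=/dev/md2 | grep -E 'ID_FS_TYPE|SYSTEMD_READY|SYSTEMD_WANTS'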

Comment 14 John Pittman 2017-05-08 13:53:27 UTC
Requested data sent; please let us know if any additional data is needed.

Comment 19 guazhang@redhat.com 2017-05-25 00:39:16 UTC
Hello
Is this bug reproducible only in the customer environment? If so, QE will just do a sanity test here and request OtherQA. Could someone check whether the customer can provide feedback with their test results?

thanks
Guazhang

Comment 23 Peter Rajnoha 2017-05-25 13:50:39 UTC
Created attachment 1282268 [details]
lvmetad udev rules to collect env var debug info

If possible, please install these modified 69-dm-lvm-metad.rules in the /etc/udev/rules.d directory - I've added some more variables to store in the udev database so we can see which rule path is taken.

Also, please follow the instructions in bug #1324028 comment #10.

(After collecting all the logs, you can remove the /etc/udev/rules.d/69-dm-lvm-metad.rules.)
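
(If capturing those logs over a full reboot is not practical, roughly equivalent data can be collected on a running system with generic udev debugging commands such as the following - these are not the exact steps from bug #1324028 comment #10, just an approximation:)

udevadm control --log-priority=debug
udevadm monitor --environment --kernel --udev > /tmp/udev-monitor.log 2>&1 &
# re-trigger a change event for one of the MD devices
udevadm trigger --action=change --sysname-match=md2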

Comment 39 James Hartsock 2017-10-30 17:28:45 UTC
Case contact was updated from Sebastian to Lucas.
Lucas attempted to provide the information; however, it looks like he ran the test on the system that has always been working (*02) and not on the failing host (*01).  We have requested that he retry on the failing host and provide the data!

Comment 40 James Hartsock 2018-01-24 22:38:12 UTC
We never did get data from the customer, and per conversation with them, they are going to be migrating off this hardware.  So I think we can probably close this BZ with insufficient data.

Comment 41 Peter Rajnoha 2018-06-06 14:12:00 UTC
(In reply to James Hartsock from comment #40)
> We never did get data from the customer, and per conversation with them,
> they are going to be migrating off this hardware.  So I think we can
> probably close this BZ with insufficient data.

Closing for now; if the problem appears again, please feel free to reopen this report.