Created attachment 1128673 [details]
journalctl part 1

Description of problem:
With very large mdadm devices, the array does not come up fast enough at boot, causing the device job to time out and udev settle to fail.

Version-Release number of selected component (if applicable):
3.10.0-327.4.5.el7.x86_64

How reproducible:
Create a very large mdadm device. At boot, from what I can tell, assembly takes too long and we time out.

Steps to Reproduce:
1. Create an mdadm device - my example is ~400TB with 74 legs
2. Format with XFS and add a line to fstab
3. Allow the array to assemble, then reboot

Actual results:
Boot fails with:

Feb 11 06:50:41 host systemd[1]: dev-md1.device changed dead -> tentative
Feb 11 06:50:41 host systemd[1]: Installed new job dev-md1.device/start as 2456
Feb 11 06:53:42 host systemd[1]: Job dev-md1.device/start timed out.
Feb 11 06:53:42 host systemd[1]: Job dev-md1.device/start finished, result=timeout
Feb 11 06:53:42 host systemd[1]: Timed out waiting for device /dev/md1.
Feb 11 06:53:42 host systemd[1]: Job dev-md1.device/start failed with result 'timeout'.

(1 of 2) A start job is running for dev-md1.device (51s / 1min 30s)
(2 of 2) A start job is running for udev Wait for Complete Device Initialization (49s / 2min 57s)
(2 of 2) A start job is running for udev Wait for Complete Device Initialization (50s / 2min 57s)
A start job is running for udev Wait for Complete Device Initialization (1min 57s / 2min 57s)
Failed to start udev Wait for Complete Device Initialization.
See 'systemctl status systemd-udev-settle.service' for details.
Starting Activation of DM RAID sets...
Started Activation of DM RAID sets.
Reached target Local File Systems.
UNIT                        LOAD   ACTIVE     SUB       DESCRIPTION
dev-md1.device              loaded activating tentative /dev/md1
systemd-udev-settle.service loaded failed     failed    udev Wait for Complete Device Initialization

Expected results:
We should wait for the md device to complete initialization.

I found https://bugzilla.redhat.com/show_bug.cgi?id=1208680 ; I was going to ask there whether this issue could be resolved by the possible fix mentioned in that bug, but did not want to crowd it in case I was incorrect.

Additional info:
A workaround for the issue is to add nofail to the fstab entry for the md device, then mount it manually after boot.

The disks are "JBOD" Seagate ST6000NM0034 drives behind a Fusion MPT SAS host.

I've had a bit of a hard time fully isolating things from the logs, so I will attach the full logs to the bug. I am also still waiting on /proc/mdstat output and array_state from the moment of failure; I will update with that when I get it.
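For reference, the nofail workaround mentioned above could look like the fstab entry below. The device, mount point, and timeout value are placeholders, and adding x-systemd.device-timeout (the standard systemd mount option for extending the device wait) is my suggestion, not something tested in this report:

```
# Illustrative /etc/fstab entry - device, mount point and timeout are placeholders
/dev/md1  /data  xfs  defaults,nofail,x-systemd.device-timeout=10min  0  2
```

With nofail, boot proceeds even if the device never appears; the longer device timeout gives a slow-assembling array more headroom before systemd gives up on it.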
74 legs?

Second, please provide some _reasonable_ log files. The current files are > 300MB and go back 5 days - please reduce them to something that can actually be analyzed.

Last, you mention a failure - did the array finish synchronizing prior to the reboot, or is it still in the process of synchronizing? It sounds like systemd kills the script during startup because it takes too long. How long does it take if you assemble the array on the command line?

Jes
Thanks for looking into this, Jes; I will get the information you asked for ASAP.

I wanted to mention as well that the customer has a second environment with a 44-disk RAID 6 array showing the same symptoms. I tried a recreate with 45 disks but was not successful; I am assuming that is due to disk size.

For clarity, /proc/mdstat from the 74-disk array after boot:

md1 : active raid6 sdbl1[28] sdbn1[30] sdbr1[34] sdbu1[37] sdbd1[20] sdce1[48] sdau1[10] sdax1[13] sdaz1[15] sdbj1[26] sdbq1[33] sdbg1[23] sdbs1[35] sdca1[44] sdby1[41] sdv1[69] sds1[66] sdbf1[22] sdaw1[12] sdcc1[46] sdcf1[49] sdbx1[40] sdbi1[25] sdaq1[6] sdbk1[27] sdar1[7] sdap1[5] sdcb1[45] sdbt1[36] sdav1[11] sday1[14] sdbv1[38] sdbw1[39] sdao1[4] sdy1[72] sdbe1[21] sdba1[17] sdcg1[50] sdat1[9] sdal1[1] sdcd1[47] sdas1[8] sdbz1[42] sdam1[2] sdw1[70] sdbm1[29] sdbp1[32] sdbh1[24] sdbc1[19] sdbo1[31] sdx1[71] sdu1[68] sdbb1[18] sdt1[67] sdan1[3] sdak1[0] sdr1[65] sdz1[73] sdq1[64] sdp1[63] sdk1[58] sdj1[57] sdi1[56] sdh1[55] sdg1[54] sde1[52] sdf1[53] sdd1[51] sdo1[62] sdm1[60] sdl1[59] sdn1[61] sdc1[43] sdb1[16]
      421948071936 blocks super 1.2 level 6, 512k chunk, algorithm 2 [74/74] [UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
      bitmap: 20/44 pages [80KB], 65536KB chunk

John
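As an aside, when collecting /proc/mdstat snapshots like the one above, a quick way to confirm all members are present is to compare the [active/total] counts on the status line. This is a hypothetical helper of my own, not anything shipped with mdadm; it reads mdstat-format text on stdin:

```shell
# Hypothetical helper (not from mdadm): report whether the named array has all
# members present, based on the [active/total] counts in /proc/mdstat output.
check_md() {
  name=$1
  # 1st sed: keep text from the array's stanza onward
  # 2nd sed: pull the two numbers out of "[n/m] [UU...]"
  counts=$(sed -n "/^${name} :/,\$p" \
    | sed -n 's|.*\[\([0-9][0-9]*\)/\([0-9][0-9]*\)\] \[U.*|\1 \2|p' \
    | head -1)
  set -- $counts
  if [ -n "$1" ] && [ "$1" = "$2" ]; then
    echo "${name}: complete ($2/$1)"
  else
    echo "${name}: degraded or not found"
  fi
}

printf '%s\n' \
  'md0 : active raid1 sdd[1] sdc[0]' \
  '      1953383488 blocks super 1.2 [2/2] [UU]' | check_md md0
# prints: md0: complete (2/2)
```

This is just a sketch; on a live box you would pipe `cat /proc/mdstat` into it rather than a canned sample.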
Hi Jes,

I attached a smaller log file containing only the messages from Feb 13th. The customer is working within their next maintenance window to grab the additional data you asked for. I will update as soon as I get the next set of data.

John
Hello,

I reinstalled the OS, and the test result is the same as before.

Test steps:

[root@storageqe-09 ~]# lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                           8:0    0 931.5G  0 disk
├─sda1                        8:1    0     1G  0 part /boot
└─sda2                        8:2    0 930.5G  0 part
  ├─rhel_storageqe--09-root 253:0    0    50G  0 lvm  /
  ├─rhel_storageqe--09-swap 253:1    0   7.8G  0 lvm  [SWAP]
  └─rhel_storageqe--09-home 253:2    0 872.7G  0 lvm  /home
sdg                           8:96   0   3.7T  0 disk
├─sdg1                        8:97   0   1.8T  0 part
└─sdg2                        8:98   0   1.8T  0 part
sdh                           8:112  0   3.7T  0 disk

[root@storageqe-09 ~]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdg[12] --assume-clean
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

[root@storageqe-09 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdg2[1] sdg1[0]
      1953376256 blocks super 1.2 [2/2] [UU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

[root@storageqe-09 ~]# mkfs.ext4 /dev/md0
[root@storageqe-09 ~]# mount /dev/md0 /mnt/bug

[root@storageqe-09 ~]# lsblk
sdg                           8:96   0   3.7T  0 disk
├─sdg1                        8:97   0   1.8T  0 part
│ └─md0                       9:0    0   1.8T  0 raid1 /mnt/bug
└─sdg2                        8:98   0   1.8T  0 part
  └─md0                       9:0    0   1.8T  0 raid1 /mnt/bug

Then add the entry to /etc/fstab:

/dev/md0 /mnt/bug ext4 defaults 1 2

and reboot:

[ TIME ] Timed out waiting for device dev-md0.device.
[DEPEND] Dependency failed for /mnt/bug.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Relabel all filesystems, if necessary.
[DEPEND] Dependency failed for Migrate local... structure to the new structure.
[DEPEND] Dependency failed for File System Check on /dev/md0.

The system drops into emergency mode ("Welcome to emergency mode" / "Give root password for maintenance"). After boot:

[root@localhost ~]# lsblk
├─sdg1                        8:97   0   1.8T  0 part
│ └─md127                     9:127  0   1.8T  0 raid1
└─sdg2                        8:98   0   1.8T  0 part
  └─md127                     9:127  0   1.8T  0 raid1
sdh                           8:112  0   3.7T  0 disk

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md127 : active (auto-read-only) raid1 sdg1[0] sdg2[1]
      1953376256 blocks super 1.2 [2/2] [UU]
      bitmap: 0/15 pages [0KB], 65536KB chunk
Awesome job, Guazhang, thank you. It's reproduced.

Xiao, as you can see, this is now very easy to reproduce. Can we please make progress now? Perhaps Nigel Croxon can also help.

Regards,
Laurence
I am able to reproduce this issue at will:

# mdadm --examine --scan --verbose
ARRAY /dev/md/0 level=raid1 metadata=1.2 num-devices=2 UUID=91d394e5:23b21c73:0715f5f2:8e9a5607 name=0
   devices=/dev/sdd,/dev/sdc

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdd[1] sdc[0]
      1953383488 blocks super 1.2 [2/2] [UU]
      [======>..............]  resync = 31.3% (613192512/1953383488) finish=172.4min speed=129488K/sec
      bitmap: 11/15 pages [44KB], 65536KB chunk

Adding either the mdraid uuid or /dev/md0 to fstab results in the same boot failure:

#UUID=91d394e5-23b2-1c73-0715-f5f28e9a5607 /test xfs defaults 1 2
/dev/md0 /test xfs defaults 1 2

[ TIME ] Timed out waiting for device dev-md0.device.
[DEPEND] Dependency failed for /test.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for File System Check on /dev/md0.

After further debugging, the root cause appears to be contention and/or a race condition between the dmraid and mdraid components. The key to the reproducer is that the raid devices need to be attached to an Intel RST controller:

00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 05)

pci-0000:00:11.4-ata-3.0 -> ../../sdc
pci-0000:00:11.4-ata-4.0 -> ../../sdd

# dmraid -b
/dev/sdd: 3907029168 total, "WD-WCC4M0267803"
/dev/sdc: 3907029168 total, "WD-WCC4M0262885"

By default both the dmraid and mdraid components are enabled; it seems dmraid fires first and activates the raid, and the resulting devname and uuid then cause the fstab and dependency failures.

The solution is to disable the dmraid component by adding rd.dm=0 to GRUB_CMDLINE_LINUX in /etc/default/grub:

GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=vg_root/root rd.luks=0 rd.dm=0 rd.lvm.lv=vg_root/swap rhgb quiet"

Rebuild the grub2 configuration (BIOS and UEFI paths respectively):

# grub2-mkconfig -o /boot/grub2/grub.cfg
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

Then reboot. The system will now boot successfully with either mdadm's devname /dev/md0 or the uuid.
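After applying the rd.dm=0 workaround, a quick sanity check after reboot is to confirm the option actually made it onto the kernel command line. This is a sketch of my own; the cmdline string below is a stand-in for what you would read from /proc/cmdline on a live system:

```shell
# Verify rd.dm=0 is present on the booted kernel command line.
# On a real system: cmdline=$(cat /proc/cmdline)
cmdline='BOOT_IMAGE=/vmlinuz root=/dev/mapper/vg_root-root ro crashkernel=auto rd.dm=0 rhgb quiet'
case " $cmdline " in
  *' rd.dm=0 '*) status='dmraid disabled in initramfs' ;;
  *)             status='WARNING: rd.dm=0 not set' ;;
esac
echo "$status"
# prints: dmraid disabled in initramfs
```

Padding the string with spaces before matching avoids false positives on options that merely contain "rd.dm=0" as a substring.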
Hello,

Asking Jon and John, as they have been more recently active in this case.

Regards,
Laurence
Hi Xiao,

From the initial description we see "Started Activation of DM RAID sets.", which makes it seem as if Jon's suspicion in comment 46 is correct. Also, the rc.local edit was a workaround provided to the customer by Laurence, so it would not negate Jon's suspicion, if I am understanding correctly. What we thought was due to large, numerous devices may instead be due to the dmraid component.

The customer has long been unresponsive. If you have a test package, we can likely test on Jon's system.
Regardless of the mdadm test package, I can no longer reproduce this on the latest kernel, even using various versions of mdadm and systemd. The largest devices I have available are 2TB each, which were used below.

[root@storage-test ~]# uname -r
3.10.0-957.10.1.el7.x86_64

[root@storage-test ~]# rpm -qa | grep mdadm
mdadm-4.1-rc1_2.el7.x86_64

[root@storage-test ~]# lsscsi
[3:0:0:0]    disk    ATA      Samsung SSD 850  2B6Q  /dev/sda
[4:0:0:0]    disk    ATA      WDC WD20EZRX-19D 0A81  /dev/sdb
[5:0:0:0]    disk    ATA      WDC WD20EZRX-19D 0A81  /dev/sdc

[root@storage-test ~]# dmraid -b
/dev/sdc: 3907029168 total, "WD-WCC4M0375096"
/dev/sdb: 3907029168 total, "WD-WCC4M0258187"
/dev/sda: 500118192 total, "S251NXAH345920N"

[root@storage-test ~]# mdadm -C /dev/md0 --metadata=1.2 --level raid1 -n 2 /dev/sd[b-c]
mdadm: array /dev/md0 started.
[root@storage-test ~]# mkfs.xfs /dev/md0
[root@storage-test ~]# mount /dev/md0 /test

Tested mounting by dev name and by uuid:

UUID=7debbf22-0c1d-450c-9569-920ab0357c22 /test xfs defaults 1 2
/dev/md0 /test xfs defaults 1 2

Kernel command lines tested (current and previous kernel):

[root@storage-test ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-957.10.1.el7.x86_64 root=/dev/mapper/vg_root-root ro crashkernel=auto rd.lvm.lv=vg_root/root rd.lvm.lv=vg_root/swap rhgb quiet LANG=en_US.UTF-8

[root@storage-test ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-862.14.4.el7.x86_64 root=/dev/mapper/vg_root-root ro crashkernel=auto rd.lvm.lv=vg_root/root rd.lvm.lv=vg_root/swap rhgb quiet

To test older component versions I rolled back the updates:

[root@storage-test ~]# yum history
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
ID | Command line     | Date and time    | Action(s) | Altered
-------------------------------------------------------------------------------
61 | update systemd   | 2019-04-10 09:46 | Update    |   10
60 | update mdadm     | 2019-04-10 09:40 | Update    |    1
59 | update kernel    | 2019-04-10 09:39 | E, I, U   |   14 EE

[root@storage-test ~]# yum history undo 61
[root@storage-test ~]# yum history undo 60

mdadm versions covered:
4.0-13.el7.x86_64      @anaconda/7.5
4.1-rc1_2.el7.x86_64   @rhel-7-server-rpms
One thing that differs between the original reproducer and the latest attempt is that the following is now observed:

---
Apr 10 10:16:33 storage-test systemd: Stopped Timer to wait for more drives before activating degraded array md0..
---

This looks to be logged from the mdadm-last-resort unit.

My raid is still resyncing, which is taking painfully long, but I will continue testing once it completes and see whether the behavior changes. I will also downgrade udev.

[root@storage-test ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc[1] sdb[0]
      1953382464 blocks super 1.2 [2/2] [UU]
      [====>................]  resync = 20.0% (391244352/1953382464) finish=201.8min speed=128992K/sec
      bitmap: 13/15 pages [52KB], 65536KB chunk

unused devices: <none>
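While waiting on a long resync like this, one can poll the progress line from /proc/mdstat rather than eyeballing the whole file. A rough sketch of my own (not an mdadm feature); the sample line is taken from this report and the field layout is assumed stable:

```shell
# Extract resync percentage and ETA from an mdstat progress line.
# On a live system: line=$(grep resync /proc/mdstat)
line='      [====>................]  resync = 20.0% (391244352/1953382464) finish=201.8min speed=128992K/sec'
pct=$(expr "$line" : '.*resync = \([0-9.]*\)%')
eta=$(expr "$line" : '.*finish=\([0-9.]*min\)')
echo "resync ${pct}% done, ~${eta} remaining"
# prints: resync 20.0% done, ~201.8min remaining
```

Wrapping this in a `watch` or sleep loop gives a cheap progress monitor until the resync completes and retesting can begin.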