Created attachment 591811 [details]
Console log of udevadm hang with rc.debug

Description of problem:
When I tried to test kdump with iSCSI targets, the second kernel's boot was blocked during LVM scanning. This bug blocks my further beta testing of kdump with iSCSI and multipath targets.

Version-Release number of selected component (if applicable):
kexec-tools-2.0.3-49.fc18.x86_64
dracut-018-55.git20120611.el7.noarch, with the required patch cherry-picked from upstream (commit 8be5a0fa945d130ef8ebb464ebbe1a865949353d)

How reproducible:
Always.

Steps to Reproduce:
1. Set up an iSCSI device:

yum install -y iscsi-initiator-utils
echo "InitiatorName=iqn.1994-05.com.redhat:kdump-chao-01" >/etc/iscsi/initiatorname.iscsi
iscsiadm -m discovery -t st -p 10.16.43.127
iscsiadm -m node -p 10.16.43.127 -l

Now it looks like this:

/dev/sda1: UUID="0b0ffeb9-31d2-4709-a988-0418b52e4931" TYPE="ext4"
/dev/sda2: UUID="RZUTS5-9E67-6vO9-H0fQ-SyFl-ng4U-7K6PAn" TYPE="LVM2_member"
/dev/mapper/vg_storageqe--12-lv_swap: UUID="b40522ec-4dee-4f5d-ae5e-fd5b2d29df76" TYPE="swap"
/dev/mapper/vg_storageqe--12-lv_root: UUID="b761bbbc-be51-4c08-b810-b4d5cebf2048" TYPE="ext4"
/dev/mapper/vg_storageqe--12-lv_home: UUID="08d533cf-7691-4b56-a1ce-5baf7743b297" TYPE="ext4"
/dev/sdb1: UUID="19d3bd2e-a27a-41ac-add3-8e04d175755c" TYPE="ext4"

2. Add this line to /etc/kdump.conf and run `service kdump restart`:

ext4 UUID=19d3bd2e-a27a-41ac-add3-8e04d175755c

3. echo c >/proc/sysrq-trigger

Actual results:
Dracut hung when trying to set up LVM for the non-iSCSI device.

...
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.3.0-0.13.el7.x86_64 root=/dev/mapper/vg_storageqe--12-lv_root ro rd.md=0 LANG=en_US.UTF-8 rd.lvm.lv=vg_storageqe-12/lv_root KEYTABLE=us SYSFONT=True console=ttyS1,115200 rd.lvm.lv=vg_storageqe-12/lv_swap rd.dm=0 rd.luks=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off memmap=exactmap memmap=573K@64K memmap=261544K@589824K elfcorehdr=851368K memmap=192K#3111024K
...
[ 25.011391] ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
iscsi:@10.16.43.127::::iqn.1992-08.com.netapp:sn.151753773
iscsistart: transport class version 2.0-870. iscsid version 2.0-872.18.f17
iscsistart: Logging into iqn.1992-08.com.netapp:sn.151753773 10.16.43.127:3260,1
iscsistart: can not connect to iSCSI daemon (111)!
iscsistart: version 2.0-872.18.f17
[ 36.320639] scsi4 : iSCSI Initiator over TCP/IP
iscsistart: Connection1:0 to [target: iqn.1992-08.com.netapp:sn.151753773, portal: 10.16.43.127,3260] through [iface: default] is operational now
[ 36.920804] scsi 4:0:0:0: Direct-Access NETAPP LUN 8020 PQ: 0 ANSI: 5
[ 37.014535] sd 4:0:0:0: [sda] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)
[ 37.153655] sd 4:0:0:0: [sda] Write Protect is off
[ 37.181094] sd 4:0:0:0: [sda] Mode Sense: bd 00 00 08
[ 37.210614] sd 4:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 37.268348] sda: sda1
[ 37.287133] sd 4:0:0:0: [sda] Attached SCSI disk
[ 40.612141] connection1:0: detected conn error (1020)
[ 54.352513] hpsa 0000:03:00.0: controller message 03:00 succeeded
[ 54.390380] hpsa 0000:03:00.0: MSIX
[ 54.411395] hpsa 0000:03:00.0: irq 79 for MSI/MSI-X
[ 54.441917] hpsa 0000:03:00.0: irq 80 for MSI/MSI-X
[ 54.472357] hpsa 0000:03:00.0: irq 81 for MSI/MSI-X
[ 54.501182] hpsa 0000:03:00.0: irq 82 for MSI/MSI-X
[ 54.551361] hpsa 0000:03:00.0: hpsa0: <0x323a> at IRQ 79 using DAC
[ 54.608315] scsi5 : hpsa
[ 54.653380] hpsa 0000:03:00.0: RAID device c5b0t0l0 added.
[ 54.654012] hpsa 0000:03:00.0: Direct-Access device c5b0t0l1 added.
[ 54.746462] scsi 5:0:0:0: RAID HP P410i 5.14 PQ: 0 ANSI: 5
[ 54.792637] scsi 5:0:0:1: Direct-Access HP LOGICAL VOLUME 5.14 PQ: 0 ANSI: 5
[ 54.846240] sd 5:0:0:1: [sdb] 1953459632 512-byte logical blocks: (1.00 TB/931 GiB)
[ 54.890740] sd 5:0:0:1: [sdb] Write Protect is off
[ 54.918968] sd 5:0:0:1: [sdb] Mode Sense: 6b 00 00 08
[ 54.949246] sd 5:0:0:1: [sdb] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[ 55.009362] sdb: sdb1 sdb2
[ 55.036581] sd 5:0:0:1: [sdb] Attached SCSI disk
[ 55.471848] dracut: Scanning devices sdb2 for LVM logical volumes vg_storageqe-12/lv_root vg_storageqe-12/lv_root vg_storageqe-12/lv_swap

(waited too long here)

With rd.debug, it printed this:

[ 62.450124] dracut: /init@158(): check_finished
[ 62.477257] dracut: /lib/dracut-lib.sh@271(check_finished): local f
[ 62.514314] dracut: /lib/dracut-lib.sh@272(check_finished): for f in '$hookdir/initqueue/finished/*.sh'
[ 62.569781] dracut: /lib/dracut-lib.sh@273(check_finished): '[' '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2fvg_storageqe--12-lv_root.sh' = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[ 62.678767] dracut: /lib/dracut-lib.sh@274(check_finished): '[' -e '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2fvg_storageqe--12-lv_root.sh' ']'
[ 62.765361] dracut: /lib/dracut-lib.sh@274(check_finished): . '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2fvg_storageqe--12-lv_root.sh'
[ 62.846270] dracut: //lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2fvg_storageqe--12-lv_root.sh'[' -e /dev/mapper/vg_storageqe--12-lv_root ']'
[ 62.932852] dracut: /lib/dracut-lib.sh@274(check_finished): return 1
[ 62.970813] dracut: /init@161(): udevadm settle --timeout=0

(hangs)

Or instead, it printed this:

[ 346.326615] dracut: /sbin/lvm_scan@113(): lvm lvchange -ay --sysinit vg_storageqe-12/lv_root vg_storageqe-12/lv_root vg_storageqe-12/lv_swap
[ 494.046094] INFO: task lvm:762 blocked for more than 120 seconds.
[ 494.082642] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 494.130933] lvm D ffff88002cfb2598 4944 762 741 0x00000000
[ 494.173781] ffff88002c61dae8 0000000000000046 ffff88002d73cd00 ffff88002c61dfd8
[ 494.219593] ffff88002c61dfd8 ffff88002c61dfd8 ffffffff8193f020 ffff88002d73cd00
[ 494.262079] 000000002c61db08 ffff88002cfb25a0 ffff88002c61c000 ffff88002c61dfd8
[ 494.306281] Call Trace:
[ 494.319784] [<ffffffff816610bf>] schedule+0x3f/0x60
[ 494.347080] [<ffffffff8165ed6a>] mutex_lock_nested+0x19a/0x3a0
[ 494.380697] [<ffffffff811f9c5e>] ? __blkdev_get+0x6e/0x4c0
[ 494.411097] [<ffffffff811f9c5e>] ? __blkdev_get+0x6e/0x4c0
[ 494.442536] [<ffffffff811f9c5e>] __blkdev_get+0x6e/0x4c0
[ 494.472112] [<ffffffff811fa265>] blkdev_get+0x1b5/0x330
[ 494.501785] [<ffffffff81662c2b>] ? _raw_spin_unlock+0x2b/0x50
[ 494.533756] [<ffffffff811fa44e>] blkdev_open+0x6e/0x90
[ 494.563313] [<ffffffff811baccb>] __dentry_open+0x30b/0x4f0
[ 494.594174] [<ffffffff811fa3e0>] ? blkdev_get+0x330/0x330
[ 494.626779] [<ffffffff811bc284>] nameidata_to_filp+0x74/0x80
[ 494.660691] [<ffffffff811ce640>] do_last+0x460/0x900
[ 494.690384] [<ffffffff811cebf2>] path_openat+0xd2/0x410
[ 494.721281] [<ffffffff811cf051>] do_filp_open+0x41/0xa0
[ 494.752977] [<ffffffff81662c2b>] ? _raw_spin_unlock+0x2b/0x50
[ 494.787649] [<ffffffff811dd616>] ? alloc_fd+0x186/0x210
[ 494.819199] [<ffffffff811bc37d>] do_sys_open+0xed/0x1c0
[ 494.850682] [<ffffffff811bc471>] sys_open+0x21/0x30
[ 494.877995] [<ffffffff8166b869>] system_call_fastpath+0x16/0x1b
[ 494.911584] INFO: lockdep is turned off.

(repeats this INFO every two minutes)

See attachments for full debug log.
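For convenience, the reproduction steps above can be condensed into a small script. This is a sketch only: the initiator name, portal IP, and dump-target UUID are the values from this report, and `setup_iscsi`/`set_kdump_target` are hypothetical helper names, not kexec-tools commands.

```shell
#!/bin/bash
# Sketch of the reproduction setup above; all values come from this report.

setup_iscsi() {
    # Step 1: discover and log into the iSCSI target
    # (requires iscsi-initiator-utils).
    echo "InitiatorName=iqn.1994-05.com.redhat:kdump-chao-01" \
        > /etc/iscsi/initiatorname.iscsi
    iscsiadm -m discovery -t st -p 10.16.43.127
    iscsiadm -m node -p 10.16.43.127 -l
}

set_kdump_target() {
    # Step 2: point kdump at the ext4 dump target by filesystem UUID.
    local conf="$1" uuid="$2"
    printf 'ext4 UUID=%s\n' "$uuid" >> "$conf"
}

# Usage, per steps 2-3 of the report:
#   set_kdump_target /etc/kdump.conf 19d3bd2e-a27a-41ac-add3-8e04d175755c
#   service kdump restart
#   echo c >/proc/sysrq-trigger
```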
Created attachment 591812 [details]
Console log with rc.debug and rc.udev.debug
Hi, can you test without the iSCSI setup? Just a local fs dump with LVM.
(In reply to comment #3)
> Hi, can you test without the iSCSI setup? Just a local fs dump with LVM.

I have tested that. Okay, here is the log for comparison:

[ 54.350475] hpsa 0000:03:00.0: RAID device c3b0t0l0 added.
[ 54.351022] hpsa 0000:03:00.0: Direct-Access device c3b0t0l1 added.
[ 54.511352] scsi 3:0:0:0: RAID HP P410i 5.14 PQ: 0 ANSI: 5
[ 54.558556] scsi 3:0:0:1: Direct-Access HP LOGICAL VOLUME 5.14 PQ: 0 ANSI: 5
[ 54.628607] sd 3:0:0:1: [sda] 1953459632 512-byte logical blocks: (1.00 TB/931 GiB)
[ 54.673671] sd 3:0:0:1: [sda] Write Protect is off
[ 54.700362] sd 3:0:0:1: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[ 54.760509] sda: sda1 sda2
[ 54.781300] sd 3:0:0:1: [sda] Attached SCSI disk
[ 55.298924] dracut: Scanning devices sda2 for LVM logical volumes vg_storageqe-12/lv_root vg_storageqe-12/lv_root vg_storageqe-12/lv_swap
[ 55.414534] dracut: inactive '/dev/vg_storageqe-12/lv_swap' [127.84 GiB] inherit
[ 55.461428] dracut: inactive '/dev/vg_storageqe-12/lv_home' [753.12 GiB] inherit
[ 55.506560] dracut: inactive '/dev/vg_storageqe-12/lv_root' [50.00 GiB] inherit
[ 58.558227] EXT4-fs (dm-1): INFO: recovery required on readonly filesystem
[ 58.599451] EXT4-fs (dm-1): write access will be enabled during recovery
[ 59.317330] EXT4-fs (dm-1): recovery complete
[ 59.360226] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[ 59.409539] mount used greatest stack depth: 3440 bytes left
[ 59.639924] dracut: Checking ext4: /dev/mapper/vg_storageqe--12-lv_root
[ 59.677412] dracut: issuing e2fsck -a /dev/mapper/vg_storageqe--12-lv_root
[ 59.824945] dracut: /dev/mapper/vg_storageqe--12-lv_root: clean, 83245/3276800 files, 704630/13107200 blocks
[ 59.899552] dracut: Remounting /dev/mapper/vg_storageqe--12-lv_root with -o ro
[ 60.039060] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[ 60.120997] dracut: Mounted root filesystem /dev/mapper/vg_storageqe--12-lv_root
[ 60.168234] dracut: Mounting from /etc/fstab
[ 60.211239] dracut: Mounting /dev/mapper/vg_storageqe--12-lv_root
[ 60.256652] dracut: mount: /dev/mapper/vg_storageqe--12-lv_root is already mounted or /sysroot busy
[ 60.309392] dracut: /dev/mapper/vg_storageqe--12-lv_root is already mounted on /sysroot
++ KDUMP_PATH=/var/crash
... (kdump stuff all good)
Not sure what is happening here. So lvm hangs:

lvm lvchange -ay --sysinit vg_storageqe-12/lv_root vg_storageqe-12/lv_root vg_storageqe-12/lv_swap

It happens when the iSCSI target is the dump target, and does not happen when we are dumping to the root disk. Strange.

The only significant change I see is disk renaming: with iSCSI, the root disk has been renamed to sdb instead of sda. I am hoping that should not cause a problem.
(In reply to comment #7)
> Not sure what is happening here. So lvm hangs:
>
> lvm lvchange -ay --sysinit vg_storageqe-12/lv_root vg_storageqe-12/lv_root
> vg_storageqe-12/lv_swap
>
> It happens when the iSCSI target is the dump target, and does not happen
> when we are dumping to the root disk. Strange.
>
> The only significant change I see is disk renaming: with iSCSI, the root
> disk has been renamed to sdb instead of sda. I am hoping that should not
> cause a problem.

Extra care has to be taken to ensure a correct kdump restart. The iscsi service will not start at boot because of bug #831572, so you have to start it manually and mount the iSCSI device somewhere (don't use fstab, or boot fails), as required by kdumpctl. And touch /etc/kdump.conf, otherwise kdumpctl won't run mkdumprd.

Until iSCSI works at all, I don't think I can talk about multipath. I only picked your patch from dracut; I'll try rawhide on it.
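The manual restart sequence described above can be sketched as a function. This is a hedged sketch, not a verified procedure from this report: the mount point is an arbitrary choice, and the service and device names are assumptions for a Fedora 17/18-era system.

```shell
#!/bin/bash
# Sketch of the kdump-restart workaround described above.
# /mnt/kdump is an arbitrary mount point; the service names and
# /dev/sdb1 device path are assumptions, not taken from the report.

restart_kdump_with_iscsi() {
    # Bug #831572: iscsi does not start at boot, so start it by hand.
    service iscsi start

    # Mount the iSCSI dump device somewhere (NOT via fstab, or boot
    # fails); kdumpctl requires the dump target to be mounted.
    mkdir -p /mnt/kdump
    mount /dev/sdb1 /mnt/kdump

    # Bump the mtime so kdumpctl notices the config and re-runs mkdumprd.
    touch /etc/kdump.conf
    service kdump restart
}
```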
On that same system, iSCSI works fine with rawhide: kexec-tools-2.0.3-49.fc18.x86_64, dracut-019-40.git20120618.fc18.noarch, kernel-3.3.4-5.fc17.x86_64 (kdump hung before the second kernel with kernel-3.5.0-0.rc3.git0.2.fc18.x86_64). I'll try updating the involved components and re-test.
After quite some testing, I didn't manage to reproduce this bug. There was a certain subtlety in setting up the iSCSI/multipath environment which I didn't test and document well. I guess it was caused by some multipath misconfiguration, as the device used was actually multipath'ed.

Later I encountered a similar hang with blkid: I didn't flush the multipath device maps before `service multipathd stop` (which doesn't flush them by design). Further operations on the multipath devices then blocked indefinitely, and the kernel printed similar call traces.
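Based on the blkid hang described above, the safer teardown order is to flush the multipath maps before stopping the daemon. A minimal sketch, assuming stock multipath-tools where `multipath -F` flushes all unused device maps:

```shell
#!/bin/bash
# Sketch: stop multipathd without leaving stale device maps behind.
# `multipath -F` flushes all unused multipath maps first, so later
# blkid/lvm calls don't block on half-torn-down dm devices.

stop_multipath_cleanly() {
    multipath -F              # flush all unused multipath device maps
    service multipathd stop   # now safe: no orphaned maps remain
}
```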
No further reproductions have been observed. Consider this fixed.