832034 – hang during crash dump on iSCSI targets

Bug 832034 - hang during crash dump on iSCSI targets

Summary: hang during crash dump on iSCSI targets

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kexec-tools
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Dave Young
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2012-06-14 11:02 UTC by Lingzhu Xiang
Modified:	2013-07-04 20:56 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-09-27 02:53:07 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Console log of udevadm hang with rc.debug (114.77 KB, text/plain) 2012-06-14 11:02 UTC, Lingzhu Xiang	no flags	Details
Console log with rc.debug and rc.udev.debug (123.81 KB, text/plain) 2012-06-14 11:04 UTC, Lingzhu Xiang	no flags	Details
View All

Description Lingzhu Xiang 2012-06-14 11:02:41 UTC

Created attachment 591811 [details]
Console log of udevadm hang with rc.debug

Description of problem:

When I tried to test kdump with iSCSI targets, the second kernel booting was blocked during LVM scanning.

This bug blocks my further beta testing for kdump iSCSI and multipath targets.


Version-Release number of selected component (if applicable):

kexec-tools-2.0.3-49.fc18.x86_64.

dracut-018-55.git20120611.el7.noarch with the required patch cherry picked from upstream (commit 8be5a0fa945d130ef8ebb464ebbe1a865949353d).


How reproducible:
Always.


Steps to Reproduce:
1. Set up an iSCSI device:
    yum install -y iscsi-initiator-utils
    echo "InitiatorName=iqn.1994-05.com.redhat:kdump-chao-01" >/etc/iscsi/initiatorname.iscsi
    iscsiadm -m discovery -t st -p 10.16.43.127
    iscsiadm -m node -p 10.16.43.127 -l

Now it looks like this:

/dev/sda1: UUID="0b0ffeb9-31d2-4709-a988-0418b52e4931" TYPE="ext4" 
/dev/sda2: UUID="RZUTS5-9E67-6vO9-H0fQ-SyFl-ng4U-7K6PAn" TYPE="LVM2_member" 
/dev/mapper/vg_storageqe--12-lv_swap: UUID="b40522ec-4dee-4f5d-ae5e-fd5b2d29df76" TYPE="swap" 
/dev/mapper/vg_storageqe--12-lv_root: UUID="b761bbbc-be51-4c08-b810-b4d5cebf2048" TYPE="ext4" 
/dev/mapper/vg_storageqe--12-lv_home: UUID="08d533cf-7691-4b56-a1ce-5baf7743b297" TYPE="ext4" 
/dev/sdb1: UUID="19d3bd2e-a27a-41ac-add3-8e04d175755c" TYPE="ext4"

2. Edit /etc/kdump.conf and service kdump restart:
    ext4 UUID=19d3bd2e-a27a-41ac-add3-8e04d175755c

3. echo c >/proc/sysrq-trigger


Actual results:

Dracut hanged when trying to set up LVM for the non-iSCSI device.
 
...
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.3.0-0.13.el7.x86_64 root=/dev/mapper/vg_storageqe--12-lv_root ro rd.md=0 LANG=en_US.UTF-8 rd.lvm.lv=vg_storageqe-12/lv_root KEYTABLE=us SYSFONT=True console=ttyS1,115200 rd.lvm.lv=vg_storageqe-12/lv_swap rd.dm=0 rd.luks=0  irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off memmap=exactmap memmap=573K@64K memmap=261544K@589824K elfcorehdr=851368K memmap=192K#3111024K
...
[   25.011391] ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
iscsi:@10.16.43.127::::iqn.1992-08.com.netapp:sn.151753773
iscsistart: transport class version 2.0-870. iscsid version 2.0-872.18.f17
iscsistart: Logging into iqn.1992-08.com.netapp:sn.151753773 10.16.43.127:3260,1
iscsistart: can not connect to iSCSI daemon (111)!
iscsistart: version 2.0-872.18.f17
[   36.320639] scsi4 : iSCSI Initiator over TCP/IP
iscsistart: Connection1:0 to [target: iqn.1992-08.com.netapp:sn.151753773, portal: 10.16.43.127,3260] through [iface: default] is operational now
[   36.920804] scsi 4:0:0:0: Direct-Access     NETAPP   LUN              8020 PQ: 0 ANSI: 5
[   37.014535] sd 4:0:0:0: [sda] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)
[   37.153655] sd 4:0:0:0: [sda] Write Protect is off
[   37.181094] sd 4:0:0:0: [sda] Mode Sense: bd 00 00 08
[   37.210614] sd 4:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   37.268348]  sda: sda1
[   37.287133] sd 4:0:0:0: [sda] Attached SCSI disk
[   40.612141]  connection1:0: detected conn error (1020)
[   54.352513] hpsa 0000:03:00.0: controller message 03:00 succeeded
[   54.390380] hpsa 0000:03:00.0: MSIX
[   54.411395] hpsa 0000:03:00.0: irq 79 for MSI/MSI-X
[   54.441917] hpsa 0000:03:00.0: irq 80 for MSI/MSI-X
[   54.472357] hpsa 0000:03:00.0: irq 81 for MSI/MSI-X
[   54.501182] hpsa 0000:03:00.0: irq 82 for MSI/MSI-X
[   54.551361] hpsa 0000:03:00.0: hpsa0: <0x323a> at IRQ 79 using DAC
[   54.608315] scsi5 : hpsa
[   54.653380] hpsa 0000:03:00.0: RAID              device c5b0t0l0 added.
[   54.654012] hpsa 0000:03:00.0: Direct-Access     device c5b0t0l1 added.
[   54.746462] scsi 5:0:0:0: RAID              HP       P410i            5.14 PQ: 0 ANSI: 5
[   54.792637] scsi 5:0:0:1: Direct-Access     HP       LOGICAL VOLUME   5.14 PQ: 0 ANSI: 5
[   54.846240] sd 5:0:0:1: [sdb] 1953459632 512-byte logical blocks: (1.00 TB/931 GiB)
[   54.890740] sd 5:0:0:1: [sdb] Write Protect is off
[   54.918968] sd 5:0:0:1: [sdb] Mode Sense: 6b 00 00 08
[   54.949246] sd 5:0:0:1: [sdb] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[   55.009362]  sdb: sdb1 sdb2
[   55.036581] sd 5:0:0:1: [sdb] Attached SCSI disk
[   55.471848] dracut: Scanning devices sdb2  for LVM logical volumes vg_storageqe-12/lv_root vg_storageqe-12/lv_root vg_storageqe-12/lv_swap

(waited too long here)


With rd.debug, here it printed:

[   62.450124] dracut: /init@158(): check_finished
[   62.477257] dracut: /lib/dracut-lib.sh@271(check_finished): local f
[   62.514314] dracut: /lib/dracut-lib.sh@272(check_finished): for f in '$hookdir/initqueue/finished/*.sh'
[   62.569781] dracut: /lib/dracut-lib.sh@273(check_finished): '[' '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2fvg_storageqe--12-lv_root.sh' = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[   62.678767] dracut: /lib/dracut-lib.sh@274(check_finished): '[' -e '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2fvg_storageqe--12-lv_root.sh' ']'
[   62.765361] dracut: /lib/dracut-lib.sh@274(check_finished): . '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2fvg_storageqe--12-lv_root.sh'
[   62.846270] dracut: //lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2fvg_storageqe--12-lv_root.sh'[' -e /dev/mapper/vg_storageqe--12-lv_root ']'
[   62.932852] dracut: /lib/dracut-lib.sh@274(check_finished): return 1
[   62.970813] dracut: /init@161(): udevadm settle --timeout=0

(hangs)

Or instead, it printed this:

[  346.326615] dracut: /sbin/lvm_scan@113(): lvm lvchange -ay --sysinit vg_storageqe-12/lv_root vg_storageqe-12/lv_root vg_storageqe-12/lv_swap
[  494.046094] INFO: task lvm:762 blocked for more than 120 seconds.
[  494.082642] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  494.130933] lvm             D ffff88002cfb2598  4944   762    741 0x00000000
[  494.173781]  ffff88002c61dae8 0000000000000046 ffff88002d73cd00 ffff88002c61dfd8
[  494.219593]  ffff88002c61dfd8 ffff88002c61dfd8 ffffffff8193f020 ffff88002d73cd00
[  494.262079]  000000002c61db08 ffff88002cfb25a0 ffff88002c61c000 ffff88002c61dfd8
[  494.306281] Call Trace:
[  494.319784]  [<ffffffff816610bf>] schedule+0x3f/0x60
[  494.347080]  [<ffffffff8165ed6a>] mutex_lock_nested+0x19a/0x3a0
[  494.380697]  [<ffffffff811f9c5e>] ? __blkdev_get+0x6e/0x4c0
[  494.411097]  [<ffffffff811f9c5e>] ? __blkdev_get+0x6e/0x4c0
[  494.442536]  [<ffffffff811f9c5e>] __blkdev_get+0x6e/0x4c0
[  494.472112]  [<ffffffff811fa265>] blkdev_get+0x1b5/0x330
[  494.501785]  [<ffffffff81662c2b>] ? _raw_spin_unlock+0x2b/0x50
[  494.533756]  [<ffffffff811fa44e>] blkdev_open+0x6e/0x90
[  494.563313]  [<ffffffff811baccb>] __dentry_open+0x30b/0x4f0
[  494.594174]  [<ffffffff811fa3e0>] ? blkdev_get+0x330/0x330
[  494.626779]  [<ffffffff811bc284>] nameidata_to_filp+0x74/0x80
[  494.660691]  [<ffffffff811ce640>] do_last+0x460/0x900
[  494.690384]  [<ffffffff811cebf2>] path_openat+0xd2/0x410
[  494.721281]  [<ffffffff811cf051>] do_filp_open+0x41/0xa0
[  494.752977]  [<ffffffff81662c2b>] ? _raw_spin_unlock+0x2b/0x50
[  494.787649]  [<ffffffff811dd616>] ? alloc_fd+0x186/0x210
[  494.819199]  [<ffffffff811bc37d>] do_sys_open+0xed/0x1c0
[  494.850682]  [<ffffffff811bc471>] sys_open+0x21/0x30
[  494.877995]  [<ffffffff8166b869>] system_call_fastpath+0x16/0x1b
[  494.911584] INFO: lockdep is turned off.

(repeats this INFO every two minutes)

See attachments for full debug log.

Comment 1 Lingzhu Xiang 2012-06-14 11:04:08 UTC

Created attachment 591812 [details]
Console log with rc.debug and rc.udev.debug

Comment 3 Dave Young 2012-06-15 02:17:03 UTC

Hi, can you test without iscsi setup? just local fs dump with lvm..

Comment 4 Lingzhu Xiang 2012-06-15 03:54:47 UTC

(In reply to comment #3)
> Hi, can you test without iscsi setup? just local fs dump with lvm..

I have tested that. Okay here is the log for comparison:

[   54.350475] hpsa 0000:03:00.0: RAID              device c3b0t0l0 added.
[   54.351022] hpsa 0000:03:00.0: Direct-Access     device c3b0t0l1 added.
[   54.511352] scsi 3:0:0:0: RAID              HP       P410i            5.14 PQ: 0 ANSI: 5
[   54.558556] scsi 3:0:0:1: Direct-Access     HP       LOGICAL VOLUME   5.14 PQ: 0 ANSI: 5
[   54.628607] sd 3:0:0:1: [sda] 1953459632 512-byte logical blocks: (1.00 TB/931 GiB)
[   54.673671] sd 3:0:0:1: [sda] Write Protect is off
[   54.700362] sd 3:0:0:1: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[   54.760509]  sda: sda1 sda2
[   54.781300] sd 3:0:0:1: [sda] Attached SCSI disk
[   55.298924] dracut: Scanning devices sda2  for LVM logical volumes vg_storageqe-12/lv_root vg_storageqe-12/lv_root vg_storageqe-12/lv_swap
[   55.414534] dracut: inactive '/dev/vg_storageqe-12/lv_swap' [127.84 GiB] inherit
[   55.461428] dracut: inactive '/dev/vg_storageqe-12/lv_home' [753.12 GiB] inherit
[   55.506560] dracut: inactive '/dev/vg_storageqe-12/lv_root' [50.00 GiB] inherit
[   58.558227] EXT4-fs (dm-1): INFO: recovery required on readonly filesystem
[   58.599451] EXT4-fs (dm-1): write access will be enabled during recovery
[   59.317330] EXT4-fs (dm-1): recovery complete
[   59.360226] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[   59.409539] mount used greatest stack depth: 3440 bytes left
[   59.639924] dracut: Checking ext4: /dev/mapper/vg_storageqe--12-lv_root
[   59.677412] dracut: issuing e2fsck -a  /dev/mapper/vg_storageqe--12-lv_root
[   59.824945] dracut: /dev/mapper/vg_storageqe--12-lv_root: clean, 83245/3276800 files, 704630/13107200 blocks
[   59.899552] dracut: Remounting /dev/mapper/vg_storageqe--12-lv_root with -o ro
[   60.039060] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[   60.120997] dracut: Mounted root filesystem /dev/mapper/vg_storageqe--12-lv_root
[   60.168234] dracut: Mounting from /etc/fstab
[   60.211239] dracut: Mounting /dev/mapper/vg_storageqe--12-lv_root
[   60.256652] dracut: mount: /dev/mapper/vg_storageqe--12-lv_root is already mounted or /sysroot busy
[   60.309392] dracut: /dev/mapper/vg_storageqe--12-lv_root is already mounted on /sysroot
++ KDUMP_PATH=/var/crash
...
(kdump stuff all good)

Comment 7 Vivek Goyal 2012-06-19 15:02:58 UTC

not sure what is happening here.

So lvm hangs.

lvm lvchange -ay --sysinit vg_storageqe-12/lv_root vg_storageqe-12/lv_root vg_storageqe-12/lv_swap

And it happens when iscsi target is dump target and does not happen when we are dumping to root disk. Strange.

Only significant change I see is disk renaming. With iscsi, root disk has been renamed to sdb instead of sda. I am hoping that should not cause a problem.

Comment 9 Lingzhu Xiang 2012-06-20 01:29:27 UTC

(In reply to comment #7)
> not sure what is happening here.
> 
> So lvm hangs.
> 
> lvm lvchange -ay --sysinit vg_storageqe-12/lv_root vg_storageqe-12/lv_root
> vg_storageqe-12/lv_swap
> 
> And it happens when iscsi target is dump target and does not happen when we
> are dumping to root disk. Strange.
> 
> Only significant change I see is disk renaming. With iscsi, root disk has
> been renamed to sdb instead of sda. I am hoping that should not cause a
> problem.

Extra care has to be taken to ensure correct kdump restart. iscsi service will not start at boot because of bug #831572, so you have to manually start it and mount the iscsi device to somewhere (don't use fstab, or boot fails), as required by kdumpctl. And touch /etc/kdump.conf otherwise kdumpctl won't mkdumprd. Until iscsi works at all, I don't think I can talk about multipath.

I only picked your patch from dracut. I'll try rawhide on it.

Comment 10 Lingzhu Xiang 2012-06-20 06:59:14 UTC

On that same system, iSCSI works fine with rawhide, kexec-tools-2.0.3-49.fc18.x86_64, dracut-019-40.git20120618.fc18.noarch, kernel-3.3.4-5.fc17.x86_64 (kdump hanged before the second kernel with kernel-3.5.0-0.rc3.git0.2.fc18.x86_64).

I'll try updating the involved components and re-test.

Comment 11 Lingzhu Xiang 2012-06-21 10:32:38 UTC

After quite some testing, I didn't manage to reproduce this bug. There was certain subtlety in setting up the iSCSI/multipath environment which I didn't test and document well. I guess it was caused by some multipath misconfiguration, as the device used wes actually multipath'ed. 

Yet later I encountered similar hang with blkid. I didn't flush multipath device maps before `service multipathd stop` (this doesn't flush those by design). Then further operations on multipath devices got blocked infinitely, and the kernel printed similar call traces.

Comment 14 Lingzhu Xiang 2012-09-27 02:53:07 UTC

No newer reproduction has been observed. Consider this fixed.

Note You need to log in before you can comment on or make changes to this bug.