Bug 1247478 - kdump operation with two iscsi paths is picking up only one path or non-functional at all with a full offload ISCSI card (Emulex OneConnect)
kdump operation with two iscsi paths is picking up only one path or non-funct...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dracut (Show other bugs)
7.3
x86_64 Linux
high Severity urgent
: rc
: ---
Assigned To: dracut-maint
Release Test Team
: OtherQA
Depends On:
Blocks: 1203710 1289485 1313485
  Show dependency treegraph
 
Reported: 2015-07-28 02:24 EDT by Daniel Kwon
Modified: 2016-11-04 03:58 EDT (History)
17 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 03:58:16 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Log captured after shutting down one of the client ports to disable one of the multipaths (72.55 KB, text/plain)
2015-09-30 08:41 EDT, loberman
no flags Details
Clean log with all paths up shows it works fine (61.59 KB, text/plain)
2015-09-30 08:53 EDT, loberman
no flags Details
Test tar file of latest test with all updates, still fails when we have one path down. (15.56 MB, application/x-tar)
2016-01-06 20:43 EST, loberman
no flags Details
Latest tar file of test with suggested patch that still fails (18.37 MB, application/x-tar)
2016-01-31 19:00 EST, loberman
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2530 normal SHIPPED_LIVE dracut bug fix and enhancement update 2016-11-03 10:17:01 EDT

  None (edit)
Comment 5 loberman 2015-08-11 11:04:37 EDT
Update:

I was able to reproduce the interface issue on a partial offload ISCSI lab configuration using Broadcom adapters.

I will be testing the patches Daniel Kwon wanted tested. 

I may create a new BZ for the full ISCSI offload issue with the Emulex OneConnect.
Comment 8 loberman 2015-08-11 15:30:24 EDT
Hello
The following workaround was tested in my lab and provided to the customer until we get this resolved.
The kexec kernel boots without multipath support and we don't block waiting for the second NIC to come up.

Procedure
---------
Remove multipath support from the kernel and boot into the O/S using:

dracut --force --omit multipath 
Reboot

Now run the following to re-create the kdump initramfs without multipath
touch /etc/kdump.conf
systemctl restart kdump.service

Add multipath back to the main kernel

Then run:
dracut --force --add multipath
Reboot

Then test kdump using
echo c > /proc/sysrq-trigger

This is working for me fine in the lab with my reproducer.

Many Thanks
Laurence Oberman
Comment 9 Harald Hoyer 2015-08-19 10:08:00 EDT
please recheck with dracut-033-314.el7
Comment 10 loberman 2015-08-24 14:05:03 EDT
Hello Harald,

Back from vacation, was away last week.
Testing now with the Dracut version recommended.

Thanks
Laurence
Comment 19 loberman 2015-09-18 09:36:26 EDT
Hello Harald

Please widen the browser so the log text does not wrap when you read, it was short enough so I did not attach a file.

So since I applied patches as you know, the issue in my test lab was resolved. I am using Broadcom interfaces on Gigabit however and the customer is using Emulex 10Gige.

I am going to see about reproducing using Emulex 10GigE.

They are still having issues with the latest patches when they have a path down prior to testing kdump

It seems they still stall waiting on an interface to come up and kdump will complete but takes > 30 minutes to timeout the down path until they continue

They are asking the following and I quote.

- Could you answer to the following question from Hitachi?
As you can see in the log, it retries to connect to
a NIC or a target many times if it failed.
So, to shorten the time, we have the following ideas.
What do you think?
 - Check if a NIC is available in advance before trials.
 - Minimize the number of retrials 

Logs

With a path down prior to testing

[Thu Sep 17 16:55:49.081 2015] [   37.668488] dracut-initqueue[402]: Warning: Could bring interface enp12s0f0 up!
[Thu Sep 17 16:55:49.107 2015] [   37.706113] dracut-initqueue[402]: RTNETLINK answers: Network is unreachable
[Thu Sep 17 16:55:51.137 2015] [   39.733190] dracut-initqueue[402]: Warning: Kernel command line option 'iscsi_firmware' is deprecated, use 'rd.iscsi.firmware' instead.
[Thu Sep 17 16:55:51.160 2015] [   39.749147] dracut-initqueue[402]: iscsistart: Can not bind IPC socket
[Thu Sep 17 16:55:51.161 2015] [   39.757152] dracut-initqueue[402]: iscsistart: Could not setup mgmt ipc
[Thu Sep 17 16:56:45.856 2015] [ TIME ] Timed out waiting for device dev-mapper-mpathb1.device.
[Thu Sep 17 16:56:45.859 2015] [DEPEND] Dependency failed for File System Check on /dev/mapper/mpathb1.
[Thu Sep 17 16:56:45.860 2015] [DEPEND] Dependency failed for /kdumproot/var/crash.
[Thu Sep 17 16:56:45.860 2015] [DEPEND] Dependency failed for Initrd Root File System.
[Thu Sep 17 16:56:45.860 2015] [DEPEND] Dependency failed for Reload Configuration from the Real Root.
[Thu Sep 17 16:56:45.861 2015] [  OK  ] Closed Open-iSCSI iscsiuio Socket.
[Thu Sep 17 16:56:45.861 2015] [  OK  ] Stopped Kdump Vmcore Save Service.
[Thu Sep 17 16:56:45.861 2015] [  OK  ] Stopped dracut pre-pivot and cleanup hook.


With both paths up

Thu Sep 17 16:42:26.566 2015] [   17.369064] dracut-initqueue[403]: RTNETLINK answers: Network is unreachable
[Thu Sep 17 16:42:28.637 2015] [   19.443422] dracut-initqueue[403]: RTNETLINK answers: Network is unreachable
[Thu Sep 17 16:42:28.999 2015] [   20.928462] mpt2sas0: host_add: handle(0x0001), sas_addr(0x5005076049002764), phys(8)
[Thu Sep 17 16:42:30.666 2015] [   21.468562] dracut-initqueue[403]: Warning: Kernel command line option 'iscsi_firmware' is deprecated, use 'rd.iscsi.firmware' instead.
[Thu Sep 17 16:42:30.680 2015] [   21.484849] dracut-initqueue[403]: iscsistart: Can not bind IPC socket
[Thu Sep 17 16:42:30.690 2015] [   21.493144] dracut-initqueue[403]: iscsistart: Could not setup mgmt ipc
[Thu Sep 17 16:42:30.796 2015] [   22.733985] scsi: waiting for bus probes to complete ...
[Thu Sep 17 16:42:34.132 2015] [   26.053071] mpt2sas0: port enable: SUCCESS
[Thu Sep 17 16:42:34.133 2015] [   26.059700] scsi 0:0:0:0: Direct-Access     HITACHI  DF600F           0000 PQ: 0 ANSI: 4
[Thu Sep 17 16:42:34.151 2015] [   26.073454] sd 0:0:0:0: [sdc] 67108864 512-byte logical blocks: (34.3 GB/32.0 GiB)
[Thu Sep 17 16:42:34.164 2015] [   26.083479] sd 0:0:0:0: [sdc] Write Protect is off
[Thu Sep 17 16:42:34.165 2015] [   26.091085] sd 0:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[Thu Sep 17 16:42:34.181 2015] [   26.106688]  sdc: sdc1 sdc2
[Thu Sep 17 16:42:34.190 2015] [   26.115375] sd 0:0:0:0: [sdc] Attached SCSI disk
[Thu Sep 17 16:42:34.280 2015] [   26.201740] device-mapper: multipath service-time: version 0.2.0 loaded
[Thu Sep 17 16:42:34.310 2015] [   26.232214] scsi 1:0:1:0: Direct-Access     HITACHI  DF600F           0000 PQ: 0 ANSI: 4
[Thu Sep 17 16:42:34.337 2015] [   26.263725] sd 1:0:1:0: [sdd] 67108864 512-byte logical blocks: (34.3 GB/32.0 GiB)
[Thu Sep 17 16:42:34.357 2015] [   26.289596] sd 1:0:1:0: [sdd] Write Protect is off
[Thu Sep 17 16:42:34.378 2015] [   26.303412] sd 1:0:1:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[Thu Sep 17 16:42:34.398 2015] [  OK  ] Found device /dev/disk/by-uuid/ce1e4631-8dcc-4575-b77a-6cefef404b3c.
[Thu Sep 17 16:42:34.412 2015] [   26.333814]  sdd: sdd1 sdd2
[Thu Sep 17 16:42:34.650 2015] [   26.344387] sd 1:0:1:0: [sdd] Attached SCSI disk
[Thu Sep 17 16:42:34.651 2015]          Starting Login iSCSI Target iqn.2016-07.local.faines:dt.db...
[Thu Sep 17 16:42:34.690 2015]          Starting Login iSCSI Target iqn.2016-07.local.faines:dt.db...
[Thu Sep 17 16:42:34.768 2015]          Starting Login iSCSI Target iqn.2016-07.local.faines:os.db01...
[Thu Sep 17 16:42:34.783 2015] [   26.706383] scsi host4: iSCSI Initiator over TCP/IP
[Thu Sep 17 16:42:34.791 2015] [   26.717436] scsi host5: iSCSI Initiator over TCP/IP
[Thu Sep 17 16:42:34.808 2015] [   26.738310] scsi 5:0:0:0: Direct-Access     HITACHI  DF600F           0000 PQ: 0 ANSI: 4
[Thu Sep 17 16:42:34.822 2015] [   26.748296] scsi 4:0:0:0: Direct-Access     HITACHI  DF600F           0000 PQ: 0 ANSI: 4
Thu Sep 17 16:42:34.839 2015] [FAILED   26.763067] sd 4:0:0:0: [sdf] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[Thu Sep 17 16:42:34.852 2015] 0m] Failed to st[   26.773710] sd 5:0:0:0: [sde] 67108864 512-byte logical blocks: (34.3 GB/32.0 GiB)
[Thu Sep 17 16:42:34.862 2015] art Login iSCSI Target iqn.2016-07.local.faines:dt.db.
[Thu Sep 17 16:42:34.863 2015] See 'systemctl status "iscsistar[   26.793013] sd 4:0:0:0: [sdf] Write Protect is off
[Thu Sep 17 16:42:34.873 2015] t_\\x40...l.fain[   26.799758] sd 5:0:0:0: [sde] Write Protect is off
[Thu Sep 17 16:42:34.884 2015] es:dt.db.service"' for details.
[Thu Sep 17 16:42:34.893 2015] [  OK  [   26.812690] sd 5:0:0:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[Thu Sep 17 16:42:34.903 2015] ] Started Login [   26.824164] sd 4:0:0:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[Thu Sep 17 16:42:34.914 2015] iSCSI Target iqn.2016-07.local.faines:os.db01.
[Thu Sep 17 16:42:34.937 2015] [   26.867854]  sde: sde1 sde2
[Thu Sep 17 16:42:34.961 2015] [   26.884884]  sdf: sdf1
[Thu Sep 17 16:42:34.971 2015] [   26.894188] sd 4:0:0:0: [sdf] Attached SCSI disk
[Thu Sep 17 16:42:34.987 2015] [   26.915698] sd 5:0:0:0: [sde] Attached SCSI disk
[Thu Sep 17 16:42:35.007 2015]          Starting Login iSCSI Target iqn.2016-07.local.faines:os.db01...
[Thu Sep 17 16:42:35.111 2015] [  OK  ] Found device /dev/mapper/mpathb1.
[Thu Sep 17 16:42:35.112 2015] [  OK  ] Found device /dev/disk/by-uuid/27094b31-9a8a-4a34-9866-42559d3911fa.
[Thu Sep 17 16:42:35.150 2015] [   25.948732] dracut-initqueue[403]: Warning: Kernel command line option 'iscsi_firmware' is deprecated, use 'rd.iscsi.firmware' instead.
[Thu Sep 17 16:42:35.177 2015] [   25.984767] dracut-initqueue[403]: Scanning devices dm-2  for LVM logical volumes rhel/root rhel/swap rhel/root rhel/swap
[Thu Sep 17 16:42:35.202 2015] [   25.999163] dracut-initqueue[403]: inactive '/dev/rhel/root' [28.27 GiB] inherit
[Thu Sep 17 16:42:35.212 2015] [   26.008156] dracut-initqueue[403]: inactive '/dev/rhel/swap' [3.20 GiB] inherit
[Thu Sep 17 16:42:35.371 2015] [  OK  ] Found device /dev/mapper/rhel-root.
[Thu Sep 17 16:42:35.371 2015]          Starting File System Check on /dev/mapper/rhel-root...
[Thu Sep 17 16:42:35.397 2015] [  OK  ] Started File System Check on /dev/mapper/rhel-root.
[Thu Sep 17 16:42:35.410 2015]          Starting File System Check on /dev/mapper/mpathb1...
[Thu Sep 17 16:42:35.427 2015] [   26.231901] systemd-fsck[1184]: /dev/mapper/mpathb1: recovering journal
[Thu Sep 17 16:42:35.491 2015] [   26.287858] systemd-fsck[1184]: /dev/mapper/mpathb1: clean, 10/610800 files, 80338/2441406 blocks
[Thu Sep 17 16:42:35.501 2015] [  OK  ] Started File System Check on /dev/mapper/mpathb1.
[Thu Sep 17 16:42:35.587 2015] [  OK  ] Started dracut initqueue hook.
[Thu Sep 17 16:42:35.601 2015]          Mounting /sysroot...
[Thu Sep 17 16:42:35.601 2015] [  OK  ] Reached target Remote File Systems (Pre).
[Thu Sep 17 16:42:35.612 2015]          Mounting /kdumproot/var/crash...
[Thu Sep 17 16:42:35.621 2015] [  OK  ] Reached target Remote File Systems.
[Thu Sep 17 16:42:35.637 2015] [   27.565529] SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
[Thu Sep 17 16:42:35.651 2015] [   27.576816] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[Thu Sep 17 16:42:35.672 2015] [  OK  [   27.588170] XFS (dm-5): Mounting V4 Filesystem
[Thu Sep 17 16:42:35.678 2015] ] Mounted /kdumproot/var/crash.
[Thu Sep 17 16:42:35.707 2015] [   27.634044] XFS (dm-5): Starting recovery (logdev: internal)
[Thu Sep 17 16:42:35.860 2015] [   27.783269] scsi host6: iSCSI Initiator over TCP/IP
[Thu Sep 17 16:42:35.880 2015] [   27.801813] scsi 6:0:0:0: Direct-Access     HITACHI  DF600F           0000 PQ: 0 ANSI: 4
[Thu Sep 17 16:42:35.890 2015] [  OK  [   27.815857] sd 6:0:0:0: [sdg] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[Thu Sep 17 16:42:35.901 2015] ] Started Login [   27.827933] sd 6:0:0:0: [sdg] Write Protect is off
[Thu Sep 17 16:42:35.912 2015] iSCSI Target iqn.2016-07.local.f[   27.835613] sd 6:0:0:0: [sdg] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[Thu Sep 17 16:42:35.922 2015] aines:dt.db.
[Thu Sep 17 16:42:35.950 2015] [   27.870336]  sdg: sdg1
[Thu Sep 17 16:42:35.951 2015] [   27.877984] sd 6:0:0:0: [sdg] Attached SCSI disk
[Thu Sep 17 16:42:36.047 2015] [   27.976144] XFS (dm-5): Ending recovery (logdev: internal)
[Thu Sep 17 16:42:36.061 2015] [  OK  ] Mounted /sysroot.
[Thu Sep 17 16:42:36.062 2015] [  OK  ] Reached target Initrd Root File System.
[Thu Sep 17 16:42:36.072 2015]          Starting Reload Configuration from the Real Root...
[Thu Sep 17 16:42:36.137 2015] [  OK  ] Started Reload Configuration from the Real Root.
[Thu Sep 17 16:42:36.151 2015] [  OK  ] Reached target Initrd File Systems.
[Thu Sep 17 16:42:36.151 2015] [  OK  ] Reached target Initrd Default Target.
[Thu Sep 17 16:42:36.163 2015]          Starting dracut pre-pivot and cleanup hook...
[Thu Sep 17 16:42:36.298 2015] [  OK  ] Started dracut pre-pivot and cleanup hook.
[Thu Sep 17 16:42:36.311 2015] [  OK  ] Stopped udev Kernel Device Manager.
[Thu Sep 17 16:42:36.321 2015]          Starting udev Kernel Device Manager...
..
..
Thu Sep 17 16:42:36.311 2015] [  OK  ] Stopped udev Kernel Device Manager.
[Thu Sep 17 16:42:36.321 2015]          Starting udev Kernel Device Manager...
[Thu Sep 17 16:42:36.331 2015]          Starting Kdump Vmcore Save Service...
[Thu Sep 17 16:42:36.331 2015] [  OK  ] Started udev Kernel Device Manager.
[Thu Sep 17 16:42:36.370 2015] kdump: dump target is /dev/mappe[   28.292525] EXT4-fs (dm-4): re-mounted. Opts: (null)
[Thu Sep 17 16:42:36.380 2015] r/mpathb1
[Thu Sep 17 16:42:36.380 2015] kdump: saving to /kdumproot/var/crash//127.0.0.1-2015.09.17-16:41:21/
[Thu Sep 17 16:42:36.381 2015] kdump: saving vmcore-dmesg.txt
[Thu Sep 17 16:42:36.391 2015] kdump: saving vmcore-dmesg.txt complete
[Thu Sep 17 16:42:36.392 2015] kdump: saving vmcore
[Thu Sep 17 16:42:36.401 2015] Excluding unnecessary pages        : [  0.0 %] /Excluding unnecessary pages        : [100.0 %] |Excluding unnecessary pages        : [100.0 %] \Copying data                       : [ 43.9 %] -Copying data                       : [ 93.8 %] /Copying data                       : [100.0 %] |Copying data                       : [100.0 %] \
[Thu Sep 17 16:42:38.481 2015] kdump: saving vmcore complete
[Thu Sep 17 16:42:38.482 2015] [  OK  ] Stopped Login iSCSI Target iqn.2016-07.local.faines:os.db01.
[Thu Sep 17 16:42:38.491 2015]          Stopping Login iSCSI Target iqn.2016-07.local.faines:os.db01...
[Thu Sep 17 16:42:38.501 2015] [  OK  ] Stopped target Timers.
[Thu Sep 17 16:42:38.502 2015] [  OK  ] Removed slice system-systemd\x2dfsck.slice.
[Thu Sep 17 16:42:38.512 2015]          Stopping Device-Mapper Multipath Device Controller..[   30.436382] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Comment 21 loberman 2015-09-30 08:41 EDT
Created attachment 1078635 [details]
Log captured after shutting down one of the client ports to disable one of the multipaths

Hello Harald

I configured a managed switch and re-ran my tests by shutting down the second Gigabit interface on the client and I have reproduced this now.

My prior testing was performed by disabling one of the ISCSI server ports and I was unable to reproduce the issue after applying the following.

device-mapper-multipath-0.4.9-82.el7.x86_64.rpm
device-mapper-multipath-libs-0.4.9-82.el7.x86_64.rpm
device-mapper-multipath-sysvinit-0.4.9-82.el7.x86_64.rpm
dracut-033-328.el7.x86_64.rpm
dracut-config-rescue-033-328.el7.x86_64.rpm
dracut-network-033-328.el7.x86_64.rpm
kmod-20-5.el7.x86_64.rpm
kmod-libs-20-5.el7.x86_64.rpm
kpartx-0.4.9-82.el7.x86_64.rpm
libgudev1-219-11.el7.x86_64.rpm
libgudev1-devel-219-11.el7.x86_64.rpm
systemd-219-11.el7.x86_64.rpm
systemd-libs-219-11.el7.x86_64.rpm
systemd-python-219-11.el7.x86_64.rpm
systemd-sysv-219-11.el7.x86_64.rpm

However now taking down one of the client ports I get into problems.

Please review the log attached.

Thanks
Laurence
Comment 22 loberman 2015-09-30 08:53 EDT
Created attachment 1078637 [details]
Clean log with all paths up shows it works fine

Hello Harald,

Here is a clean log for comparison with no disabled ports showing we function fine.
Thanks
Laurence
Comment 23 Harald Hoyer 2015-09-30 12:41:49 EDT
(In reply to loberman from comment #21)
> Created attachment 1078635 [details]
> Log captured after shutting down one of the client ports to disable one of
> the multipaths

I see on the kernel cmdline: "iscsi_initiator=iqn.1994-05.com.redhat:rhel7"


[  154.829676] dracut-initqueue[401]: /sbin/iscsiroot: line 120: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
[  154.832249] dracut-initqueue[401]: /sbin/iscsiroot: line 131: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
[  154.835259] dracut-initqueue[401]: /sbin/iscsiroot: line 138: [: iqn.1994-05.com.redhat:rhel7: binary operator expected


Alright... that is dracut-033-328.el7

we are now @ dracut-033-353.el7 with a lot of changes in the iSCSI department.

Care to retest with the new version, before I dig into debugging mode for the old one?
Comment 24 loberman 2015-09-30 12:47:32 EDT
Will do.
Standby, back tomorrow with results.

Thanks!!
Laurence
Comment 37 Harald Hoyer 2015-10-14 11:16:28 EDT
So, in the initramfs, there is a file dracut-neednet.conf, which specifies:

# cat /etc/cmdline.d/dracut-neednet.conf
rd.neednet=1

So with "rd.neednet=1" dracut explicitly wants all network interfaces to come up and thus bails out, if not all found. If you remove that, it might work.
Comment 38 loberman 2015-10-14 11:26:21 EDT
Trying that now
Thanks!!!
Laurence
Comment 39 loberman 2015-10-14 12:43:46 EDT
Harald, Awesome, seems to work.

Adding rd.neednet=0 to grub line, I can now boot with one interface down.
I will etst kdump, but it will work as its the same issue.

This should be the default please from now on :)

Thanks
Laurence
Comment 40 loberman 2015-10-14 12:49:32 EDT
Just tested kdump and this now works as it should


Thanks
Laurence
Comment 41 Harald Hoyer 2015-10-15 07:15:39 EDT
(In reply to loberman from comment #39)
> This should be the default please from now on :)

It is... But I think kdump added it to be sure to have network to mount nfs or similar.
Comment 44 loberman 2015-10-22 11:09:17 EDT
Hello Harald

My lab tests are passing now with rd.neednet=0, but I have a partial offload ISCSI configuration different to the customer.

The customer is still having issues with Emulex Oneconnect full offload card timing out in kdump.

Remember that a kdump with both interfaces up works fine for them here.

Can you have a look at this log please.

dracut-cmdline[80]: Warning: Please supply bootdev argument for multiple ip= lines
dracut-cmdline[80]: Warning: Setting bootdev to 'enp12s0f0'                           ********* Note
[  OK  ] Started dracut cmdline hook.
         Starting dracut pre-udev hook...
[  OK  ] Started dracut pre-udev hook.
         Starting udev Kernel Device Manager...
[  OK  [   15.315879] systemd-udevd[351]: starting version 208
] Started udev Kernel Device Manager.
         Starting udev Coldplug all Devices...
[   15.336474] usb-storage 2-1.1.3:1.0: USB Mass Storage device detected
         Mounting Configuration File System...
[  OK  ] Mounted Configuration File System.
[   15.380252] scsi host2: usb-storage 2-1.1.3:1.0
[  OK  ] Started udev C[   15.388552] usbcore: registered new interface driver usb-storage
oldplug all Devices.
         Starting dracut initqueue hook...
[  OK  ] Reached target[   15.407021] be2net 0000:0c:00.0: be2net version is 10.4r
 System Initialization.
[  OK  ] Reached target Basic System.
[   15.430803] be2net 0000:0c:00.0: PCIe error reporting enabled
[   15.613942] be2net 0000:0c:00.0: FW config: function_mode=0x10003, function_caps=0x7
[   15.633924] be2net 0000:0c:00.0: Max: txqs 16, rxqs 17, rss 16, eqs 16, vfs 0
[   15.641882] be2net 0000:0c:00.0: Max: uc-macs 30, mc-macs 64, vlans 64
[   15.649221] be2net 0000:0c:00.0: enabled 1 MSI-x vector(s) for NIC
[   15.671885] be2net 0000:0c:00.0: created 1 TX queue(s)
[   15.680877] be2net 0000:0c:00.0: created 0 RSS queue(s) and 1 default RX queue
[   15.701962] be2net 0000:0c:00.0: FW version is 10.2.340.10
[   15.717328] be2net 0000:0c:00.0: Emulex OneConnect(be3): PF  port 0
[   15.724394] be2net 0000:0c:00.1: be2net version is 10.4r
[   15.731068] be2net 0000:0c:00.1: PCIe error reporting enabled
[   15.903658] be2net 0000:0c:00.1: FW config: function_mode=0x10003, function_caps=0x7
[   15.923640] be2net 0000:0c:00.1: Max: txqs 16, rxqs 17, rss 16, eqs 16, vfs 0
[   15.931599] be2net 0000:0c:00.1: Max: uc-macs 30, mc-macs 64, vlans 64
[   15.938933] be2net 0000:0c:00.1: enabled 1 MSI-x vector(s) for NIC
[   15.961602] be2net 0000:0c:00.1: created 1 TX queue(s)
[   15.970594] be2net 0000:0c:00.1: created 0 RSS queue(s) and 1 default RX queue
[   15.991578] be2net 0000:0c:00.1: FW version is 10.2.340.10
[   16.007022] be2net 0000:0c:00.1: Emulex OneConnect(be3): PF  port 1
..
..

Hitachi, not HP, but using Emulex OneConnect full-offload card

[   17.543152] Hardware name: HITACHI BladeSymphony 520HB2/520H B2         , BIOS EP1066 01/22/2015
[   17.552970]  ffff88003049f200 00000000bd236838 ffff880034c03e40 ffffffff81603f36
[   17.561278]  ffff880034c03e68 ffffffff8110d852 ffff88003049f200 0000000000000010
[   17.569585]  0000000000000000 ffff880034c03ea8 ffffffff8110dc72 00000000bd236838
[   17.577890] Call Trace:
[   17.580620]  <IRQ>  [<ffffffff81603f36>] dump_stack+0x19/0x1b
[   17.587067]  [<ffffffff8110d852>] __report_bad_irq+0x32/0xd0
[   17.593391]  [<ffffffff8110dc72>] note_interrupt+0x132/0x1f0
[   17.599714]  [<ffffffff8110b381>] handle_irq_event_percpu+0xe1/0x1e0
[   17.606812]  [<ffffffff8110b4bd>] handle_irq_event+0x3d/0x60
[   17.613135]  [<ffffffff8110e92a>] handle_fasteoi_irq+0x5a/0x100
[   17.619752]  [<ffffffff81015c9f>] handle_irq+0xbf/0x150
[   17.625589]  [<ffffffff81077d27>] ? irq_enter+0x17/0xa0
[   17.632139]  [<ffffffff8161626f>] do_IRQ+0x4f/0xf0
[   17.638192]  [<ffffffff8160b4ed>] common_interrupt+0x6d/0x6d
[   17.645196]  <EOI>
[   17.647343] handlers:
[   17.651449] [<ffffffff81414190>] usb_hcd_irq
[   17.656897] Disabling IRQ #16
[   17.860932] systemd-udevd[352]: renamed network interface eth0 to enp12s0f0
[   17.883120] systemd-udevd[352]: renamed network interface eth1 to enp12s0f1
[   17.906160] systemd-udevd[352]: renamed network interface eth2 to enp22s0f0
[   17.916896] systemd-udevd[352]: renamed network interface eth3 to enp22s0f1
[   17.944971] systemd-udevd[352]: renamed network interface eth4 to enp22s0f2
[   17.955499] systemd-udevd[352]: renamed network interface eth5 to enp22s0f3
[   17.995041] IPv6: ADDRCONF(NETDEV_UP): enp12s0f0: link is not ready
..
dracut-initqueue[363]: iscsistart: can not connect to iSCSI daemon (111)!                *********
dracut-initqueue[363]: iscsistart: version 6.2.0.873-28
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host) *** And timeout because of these
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[363]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
[ TIME ] Timed out waiting for device dev-mapper-mpathb1.device.                         **** Reason why they fail to complete kdump
[DEPEND] Dependency failed for /kdumproot/var/crash.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
Comment 45 loberman 2015-11-09 08:18:50 EST
Hello Harald

This has become very urgent.
Please see the latest needinfo and customers comments below.

We need to get this resolved.

Could you tell me Red Hat's opinion about the following Hitachi's thoughts?

---
0. The 2nd OS for kdump starts.

1. Dracut sets the IP address of the NIC enp12s0f0 on the server (on the down path) and it links up.

2. The iscsistart command, called by dracut, tries to connect the NIC enp12s0f0(192.168.4.100) on the server to the target NIC on the storage (192.168.4.2). However, it fails because the path is physically disconnected.

3. While dracut retries to call the iscsistart command over and over, the systemd unit for detecting disks ("dev-mapper-mpathb1.device") times out. (The time out is a systemd's default behavior).

4. Because the detection of the dump disk fails, the unit ("kdump-capture.service") for kdump is canceled. (The units dependencey is not met.)
---

--- LOG ---
dracut-initqueue[359]: /usr/sbin/iscsiroot@169(handle_netroot): iscsistart -i iqn.2016-07.local.intdb01 -t iqn.2016-07.local.faines:dt.db -g 1 -a 192.168.4.2 -p 3260
dracut-initqueue[359]: iscsistart: can not connect to iSCSI daemon (111)!
dracut-initqueue[359]: iscsistart: version 6.2.0.873-28
dracut-initqueue[359]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[359]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[359]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[359]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[359]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
dracut-initqueue[359]: iscsistart: connect to 192.168.4.2:3260 failed (No route to host)
Job dev-mapper-mpathb1.device/start timed out.
Job dev-mapper-mpathb1.device/start finished, result=timeout
[ TIME ] Timed out waiting for device dev-mapper-mpathb1.device.   <----Look!
Job kdumproot-var-crash.mount/start finished, result=dependency
[DEPEND] Dependency failed for /kdumproot/var/crash.               <----Look!
Job initrd-root-fs.target/start finished, result=dependency
[DEPEND] Dependency failed for Initrd Root File System.
Job initrd-parse-etc.service/start finished, result=dependency
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
...
Job kdump-capture.service/stop finished, result=canceled           <----Look!
Job dracut-pre-pivot.service/stop finished, result=canceled
Comment 54 loberman 2015-11-12 22:29:20 EST
Hello Folks,

In the /sbin/ifup, this is the function we need to modify.
I am looking at doing this myself tonight.
My question is, If we were to ignore this and not return 1 would this bypass and not wait.

# Handle static ip configuration
do_static() {
    strstr $ip '*:*:*' && load_ipv6

    if ! linkup $netif; then
        warn "Could not bring interface $netif up!"
        return 1        ***** What happens if we dont return 1 here 
    fi

    ip route get "$ip" | {
        read a rest
        if [ "$a" = "local" ]; then
            warn "Not assigning $ip to interface $netif, cause it is already assigned!"
            return 1
        fi
        return 0
    } || return 1

    [ -n "$macaddr" ] && ip link set address $macaddr dev $netif
    [ -n "$mtu" ] && ip link set mtu $mtu dev $netif
    if strstr $ip '*:*:*'; then
        # note no ip addr flush for ipv6
        ip addr add $ip/$mask ${srv:+peer $srv} dev $netif
        wait_for_ipv6_dad $netif
    else
        if ! arping -f -q -D -c 2 -I $netif $ip; then
            warn "Duplicate address detected for $ip for interface $netif."
            return 1
        fi
        ip addr flush dev $netif
        ip addr add $ip/$mask ${srv:+peer $srv} brd + dev $netif
    fi

    [ -n "$gw" ] && echo ip route replace default via $gw dev $netif > /tmp/net.$netif.gw
    [ -n "$hostname" ] && echo "echo $hostname > /proc/sys/kernel/hostname" > /tmp/net.$netif.hostname

    return 0
}
Comment 55 loberman 2015-11-12 22:42:28 EST
Harald,

Here we go

[   68.235196] dracut-initqueue[423]: + case $autoopt in
[   68.237199] dracut-initqueue[423]: + do_static
[   68.239196] dracut-initqueue[423]: + strstr 10.0.5.19 '*:*:*'
[   68.241197] dracut-initqueue[423]: + '[' 10.0.5.19 '!=' 10.0.5.19 ']'
[   68.243197] dracut-initqueue[423]: + linkup enp3s0f1
[   68.245197] dracut-initqueue[423]: + wait_for_if_link enp3s0f1
[   68.247196] dracut-initqueue[423]: + ip link set enp3s0f1 up
[   68.349195] dracut-initqueue[423]: + wait_for_if_up enp3s0f1
[   68.351197] dracut-initqueue[423]: + warn 'Could not bring interface enp3s0f1 up!'
[   68.353198] dracut-initqueue[423]: + echo 'Warning: Could not bring interface enp3s0f1 up!'
[   68.355200] dracut-initqueue[423]: Warning: Could not bring interface enp3s0f1 up!
[   68.357197] dracut-initqueue[423]: + return 1
[   68.359196] dracut-initqueue[423]: + ret=1
[   68.361196] dracut-initqueue[423]: + '[' -e /sys/class/net/enp3s0f1/address ']'
[   68.363204] dracut-initqueue[423]: ++ cat /sys/class/net/enp3s0f1/address
[   68.365204] dracut-initqueue[423]: + case $autoconf in
[   68.367196] dracut-initqueue[423]: + '[' 1 -eq 0 ']'
[   68.369196] dracut-initqueue[423]: + exit 0   

I continue here ********, customer Hangs
We want to enforce  Warning: dracut-initqueue timeout - starting timeout scripts if the link is down even in the customers case 


[  183.898853] dracut-initqueue[423]: Warning: dracut-initqueue timeout - starting timeout scripts
[  184.032386] dracut-initqueue[423]: /sbin/iscsiroot: line 114: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
[  184.033601] dracut-initqueue[423]: /sbin/iscsiroot: line 125: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
[  184.036200] dracut-initqueue[423]: /sbin/iscsiroot: line 132: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
         Starting Login iSCSI Target iqn.2003-01.org.linux-iscsi.localhost...
[  184.937589] scsi host7: iSCSI Initiator over TCP/IP
[  184.945236] scsi 7:0:0:0: Direct-Access     LIO-ORG  iscsiboot7       4.0  PQ: 0 ANSI: 5
[  184.950258] scsi 7:0:0:0: alua: supports implicit and explicit TPGS
[  184.954062] scsi 7:0:0:0: alua: port group 00 rel port 01
[  184.957234] scsi 7:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[  184.961087] scsi 7:0:0:0: alua: Attached
[  184.964872] scsi 7:0:0:1: Direct-Access     LIO-ORG  iscsiboot5       4.0  PQ: 0 ANSI: 5
[  184.970058] scsi 7:0:0:1: alua: supports implicit and explicit TPGS
[  184.974580] scsi 7:0:0:1: alua: port group 00 rel port 01
[  184.977909] scsi 7:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA
[  184.981881] scsi 7:0:0:1: alua: Attached
[  OK  ] Started Login iSCSI Target iqn.2003-01.org.linux-iscsi.localhost.
[  184.988730] sd 7:0:0:0: [sda] 125829120 512-byte logical blocks: (64.4 GB/60.0 GiB)
[  184.992521] sd 7:0:0:0: [sda] 4096-byte physical blocks
[  184.996818] sd 7:0:0:1: [sdb] 125829120 512-byte logical blocks: (64.4 GB/60.0 GiB)
[  185.000772] sd 7:0:0:1: [sdb] 4096-byte physical blocks
[  185.010193] sd 7:0:0:0: [sda] Write Protect is off
[  185.014526] sd 7:0:0:1: [sdb] Write Protect is off
[  185.020683] sd 7:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  185.025260] sd 7:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  185.037338]  sda: sda1 sda2
[  185.039303]  sdb: sdb1
[  185.043145] sd 7:0:0:0: [sda] Attached SCSI disk
[  185.045856] sd 7:0:0:1: [sdb] Attached SCSI disk
[  184.250296] dracut-initqueue[423]: /sbin/iscsiroot: line 114: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
[  184.253238] dracut-initqueue[423]: /sbin/iscsiroot: line 125: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
[  184.256203] dracut-initqueue[423]: /sbin/iscsiroot: line 132: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
         Starting Login iSCSI Target iqn.2003-01.org.linux-iscsi.localhost...
[  184.369922] dracut-initqueue[423]: /sbin/iscsiroot: line 114: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
[  184.372203] dracut-initqueue[423]: /sbin/iscsiroot: line 125: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
[  184.374658] dracut-initqueue[423]: /sbin/iscsiroot: line 132: [: iqn.1994-05.com.redhat:rhel7: binary operator expected
         Starting Login iSCSI Target iqn.2003-01.org.linux-iscsi.localhost...
[  186.371679] device-mapper: multipath round-robin: version 1.0.0 loaded
[  OK  ] Found device /dev/disk/by-uuid/06f46f58-d05f-409d-824d-ef57f67f1fce.
[  OK  ] Found device /dev/mapper/rhel-home.
[  OK  ] Found device /dev/mapper/rhel-root.
         Starting File System Check on /dev/mapper/rhel-root...
[  OK  ] Started File System Check on /dev/mapper/rhel-root.
[  OK  ] Found device /dev/mapper/vmcore-dump.
         Starting File System Check on /dev/mapper/vmcore-dump...
[  186.270207] systemd-fsck[2976]: /sbin/fsck.xfs: XFS file system.
[  OK  ] Started File System Check on /dev/mapper/vmcore-dump.
         Mounting /kdumproot/vmcore...
[  187.183881] SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
[  187.196319] XFS (dm-8): Mounting V4 Filesystem
[  OK  ] Started dracut initqueue hook.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Mounting /sysroot...
[  187.305670] XFS (dm-8): Starting recovery (logdev: internal)
[  187.421060] XFS (dm-7): Mounting V4 Filesystem
[  187.566178] XFS (dm-7): Starting recovery (logdev: internal)
[  187.866982] XFS (dm-8): Ending recovery (logdev: internal)
[  OK  ] Mounted /kdumproot/vmcore.
[  188.190204] XFS (dm-7): Ending recovery (logdev: internal)
[  OK  ] Mounted /sysroot.
[  OK  ] Reached target Initrd Root File System.
         Starting Reload Configuration from the Real Root...
         Mounting /sysroot/home...
         Mounting /sysroot/boot...
[  OK  ] Started Reload Configuration from the Real Root.
[  188.270607] XFS (dm-5): Mounting V4 Filesystem
[  188.296599] XFS (dm-2): Mounting V4 Filesystem
[  188.510086] XFS (dm-2): Starting recovery (logdev: internal)
[  188.535417] XFS (dm-5): Ending clean mount
[  OK  ] Mounted /sysroot/home.
[  188.618316] XFS (dm-2): Ending recovery (logdev: internal)
[  OK  ] Mounted /sysroot/boot.
[  OK  ] Reached target Initrd File Systems.
[  OK  ] Reached target Initrd Default Target.
         Starting dracut pre-pivot and cleanup hook...
[  OK  ] Started dracut pre-pivot and cleanup hook.
[  OK  ] Stopped udev Kernel Device Manager.
         Starting udev Kernel Device Manager...
         Starting Kdump Vmcore Save Service...
[  OK  ] Started udev Kernel Device Manager.
kdump: dump target is /dev/mapper/vmcore-dump
kdump: saving to /kdumproot/vmcore//var/crash/127.0.0.1-2015.11.12-22:37:36/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
Copying data                       : [ 52.1 %] |^[
Comment 57 loberman 2015-11-13 00:06:16 EST
I tested this patch in my lab wnd it short circuits thd stall I am seeing

diff -Nurp a/40network/ifup.sh b/40network/ifup.sh
--- a/40network/ifup.sh	2015-11-12 22:47:23.606240594 -0500
+++ b/40network/ifup.sh	2015-11-12 22:49:12.815227918 -0500
@@ -158,14 +158,14 @@ do_static() {
 
     if ! linkup $netif; then
         warn "Could not bring interface $netif up!"
-        return 1
+        #return 1
     fi
 
     ip route get "$ip" | {
         read a rest
         if [ "$a" = "local" ]; then
             warn "Not assigning $ip to interface $netif, cause it is already assigned!"
-            return 1
+            #return 1
         fi
         return 0
     } || return 1

I asked Hitachi to tty it.
Comment 58 Harald Hoyer 2015-11-13 04:21:05 EST
one more option to try: "rd.iscsi.waitnet=0"
Comment 59 loberman 2015-11-13 07:58:14 EST
Harald

Wow!! , we may have it resolved.
Lets wait and see.

That option seems to do what my patch was doing and just keep going past the interface that is not online.

Fully tested in my lab and passed on to Hitachi.

On another note, we need to document these parameters in our storage guide.
I will work to get that done.

Thanks!!

Laurence
Comment 60 loberman 2015-11-19 08:36:09 EST
Harald, Chris (Leech)

Sadly customer is still seeing the timeout on the mpath device.

Here is his message, please try and get me an update ASAP as thay want to test whatever we recommend tomorrow.
This is holding them up from moving to RHEL7 now.

Thanks
Laurence

Message as it was sent to me from Hitachi

"
Most recent comment: On 2015-11-19 21:57:34, Nakagawa, Masaharu commented:
"Laurence-san,

I'm sorry to have kept you waiting.
Please let me share some results of Hitachi's tests with the options "rd.iscsi.waitnet=0" and "rd.neednet=0".

Even with the two options, the systemd unit for detecting disks ("dev-mapper-mpathb1.device") timed out as the following logs.

-----
[Thu Nov 19 19:06:26.737 2015] [    4.070887] systemd[1]: Expecting device dev-mapper-mpathb1.device...
[Thu Nov 19 19:06:26.747 2015]          Expecting device dev-mapper-mpathb1.device...
...
[Thu Nov 19 19:06:41.026 2015] dracut-initqueue[361]: iscsistart: can not connect to iSCSI daemon (111)!
[Thu Nov 19 19:06:41.036 2015] dracut-initqueue[361]: iscsistart: version 6.2.0.873-28
...
[Thu Nov 19 19:06:42.022 2015] dracut-initqueue[361]: iscsistart: cannot make a connection to 192.168.1.2:3260 (-1,101)
...
[Thu Nov 19 19:07:54.599 2015] dracut-initqueue[361]: iscsistart: cannot make a connection to 192.168.1.2:3260 (-1,101)
[Thu Nov 19 19:07:56.660 2015] [ TIME ] Timed out waiting for device dev-mapper-mpathb1.device.
[Thu Nov 19 19:07:56.673 2015] [DEPEND] Dependency failed for /kdumproot/var/crash.
[Thu Nov 19 19:07:56.682 2015] [DEPEND] Dependency failed for Initrd Root File System.
[Thu Nov 19 19:07:56.683 2015] [DEPEND] Dependency failed for Reload Configuration from the Real Root.
-----

Here is the debug logs we got with the option rd_debug.
-----
[Thu Nov 19 20:06:14.689 2015] dracut-cmdline[78]: //lib/net-lib.sh@380(ip_to_var): set -- 192.168.4.100 '' '' 255.255.255.0 '' enp12s0f0 none
...
[Thu Nov 19 20:06:25.928 2015] dracut-initqueue[359]: /sbin/netroot@79(): /usr/sbin/iscsiroot enp12s0f0 iscsi:@192.168.1.2::::iqn.2016-07.local.fainest.db /sysroot
...
[Thu Nov 19 20:06:27.358 2015] dracut-initqueue[359]: /usr/sbin/iscsiroot@169(handle_netroot): iscsistart -i iqn.2016-07.local.intdb01 -t iqn.2016-07.local.fainest.db -g 1 -a 192.168.1.2 -p 3260
[Thu Nov 19 20:06:27.378 2015] dracut-initqueue[359]: iscsistart: can not connect to iSCSI daemon (111)!
[Thu Nov 19 20:06:27.388 2015] dracut-initqueue[359]: iscsistart: cannot make a connection to 192.168.1.2:3260 (-1,101)
...
[Thu Nov 19 20:07:26.492 2015] [ TIME ] Timed out waiting for device dev-mapper-mpathb1.device.
[Thu Nov 19 20:07:26.503 2015] [DEPEND] Dependency failed for /kdumproot/var/crash.
[Thu Nov 19 20:07:26.513 2015] [DEPEND] Dependency failed for Initrd Root File System.
[Thu Nov 19 20:07:26.523 2015] [DEPEND] Dependency failed for Reload Configuration from the Real Root.
-----

From the above, we can see that iscasistart is trying to connect enp12s0f0(192.168.4.100) to the target(192.168.1.2) although the path is down.

Iscasistart seems to be trying the combinations of the NICs (enp12s0f0, enp12s0f1) and the targets set in /etc/cmdline.d/50iscsi.conf in the initramf, one by one.

--- /etc/cmdline.d/50iscsi.conf/ -----
netroot=iscsi:@192.168.1.2::::iqn.2016-07.local.fainest.db
rd.iscsi.initiator=iqn.2016-07.local.intdb01
netroot=iscsi:@192.168.4.2::::iqn.2016-07.local.fainest.db
netroot=iscsi:@192.168.4.2::::iqn.2016-07.local.faines:os.db01
netroot=iscsi:@192.168.1.2::::iqn.2016-07.local.faines:os.db01
---------------------------------------

Taking all things into consideration, Hitachi thinks iscasistart didn't have enough time to try the right combination before dev-mapper-mpathb1.device timed out.

What do you think?

Best Regards,
Masaharu Nakagawa"
"
Comment 61 Harald Hoyer 2015-11-19 10:51:49 EST
Are they using the new RHEL-7.2 dracut?
Comment 62 loberman 2015-11-22 11:38:28 EST
Hello Harald

This is what they have, I have asked them to update, now that 7.2 is out.

<dracut and device-mapper versions>
dracut-config-rescue-033-241.el7_1.5.x86_64
dracut-network-033-241.el7_1.5.x86_64
dracut-033-241.el7_1.5.x86_64

device-mapper-1.02.93-3.el7.x86_64
device-mapper-event-1.02.93-3.el7.x86_64
device-mapper-event-libs-1.02.93-3.el7.x86_64
device-mapper-libs-1.02.93-3.el7.x86_64
device-mapper-multipath-0.4.9-77.el7.x86_64
device-mapper-multipath-libs-0.4.9-77.el7.x86_64
device-mapper-persistent-data-0.4.1-2.el7.x86_64
dhclient-4.2.5-36.el7.x86_64
Comment 63 Harald Hoyer 2015-11-23 04:45:43 EST
(In reply to loberman from comment #62)
> Hello Harald
> 
> This is what they have, I have asked them to update, now that 7.2 is out.
> 
> <dracut and device-mapper versions>
> dracut-config-rescue-033-241.el7_1.5.x86_64
> dracut-network-033-241.el7_1.5.x86_64
> dracut-033-241.el7_1.5.x86_64

Yes, that's too old. I hope the new one works for them.
Comment 64 Harald Hoyer 2015-12-15 09:29:56 EST
(In reply to loberman from comment #62)
> Hello Harald
> 
> This is what they have, I have asked them to update, now that 7.2 is out.
> 

can this be closed now?
Comment 65 loberman 2015-12-15 09:52:06 EST
Hello Harald,
No Sir, not yet.

I am waiting for them to have a maintenance window to test this again.

Thanks
Laurence
Comment 67 loberman 2016-01-06 20:38:36 EST
Hello Harald

The customer came back after testing and is till having issues with kdump when a path is disconnected.

See the following and I will attach tah tar file discussed below.

Please lets try finally get this resolved here.

Thanks
Laurence

Environment]
RHEL7.1 updated with the following packages:
kernel-3.10.0-327.3.1.el7.x86_64
dracut-033-360.el7_2.x86_64
dracut-network-033-360.el7_2.x86_64
dracut-config-rescue-033-360.el7_2.x86_64
iscsi-initiator-utils-6.2.0.873-32.el7.x86_64
iscsi-initiator-utils-iscsiuio-6.2.0.873-32.el7.x86_64
device-mapper-multipath-libs-0.4.9-85.el7.x86_64
device-mapper-multipath-0.4.9-85.el7.x86_64

Physical Configurations:
| 192.168.4.100 |---| SW#1 |--------| 192.168.4.2  |
| server        |                   |     storage  |
| 192.168.1.100 |---| SW#2 |--------| 192.168.1.2  |

NOTE:
- SW#1 and SW#2 are not directly connected with each other.
- "/" and "/var/crash" are located at different disks.

[Results]
 - 2path connected: Kdump succeeded.
 - 1path down:      Kdump failed. (Even the 1st OS failed to start.)

[Possible Cause for the failure for 1path down]
kdump-capture.service failed to start because it depends on dracut-initqueue.service, which seems to be retrying the following processing waiting for enp12s0f0 in the down path to be up.

[  866.745273] dracut-initqueue[474]: /bin/dracut-initqueue@26(): check_finished
[  866.745387] dracut-initqueue[474]: /lib/dracut-lib.sh@422(check_finished): local f
[  866.745502] dracut-initqueue[474]: /lib/dracut-lib.sh@423(check_finished): for f in '$hookdir/initqueue/finished/*.sh'
[  866.745618] dracut-initqueue[474]: /lib/dracut-lib.sh@424(check_finished): '[' '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f3bfae4b2-a401-4324-b935-67f4f95e2d21.sh' = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[  866.745734] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): '[' -e '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f3bfae4b2-a401-4324-b935-67f4f95e2d21.sh' ']'
[  866.745855] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): . '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f3bfae4b2-a401-4324-b935-67f4f95e2d21.sh'
[  866.745971] dracut-initqueue[474]: //lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f3bfae4b2-a401-4324-b9'[' -e /dev/disk/by-uuid/3bfae4b2-a401-4324-b935-67f4f95e2d21 ']'
[  867.068162] dracut-initqueue[474]: /lib/dracut-lib.sh@424(check_finished): '[' '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f6ae82cad-c577-41d6-8865-229cd8f15d22.sh' = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[  867.068413] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): '[' -e '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f6ae82cad-c577-41d6-8865-229cd8f15d22.sh' ']'
[  867.068566] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): . '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f6ae82cad-c577-41d6-8865-229cd8f15d22.sh'
[  867.068687] dracut-initqueue[474]: //lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f6ae82cad-c577-41d6-88'[' -e /dev/disk/by-uuid/6ae82cad-c577-41d6-8865-229cd8f15d22 ']'
[  867.068812] dracut-initqueue[474]: /lib/dracut-lib.sh@423(check_finished): for f in '$hookdir/initqueue/finished/*.sh'
[  867.068930] dracut-initqueue[474]: /lib/dracut-lib.sh@424(check_finished): '[' '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2frhel-root.sh' = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[  867.069054] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): '[' -e '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2frhel-root.sh' ']'
[  867.069175] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): . '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2frhel-root.sh'
[  867.069291] dracut-initqueue[474]: //lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2frhel-root.sh@1(source): '[' -e /dev/mapper/rhel-root ']'
[  867.069410] dracut-initqueue[474]: /lib/dracut-lib.sh@423(check_finished): for f in '$hookdir/initqueue/finished/*.sh'
[  867.069528] dracut-initqueue[474]: /lib/dracut-lib.sh@424(check_finished): '[' '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2frhel\x2froot.sh' = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[  867.069643] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): '[' -e '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2frhel\x2froot.sh' ']'
[  867.069793] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): . '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2frhel\x2froot.sh'
[  867.069909] dracut-initqueue[474]: //lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2frhel\x2froot.sh@1(source): '[' -e /dev/rhel/root ']'
[  867.070040] dracut-initqueue[474]: /lib/dracut-lib.sh@423(check_finished): for f in '$hookdir/initqueue/finished/*.sh'
[  867.070158] dracut-initqueue[474]: /lib/dracut-lib.sh@424(check_finished): '[' '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2frhel\x2fswap.sh' = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[  867.070273] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): '[' -e '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2frhel\x2fswap.sh' ']'
[  867.070391] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): . '/lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2frhel\x2fswap.sh'
[  867.070506] dracut-initqueue[474]: //lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2frhel\x2fswap.sh@1(source): '[' -e /dev/rhel/swap ']'
[  867.070625] dracut-initqueue[474]: /lib/dracut-lib.sh@423(check_finished): for f in '$hookdir/initqueue/finished/*.sh'
[  867.070740] dracut-initqueue[474]: /lib/dracut-lib.sh@424(check_finished): '[' /lib/dracut/hooks/initqueue/finished/iscsi_started.sh = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[  867.070854] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): '[' -e /lib/dracut/hooks/initqueue/finished/iscsi_started.sh ']'
[  867.070975] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): . /lib/dracut/hooks/initqueue/finished/iscsi_started.sh
[  867.071121] dracut-initqueue[474]: //lib/dracut/hooks/initqueue/finished/iscsi_started.sh@1(source): '[' -f /tmp/iscsistarted-iscsi:@192.168.1.2::::iqn.2016-07.local.faines:os.db01 ']'
[  867.467154] dracut-initqueue[474]: /lib/dracut-lib.sh@423(check_finished): for f in '$hookdir/initqueue/finished/*.sh'
[  867.467394] dracut-initqueue[474]: /lib/dracut-lib.sh@424(check_finished): '[' /lib/dracut/hooks/initqueue/finished/wait-enp12s0f0.sh = '/lib/dracut/hooks/initqueue/finished/*.sh' ']'
[  867.467510] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): '[' -e /lib/dracut/hooks/initqueue/finished/wait-enp12s0f0.sh ']'
[  867.467630] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): . /lib/dracut/hooks/initqueue/finished/wait-enp12s0f0.sh
[  867.467745] dracut-initqueue[474]: //lib/dracut/hooks/initqueue/finished/wait-enp12s0f0.sh@1(source): '[' -f /tmp/net.enp12s0f0.did-setup ']' <--- It's waiting for enp12s0f0.
[  867.467866] dracut-initqueue[474]: /lib/dracut-lib.sh@425(check_finished): return 1


<Data to show the results>
"test_results_20151216.tar" includes the following data for each environment (1path down, 1path down with debug, and 2 pathes alive).
 - 1path_down
    10.197.57.8-20151214-195254.log

 - 1 path down with debug
    analyzing_log.xlsx
    10.197.57.8-20151215-193532.log

 - 2 pathes alive
    10.197.57.8-20151214-193447.log
    sosreport-localhost.localdomain-20151214194613.tar.xz
    sosreport-localhost.localdomain-20151214194613.tar.xz.md5

Best Regards,
Masaharu Nakagawa
Comment 68 loberman 2016-01-06 20:43 EST
Created attachment 1112315 [details]
Test tar file of latest test with all updates, still fails when we have one path down.

Latest test tar file.
Comment 69 loberman 2016-01-15 10:32:54 EST
Harald

Please look into this ASAP. Hitachi is starting to ring manager bells now.

Thanks
Laurence
Comment 72 Harald Hoyer 2016-01-26 06:28:59 EST
This should fix the waiting for the interface.
https://github.com/haraldh/dracut/commit/f4f8fb5c10cc8d0047123324197aff25f0a63e04
Comment 73 loberman 2016-01-31 18:58:56 EST
Hello Harald

The tested the patch but unfortunately they still have issues.
See below


"
Laurence-san,

Here are the test results of the test patch.
Please keep your investigation and share your progress.
Hitachi can do any additional tests if you ask us.

--- <test results>---
Condition     | Kdump result | log (in the attached test_patch_results.tar)
2paths alive  | succeeded    | 10.197.57.8-20160128-173715.log
1path down    | failed       | 10.197.57.8-20160129-152805.log
----------------------

Unfortunately we still cannot get kdump with one of the two paths down. (10.197.57.8-20160129-152805.log)
The processing stopped at Fri Jan 29 15:32:23.323 2016 in the following log.
---
[Fri Jan 29 15:32:22.704 2016] [  173.957013] EXT4-fs (dm-7): mounted filesystem with ordered data mode. Opts: (null)
[Fri Jan 29 15:32:22.714 2016] [  OK  ] Mounted /kdumproot/var/crash.
[Fri Jan 29 15:32:23.001 2016] [  174.253233] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[Fri Jan 29 15:32:23.014 2016] [  OK  ] Mounted /sysroot.

Note here in the log

[Fri Jan 29 15:32:23.292 2016] [FAILED] Failed to start Login iSCSI Target iqn.2016-07.local.faines:dt.db.
[Fri Jan 29 15:32:23.303 2016] See 'systemctl status "iscsistart_\\x40...l.faines:dt.db.service"' for details.
[Fri Jan 29 15:32:23.313 2016] [FAILED] Failed to start Login iSCSI Target iqn.2016-07.local.faines:os.db01.
[Fri Jan 29 15:32:23.323 2016] See 'systemctl status "iscsistart_\\x40...faines:os.db01.service"' for details.
 << Stopped here! >>
---


Then, we tried it in debug mode. (10.197.57.8-20160129-091007_with_debug.log)
Kdump failed again, but in this case, it resulted in rebooting.

Best Regards,
Masaharu Nakagawa
"

I will attach the logs
Thanks
Laurence
Comment 74 loberman 2016-01-31 19:00 EST
Created attachment 1119933 [details]
Latest tar file of test with suggested patch that still fails
Comment 103 Mike McCune 2016-03-28 18:31:02 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune@redhat.com with any questions
Comment 104 Harald Hoyer 2016-06-29 12:16:24 EDT
So, I'll close this bug, because I fixed the "rd.neednet=1" part, and mounting the filesystem seems to be working also.
Comment 108 Martin Banas 2016-08-31 09:42:05 EDT
Thanks!

Moving to VERIFIED.
Comment 110 errata-xmlrpc 2016-11-04 03:58:16 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2530.html

Note You need to log in before you can comment on or make changes to this bug.