Description of problem:
When I try to install RHEL 7.6 Alpha on my server, it either gets stuck during the "discovering multipath devices" step or it can't find any of my volumes.

Version-Release number of selected component (if applicable):
RHEL 7.6 Alpha

Steps to Reproduce:
1. Try to install RHEL 7.6 Alpha to a SANboot LUN

Actual results:
Installation fails because it can't find any volumes.

Expected results:
The installer finds the volumes and installs to the SANboot LUN.

Additional info:
The issue only seems to occur on servers that have QLE2692s and QLE2742s. I am running FW version 8.08.03 and driver version 9.00.00.00.40.0-k1.
Are you able to get any of the anaconda logs?
Created attachment 1474816 [details] anaconda logs
Are there any updates on this? Thanks, Jennifer Duong
The log attached doesn't really provide any information. I was looking specifically for the anaconda storage.log, program.log, and syslog output. You might have to set up remote logging during the install to get these, since your installation is failing.
If you can provide the anaconda-tb-* log file, that should also be enough, since it contains all the other logs.
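A note on remote logging: anaconda can forward its logs to a remote syslog host during installation, which is one way to capture storage.log, program.log, and the syslog output when the install never completes. A minimal sketch, assuming a reachable host at 192.168.122.1 (a placeholder) running rsyslog configured to accept remote messages, is to append this to the installer boot command line:

    inst.syslog=192.168.122.1:514

The installer then streams its log messages to that host for the duration of the install, so they survive even if the installation fails.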
Created attachment 1478545 [details] program.log
Created attachment 1478546 [details] storage.log
Created attachment 1478547 [details] syslog
Ben, I was able to find storage.log, program.log, and the syslog output in the /tmp directory and have attached those. However, I could not find an anaconda-tb* log file in the /tmp directory. Thanks, Jennifer
Ben, I went ahead and tried installing RHEL 7.6 Beta in hopes that this might have been fixed by a different bug, but it looks like I'm still getting stuck during the discovering multipath devices step, or it can't find any of my volumes. The issue still seems to occur only on servers that have QLE2692s and QLE2742s. Thanks, Jennifer
I assume that your volumes aren't visible because they are not working and were removed from the system. Looking at the syslog output:

17:12:05,262 INFO kernel:scsi 12:0:0:0: Device offlined - not ready after error recovery
17:12:06,789 INFO kernel:scsi 13:0:0:0: Device offlined - not ready after error recovery
17:12:08,772 INFO kernel:scsi 14:0:0:0: Device offlined - not ready after error recovery
17:12:10,764 INFO kernel:scsi 15:0:0:0: Device offlined - not ready after error recovery
17:12:15,611 INFO systemd:Started LVM2 metadata daemon.
17:12:19,808 ERR kernel: rport-12:0-1: blocked FC remote port time out: removing target and saving binding
17:12:19,808 ERR kernel: rport-12:0-0: blocked FC remote port time out: removing target and saving binding
17:12:21,856 ERR kernel: rport-13:0-1: blocked FC remote port time out: removing target and saving binding
17:12:23,904 ERR kernel: rport-14:0-0: blocked FC remote port time out: removing target and saving binding
17:12:23,904 ERR kernel: rport-13:0-0: blocked FC remote port time out: removing target and saving binding
17:12:23,904 ERR kernel: rport-14:0-1: blocked FC remote port time out: removing target and saving binding
17:12:25,824 ERR kernel: rport-15:0-1: blocked FC remote port time out: removing target and saving binding
17:12:27,872 ERR kernel: rport-15:0-0: blocked FC remote port time out: removing target and saving binding

multipathd is running, but the only device it sees is sda, which isn't a SAN device:

17:11:21,990 NOTICE kernel:sd 1:0:0:1: [sda] Attached SCSI removable disk

It looks like a USB device. I'm reassigning this as a storage driver bug, since the syslog messages make it look like this is happening before multipath even has a chance to do anything with the storage.
Looks like a timeout followed by an abort when probing on all 4 qla2xxx HBAs, followed by SCSI EH escalations culminating in adapter resets. It would seem that the HBAs could log in to target 0 at least, but couldn't access it.

17:11:21,219 WARNING kernel:qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.06.07.6-k.
17:11:21,219 WARNING kernel:qla2xxx [0000:04:00.0]-011c: : MSI-X vector count: 16.
17:11:21,219 WARNING kernel:qla2xxx [0000:04:00.0]-001d: : Found an ISP2261 irq 43 iobase 0xffff9bdd83ae6000.
17:11:21,221 DEBUG kernel:qla2xxx 0000:04:00.0: irq 44 for MSI/MSI-X
17:11:21,221 DEBUG kernel:qla2xxx 0000:04:00.0: irq 45 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 46 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 47 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 48 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 49 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 50 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 51 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 52 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 53 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 54 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 55 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 56 for MSI/MSI-X
17:11:21,222 DEBUG kernel:qla2xxx 0000:04:00.0: irq 57 for MSI/MSI-X
17:11:21,222 ERR kernel:qla2xxx [0000:04:00.0]-00c6:12: MSI-X: Failed to enable support with 16 vectors, using 14 vectors
17:11:21,407 WARNING kernel:qla2xxx [0000:04:00.0]-0075:12: ZIO mode 6 enabled; timer delay (200 us).
17:11:23,057 WARNING kernel:qla2xxx [0000:04:00.0]-d302:12: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
17:11:23,211 INFO kernel:scsi host12: qla2xxx
17:11:23,211 WARNING kernel:qla2xxx [0000:04:00.0]-00fb:12: QLogic QLE2742 - QLogic 32Gb 2-port FC to PCIe Gen3 x8 Adapter.
17:11:23,211 WARNING kernel:qla2xxx [0000:04:00.0]-00fc:12: ISP2261: PCIe (8.0GT/s x8) @ 0000:04:00.0 hdma+ host#=12 fw=8.08.03 (d0d5).
17:11:23,217 WARNING kernel:qla2xxx [0000:04:00.1]-011c: : MSI-X vector count: 16.
17:11:23,217 WARNING kernel:qla2xxx [0000:04:00.1]-001d: : Found an ISP2261 irq 63 iobase 0xffff9bdd83b3e000.
17:11:23,218 DEBUG kernel:qla2xxx 0000:04:00.1: irq 64 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 65 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 66 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 67 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 68 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 69 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 70 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 71 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 74 for MSI/MSI-X
17:11:23,219 DEBUG kernel:qla2xxx 0000:04:00.1: irq 75 for MSI/MSI-X
17:11:23,220 DEBUG kernel:qla2xxx 0000:04:00.1: irq 76 for MSI/MSI-X
17:11:23,220 DEBUG kernel:qla2xxx 0000:04:00.1: irq 77 for MSI/MSI-X
17:11:23,220 DEBUG kernel:qla2xxx 0000:04:00.1: irq 78 for MSI/MSI-X
17:11:23,220 DEBUG kernel:qla2xxx 0000:04:00.1: irq 79 for MSI/MSI-X
17:11:23,220 ERR kernel:qla2xxx [0000:04:00.1]-00c6:13: MSI-X: Failed to enable support with 16 vectors, using 14 vectors
17:11:23,268 WARNING kernel:qla2xxx [0000:04:00.1]-0075:13: ZIO mode 6 enabled; timer delay (200 us).
17:11:24,213 WARNING kernel:qla2xxx [0000:04:00.0]-500a:12: LOOP UP detected (16 Gbps).
17:11:24,719 WARNING kernel:qla2xxx [0000:04:00.0]-ffff:12: register_localport: host-traddr=nn-0x20000024ff7ef9f4:pn-0x21000024ff7ef9f4 on portID:3d1300
17:11:25,056 WARNING kernel:qla2xxx [0000:04:00.1]-d302:13: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
17:11:25,211 INFO kernel:scsi host13: qla2xxx
17:11:25,216 WARNING kernel:qla2xxx [0000:04:00.1]-00fb:13: QLogic QLE2742 - QLogic 32Gb 2-port FC to PCIe Gen3 x8 Adapter.
17:11:25,216 WARNING kernel:qla2xxx [0000:04:00.1]-00fc:13: ISP2261: PCIe (8.0GT/s x8) @ 0000:04:00.1 hdma+ host#=13 fw=8.08.03 (d0d5).
17:11:25,218 WARNING kernel:qla2xxx [0000:05:00.0]-011c: : MSI-X vector count: 16.
17:11:25,218 WARNING kernel:qla2xxx [0000:05:00.0]-001d: : Found an ISP2261 irq 80 iobase 0xffff9bdd83b72000.
17:11:25,221 DEBUG kernel:qla2xxx 0000:05:00.0: irq 81 for MSI/MSI-X
17:11:25,222 DEBUG kernel:qla2xxx 0000:05:00.0: irq 82 for MSI/MSI-X
17:11:25,222 DEBUG kernel:qla2xxx 0000:05:00.0: irq 83 for MSI/MSI-X
17:11:25,222 DEBUG kernel:qla2xxx 0000:05:00.0: irq 84 for MSI/MSI-X
17:11:25,222 DEBUG kernel:qla2xxx 0000:05:00.0: irq 85 for MSI/MSI-X
17:11:25,222 DEBUG kernel:qla2xxx 0000:05:00.0: irq 86 for MSI/MSI-X
17:11:25,222 DEBUG kernel:qla2xxx 0000:05:00.0: irq 87 for MSI/MSI-X
17:11:25,222 DEBUG kernel:qla2xxx 0000:05:00.0: irq 88 for MSI/MSI-X
17:11:25,222 DEBUG kernel:qla2xxx 0000:05:00.0: irq 89 for MSI/MSI-X
17:11:25,223 DEBUG kernel:qla2xxx 0000:05:00.0: irq 90 for MSI/MSI-X
17:11:25,223 DEBUG kernel:qla2xxx 0000:05:00.0: irq 91 for MSI/MSI-X
17:11:25,223 DEBUG kernel:qla2xxx 0000:05:00.0: irq 92 for MSI/MSI-X
17:11:25,223 DEBUG kernel:qla2xxx 0000:05:00.0: irq 93 for MSI/MSI-X
17:11:25,223 DEBUG kernel:qla2xxx 0000:05:00.0: irq 94 for MSI/MSI-X
17:11:25,223 ERR kernel:qla2xxx [0000:05:00.0]-00c6:14: MSI-X: Failed to enable support with 16 vectors, using 14 vectors
17:11:25,271 WARNING kernel:qla2xxx [0000:05:00.0]-0075:14: ZIO mode 6 enabled; timer delay (200 us).
17:11:25,961 WARNING kernel:qla2xxx [0000:04:00.1]-500a:13: LOOP UP detected (32 Gbps).
17:11:26,423 WARNING kernel:qla2xxx [0000:04:00.1]-ffff:13: register_localport: host-traddr=nn-0x20000024ff7ef9f5:pn-0x21000024ff7ef9f5 on portID:10300
17:11:27,032 WARNING kernel:qla2xxx [0000:05:00.0]-d302:14: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
17:11:27,187 INFO kernel:scsi host14: qla2xxx
17:11:27,192 WARNING kernel:qla2xxx [0000:05:00.0]-00fb:14: QLogic QLE2692 - QLogic 16Gb 2-port FC to PCIe Gen3 x8 Adapter.
17:11:27,192 WARNING kernel:qla2xxx [0000:05:00.0]-00fc:14: ISP2261: PCIe (8.0GT/s x8) @ 0000:05:00.0 hdma+ host#=14 fw=8.08.03 (d0d5).
17:11:27,194 WARNING kernel:qla2xxx [0000:05:00.1]-011c: : MSI-X vector count: 16.
17:11:27,194 WARNING kernel:qla2xxx [0000:05:00.1]-001d: : Found an ISP2261 irq 38 iobase 0xffff9bdd83bb0000.
17:11:27,197 DEBUG kernel:qla2xxx 0000:05:00.1: irq 95 for MSI/MSI-X
17:11:27,198 DEBUG kernel:qla2xxx 0000:05:00.1: irq 96 for MSI/MSI-X
17:11:27,198 DEBUG kernel:qla2xxx 0000:05:00.1: irq 97 for MSI/MSI-X
17:11:27,198 DEBUG kernel:qla2xxx 0000:05:00.1: irq 98 for MSI/MSI-X
17:11:27,198 DEBUG kernel:qla2xxx 0000:05:00.1: irq 99 for MSI/MSI-X
17:11:27,198 DEBUG kernel:qla2xxx 0000:05:00.1: irq 100 for MSI/MSI-X
17:11:27,198 DEBUG kernel:qla2xxx 0000:05:00.1: irq 101 for MSI/MSI-X
17:11:27,198 DEBUG kernel:qla2xxx 0000:05:00.1: irq 102 for MSI/MSI-X
17:11:27,198 DEBUG kernel:qla2xxx 0000:05:00.1: irq 103 for MSI/MSI-X
17:11:27,199 DEBUG kernel:qla2xxx 0000:05:00.1: irq 104 for MSI/MSI-X
17:11:27,199 DEBUG kernel:qla2xxx 0000:05:00.1: irq 105 for MSI/MSI-X
17:11:27,199 DEBUG kernel:qla2xxx 0000:05:00.1: irq 106 for MSI/MSI-X
17:11:27,199 DEBUG kernel:qla2xxx 0000:05:00.1: irq 107 for MSI/MSI-X
17:11:27,199 DEBUG kernel:qla2xxx 0000:05:00.1: irq 108 for MSI/MSI-X
17:11:27,199 ERR kernel:qla2xxx [0000:05:00.1]-00c6:15: MSI-X: Failed to enable support with 16 vectors, using 14 vectors
17:11:27,247 WARNING kernel:qla2xxx [0000:05:00.1]-0075:15: ZIO mode 6 enabled; timer delay (200 us).
17:11:27,817 WARNING kernel:qla2xxx [0000:05:00.0]-500a:14: LOOP UP detected (16 Gbps).
17:11:28,322 WARNING kernel:qla2xxx [0000:05:00.0]-ffff:14: register_localport: host-traddr=nn-0x20000024ff1bfd96:pn-0x21000024ff1bfd96 on portID:3d1200
17:11:29,022 WARNING kernel:qla2xxx [0000:05:00.1]-d302:15: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
17:11:29,178 INFO kernel:scsi host15: qla2xxx
17:11:29,183 WARNING kernel:qla2xxx [0000:05:00.1]-00fb:15: QLogic QLE2692 - QLogic 16Gb 2-port FC to PCIe Gen3 x8 Adapter.
17:11:29,183 WARNING kernel:qla2xxx [0000:05:00.1]-00fc:15: ISP2261: PCIe (8.0GT/s x8) @ 0000:05:00.1 hdma+ host#=15 fw=8.08.03 (d0d5).
17:11:29,960 WARNING kernel:qla2xxx [0000:05:00.1]-500a:15: LOOP UP detected (16 Gbps).
17:11:30,467 WARNING kernel:qla2xxx [0000:05:00.1]-ffff:15: register_localport: host-traddr=nn-0x20000024ff1bfd97:pn-0x21000024ff1bfd97 on portID:10200
17:11:45,770 WARNING kernel:qla2xxx [0000:04:00.0]-801c:12: Abort command issued nexus=12:0:0 -- 0 2003.
17:11:45,770 WARNING kernel:qla2xxx [0000:04:00.0]-8009:12: DEVICE RESET ISSUED nexus=12:0:0 cmd=ffff8e18e692c380.
17:11:45,770 ERR kernel:qla2xxx [0000:04:00.0]-5039:12: Async-tmf error - hdl=2b completion status(28).
17:11:47,774 ERR kernel:qla2xxx [0000:04:00.0]-800d:12: wait for pending cmds failed for cmd=ffff8e18e692c380.
17:11:47,774 WARNING kernel:qla2xxx [0000:04:00.0]-800f:12: DEVICE RESET FAILED: Waiting for command completions nexus=12:0:0 cmd=ffff8e18e692c380.
17:11:47,774 WARNING kernel:qla2xxx [0000:04:00.0]-8009:12: TARGET RESET ISSUED nexus=12:0:0 cmd=ffff8e18e692c380.
17:11:47,774 ERR kernel:qla2xxx [0000:04:00.0]-5039:12: Async-tmf error - hdl=2c completion status(28).
17:11:47,820 WARNING kernel:qla2xxx [0000:04:00.1]-801c:13: Abort command issued nexus=13:0:0 -- 0 2003.
17:11:47,820 WARNING kernel:qla2xxx [0000:04:00.1]-8009:13: DEVICE RESET ISSUED nexus=13:0:0 cmd=ffff8e1ce69b6840.
17:11:47,820 ERR kernel:qla2xxx [0000:04:00.1]-5039:13: Async-tmf error - hdl=22 completion status(28).
17:11:49,775 ERR kernel:qla2xxx [0000:04:00.0]-800d:12: wait for pending cmds failed for cmd=ffff8e18e692c380.
17:11:49,775 WARNING kernel:qla2xxx [0000:04:00.0]-800f:12: TARGET RESET FAILED: Waiting for command completions nexus=12:0:0 cmd=ffff8e18e692c380.
17:11:49,775 WARNING kernel:qla2xxx [0000:04:00.0]-8012:12: BUS RESET ISSUED nexus=12:0:0.
17:11:49,775 ERR kernel:qla2xxx [0000:04:00.0]-5039:12: Async-tmf error - hdl=2e completion status(28).
17:11:49,802 WARNING kernel:qla2xxx [0000:05:00.0]-801c:14: Abort command issued nexus=14:0:0 -- 0 2003.
17:11:49,802 WARNING kernel:qla2xxx [0000:05:00.0]-8009:14: DEVICE RESET ISSUED nexus=14:0:0 cmd=ffff8e1ce69b7640.
17:11:49,802 ERR kernel:qla2xxx [0000:05:00.0]-5039:14: Async-tmf error - hdl=2f completion status(28).
17:11:49,822 ERR kernel:qla2xxx [0000:04:00.1]-800d:13: wait for pending cmds failed for cmd=ffff8e1ce69b6840.
17:11:49,822 WARNING kernel:qla2xxx [0000:04:00.1]-800f:13: DEVICE RESET FAILED: Waiting for command completions nexus=13:0:0 cmd=ffff8e1ce69b6840.
17:11:49,822 WARNING kernel:qla2xxx [0000:04:00.1]-8009:13: TARGET RESET ISSUED nexus=13:0:0 cmd=ffff8e1ce69b6840.
17:11:49,822 ERR kernel:qla2xxx [0000:04:00.1]-5039:13: Async-tmf error - hdl=23 completion status(28).
17:11:49,976 WARNING kernel:qla2xxx [0000:04:00.0]-500b:12: LOOP DOWN detected (4 7 0 0).
17:11:50,718 WARNING kernel:qla2xxx [0000:04:00.0]-500a:12: LOOP UP detected (16 Gbps).
17:11:51,778 ERR kernel:qla2xxx [0000:04:00.0]-8014:12: Wait for pending commands failed.
17:11:51,778 ERR kernel:qla2xxx [0000:04:00.0]-802b:12: BUS RESET FAILED nexus=12:0:0.
17:11:51,778 WARNING kernel:qla2xxx [0000:04:00.0]-8018:12: ADAPTER RESET ISSUED nexus=12:0:0.
17:11:51,778 WARNING kernel:qla2xxx [0000:04:00.0]-00af:12: Performing ISP error recovery - ha=ffff8e1979878000.
17:11:51,786 WARNING kernel:qla2xxx [0000:05:00.1]-801c:15: Abort command issued nexus=15:0:0 -- 0 2003.
17:11:51,786 WARNING kernel:qla2xxx [0000:05:00.1]-8009:15: DEVICE RESET ISSUED nexus=15:0:0 cmd=ffff8e1ce682b2c0.
17:11:51,786 ERR kernel:qla2xxx [0000:05:00.1]-5039:15: Async-tmf error - hdl=22 completion status(28).
17:11:51,803 WARNING kernel:qla2xxx [0000:04:00.0]-0075:12: ZIO mode 6 enabled; timer delay (200 us).
17:11:51,803 ERR kernel:qla2xxx [0000:05:00.0]-800d:14: wait for pending cmds failed for cmd=ffff8e1ce69b7640.
17:11:51,803 WARNING kernel:qla2xxx [0000:05:00.0]-800f:14: DEVICE RESET FAILED: Waiting for command completions nexus=14:0:0 cmd=ffff8e1ce69b7640.
17:11:51,803 WARNING kernel:qla2xxx [0000:05:00.0]-8009:14: TARGET RESET ISSUED nexus=14:0:0 cmd=ffff8e1ce69b7640.
17:11:51,804 ERR kernel:qla2xxx [0000:05:00.0]-5039:14: Async-tmf error - hdl=30 completion status(28).
17:11:51,825 ERR kernel:qla2xxx [0000:04:00.1]-800d:13: wait for pending cmds failed for cmd=ffff8e1ce69b6840.
17:11:51,825 WARNING kernel:qla2xxx [0000:04:00.1]-800f:13: TARGET RESET FAILED: Waiting for command completions nexus=13:0:0 cmd=ffff8e1ce69b6840.
Ewan, So what exactly does this mean? Should we contact QLogic about this? Thanks, Jennifer Duong
It looks like a problem communicating between the HBA and the target (array). The host is not able to issue commands to probe for devices, it appears.

First thing:

- If you install an earlier version of the software (e.g. 7.5 GA) on the same machine, does it connect to the array without any issues?
Yes, it does
OK, thanks, we're looking at it.
Has anyone gotten a chance to look further into this? Thanks, Jennifer Duong
Can you please try setting qla2xxx.ql2xnvmeenable = 0 and see if that makes any difference?

Also can you please report what model storage array you are attempting to connect to and the f/w revision level (i.e. version of ONTAP if it is a NetApp array)?

I am not seeing a problem on any of the systems I have here and there are no failure reports of this nature from our QE testing either. I suspect the problem may be related to the NVMe changes in the firmware or the driver but am not certain.

Works OK with QLE2562 and QLE2662 as far as I can see.

There is one bug fix to qla2xxx in -938.el7 but I suspect you would not be hitting the underlying problem to cause the issue during the device probe.

[scsi] qla2xxx: Fix memory leak for allocating abort IOCB

Himanshu, any other ideas?
Hi Ewan,

(In reply to Ewan D. Milne from comment #19)
> Can you please try setting qla2xxx.ql2xnvmeenable = 0 and see if that makes
> any difference?
>
> Also can you please report what model storage array you are attempting to
> connect to and the f/w revision level (i.e. version of ONTAP if it is
> a NetApp array)?
>
> I am not seeing a problem on any of the systems I have here and there are
> no failure reports of this nature from our QE testing either. I suspect
> the problem may be related to the NVMe changes in the firmware or the driver
> but am not certain.
>
> Works OK with QLE2562 and QLE2662 as far as I can see.
>
> There is one bug fix to qla2xxx in -938.el7 but I suspect you would not be
> hitting the underlying problem to cause the issue during the device probe.
>
> [scsi] qla2xxx: Fix memory leak for allocating abort IOCB
>
> Himanshu, any other ideas?

The logs are not very helpful for pointing to an issue in the NVMe code, since we have not seen this in any of our environments. Can you capture logs with ql2xextended_error_logging=0x5200b000?

Just to confirm, is your setup direct attached or switch fabric mode?

Thanks,
Himanshu
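For anyone reproducing this, a sketch of how these qla2xxx options can be set; the values are the ones requested above. At install time they can be appended to the installer boot command line:

    qla2xxx.ql2xnvmeenable=0 qla2xxx.ql2xextended_error_logging=0x5200b000

On an installed system the equivalent is a modprobe.d entry followed by an initramfs rebuild, since qla2xxx loads from the initramfs on a SAN boot system:

    # echo "options qla2xxx ql2xnvmeenable=0 ql2xextended_error_logging=0x5200b000" > /etc/modprobe.d/qla2xxx.conf
    # dracut -f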
Ewan,

The install went further this time around when setting qla2xxx.ql2xnvmeenable=0. However, when booting up, one of my servers seems to crash and the other boots into emergency mode. I am attempting to connect to an E5600 that is running 8.40 FW and an E2800 that is running 11.50 FW.

Himanshu,

I am running a combination of both direct and fabric attached. The server that is crashing is fabric, while the server that is booting into emergency mode is direct. I will be attaching a screenshot of the crash shortly. As for the server that is in emergency mode, there don't appear to be any message logs or syslog output. During my installation I went ahead and set qla2xxx.ql2xextended_error_logging=0x5200b000, but I'm not entirely sure which logs you would want if there are no message logs or syslog output.

Thanks,
Jennifer
Created attachment 1485818 [details] Server crash
Is it possible for you to attach more of the information from the crash, such as the stack trace? I.e., is the machine connected to a serial console where you can capture the output? The crash appears to be in kmem_cache_alloc(), but the RIP address looks beyond _end. Was a crash dump generated? If so, can you make it available?
Ewan, I apologize for taking so long to get back. I tried setting up a serial console a while back, but I wasn't able to get it to work. Thanks, Jennifer Duong
Ewan, Is there an alternate method of capturing the output? Thanks, Jennifer Duong
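One possible alternative, assuming the server's BMC supports IPMI Serial over LAN: redirect the kernel console to the serial port and capture it over SOL rather than a physical serial cable. A sketch (ttyS0 and 115200 are typical values and may differ per platform; the BMC address and credentials are placeholders):

    # Append to the kernel command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub):
    console=tty0 console=ttyS0,115200

    # Capture the console from another machine:
    $ ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> sol activate | tee console.log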
Hi Jennifer,

(In reply to jennifer.duong from comment #25)
> Ewan,
>
> Is there an alternate method of capturing the output?
>
> Thanks,
>
> Jennifer Duong

Can you try with snap 5 and see if you are able to make progress?

Thanks,
Himanshu
Himanshu, I tried installing SS5 both with and without the two qla2xxx parameters, but it looks like both of my installations (1 x fabric-connect, 1 x direct-connect) resulted in the servers booting into emergency mode. Thanks, Jennifer Duong
Himanshu, It looks like I got the same results for RHEL 7.6 RC (both direct-connect and fabric-connect booting into emergency mode). Are there any next steps on debugging this issue if I'm unable to successfully get the serial console to work? Thanks, Jennifer Duong
Hi Jennifer,

Just so that I am clear on the issue here: with the RHEL 7.6 RC build, even with ql2xnvmeenable=0, you are seeing the issue discovering your SANboot LUNs in both direct-connect and fabric-connect mode, and you are not able to capture serial console logs for this behavior.

Please let me know if the above is correct. I will discuss with the extended team at Marvell and get back to you on the next step.

Thanks,
Himanshu
Himanshu, Yes, that is correct. Thanks, Jennifer Duong
Himanshu, Do you by chance have any updates on the next step? Thanks, Jennifer Duong
Himanshu, Are there any next steps that I should take? Thanks, Jennifer Duong
Hi Jennifer,

Sorry about the long delay. I want to get some more information from you so that we can understand your setup better:

1. Do you see the issue with a local boot on the same host with the same target?
2. Can we try to reduce the connections and maybe just use fabric connect, and see if you are able to capture logs?
3. Just to clarify, are you using an FCP LUN for SAN boot or an FC-NVMe LUN for SAN boot? We do not currently support FC-NVMe LUNs for SAN boot, so I just wanted to confirm your configuration.
4. Also, can you provide your target model number with the firmware revision level on the storage adapters?

(In reply to jennifer.duong from comment #32)
> Himanshu,
>
> Are there any next steps that I should take?
>
> Thanks,
>
> Jennifer Duong

Thanks,
Himanshu
Hi Himanshu,

1. No, I do not see an issue with a local install. However, I am not able to see any volumes or either of my arrays after the installation.
2. I'm not entirely sure what you mean by reducing the connections and using fabric connect.
3. FCP LUN
4. 1 x QLE2692 (FW:v8.08.03 DVR:v10.00.00.06.07.6-k) and 1 x QLE2742 (FW:v8.08.03 DVR:v10.00.00.06.07.6-k)

I've also noticed that if I downgrade the FW on my HBAs to v8.07.80, I'm able to install to my SANboot LUN just fine.
Hi Jennifer,

Apologies for the delay in responding. I was out sick last week.

I am in the process of porting our boot-from-SAN patches onto the RHEL 7.6 GA kernel. To try that kernel, I would suggest the following steps:

1. First, use the 8.07.80 firmware to install and boot from the SANboot LUN.
2. Install the new kernel with the fixed qla2xxx driver.
3. Update the firmware to 8.08.03, update your initramfs, and see if you are able to boot from the SANboot LUN.

Thanks,
Himanshu
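On the initramfs update in step 3: RHEL rebuilds the initramfs with dracut (update-initramfs is the Debian/Ubuntu tool and does not exist on RHEL). A sketch of the rebuild for the currently booted kernel; substitute the test kernel's version string if rebuilding for a kernel you are not booted into:

    # dracut -f /boot/initramfs-$(uname -r).img $(uname -r)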
Himanshu, Do you happen to have a link of where that kernel can be found? Thanks, Jennifer Duong
Hi Jennifer,

Sorry, I have not yet finished porting the patches. I will have the kernel built by Friday 11/16 and will ask Ewan to post it for you to download.

(In reply to jennifer.duong from comment #36)
> Himanshu,
>
> Do you happen to have a link of where that kernel can be found?
>
> Thanks,
>
> Jennifer Duong

Thanks,
Himanshu
Jennifer,

My server is very slow and I could not make any progress on the kernel build. I'll have this ready over the weekend, but you won't see it until Monday. Sorry for the delay.

Thanks,
Himanshu

(In reply to Himanshu Madhani (Cavium) from comment #37)
> Hi Jennifer,
>
> Sorry have not yet finished porting patches. I will have kernel built by
> Friday 11/16 and will ask Ewan to post it for you to download.
>
> (In reply to jennifer.duong from comment #36)
> > Himanshu,
> >
> > Do you happen to have a link of where that kernel can be found?
> >
> > Thanks,
> >
> > Jennifer Duong
>
> Thanks,
> Himanshu
Hi Jennifer,

I was able to kick off a build with the following patches:

* 619fe86 (HEAD, bz1613543) qla2xxx: Update driver version
* 0968b4e scsi: qla2xxx: Fix driver hang when FC-NVMe LUNs are configured
* dddf3a9 scsi: qla2xxx: Fix re-using LoopID when handle is in use
* 8934491 scsi: qla2xxx: Fix duplicate switch database entries
* 4678615 scsi: qla2xxx: Fix NVMe session hang on unload
* ef4ee55 scsi: qla2xxx: Fix stalled relogin
* c6ae09e scsi: qla2xxx: Fix unintended Logout

This build will be ready in a couple of hours, and by Monday Ewan should be able to post it for you to try out.

In the meantime, can you provide the details of your configuration one more time:

1. What target array are you using?
   - I need the specific model/version.
   - I need the software version on the target array.
2. What adapter is on the target system?
   - I need the exact ISP number.
   - I need the firmware running on the adapter.
3. Can you provide me details of your switch?
   - I need the switch model.
   - I need the firmware/OS of your switch.

Thanks,
Himanshu
Hi Ewan,

Can you please provide this build to Jennifer when it's ready?

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=19205774

Thanks,
Himanshu
RPMS for the test kernel referred to in comment # 40 are available for download at:

http://people.redhat.com/emilne/RPMS/.bz1613543_for_netapp/

The -debuginfo RPM is only present in case crash analysis becomes necessary; you do not need to install it initially for testing. Please advise when the download is complete so we can free up the space.
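As an aside on installing test kernels like this one: using rpm -ivh (install) rather than -Uvh (upgrade) keeps the existing kernel in place, so there is a fallback entry in the GRUB menu if the test kernel fails to boot. A sketch, with the file name inferred from the kernel version discussed in this bug:

    # rpm -ivh kernel-3.10.0-957.el7.bz1613543.x86_64.rpm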
Himanshu,

Here are the details of my config:

1) I have two arrays in my config: 1 x E5600 that is running 8.40 FW and 1 x E2800 that is running 11.50 FW. The array with my SANboot LUN is the E2800 running 11.50 FW.
2) Here are the FW versions when I am not able to see any volumes: 1 x QLE2692 (FW:v8.08.03 DVR:v10.00.00.06.07.6-k) and 1 x QLE2742 (FW:v8.08.03 DVR:v10.00.00.06.07.6-k). Here are the FW versions when I am able to see and install to my SANboot LUN: 1 x QLE2692 (FW:v8.07.80 DVR:v10.00.00.06.07.6-k) and 1 x QLE2742 (FW:v8.07.80 DVR:v10.00.00.06.07.6-k). I'm not quite sure what you mean by ISP number.
3) I have two switches in my config: 1 x Brocade G620 running v8.2.1a and 1 x Cisco 9148 running v8.3(1).

I have finished downloading all four RPMs. Also, should I install the three RPMs (excluding the -debuginfo RPM) before updating my FW to 8.08.03, then update my initramfs and see if I am able to boot from the SANboot LUN?
I installed the three RPMs (all but the -debuginfo RPM) and updated my FW to 8.08.03. I tried updating my initramfs by running update-initramfs, but it said that the command wasn't found. I thought I had a missing package, so I attempted to install the initramfs-tools package, but it said that package was not found. Since I couldn't seem to update my initramfs, I went ahead with a reboot and tried to boot into the new kernel, but it entered emergency mode. What does updating my initramfs do? Is this why my host booted into emergency mode? What should my next steps be? Thanks, Jennifer Duong
It should not have been necessary to rebuild the initramfs; the installation of the RPMs should have done that. Can you attach the console output of the boot when it entered emergency mode?
It looks like I'm able to boot into kernel 3.10.0-957.el7.bz1613543.x86_64 with FW:v8.08.03 DVR:v10.00.99.06.07.6-k on my fabric-connect server, but not my direct-connect server. My direct-connect system boots into emergency mode as stated in my previous comment. Thanks, Jennifer Duong
Created attachment 1507740 [details] emergency mode
Hi Jennifer,

From the screenshot of the emergency mode, it does not look like the qla2xxx driver is causing the boot to fail. We would have to look at an SOS report to find out why the direct-connect system went into emergency mode.

Also, for your fabric-connect server: can you confirm there are no issues with the provided kernel? There is a known issue with direct connection, and for that I will need to pull in some additional patches. We have patches queued for RHEL 7.7 submission which address direct connect. If you confirm your fabric connection is okay with this kernel, I can provide you another kernel with the direct-connect fixes.

(In reply to jennifer.duong from comment #45)
> It looks like I'm able to boot into kernel 3.10.0-957.el7.bz1613543.x86_64
> with FW:v8.08.03 DVR:v10.00.99.06.07.6-k on my fabric-connect server, but
> not my direct-connect server. My direct-connect system boots into emergency
> mode as stated in my previous comment.
>
> Thanks,
>
> Jennifer Duong

Thanks,
Himanshu
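For the SOS report mentioned above: on RHEL 7 it can be generated after booting a working kernel with the sos package's sosreport tool, which writes a tarball under /var/tmp that can be attached here. A sketch:

    # sosreport --batch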
Himanshu, I'm able to boot into the host with kernel 3.10.0-957.el7.bz1613543.x86_64 and FW:v8.08.03 DVR:v10.00.99.06.07.6-k, but haven't done any testing outside of that. Thanks, Jennifer Duong
Himanshu, I ran through my automated tests and it looks like kernel 3.10.0-957.el7.bz1613543.x86_64 running FW:v8.08.03 DVR:v10.00.99.06.07.6-k on my Qlogic fabric-connect system works properly. Do you have a status on the kernel with the Qlogic direct-connect fixes? Thanks, Jennifer Duong
Hi Jennifer,

I've built a kernel with the N2N fixes. Please let me know the results once you have received the kernel.

Note: These patches are going to be part of the RHEL 7.7 inbox driver.

Hi Ewan,

Can you please make this build available to Jennifer for validating the N2N (direct-connect) configuration?

https://brewweb.devel.redhat.com/taskinfo?taskID=19486813

Thanks,
Himanshu
RPMS for the test kernel referred to in comment # 50 are available for download at:

http://people.redhat.com/emilne/RPMS/.bz1613543_for_netapp/

Please note that this is the kernel with the N2N changes, and it has a different name, i.e. the RPMs are named like:

kernel-3.10.0-957.el7.bz1613543.n2n.x86_64.rpm

etc. I've left the earlier RPMs there for the time being; let me know if you no longer need them. The -debuginfo RPM is only present in case crash analysis becomes necessary; you do not need to install it initially for testing. Please advise when the download is complete so we can free up the space.
Ewan, I tried booting into kernel-3.10.0-957.el7.bz1613543.n2n.x86_64.rpm with FW:v8.08.03 DVR:v10.00.22.06.07.6-k and it booted into emergency mode. Thanks, Jennifer
Hi Ewan,

FYI, here's the list of patches that are part of the n2n kernel:

* scsi: qla2xxx: Save frame payload size from ICB
* scsi: qla2xxx: Fix race between switch cmd completion and timeout
* scsi: qla2xxx: Fix Management Server NPort handle reservation logic
* scsi: qla2xxx: Flush mailbox commands on chip reset
* scsi: qla2xxx: Fix session state stuck in Get Port DB
* scsi: qla2xxx: Fix redundant fc_rport registration
* scsi: qla2xxx: Silent erroneous message
* scsi: qla2xxx: Prevent sysfs access when chip is down
* scsi: qla2xxx: Add longer window for chip reset
* scsi: qla2xxx: Fix N2N link re-connect
* scsi: qla2xxx: Cleanup for N2N code

Thanks,
Himanshu
So, this is with regular qla2xxx FC SCSI, correct? This is not an NVMe target?

The screenshot from emergency mode looks like the boot device was not found. I assume from the earlier comments that the boot device was on the SAN. Is it possible to provide output from a serial console attached to the system, as opposed to just a screenshot of the video? We can't debug this from just a screenshot of the boot messages.
Yes, this is with regular qla2xxx FC SCSI and not an NVMe target. Is this what you mean by the output of the serial console? I will be uploading it shortly.
Created attachment 1515180 [details] serial console kernel-3.10.0-957.el7.bz1613543.n2n.x86_64.rpm
Yes, that is the kind of serial output we need. Unfortunately, the file you attached does not appear to contain the output from the time period of the actual boot failure. It ends with:

12/17/18 17:08:39: [  281.493586] ata1: exception Emask 0x50 SAct 0x0 SErr 0x40d0800 action 0xe frozen
12/17/18 17:08:39: [  281.501862] ata1: irq_stat 0x00400040, connection status changed
12/17/18 17:08:39: [  281.508577] ata1: SError: { HostInt PHYRdyChg CommWake 10B8B DevExch }
12/17/18 17:08:39: [  281.515865] ata1: hard resetting link
12/17/18 17:08:40: [  282.243029] ata1: SATA link down (SStatus 0 SControl 300)
12/17/18 17:08:45: [  287.248993] ata1: hard resetting link
12/17/18 17:08:45: [  287.557995] ata1: SATA link down (SStatus 0 SControl 300)
12/17/18 17:08:50: [  292.563966] ata1: hard resetting link
12/17/18 17:08:51: [  292.872958] ata1: SATA link down (SStatus 0 SControl 300)
12/17/18 17:08:51: [  292.878994] ata1.00: disabled
12/17/18 17:08:51: [  292.882323] ata1: EH complete
12/17/18 17:08:51: [  292.884951] sd 0:0:0:0: rejecting I/O to offline device
12/17/18 17:08:51: [  292.884955] sd 0:0:0:0: [sda] killing request
12/17/18 17:08:51: [  292.884976] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
12/17/18 17:08:51: [  292.884981] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 57 2c 2a e8 00 01 c0 00
12/17/18 17:08:51: [  292.884985] blk_update_request: I/O error, dev sda, sector 1462512360
12/17/18 17:08:51: [  292.885014] Buffer I/O error on dev dm-0, logical block 182685533, lost async page write
12/17/18 17:08:51: [  292.885019] Buffer I/O error on dev dm-0, logical block 182685534, lost async page write
12/17/18 17:08:51: [  292.885022] Buffer I/O error on dev dm-0, logical block 182685535, lost async page write
12/17/18 17:08:51: [  292.885024] Buffer I/O error on dev dm-0, logical block 182685536, lost async page write
12/17/18 17:08:51: [  292.885026] Buffer I/O error on dev dm-0, logical block 182685537, lost async page write
12/17/18 17:08:51: [  292.885029] Buffer I/O error on dev dm-0, logical block 182685538, lost async page write
12/17/18 17:08:51: [  292.885031] Buffer I/O error on dev dm-0, logical block 182685539, lost async page write
12/17/18 17:08:51: [  292.885034] Buffer I/O error on dev dm-0, logical block 182685540, lost async page write
12/17/18 17:08:51: [  292.885036] Buffer I/O error on dev dm-0, logical block 182685541, lost async page write
12/17/18 17:08:51: [  292.885038] Buffer I/O error on dev dm-0, logical block 182685542, lost async page write
12/17/18 17:08:51: [  293.010599] ata1.00: detaching (SCSI 0:0:0:0)
12/17/18 17:08:51: [  293.026870] sd 0:0:0:0: [sda] Stopping disk
12/17/18 17:08:51: [  293.031789] sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
12/17/18 17:08:52: [  294.056032] XFS (dm-0): metadata I/O error: block 0x39f371c4 ("xlog_iodone") error 5 numblks 64
12/17/18 17:08:52: [  294.056140] XFS (dm-0): metadata I/O error: block 0x3a3df420 ("xfs_buf_iodone_callback_error") error 5 numblks 32
12/17/18 17:08:52: [  294.077216] XFS (dm-0): xfs_do_force_shutdown(0x2) called from line 1221 of file fs/xfs/xfs_log.c. Return address = 0xffffffffc0420c30
12/17/18 17:08:52: [  294.090908] XFS (dm-0): Log I/O Error Detected. Shutting down filesystem
12/17/18 17:08:52: [  294.098487] XFS (dm-0): Please umount the filesystem and rectify the problem(s)
12/17/18 17:08:52: [  294.105948] XFS (dm-0): Failing async write on buffer block 0x56f7fd20. Retrying async write.
12/17/18 17:08:56: [  298.746849] Core dump to |/usr/libexec/abrt-hook-ccpp 6 0 10991 0 0 1545088136 e 10991 11033 ictm1608s02h4.ict.englab.netapp.com pipe failed
12/17/18 17:08:56: [  298.750581] Core dump to |/usr/libexec/abrt-hook-ccpp 6 0 8011 0 0 1545088136 e 8011 8236 ictm1608s02h4.ict.englab.netapp.com pipe failed

and then there are some BIOS boot messages, so it looks like the machine is being either power cycled or rebooted. The last output is:

12/17/18 17:11:56: ^[[8;1HBooting^[[8;9Hfrom^[[8;14HHard^[[8;19Hdrive^[[8;25HC:^[[9;1H.^[[10;1H^[[?25h
12/17/18 17:11:56: ^M
12/17/18 17:11:56:

What we would like to see are the messages leading up to the output on the earlier screen capture you attached, which might show us why it was able to load the Linux kernel but was unable to find the root and swap devices later in the boot process.

---

It seems based on your earlier information that you could install and boot from SAN successfully with the earlier 8.07.80 HBA firmware but not the 8.08.03 firmware, so this would seem to be either a firmware issue or a case where the newer firmware needs driver changes as well. What I am trying to understand is how you were able to connect to the array in the first place if you were booting from SAN, but not later.

Both the arrays are E-Series arrays, correct? (We have one arriving here soon for testing.)
Ewan,

I initially had the OS installed on the local hard drive and was booting from there with all my connections to the controllers disconnected so that it wouldn't try to SANboot. I had to upgrade the firmware back to 08.08.03, and once that was complete I connected all of the controller connections and disconnected the hard drive. From there I had to reboot the server so that it could try to SANboot with the 08.08.03 FW and kernel-3.10.0-957.el7.bz1613543.n2n.x86_64. Shouldn't messages that look like the machine is being power cycled or rebooted be expected? How would I be able to test kernel-3.10.0-957.el7.bz1613543.n2n without power cycling or rebooting my server?

I checked my serial logs, and what I provided you, from what I can tell, is the instance when my host booted into emergency mode. The only log output after that is this:

12/17/18 17:20:50: [?25l[1;1H [1;9H [1;14H [1;20H [2;1H [2;10H [2;18H [2;27H [2;31H [0m[30;47m[4;1H Red Hat Enterprise Linux Server (3.10.0-957.el7.bz1613543.n2n.x86_64) 7. [0m[37;40m[5;7HRed[5;11HHat[5;15HEnterprise[5;26HLinux[5;32HServer[5;39H(3.10.0-957.el7.bz1613543.x86_64)[5;73H7.6[5;77H(M [6;1H Red Hat Enterprise Linux Server[6;39H(3.10.0-957.el7.x86_64)[6;63H7.6[6;67H(Maipo)[7;7HRed[7;11HHat[7;15HEnterprise[7;26HLinux[7;32HServer[7;39H(0-rescue-fb8addbcbf17439786d9ecdce2d202 [8;1H [8;9H [8;14H [8;19H [8;25H [9;1H [21;7HUse[21;11Hthe[21;15H [21;17Hand[21;21H [21;23Hkeys[21;28Hto[21;31Hchange[21;38Hthe[21;42Hselection.[22;7HPress[22;13H'e'[22;17Hto[22;20Hedit[22;25Hthe[22;29Hselected[22;38Hitem,[22;44Hor[22;47H'c'[22;51Hfor[22;55Ha[22;57Hcommand[22;65Hprompt.[23;4HThe[23;8Hselected[23;17Hentry[23;23Hwill[23;28Hbe[23;31Hstarted[23;39Hautomatically[23;53Hin[23;56H5s.[23;56H4[23;56H3[23;56H2[23;56H1[4;1H [5;7H [5;11H [5;15H [5;26H [5;32H [5;39H [5;73H [5;77H [6;7H [6;11H [6;15H [6;26H [6;32H [6;39H [6;63H [6;67H [7;7H [7;11H [7;15H [7;26H [7;32H [7;39H [21;7H [21;11H [21;15H [21;17H [21;21H [21;23H [21;28H [21;31H [21;38H [21;42H [22;7H [22;13H [22;17H [22;20H [22;25H [22;29H [22;38H [22;44H [22;47H [22;51H [22;55H [22;57H [22;65H [23;4H [23;8H [23;17H [23;23H [23;28H [23;31H [23;39H [23;53H [23;56H [1;1H[?25h[0;37;40m[2J[H

When I tried to boot into the n2n kernel, it listed all the kernels I can boot into, and when I selected that particular one, it tried to load it and looked like it was about to finish, but then entered emergency mode.

And yes, the arrays are E-Series arrays.

Thanks,
Jennifer
Hi Jennifer,

I want to rule out a qla2xxx issue in your direct-connect setup, so I want you to confirm whether the UEFI driver sees the FC LUNs or not. Here are the instructions I received from our UEFI developer on how to verify that the FC LUNs are seen:

--------- <snip> ---------
The UEFI Shell will let you see what LUNs were discovered by the UEFI driver. HPE servers have a built-in UEFI shell. Attached is a shell executable that will work on other servers.

Go to the server setup screens and look for an option to run a UEFI application. This option will let you run the attached shell. Once the shell is running, use the "map" command to see if the FC LUNs were mapped. FC LUNs will have Fibre(WWPN, LUN) in their path name.
--------- </snip> ----------

I am attaching the shell executable in case you are using something other than an HPE server. Can you capture this information?

Thanks,
Himanshu
Created attachment 1515425 [details] Shell Executable for UEFI shell
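To illustrate what to look for in the UEFI Shell: FC LUNs discovered by the UEFI driver appear in the map output with a Fibre(WWPN,LUN) node in their device path. A rough sketch of what a mapped FC LUN might look like (the exact layout varies by platform; the WWPN here is borrowed from the register_localport logs earlier in this bug for illustration):

    Shell> map -r
    blk2: Alias(s):
          PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/Fibre(0x21000024FF7EF9F4,0x0)

If no Fibre(...) entries appear at all, the UEFI driver itself is not seeing the LUNs, independent of the Linux qla2xxx driver.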
Hello Himanshu and Ewan,

I have a customer who seems to face the same issue as defined in this bugzilla. Below I am providing all the information.

Support Case : 02276650
=======================

Issue :
=======

After updating the kernel to 3.10.0-957.1.3.el7.x86_64, the server is no longer able to see the LUNs coming from the QLogic HBAs. However, when booted with the older kernel 3.10.0-862.el7.x86_64, the server is able to detect all the LUNs.

Qlogic Adapter Details from sosreport
=====================================

==============
fenacosrv92151 <====> [Hostname of server]
==============

All 4 QLogic cards are exactly the same.

$ grep QLE sos_commands/logs/journalctl_--no-pager_--catalog_--boot
Dec 18 10:32:43 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.0]-00fb:15: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 18 10:32:45 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-00fb:16: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 18 10:32:47 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-00fb:17: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 18 10:32:49 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.1]-00fb:18: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.

*** Even the subsystem vendor and device id are the same.

$ grep Fibre -A1 lspci
2f:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)
        Subsystem: QLogic Corp. Device [1077:02af] <<<<<<====------<-----
--
2f:00.1 Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)
        Subsystem: QLogic Corp. Device [1077:02af] <<<<<<====------<-----
--
58:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)
        Subsystem: QLogic Corp. Device [1077:02af] <<<<<<====------<-----
--
58:00.1 Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)
        Subsystem: QLogic Corp. Device [1077:02af] <<<<<<====------<-----

**** All 4 are Dual-port HBAs.

$ grep QLogic lspci|grep HBA
Product Name: QLogic 16Gb FC Dual-port HBA
Product Name: QLogic 16Gb FC Dual-port HBA
Product Name: QLogic 16Gb FC Dual-port HBA
Product Name: QLogic 16Gb FC Dual-port HBA

*** Firmware versions:

fw=8.08.05 [Actual Firmware version 1.90.53] as per XClarity Controller's firmware web page

$ grep QLE -A1 sos_commands/logs/journalctl_--no-pager_--catalog_--boot
Dec 18 10:32:43 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.0]-00fb:15: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 18 10:32:43 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.0]-00fc:15: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.0 hdma+ host#=15 fw=8.08.05 (d0d5).
--
Dec 18 10:32:45 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-00fb:16: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 18 10:32:45 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-00fc:16: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.1 hdma+ host#=16 fw=8.08.05 (d0d5).
--
Dec 18 10:32:47 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-00fb:17: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 18 10:32:47 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-00fc:17: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.0 hdma+ host#=17 fw=8.08.05 (d0d5).
--
Dec 18 10:32:49 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.1]-00fb:18: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 18 10:32:49 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.1]-00fc:18: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.1 hdma+ host#=18 fw=8.08.05 (d0d5).

-------------------------
3.10.0-957.1.3.el7.x86_64
-------------------------

No disks are getting detected, and we can see QLogic 'Abort' messages along with a 'TECH PREVIEW' message.

$ grep 'tech preview' -i sos_commands/logs/journalctl_--no-pager_--catalog_--boot
Dec 18 09:20:24 fenacosrv92151.main.corp.fenaco.com kernel: TECH PREVIEW: NVMe over FC may not be fully supported.

$ grep Abort sos_commands/logs/journalctl_--no-pager_--catalog_--boot
Dec 18 09:20:47 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-801c:16: Abort command issued nexus=16:0:0 -- 0 2003.
Dec 18 09:20:49 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-801c:17: Abort command issued nexus=17:0:0 -- 0 2003.
Dec 18 09:21:02 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-801c:16: Abort command issued nexus=16:0:0 -- 1 2002.

$ grep fc-nvme -i sos_commands/logs/journalctl_--no-pager_--catalog_--boot
Dec 18 09:20:23 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.0]-d302:15: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
Dec 18 09:20:25 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-d302:16: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
Dec 18 09:20:27 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-d302:17: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
Dec 18 09:20:29 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.1]-d302:18: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
Dec 18 09:20:51 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-d302:16: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)
Dec 18 09:20:53 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-d302:17: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58)

---------------------
3.10.0-862.el7.x86_64
---------------------

When the same system is booted with the older RHEL 7.5 GA kernel, it detects all disks coming via the QLogic HBAs.

$ grep 'scsi host' sos_commands/logs/journalctl_--no-pager_--catalog_--boot|grep qla
Dec 18 10:32:43 fenacosrv92151.main.corp.fenaco.com kernel: scsi host15: qla2xxx
Dec 18 10:32:45 fenacosrv92151.main.corp.fenaco.com kernel: scsi host16: qla2xxx
Dec 18 10:32:47 fenacosrv92151.main.corp.fenaco.com kernel: scsi host17: qla2xxx
Dec 18 10:32:49 fenacosrv92151.main.corp.fenaco.com kernel: scsi host18: qla2xxx

Disks are coming only from scsi host 15 and scsi host 18.

$ cat sos_commands/multipath/multipath_-l
mpathc (360060e80221598005041159800000447) dm-4 HITACHI ,OPEN-V
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 15:0:0:1 sdc 8:32 active undef running
  `- 18:0:0:1 sdf 8:80 active undef running
mpathb (360060e80221598005041159800000448) dm-5 HITACHI ,OPEN-V
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 15:0:0:2 sdd 8:48 active undef running
  `- 18:0:0:2 sdg 8:96 active undef running
mpatha (360060e80221598005041159800000446) dm-3 HITACHI ,OPEN-V
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 15:0:0:0 sdb 8:16 active undef running
  `- 18:0:0:0 sde 8:64 active undef running

In addition, we do not see any 'TECH PREVIEW' message for FC over NVMe nor any 'Abort' message.

======================
WORKAROUND IDENTIFIED:
======================

-> Applied the QLogic parameter 'ql2xnvmeenable' set to 0.

# cat > /etc/modprobe.d/qla2xxx.conf
options qla2xxx ql2xnvmeenable=0

-> Rebuilt the initramfs for 3.10.0-957.1.3.el7.x86_64 and rebooted the server; it was able to detect all the disks coming from the QLogic HBAs.

Observation:
-----------

-> After disabling NVMe support for QLogic with the above parameter, I did not see the 'Abort' messages or the 'TECH PREVIEW' message.

===================
ANOTHER WORKAROUND:
===================

-> The customer was using one more system with 3.10.0-957.1.3.el7.x86_64 having the exact same model of QLogic adapter [QLogic QLE2692], where disks were getting detected without any modification to the 'qla2xxx' parameters.

-> That system was not showing any 'Abort' message, nor was it showing any 'TECH PREVIEW' message.

-> I figured out that the firmware version of the QLogic HBAs was lower on that server:

fw=8.05.63 [Actual Firmware version 1.90.43] as per XClarity Controller's firmware web page

$ grep QLE sos_commands/logs/journalctl_--no-pager_--catalog_--boot -A1
Dec 11 08:39:47 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.0]-00fb:14: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 11 08:39:47 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.0]-00fc:14: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.0 hdma+ host#=14 fw=8.05.63 (d0d5).
--
Dec 11 08:39:49 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-00fb:16: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 11 08:39:49 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-00fc:16: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.1 hdma+ host#=16 fw=8.05.63 (d0d5).
--
Dec 11 08:39:51 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-00fb:17: QLogic QLE2690 - QLogic 16Gb FC Single-port HBA.
Dec 11 08:39:51 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-00fc:17: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.0 hdma+ host#=17 fw=8.05.63 (d0d5).
--
Dec 11 08:39:53 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx [0000:af:00.0]-00fb:18: QLogic QLE2690 - QLogic 16Gb FC Single-port HBA.
Dec 11 08:39:53 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx [0000:af:00.0]-00fc:18: ISP2261: PCIe (8.0GT/s x8) @ 0000:af:00.0 hdma+ host#=18 fw=8.05.63 (d0d5).

-> I asked the customer to downgrade the firmware on the problematic server to 8.05.63, or '1.90.43' as per the LENOVO THINKSYSTEM SR650 drivers page, and it worked.

https://datacentersupport.lenovo.com/in/en/downloads/DS501286

-> All disks are getting detected with fw=8.05.63 and without applying any parameter modification to the qla2xxx module.

==============
fenacosrv92151
==============

AFTER FIRMWARE DOWNGRADE
------------------------

fw=8.05.63

$ grep QLE sos_commands/logs/journalctl_--no-pager_--catalog_--boot -A1
Dec 21 15:38:22 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.0]-00fb:15: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 21 15:38:22 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.0]-00fc:15: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.0 hdma+ host#=15 fw=8.05.63 (d0d5).
--
Dec 21 15:38:24 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-00fb:16: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 21 15:38:24 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:2f:00.1]-00fc:16: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.1 hdma+ host#=16 fw=8.05.63 (d0d5).
--
Dec 21 15:38:27 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-00fb:17: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 21 15:38:27 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.0]-00fc:17: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.0 hdma+ host#=17 fw=8.05.63 (d0d5).
--
Dec 21 15:38:28 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.1]-00fb:18: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA.
Dec 21 15:38:28 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx [0000:58:00.1]-00fc:18: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.1 hdma+ host#=18 fw=8.05.63 (d0d5).

$ cat sys/module/qla2xxx/parameters/ql2xnvmeenable
1

$ cat sos_commands/multipath/multipath_-l
mpathc (360060e80221598005041159800000447) dm-3 HITACHI ,OPEN-V
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 18:0:0:1 sdf 8:80 active undef running
  `- 15:0:0:1 sdc 8:32 active undef running
mpathb (360060e80221598005041159800000448) dm-5 HITACHI ,OPEN-V
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 15:0:0:2 sdd 8:48 active undef running
  `- 18:0:0:2 sdg 8:96 active undef running
mpatha (360060e80221598005041159800000446) dm-4 HITACHI ,OPEN-V
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 15:0:0:0 sdb 8:16 active undef running
  `- 18:0:0:0 sde 8:64 active undef running

Also, I do not observe any 'Abort' message or 'TECH PREVIEW' message.

QUESTION:
=========

-> Here I see 2 workarounds:
   o Downgrade the firmware of the QLogic HBA
   o Set NVMe support to 0 in the qla2xxx parameters using 'ql2xnvmeenable=0'

-> Does this need a fix in the qla2xxx kernel module, or does it need a fix in the QLogic firmware?

Thanks,
Another customer (SFDC#02281410) has reported similar issues in LUN discovery while using the following QLogic adapters with kernel-3.10.0-957:

[50-sosreport-loraakbp11-02281410-2018-12-26-xzctxsu]$ less var/log/dmesg |grep -i qla2|grep fw
[ 7.200130] qla2xxx [0000:86:00.0]-00fc:15: ISP2261: PCIe (8.0GT/s x8) @ 0000:86:00.0 hdma+ host#=15 fw=8.08.03 (d0d5).
[ 9.160122] qla2xxx [0000:86:00.1]-00fc:16: ISP2261: PCIe (8.0GT/s x8) @ 0000:86:00.1 hdma+ host#=16 fw=8.08.03 (d0d5).
[ 11.280063] qla2xxx [0000:87:00.0]-00fc:17: ISP2261: PCIe (8.0GT/s x8) @ 0000:87:00.0 hdma+ host#=17 fw=8.08.03 (d0d5).
[ 13.235063] qla2xxx [0000:87:00.1]-00fc:18: ISP2261: PCIe (8.0GT/s x8) @ 0000:87:00.1 hdma+ host#=18 fw=8.08.03 (d0d5).

Previously the customer was using kernel-3.10.0-862.14.4.el7.x86_64 and the HITACHI OPEN-V LUNs were visible through all 4 of the HBAs, but after the upgrade to the 957 kernel the LUNs were visible through only one of them. We then disabled the ql2xnvmeenable option for the qla2xxx module and rebuilt the initial ramdisk image for the 957 kernel, and the issues in LUN discovery were fixed.
Himanshu is a partner engineer from Cavium and cannot see private comments. Please verify with your customers that the information in comment # 61 and comment # 62 can be shared with the HBA vendor, and then un-check the private comment field.
Resetting needinfo from comment # 59.
Hi Himanshu, Kindly check #61 and #62 for detailed information from 2 reported cases. Thanks,
Hello Vishal,

Are these customers using fabric mode or direct attached?

(In reply to vishal agrawal from comment #61)
> Hello Himanshu and Ewan,
>
> I have got a customer who seems to face the same issue as defined
> in this bugzilla. Below I am providing all the information.
>
> Support Case : 02276650
> =======================
>
> [...]
> Dec 18 10:32:47 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.0]-00fc:17: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.0 hdma+ > host#=17 fw=8.08.05 (d0d5). > -- > Dec 18 10:32:49 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.1]-00fb:18: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA. > Dec 18 10:32:49 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.1]-00fc:18: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.1 hdma+ > host#=18 fw=8.08.05 (d0d5). > > ------------------------- > 3.10.0-957.1.3.el7.x86_64 > ------------------------- > > No disks are getting detected and we can see Qlogic 'Abort' messages along > with 'TECH PREVIEW' message. > > $ grep 'tech preview' -i > sos_commands/logs/journalctl_--no-pager_--catalog_--boot > Dec 18 09:20:24 fenacosrv92151.main.corp.fenaco.com kernel: TECH PREVIEW: > NVMe over FC may not be fully supported. > > $ grep Abort sos_commands/logs/journalctl_--no-pager_--catalog_--boot > Dec 18 09:20:47 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.1]-801c:16: Abort command issued nexus=16:0:0 -- 0 2003. > Dec 18 09:20:49 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.0]-801c:17: Abort command issued nexus=17:0:0 -- 0 2003. > Dec 18 09:21:02 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.1]-801c:16: Abort command issued nexus=16:0:0 -- 1 2002. > > $ grep fc-nvme -i sos_commands/logs/journalctl_--no-pager_--catalog_--boot > Dec 18 09:20:23 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.0]-d302:15: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58) > Dec 18 09:20:25 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.1]-d302:16: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58) > Dec 18 09:20:27 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.0]-d302:17: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58) > Dec 18 09:20:29 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.1]-d302:18: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58) > Dec 18 09:20:51 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.1]-d302:16: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58) > Dec 18 09:20:53 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.0]-d302:17: qla2x00_get_fw_version: FC-NVMe is Enabled (0x3c58) > > --------------------- > 3.10.0-862.el7.x86_64 > --------------------- > > When same system is booted with Older RHEL7.5 GA kernel, it detects all > disks coming via Qlogic HBA. > > $ grep 'scsi host' > sos_commands/logs/journalctl_--no-pager_--catalog_--boot|grep qla > Dec 18 10:32:43 fenacosrv92151.main.corp.fenaco.com kernel: scsi host15: > qla2xxx > Dec 18 10:32:45 fenacosrv92151.main.corp.fenaco.com kernel: scsi host16: > qla2xxx > Dec 18 10:32:47 fenacosrv92151.main.corp.fenaco.com kernel: scsi host17: > qla2xxx > Dec 18 10:32:49 fenacosrv92151.main.corp.fenaco.com kernel: scsi host18: > qla2xxx > > Disks are coming only from scsi host 15 and scsi host 18. 
> > $ cat sos_commands/multipath/multipath_-l > mpathc (360060e80221598005041159800000447) dm-4 HITACHI ,OPEN-V > size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw > `-+- policy='round-robin 0' prio=0 status=active > |- 15:0:0:1 sdc 8:32 active undef running > `- 18:0:0:1 sdf 8:80 active undef running > mpathb (360060e80221598005041159800000448) dm-5 HITACHI ,OPEN-V > size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw > `-+- policy='round-robin 0' prio=0 status=active > |- 15:0:0:2 sdd 8:48 active undef running > `- 18:0:0:2 sdg 8:96 active undef running > mpatha (360060e80221598005041159800000446) dm-3 HITACHI ,OPEN-V > size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw > `-+- policy='round-robin 0' prio=0 status=active > |- 15:0:0:0 sdb 8:16 active undef running > `- 18:0:0:0 sde 8:64 active undef running > > In Addition, we do not see any 'TECH PREVIEW' message for FC over NVME nor > any > 'Abort' message. > > ====================== > WORKAROUND IDENTIFIED: > ====================== > > -> Applied QLogic parameter 'ql2xnvmeenable' set to 0. > > # cat > /etc/modprobe.d/qla2xxx.conf > options qla2xxx ql2xnvmeenable=0 > > -> Rebuild initramfs for 3.10.0-957.1.3.el7.x86_64 and rebooted the server, > it was able to > detect all the disks coming from Qlogic HBA. > > Observation: > ----------- > > -> After disabling NVME support for Qlogic with above parameter, I did not > see 'Abort' message > neither 'TECH PREVIEW' message. > > =================== > ANOTHER WORKAROUND: > =================== > > -> Customer was using one more system with 3.10.0-957.1.3.el7.x86_64 having > exact same model of QLogic Adapter [ QLogic QLE2692 ] > where disks were getting detected without any modification to 'qla2xxx' > parameters. > > -> That system was not showing any 'Abort' message nor it was showing any > 'TECH PREVIEW' message. > > -> I figured out that firmware version of QLOGIC HBA was lower on that > server > > fw=8.05.63 [Actual Firmware version 1.90.43] as per XClarity Controller's > firmware web page > > $ grep QLE sos_commands/logs/journalctl_--no-pager_--catalog_--boot -A1 > Dec 11 08:39:47 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.0]-00fb:14: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA. > Dec 11 08:39:47 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.0]-00fc:14: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.0 hdma+ > host#=14 fw=8.05.63 (d0d5). > -- > Dec 11 08:39:49 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.1]-00fb:16: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA. > Dec 11 08:39:49 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.1]-00fc:16: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.1 hdma+ > host#=16 fw=8.05.63 (d0d5). > -- > Dec 11 08:39:51 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.0]-00fb:17: QLogic QLE2690 - QLogic 16Gb FC Single-port HBA. > Dec 11 08:39:51 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.0]-00fc:17: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.0 hdma+ > host#=17 fw=8.05.63 (d0d5). > -- > Dec 11 08:39:53 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx > [0000:af:00.0]-00fb:18: QLogic QLE2690 - QLogic 16Gb FC Single-port HBA. > Dec 11 08:39:53 fenacosrv92071.main.corp.fenaco.com kernel: qla2xxx > [0000:af:00.0]-00fc:18: ISP2261: PCIe (8.0GT/s x8) @ 0000:af:00.0 hdma+ > host#=18 fw=8.05.63 (d0d5). 
> > -> I asked Customer to downgrade the firmware on problematic server to > 8.05.63 or '1.90.43' as per LENOVO THINKSYSTEM SR650 Drivers page and it > worked. > > https://datacentersupport.lenovo.com/in/en/downloads/DS501286 > > -> All disks are getting detected with fw=8.05.63 and without applying any > parameter modification on qla2xxx module. > > ============== > fenacosrv92151 > ============== > > AFTER FIRMWARE DOWNGRADE > ------------------------ > > fw=8.05.63 > > $ grep QLE sos_commands/logs/journalctl_--no-pager_--catalog_--boot -A1 > Dec 21 15:38:22 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.0]-00fb:15: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA. > Dec 21 15:38:22 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.0]-00fc:15: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.0 hdma+ > host#=15 fw=8.05.63 (d0d5). > -- > Dec 21 15:38:24 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.1]-00fb:16: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA. > Dec 21 15:38:24 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:2f:00.1]-00fc:16: ISP2261: PCIe (8.0GT/s x8) @ 0000:2f:00.1 hdma+ > host#=16 fw=8.05.63 (d0d5). > -- > Dec 21 15:38:27 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.0]-00fb:17: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA. > Dec 21 15:38:27 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.0]-00fc:17: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.0 hdma+ > host#=17 fw=8.05.63 (d0d5). > -- > Dec 21 15:38:28 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.1]-00fb:18: QLogic QLE2692 - QLogic 16Gb FC Dual-port HBA. > Dec 21 15:38:28 fenacosrv92151.main.corp.fenaco.com kernel: qla2xxx > [0000:58:00.1]-00fc:18: ISP2261: PCIe (8.0GT/s x8) @ 0000:58:00.1 hdma+ > host#=18 fw=8.05.63 (d0d5). > > $ cat sys/module/qla2xxx/parameters/ql2xnvmeenable > 1 > > $ cat sos_commands/multipath/multipath_-l > mpathc (360060e80221598005041159800000447) dm-3 HITACHI ,OPEN-V > size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw > `-+- policy='round-robin 0' prio=0 status=active > |- 18:0:0:1 sdf 8:80 active undef running > `- 15:0:0:1 sdc 8:32 active undef running > mpathb (360060e80221598005041159800000448) dm-5 HITACHI ,OPEN-V > size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw > `-+- policy='round-robin 0' prio=0 status=active > |- 15:0:0:2 sdd 8:48 active undef running > `- 18:0:0:2 sdg 8:96 active undef running > mpatha (360060e80221598005041159800000446) dm-4 HITACHI ,OPEN-V > size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw > `-+- policy='round-robin 0' prio=0 status=active > |- 15:0:0:0 sdb 8:16 active undef running > `- 18:0:0:0 sde 8:64 active undef running > > Also I do not observe any 'Abort' message or 'TECH PREVIEW' message > > QUESTION: > ========= > > -> Here I see 2 workaround > > o Downgrade the firmware of QLogic HBA > o set nvme support to 0 in qla2xxx parameter using 'ql2xnvmeenable=0' > > -> Does this needs fix in qla2xxx kernel module or does this needs fix in > QLogic firmware? > We have identified patches that has fixed this issue and are submitted as part of RH77 inbox. see comment #39 for the patches that were identified for Fabric mode connection. Reporter of this bugzilla confirmed that the issue with fabric connect was resolved in comment #49. > Thanks, Thanks, Himanshu
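For reference, the ql2xnvmeenable workaround described in the quoted case can be applied as follows; this is a minimal sketch assuming the stock RHEL 7 tooling (dracut) and the currently booted kernel:

# echo "options qla2xxx ql2xnvmeenable=0" > /etc/modprobe.d/qla2xxx.conf
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

After the reboot, the parameter can be verified from sysfs (0 means FC-NVMe is disabled):

# cat /sys/module/qla2xxx/parameters/ql2xnvmeenable
0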
Hi Himanshu, >> Are these customer using Fabric mode or direct attached? Is this something I can identify from the sosreport, or should I get this detail from the customer directly? Do you also want me to share the test kernels from comment #51? Thanks,
Hi Vishal, (In reply to vishal agrawal from comment #67) > Hi Himanshu, > > >> Are these customer using Fabric mode or direct attached? > > Is this something which I can identify from sosreport or should > I get this detail from customer directly. > I would like you to get detailed topology information from the customer before sharing any test kernel; I want to be sure of their configuration before we share any test code. > Do you also want me to share the test kernel's from comment #51? > > Thanks, Thanks, Himanshu
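As a partial answer to the sosreport question: on a live system the attachment topology can usually be read directly from the fc_host sysfs attributes. A sketch (the host number and output below are illustrative, and the exact port_type strings can vary by kernel):

# grep . /sys/class/fc_host/host*/port_type
/sys/class/fc_host/host15/port_type:NPort (fabric via point-to-point)

Fabric-attached ports typically report an NPort/NLPort type, while direct-attached ports report a Point-To-Point or loop type.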
Created attachment 1518039 [details] map
Himanshu, I have attached a screenshot of part of the output when running "map". Does that look correct to you? Thanks, Jennifer
Hi Jennifer, (In reply to jennifer.duong from comment #70) > Himanshu, > > I have attached a screenshot of part of the output when running "map". Does > that look correct to you? > > Thanks, > > Jennifer We were able to confirm that the information is good; we do see LUNs discovered by the UEFI driver. Is it possible for you to capture an FC trace to see why you are not able to boot into the SAN Boot LUN after installation? Also, note that these patches have been merged into the RHEL 7.7 kernel. I'll find out if there is a possibility of getting an ISO image that you can try, to see if that makes any difference. Thanks, Himanshu
(In reply to Himanshu Madhani (Cavium) from comment #66) > Hello Vishal, > > Are these customer using Fabric mode or direct attached? > [...] > We have identified patches that has fixed this issue and are submitted as part of RH77 inbox.
> see comment #39 for the patches that were identified for Fabric mode connection. > Reporter of this bugzilla confirmed that the issue with fabric connect was resolved in comment #49. Hi Himanshu, my customer has confirmed that the storage LUNs are using 'FABRIC MODE'. Do you want me to share the test kernel with him now? Thanks, - Vishal Agrawal.
Himanshu, How do I capture the FC trace? Thanks, Jennifer
Himanshu, I am in the process of requesting an analyzer. Once that becomes available to me, I'll try and grab that FC trace for you.
Created attachment 1521610 [details] FC trace
Hi there, If there is anything else required from Jennifer, now that the FC trace is available, let us know and we will work to provide it. If there are test kernels, I can help provide them via my people page. Kind regards, /S
Himanshu, what should the next steps be?
Hi Jennifer, Looks like I did not see that the FC trace was available. (I was on PTO the week of Jan 14-18 when these were uploaded, so I must have missed the notification.) Let me take a look at the trace and provide next steps. Thanks, Himanshu
Hi Jennifer, Still going through the FC trace; nothing seems out of the ordinary so far. I need to take another look and see if there is anything else we can try on your setup. Thanks, Himanshu
Himanshu, have you had a chance to take another look at the FC trace?
I have a customer with the following configuration that is hitting this or a similar issue (but the ql2xnvmeenable workaround doesn't help).

Hardware: ProLiant DL380 Gen9 w/3 Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)

#                   --------- PCI -------------
#                   subsystem       model    model
#scsi_addr   vendor device   vendor device   name      description
#----------- ------ ------   ------ ------   --------  --------------------------------------------------
1:*:*:*      0x1077 0x2261   0x1590 0x00fa   SN1100Q   HPE SN1100Q 16Gb 2p FC HBA  << tapes/changer
2:*:*:*      0x1077 0x2261   0x1590 0x00fa   SN1100Q   HPE SN1100Q 16Gb 2p FC HBA  << tapes/changer
3:*:*:*      0x1077 0x2261   0x1590 0x00fa   SN1100Q   HPE SN1100Q 16Gb 2p FC HBA  << disks
4:*:*:*      0x1077 0x2261   0x1590 0x00fa   SN1100Q   HPE SN1100Q 16Gb 2p FC HBA
5:*:*:*      0x1077 0x2261   0x1590 0x00fa   SN1100Q   HPE SN1100Q 16Gb 2p FC HBA  << disks
6:*:*:*      0x1077 0x2261   0x1590 0x00fa   SN1100Q   HPE SN1100Q 16Gb 2p FC HBA

firmware level: 8.07.18 (d0d5)

Running 7.5: no issues with seeing disks.
Running 7.6 (3.10.0-957.5.1.el7.x86_64):
. 10.00.00.06.07.6-k (in-box driver)
  - doesn't see the IBM 2145 disks, but does see tapes and changers.
  - using ql2xnvmeenable didn't help
. 8.08.00.08.07.5-k9 (from the QLogic website)
  - disks are seen again.

With the inbox driver they only see tapes/changers:

#scsi_addr       Type       Vendor   Model             Rev     sdN
#--------------- ---------  -------  ----------------- ------  -------------
[0:0:0:0]        raid       HP       P440ar            6.60    /dev/sg0
[0:0:1:0]        disk       HP       EH0600JDXBC       HPD5    /dev/sda
[0:0:2:0]        disk       HP       EH0600JDXBC       HPD5    /dev/sdb
[0:0:3:0]        enclosure  HP       P440ar            6.60    /dev/sg3
[1:0:0:0]        tape       IBM      03592E08          481A    /dev/st0
[2:0:0:0]        tape       IBM      03592E08          481A    /dev/st1
[1:0:1:0]        tape       IBM      03592E08          481A    /dev/st2
[2:0:1:0]        tape       IBM      03592E08          481A    /dev/st3
[1:0:1:1]        changer    IBM      03584L22          F330    /dev/ch0
[2:0:1:1]        changer    IBM      03584L22          F330    /dev/ch1

Is this a different issue or the same as this one? Should I ask the customer to upgrade to 8.08.xx.xx firmware and then try setting ql2xnvmeenable off, or should I open a new bugzilla since the ql2xnvmeenable workaround didn't help? Please advise.
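When switching between in-box and vendor driver builds like this, it is easy to end up testing a different qla2xxx build than intended. A quick sanity check (a sketch; the version strings shown are illustrative) is to compare the loaded module against the one modprobe would load from disk:

# cat /sys/module/qla2xxx/version
10.00.00.06.07.6-k
# modinfo -F version qla2xxx
10.00.00.06.07.6-k

A mismatch between the two usually means the initramfs or module path still carries the other build.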
Created attachment 1538584 [details] overview of storage when all devices are seen storage view of the 7.6 system with the v8 (out-of-box) driver; all storage ports and storage are seen. This is more info on the case where the workaround did not help.
Created attachment 1538585 [details] overview of 7.6 with inbox driver, no disk luns seen storage view of the 7.6 system with the v10 (in-box) driver: tapes/changers are seen attached to the Cisco switch(es), but no IBM storage is seen off of the IBM(?) switch. I.e., the HBA has an assigned portid and a Fabric wwn, but no IBM 2145 storage ports are listed as logged into. This is more info on the case where the workaround did not help.
Hi David, (In reply to Dwight (Bud) Brown from comment #85) > Created attachment 1538585 [details] > overview of 7.6 with inbox driver, no disk luns seen > > storage view of 7.6 system with v10 (in-box) driver and tapes/changers seen > attached to Cisco switch(s), but no IBM storage seen off of IBM(?) switch. > Aka there is an assigned portid to the HBA, a Fabric wwn but no IBM 2145 > storage ports listed as being logged into. > > This is more info on case where workaround did not help. Can you try with RHEL7.6 Z stream kernel and see if the same issue is seen? Thanks, Himanshu
Hi Jennifer, (In reply to jennifer.duong from comment #81) > Himanshu, have you had a chance to take another look at the FC trace? I will have an update this week; I was on PTO last week so could not respond. Thanks, Himanshu
> Hi David,
> Can you try with RHEL7.6 Z stream kernel and see if the same issue is seen?

Himanshu, I'm not sure I understand; who's David? Did you mean Dwight? Anyway, as noted in comment #82: "Running 7.6 (3.10.0-957.5.1.el7.x86_64) . 10.00.00.06.07.6-k (in-box driver) - doesn't see the IBM 2145 disks, but does see tapes and changers. - using ql2xnvmeenable didn't help . 8.08.00.08.07.5-k9 (from QLogic website) - disks being seen again."

So as you can see from the original comment, the customer is already running the latest shipped 7.6 (zstream) kernel. Are you aware of specific driver commits between the latest shipped 7.6 kernel and a future, still-unreleased one? Is that what you want the customer to test against? For example, since the last released kernel, there have been several brew builds on later kernels:

[ 1] RHEL7.6.z: ( )3.10.0-957.8.1.el7 09-Jan-2019
[ 2] RHEL7.6.z: ( )3.10.0-957.7.1.el7 08-Jan-2019
[ 3] RHEL7.6.z: ( )3.10.0-957.6.1.el7 25-Dec-2018 << these and later are potential hotfix kernels, but not shipped.
[ 4] RHEL7.6.z: (!)3.10.0-957.5.1.el7 19-Dec-2018 << last shipped

Pulling the qla2xxx driver from 862-27.1 (latest 7.5z) into 957-5.1 results in the customer again being able to see his disks under 7.6. Please advise.
(In reply to Dwight (Bud) Brown from comment #88) > Himanshu, I'm not sure I understand; who's David? Did you mean Dwight? Sorry about the mix-up; I did mean Dwight. > [...] > [ 2] RHEL7.6.z: ( )3.10.0-957.7.1.el7 08-Jan-2019 We added a patch in RH7.6.z kernel version 3.10.0-957.7.1.el7 which helped recover paths in a multipath environment; we should try the driver from that kernel to see if it helps. > Pulling the qla2xxx driver from 862-27.1 (latest 7.5z) into 957-5.1 results in the customer again being able to see his disks under 7.6. Also, can you provide debug logs using ql2xextended_error_logging=1 for both the working and non-working cases (i.e. the RH7.5.z driver in RHEL 7.6, and the RHEL 7.6 inbox driver)? Thanks, Himanshu
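For the record, the requested debug logging can be enabled either persistently or on the fly; a sketch using the standard qla2xxx module parameter (for the persistent variant, rebuild the initramfs and reboot so it takes effect during early boot):

# echo "options qla2xxx ql2xextended_error_logging=1" >> /etc/modprobe.d/qla2xxx.conf
# dracut -f

Or at runtime, on kernels where the parameter is writable, without reloading the module:

# echo 1 > /sys/module/qla2xxx/parameters/ql2xextended_error_logging

The extra messages land in the kernel log (dmesg / /var/log/messages).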
Created attachment 1539493 [details] messages from 957-5.1, all devices present afterwards case 02324311 - messages from 957-5.1, all devices present afterwards. Symptom in the case: after a tape library restart, devices do not return as they did in 7.5.
Created attachment 1539494 [details] messages 957-5.1 w/lip & debug after devices disappeared case 02324311 - messages 957-5.1 w/lip & debug after the devices disappeared. An attempt to manually rediscover the missing devices via LIP didn't work; tapes/changers are still missing after the LIP.
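For context, a LIP and a target rescan like the one attempted above are normally issued through sysfs; a sketch for a single host (the host number is illustrative):

# echo 1 > /sys/class/fc_host/host1/issue_lip
# echo "- - -" > /sys/class/scsi_host/host1/scan

The first line asks the HBA to reinitialize the link; the second asks the SCSI midlayer to scan all channels, targets, and LUNs on that host.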
A new 957-5.1 test kernel with the qla2xxx patch from 957-7.1 has been provided to two customers:

customer #1 - loses tapes/changers after a tape library restart and they never return in 7.6*; works fine in 7.5*, where the tapes/changers come back within 1 minute.

customer #2 - upon boot of 7.6*, tapes/changers show up but no disks. 7.5 works fine; 7.6 w/the driver from 7.5 works fine.
customer #1 - loses tapes/changers after a tape library restart and they never return in 7.6*; works fine in 7.5*, where the tapes/changers come back within 1 minute. [No Change]: The test kernel with the identified patch from 957-7.1 did not address the issue. When the tape library is restarted, it still never returns to the configuration. I have a new messages file for this kernel which I will upload after I pull it down and unpack it. There is no workaround identified for this issue other than rebooting the affected production servers.
customer #2 - upon boot of 7.6*, tapes/changers show up but no disks. 7.5 works fine; 7.6 w/the driver from 7.5 works fine. [No Change]: The test kernel with the identified patch from 957-7.1 did not address the issue. The IBM disks are still not visible after boot with 7.6 kernels but are with 7.5 kernels. The customer provided new messages files for this kernel and I will upload them after I pull the files down and unpack them. There is no workaround identified for this issue other than downgrading to 7.5. Turning the nvme option off does not change the issue. Running the v9 driver pulled from 7.5 in the 7.6 kernels results in the issue not being seen.
Himanshu, do you happen to have an update on what my next steps should be?
Created attachment 1541498 [details] messages - 957.5.1.el7.QLAV10_7.1.2 customer #1 - loses tapes/changers after a tape library restart and they never return in 7.6*; works fine in 7.5*, where the tapes/changers come back within 1 minute. Testing with kernel 3.10.0-957.5.1.el7.QLAV10_7.1.2, case 02324311: the system was still running with the library in the DEAD state; I ran lip.bsh again and uploaded the resulting messages log. The library returns about 80 seconds after being restarted, as it did with 3.10.0-862.14.4.el7.
Himanshu, do you have an update on this?
Hi Jennifer, (In reply to jennifer.duong from comment #95) > Himanshu, do you happen to have an update on what my next steps should be? Looks like this comment got lost in subsequent updates to this BZ. From the FC trace I don't see anything out of the ordinary; we do get responses to PLOGI/PRLI. Given that you are not seeing the issue in Fabric mode, we need to identify how we can verify this in N2N mode. I'll update tomorrow if there is any way we can confirm this. I will also want you to verify this once we have a RHEL 7.7 ISO build. Thanks, Himanshu
I've another customer with issues since updating to 7.6 with 16G QLogic adapters; it appears that port discovery stutters, which is the only way I know how to describe it.

#scsi_addr   name          version   f/w              device
#----------- ------------- --------- ---------------- ----------------------------------------------
0:*:*:*      megaraid_sas                             /sys/devices/pci0000:00/0000:00:02.2/0000:07:00.0/host0
1:*:*:*      qla2xxx                 7.05.04 (d0d5)   /sys/devices/pci0000:00/0000:00:03.0/0000:11:00.0/host1
2:*:*:*      qla2xxx                 7.05.04 (d0d5)   /sys/devices/pci0000:00/0000:00:03.0/0000:11:00.1/host2
:

#                   --------- PCI -------------
#                   subsystem       model    model
#scsi_addr   vendor device   vendor device   name      description
#----------- ------ ------   ------ ------   --------  --------------------------------------------------
0:*:*:*      0x1000 0x005d   0x1014 0x0454             <D:MegaRAID SAS-3 3108 [Invader]>
1:*:*:*      0x1077 0x2031   0x1077 0x0263   QLE2662   QLogic 16Gb FC Dual-port HBA for System x
2:*:*:*      0x1077 0x2031   0x1077 0x0263   QLE2662   QLogic 16Gb FC Dual-port HBA for System x
:

#SCSI                HBA                                            Fabric             Storage Port
#Addr          Luns  wwnn              /wwpn              /portid   wwn                wwnn              /wwpn              /portid /info
#------------- ----  ------------------/------------------/------   ------------------ ------------------/------------------/--------/---------
1:0:0:-        282   0x2000000e1ee8ba56 0x2100000e1ee8ba56 0x6a95c0 0x100050eb1afa122c 0x50060e8007e60963 0x50060e8007e60963 0x6a1900 FCP Target
1:0:1:-        225   0x2000000e1ee8ba56 0x2100000e1ee8ba56 0x6a95c0 0x100050eb1afa122c 0x50060e8007e61763 0x50060e8007e61763 0x693500 FCP Target
2:0:0:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:1:-        225   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e61773 0x50060e8007e61773 0x692500 FCP Target

>> These are the expected two storage targets; we see hba->switch->storage targets with SAN DIDs of 0x6a1900 and 0x692500.
>> But below, the same storage target is repeated through the same switch/SAN to the same end port. There should be only one target discovery,
>> but in this case the same storage target (0x6a1900) was discovered and added to the kernel storage interconnect topology multiple times.
>> It's possible that some last-stage discovery error results in this behavior, but I have not seen it before. There have been regression issues
>> within the qla2xxx driver in 7.6, especially with 16G adapters, and this problem may be related to those issues.
               same hba (duh)                                        same switch by id  same storage port by ids wwpn,wwnn, san did
               vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv       vvvvvvvvvvvvvvvvvv vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
2:0:2:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:3:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:4:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:5:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:6:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:7:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:8:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:9:-        282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:10:-       282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:11:-       282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:12:-       282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:13:-       282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:14:-       282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:15:-       282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target
2:0:16:-       282   0x2000000e1ee8ba57 0x2100000e1ee8ba57 0x6abac0 0x100050eb1affdcd8 0x50060e8007e60973 0x50060e8007e60973 0x6a1900 FCP Target

So the QLogic HBA discovers the same port over and over again, resulting in the same port being registered under a new SCSI target index; at least that is what it appears from the above data. So instead of the expected 4 paths to a device, the system ends up with 17 paths, 15 of them repeats of hba->switch->san did 0x6a1900. The customer is running 7.05.xx firmware on the 16G adapters, but I'm not sure that has anything to do with it.
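For reference, duplicate registrations like these can also be confirmed from sysfs by walking the fc_remote_ports entries for the affected host and comparing port IDs; a sketch, with the host number taken from the listing above and standard fc_transport attributes assumed:

# for r in /sys/class/fc_remote_ports/rport-2:0-*; do echo "$r: $(cat $r/port_name) $(cat $r/port_id)"; done

A healthy host should show each (port_name, port_id) pair once; in the broken case the same pair shows up under many rport indexes.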
Hi Dwight, Can you please create a new BZ for this issue? I do not see this issue as related to this particular BZ and would like to keep the issues separate. (In reply to Dwight (Bud) Brown from comment #99) > I've another customer with issues since updating to 7.6 with 16G QLogic adapters; it appears that port discovery stutters > [...] > So the QLogic HBA discovers the same port over and over again, resulting in the same port being registered under a new SCSI target index Do you happen to have a log file for the multiple port registrations? Please ask the customer to capture a log file with ql2xextended_error_logging=1, and provide me logs from the failure case so I can identify whether this is a known issue or not. Thanks, Himanshu
Himanshu, what should my next steps be?
Himanshu, I did a network install of the latest RHEL 7.7 nightly build onto my SANboot LUN with FW:v8.07.80 DVR:v10.00.00.12.07.7-k and it booted just fine. It was also able to see the remainder of my volumes. I went ahead and upgraded to FW:v8.08.03 and my host booted into emergency mode. I received warnings saying the following: /dev/mapper/rhel_ictm1608s02h4-root does not exist /dev/rhel_ictm1608s02h4/root does not exist /dev/rhel_ictm1608s02h4/swap does not exist I'm guessing that I hit those messages because this issue still exists in RHEL 7.7, specifically when FW:v8.08.03 is loaded onto the HBAs. I rebooted the host with FW:v8.07.80 loaded onto the HBAs and it was able to reboot multiple times without losing sight of the SANboot LUN. I will be attaching the serial logs shortly.
Created attachment 1549588 [details] ICTM1608S02H4-3-29-19 7.7 Nightly build
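From the emergency shell, a few standard checks can narrow down whether the boot LUN disappeared at the FC, SCSI, multipath, or LVM layer; a sketch (command availability in the dracut shell may vary):

# cat /sys/class/fc_host/host*/port_state
# lsblk
# multipath -ll
# lvm lvs

If port_state shows Online but no SAN disks appear in lsblk, the failure is in target/LUN discovery, which matches the qla2xxx symptoms in this BZ.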
Hi Jennifer, Can you try with the 8.08.204 firmware posted on our download site to check whether the issue is still reproducible? http://driverdownloads.qlogic.com/QLogicDriverDownloads_UI/SearchByProduct.aspx?ProductCategory=39&Product=1261&Os=2 Thanks, Himanshu
Himanshu, it doesn't look like I'm able to reproduce this with FW:v8.08.204.
Hi Jennifer, (In reply to jennifer.duong from comment #105) > Himanshu, it doesn't look like I'm able to reproduce this with FW:v8.08.204. Thanks for the update. Thanks, Himanshu
Himanshu, since it looks like this issue is fixed in the latest QLogic FW 8.08.204, go ahead and close this bug.
Thanks Jennifer for the confirmation. We'll close this Bugzilla. (In reply to jennifer.duong from comment #107) > Himanshu, since it looks like this issue is fixed in the latest QLogic FW > 8.08.204, go ahead and close this bug.