Bug 123331

Summary: LUN i not getting registered
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: ia64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Satish Mohan <smohan>
Assignee: Tom Coughlan <coughlan>
QA Contact: Brian Brock <bbrock>
CC: hinz, petrides, riel, thomas.zhang
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Last Closed: 2005-09-28 14:23:22 UTC
Bug Blocks: 156320
Attachments:
  output log of IA 64 machine with qlogic module loading (attachment 100318, flags: none)
  add pv-136 to sparselun list (attachment 112415, flags: none)

Description Satish Mohan 2004-05-17 09:16:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040207 Firefox/0.8

Description of problem:
The customer's current setup consists of machines (IA64 and IA32)
with QLogic HBA cards attached to a 20 TB Hitachi SAN. The SAN is
shared by different OSes, and LUNs have been assigned to all of them.

On Linux the QLogic driver is able to see the LUNs allocated to it,
but it is not able to register them with the OS as devices. I am
attaching the proc details of qla2300. The LUN numbering is random. We
tried the ghost lun and max luns options with scsi_mod, but with no
results.

The info pasted here is from an IA64 machine. If needed I will post
the IA32 output as well.

QLogic PCI to Fibre Channel Host Adapter for QLA2340:
        Firmware version:  3.02.13, Driver version 6.06.00b11
Entry address = a000000000135a50
HBA: QLA2312 , Serial# J56045
Request Queue = 0x81b0000, Response Queue = 0x81a0000
Request Queue count= 512, Response Queue count= 512
Total number of active commands = 0
Total number of interrupts = 3
Total number of IOCBs (used/max) = (0/600)
Total number of queued commands = 0
    Device queue depth = 0x20
Number of free request entries = 510
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state= <READY>, flags= 0xc048e0813
Dpc flags = 0x0
MBX flags = 0x0
SRB Free Count = 4096
Link down Timeout = 000
Port down retry = 030
Login retry count = 030
Commands retried with dropped frame(s) = 0


SCSI Device Information:
scsi-qla0-adapter-node=200000e08b0e8d96;
scsi-qla0-adapter-port=210000e08b0e8d96;
scsi-qla0-target-0=500060e8027b4d14;

SCSI LUN Information:
(Id:Lun)  * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 1, Pending reqs 0, flags 0x0*, 0:0:81,
( 0:14): Total reqs 0, Pending reqs 0, flags 0x0*, 0:0:81,



Version-Release number of selected component (if applicable):
2.4.21-9

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL 3.
2. Try the available updates as well.

Actual Results:  LUNs are not getting registered, whereas all the other
(non-Linux) operating systems are able to register them.

Expected Results:  Each LUN should be registered, and the user should see
it as a device.

Additional info:

Comment 1 Arjan van de Ven 2004-05-17 09:24:41 UTC
how big is the lun ?

Comment 2 Satish Mohan 2004-05-17 09:33:55 UTC
Different sizes.

Min: 2 GB
Max, as of today: 15 GB

This is a data center with 150 machines running multiple operating systems.

Comment 3 Tom Coughlan 2004-05-17 14:46:19 UTC
Please attach the /var/log/messages output that shows the QLogic driver
being loaded and configured.

Did you try an rmmod then modprobe after the system was booted?

You could also try 

        /*
         * Usage: echo "scsi scan-new-devices" >/proc/scsi/scsi
         *
         * Scans all host adapters again to see if there are any
         * new devices.
         */

Tom

Comment 4 Satish Mohan 2004-05-19 05:38:45 UTC
Created attachment 100318 [details]
output log of IA 64 machine with qlogic module loading

contains the following output
1. qlogic module loading at boot time
2. qlogic module loading using modprobe
3. output after "scsi scan-new-devices"
4. /etc/modules.conf

Comment 5 Tom Coughlan 2004-05-19 13:45:26 UTC
Your current LUN numbering is 

LUN 0 - the Hitachi processor device - used for managing the box
LUN 14 - presumably the disk device you are trying to configure

Please try the following command to configure the disk device:

echo "scsi add-single-device 2 0 0 14" >/proc/scsi/scsi

Also, are you able to try a test with the disk device at LUN 1? 





Comment 6 Satish Mohan 2004-05-21 09:53:18 UTC
echo "scsi add-single-device 2 0 0 14" >/proc/scsi/scsi

This attaches the device. How can we automate it so that devices are
attached at boot time (I mean on different systems, with multiple LUNs)?

Can we follow the same steps on IA32 as well?

Comment 8 Tom Coughlan 2004-05-21 12:42:37 UTC
Yes, this will work on any architecture.

The problem is caused by the gap in the LUN numbering.  The system
stops scanning when it hits a gap.  It is not immediately clear why
this is happening in your case because your storage device is listed
in scsi_scan.c as being okay with sparse LUNs. 

As a temporary workaround, you could put a script like
rescan-scsi-bus.sh in rc.local. See "Rescan SCSI bus" on 
http://www.garloff.de/kurt/linux/
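
A minimal sketch of what such an rc.local workaround could look like when the missing LUNs are known in advance. The host/channel/id values (2 0 0) and LUN 14 are taken from comment 5. The function name emit_add_cmds and the --apply switch are made up for illustration and are not part of rescan-scsi-bus.sh:

```shell
#!/bin/sh
# Hypothetical helper: print one add-single-device request per LUN.
# usage: emit_add_cmds HOST CHANNEL ID LUN [LUN ...]
emit_add_cmds() {
    host=$1; channel=$2; id=$3
    shift 3
    for lun in "$@"; do
        echo "scsi add-single-device $host $channel $id $lun"
    done
}

# Only touch /proc/scsi/scsi when explicitly asked to.
# Each request is sent as its own write to the proc file.
if [ "${1:-}" = "--apply" ]; then
    emit_add_cmds 2 0 0 14 | while read -r cmd; do
        echo "$cmd" > /proc/scsi/scsi
    done
fi
```

Run without arguments it changes nothing, so the generated requests can be checked first by calling emit_add_cmds by hand. The LUN list has to be adjusted per system.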

Comment 9 Tom Coughlan 2004-12-21 22:08:01 UTC
The Qlogic driver has been updated several times since 6.06.00b11.
Please re-test with RHEL 3 Update 4. Post the results here. Thanks.

Comment 10 Joerg Hinz 2005-03-26 17:56:09 UTC
The bug still exists with qla 7.03.00 (original QLogic or -RH) and kernel
2.4.21-27.0.2.ELsmp.


Comment 11 Joerg Hinz 2005-03-26 18:09:19 UTC
I suggest changing the SUMMARY to "LUNs not found with Qlogic-FC-adapters". On
the other hand I cannot say whether it's a general issue or just qla-specific.

My configuration is a QLogic adapter on a Dell PV-136T library connected via Fibre Channel.

When loading the driver only the FC-Connector of the library (LUN 0) is found:

qla2x00_set_info starts at address = f89ca060
qla2x00: Found  VID=1077 DID=2312 SSVID=1077 SSDID=100
scsi(1): Found a QLA2312  @ bus 10, device 0x3, irq 77, iobase 0xf895d000
scsi(1): 64 Bit PCI Addressing Enabled.
scsi(1): Allocated 4096 SRB(s).
scsi(1): Configure NVRAM parameters...
scsi(1): Verifying loaded RISC code...
scsi(1): Verifying chip...
scsi(1): Waiting for LIP to complete...
scsi(1): LOOP UP detected.
scsi(1): Port database changed.
scsi(1): Topology - (N_Port-to-N_Port), Host Loop address 0x0
scsi(1): Failed SNS login: loop_id=80 mb[0]=4005 mb[1]=5 mb[2]=0 mb[6]=600 mb[7]=0
scsi-qla0-adapter-node=200000e08b1b91c4\;
scsi-qla0-adapter-port=210000e08b1b91c4\;
scsi-qla0-tgt-0-di-0-port=200100308c036c30\;
qla2x00_detect num_hosts=0
scsi1 : QLogic QLA2312 PCI to Fibre Channel Host Adapter: bus 10 device 3 irq 77
        Firmware version:  3.03.01, Driver version 7.01.01

scsi(1): Waiting for LIP to complete...
scsi(1): Topology - (N_Port-to-N_Port), Host Loop address 0x0
blk: queue f762c618, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi: unknown type 12
  Vendor: DELL      Model: PV-136T-SNC2      Rev: 42b1
  Type:   Unknown                            ANSI SCSI revision: 03
blk: queue f762c418, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi(1:0:0:0): Enabled tagged queuing, queue depth 32.
Attached scsi generic sg2 at scsi1, channel 0, id 0, lun 0,  type 12
resize_dma_pool: unknown device type 12

When you take a look at the other FC devices you see:

backup07:~# cat /proc/scsi/qla2300/1
QLogic PCI to Fibre Channel Host Adapter for QLA2340:
        Firmware version:  3.03.01, Driver version 7.01.01
Entry address = f89ca060
HBA: QLA2312 , Serial# S19793
Request Queue = 0x37000000, Response Queue = 0x36ff0000
Request Queue count= 512, Response Queue count= 512
Total number of active commands = 0
Total number of interrupts = 124
Total number of IOCBs (used/max) = (0/600)
Total number of queued commands = 0
    Device queue depth = 0x20
Number of free request entries = 493
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 1
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state= <READY>, flags= 0x860813
Dpc flags = 0x1000000
MBX flags = 0x0
SRB Free Count = 4096
Link down Timeout = 000
Port down retry = 045
Login retry count = 045
Commands retried with dropped frame(s) = 0
Configured characteristic impedence: 50 ohms
Configured data rate: 1-2 Gb/sec auto-negotiate


SCSI Device Information:
scsi-qla0-adapter-node=200000e08b1b91c4;
scsi-qla0-adapter-port=210000e08b1b91c4;
scsi-qla0-target-0=200100308c036c30;

SCSI LUN Information:
(Id:Lun)  * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 1, Pending reqs 0, flags 0x0*, 0:0:01,
( 0: 2): Total reqs 14, Pending reqs 0, flags 0x0, 0:0:01,
( 0: 4): Total reqs 1, Pending reqs 0, flags 0x0*, 0:0:01,
( 0: 5): Total reqs 1, Pending reqs 0, flags 0x0*, 0:0:01,

So you can see that LUNs 2, 4 and 5 are other devices that the kernel did not detect.

If you work around it like this:
backup07:~# cat workaround.sh 
echo "scsi add-single-device 1 0 0 2" > /proc/scsi/scsi 
echo "scsi add-single-device 1 0 0 4" > /proc/scsi/scsi 
echo "scsi add-single-device 1 0 0 5" > /proc/scsi/scsi 

Those devices get detected:
scsi singledevice 1 0 0 2
blk: queue c3544c18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
  Vendor: DELL      Model: PV-136T           Rev: 3.11
  Type:   Medium Changer                     ANSI SCSI revision: 02
blk: queue c37b2c18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi(1:0:0:0): Enabled tagged queuing, queue depth 32.
Attached scsi generic sg3 at scsi1, channel 0, id 0, lun 2,  type 8
resize_dma_pool: unknown device type 12
scsi singledevice 1 0 0 4
  Vendor: IBM       Model: ULTRIUM-TD2       Rev: 37RH
  Type:   Sequential-Access                  ANSI SCSI revision: 03
blk: queue f6f56418, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi(1:0:0:0): Enabled tagged queuing, queue depth 32.
resize_dma_pool: unknown device type 12
scsi singledevice 1 0 0 5
  Vendor: IBM       Model: ULTRIUM-TD2       Rev: 37RH
  Type:   Sequential-Access                  ANSI SCSI revision: 03
blk: queue f6e93c18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi(1:0:0:0): Enabled tagged queuing, queue depth 32.
resize_dma_pool: unknown device type 12

But that workaround is not really a satisfying solution....
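
As a rough sketch, the list of LUNs for this workaround could be derived from the driver's own LUN table instead of being typed in by hand. This assumes the table format shown above, where a '*' after the flags value marks a LUN that is not registered, and assumes that the Id column can be used directly as the SCSI id. The function names unregistered_luns and emit_cmds are made up for illustration:

```shell
#!/bin/sh
# Print "ID LUN" for every row of the LUN table whose flags value is
# followed by '*' (marked as not registered with the OS).
unregistered_luns() {
    sed -n 's/^( *\([0-9][0-9]*\): *\([0-9][0-9]*\)):.*flags 0x[0-9a-fA-F]*\*.*$/\1 \2/p'
}

# Turn those rows into add-single-device requests for one host/channel.
# usage: emit_cmds HOST CHANNEL < driver-proc-output
emit_cmds() {
    unregistered_luns | while read -r id lun; do
        echo "scsi add-single-device $1 $2 $id $lun"
    done
}
```

For example, "emit_cmds 1 0 < /proc/scsi/qla2300/1" only prints the requests. Each printed line would still have to be written to /proc/scsi/scsi separately, and the list should be reviewed first, since a starred row may belong to a device that is already attached.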

Joerg


Comment 12 Joerg Hinz 2005-03-26 18:15:34 UTC
BTW I saw that my posting showed the old 7.01.01 driver.

With the recent 7.03.00 it's the same:

qla2x00_set_info starts at address = f89ca060
qla2x00: Found  VID=1077 DID=2312 SSVID=1077 SSDID=100
scsi(1): Found a QLA2312  @ bus 10, device 0x3, irq 77, iobase 0xf895f000
scsi(1): 64 Bit PCI Addressing Enabled.
scsi(1): Allocated 4096 SRB(s).
scsi(1): Configure NVRAM parameters...
scsi(1): Verifying loaded RISC code...
scsi(1): Verifying chip...
scsi(1): Waiting for LIP to complete...
scsi(1): LOOP UP detected.
scsi(1): Port database changed.
scsi(1): Topology - (N_Port-to-N_Port), Host Loop address 0x0
scsi(1): Failed SNS login: loop_id=80 mb[0]=4005 mb[1]=5 mb[2]=0 mb[6]=47b1 mb[7]=f89e
scsi-qla0-adapter-node=200000e08b1b91c4\;
scsi-qla0-adapter-port=210000e08b1b91c4\;
scsi-qla0-tgt-0-di-0-port=200100308c036c30\;
qla2x00_detect num_hosts=0
scsi1 : QLogic QLA2312 PCI to Fibre Channel Host Adapter: bus 10 device 3 irq 77
        Firmware version:  3.03.06, Driver version 7.03.00

scsi(1): Waiting for LIP to complete...
scsi(1): Topology - (N_Port-to-N_Port), Host Loop address 0x0
blk: queue f6e93a18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi: unknown type 12
  Vendor: DELL      Model: PV-136T-SNC2      Rev: 42b1
  Type:   Unknown                            ANSI SCSI revision: 03
blk: queue f6e93818, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi(1:0:0:0): Enabled tagged queuing, queue depth 32.
Attached scsi generic sg2 at scsi1, channel 0, id 0, lun 0,  type 12
resize_dma_pool: unknown device type 12
(no other devices found)

We think the problem is kernel- and not qlogic-related.

Joerg

Comment 13 Tom Coughlan 2005-03-29 13:52:19 UTC
You are right, the problem is not related to the QLogic driver. 

By default the system does not scan LUNs greater than zero. You can override
this by adding the following to /etc/modules.conf:

options scsi_mod max_scsi_luns=256

Re-make the initrd, and reboot. Please do this, if you have not already.

This will cause the system to scan LUNs sequentially until there is no response.
Your LUNs are 0, 2, 4, 5, so the system will stop scanning when LUN 1 does not
answer. If you can re-number the LUNs sequentially, this will be the simplest fix.

Otherwise, in order for the system to scan past gaps in the LUN number space,
your device must be listed in scsi_scan.c with the BLIST_SPARSELUN flag set. See
the attached patch. If you confirm that this patch solves the problem I will
include it in an RHEL 3 update. 
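
The scanning behavior described in this comment can be illustrated with a toy model. This is only a sketch of the rule as stated here (sequential probing that stops at the first gap unless the device is flagged as sparse), not the code in scsi_scan.c. The function name scan_luns and its arguments are hypothetical:

```shell
#!/bin/sh
# Toy model of the LUN scan rule; not kernel code.
# usage: scan_luns MAX_LUNS SPARSE PRESENT_LUN [PRESENT_LUN ...]
#   SPARSE=0: stop at the first LUN that does not answer
#   SPARSE=1: keep probing up to MAX_LUNS (the BLIST_SPARSELUN case)
scan_luns() {
    max=$1; sparse=$2
    shift 2
    present=" $* "          # space-delimited list of LUNs that answer
    found=""
    lun=0
    while [ "$lun" -lt "$max" ]; do
        case "$present" in
            *" $lun "*) found="$found $lun" ;;
            *) [ "$sparse" -eq 1 ] || break ;;
        esac
        lun=$((lun + 1))
    done
    echo "${found# }"
}

scan_luns 1 0 0 2 4 5      # default limit: only LUN 0 is probed
scan_luns 256 0 0 2 4 5    # max_scsi_luns=256, no sparse flag
scan_luns 256 1 0 2 4 5    # max_scsi_luns=256, sparse flag set
```

With LUNs 0, 2, 4 and 5 present, the first two calls report only LUN 0, and the third reports all four, which matches the description above.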


Comment 14 Tom Coughlan 2005-03-29 13:53:41 UTC
Created attachment 112415 [details]
add pv-136 to sparselun list

Comment 15 Joerg Hinz 2005-03-29 15:43:22 UTC
> If you confirm that this patch solves the problem I will
> include it in an RHEL 3 update. 

Yes, the patch solved the problem.

So you can put this into the next update to support PV-136T libraries with LUN 
gaps.

Today I found that re-numbering the LUNs sequentially is ONLY possible with
the Windows Dell SNC-Manager... NOT via the serial console...

What a .... ;->

Thanks for the patch.

Joerg


Comment 16 Joerg Hinz 2005-03-29 15:45:04 UTC
BTW you might close this bug, since it's not really a bug but a generic problem 
of the linux kernel?

What about a new kernel parameter, scan_max_luns=1 to force the scsi_scan.c to 
scan up to the max_scsi_luns-Parameter?

Joerg

Comment 17 Tom Coughlan 2005-03-29 16:23:00 UTC
Thanks for testing the patch. It is too late for RHEL 3 U5, so this will go in
U6. I'll keep the bug open to track status. 

There have been discussions about adding more dynamic controls over the LUN
scanning behavior. The prevailing opinion seems to be that it is too late for
the 2.4 kernel, and scanning is done differently in 2.6, where the Report LUNs
command is used if it is supported by the device. 

Tom

Comment 18 Thomas Zhang 2005-04-24 14:05:53 UTC
I am testing EM64T on an HP DL380 with a QLogic 2312, and I get the same
error. Can you tell me how to fix this problem, please?

Comment 19 Tom Coughlan 2005-04-25 11:21:02 UTC
Thomas, please post /var/log/messages that shows the messages when the qla2xxx
driver loads. Also let me know what the LUN numbers are.

Did you try  

echo "scsi add-single-device 2 0 0 14" >/proc/scsi/scsi

with the appropriate values filled in?


Comment 22 Tom Coughlan 2005-05-19 11:28:32 UTC
U6 status update: will do. One hour of work.

Comment 26 Ernie Petrides 2005-07-20 07:41:09 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-33.EL).


Comment 29 Red Hat Bugzilla 2005-09-28 14:23:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html