127434 – Kernel not detecting multiple LUNs on SCSI aic7xxx

Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	2
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock

Reported:	2004-07-08 05:23 UTC by D. Cross
Modified:	2015-01-04 22:07 UTC
Last Closed:	2005-04-16 06:05:28 UTC

Attachments
dmesg output (15.32 KB, text/plain) 2004-07-08 05:36 UTC, D. Cross

Description D. Cross 2004-07-08 05:23:35 UTC

Description of problem:
We were previously running RH9 on a system with dual Intel Xeon 3.0
processors and an Adaptec SCSI Card 39320D, connected to an external
RAID which is divided into two LUNs (Logical Unit Numbers).  The first
LUN (LUN 0) is 2 TBytes in size and the other (LUN 1) is 1.3 TBytes.
Prior to our system disk dying, this worked fine under RH9.  The first
LUN was and is at about 90% usage, and the smaller one was about 60%
populated.

We replaced the system disk and decided this might be a good time to
move to Fedora Core 2.  Everything seems to be fine, except that we
can no longer see the second LUN (LUN 1).  Our vendor contact
instructed us to rebuild the kernel with Multiple LUN support and to
add "options scsi_mod.o scsi_max_luns=255" to the /etc/modules.conf
file.  Enabling Multiple LUN support was the only change we made to
the kernel config, and it did not seem to make a difference: we still
do not see the second LUN.  I have attached the dmesg from the custom
build, kernel version 2.6.6-1.435custom, built from
kernel-2.6.6-i686-smp.config.  Our vendor contact seemed to think this
is an issue with the 2.6.X kernel.
  
You might note the section of the dmesg where it says the SCSI device
"has a LUN larger than currently supported."  Note also that the
larger of the two LUNs (LUN 0) mounts just fine.  During boot, when
the SCSI devices are scanned at the BIOS level, the system identifies
this RAID as having LUN 0 and LUN 1, with the LUN sizes properly
noted.  So it is apparently the kernel that is not picking up the
configuration.  Previously, under RH9, each LUN appeared to the kernel
as an individual SCSI device, /dev/sdc and /dev/sdd, each holding one
large partition: /dev/sdc1 (2 TBytes) and /dev/sdd1 (1.3 TBytes)
respectively.

In the attached dmesg, it looks like the driver is misinterpreting
data relating to the LUNs.  Instead of reporting small integers like
the 0 and 1 I would expect, it shows huge hex numbers for the LUNs.
See the attachment.


Version-Release number of selected component (if applicable):
kernel-2.6.6-1.435

How reproducible:
Every time

Steps to Reproduce:
1. Install Fedora Core 2 with an external RAID configured with
   multiple LUNs
2. Start up the system
3. Check dmesg or fdisk -l for the additional LUN
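
Step 3 can also be checked against /proc/scsi/scsi, which lists every
device the kernel has registered.  The following is a minimal sketch,
assuming the usual "Host: scsiN Channel: NN Id: NN Lun: NN" line layout
of that file.  The function name list_scsi_luns is made up for
illustration.

```shell
# Print one "HOST CHANNEL ID LUN" line per device the kernel has registered.
# Reads /proc/scsi/scsi by default; pass another file to parse a saved copy.
# Assumes the "Host: scsiN Channel: NN Id: NN Lun: NN" line layout.
list_scsi_luns() {
    awk '$1 == "Host:" && $3 == "Channel:" && $5 == "Id:" && $7 == "Lun:" {
             # Adding 0 turns zero-padded fields such as "08" into plain numbers.
             printf "%s %d %d %d\n", $2, $4 + 0, $6 + 0, $8 + 0
         }' "${1:-/proc/scsi/scsi}"
}
```

If the second LUN had been detected, the output would be expected to
contain two lines for the same host, channel, and ID, differing only in
the last field.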
  
Actual results:
Only the first LUN is visible.  fdisk reports the existing LUN as
though it made up the entire RAID.

Expected results:
LUN 0 and LUN 1 should both be visible, as they were under Red Hat 9
before the system disk crashed.

Additional info:

Comment 1 D. Cross 2004-07-08 05:36:23 UTC

Created attachment 101711
dmesg output

In the attached dmesg, it looks like the driver is misinterpreting data
relating to the LUNs.  Instead of reporting small integers like the 0 and 1 I
would expect, it shows huge hex numbers for the LUNs.  See the attachment.

Comment 2 Barry K. Nathan 2004-07-14 08:27:53 UTC

Just so you know, for a 2.6 kernel you need to edit /etc/modprobe.conf
not /etc/modules.conf.
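
For reference, on a 2.6 kernel the module is named scsi_mod (no .o
suffix) and, as far as I am aware, the parameter is spelled max_luns, so
the line would read "options scsi_mod max_luns=255".  Treat that
spelling as an assumption and verify it against the scsi_scan.c of the
kernel in use.  A minimal sketch for checking what a given config file
actually sets (the function name scsi_max_luns_from_conf is made up for
illustration):

```shell
# Print the max_luns value that a modprobe-style config file sets for scsi_mod.
# Prints nothing when no matching "options scsi_mod ... max_luns=N" line exists.
# The option name max_luns is an assumption about 2.6 kernels (see text above).
scsi_max_luns_from_conf() {
    awk '$1 == "options" && $2 == "scsi_mod" {
             for (i = 3; i <= NF; i++)
                 if ($i ~ /^max_luns=[0-9]+$/) {
                     split($i, kv, "=")
                     print kv[2]
                 }
         }' "$1"
}
```

For example, "scsi_max_luns_from_conf /etc/modprobe.conf" should print
255 once the line above is in place, and print nothing for the 2.4-style
line quoted in the description.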

Comment 3 D. Cross 2004-07-14 22:10:27 UTC

Thank you for your response.  I had to drop my system back down to
Fedora Core 1, which is working.  I actually did try both
/etc/modprobe.conf and /etc/modules.conf, with no success in either
case.  My understanding was that a custom kernel built with Multiple
LUN support would not require either of those files.  However, I
believe I also had the entries in place when I built the custom kernel.
Even though we are back on Fedora Core 1, we would still like to get
this resolved.  The only problem is that right now I don't have a test
system to experiment on.  I could possibly get our vendor contact to
try a few things if you believe you have a potential solution.

Comment 4 Barry K. Nathan 2004-07-15 01:37:07 UTC

There are newer test kernels available from the following places:

+ rawhide (a.k.a. fc-devel)
+ FC3 test 1
+ http://people.redhat.com/arjanv/2.6

You could try one of these newer kernels and see if it's something
that's been fixed upstream (and therefore in these kernels). (My
experience is that it's actually possible to use these kernels on FC1
much of the time, so you may be able to test without having to upgrade
your system to FC2.) I don't know how likely this is to fix your
problem, however.

The only other suggestion I have is to try asking on the linux-scsi
mailing list.

Comment 5 D. Cross 2004-07-15 23:01:54 UTC

My apologies.  I thought I had mentioned in my original attachment
that I had also tried additional kernels of version 2.6.7-X, with no
success there either.  Thanks for the pointer to the linux-scsi list.
I had tried another list, without much response as yet.

Comment 6 D. Cross 2004-07-21 17:39:27 UTC

A vendor contact has informed me that, in their testing, this is
apparently a kernel 2.6.X problem and not specific to the Adaptec
drivers.  The large hexadecimal LUN numbers noted in the dmesg output
show up even with a single LUN, if the RAID size is greater than about
1 TB.  They have also observed this behavior with cards other than
Adaptec.

Comment 7 Joe Krahn 2005-01-06 22:39:25 UTC

I think this is related to the kernel using REPORT_LUNS rather than
sequential scanning, with the REPORT_LUNS response getting misread. The
kernel will fall back to sequential scanning if REPORT_LUNS fails (see
scsi_scan.c), but it doesn't treat the huge LUN values as errors.

The REPORT_LUNS bug should be fixed, but there should probably be a
kernel option to force sequential scanning.

I'm having this problem on FC3. I will try disabling REPORT_LUNS in
the source, and see if LUN 1 becomes available.
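
For anyone comparing against the dmesg output, this is roughly what a
well-formed REPORT LUNS reply looks like when decoded.  It is a minimal
sketch, assuming the standard reply layout (a 4-byte big-endian list
length in bytes, 4 reserved bytes, then one 8-byte entry per LUN) and
handling only the simplest single-level entries, where the first byte is
00 and the second byte is the LUN.  The function name
decode_report_luns is made up for illustration, and this is not the
logic used in scsi_scan.c.

```shell
# Decode a REPORT LUNS reply given as a hex string (spaces are ignored).
# Layout assumed: 8 hex digits of list length (in bytes), 8 hex digits
# reserved, then 16 hex digits per LUN entry.
decode_report_luns() {
    hex=$(printf '%s' "$1" | tr -d ' \n')
    len_hex=$(printf '%s' "$hex" | cut -c1-8)
    count=$(( 0x$len_hex / 8 ))          # each LUN entry is 8 bytes
    i=0
    while [ "$i" -lt "$count" ]; do
        first=$(( 17 + i * 16 ))         # entries start after the 8-byte header
        entry=$(printf '%s' "$hex" | cut -c"$first"-"$(( first + 15 ))")
        byte0=$(printf '%s' "$entry" | cut -c1-2)
        byte1=$(printf '%s' "$entry" | cut -c3-4)
        rest=$(printf '%s' "$entry" | cut -c5-16)
        if [ "$byte0" = "00" ] && [ "$rest" = "000000000000" ]; then
            echo "$entry lun $(( 0x$byte1 ))"
        else
            # Anything else needs full multi-level decoding, not done here.
            echo "$entry not-a-simple-single-level-lun"
        fi
        i=$(( i + 1 ))
    done
}
```

With a made-up reply describing LUN 0 and LUN 1,
decode_report_luns "00000010 00000000 0000000000000000 0001000000000000"
prints one line per entry, ending in "lun 0" and "lun 1" respectively.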

Comment 8 Joe Krahn 2005-01-06 23:06:18 UTC

Workaround: I figured out that you can manually scan individual
LUNs through /proc with:

echo scsi add-single-device HOST CHANNEL ID LUN >/proc/scsi/scsi

For example, this gets my disk back online:
echo scsi add-single-device 1 0 8 1 >/proc/scsi/scsi
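
The same command can be wrapped in a small function that checks its
arguments before writing anything.  This is a minimal sketch, not a
tested tool: the function name scsi_add_single_device and the
SCSI_PROC_FILE override (useful for a dry run against an ordinary file)
are made up for illustration.

```shell
# Ask the kernel to probe a single HOST CHANNEL ID LUN address.
# Writes to /proc/scsi/scsi unless SCSI_PROC_FILE points elsewhere
# (SCSI_PROC_FILE is a hypothetical override for dry runs).
scsi_add_single_device() {
    if [ "$#" -ne 4 ]; then
        echo "usage: scsi_add_single_device HOST CHANNEL ID LUN" >&2
        return 2
    fi
    for n in "$@"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "not a non-negative integer: $n" >&2
                return 2
                ;;
        esac
    done
    echo "scsi add-single-device $1 $2 $3 $4" > "${SCSI_PROC_FILE:-/proc/scsi/scsi}"
}
```

Run as root, "scsi_add_single_device 1 0 8 1" should be equivalent to
the example above.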

Comment 9 Dave Jones 2005-01-14 06:02:24 UTC

Any improvements with the latest updates?

Comment 10 David Kewley 2005-04-01 07:37:52 UTC

As an additional data point, I see what appears to be the same error on a
box with the following specs:
 
RHEL 4AS 
kernel 2.6.9-5.0.3.ELsmp x86_64 
two 3ware 9500S-12MI cards 
24x 400GB disks attached to 3ware cards 
latest 3w-9xxx driver from 3ware website 
 
dual Xeon EM64T 
4GB RAM 
 
In my case, I only have one LUN per SCSI ID, one SCSI ID per 400GB disk, and 
the real LUN is seen fine, so I have no problems using my disks. 
 
Is there anything else I can give you to help track this down?

Comment 11 Dave Jones 2005-04-16 06:05:28 UTC

Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora Legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.
