Bug 179474

Summary: Problem with multiple SCSI LUNs and aic7xxx
Product: Red Hat Enterprise Linux 4
Reporter: wilburn
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4.0
CC: alriddoch, coldwell, jbaron
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-02-17 13:59:20 UTC
Attachments:
  dmesg output (flags: none)
  dmesg output (RHEL3) (flags: none)

Description wilburn 2006-01-31 17:33:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.0.7-1.4.1 Firefox/1.0.7

Description of problem:
I have two external RAID devices (different) connected to an Adaptec 29160B Ultra160 SCSI adapter. Each RAID device is configured as two slices, assigned LUNs 0 and 1.

Both LUNs are seen for one RAID device (channel 0, id 2, lun 0 and channel 0, id 2, lun 1).

Only one LUN is seen for the other device (channel 0, id 0, lun 0). The second LUN (should be channel 0, id 0, lun 1) gives error messages:

scsi: host 0 channel 0 id 0 lun 0x00000200080c0400 has a LUN larger than currently supported.
scsi: host 0 channel 0 id 0 lun 0xff010000ffffffff has a LUN larger than currently supported.
scsi: host 0 channel 0 id 0 lun 0x0002202020202020 has a LUN larger than currently supported.
scsi: host 0 channel 0 id 0 lun808529923 has a LUN larger than allowed by the host adapter
scsi: host 0 channel 0 id 0 lun3078 has a LUN larger than allowed by the host adapter

Complete output of dmesg attached.

On boot, BIOS reports proper LUNs for all devices.

Multiple LUNs are enabled and new initrd created.

This problem did not occur with RHEL 3, same hardware.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-22.0.2.EL

How reproducible:
Always

Steps to Reproduce:
1. Boot

Actual Results:  No device is assigned to channel 0, id 0, lun 1. Error messages appear in dmesg output.

Expected Results:  Device should be assigned to channel 0, id 0, lun 1.

Additional info:

Non-working RAID array appears as:

  Vendor: CAEN RAP  Model: TOR 16            Rev: 0001
  Type:   Direct-Access                      ANSI SCSI revision: 03

Working RAID array appears as:

  Vendor: IFT       Model: A16U-G1410        Rev: 334B
  Type:   Direct-Access                      ANSI SCSI revision: 03

Comment 1 wilburn 2006-01-31 17:35:49 UTC
Created attachment 123926 [details]
dmesg output

Comment 2 wilburn 2006-02-14 21:56:37 UTC
As a test, moved the RAID arrays and SCSI card to a machine running RHEL3.
Everything works. Attached is dmesg output for this case.

Comment 3 wilburn 2006-02-14 21:57:46 UTC
Created attachment 124645 [details]
dmesg output (RHEL3)

Comment 4 Tom Coughlan 2006-02-15 13:14:08 UTC
One of the differences between RHEL 3 and 4 is that RHEL 4 uses the Report LUN
command. It looks like this storage device is returning bad info in response to
this command. 
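
For readers unfamiliar with the Report LUNs reply format, here is a minimal sketch of how such a reply can be decoded and why garbage in it produces the two kinds of message seen in the dmesg output. The layout (a 4-byte big-endian list length, 4 reserved bytes, then 8-byte LUN entries) follows the SCSI Primary Commands spec. The function names, the `max_lun` default, and the two-bytes-at-a-time fold in `lun_to_int` are assumptions modelled on the 2.6-era scan code, not a copy of it. The entry `0x0002202020202020` is taken from the log in the description; the other bad entry is hypothetical, chosen so that it folds to the `lun3078` value in the log.

```python
import struct

def parse_report_luns(data: bytes) -> list:
    """Split a REPORT LUNS reply into its 8-byte LUN entries."""
    # Bytes 0-3: big-endian length of the LUN list in bytes; bytes 4-7: reserved.
    (list_len,) = struct.unpack_from(">I", data, 0)
    entries = data[8:8 + list_len]
    usable = len(entries) - len(entries) % 8  # ignore a trailing partial entry
    return [entries[i:i + 8] for i in range(0, usable, 8)]

def lun_to_int(entry: bytes) -> int:
    """Fold the first four bytes of an entry into an integer, two bytes at a time."""
    lun = 0
    for i in (0, 2):
        lun |= ((entry[i] << 8) | entry[i + 1]) << (i * 8)
    return lun

def classify(entry: bytes, max_lun: int = 8) -> str:
    """Mimic the two sanity checks; max_lun is an illustrative host limit."""
    if entry[4:8] != b"\x00\x00\x00\x00":
        return "larger than currently supported"
    lun = lun_to_int(entry)
    if lun > max_lun:
        return "larger than allowed by the host adapter"
    return f"lun {lun}"

# A well-formed reply advertising LUN 0 and LUN 1.
good = struct.pack(">I4x", 16) + bytes(8) + bytes.fromhex("0001000000000000")
good_results = [classify(e) for e in parse_report_luns(good)]

# Two bad entries: one quoted in the log, one hypothetical (see lead-in).
bad = struct.pack(">I4x", 16) + bytes.fromhex("0002202020202020") \
      + bytes.fromhex("0c06000000000000")
bad_results = [classify(e) for e in parse_report_luns(bad)]

print(good_results)
print(bad_results)
```

The run of 0x20 bytes in the logged entry is ASCII space padding, which is consistent with the device returning something other than a LUN list, though only the raw reply would confirm that.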

You can turn off the Report LUN probing and revert to one-at-a-time scanning by
setting the device_info flag:

0x40000 /* don't try REPORT_LUNS scan (SCSI-3 devs) */

with the command:

echo 'CAEN RAP':'TOR 16':0x40000 > /proc/scsi/device_info

Now rmmod/modprobe the aic7xxx driver.
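
To make the format of the echo command above explicit: the shell joins the quoted pieces into a single `vendor:model:flags` string. The sketch below only builds that string; it does not write to /proc/scsi/device_info (which needs root). The helper name `device_info_entry` is hypothetical, and `BLIST_NOREPORTLUN` is, as far as I know, the kernel's name for the 0x40000 flag quoted above.

```python
BLIST_NOREPORTLUN = 0x40000  # "don't try REPORT_LUNS scan (SCSI-3 devs)"

def device_info_entry(vendor: str, model: str, flags: int) -> str:
    """Build a 'vendor:model:flags' line for /proc/scsi/device_info."""
    return f"{vendor}:{model}:{flags:#x}"

# Vendor and model strings exactly as the device reports them in dmesg.
entry = device_info_entry("CAEN RAP", "TOR 16", BLIST_NOREPORTLUN)
print(entry)
```

The vendor and model must match what the device reports in its Inquiry data, which is why the oddly split "CAEN RAP" / "TOR 16" strings are used as-is.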

If this does not work, turn on the debug messages and report the results:

sysctl -w dev.scsi.logging_level=0x000001c0
rmmod/modprobe the aic7xxx driver.
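
As a note on the logging_level value above: it is a packed bitfield, and 0x1c0 turns on only the scan messages at their highest level. A small sketch of the decoding follows, assuming the 3-bits-per-facility layout and facility order from the kernel's scsi_logging.h; the function name and the lower-case facility labels are mine.

```python
# Facility order and 3-bit width assumed from the kernel's scsi_logging.h.
FACILITIES = ["error", "timeout", "scan", "mlqueue", "mlcomplete",
              "llqueue", "llcomplete", "hlqueue", "hlcomplete", "ioctl"]

def decode_logging_level(value: int) -> dict:
    """Return the 0-7 level of each SCSI logging facility packed in value."""
    return {name: (value >> (3 * i)) & 0x7 for i, name in enumerate(FACILITIES)}

levels = decode_logging_level(0x000001c0)
enabled = {name: level for name, level in levels.items() if level}
print(enabled)
```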

Comment 5 wilburn 2006-02-16 19:28:57 UTC
This fixes the problem. Does this mean it is a problem with the RAID device,
rather than a bug?

Comment 6 Tom Coughlan 2006-02-17 13:59:20 UTC
So far we know that the data returned by the RAID device in response to the
Report LUNs command is not interpreted correctly by Linux. It seems likely that
the problem is with the data, since Linux handles the Report LUN data from every
other known RAID device correctly. Someone would need to look at the data
returned to see if it meets the SCSI spec. 

There is another indication that this device may not be carefully adhering to
the SCSI spec. This information:

  Vendor: CAEN RAP  Model: TOR 16 

comes from the RAID box's reply to the SCSI Inquiry command. The Vendor ID is
defined as 8 bytes of ASCII data, and the Product ID (Model) field is defined as
16 bytes of ASCII data. I suspect that the Model is supposed to be "RAPTOR 16",
but they did not bother to fill the Vendor field so that the Model field is
aligned properly.
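
To illustrate the suspicion above, here is a minimal sketch of how the fixed-width Inquiry fields are sliced: Vendor ID at bytes 8-15, Product ID at bytes 16-31, and revision at bytes 32-35 of the standard Inquiry data. The helper names are hypothetical, the first 8 bytes are zero-filled placeholders, and "CAEN" as the intended vendor string is an assumption; only the strings printed by the kernel come from this report.

```python
def make_inquiry(id_text: str, revision: str) -> bytes:
    """Build fake Inquiry data with id_text written starting at byte 8."""
    header = bytes(8)  # bytes 0-7 are not relevant here; placeholder zeros
    body = id_text.ljust(24)[:24] + revision.ljust(4)[:4]
    return header + body.encode("ascii")

def inquiry_ids(data: bytes) -> tuple:
    """Slice the fixed-width Vendor, Product and Revision fields."""
    vendor = data[8:16].decode("ascii").rstrip()
    product = data[16:32].decode("ascii").rstrip()
    revision = data[32:36].decode("ascii").rstrip()
    return vendor, product, revision

# Identification written as one unpadded run of text.
unpadded = inquiry_ids(make_inquiry("CAEN RAPTOR 16", "0001"))

# Vendor padded to its full 8 bytes before the product name starts.
padded = inquiry_ids(make_inquiry("CAEN".ljust(8) + "RAPTOR 16", "0001"))

print(unpadded)
print(padded)
```

The unpadded case reproduces the "CAEN RAP" / "TOR 16" split shown in the dmesg output, which is what one would expect if the device writes its name without padding the Vendor field to 8 bytes.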

I'll close this, since you have a workaround. If you find that the device is
actually complying with the SCSI spec, you can re-open it.