Bug 156300 - SCSI drives present in /proc/scsi/scsi but missing from /proc/partitions
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: athlon Linux
Priority: medium   Severity: medium
Assigned To: Tom Coughlan
QA Contact: Brian Brock
Reported: 2005-04-28 14:05 EDT by Richard Abbott
Modified: 2007-11-30 17:07 EST

Last Closed: 2007-10-19 15:03:55 EDT


Attachments
modules.conf (158 bytes, text/plain)
2005-04-29 11:25 EDT, Richard Abbott
messages from the last boot of the system (34.85 KB, text/plain)
2005-04-29 11:26 EDT, Richard Abbott
output from `lsmod` (856 bytes, text/plain)
2005-04-29 11:27 EDT, Richard Abbott
output from `grep mpt modules.dep` of the kernel 2.4.21-27.0.2.ELsmp (1.72 KB, text/plain)
2005-04-29 11:28 EDT, Richard Abbott
requested info for RHEL3AS U5 (42.83 KB, text/plain)
2005-05-05 12:11 EDT, Richard Abbott

Description Richard Abbott 2005-04-28 14:05:32 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040805 Netscape/7.2

Description of problem:
When I upgraded from kernel version 2.4.21-27 to 2.4.21-27.0.2 (and also 2.4.21-27.0.4), many of the SCSI drives attached to the network were no longer available. The SCSI devices are listed on screen as they are discovered during boot, and they appear in /proc/scsi/scsi; however, they are not listed in /proc/partitions, are not available to mount, and are not displayed when `fdisk -l` is run. The drive ordering remains constant and some drives remain available, e.g. I will have /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdj, which are the device letters I get when I boot into the 2.4.21-27 kernel.
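For reference, here is a quick way to compare the two views (just a sketch, assuming the standard /proc layout and whole-disk names like sdX):

# number of SCSI devices the mid-layer has discovered
grep -c Vendor /proc/scsi/scsi
# number of whole sd disks that actually got block devices
awk '$4 ~ /^sd[a-z]+$/ {n++} END {print n}' /proc/partitions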



Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-27.0.2.EL, kernel-smp-2.4.21-27.0.4.EL

How reproducible:
Always

Steps to Reproduce:
1. Upgrade to the kernel-smp-2.4.21-27.0.2.EL kernel.
2. Try to list or mount some of the NAS RAIDs that were previously available.
  

Actual Results:  Devices that were previously available for mounting are no longer available.

Expected Results:  The devices should be available for use with the new kernel.

Additional info:

The drives we are using are 3.5 and 5.6 TB Apple X-Raids attached through a Fibre Channel network. We are using dual-port LSI7202XP-LC Fibre Channel cards in our systems. The problem does not seem to be related to the firmware version of the Apple X-Raids or to the firmware of the Fibre Channel switches. I have also tried installing the latest drivers for the LSI Fibre Channel adapter, but this did not fix the problem.

I have reviewed several log files to try to find error messages that would explain why many of the devices are being mysteriously dropped, but I was unable to find anything out of the ordinary.

This is the first bug I have reported, so if there is some information I have omitted that would help, please let me know and I will do my best to get it.
Comment 1 Ernie Petrides 2005-04-28 14:49:00 EDT
Hello, Richard.  Which kernel module(s) support the drives you've listed?
And if any of those are in the kernel-unsupported-* RPM, have you also
installed the associated RPM from the new release?
Comment 2 Richard Abbott 2005-04-28 16:26:01 EDT
From what I can see, the modules that are needed are:

mptscsih, which uses mptbase, diskdumplib, and scsi_mod

According to the modules.dep file, the mptscsih module comes from a fusion_20505 directory and mptbase from a fusion_20511 directory, where all of the modules for the kernel 2.4.21-27.0.2.ELsmp are stored. As far as I know, nothing is used from kernel-unsupported.
Comment 3 Tom Coughlan 2005-04-29 10:06:11 EDT
Please post /etc/modules.conf, and /var/log/messages showing the boot messages
when the system failed to configure all the devices. Also post the output of lsmod.

The driver consists of four modules: mptbase, mptctl, mptlan, and mptscsih. The
modules with the "_nnnnn" suffix are older versions of the driver; the modules
without the suffix are the default driver. You can see all of these with:

grep mpt modules.dep
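To double-check which copy of the driver is actually in use, something like the following should also work (a sketch; modutils option syntax may vary by version):

modprobe -l | grep mpt    # module files modprobe can find for this kernel
lsmod | grep mpt          # modules currently loaded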
Comment 4 Richard Abbott 2005-04-29 11:25:09 EDT
Created attachment 113849 [details]
modules.conf
Comment 5 Richard Abbott 2005-04-29 11:26:36 EDT
Created attachment 113851 [details]
messages from the last boot of the system

I have removed all information before and after the boot messages to shorten
the file.
Comment 6 Richard Abbott 2005-04-29 11:27:22 EDT
Created attachment 113852 [details]
output from `lsmod`
Comment 7 Richard Abbott 2005-04-29 11:28:53 EDT
Created attachment 113853 [details]
output from `grep mpt modules.dep` of the kernel 2.4.21-27.0.2.ELsmp
Comment 8 Richard Abbott 2005-05-03 17:48:30 EDT
I also found out that the 4 devices showing this problem are using a slightly
newer firmware version. The firmware versions on the Apple X-Raids that do work
are 1.2/1.19f and 1.3.1d1/1.24f; the new X-Raids, which are the ones causing
the problems, have firmware version 1.3.2/1.26a. That version is required for
the 5.6 TB RAID volumes, so I can't downgrade the firmware to an older version.

Another step I have taken to try to resolve this was to download the latest
version of the mpt_fusion drivers from LSI Logic (v. 2.05.23). After installing
them, I removed the two other fusion module versions from
/lib/modules/kernel/drivers/addon/fusion_020505 and fusion_020511 and also
removed all references to them from the modules.dep file, to make sure the new
modules I installed would be the ones loaded. However, the problem persisted
even after installing the latest fusion drivers.
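A side note for anyone repeating this: rather than editing modules.dep by hand, the file can be regenerated after the old module directories are removed, e.g.:

# rebuild modules.dep for the running kernel
depmod -a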
Comment 9 Tom Coughlan 2005-05-04 08:15:17 EDT
Whoops, I didn't notice those multi-TB disks the first time through.

The current maximum disk size in RHEL 3 is 1 TB. In RHEL 3 Update 5 (U5) we are
planning to increase this to 2 TB. That is the limit of the SCSI subsystem in
the 2.4 kernel and is not expected to be increased further. (RHEL 4 supports larger disks.)

I am not sure why the large disks worked with 2.4.21-27 but not with 27.0.2 or
27.0.4. There should not be any difference, and the fact that they worked with
2.4.21-27 is surprising.

Please confirm that the disks that are not being configured are indeed > 2 TB. 

U5 is in beta test, and can be obtained from RHN, if you would like to test it. 
Comment 10 Richard Abbott 2005-05-04 10:40:42 EDT
I ran into the multi-TB volume limit when I was using RHEL 3 Desktop (probably
U0 or U3), but when I installed RHEL 3 AS (U4) that limit seemed to disappear,
since the volumes could then be used (before the kernel upgrade to 27.0.2/4)
without any problems. Those volumes were a maximum of 1.3 to 1.4 TB.

The new RAIDs that appear in /proc/scsi/scsi but are not available for use are
2.14 TB each, but because I can't access the devices I have yet to partition
them.

So this multi-TB limit is on the device itself, and not on each volume?

I am downloading RHEL 3 AS U5 right now and will let you know if it solves the
problem. Thanks.
Comment 11 Richard Abbott 2005-05-05 12:11:45 EDT
Created attachment 114059 [details]
requested info for RHEL3AS U5

This is the info from the system on which I installed RHEL3 U5. U5 appears to
have partially fixed the problem.

After installing U5, I connected one of the new RAIDs, which is 2.14 TB,
directly to the Fibre Channel controller; it was detected correctly, and I was
able to partition it and build the filesystem on it as expected.

I then plugged that RAID, with the new filesystem, into the Fibre Channel
network and connected the U5 system to the same network. After rebooting, the
U5 system detected all of the RAIDs; they were listed in /proc/scsi/scsi, and
according to the boot messages all had been assigned drive letters
/dev/sd[a-o]. However, in /proc/partitions only /dev/sd[a-c] and /dev/sdp (the
local internal hard drive) were available.

Additionally, looking at /proc/scsi/scsi, there are 5 devices listed as Apple
Xserv RAID with a Rev: of 1.26. These numbers match up with the firmware
versions of the X-Raids themselves; however, the list shows 5 devices with
version 1.26 while there are only 4 on the network with that version number.

With U5 I am able to use the >2 TB RAIDs; however, the problem where not
everything is available still exists.
Comment 12 Tom Coughlan 2005-05-05 15:12:01 EDT
As I said, the 2.4 kernel is limited to 2 TB disk devices. This is because of
limitations in the SCSI subsystem:

Typical disk devices are addressed in units of 512-byte blocks. The size of the
block address in the SCSI command determines the maximum device size. The SCSI
command set includes commands that have 16-bit block addresses (device size is
limited to 32 MB), 32-bit block addresses (limited to addressing 2 TB), and
64-bit block addresses. The SCSI subsystem in the 2.4 kernel does not support
commands with 64-bit block addresses; that support is in the 2.6 kernel.
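For reference, the arithmetic behind the 2 TB figure:

2^32 blocks x 512 bytes/block = 2,199,023,255,552 bytes, i.e. roughly 2.2 TB (2 TiB)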

Devices > 2 TB are not supported in RHEL 3. 
Comment 13 Richard Abbott 2005-05-05 15:25:38 EDT
Ok, the main problem of this bug got sidetracked by the large size of a couple
of the devices on our network.

Setting aside the 4 of my 16 devices that are greater than 2 TB (which isn't a
problem anymore), my main problem is that many of the SCSI devices are for
some reason not available after they are detected. As can be seen in the last
file I posted, we have 15 SCSI devices that are seen by the SCSI controller.
Ignoring the 4 that are greater than 2 TB, I am left with 11 that we can deal
with. Of those 11 devices, only 2 are making it into the list of available
devices.

All of the devices are listed in /proc/scsi/scsi; however, for some reason
they are missing from /proc/partitions, and there are no error messages
anywhere to explain why these devices are being dropped. I would appreciate
any help you can provide in solving that problem.
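For what it's worth, here is how I have been checking which disks were attached versus which got block devices (a rough sketch, assuming the standard 2.4 kernel boot messages):

# disks the sd driver reported attaching during boot
grep "Attached scsi disk" /var/log/messages
# disks that actually show up as block devices
cat /proc/partitions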
Comment 14 Tom Coughlan 2005-05-05 17:58:48 EDT
Well, the failure happens right after the first 2 TB device:

SCSI device sdc: 4294967294 512-byte hdwr sectors (2199997 MB)
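(Note: 4294967294 is 2^32 - 2, so sdc's reported size sits right at the 32-bit block-address ceiling described in comment 12.)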

Can you try a test with no >= 2 TB devices? Or reconfigure to see if the
problem follows the large devices?

> however in the list there are 5 that have a version of 1.26 while there are 
> only 4 on the network with that version number

Are you saying that an extra disk is being configured, or that the revision of
an existing disk is incorrect?
Comment 15 Richard Abbott 2005-05-06 12:03:42 EDT
>Are you saying that an extra disk is being configured, or that the revision of
>an existing disk is incorrect?

Actually, after some testing, I discovered that no matter which of the 4
devices with firmware version 1.26 is plugged in, the first one is always
discovered and listed twice in /proc/scsi/scsi.

I have removed the 4 devices with firmware 1.26, because these are the ones
that are >2 TB, which RHEL 3 does not support. I will work that problem out
later on my own.


>Can you try a test with no >= 2 TB devices? Or reconfigure to see if the
>problems follows the large devices?

Booting the system with the >2 TB devices removed, the problem where not all
of the SCSI devices are available persists.

The first 3 SCSI devices, Apple Xserv RAIDs of ~1.4 TB, are discovered and
given drive letters sd[a-c]. The next 5 devices, also Apple Xserv RAIDs of
~1.4 TB, are discovered and listed in /proc/scsi/scsi, and according to the
boot messages are mapped to drive letters sd[d-h]. The final 2 SCSI devices,
more RAID arrays of ~1.7 TB but a different brand than Apple, are discovered,
mapped to drive letters sd[i,j], and are available.

When boot is finished, I end up with all devices discovered correctly and
listed in /proc/scsi/scsi; however, only the first 3 Apple RAIDs and the
Attaboy RAIDs are available for use.

All of the devices do have valid filesystems built on them.
Comment 17 RHEL Product and Program Management 2007-10-19 15:03:55 EDT
This bug is filed against RHEL 3, which is in its maintenance phase.
During the maintenance phase, only security errata and select mission-critical
bug fixes will be released for enterprise products. Since this bug does not
meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
