Bug 464636 - scsi scan not waiting long enough
Summary: scsi scan not waiting long enough
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mkinitrd
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Jones
QA Contact: Release Test Team
URL:
Whiteboard:
Duplicates: 475943 475965
Depends On:
Blocks:
Reported: 2008-09-29 19:15 UTC by Bill Peck
Modified: 2013-01-28 00:20 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 22:12:46 UTC
Target Upstream Version:
Embargoed:


Attachments
Possible fix (1.34 KB, patch)
2008-11-10 22:20 UTC, Peter Jones
console log bootup after install (10.59 KB, application/octet-stream)
2008-11-24 15:09 UTC, Bill Peck


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:0237 | 0 | normal | SHIPPED_LIVE | mkinitrd bug fix and enhancement update | 2009-01-20 16:06:39 UTC

Description Bill Peck 2008-09-29 19:15:06 UTC
Description of problem:
Installed an IBM JS20 with multipath on Fibre Channel storage.

The install went fine with mpath passed on the kernel command line, but on bootup the system did not wait long enough for the SCSI disks to appear before scanning for logical volumes.
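The failure is an ordering race: the logical volume scan starts before the Fibre Channel link is up and the disks are attached. As an illustration only, here is a minimal Python sketch of gating a step on the appearance of a specific device node, with an upper bound on the wait. The helper name `wait_for_path`, the example device path, and the timeout values are hypothetical and are not taken from mkinitrd or nash.

```python
import os
import time


def wait_for_path(path, timeout=10.0, poll=0.25):
    """Return True once `path` exists, or False if `timeout` seconds pass first.

    Hypothetical helper for illustration; not part of mkinitrd or nash.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True  # device node has appeared; safe to continue
        if time.monotonic() >= deadline:
            return False  # gave up; caller decides whether to proceed anyway
        time.sleep(poll)


if __name__ == "__main__":
    # Hypothetical device path; a real initrd would not know it in advance.
    if not wait_for_path("/dev/sda1", timeout=1.0):
        print("disk did not appear; an LVM scan now may be incomplete")
```

This approach only works when the expected device is known ahead of time. A generic initrd usually does not know how many disks will show up, which motivates detecting when device discovery has settled instead of waiting for one particular node.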

Version-Release number of selected component (if applicable):
RHEL-5.2-20080926 build

  
Actual results:

SCSI subsystem initialized
Loading sd_mod.ko module
Loading scsi_transport_fc.ko module
Loading qla2xxx.ko module
QLogic Fibre Channel HBA Driver
qla2xxx 0000:01:01.0: Found an ISP2312, irq 24, iobase 0xd000080080021000
qla2xxx 0000:01:01.0: Configuring PCI space...
qla2xxx 0000:01:01.0: Configure NVRAM parameters...
qla2xxx 0000:01:01.0: Verifying loaded RISC code...
qla2xxx 0000:01:01.0: Extended memory detected (512 KB)...
qla2xxx 0000:01:01.0: Resizing request queue depth (2048 -> 4096)...
qla2xxx 0000:01:01.0: Allocated (1308 KB) for firmware dump...
scsi0 : qla2xxx
qla2xxx 0000:01:01.0:
QLogic Fibre Channel HBA Driver: 8.02.00-k5-rhel5.3-01
 QLogic IBM FCEC -
 ISP2312: PCI-X (133 MHz) @ 0000:01:01.0 hdma-, host#=0, fw=3.03.26 IPX
qla2xxx 0000:01:01.1: Found an ISP2312, irq 25, iobase 0xd000080080030000
qla2xxx 0000:01:01.1: Configuring PCI space...
qla2xxx 0000:01:01.1: Configure NVRAM parameters...
qla2xxx 0000:01:01.1: Verifying loaded RISC code...
qla2xxx 0000:01:01.1: Extended memory detected (512 KB)...
qla2xxx 0000:01:01.1: Resizing request queue depth (2048 -> 4096)...
qla2xxx 0000:01:01.1: Allocated (1308 KB) for firmware dump...
scsi1 : qla2xxx
qla2xxx 0000:01:01.1:
QLogic Fibre Channel HBA Driver: 8.02.00-k5-rhel5.3-01
 QLogic IBM FCEC -
 ISP2312: PCI-X (133 MHz) @ 0000:01:01.1 hdma-, host#=1, fw=3.03.26 IPX
Loading shpchp.ko module
shpchp: HPC vendor_id 1022 device_id 7450 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7460 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7450 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
Loading dm-mod.ko module
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel
Loading dm-log.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Loading scsi_dh.ko module
Loading dm-multipath.ko module
device-mapper: multipath: version 1.0.5 loaded
Loading dm-round-robin.ko module
device-mapper: multipath round-robin: version 1.0.0 loaded
Waiting for driver initialization.
Creating multipath devices
No devices found
Scanning and configuring dmraid supported devices
Scanning logical volumes
 Reading all physical volumes.  This may take a while...
qla2xxx 0000:01:01.0: LOOP UP detected (2 Gbps).
 Couldn't find device with uuid '41wu8g-7m7I-FX3B-E368-UTeW-zkKA-mqAFGN'.
 Found volume group "VolGroup00" using metadata type lvm2
Activating logical volumes
 Couldn't find device with uuid '41wu8g-7m7I-FX3B-E368-UTeW-zkKA-mqAFGN'.
 Couldn't find device with uuid '41wu8g-7m7I-FX3B-E368-UTeW-zkKA-mqAFGN'.
 Refusing activation of partial LV LogVol00. Use --partial to override.
 Couldn't find device with uuid '41wu8g-7m7I-FX3B-E368-UTeW-zkKA-mqAFGN'.
 Refusing activation of partial LV LogVol01. Use --partial to override.
 0 logical volume(s) in volume group "VolGroup00" now active
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
 Vendor: HP        Model: HSV200            Rev: 5000
 Type:   RAID                               ANSI SCSI revision: 05
 Vendor: HP        Model: HSV200            Rev: 5000
 Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 20971520 512-byte hdwr sectors (10737 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
SCSI device sda: 20971520 512-byte hdwr sectors (10737 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
sda: sda1
sd 0:0:0:1: Attached scsi disk sda
 Vendor: HP        Model: HSV200            Rev: 5000
 Type:   RAID                               ANSI SCSI revision: 05
 Vendor: HP        Model: HSV200            Rev: 5000
 Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 20971520 512-byte hdwr sectors (10737 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write through w/ FUA
SCSI device sdb: 20971520 512-byte hdwr sectors (10737 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write through w/ FUA
sdb: sdb1
sd 0:0:1:1: Attached scsi disk sdb
setuproot: error mounting /sys: No such file or directory
Switching to new root and running init.

Comment 2 Peter Jones 2008-10-07 15:18:51 UTC
Do you still have this problem with mkinitrd-5.1.19.6-37  ?

Comment 8 Peter Jones 2008-11-10 22:20:20 UTC
Created attachment 323127 [details]
Possible fix

Can you please test with the affixed patch?

Comment 9 Denise Dumas 2008-11-13 19:42:49 UTC
Brock, Bill, Arlinton: we don't have this hardware or anything like it. Can one of you please check that this patch fixes things for you? If possible, we'd like to get this reviewed and submitted for Snap4, which is Wednesday.
Thanks

Comment 10 Brock Organ 2008-11-17 18:01:26 UTC
I don't have the hardware either; I'll ping bpeck about this issue.

Regards,

Brock

Comment 11 Bill Peck 2008-11-17 18:09:23 UTC
What am I supposed to do with the patch?  If you provide me an updates.img I can work with that.

Comment 12 Peter Jones 2008-11-17 18:59:45 UTC
Use the repo at http://people.redhat.com/pjones/mkinitrd-464636/ during installation.

Comment 13 Denise Dumas 2008-11-24 13:55:00 UTC
Did it work, Bill? If so, we can get this patch in for Wednesday Snap5...

Comment 14 Bill Peck 2008-11-24 15:08:25 UTC
The job failed, but it looks like that was because of multipath. The log does show the disks being loaded before the scan for volume groups.

I'm attaching the log for you to make a decision with.

Comment 15 Bill Peck 2008-11-24 15:09:03 UTC
Created attachment 324488 [details]
console log bootup after install

Comment 17 Peter Jones 2008-12-02 16:55:08 UTC
This should be fixed in 11.1.2.160-1 .

Comment 19 Cameron Meadors 2008-12-09 19:29:48 UTC
I noticed a difference in the latest snapshot regarding this behavior. I am seeing "Could not detect stabilization, waiting 10 seconds" just before the drives are scanned.  I was not seeing this before on the same hardware.

I tried to time the pause with kernel 124, which happens at the same point as this new message, and it seems to take the same amount of time. So this is just a new message. I just wanted to make sure that the effects on other systems that did not have the problem were recognized.

Comment 20 Cameron Meadors 2008-12-09 19:30:17 UTC
(The latest snapshot being Snapshot 5, with kernel 125.)

Comment 21 Chris Lalancette 2008-12-10 17:16:55 UTC
I'm also seeing the "Could not detect stabilization, waiting 10 seconds" message, and initialization is most certainly taking longer on my hardware. With the 5.2 mkinitrd it took around 4-5 seconds; with this mkinitrd it now takes that plus 10 seconds.

(I'm switching to FAILS_QA at the moment; not sure if that is the right BZ state)

Chris Lalancette

Comment 22 Denise Dumas 2008-12-10 17:19:50 UTC
Chris and Cameron, did it wait long enough for your SCSI disks to be detected? The "Could not detect stabilization" message is expected behavior and is already covered in the release notes; a Fedora patch was backported that waits up to a maximum time.

If we wait long enough for the disks to be detected, that solves the reported problem.

Comment 23 Chris Lalancette 2008-12-10 17:40:09 UTC
Yes, it did, but as I said, it adds an additional 10 second wait that wasn't there before (I never saw this particular bug on this hardware before).  I guess the thing is: either we are going to unconditionally wait 10 seconds (which is bad behavior in my opinion, but maybe the only way to fix the bug), or we aren't.  If we are going to do the unconditional wait, then no message is necessary.

Chris Lalancette

Comment 24 Denise Dumas 2008-12-10 20:09:04 UTC
It's not always a 10-second wait; that's the maximum. But there is no way around the wait until we get a kernel change, and that won't happen in RHEL 5.
So per dmaley/dmair, we're going to remove the informational message in Snap6.
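Comments 22 and 24 describe the backported behavior as a bounded wait: poll until device discovery appears to have settled, but stop after a maximum time (10 seconds here). The following is a hypothetical Python sketch of that idea, assuming stabilization is detected by hashing a file such as /proc/scsi/scsi and requiring its contents to stay unchanged for several consecutive polls. The function names, the choice of file, and the parameter values are assumptions for illustration, not the actual mkinitrd or nash code.

```python
import hashlib
import time


def _digest(path):
    """Hash the current contents of `path`; a missing file hashes as empty."""
    try:
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()
    except FileNotFoundError:
        return hashlib.sha1(b"").hexdigest()


def wait_until_stable(path, interval=0.25, stable_polls=4, max_wait=10.0):
    """Poll `path` until its contents are unchanged for `stable_polls`
    consecutive intervals.

    Returns True if stabilization was detected, or False once roughly
    `max_wait` seconds have elapsed without it (the caller then proceeds).
    Hypothetical sketch; not the mkinitrd/nash implementation.
    """
    deadline = time.monotonic() + max_wait
    last = _digest(path)
    unchanged = 0
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = _digest(path)
        if current == last:
            unchanged += 1
            if unchanged >= stable_polls:
                return True  # contents held steady long enough
        else:
            unchanged = 0  # something changed (e.g. a disk attached); restart
            last = current
    return False  # hit the cap; boot continues anyway


if __name__ == "__main__":
    if not wait_until_stable("/proc/scsi/scsi", max_wait=2.0):
        print("no stabilization detected within the maximum wait")
```

The quiet-period test is a heuristic: a link that comes up only after the file has been quiet for `stable_polls` intervals is still missed, and when nothing ever settles, the full cap is spent. That trade of boot time against seeing late disks is the cost discussed in Comments 21 through 24.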

Comment 25 Peter Jones 2008-12-10 20:38:52 UTC
Message removed in mkinitrd-5.1.19.6-42 .

Comment 27 Hans de Goede 2008-12-11 09:47:28 UTC
*** Bug 475943 has been marked as a duplicate of this bug. ***

Comment 28 Hans de Goede 2008-12-11 11:15:52 UTC
*** Bug 475965 has been marked as a duplicate of this bug. ***

Comment 30 errata-xmlrpc 2009-01-20 22:12:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0237.html

