198666 – Startup race: could not find filesystem '/dev/root'

Summary: Startup race: could not find filesystem '/dev/root'

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Mike Christie
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	420521 422431 422441

Reported:	2006-07-12 16:59 UTC by Bryan Stillwell
Modified:	2007-12-12 21:11 UTC
CC List:	17 users
Fixed In Version:	5.0
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-12-07 19:14:09 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Boot output with kernel panic (25.21 KB, text/plain) 2006-07-12 16:59 UTC, Bryan Stillwell
Diff between a good and bad boot of rhel5b2 (1.41 KB, text/plain) 2006-11-29 19:53 UTC, Bryan Stillwell

Description Bryan Stillwell 2006-07-12 16:59:09 UTC

Description of problem:
When doing a reboot test on an sx1000-based Superdome using storage located off
a 4Gb Emulex card, I received the attached kernel panic.

Version-Release number of selected component (if applicable):
2.6.16-1.2290_EL

How reproducible:
I've only seen it once so far, and that was 7 hours into a 12-hour reboot test.

Steps to Reproduce:
1. Install rhel5a1 onto an ia64-based system using a 4Gb Emulex card for storage
(I used a Superdome with an AD167A)
2. Set up the machine to reboot continuously for 24 hours
3. Come back the next day and see the panic
  
Actual results:
Kernel panic

Expected results:
12 hours of successful reboots

Additional info:
This exact same config was tested with rhel4u4b2 the day before without any issues.

Comment 1 Bryan Stillwell 2006-07-12 16:59:09 UTC

Created attachment 132315 [details]
Boot output with kernel panic

Comment 3 Linda Wang 2006-09-25 20:49:51 UTC

Can you give the latest Beta2 kernel a try?  2.6.18-1.2685.el5

Comment 4 Andrius Benokraitis 2006-11-03 22:23:12 UTC

Bryan, any news on trying this?

Comment 5 Bryan Stillwell 2006-11-10 18:26:43 UTC

Andrius,

This might be a problem specific to Superdome or other cell-based systems. I tried to
reproduce it on an rx6600 with the rhel5a1 code and it didn't have any issues.
I just reserved some time on a Superdome again and will attempt to reproduce
this there, and then I'll try the newer kernel after that.

Thanks,
Bryan

Comment 6 Andrius Benokraitis 2006-11-10 18:50:32 UTC

Bryan, thanks for the update... keep us posted!

Comment 7 Bryan Stillwell 2006-11-29 19:53:03 UTC

Created attachment 142417 [details]
Diff between a good and bad boot of rhel5b2

While it's not the same panic, it appears there's a race in the
initialization code for FC drives in rhel5b2. See the attached diff,
which shows the difference between a good and a bad boot. Note: none of
the partitions were LVM.

Bryan

Comment 8 Tom Coughlan 2006-11-29 20:06:53 UTC

James, Mike,

Have you seen this race on startup?

Tom

Comment 9 Tom Coughlan 2006-11-30 16:11:45 UTC

James Smart from Emulex says that the panic reported in comment #1 is fixed in
RHEL 5 Beta 2.

The problem reported in Comment #7 is new, and has not been seen previously. 

I'm changing the summary to reflect the new problem. 

Bill, Jeremy, Peter, it looks like we are getting "Creating root device" before
the device is configured by the kernel. Any thoughts?

Comment 10 Jeremy Katz 2006-11-30 16:58:09 UTC

(In reply to comment #9)
> Bill, Jeremy, Peter, it looks like we are getting "Creating root device" before
> the device is configured by the kernel. Any thoughts?

I expect it's related to bug 213039 (i.e., there's no way for userspace to know
when the kernel is actually done scanning).

Comment 11 RHEL Program Management 2006-11-30 17:01:37 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 12 Tom Coughlan 2006-12-01 03:04:28 UTC

James Smart does not have the privileges to write to this BZ. He also cannot read
213039. I'll try to fix that. In the meantime, he replies:

   I mentioned to Tom about an effort at HP to async scan SCSI adapters.

 http://marc.theaimsgroup.com/?l=linux-scsi&m=116343160819034&w=2

   It's intended to parallelize some of the delays, but a side effect
   is that it must coordinate when scan is done before it attempts to
   mount root. It's still in the development stage in the scsi midlayer
   (is in scsi-misc-2.6) but sounds like a good overlap. Unfortunately,
   it may be a little late for RHEL5 inclusion. (and lpfc has yet to finish
   support for it)

(End quote)

Yes, it is too late for RHEL 5. We are going to have to make do with some
hard-coded delays in mkinitrd (or wherever).

Comment 14 Mike Christie 2006-12-08 17:54:19 UTC

(In reply to comment #12)
> James Smart does not have the privileges to write to this BZ. He also cannot read
> 213039. I'll try to fix that. In the meantime, he replies:
> 
>    I mentioned to Tom about an effort at HP to async scan SCSI adapters.
> 
>  http://marc.theaimsgroup.com/?l=linux-scsi&m=116343160819034&w=2
> 
>    It's intended to parallelize some of the delays, but a side effect
>    is that it must coordinate when scan is done before it attempts to
>    mount root. It's still in the development stage in the scsi midlayer
>    (is in scsi-misc-2.6) but sounds like a good overlap. Unfortunately,
>    it may be a little late for RHEL5 inclusion. (and lpfc has yet to finish
>    support for it)
> 
> (End quote)
> 
> Yes, it is too late for RHEL 5. We are going to have to make do with some
> hard-coded delays in mkinitrd (or wherever).

Tom, are you definitely going the userspace hard-coded delay route for RHEL5?

I was working on slimmed-down versions of Matthew's code for RHEL5. If upstream
added new callouts to the scsi_host_template, should I at least send patches to
rh-kernel to add them to RHEL5's host_template, or did you say host template
additions will not break KABI?

Comment 15 Tom Coughlan 2006-12-08 23:36:49 UTC

I'm open to better solutions. It is really late though. 

If you have something to post, then by all means go ahead. The list is the best
place to ask the question about kabi as well.

Comment 16 Mike Christie 2006-12-11 16:24:05 UTC

You are right. I am not going to be able to test every driver for a kernel change.

Comment 17 Tom Coughlan 2007-01-10 18:55:13 UTC

The userspace delay was added in 5.0 (bug 213039). This will hopefully avoid
the problem in most situations, while the upstream kernel solution solidifies. I
am setting this BZ to 5.1, so we can add the proper fix there. Mike, when the
kernel fix is available, please ask the anaconda team to consider undoing the
hack in bug 213039.
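The actual mkinitrd change from bug 213039 is not shown in this report. As a rough illustration of what a userspace wait can look like, here is a minimal Python sketch that polls for a device node with an upper bound instead of always sleeping for a fixed time. The function name `wait_for_device`, the timeout, and the poll interval are hypothetical and are not taken from mkinitrd.

```python
import os
import time


def wait_for_device(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Hypothetical helper, not the mkinitrd code. Returns True if the path
    appeared in time, False otherwise. Unlike a fixed sleep, it returns
    as soon as the path shows up, but still gives up after `timeout`.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_device("/dev/sda1", timeout=30.0)` would return as soon as that node exists. A node existing is still weaker than knowing the kernel has finished scanning, which is the underlying gap raised in comment #10.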

Comment 18 Bryan Stillwell 2007-03-01 00:38:47 UTC

I just did a 24-hour reboot test on a 2-cell Superdome partition with rhel5rc
and didn't see a kernel panic.  So the delay does appear to prevent the problem
from appearing.

Comment 23 Andrius Benokraitis 2007-06-01 18:48:34 UTC

This is being deferred to RHEL 5.2 due to resource/time constraints, and
priorities. The workaround still seems to work in the meantime.

Comment 29 Ronald Pacheco 2007-11-01 14:40:15 UTC

Bryan,

As we continue to review this, we are of the opinion that the userspace timer
sufficiently addresses the original issue, and we do not plan to pursue this any
further. Please confirm.

Comment 30 Doug Chapman 2007-11-01 15:24:01 UTC

(In reply to comment #29)
> Bryan,
> 
> As we continue to review this, we are of the opinion that the userspace timer
> sufficiently addresses the original issue and do not plan to puruse this any
> further.  Please confirm.

FYI, Bryan is not currently working with HP so I will answer this...

I have not seen exactly what code was used to address this problem.  I only know
that it was considered a workaround, not a real fix.  Can someone point me to
the patch for this?

Comment 31 Mike Christie 2007-11-01 17:15:00 UTC

(In reply to comment #30)
> (In reply to comment #29)
> > Bryan,
> > 
> > As we continue to review this, we are of the opinion that the userspace timer
> > sufficiently addresses the original issue and do not plan to puruse this any
> > further.  Please confirm.
> 
> FYI, Bryan is not currently working with HP so I will answer this...
> 
> I have not seen exactly what code was used to address this problem.  I only know
> that it was considered a workaround, not a real fix.  Can someone point me to

I am not sure which one is better, or which is the real fix.

The kernel fix for async scanning is basically just a wait in the kernel. For
qla2xxx async scanning, we wait for loop_reset_delay, and then we have a
possible race between the rport addition and the actual scanning:
qla2xxx_scan_finished reports when transport scanning is done or has timed out,
but not when SCSI device scanning is done, so it may return before the
scsi_devices are actually added. With the kernel fix, though, we could just do
synchronous scanning, but that is probably still a hack. Either way, we sit
around for loop_reset_delay seconds, and if we have devices at that point, we
have them.
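Upstream async scanning lets a driver supply `scan_start` and `scan_finished` callbacks in its `scsi_host_template`; the midlayer starts the scan and then keeps calling `scan_finished` with the elapsed time until it reports completion. The Python model below only mimics that shape to show the race described above. `FakeFCHost`, `wait_for_scan`, and all timings are hypothetical and are not kernel or qla2xxx code.

```python
import time


class FakeFCHost:
    """Hypothetical stand-in for an FC host driver (not kernel code)."""

    def __init__(self, loop_reset_delay, device_add_lag):
        self.loop_reset_delay = loop_reset_delay  # seconds before scan reports done
        self.device_add_lag = device_add_lag      # extra time until SCSI devices exist
        self.start = None

    def scan_start(self):
        # Kick off discovery and return immediately (asynchronous).
        self.start = time.monotonic()

    def scan_finished(self, elapsed):
        # Simplified from the behaviour described above: report "done" once
        # loop_reset_delay has passed, which says nothing about whether the
        # scsi_devices have actually been added yet.
        return elapsed >= self.loop_reset_delay

    def scsi_devices_added(self):
        elapsed = time.monotonic() - self.start
        return elapsed >= self.loop_reset_delay + self.device_add_lag


def wait_for_scan(host, poll=0.01):
    """Midlayer-style loop: start the scan, then poll scan_finished()."""
    host.scan_start()
    while not host.scan_finished(time.monotonic() - host.start):
        time.sleep(poll)
    # The scan reported finished; are the devices really there yet?
    return host.scsi_devices_added()
```

With `device_add_lag=0` the devices are present when the wait ends; with a large lag, `wait_for_scan` returns `False`, which is the window in which userspace could fail to find the root device.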

The userspace fix is just a wait in userspace. Because it applies to all
drivers, it also works for other FC drivers, but it delays drivers that may not
need it.

I think Peter Jones had an idea that is better than both the existing kernel fix
and the userspace fix we are using, but that seemed a long way off.

> the patch for this?
> 
I do not think we have a patch. I think some code was just merged in the
mkinitrd release that you tested for bz 213039.

Comment 32 Chip Coldwell 2007-11-07 18:11:28 UTC

Should bug 209160 be marked as a duplicate of this?

Chip

Comment 33 Mike Christie 2007-11-07 22:00:16 UTC

(In reply to comment #32)
> Should bug 209160 be marked as a duplicate of this?
> 

What is 209160 for? Was it for multipath bugs that were a result of async
scanning? I thought there were two bugs with multipath boot:

1. Async scanning causes a device's name (/dev/sX) and major/minor numbers to
change between boots. This was bad for the initial multipath boot code back in
5.0 beta, because the multipath boot code relied on the major/minor numbers staying
the same. I think Peter Jones or someone fixed that by having multipath assemble
devices for boot using the UUID, as is done with the non-boot multipath setup.

2. Previously, userspace assumed that when a module was done loading the devices
were added and ready to go, but async scanning causes the module loading to
return before devices are found. This causes multipath boot not to find devices.
I thought this was fixed with the wait fix in this bugzilla
https://bugzilla.redhat.com/show_bug.cgi?id=213039
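To illustrate point 1, here is a small Python sketch of looking a device up by a stable identifier rather than by its discovery-order name. The two "boots" are made-up data, the identifier strings stand in for whatever UUID multipath actually uses, and `resolve_by_uuid` is a hypothetical helper.

```python
def resolve_by_uuid(discovered, wanted_uuid):
    """Return the current device name whose identifier matches, or None.

    `discovered` maps kernel-assigned names, which may differ from boot
    to boot under async scanning, to a stable identifier.
    """
    for name, uuid in discovered.items():
        if uuid == wanted_uuid:
            return name
    return None


# Two hypothetical boots in which the discovery order differs.
boot1 = {"/dev/sda": "uuid-A", "/dev/sdb": "uuid-B"}
boot2 = {"/dev/sda": "uuid-B", "/dev/sdb": "uuid-A"}
```

`resolve_by_uuid(boot1, "uuid-A")` gives `/dev/sda`, while `resolve_by_uuid(boot2, "uuid-A")` gives `/dev/sdb`: the name changed between boots, but the identifier did not.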

Comment 34 Tom Coughlan 2007-12-07 19:14:09 UTC

To recap: RHEL 5.0 implements a delay in userspace to allow the drivers to
finish configuring devices. Testing of this with as many as 2K devices, plus
some time in the field, indicates that this workaround seems to be adequate.
This has remained true throughout the introduction of multipath boot in 5.1. 

Longer term, there is some work being done upstream to interlock the kernel and
userspace configuration actions, so we will not need to depend on a hard-coded
delay. 
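The recap does not say what form the upstream interlock will take. As a generic illustration only, the Python sketch below replaces a hard-coded delay with an explicit completion signal: one side sets an event when configuration is done, and the other waits on it with a bounded timeout. The function names and timings are hypothetical.

```python
import threading
import time


def configure_devices(done, work_seconds):
    """Hypothetical 'kernel side': finish configuring, then signal."""
    time.sleep(work_seconds)  # stand-in for device discovery work
    done.set()


def mount_root_when_ready(done, timeout):
    """Hypothetical 'userspace side': block until signalled, with a bound.

    Returns True if configuration completed, False on timeout.
    """
    return done.wait(timeout)


done = threading.Event()
worker = threading.Thread(target=configure_devices, args=(done, 0.05))
worker.start()
ready = mount_root_when_ready(done, timeout=5.0)
worker.join()
```

Unlike a fixed sleep, the waiting side proceeds as soon as the signal arrives and only pays the full timeout when something has gone wrong.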

I propose that we leave RHEL 5 as it is, and expect to inherit the improved
implementation in RHEL 6. If problems eventually arise on RHEL 5, we will
consider backporting the functionality then.
