Bug 157082 - Drives aren't identified by unique identifiers during boot
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mkinitrd
Version: 5.1
Hardware: All Linux
Priority: urgent
Severity: urgent
Assigned To: David Cantrell
Keywords: FutureFeature, OtherQA, Reopened
Duplicates: 173994 208332
Depends On: 198201
Blocks: 228988 182355 182356 200222 217104 220653 227613 228021 230627
Reported: 2005-05-06 13:21 EDT by Heather Conway
Modified: 2011-05-05 05:43 EDT

Fixed In Version: RHBA-2007-0656
Doc Type: Enhancement
Last Closed: 2007-11-07 12:58:08 EST
Attachments
Steps to achieve persistent binding (6.80 KB, text/plain)
2005-07-15 14:38 EDT, Hari Kannan
Description Heather Conway 2005-05-06 13:21:36 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; EMC IS 55; .NET CLR 1.0.3705; .NET CLR 1.1.4322)

Description of problem:
When booting from the SAN without persistent binding, GRUB will use /dev/sda as the boot device.  If there is a problem and /dev/sda shifts to another device, GRUB will no longer find the / root device and will fail to boot.
Persistent binding via udev should be used.  However, udev starts after the ramdisk loads.  
As a result, if there is a device shift and the device that was /dev/sda is no longer sda, then the root device won't be detected properly.
The testing performed has been with the Emulex HBAs and 8.x series driver.
Persistent binding seems to work well post boot. For instance, an automount using the persistent scsi_id in the /dev/disk/by-id/ directory can be used instead of the device name in the /dev/ tree. This functionality has been verified.
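A minimal sketch of the post-boot lookup described above, assuming the conventional /dev/disk/by-id layout in which each entry is a symlink to the current kernel device node. The helper name resolve_by_id and the example identifier are hypothetical and do not come from the report.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): print the device node that a
# persistent by-id symlink currently resolves to.
#   $1 = directory holding the persistent symlinks (assumed: /dev/disk/by-id)
#   $2 = name of the persistent entry in that directory
resolve_by_id() {
    readlink -f "$1/$2"
}

# Example (the identifier is a placeholder):
#   resolve_by_id /dev/disk/by-id scsi-EXAMPLE_ID
# The result may be /dev/sda on one boot and /dev/sdc on the next, while the
# by-id name itself stays the same.
```

This only helps once udev has populated the directory, which is why it does not by itself cover the boot-time case raised in this report.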

Version-Release number of selected component (if applicable):
kernel 2.6.5-5.EL*

How reproducible:
Always

Steps to Reproduce:
1.  Configure a RHEL 4.0 host to boot from a SAN device.
2.  Enable persistent binding via udev.
3.  Add a device to the host so that the boot device is no longer /dev/sda.
4.  Reboot the host.
  

Actual Results:  Since /dev/sda is now another device, the host fails to boot.

Expected Results:  Is there another mechanism available by which a host can persistently boot from a SAN disk?   

Additional info:
Comment 1 Harald Hoyer 2005-05-09 09:30:30 EDT
grub cannot use udev devices.
as for the kernel you should use filesystem labels ( e.g. root=LABEL=/ROOT )
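To illustrate the difference between the two styles of root= argument, here is a small sketch. The helper name root_spec_kind and the sample kernel command lines are assumptions made for illustration; only the root=LABEL= form itself comes from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: classify the root= argument of a kernel command line.
# Prints "label", "device", or "none".
root_spec_kind() {
    for arg in $1; do
        case "$arg" in
            root=LABEL=*) echo "label";  return 0 ;;
            root=/dev/*)  echo "device"; return 0 ;;
        esac
    done
    echo "none"
}

# Sample kernel arguments as they might appear in grub.conf (illustrative):
root_spec_kind "ro root=/dev/sda2 quiet"     # prints: device
root_spec_kind "ro root=LABEL=/ROOT quiet"   # prints: label

# A label is typically assigned with e2label, for example:
#   e2label /dev/sda2 /ROOT      (the device name here is a placeholder)
```

A "device" result means the entry depends on the kernel's device ordering; a "label" result does not.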
Comment 2 Hari Kannan 2005-07-15 14:38:30 EDT
Created attachment 116817 [details]
Steps to achieve persistent binding
Comment 3 Hari Kannan 2005-07-15 14:40:08 EDT
The solution to this requires the following steps:

1. Inclusion of statically linked udev/scsi_id executables in the initrd
2. Edits to the udev config file to include lines that will use the scsi_id to 
create entries for scsi devices. 
3. Inclusion of the udev config files in the RAM disk. With this present 
when "udevstart" is called during the RAM disk load time, udev names will be 
populated for the scsi devices in the devfs tree as specified in the config 
file.

If all these components are present, then the root device can be specified 
as "root=disk-360060160c5101100009fe284a90dd911" or something similar.

In our three way call with Emulex/EMC/RedHat, Tom Coughlan had indicated that 
statically linked udev and scsi_id will be included in the RAM disk by default.
Tom was to get back to us regarding the inclusion of the config files for udev 
in the etc directory during the creation of RAM disk. [A change may be needed 
in mkinitrd to accommodate this request.] 

If the config files are included then the edits to the udev config files can be 
made by the end user to achieve persistent/reliable booting. [That is, Step 2 
above need not be done by default; rather, it can be user specified.]

With the 2.6 kernel, HBA drivers can no longer provide persistence for devices. 
In the absence of such a mechanism, booting from SAN reliably is not feasible.

Please let us know what further information is needed from us to assist in this 
process.

Thanks and Regards,
Hari Kannan
System Integration Engineer
eLab - Linux
EMC



Comment 4 Hari Kannan 2005-07-19 19:35:05 EDT
Hi Tom, 
Any updates on this one?
Thanks,
Hari
Comment 11 Heather Conway 2005-08-18 15:52:37 EDT
Changing the severity to high and asking that this request be considered.  
Without a persistent binding mechanism in the OS, booting from the SAN will be 
unstable and potentially difficult for customers to implement properly.  EMC 
considers booting from the SAN an important functionality and we would like to 
be able to provide support for it with both PowerPath and native MPIO.  
Thanks for your consideration on this issue.
-H
Comment 12 Samuel Benjamin 2005-08-25 19:45:35 EDT
Per a request from Dell, I have discussed this with RH engineering and attempted
to bring out some of the questions and concerns that have been expressed in this
case. It would be beneficial for EMC to take into consideration some of the
limitations of the proposed solution and address the questions that RH
engineering has raised.

Status for IT 72398 / BZ 157082 : Comments from Engineering 

U2 will include statically linked udev and scsi_id.  Now we can get persistent
device names at boot time by doing the following:

1. opening up the initrd,
2. adding a /etc/udev/rules.d file,
3. modifying the /etc/scsi_id.config file, and then
4. rebuilding the initrd.
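
The four steps above can be sketched as follows, assuming the initrd is a gzip-compressed cpio archive. The helper name initrd_rebuild_plan and every path are placeholders, not values from this report. The function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for the four steps.
#   $1 = existing initrd image           (use absolute paths throughout)
#   $2 = custom udev rules file
#   $3 = modified scsi_id.config
#   $4 = scratch directory (optional)
initrd_rebuild_plan() {
    img=$1
    rules=$2
    conf=$3
    work=${4:-/tmp/initrd-work}
    cat <<EOF
mkdir -p "$work" && cd "$work"
zcat "$img" | cpio -idm
mkdir -p etc/udev/rules.d
cp "$rules" etc/udev/rules.d/
cp "$conf" etc/scsi_id.config
find . | cpio -o -H newc | gzip -9 > "$img.new"
EOF
}

# Example with placeholder paths:
initrd_rebuild_plan /boot/initrd-example.img /root/example.rules /root/scsi_id.config
```

The rebuilt image is written next to the original with a .new suffix, so the known-good initrd stays available as a fallback boot entry.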

What we would really like to do now is avoid the requirement to unpack and then
rebuild the initrd to accomplish this. Is there a way for the customer to make
his udev rule and modified scsi_id.config available at mkinitrd time, so it gets
put into the initrd more conveniently?
---------------------------------
No, there's no way to specify something like this right now.  And doing so in a
way that is guaranteed to work is impossible. 

I'm not sure how you expect grub to find the right device if you're moving your
devices around.  The user is changing their hardware, adding in another device
that claims to be the boot device, and the OS is believing it. We need to
understand why they are doing so.

There's really nothing going on during booting that uses "/dev/sda".
Use the "root=LABEL=/" (the default), and new disks won't be a problem.

Does the BIOS bootable disk list change?
Is the original disk still at the same position in it?

--------------------------------------
Comment 13 Harald Hoyer 2005-08-26 03:06:56 EDT
One could also LABEL the root partitions with specific names to avoid name clashes.
Comment 14 Hari Kannan 2005-08-31 10:51:20 EDT
RH: "And doing so in a way that is guaranteed to work is impossible. "
We have successfully been able to boot from SAN by making this modification. Have
there been any tests at RedHat that reflect the above statement?

RH: "I'm not sure how you expect grub to find the right device if you're moving your
devices around"
The Emulex/QLogic BIOS determines which device is available for boot. Even if
more devices are added to the system, the HBA BIOS will only make available
those devices that have been configured in the BIOS. Thus the MBR on the
configured device is the only one that's available and hence the system detects
it and GRUB loads. The issue comes up in grub.conf. 
The device.map has the entry that positions the location of initrd and vmlinuz. 
Thus the RAM disk gets properly loaded.

As part of the RAM disk, the driver for the HBA gets loaded. At this point,
devices are detected and the device nodes are created in the devfs tree.
Multiple issues can arise here:
1. The user could have added more devices to the system.
2. If more than one array has been zoned or if the order in which the targets
are determined changes, then the device nodes that are created are not consistent
across reboots. Please note that there are no mechanisms currently within the
2.6 kernel, or the drivers thereof,  to determine and ensure that the order in
which the targets are detected is consistent.
Hence the proposed solution to use persistence via udev. The proposed solution
will use the scsi_id of the devices to determine the boot device. Since the
scsi_id is a unique signature it ensures that the system will boot up.

RH: "The user is changing their hardware, adding in another device that claims
to be the boot device, and the OS is believing it. We need to understand why
they are doing so?" 
It's not that other devices are claiming to be the boot device. It is that
whatever has been specified in GRUB to be the boot device is not necessarily
accurate every time the system boots. That is the crux of the issue and what
needs to be fixed.

RH: "There's really nothing going on during booting that uses "/dev/sda" "
The driver loads in the RAM disk and that changes everything.

RH: "Use the "root=LABEL=/" (the default), and new disks won't be a problem."
Mounting by specifying LABEL names does not work. There are numerous bugs that
have been opened on this issue already - including 116300. The crux of the issue
here is that in any multipathing solution there will be more than one path to
the same device. However, mount currently fails when more than one device
presents the same label name which is the case in any multipathing solution.
[Also in the default installation, the labels that are created are always the
same. It might be useful to use a random number generator in the future to
determine label names] 
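
The label clash described above can be illustrated with a small sketch. It assumes a plain-text list of "device label" pairs as input (how that list is gathered is left out, and labels containing spaces are not handled); the helper name duplicate_labels is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "device label" pairs on stdin and print every
# label that is presented by more than one device.
duplicate_labels() {
    awk '{ count[$2]++ } END { for (l in count) if (count[l] > 1) print l }' | sort
}

# Two paths (sda1, sdc1) to the same LUN expose the same root label,
# so a mount by label cannot tell them apart.
printf '%s\n' "/dev/sda1 /" "/dev/sdb1 /data" "/dev/sdc1 /" | duplicate_labels
# prints: /
```

Any label in the output is ambiguous at mount time, whether the cause is multipathing or two installations that used the same default label.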

RH: "Does the BIOS bootable disk list change? Is the original disk still at the
same position in it?"
Already answered. The disk that has been configured to boot at the BIOS level
will not change.

A reliable boot mechanism is crucial in a SAN environment. It is
especially needed in blade server deployments where the customers rely on the
SAN for booting.

Please write back with any further questions.

Regards,
Hari Kannan
eLAB - EMC Corporation
Comment 16 Tom Coughlan 2005-09-09 09:54:37 EDT
> RH: "And doing so in a way that is guaranteed to work is impossible. "
> We have successfully been able to boot from SAN by making this modification. Have
> there been any tests at RedHat that reflect the above statement?

I have been successful in simple configurations, like booting off a
single disk. The potential problems arise with composite devices, like
multipath, LVM and RAID. This is why we prefer to use filesystem or LVM
VolGroup labels in grub.conf.

> RH: "I'm not sure how you expect grub to find the right device if you're moving your
> devices around"
> The Emulex/QLogic BIOS determines which device is available for boot. Even if
> more devices are added to the system, the HBA BIOS will only make available
> those devices that have been configured in the BIOS. Thus the MBR on the
> configured device is the only one that's available and hence the system detects
> it and GRUB loads. The issue comes up in grub.conf.
> The device.map has the entry that positions the location of initrd and vmlinuz.
> Thus the RAM disk gets properly loaded.
>
> As part of the RAM disk, the driver for the HBA gets loaded. At this point,
> devices are detected and the device nodes are created in the devfs tree.
> Multiple issues can arise here:
> 1. The user could have added more devices to the system.
> 2. If more than one array has been zoned or if the order in which the targets
> are determined changes, then the device nodes that are created are not consistent
> across reboots. Please note that there are no mechanisms currently within the
> 2.6 kernel, or the drivers thereof,  to determine and ensure that the order in
> which the targets are detected is consistent.
> Hence the proposed solution to use persistence via udev. The proposed solution
> will use the scsi_id of the devices to determine the boot device. Since the
> scsi_id is a unique signature it ensures that the system will boot up.

If the root= parameter in grub.conf identifies a unique filesystem label
or VolGroup name, then the system should continue to boot even if the
physical device names change.

> RH: "The user is changing their hardware, adding in another device that claims
> to be the boot device, and the OS is believing it. We need to understand why
> they are doing so?"
> It's not that other devices are claiming to be the boot device. It is that
> whatever has been specified in GRUB to be the boot device is not necessarily
> accurate every time the system boots. That is the crux of the issue and what
> needs to be fixed.
>
> RH: "There's really nothing going on during booting that uses "/dev/sda" "
> The driver loads in the RAM disk and that changes everything.
>
> RH: "Use the "root=LABEL=/" (the default), and new disks won't be a problem."
> Mounting by specifying LABEL names does not work. There are numerous bugs that
> have been opened on this issue already - including 116300.

I believe that is fixed in the next Update. If there are other issues,
we should work to get them fixed.

> The crux of the issue
> here is that in any multipathing solution there will be more than one path to
> the same device. However, mount currently fails when more than one device
> presents the same label name which is the case in any multipathing solution.
> [Also in the default installation, the labels that are created are always the
> same. It might be useful to use a random number generator in the future to
> determine label names]

From what I have seen, the current code creates a label that is unique
among all the currently visible labels (by incrementing "n" in /n or
VolGroupnn as needed). If new storage is added to the SAN later, then
the system manager must ensure that these labels remain unique. We are
also considering adding the ability to use root=UUID=, so the system
manager does not need to manage the labels.

> RH: "Does the BIOS bootable disk list change? Is the original disk still at the
> same position in it?"
> Already answered. The disk that has been configured to boot at the BIOS level
> will not change.
>
> A reliable boot mechanism is crucial in a SAN environment. This issue is
> especially needed in Blade server deployments where the customers rely on the
> SAN for booting.

Our goal is to make the boot by label method work reliably in a SAN
environment. If there are problems in addition to BZ 116300, we would
like to resolve them.

If users insist on using a persistent device name in grub.conf, then, as
of U2, this can be done relatively easily after the system is installed
(open the initrd, add a /etc/udev/rules.d file, modify
/etc/scsi_id.config, then rebuild initrd). We do not intend to modify
the initrd to include this by default, because we are not sure this is
the correct direction to head in for the future.

The next step in this process is for you to identify specific problems
with SAN booting in the current RHEL 4 approach, so we can address them.

Comment 30 Harald Hoyer 2005-12-07 11:10:24 EST
ok, here we go...
you need:
ftp://people.redhat.com/harald/udev/udev-039-10.11.EL4/
which should be in the next update release...
and
ftp://people.redhat.com/harald/udev/scsi-id-rootkit/
Follow the instructions in README and please tell me, if that works.
Hope this helps.
Comment 31 Tom Coughlan 2005-12-09 16:01:11 EST
*** Bug 173994 has been marked as a duplicate of this bug. ***
Comment 38 Hari Kannan 2006-01-13 10:50:11 EST
Current proposal to achieve persistence is to use LVM and volume labels.
We are evaluating scsi-id-rootkit in RHEL 4 U3 for persistence using 
Will post any issues to this bz.

Incidentally has any testing been done with MPIO and boot from SAN with LVM?
eLab Linux
Comment 43 Larry Troan 2006-03-02 13:44:09 EST
Bug 183672 may be a DUP of this bug or at least related to it.
Comment 44 Issue Tracker 2006-03-15 16:17:09 EST
Following the direction in /usr/share/doc/scsi-id-rootkit-01/README and
running "udevstart" causes the message below:
  "-s option must be specified"
Also, I don't see any device nodes created in /dev/disk/*



This event sent from IssueTracker by sbenjamin 
 issue 79515
Comment 49 Bob Johnson 2006-04-11 13:02:04 EDT
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.
Comment 51 Peter Jones 2006-04-12 18:19:52 EDT
This is _not_ on the planned work items for RHEL 4.4; it was put on the CanFix
list in error.
Comment 54 Andrius Benokraitis 2006-04-21 10:16:25 EDT
This issue has been discussed within Red Hat Engineering, and the final verdict
is that this item will not be included in RHEL4 (ever). Moving to RHEL5.
Comment 73 RHEL Product and Program Management 2006-09-27 13:15:24 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request. 
Comment 75 Peter Jones 2006-09-27 17:52:41 EDT
*** Bug 208332 has been marked as a duplicate of this bug. ***
Comment 78 Amit Bhutani 2006-10-30 14:50:33 EST
Why was Hari Kannan's email address removed from the CC: field today?
Is EMC not interested in pursuing a fix for this issue anymore?
Comment 79 Andrius Benokraitis 2006-10-30 14:55:30 EST
Amit - Hari is no longer working for eLabs at EMC; his replacement is Wayne,
who is included in the CC list.
Comment 95 John Poelstra 2007-08-14 20:20:06 EDT
A fix for this issue has been included in the packages contained in the beta
(RHN channel) or most recent snapshot (partners.redhat.com) for RHEL5.1.  Please
verify that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.
Comment 96 John Poelstra 2007-08-24 01:26:47 EDT
A fix for this issue should have been included in the packages contained in the
most recent snapshot (partners.redhat.com) for RHEL5.1.  

Requested action: Please verify that your issue is fixed as soon as possible to
ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

More assistance: If you cannot access bugzilla, please reply with a message to
Issue Tracker and I will change the status for you.  If you need assistance
accessing ftp://partners.redhat.com, please contact your Partner Manager.
Comment 97 Stephanie Glass 2007-08-28 14:31:29 EDT
I have been asked if Red Hat has any setup procedure to install SAN BOOT on 
RHEL 5.1
Comment 98 John Poelstra 2007-08-30 20:29:16 EDT
A fix for this issue should have been included in the packages contained in the
RHEL5.1-Snapshot4 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed *as soon as possible*
to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.
Comment 99 John Poelstra 2007-09-11 15:22:56 EDT
A fix for this issue should have been included in the packages contained in the
RHEL5.1-Snapshot6 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed ASAP to confirm that it
will be included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.
Comment 102 errata-xmlrpc 2007-11-07 12:58:08 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0656.html
