Bug 157082
| Summary: | Drives aren't identified by unique identifiers during boot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Heather Conway <conway_heather> |
| Component: | mkinitrd | Assignee: | Dave Cantrell <dcantrell> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 5.1 | CC: | andriusb, berthiaume_wayne, bnocera, casmith, coldwell, coughlan, djuran, gbarros, harald, hgarcia, i-kitayama, james.smart, jfeeney, jlaska, jnomura, jrevilla, junichi.nomura, kanderso, katzj, kueda, kueda, kzak, laroche, nayfield, ogren_chris, perez-kolk_santiago, poelstra, rkenna, sbenjamin, sglass, wwlinuxengineering |
| Target Milestone: | --- | Keywords: | FutureFeature, OtherQA, Reopened |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHBA-2007-0656 | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-11-07 17:58:08 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 198201 | | |
| Bug Blocks: | 182355, 182356, 200222, 217104, 220653, 227613, 228021, 228988, 230627 | | |
| Attachments: | | | |

Description
Heather Conway
2005-05-06 17:21:36 UTC
grub cannot use udev devices. As for the kernel, you should use filesystem labels (e.g. root=LABEL=/ROOT).

Created attachment 116817
Steps to achieve persistent binding

The solution to this requires the following steps:
1. Inclusion of statically linked udev/scsi_id executables in the initrd.
2. Edits to the udev config file to include lines that will use the scsi_id to create entries for SCSI devices.
3. Inclusion of the udev config files in the RAM disk.

With this present, when "udevstart" is called during the RAM disk load time, udev names will be populated for the SCSI devices in the devfs tree as specified in the config file. If all these components are present, then the root device can be specified as "root=disk-360060160c5101100009fe284a90dd911" or something similar.

In our three-way call with Emulex/EMC/RedHat, Tom Coughlan had indicated that statically linked udev and scsi_id will be included in the RAM disk by default. Tom was to get back to us regarding the inclusion of the config files for udev in the etc directory during the creation of the RAM disk. [A change may be needed in mkinitrd to accommodate this request.] If the config files are included, then the edits to the udev config files can be made by the end user to achieve persistent/reliable booting. [That is, Step 2 above need not be done by default; rather, it can be user specified.]

With the 2.6 kernel, HBA drivers can no longer provide persistence for devices. In the absence of such a mechanism, booting from SAN reliably is not feasible. Please let us know what further information is needed from us to assist in this process.

Thanks and Regards,
Hari Kannan
System Integration Engineer
eLab - Linux
EMC

Hi Tom,
Any updates on this one?
Thanks,
Hari

Changing the severity to high and requesting that the request be considered. Without a persistent binding mechanism in the OS, booting from the SAN will be unstable and potentially difficult for customers to implement properly. EMC considers booting from the SAN an important functionality, and we would like to be able to provide support for it with both PowerPath and native MPIO. Thanks for your consideration on this issue.
-H

Per a request from Dell, I have discussed this with RH engineering and attempted to bring out some of the questions and concerns that have been expressed in this case. It would be beneficial for EMC to take into consideration some of the limitations of the proposed solution and address the questions that RH engineering has raised.

Status for IT 72398 / BZ 157082: Comments from Engineering

U2 will include statically linked udev and scsi_id. Now we can get persistent device names at boot time by doing the following:
1. opening up the initrd,
2. adding a /etc/udev/rules.d file,
3. modifying the /etc/scsi_id.config file, and then
4. rebuilding the initrd.

What we would really like to do now is avoid the requirement to unpack and then rebuild the initrd to accomplish this. Is there a way for the customer to make his udev rule and modified scsi_id.config available at mkinitrd time, so it gets put into the initrd more conveniently?

---------------------------------

No, there's no way to specify something like this right now. And doing so in a way that is guaranteed to work is impossible.

I'm not sure how you expect grub to find the right device if you're moving your devices around. The user is changing their hardware, adding in another device that claims to be the boot device, and the OS is believing it. We need to understand why they are doing so?

There's really nothing going on during booting that uses "/dev/sda". Use the "root=LABEL=/" (the default), and new disks won't be a problem.

Does the BIOS bootable disk list change? Is the original disk still at the same position in it?

--------------------------------------

One could also LABEL the root partitions with specific names to avoid name clashes.

RH: "And doing so in a way that is guaranteed to work is impossible. "
We have successfully been able to boot from SAN by making this modification. Have there been any tests at RedHat that reflect the above statement?

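For readers unfamiliar with the modification referred to here (a udev rule and scsi_id configuration placed in the initrd), the following is a minimal sketch in Python of what such a rule line and the matching kernel argument might look like. It is not taken from the attachment to this bug. The rule keys (`BUS`, `PROGRAM`, `RESULT`, `NAME`), the `%k` and `%c` substitutions, and the `scsi_id -g -s` invocation are assumptions about the udev and scsi_id versions of that era, and `make_rule` and `root_argument` are hypothetical helper names. The WWID is the example value quoted earlier in this report.

```python
def make_rule(wwid: str, prefix: str = "disk-") -> str:
    """Build one udev rule line for the SCSI disk whose scsi_id output is `wwid`.

    Assumed (not confirmed by this bug report): old-style udev keys using a
    single '=', %k for the kernel device name, %c for the PROGRAM output.
    """
    if not wwid or any(ch.isspace() or ch == '"' for ch in wwid):
        raise ValueError("WWID must be a non-empty token without spaces or quotes")
    keys = [
        'BUS="scsi"',
        'PROGRAM="/sbin/scsi_id -g -s /block/%k"',  # assumed invocation
        f'RESULT="{wwid}"',
        f'NAME="{prefix}%c"',
    ]
    return ", ".join(keys)


def root_argument(wwid: str, prefix: str = "disk-") -> str:
    """Build the root= argument in the form quoted in this report."""
    return f"root={prefix}{wwid}"


if __name__ == "__main__":
    boot_wwid = "360060160c5101100009fe284a90dd911"
    print(make_rule(boot_wwid))
    print(root_argument(boot_wwid))
```

The sketch only formats strings; placing the rule in the initrd and rebuilding it is the manual procedure described in the Engineering comment.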
RH: "I'm not sure how you expect grub to find the right device if you're moving your devices around"
The Emulex/QLogic BIOS determines which device is available for boot. Even if more devices are added to the system, the HBA BIOS will only make available those devices that have been configured in the BIOS. Thus the MBR on the configured device is the only one that's available, and hence the system detects it and GRUB loads. In grub.conf is where the issue comes up.
The device.map has the entry that positions the location of initrd and vmlinuz. Thus the RAM disk gets properly loaded.

As part of the RAM disk, the driver for the HBA gets loaded. At this point, devices are detected and the device nodes are created in the devfs tree. Multiple issues can arise here:
1. The user could have added more devices to the system.
2. If more than one array has been zoned, or if the order in which the targets are determined changes, then the device nodes that are created are not consistent across reboots. Please note that there are no mechanisms currently within the 2.6 kernel, or the drivers thereof, to determine and ensure that the order in which the targets are detected is unique.

Hence the proposed solution to use persistence via udev. The proposed solution will use the scsi_id of the devices to determine the boot device. Since the scsi_id is a unique signature, it ensures that the system will boot up.

RH: "The user is changing their hardware, adding in another device that claims to be the boot device, and the OS is believing it. We need to understand why they are doing so?"
It's not that other devices are claiming to be the boot device. It is that whatever has been specified in GRUB to be the boot device is not necessarily accurate every time the system boots. That is the crux of the issue and what needs to be fixed.

RH: "There's really nothing going on during booting that uses "/dev/sda" "
The driver loads in the RAM disk and that changes everything.

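The ordering problem described in this reply can be illustrated with a small simulation. This is a sketch in Python under simplifying assumptions, not a model of the actual kernel or driver: kernel names are handed out strictly in discovery order, the second WWID is a made-up placeholder, and `assign_sd_names` and `find_by_wwid` are hypothetical helpers.

```python
def assign_sd_names(discovery_order):
    """Map kernel-style names (sda, sdb, ...) to WWIDs in discovery order."""
    if len(discovery_order) > 26:
        raise ValueError("this sketch only handles sda..sdz")
    names = {}
    for index, wwid in enumerate(discovery_order):
        names["sd" + chr(ord("a") + index)] = wwid
    return names


def find_by_wwid(names, wwid):
    """Return the kernel name that currently carries the given WWID."""
    for kernel_name, found in names.items():
        if found == wwid:
            return kernel_name
    raise LookupError("no device with WWID " + wwid)


BOOT_LUN = "360060160c5101100009fe284a90dd911"   # example value from this report
OTHER_LUN = "hypothetical-wwid-of-second-lun"    # placeholder, not a real WWID

first_boot = assign_sd_names([BOOT_LUN, OTHER_LUN])
second_boot = assign_sd_names([OTHER_LUN, BOOT_LUN])  # targets found in another order

if __name__ == "__main__":
    print(find_by_wwid(first_boot, BOOT_LUN))   # sda
    print(find_by_wwid(second_boot, BOOT_LUN))  # sdb
```

A root device specified by kernel name would point at a different LUN on the second boot, while a lookup keyed on the WWID still finds the intended one.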
RH: "Use the "root=LABEL=/" (the default), and new disks won't be a problem."
Mounting by specifying LABEL names does not work. There are numerous bugs that have been opened on this issue already - including 116300. The crux of the issue here is that in any multipathing solution there will be more than one path to the same device. However, mount currently fails when more than one device presents the same label name, which is the case in any multipathing solution. [Also, in the default installation, the labels that are created are always the same. It might be useful to use a random number generator in the future to determine label names.]

RH: "Does the BIOS bootable disk list change? Is the original disk still at the same position in it?"
Already answered. The disk that has been configured to boot at the BIOS level will not change.

A reliable boot mechanism is crucial in a SAN environment. This issue is especially needed in Blade server deployments where the customers rely on the SAN for booting.

Please write back with any further questions.

Regards,
Hari Kannan
eLAB - EMC Corporation

> RH: "And doing so in a way that is guaranteed to work is impossible. "
> We have successfully been able to boot from SAN by making this modification. Have
> there been any tests at RedHat that reflect the above statement?

I have been successful in simple configurations, like booting off a single disk. The potential problems arise with composite devices, like multipath, LVM and RAID. This is why we prefer to use filesystem or LVM VolGroup labels in grub.conf.

> RH: "I'm not sure how you expect grub to find the right device if you're moving your
> devices around"
> The Emulex/QLogic BIOS determines which device is available for boot. Even if
> more devices are added to the system, the HBA BIOS will only make available
> those devices that have been configured in the BIOS.
> Thus the MBR on the configured device is the only one that's available and
> hence the system detects it and GRUB loads. In grub.conf is where the issue comes up.
> The device.map has the entry that positions the location of initrd and vmlinuz.
> Thus the RAM disk gets properly loaded.
>
> As part of the RAM disk, the driver for the HBA gets loaded. At this point,
> devices are detected and the device nodes are created in the devfs tree.
> Multiple issues can arise here:
> 1. The user could have added more devices to the system.
> 2. If more than one array has been zoned or if the order in which the targets
> are determined changes then the device nodes that are created are not consistent
> across reboots. Please note that there are no mechanisms currently within the
> 2.6 kernel, or the drivers thereof, to determine and ensure that the order in
> which the targets are detected is unique.
> Hence the proposed solution to use persistence via udev. The proposed solution
> will use the scsi_id of the devices to determine the boot device. Since the
> scsi_id is a unique signature it ensures that the system will boot up.

If the root= parameter in grub.conf identifies a unique filesystem label or VolGroup name, then the system should continue to boot even if the physical device names change.

> RH: "The user is changing their hardware, adding in another device that claims
> to be the boot device, and the OS is believing it. We need to understand why
> they are doing so?"
> It's not that other devices are claiming to be the boot device. It is that
> whatever has been specified in GRUB to be the boot device is not necessarily
> accurate every time the system boots. That is the crux of the issue and what
> needs to be fixed.
>
> RH: "There's really nothing going on during booting that uses "/dev/sda" "
> The driver loads in the RAM disk and that changes everything.
>
> RH: "Use the "root=LABEL=/" (the default), and new disks won't be a problem."
> Mounting by specifying LABEL names does not work. There are numerous bugs that
> have been opened on this issue already - including 116300.

I believe that is fixed in the next Update. If there are other issues, we should work to get them fixed.

> The crux of the issue
> here is that in any multipathing solution there will be more than one path to
> the same device. However, mount currently fails when more than one device
> presents the same label name which is the case in any multipathing solution.
> [Also in the default installation, the labels that are created are always the
> same. It might be useful to use a random number generator in the future to
> determine label names]

From what I have seen, the current code creates a label that is unique among all the currently visible labels (by incrementing "n" in /n or VolGroupnn as needed). If new storage is added to the SAN later, then the system manager must ensure that these labels remain unique. We are also considering adding the ability to use root=UUID=, so the system manager does not need to manage the labels.

> RH: "Does the BIOS bootable disk list change? Is the original disk still at the
> same position in it?"
> Already answered. The disk that has been configured to boot at the BIOS level
> will not change.
>
> A reliable boot mechanism is crucial in a SAN environment. This issue is
> especially needed in Blade server deployments where the customers rely on the
> SAN for booting.

Our goal is to make the boot by label method work reliably in a SAN environment. If there are problems in addition to BZ 116300, we would like to resolve them.

If users insist on using a persistent device name in grub.conf, then, as of U2, this can be done relatively easily after the system is installed (open the initrd, add a /etc/udev/rules.d file, modify /etc/scsi_id.config, then rebuild initrd).
We do not intend to modify the initrd to include this by default, because we are not sure this is the correct direction to head in for the future.

The next step in this process is for you to identify specific problems with SAN booting in the current RHEL 4 approach, so we can address them.

ok, here we go...
you need:
ftp://people.redhat.com/harald/udev/udev-039-10.11.EL4/
which should be in the next update release...
and
ftp://people.redhat.com/harald/udev/scsi-id-rootkit/
Follow the instructions in README and please tell me if that works.

Hope this helps.

*** Bug 173994 has been marked as a duplicate of this bug. ***

Current proposal to achieve persistence is to use LVM and volume labels. We are evaluating scsi-id-rootkit in RHEL 4 U3 for persistence using
Will post any issues to this bz. Incidentally, has any testing been done with MPIO and boot from SAN with LVM?
eLab Linux

Bug 183672 may be a DUP of this bug or at least related to it.

Following the direction in /usr/share/doc/scsi-id-rootkit-01/README and running "udevstart" causes the message below:
"-s option must be specified"
Also, I don't see any device nodes created in /dev/disk/*
This event sent from IssueTracker by sbenjamin
issue 79515

This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 4.4 release. Engineering resources have been assigned and, barring unforeseen circumstances, Red Hat intends to include this item in the 4.4 release.

This is _not_ on the planned work items for RHEL 4.4; it was put on the CanFix list in error.

This issue has been discussed within Red Hat Engineering, and the final verdict is that this item will not be included in RHEL4 (ever). Moving to RHEL5.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

*** Bug 208332 has been marked as a duplicate of this bug. ***

Why was Hari Kannan's email address removed from the CC: field today?
Is EMC not interested in pursuing a fix for this issue anymore?

Amit - Hari is no longer working for eLabs at EMC; his replacement is Wayne, included in the CC list.

A fix for this issue has been included in the packages contained in the beta (RHN channel) or most recent snapshot (partners.redhat.com) for RHEL5.1. Please verify that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you.

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED.

A fix for this issue should have been included in the packages contained in the most recent snapshot (partners.redhat.com) for RHEL5.1.

Requested action: Please verify that your issue is fixed as soon as possible to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

More assistance: If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.

I have been asked if Red Hat has any setup procedure to install SAN BOOT on RHEL 5.1.

A fix for this issue should have been included in the packages contained in the RHEL5.1-Snapshot4 on partners.redhat.com.

Requested action: Please verify that your issue is fixed *as soon as possible* to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.

A fix for this issue should have been included in the packages contained in the RHEL5.1-Snapshot6 on partners.redhat.com.

Requested action: Please verify that your issue is fixed ASAP to confirm that it will be included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0656.html