Bug 209160
Summary: | [RHEL5 Beta2] kernel(qla2xxx): parallel scanning of SCSI devices causes name changes | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Kiyoshi Ueda <kueda> |
Component: | kernel | Assignee: | Chip Coldwell <coldwell> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 5.0 | CC: | andrew.vasquez, andriusb, coughlan, i-kitayama, jnomura, junichi.nomura, kueda, kueda, mchristi, tao, tatsu-ab1 |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-07-26 13:22:07 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Bug Depends On: | |||
Bug Blocks: | 216989, 227613, 228988, 230627, 243319 |
Description
Kiyoshi Ueda
2006-10-03 16:51:13 UTC
I cloned this bug as a bug against the QLogic FC card driver.

In the current RHEL5, FC disks that are connected to multiple QLogic FC HBAs are scanned in parallel. On the other hand, there is no persistent device naming scheme, so it is hard to identify a given device in an environment with multiple QLogic FC HBAs and multiple FC disks. Currently, this causes a system panic at boot time with multipath root support. (See the original bug report.)

If the driver scanned FC disks serially, as the RHEL4 driver does, the multipath root support in the current mkinitrd should work as long as the physical configuration doesn't change.

Example:

Environment: 2 HBAs (host0, host1), 4 LUNs of multipath storage (lun0, lun1, lun2, lun3)

Scan order:

current RHEL4 (2.6.9-42.EL) | current RHEL5 (2.6.18-1.2702.el5) |
---|---|
host0-lun0 | host0-lun0 |
host0-lun1 | host1-lun0 |
host0-lun2 | host0-lun1 |
host0-lun3 | host0-lun2 |
host1-lun0 | host1-lun1 |
host1-lun1 | host0-lun3 |
host1-lun2 | host1-lun2 |
host1-lun3 | host1-lun3 |
(Always same order) | (Varies in each boot) |

Andrew,

Reliable persistent device naming is planned for RHEL 5.1. In the meantime, in RHEL 5.0, it may be helpful to consider a way to reduce the impact by disabling the parallel scan of SCSI hosts. An option to revert to sequential scanning would help avoid the most common cause of name changes. It is a partial solution, to be sure, but I wonder if it would be feasible?

Tom

Given the new FC transport infrastructure, the driver has no role in the LUN-scan detection process. Instead, the driver simply makes an upcall to the transport indicating that a new FC port has been discovered. If that port has a 'target' role, then a midlayer 'scan-work' event is placed on the shost's work-queue.
Given the threaded/scheduled semantics of work-queue handling, there's no guarantee when a work-event will be processed. As the customer's test cases show, he's seeing parallel work-queue handling.

Is there any workaround for this?

I don't know of any workaround. We are going to have to instruct customers in how to use persistent names (LVM, labels, udev), rather than "sd" (or major, minor numbers). This is not going to be easy, or perfect, but it is the direction they have to move in anyway.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.

(In reply to comment #5)
> I don't know of any workaround. We are going to have to instruct customers in how
> to use persistent names (LVM, labels, udev), rather than "sd" (or major, minor
> numbers). This is not going to be easy, or perfect, but it is the direction they
> have to move in anyway.

Does this mean a release note, then? Or perhaps a "persistent naming" whitepaper?

Chip

This bug could be related to bug 213039.

Bug already in GSS list. Removing from feature list.

Quality Engineering Management has reviewed and declined this request. You may appeal this decision by reopening this request.

The pressing need for this has been removed in 5.1 by the improved support for dm-multipath in Anaconda and the initrd. The need for persistent "sd" device names is mostly gone. What remains is to make customers aware of the fact that they cannot depend on persistent "sd" device names, and have them remove this dependency from their applications and procedures.

How about a knowledge base article on this, Chip?

What is 209160 for? Was it for multipath bugs that were a result of async scanning? I thought there were two bugs with multipath boot:

1. Async scanning causes a device's names (/dev/sX) and major/minor numbers to change between boots. This was bad for the initial multipath boot code back in 5.0 beta, because the multipath boot code was relying on major/minor numbers to be the same. I think Peter Jones or someone fixed that by having multipath assemble devices for boot using UUID, as is done with the non-boot multipath setup.

2. Previously, userspace assumed that when a module was done loading, the devices were added and ready to go, but async scanning causes the module loading to return before devices are found. This causes multipath boot not to find devices. I thought this was fixed with the wait fix in this bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=213039

Oops, the previous comment is Mike Christie's, copy-pasted from bug 198666. Should have cited him there.

Chip
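
The two points above (look devices up by a persistent identifier instead of an "sd" name, and don't assume devices exist as soon as the module has loaded) can be illustrated with a minimal Python sketch. It is not the mkinitrd or multipath fix referenced in this bug. The function name `wait_for_device`, its parameters, and the assumption that udev maintains a symlink such as `/dev/disk/by-uuid/<uuid>` for the device are all illustrative.

```python
import os
import time


def wait_for_device(persistent_path, timeout=30.0, poll_interval=0.5):
    """Wait for a persistent device link to appear, then resolve it.

    persistent_path is assumed to be a udev-managed symlink such as
    /dev/disk/by-uuid/<uuid>. The link name is expected to stay the same
    across boots even when the sdX name behind it changes.
    (Hypothetical helper, not part of any Red Hat tool.)
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(persistent_path):
            # Follow the symlink to whatever node the kernel assigned this boot.
            return os.path.realpath(persistent_path)
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{persistent_path} did not appear within {timeout} s"
            )
        # Asynchronous scanning may still be adding devices; poll again.
        time.sleep(poll_interval)
```

A caller would pass the persistent link for its own disk and use the returned path only for the current boot, since the "sd" name it resolves to may differ next time.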