Description of problem:
- RHEL4 is based on a 2.6.9+ kernel. By default, this kernel does not delete devices when connectivity is lost; the sdev simply goes into an offline state.
- Later upstream kernels, including 2.6.18 (the basis of RHEL5) and beyond, actively tear down the devices when connectivity is lost (a side effect of the FC transport).
- Many bugs were found in the "removal" code paths; several involved data structures that were erroneously freed and then reallocated by other code paths. This happened with both target-level and sdev-level structures.
- The patches to correct this made it into 2.6.19 and 2.6.20, and a couple of residual "sdev resurrection" patches posted in June 2007 have yet to make it upstream.
- Many of these paths were discovered so late in the RHEL5 schedule (and in the SLES10 schedule as well) that both distros added non-upstream patches to the FC transport to avoid deleting devices.
So, although the Linux kernel claims to support hot removal and the interfaces exist, as a matter of policy I would not support "LUN removal".
Additional info:
Customers require the ability to deallocate storage on the fly without rebooting. After discussion with Red Hat engineering and Emulex engineering, it appears the RHEL 5 kernel has bugs in the remove path that prevent this from working reliably. This feature request is to apply whatever upstream patches are necessary and to test them thoroughly, so that customers can expect a way to execute a scan that removes any storage that has been deallocated.
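For reference, the hot-removal interface mentioned above is exposed through sysfs: writing 1 to a SCSI device's delete attribute asks the kernel to tear that device down. The following is a minimal Python sketch, assuming the standard /sys/class/scsi_device/<H:C:T:L>/device/delete layout and root privileges. The helper name delete_scsi_device and its sysfs_root parameter are hypothetical, added only so the function can be exercised against a scratch directory instead of a live system.

```python
import os
import re

# Host:Channel:Target:LUN address of a SCSI device, e.g. "2:0:1:3".
_HCTL = re.compile(r"^\d+:\d+:\d+:\d+$")


def delete_scsi_device(hctl: str, sysfs_root: str = "/sys") -> str:
    """Ask the kernel to remove one SCSI device (logical unit).

    Hypothetical helper: writes "1" to the device's sysfs 'delete'
    attribute and returns the path it wrote. On a real system this
    requires root, and the caller should first stop using the device
    (unmount it, flush outstanding I/O) before removing it.
    """
    if not _HCTL.match(hctl):
        raise ValueError(f"expected H:C:T:L, got {hctl!r}")
    path = os.path.join(sysfs_root, "class", "scsi_device", hctl, "device", "delete")
    # Refuse to create the attribute if it is missing (e.g. a typo in
    # the address); on sysfs, the file must already exist.
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    with open(path, "w") as f:
        f.write("1")
    return path
```

For example, delete_scsi_device("2:0:1:3") would request removal of LUN 3 on target 1, channel 0, host 2. Given the removal-path bugs described above, this is only a sketch of the interface, not a recommendation to rely on it on RHEL 5.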
The upstream kernel struggled with this problem, as noted in comment 0. This leads me to think that the changes may be too disruptive to backport to RHEL 5. On the other hand, I believe there will be strong demand for this, because storage arrays these days dynamically create and delete LUNs (e.g. snapshots) as a matter of normal operation. Chip, please take a look and see what specific problems exist in 5.1 and which specific fixes we might backport. Also consider this as part of the larger effort to discover and manage online storage configuration changes.
Created attachment 294606: Online Storage Reconfiguration Guide (latest PDF build), as requested. Mind the placeholders. Please have all reviewers email me at ddomingo with revisions, suggestions, and other concerns regarding this document. I'd be happy to integrate as much content as can be supplied. Thanks!
Tom, can you sign off on the document for release with RHEL5.2 (to be made available online only)? When you do, I will remove all placeholders. The latest build can be found here: https://engineering.redhat.com/docbot/en-US/Red_Hat_Enterprise_Linux/0.0/html/Online_Storage_Reconfiguration_Guide/ Thanks!
Putting this on the 5.4 list to possibly move this doc from Technology Preview to fully supported.
The link in comment 37 is no longer available. Use the publicly available link: http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Online_Storage_Reconfiguration_Guide/index.html
Andrius, Rob, the updated OSRG is now public:
HTML: http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/Online_Storage_Reconfiguration_Guide/index.html
PDF: http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/pdf/Online_Storage_Reconfiguration_Guide.pdf
Both links can be found on: http://www.redhat.com/docs/manuals/enterprise/
Note: I just discovered that the link to the PDF on that index page was incorrect. I pushed the correction earlier today, and it should be live within the next 24 hours or so.
Since this has been built and synced to redhat.com (and doesn't need to be built by rel-eng or reviewed by QE), I think this can be closed manually. Will double-check with TomC.
The SCSI rescan script changes that went into *5.4* need to be included.
Hi Don,
My suggestions for documenting 'rescan-scsi-bus.sh' and its limitations follow.
Rob
I would add a new chapter to the 'Online Storage Reconfiguration Guide' after chapter 7 ('Removing Devices') and call it 'Automated LUN Addition and Removal'. I would refer to this new chapter at the end of chapters 5 and 7.
The new chapter 'Automated LUN Addition and Removal' should read as follows:
The script rescan-scsi-bus.sh is available as part of the sg3_utils package. (We may want to add a section here on how a customer gets the sg3_utils package. Is it something like: yum install sg3_utils?) The script can be used to automatically update the host's LUN configuration following LUN addition and removal.
Limitations of rescan-scsi-bus.sh (from bz507379 comment 31):
- LUN0 should be the first LUN mapped for rescan-scsi-bus.sh to work correctly. If LUN0 is not the first LUN mapped, the first LUN mapped will not be detected, nor will the other LUNs that should be scanned. Using the --nooptscan option does not work around this.
- Due to a bug in the rescan-scsi-bus.sh script, the functionality that recognizes a change in LUN size executes when the --remove option is used.
- When LUNs are mapped for the first time, rescan-scsi-bus.sh must be run twice for all LUNs to be recognized.
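Until the run-twice limitation is fixed, a wrapper can simply invoke the script twice in a row. The following is a minimal Python sketch, assuming rescan-scsi-bus.sh is on PATH and the caller is root. The function name rescan_scsi_bus and its passes parameter are hypothetical; any script options (such as --remove) are passed through unchanged as extra elements of cmd.

```python
import subprocess
from typing import List, Sequence

# Hypothetical default: assumes rescan-scsi-bus.sh (from sg3_utils) is on PATH.
DEFAULT_CMD = ("rescan-scsi-bus.sh",)


def rescan_scsi_bus(cmd: Sequence[str] = DEFAULT_CMD, passes: int = 2) -> List[str]:
    """Run the rescan command `passes` times and collect each run's stdout.

    Two passes work around the limitation that newly mapped LUNs are not
    all recognized on the first run. Raises subprocess.CalledProcessError
    as soon as any pass exits with a non-zero status.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    outputs = []
    for _ in range(passes):
        proc = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
        outputs.append(proc.stdout)
    return outputs
```

For example, rescan_scsi_bus() could be called after mapping LUNs for the first time. Note that passing --remove (rescan_scsi_bus(["rescan-scsi-bus.sh", "--remove"])) would also trigger the size-change detection described in the limitations.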
I've revised the document per the specification (in source). Since the only remaining issue in the doc was replacing "LUN" with "logical units" where applicable, I'll be pushing it to public later today. Closing this bug as CLOSED -> CURRENTRELEASE.