Bug 236576

Summary: Scant information regarding SCSI bus scanning and devices hot adding/removal
Product: Red Hat Enterprise Linux 4
Component: Documentation
Version: 4.4
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Aleksander Adamowski <bugs-redhat>
Assignee: Don Domingo <ddomingo>
QA Contact: John Ha <jha>
CC: adstrong, tao, wmealing
Keywords: Documentation
Doc Type: Bug Fix
Last Closed: 2008-01-29 04:27:07 UTC

Description Aleksander Adamowski 2007-04-16 15:16:02 UTC
Description of problem:

Two Red Hat KB articles cover scanning the SCSI bus and hot-adding and removing
devices:

http://kbase.redhat.com/faq/FAQ_85_7921.shtm

http://kbase.redhat.com/faq/FAQ_80_4011.shtm

Both of them specify that the operation isn't guaranteed to work and not corrupt
data.

This information isn't satisfactory for corporate RHEL customers. E.g. our
customer (who has quite a large RHEL deployment) has an IBM x346 production
server with a ServeRAID 7k controller running RHEL4U4 and needs to hot-add a
SCSI drive.
Using IBM Director, the disk has been added as a logical drive and initialized,
but the OS didn't discover the new device.

The KB information doesn't look encouraging and gives no hint as to which
hardware can handle SCSI rescanning/device addition and which hardware will
bomb out.

Expected results:

IMO the KB articles should be updated with a list of hardware configurations on
which SCSI scanning/adding/removal has been tested to work correctly, and
configurations on which it has failed in Red Hat's labs.

Only Red Hat has sufficient hardware resources to do such extensive testing.

Comment 2 Aleksander Adamowski 2007-06-14 10:41:47 UTC
Any updates on this?

Hotplugging of disk drives is a pretty basic expectation from an enterprise
grade OS...

Comment 3 Wade Mealing 2007-06-19 05:15:26 UTC
G'day,

This is a more complicated matter than the kbase covers. I'll reply to your
comments inline.

> Both of them specify that the operation isn't guaranteed to work 

Correct. Many drivers, such as third-party modules, can influence how this
works. Multipathing and fibre switches between the storage and the host machine
may also affect whether devices can be hot-added reliably.

> not corrupt data.

If you have a pending SCSI write to a device on the chain and you issue a SCSI
reset down the chain while the device has a write() or a read() pending, there
is no guarantee that this data will be synced to the disk before the SCSI device
reset is done.

It is also possible that multiple host controllers are connected to a single
SCSI bus chain (not common, but it has been seen in production). A bus rescan
initiated by one controller will cause the other SCSI bus controller to reset,
possibly losing writes queued in that controller's internal buffers.
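
As an illustration of the rescan being discussed, here is a minimal sketch, not
a tested procedure. It assumes a 2.6 kernel that exposes the per-host sysfs file
/sys/class/scsi_host/hostN/scan, where writing "- - -" requests a scan of all
channels, targets and LUNs on that host. The function names and the sysfs_root
parameter are hypothetical and exist only for this example. Calling sync first
flushes filesystem buffers, but it does not remove the risks described above,
such as writes queued inside a controller.

```python
import os


def scan_path(host, sysfs_root="/sys"):
    """Build the path of the sysfs scan file for SCSI host number `host`."""
    return os.path.join(sysfs_root, "class", "scsi_host", f"host{host}", "scan")


def rescan_host(host, channel="-", target="-", lun="-", sysfs_root="/sys"):
    """Flush filesystem buffers, then ask the kernel to scan one SCSI host.

    "-" is the wildcard for channel, target and LUN. Writing to the real
    sysfs file requires root. sysfs_root is a hypothetical parameter that
    lets the sketch be pointed at a scratch directory instead of /sys.
    """
    os.sync()  # flush dirty filesystem buffers before touching the bus
    request = f"{channel} {target} {lun}\n"
    with open(scan_path(host, sysfs_root), "w") as scan_file:
        scan_file.write(request)
    return request
```

For example, rescan_host(0) would request a full scan of host0. Try this on a
staging system before going anywhere near production.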


> This information isn't satisfactory for corporate RHEL customers.
> E.g. our customer (who has quite a large RHEL deployment) has a IBM x346 
> production server with SerweRAID 7k controller running RHEL4U4 and needs to
> hot-add a SCSI drive.

I would strongly recommend that you test this kind of change on your staging
systems before deploying it to your production systems: testing, testing and
more testing.

If you feel that this is a feature that you'd like to see in Red Hat Enterprise
Linux, please contact Red Hat Support (or your TAM) and raise a feature request
citing this bugzilla.

Comment 4 Aleksander Adamowski 2007-06-19 16:13:04 UTC
I think it would be a good idea to copy-paste your explanation into the KB
articles in question, since it makes some things clearer.

That said, your explanation covers problems with multipathing and shared storage
installations.

What about a simple case where the device is simply hot-added to a dedicated
box, and all devices on the SCSI chain are idle at the moment (e.g. all services
are stopped and filesystems are synced)?

Whatever the answer, it would be a good idea to add more info to the KB articles
to clarify things.
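
For the simple single-host case raised in this comment, the following is a
rough sketch of adding one device by address and checking whether the kernel
then lists it. It assumes the /proc/scsi/scsi interface of 2.4 and 2.6 kernels,
which accepts the command "scsi add-single-device <host> <channel> <id> <lun>"
and lists attached devices in "Host: scsiN Channel: NN Id: NN Lun: NN" lines.
The function names and the proc_path parameter are hypothetical. It says
nothing about whether a given controller or driver handles this safely.

```python
import os
import re

# Matches the address line of each entry in /proc/scsi/scsi.
HOST_LINE = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)


def attached_devices(proc_text):
    """Return the set of (host, channel, id, lun) tuples found in the text."""
    return {tuple(int(n) for n in m.groups()) for m in HOST_LINE.finditer(proc_text)}


def add_single_device_command(host, channel, scsi_id, lun):
    """Build the command string that /proc/scsi/scsi accepts."""
    return f"scsi add-single-device {host} {channel} {scsi_id} {lun}\n"


def add_single_device(host, channel, scsi_id, lun, proc_path="/proc/scsi/scsi"):
    """Sync, request the device, and return the addresses that newly appeared.

    Requires root. An empty result means the kernel did not register anything
    new at that address.
    """
    os.sync()  # flush filesystem buffers first; services should already be idle
    with open(proc_path) as proc_file:
        before = attached_devices(proc_file.read())
    with open(proc_path, "w") as proc_file:
        proc_file.write(add_single_device_command(host, channel, scsi_id, lun))
    with open(proc_path) as proc_file:
        after = attached_devices(proc_file.read())
    return after - before
```

Comparing the listing before and after makes the "OS didn't discover the new
device" symptom from the description directly visible.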


Comment 5 Michael Hideo 2007-10-23 02:44:33 UTC
Removing automation notification

Comment 6 Don Domingo 2008-01-29 04:27:07 UTC
question satisfactorily answered by wmealing.

as for suggestion re: kbase, please use the rating buttons (i.e. "How well did
this entry answer your question?") to notify the specific kbase writer.

closing this bug as NOTABUG.