Bug 430494

Summary: [NetApp-S 4.7 bug] LUN removal status is not updated on the host without a driver reload
Product: Red Hat Enterprise Linux 4 Reporter: Andrius Benokraitis <andriusb>
Component: kernel Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: urgent Docs Contact:
Priority: high    
Version: 4.7 CC: andriusb, bmarzins, bmr, coughlan, cward, ddomingo, djuran, emcnabb, eriley, jturner, ldyrow, marting, mchristi, rajashekhar.a, rlerch, sunday, tyasui, vimarsh.pandey, xdl-redhat-bugzilla
Target Milestone: rc Keywords: OtherQA
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2008-0665 Doc Type: Bug Fix
Last Closed: 2008-07-24 19:25:56 UTC Type: ---
Bug Depends On: 238421    
Bug Blocks: 252336, 391231, 458752    

Comment 1 Andrius Benokraitis 2008-01-28 14:28:41 UTC
This bugzilla was cloned from bug 238421 (for RHEL 5.2). This bug is proposed
for RHEL 4.7. Looks like the script used in RHEL 5 needs to be tweaked a bit to
work in RHEL 4.

Comment 2 Tom Coughlan 2008-02-04 17:08:02 UTC
Netapp: please describe exactly what behavior you are seeing in this area on
RHEL 4, and what the problem is.

Comment 3 Martin George 2008-02-07 07:53:27 UTC
We see the same issues as seen on RHEL5. The script does not need any tweaking -
it runs fine on both RHEL4 & RHEL5 kernels.

Comment 4 Don Domingo 2008-03-18 04:56:37 UTC
adding to RHEL4.7 release notes under "Known Issues":
<quote>
When a LUN is deleted on a configured filer, the change is not reflected on the
host. In such cases, lvm commands will hang indefinitely when dm-multipath is
used, as the LUN has now become stale.

To work around this, delete all device and mpath link entries in /etc/lvm/.cache
specific to the stale LUN.

To find out what these entries are, run the following command:

ls -l /dev/mpath | grep <stale LUN>

For example, if <stale LUN> is 3600d0230003414f30000203a7bc41a00, the following
results may appear:

		lrwxrwxrwx 1 root root 7 Aug  2 10:33 /3600d0230003414f30000203a7bc41a00 -> ../dm-4
		lrwxrwxrwx 1 root root 7 Aug  2 10:33 /3600d0230003414f30000203a7bc41a00p1 -> ../dm-5

This means that 3600d0230003414f30000203a7bc41a00 has two mpath links (the LUN
itself and its partition p1), which point to dm-4 and dm-5.

As such, the following lines should be deleted from /etc/lvm/.cache:

		/dev/dm-4 
		/dev/dm-5 
		/dev/mapper/3600d0230003414f30000203a7bc41a00
		/dev/mapper/3600d0230003414f30000203a7bc41a00p1
		/dev/mpath/3600d0230003414f30000203a7bc41a00
		/dev/mpath/3600d0230003414f30000203a7bc41a00p1
</quote>

Please advise if any further revisions are required. thanks!	
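
The workaround quoted above can be scripted. The following is only a minimal
sketch, not a tested tool: the function names stale_entries and prune_cache are
hypothetical, and it assumes that /etc/lvm/.cache lists each device as a
double-quoted path on its own line. Check the file on your system, and keep a
backup copy, before replacing it.

```shell
# Print the cache entries that belong to a stale LUN, one quoted path per line.
# $1 = WWID of the stale LUN
# $2 = directory holding the mpath links (defaults to /dev/mpath)
stale_entries() {
    wwid=$1
    mpath_dir=${2:-/dev/mpath}
    for link in "$mpath_dir/$wwid"*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        name=$(basename "$link")
        target=$(basename "$(readlink "$link")")
        printf '"%s"\n' "/dev/$target" "/dev/mapper/$name" "/dev/mpath/$name"
    done
}

# Print the cache file ($1) without the lines that contain a stale entry.
# The stale entries are read from stdin as fixed strings. Because the
# closing quote is part of each pattern, "/dev/dm-4" does not match
# "/dev/dm-40".
prune_cache() {
    grep -v -F -f - "$1"
}

# Example (assumption: run as root, with the WWID from the release note):
#   cp /etc/lvm/.cache /etc/lvm/.cache.bak
#   stale_entries 3600d0230003414f30000203a7bc41a00 \
#       | prune_cache /etc/lvm/.cache.bak > /etc/lvm/.cache
```

The sketch writes the filtered result from the backup copy so that the original
cache content stays available if the result is not what was expected.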

Comment 5 NetApp filed bugzillas 2008-03-18 12:07:54 UTC
Use "Storage system" in place of "filer"

<new text>
When a LUN is deleted on a configured Storage system, the change is not 
reflected on the host.


Comment 6 Don Domingo 2008-03-18 22:46:16 UTC
note revised. thanks!

Comment 7 Ben Marzinski 2008-04-14 22:45:29 UTC
Fixed the multipath part of this bugzilla.  There is a new parameter in
multipath.conf, "flush_on_last_del", that, if set, disables queueing for a
multipath device when the last path is deleted. You can also manually disable
and restore queueing for a multipath device via two new "multipathd -k" commands:

disablequeueing map $map
restorequeueing map $map

This means that when the last path is deleted, queueing will be disabled, so
that processes that have the device open will get IO errors instead of getting
stuck in the D (uninterruptible sleep) state.  This fixes the lockup while
deleting the multipath map (assuming that whatever is holding the device open
exits after getting IO errors).
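
For illustration, the sketch below prints a multipath.conf fragment that enables
the new parameter and checks whether a given configuration file sets it. It is
a hedged example only: the function names sample_conf and has_flush_on_last_del
are hypothetical, the vendor and product strings are placeholders for whatever
your array reports, and the map name mpath0 in the comments is made up.

```shell
# Print a sample device section that queues I/O while paths are down but
# stops queueing once the last path has been deleted.
sample_conf() {
cat <<'EOF'
devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN"
                features                "1 queue_if_no_path"
                flush_on_last_del       yes
        }
}
EOF
}

# Return 0 if the configuration file ($1) sets flush_on_last_del to yes.
has_flush_on_last_del() {
    grep -Eq '^[[:space:]]*flush_on_last_del[[:space:]]+"?yes"?[[:space:]]*$' "$1"
}

# The manual commands from the comment above can be passed to the daemon
# like this (mpath0 is a hypothetical map name):
#   multipathd -k"disablequeueing map mpath0"
#   multipathd -k"restorequeueing map mpath0"
```

Running has_flush_on_last_del /etc/multipath.conf before a test is a quick way
to confirm that the option is actually present in the file being used.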

Comment 10 Don Domingo 2008-06-02 23:16:21 UTC
Hi,

the RHEL4.7 release notes deadline is on June 17, 2008 (Tuesday). they will
undergo a final proofread before being dropped to translation, at which point no
further additions or revisions will be entertained.

a mockup of the RHEL4.7 release notes can be viewed here:
http://intranet.corp.redhat.com/ic/intranet/RHEL4u7relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

Comment 11 Chris Ward 2008-06-05 15:56:11 UTC
~~~~~~~~~~~~~~
~ Attention: ~ Feedback requested regarding this **High Priority** bug. 
~~~~~~~~~~~~~~

A fix for this issue should be included in the latest packages contained in
RHEL4.7-Snapshot1--available now on partners.redhat.com.

After you (Red Hat Partner) have verified that this issue has been addressed,
submit a comment describing the passing results of your test in appropriate
detail, along with which snapshot and package version tested. The bugzilla will
be updated by Red Hat Quality Engineering for you when this information has been
received.

If you believe this issue has not been properly fixed or you are unable to
verify the issue for any reason, please add a comment describing the most recent
issues you are experiencing, along with which snapshot and package version tested.

If you believe the bug has not been fixed, change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and bugzilla will be updated for you. 

If you need assistance accessing ftp://partners.redhat.com, please contact your
Partner Manager.

Thank you
Red Hat QE Partner Management

Comment 12 Chris Ward 2008-06-16 08:43:08 UTC
~~~~~~~~~~~~~~
~ Attention: ~ Immediate attention required for this ***High Priority*** bug.
~~~~~~~~~~~~~~

A fix for this issue should be included in the latest packages contained in
**RHEL4.7-Snapshot2**, accessible now on http://partners.redhat.com.

After you (Red Hat Partner) have verified that this issue has been addressed,
submit a comment describing the results of your test in appropriate detail,
along with which snapshot and package version tested. The bugzilla will be
updated by Red Hat Quality Engineering for you when this information has been
received.

If this issue has not been properly fixed or you are unable to verify the issue
for any reason, please add a comment describing the most recent issues you are
experiencing, along with which snapshot and package version tested. If you are
sure the bug has not been fixed, change the status of the bug to ASSIGNED.

For IssueTracker users, submit verification results as usual; Bugzilla will be
updated by Red Hat Quality Engineering for you.

For additional information, contact your Partner Manager.

Thank you,
Red Hat QE Partner Management


Comment 13 Chris Ward 2008-06-19 13:07:35 UTC
~~~~~~~~~~~~~~
~ Attention: ~ 
~~~~~~~~~~~~~~

A fix for this issue should be included in the latest kernel packages contained
in **kernel 2.6.9-73.EL**, accessible now on http://partners.redhat.com.

After you (Red Hat Partner) have verified that this issue has been addressed,
submit a comment describing the results of your test in appropriate detail,
along with which snapshot and package version tested. The bugzilla will be
updated by Red Hat Quality Engineering for you when this information has been
received.

If this issue has not been properly fixed or you are unable to verify the issue
for any reason, please add a comment describing the most recent issues you are
experiencing, along with which snapshot and package version tested. If you are
sure the bug has not been fixed, change the status of the bug to ASSIGNED.

For IssueTracker users, submit verification results as usual; Bugzilla will be
updated by Red Hat Quality Engineering for you.

For additional information, contact your Partner Manager.

Thank you,
Red Hat QE Partner Management

Comment 14 Rajashekhar M A 2008-06-20 09:49:39 UTC
I have verified the option "flush_on_last_del yes" and it works fine for me.

Tested this on RHEL 4.7 Snapshot 2 (2.6.9-72.ELsmp) -

# rpm -qa | grep device
device-mapper-multipath-0.4.5-31.el4
device-mapper-1.02.25-1.el4
device-mapper-1.02.25-1.el4

Below are the details on how I tested -

1. Add the following line to the device-specific section of multipath.conf -
   flush_on_last_del yes
2. Configure the maps and start the daemon.
3. Run the vgscan command to see the existing volume groups.
4. Delete one of the LUNs mapped to this host from the storage controller.
5. Observe the path checker errors on the host.
6. Run the command "vgscan"; this time it hangs as expected, since I have set
the feature as "1 queue_if_no_path".
7. Delete all the SCSI devices corresponding to the LUN from the host using the
command -
   # echo 1 > /sys/block/<device>/device/delete
8. Upon deleting the last device, I see that vgscan completes.

Thank you,
Raj
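
Step 7 above has to be repeated for every path of the deleted LUN. A minimal
sketch of that loop follows; the function name delete_scsi_paths is
hypothetical, and the device names in the usage comment (sdc, sdd) are
placeholders for the path devices that "multipath -ll" reports for the stale
map. The sysfs root is passed as a parameter only so that the function can be
tried against a scratch directory first.

```shell
# Delete each named SCSI path device by writing 1 to its sysfs delete node.
# $1     = sysfs block directory (normally /sys/block)
# $2...  = SCSI device names, for example sdc sdd
delete_scsi_paths() {
    sys_block=$1
    shift
    for dev in "$@"; do
        node="$sys_block/$dev/device/delete"
        if [ -e "$node" ]; then
            echo 1 > "$node"
        else
            # Report devices that are already gone or were mistyped.
            echo "skipping $dev: $node not found" >&2
        fi
    done
}

# Example (assumption: run as root, sdc and sdd are the paths of the stale LUN):
#   delete_scsi_paths /sys/block sdc sdd
```

With flush_on_last_del set, removing the last path this way should stop
queueing on the map, which matches the result reported in step 8.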

Comment 16 Chris Ward 2008-06-20 10:50:36 UTC
Netapp, thank you for your detailed test results, they're really appreciated.
Glad to see all is well!

Comment 18 errata-xmlrpc 2008-07-24 19:25:56 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

Comment 19 Chris Ward 2008-07-29 07:27:40 UTC
Partners, I would like to thank you all for your participation in assuring the
quality of this RHEL 4.7 Update Release. My hat's off to you all. Thanks.