Bug 431947 - [NetApp-S 5.2 bug] RHEL 5.1 root device multipathed host freezes during FCP path faults
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Corey Marthaler
URL:
Whiteboard: GSSApproved
Duplicates: 431994
Depends On: 355961 428338
Blocks: 217208 RHEL5u2_relnotes 431994 438150
Reported: 2008-02-07 22:50 UTC by Ben Marzinski
Modified: 2010-01-12 02:40 UTC (History)

Fixed In Version: RHBA-2008-0128
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-03-10 16:25:55 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2008:0128 | 0 | normal | SHIPPED_LIVE | device-mapper-multipath bug fix update | 2008-03-10 16:25:52 UTC

Comment 1 Ben Marzinski 2008-02-07 22:52:11 UTC
There is still a bug here.  The priority callouts need to be static, or else
they will still lock up reading the library files.

Comment 2 Ben Marzinski 2008-02-07 22:53:21 UTC
I made all the non-static priority callouts simply symlinks to the static ones.
This fixes the problem transparently to the user.

Comment 7 Suzanne Logcher 2008-02-08 16:12:58 UTC
*** Bug 431994 has been marked as a duplicate of this bug. ***

Comment 9 Ritesh Raj Sarraf 2008-02-11 12:36:21 UTC
So what is our recommended setting in /etc/multipath.conf?
We've been using /sbin/mpath_prio_[netapp|ontap]

The .static binary has never been used in the config files anywhere IIRC.
Why not just make the normal binaries static?

Comment 10 Martin George 2008-02-11 20:34:02 UTC
Ben,

So are you planning a new errata release which would make all the non-static
callouts symlinks to the static ones? 

Or simply make only static callouts (without the .static extension)?

Comment 11 Ben Marzinski 2008-02-11 21:23:42 UTC
In response to comment #10, Yes, there is a new errata ready for QA that makes
all the non-static callouts links to the static ones.

The callouts with the .static extension need to be there for mkinitrd to work
correctly. Eventually, we can remove them and just compile the regular ones
statically.

In response to comment #9, /sbin/mpath_prio_ontap is fine if you'd like, but
any one of them will work just fine.

/sbin/mpath_prio_ontap
/sbin/mpath_prio_netapp
/sbin/mpath_prio_netapp.static
are all just symlinks to
/sbin/mpath_prio_ontap.static
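
The symlink layout described above can be checked on an installed system with
a short shell sketch. The function name check_prio_callouts is hypothetical
and not part of device-mapper-multipath. It only checks that each callout
resolves to a file whose name ends in .static; it does not prove that the
binary is statically linked.

```shell
# Hypothetical helper: report where each priority callout in a directory
# resolves. Usage: check_prio_callouts [dir]   (dir defaults to /sbin)
check_prio_callouts() {
    dir="${1:-/sbin}"
    result=0
    for callout in "$dir"/mpath_prio_*; do
        # Skip the unexpanded glob pattern and any dangling links
        [ -e "$callout" ] || continue
        # Follow the symlink chain to the real file
        target=$(readlink -f "$callout")
        case "$target" in
            *.static) echo "OK   $callout -> $target" ;;
            *)        echo "WARN $callout -> $target (not a .static binary)"
                      result=1 ;;
        esac
    done
    return $result
}
```

Running check_prio_callouts with no argument inspects /sbin. A non-zero return
value means at least one callout does not resolve to a .static file.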

Comment 14 Don Domingo 2008-02-20 22:50:55 UTC
added to RHEL5.2 release notes under "Resolved Issues":

<quote>
The priority callouts of dm-multipath are now statically compiled. This fixes a
problem that occurs when running dm-multipath on devices containing the root
filesystem, which caused such devices to freeze during fibre-channel path faults.
</quote>

please advise if any further revisions are required. thanks!


Comment 15 Martin George 2008-02-22 11:49:04 UTC
Currently, scanning for new devices is done by running the following commands:

# echo "1" > /sys/class/fc_host/host<AdapterNo>/issue_lip
# echo "- - -" > /sys/class/scsi_host/host<AdapterNo>/scan

This is repeated for all host HBA ports.

Would these commands remain the same for a root device multipath scenario? 

On my RHEL 5.1 root device multipathed host with QLogic adapters, the host seems
to freeze when the above commands are run. The console also throws up messages
like the one listed below during the freeze:

BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ>  [<ffffffff800b50fa>] softlockup_tick+0xd5/0xe7
 [<ffffffff800930e2>] update_process_times+0x42/0x68
 [<ffffffff800746e3>] smp_local_timer_interrupt+0x23/0x47
 [<ffffffff80074da5>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff880941fc>] :scsi_mod:scsi_device_dev_release+0x0/0x16
 [<ffffffff80140dc7>] kobject_release+0x0/0x9
 [<ffffffff8812ef42>] :scsi_transport_fc:fc_user_scan+0x23/0x8b
 [<ffffffff8812ef7a>] :scsi_transport_fc:fc_user_scan+0x5b/0x8b
 [<ffffffff8809497c>] :scsi_mod:store_scan+0x9b/0xc5
 [<ffffffff800ff5d6>] sysfs_write_file+0xb9/0xe8
 [<ffffffff800161c7>] vfs_write+0xce/0x174
 [<ffffffff80016a94>] sys_write+0x45/0x6e
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0


Comment 17 errata-xmlrpc 2008-03-10 16:25:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0128.html


Comment 18 Martin George 2008-03-24 12:26:26 UTC
Please ignore comment #15. 

The freeze was hit in my case because I had issued a LIP reset before the SCSI 
scan, which was actually not required. Once that was removed, the dynamic 
rescan worked fine on the root device multipathed host.
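
A rescan of every host adapter without the LIP step could be sketched as
follows. The function name rescan_scsi_hosts and its optional sysfs-root
argument are hypothetical, added here only so the loop can be tried against a
scratch directory first. Only the scan write from comment #15 is kept.

```shell
# Hypothetical helper: trigger a SCSI scan on every host adapter, with no LIP.
# The optional argument overrides the sysfs root (default /sys) for dry runs.
rescan_scsi_hosts() {
    sysfs_root="${1:-/sys}"
    for scan_file in "$sysfs_root"/class/scsi_host/host*/scan; do
        # Skip the unexpanded glob pattern if no hosts are present
        [ -e "$scan_file" ] || continue
        # "- - -" is the wildcard for channel, target and LUN
        echo "- - -" > "$scan_file"
        echo "scanned ${scan_file%/scan}"
    done
}
```

Calling rescan_scsi_hosts as root with no argument writes to the real
/sys/class/scsi_host entries.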



Comment 19 Don Domingo 2008-04-02 02:15:02 UTC
Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at
which point no further additions or revisions will be entertained.

a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

