Bug 431947 - [NetApp-S 5.2 bug] RHEL 5.1 root device multipathed host freezes during FCP path faults
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Corey Marthaler
Whiteboard: GSSApproved
Keywords: ZStream
Duplicates: 431994
Depends On: 355961 428338
Blocks: 217208 RHEL5u2_relnotes 431994 438150
Reported: 2008-02-07 17:50 EST by Ben Marzinski
Modified: 2010-01-11 21:40 EST
CC List: 30 users

Fixed In Version: RHBA-2008-0128
Doc Type: Bug Fix
Last Closed: 2008-03-10 12:25:55 EDT


Attachments: None
Comment 1 Ben Marzinski 2008-02-07 17:52:11 EST
There is still a bug here. The priority callouts need to be statically linked;
otherwise they will still lock up while reading their shared library files.
Comment 2 Ben Marzinski 2008-02-07 17:53:21 EST
I made all the non-static priority callouts simply symlinks to the static ones.
This fixes the problem transparently to the user.
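The symlink approach described above can be illustrated with a small POSIX shell sketch. The function name `link_callouts_to_static` and the rule "NAME links to NAME.static" are assumptions made for illustration; this is not the actual packaging change, and comment 11 shows that some callouts (for example `mpath_prio_netapp`) point at a differently named static binary, which this simple name matching does not model.

```shell
#!/bin/sh
# Hypothetical helper illustrating comment 2: replace each non-static
# priority callout in DIR with a relative symlink to NAME.static.
# Assumption: every callout NAME has a NAME.static next to it.
link_callouts_to_static() {
    dir=${1:?usage: link_callouts_to_static DIR}
    for static in "$dir"/mpath_prio_*.static; do
        [ -e "$static" ] || continue          # no match: glob stayed literal
        target=$(basename "$static")          # e.g. mpath_prio_ontap.static
        name=$(basename "$static" .static)    # e.g. mpath_prio_ontap
        # -f replaces the old dynamically linked binary in place
        ln -sf "$target" "$dir/$name"
    done
}
```

Try it on a scratch copy first, e.g. `link_callouts_to_static /tmp/sbin-copy`; pointing it at a live `/sbin` rewrites installed files.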
Comment 7 Suzanne Yeghiayan 2008-02-08 11:12:58 EST
*** Bug 431994 has been marked as a duplicate of this bug. ***
Comment 9 Ritesh Raj Sarraf 2008-02-11 07:36:21 EST
So what is our recommended setting in /etc/multipath.conf ?
We've been using /sbin/mpath_prio_[netapp|ontap]

The .static binary has never been used in the config files anywhere IIRC.
Why not just make the normal binaries static?
Comment 10 Martin George 2008-02-11 15:34:02 EST
Ben,

So are you planning a new errata release which would make all the non-static
callouts symlinks to the static ones? 

Or simply make only static callouts (without the .static extension)?
Comment 11 Ben Marzinski 2008-02-11 16:23:42 EST
In response to comment #10, Yes, there is a new errata ready for QA that makes
all the non-static callouts links to the static ones.

The callouts with the .static extension need to be there for mkinitrd to work
correctly. Eventually, we can remove them and just compile the regular ones
statically.

In response to comment #9, /sbin/mpath_prio_ontap is fine if you'd like, but
any one of them will work.

/sbin/mpath_prio_ontap
/sbin/mpath_prio_netapp
/sbin/mpath_prio_netapp.static
are all just symlinks to
/sbin/mpath_prio_ontap.static
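To confirm the layout above on an updated host, a sketch like the following resolves each name with `readlink -f` and compares it with the static binary. The function name is hypothetical, and it assumes GNU coreutils `readlink` and that the updated device-mapper-multipath package is installed.

```shell
#!/bin/sh
# Sketch: check that every NetApp callout name listed in comment 11 resolves
# to the same file as mpath_prio_ontap.static. Function name is hypothetical;
# assumes GNU coreutils readlink (-f follows every symlink in the chain).
check_netapp_callouts() {
    dir=${1:?usage: check_netapp_callouts DIR}
    want=$(readlink -f "$dir/mpath_prio_ontap.static")
    if [ ! -f "$want" ]; then
        echo "FAIL mpath_prio_ontap.static is missing"
        return 1
    fi
    rc=0
    for name in mpath_prio_ontap mpath_prio_netapp mpath_prio_netapp.static; do
        got=$(readlink -f "$dir/$name")
        if [ -n "$got" ] && [ "$got" = "$want" ]; then
            echo "ok   $name -> $got"
        else
            echo "FAIL $name -> ${got:-unresolved}"
            rc=1
        fi
    done
    return "$rc"
}
```

`check_netapp_callouts /sbin` prints one ok/FAIL line per name and returns non-zero if any name does not resolve to the static binary.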
Comment 14 Don Domingo 2008-02-20 17:50:55 EST
Added to the RHEL 5.2 release notes under "Resolved Issues":

<quote>
The priority callouts of dm-multipath are now statically compiled. This fixes a
problem that occurs when running dm-multipath on devices containing the root
filesystem, which caused such devices to freeze during fibre-channel path faults.
</quote>

Please advise if any further revisions are required. Thanks!
Comment 15 Martin George 2008-02-22 06:49:04 EST
Currently, scanning for new devices is done by running the following commands:

# echo "1" > /sys/class/fc_host/host<AdapterNo>/issue_lip
# echo "- - -" > /sys/class/scsi_host/host<AdapterNo>/scan

This is repeated for all host HBA ports.

Would these commands remain the same for a root device multipath scenario? 

On my RHEL 5.1 root device multipathed host with QLogic adapters, the host seems
to freeze when the above commands are run. During the freeze, the console also
prints messages like the following:

BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ>  [<ffffffff800b50fa>] softlockup_tick+0xd5/0xe7
 [<ffffffff800930e2>] update_process_times+0x42/0x68
 [<ffffffff800746e3>] smp_local_timer_interrupt+0x23/0x47
 [<ffffffff80074da5>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff880941fc>] :scsi_mod:scsi_device_dev_release+0x0/0x16
 [<ffffffff80140dc7>] kobject_release+0x0/0x9
 [<ffffffff8812ef42>] :scsi_transport_fc:fc_user_scan+0x23/0x8b
 [<ffffffff8812ef7a>] :scsi_transport_fc:fc_user_scan+0x5b/0x8b
 [<ffffffff8809497c>] :scsi_mod:store_scan+0x9b/0xc5
 [<ffffffff800ff5d6>] sysfs_write_file+0xb9/0xe8
 [<ffffffff800161c7>] vfs_write+0xce/0x174
 [<ffffffff80016a94>] sys_write+0x45/0x6e
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0
Comment 17 errata-xmlrpc 2008-03-10 12:25:55 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0128.html
Comment 18 Martin George 2008-03-24 08:26:26 EDT
Please ignore comment #15. 

The freeze was hit in my case because I had issued a LIP reset before the SCSI 
scan, which was actually not required. Once that was removed, the dynamic 
rescan worked fine on the root device multipathed host.
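Following the comment above, a dynamic rescan on a root device multipathed host can skip the LIP and write only to each FC host's `scan` attribute. This sketch assumes that every `/sys/class/fc_host/hostN` has a matching `/sys/class/scsi_host/hostN`, as in the paths used in comment 15; the function name and the `SYSFS` override are hypothetical, added only so the loop can be exercised against a fake directory tree.

```shell
#!/bin/sh
# Sketch: rescan every FC HBA port for new LUNs WITHOUT issuing a LIP first.
# SYSFS is a hypothetical override for testing; leave it unset on a real
# host (defaults to /sys) and run as root.
rescan_fc_hosts() {
    sysfs=${SYSFS:-/sys}
    count=0
    for fc in "$sysfs"/class/fc_host/host*; do
        [ -e "$fc" ] || continue                  # no FC HBAs: glob stayed literal
        host=$(basename "$fc")                    # e.g. host3
        scan="$sysfs/class/scsi_host/$host/scan"
        [ -w "$scan" ] || continue                # skip if missing or not writable
        # Deliberately no "echo 1 > .../issue_lip" here (see comment #18).
        # "- - -" wildcards the channel, target and LUN fields.
        echo "- - -" > "$scan"
        count=$((count + 1))
    done
    echo "rescanned $count FC host(s)"
}
```

Restricting the loop to `fc_host` entries keeps local (non-FC) SCSI hosts out of the rescan, matching the "all host HBA ports" wording in comment 15.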

Comment 19 Don Domingo 2008-04-01 22:15:02 EDT
Hi,
The RHEL 5.2 release notes will be handed off for translation on April 15, 2008,
after which no further additions or revisions will be accepted.

A mockup of the RHEL 5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

Please use that link to verify whether your bug is already in the release
notes (if it needs to be). Each item in the release notes links to its
original bug, so you can search the release notes by bug number.

Cheers,
Don
