Bug 696157 - [NetApp 6.1 Bug] multipathd crashes occasionally during IO with fabric faults
Summary: [NetApp 6.1 Bug] multipathd crashes occasionally during IO with fabric faults
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.1
Hardware: All
OS: All
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Storage QE
URL:
Whiteboard:
Duplicates: 693524 (view as bug list)
Depends On:
Blocks: 702402 721245
 
Reported: 2011-04-13 12:56 UTC by Rajashekhar M A
Modified: 2011-08-09 16:46 UTC (History)
CC List: 14 users

Fixed In Version: device-mapper-multipath-0.4.9-41.el6
Doc Type: Bug Fix
Doc Text:
When a device path came online after another device path failed, the multipathd daemon failed to remove the restored path correctly. Consequently, multipathd sometimes terminated unexpectedly with a segmentation fault on a multipath device with the path_grouping_policy option set to "group_by_prio". With this update, multipathd removes and restores such paths as expected, thus fixing this bug.
Clone Of:
: 721245 (view as bug list)
Environment:
Last Closed: 2011-05-19 14:13:04 UTC
Target Upstream Version:
Embargoed:


Attachments
core dump file, messages and multipath.conf (868.58 KB, application/x-sdlc)
2011-04-13 13:13 UTC, Rajashekhar M A
6.0.z multipathd crash coredump (191.79 KB, application/octet-stream)
2011-04-15 10:31 UTC, Martin George


Links
System: Red Hat Product Errata
ID: RHBA-2011:0725
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: device-mapper-multipath bug fix and enhancement update
Last Updated: 2011-05-19 09:37:12 UTC

Description Rajashekhar M A 2011-04-13 12:56:29 UTC
Description of problem:

The multipathd daemon on a RHEL6.1 host crashes during IO with fabric faults.

We see the following message in the syslog:

kernel: multipathd[9675]: segfault at 170 ip 00000000004073f1 sp 00007ffba3544c60 error 4 in multipathd (deleted)[400000+10000]

The stack trace is as below:

(gdb) bt
#0  0x00000000004073f1 in update_prio (pp=0x7ffa74000b80, refresh_all=1)
    at main.c:964
#1  0x0000000000407aff in check_path (vecs=0xc1bf30, pp=0x7ffa3c1330a0)
    at main.c:1116
#2  0x0000000000407db5 in checkerloop (ap=0xc1bf30) at main.c:1159
#3  0x00007ffba54497e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffba46bd78d in clone () from /lib64/libc.so.6
(gdb)


Version-Release number of selected component (if applicable):

RHEL6.1 Snapshot 2 -
kernel-2.6.32-128.el6.x86_64
device-mapper-multipath-0.4.9-40.el6.x86_64
device-mapper-1.02.62-3.el6.x86_64

How reproducible:

Frequently observed.

Steps to Reproduce:

1. Map 40 LUNs (with 4 FC paths each, i.e., 160 SCSI devices) from the controllers and configure multipath devices on the host (an illustrative multipath.conf sketch follows this list).
2. Create 5 LVs on the dm-multipath devices and start IO to the LVs.
3. Introduce fabric faults repeatedly.
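
For reference, a minimal multipath.conf sketch of the kind of configuration used in step 1, assuming a NetApp array and the group_by_prio policy that this crash depends on; the vendor/product strings, prioritizer, and checker below are illustrative defaults, not values taken from the attached multipath.conf:

defaults {
        user_friendly_names     yes
}

devices {
        device {
                # Illustrative NetApp entry; the real settings should come from
                # the attached multipath.conf or the array vendor's guidance.
                vendor                  "NETAPP"
                product                 "LUN"
                path_grouping_policy    group_by_prio
                prio                    "alua"
                path_checker            tur
                failback                immediate
        }
}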
  
Additional info:

The multipath.conf, syslog, and the core dump file will be attached to the bugzilla.

Comment 2 Rajashekhar M A 2011-04-13 13:13:39 UTC
Created attachment 491753 [details]
core dump file, messages and multipath.conf

Attaching a zip file with core dump, full syslog and multipath.conf file.

Comment 5 Martin George 2011-04-14 13:29:21 UTC
And I'm now hitting the same crash on a 6.0.z host as well with device-mapper-multipath-0.4.9-31.el6_0.3:

Core was generated by `/sbin/multipathd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004073c2 in update_prio (pp=0x7f9754000d30, refresh_all=1) at main.c:960
960                     vector_foreach_slot (pp->mpp->pg, pgp, i) {
Missing separate debuginfos, use: debuginfo-install device-mapper-libs-1.02.53-8.el6_0.4.x86_64 glibc-2.12-1.7.el6_0.4.x86_64 libaio-0.3.107-10.el6.x86_64 libselinux-2.0.94-2.el6.x86_64 libsepol-2.0.41-3.el6.x86_64 libudev-147-2.29.el6.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 readline-6.0-3.el6.x86_64
(gdb) where
#0  0x00000000004073c2 in update_prio (pp=0x7f9754000d30, refresh_all=1) at main.c:960
#1  0x0000000000407a43 in check_path (vecs=0x6c57a0, pp=0x8b1af0) at main.c:1108
#2  0x0000000000407cf9 in checkerloop (ap=0x6c57a0) at main.c:1151
#3  0x00000037a62077e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00000037a5ae151d in clone () from /lib64/libc.so.6
(gdb)
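
For illustration, a minimal C sketch of the failure mode suggested by this trace, assuming the path's mpp pointer has already been cleared (the path orphaned) by the time update_prio() walks pp->mpp->pg at main.c:960; the structures are simplified stand-ins, not the actual multipath-tools definitions, and the guard only illustrates the class of check involved, not the fix that actually shipped in device-mapper-multipath-0.4.9-41.el6:

#include <stddef.h>

/* Simplified stand-ins for the multipath-tools structures involved. */
struct pathgroup_vec;                               /* placeholder for pp->mpp->pg */
struct multipath { struct pathgroup_vec *pg; };     /* the map owning a path       */
struct path      { struct multipath *mpp; };        /* a single path to the device */

/* Hypothetical guard: a path whose owning map has been removed (mpp == NULL)
 * must not be touched by the prio refresh.  Dereferencing pp->mpp->pg with a
 * NULL mpp faults at a small offset from address 0, which is consistent with
 * the "segfault at 170" line quoted in the original report. */
static int prio_refresh_is_safe(const struct path *pp)
{
        return pp != NULL && pp->mpp != NULL && pp->mpp->pg != NULL;
}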

Comment 6 Martin George 2011-04-15 10:31:05 UTC
Created attachment 492333 [details]
6.0.z multipathd crash coredump

Comment 7 Ben Marzinski 2011-04-16 03:49:25 UTC
I'm pretty sure that I've fixed this. Can you try the patches available at:

http://people.redhat.com/bmarzins/device-mapper-multipath/rpms/RHEL6/i686/

and

http://people.redhat.com/bmarzins/device-mapper-multipath/rpms/RHEL6/x86_64

Comment 8 Ben Marzinski 2011-04-16 03:58:25 UTC
*** Bug 693524 has been marked as a duplicate of this bug. ***

Comment 11 Ben Marzinski 2011-04-19 16:21:38 UTC
Have you been able to try out the test patches above?

Comment 13 Martin George 2011-04-19 17:21:27 UTC
(In reply to comment #11)
> Have you been able to try out the test patches above?

We are currently testing it. Will keep you posted on the results.

Comment 14 Ben Marzinski 2011-04-19 19:43:51 UTC
The fix from the test packages fixes this for me. Please let us know how your testing turns out.

Comment 16 Rajashekhar M A 2011-04-20 10:39:25 UTC
Yes, the test package seems to have fixed the issue. Our tests ran successfully and we did not hit this issue.

Comment 17 Eva Kopalova 2011-05-02 13:57:32 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The multipathd daemon could have terminated unexpectedly with a segmentation fault on a multipath device with the path_grouping_policy option set to the group_by_prio value. This occurred when a device path came online after another device path failed because the multipath daemon did not manage to remove the restored path correctly. With this update multipath removes and restores such paths correctly.

Comment 20 errata-xmlrpc 2011-05-19 14:13:04 UTC
An advisory has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0725.html

Comment 21 Tomas Capek 2011-08-09 16:46:00 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-The multipathd daemon could have terminated unexpectedly with a segmentation fault on a multipath device with the path_grouping_policy option set to the group_by_prio value. This occurred when a device path came online after another device path failed because the multipath daemon did not manage to remove the restored path correctly. With this update multipath removes and restores such paths correctly.
+When a device path came online after another device path failed, the multipathd daemon failed to remove the restored path correctly. Consequently, multipathd sometimes terminated unexpectedly with a segmentation fault on a multipath device with the path_grouping_policy option set to "group_by_prio". With this update, multipathd removes and restores such paths as expected, thus fixing this bug.

