When a device path came online after another device path failed, the multipathd daemon failed to remove the restored path correctly. Consequently, multipathd sometimes terminated unexpectedly with a segmentation fault on a multipath device with the path_grouping_policy option set to "group_by_prio". With this update, multipathd removes and restores such paths as expected, thus fixing this bug.
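The failure mode summarized above can be illustrated with a small, self-contained C sketch. The stub types and the update_prio_sketch() helper below are stand-ins for multipath-tools' struct path and struct multipath (they are not the real definitions), and the NULL check is only one possible defensive guard; the actual errata fix, per the note above, changed how multipathd removes and restores paths.

/* Hedged illustration only: stand-in types, not the real multipath-tools
 * code and not the actual fix. */
#include <stddef.h>

struct multipath_stub {            /* stand-in for struct multipath */
    void **pg_slots;               /* stand-in for the 'pg' path group vector */
    int    pg_count;
};

struct path_stub {                 /* stand-in for struct path */
    struct multipath_stub *mpp;    /* back-pointer to the owning map; can be
                                      left NULL if a failed path is orphaned
                                      and then comes back online */
};

/* Simplified version of the loop that faulted in update_prio(): the
 * original code walked pp->mpp->pg without checking whether the path
 * still belonged to a map. */
static void update_prio_sketch(struct path_stub *pp)
{
    int i;

    if (!pp || !pp->mpp)           /* defensive guard; illustrative only */
        return;

    for (i = 0; i < pp->mpp->pg_count; i++)
        (void)pp->mpp->pg_slots[i];   /* priority refresh would happen here */
}

int main(void)
{
    struct path_stub restored_but_orphaned = { .mpp = NULL };
    update_prio_sketch(&restored_but_orphaned);   /* returns instead of crashing */
    return 0;
}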
Description
Rajashekhar M A 2011-04-13 12:56:29 UTC
Description of problem:
The multipathd daemon on a RHEL6.1 host crashes during IO with fabric faults.
We see the following message in the syslog:
kernel: multipathd[9675]: segfault at 170 ip 00000000004073f1 sp 00007ffba3544c60 error 4 in multipathd (deleted)[400000+10000]
The stack trace is as below:
(gdb) bt
#0 0x00000000004073f1 in update_prio (pp=0x7ffa74000b80, refresh_all=1)
at main.c:964
#1 0x0000000000407aff in check_path (vecs=0xc1bf30, pp=0x7ffa3c1330a0)
at main.c:1116
#2 0x0000000000407db5 in checkerloop (ap=0xc1bf30) at main.c:1159
#3 0x00007ffba54497e1 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffba46bd78d in clone () from /lib64/libc.so.6
(gdb)
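The faulting address in the kernel message above ("segfault at 170") is itself informative: a fault at a small address such as 0x170 is the classic signature of dereferencing a member through a NULL struct pointer whose member happens to sit at that offset, which is consistent with pp->mpp being NULL inside update_prio(). The short demo below only illustrates that general point; the struct layout is hypothetical and is not the real struct multipath.

/* Hypothetical layout, chosen so the member lands at offset 0x170. */
#include <stdio.h>
#include <stddef.h>

struct fake_multipath {
    char  padding[0x170];   /* filler standing in for earlier members */
    void *pg;               /* member at offset 0x170 */
};

int main(void)
{
    printf("offsetof(struct fake_multipath, pg) = %#zx\n",
           offsetof(struct fake_multipath, pg));
    /* If a 'struct fake_multipath *mpp' were NULL, reading mpp->pg would
     * access address 0x170 and fault, matching the address in the log. */
    return 0;
}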
Version-Release number of selected component (if applicable):
RHEL6.1 Snapshot 2 -
kernel-2.6.32-128.el6.x86_64
device-mapper-multipath-0.4.9-40.el6.x86_64
device-mapper-1.02.62-3.el6.x86_64
How reproducible:
Frequently observed.
Steps to Reproduce:
1. Map 40 LUNs (with 4 FC paths each, i.e., 160 SCSI devices) from the controllers and configure multipath devices on the host.
2. Create 5 LVs on the dm-multipath devices and start IO to the LVs.
3. Introduce fabric faults repeatedly.
Additional info:
The multipath.conf, syslog, and the core dump file will be attached to the bugzilla.
Created attachment 491753
core dump file, messages and multipath.conf
Attaching a zip file with core dump, full syslog and multipath.conf file.
And I'm now hitting the same crash on a 6.0.z host as well with device-mapper-multipath-0.4.9-31.el6_0.3:
Core was generated by `/sbin/multipathd'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004073c2 in update_prio (pp=0x7f9754000d30, refresh_all=1) at main.c:960
960 vector_foreach_slot (pp->mpp->pg, pgp, i) {
Missing separate debuginfos, use: debuginfo-install device-mapper-libs-1.02.53-8.el6_0.4.x86_64 glibc-2.12-1.7.el6_0.4.x86_64 libaio-0.3.107-10.el6.x86_64 libselinux-2.0.94-2.el6.x86_64 libsepol-2.0.41-3.el6.x86_64 libudev-147-2.29.el6.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 readline-6.0-3.el6.x86_64
(gdb) where
#0 0x00000000004073c2 in update_prio (pp=0x7f9754000d30, refresh_all=1) at main.c:960
#1 0x0000000000407a43 in check_path (vecs=0x6c57a0, pp=0x8b1af0) at main.c:1108
#2 0x0000000000407cf9 in checkerloop (ap=0x6c57a0) at main.c:1151
#3 0x00000037a62077e1 in start_thread () from /lib64/libpthread.so.0
#4 0x00000037a5ae151d in clone () from /lib64/libc.so.6
(gdb)
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
The multipathd daemon could have terminated unexpectedly with a segmentation fault on a multipath device with the path_grouping_policy option set to the group_by_prio value. This occurred when a device path came online after another device path failed because the multipath daemon did not manage to remove the restored path correctly. With this update multipath removes and restores such paths correctly.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-0725.html
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1 +1 @@
-The multipathd daemon could have terminated unexpectedly with a segmentation fault on a multipath device with the path_grouping_policy option set to the group_by_prio value. This occurred when a device path came online after another device path failed because the multipath daemon did not manage to remove the restored path correctly. With this update multipath removes and restores such paths correctly.
+When a device path came online after another device path failed, the multipathd daemon failed to remove the restored path correctly. Consequently, multipathd sometimes terminated unexpectedly with a segmentation fault on a multipath device with the path_grouping_policy option set to "group_by_prio". With this update, multipathd removes and restores such paths as expected, thus fixing this bug.