Bug 875199 - multipathd crash in find_slot()
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: yanfu,wang
Depends On:
Blocks:
Reported: 2012-11-09 14:27 EST by Ben Marzinski
Modified: 2017-02-06 10:16 EST

See Also:
Fixed In Version: device-mapper-multipath-0.4.9-65.el6
Doc Type: Bug Fix
Doc Text:
Cause: Multipath did not check whether a pointer was NULL before dereferencing it.
Consequence: Occasionally, when the SCSI layer deleted failed path devices, multipathd would crash.
Fix: Multipath now checks whether the pointer is NULL before dereferencing it.
Result: Multipath no longer crashes when the SCSI layer removes path devices.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 02:44:56 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
check if the vector exists before dereferencing it. (446 bytes, patch)
2012-11-09 14:36 EST, Ben Marzinski

Description Ben Marzinski 2012-11-09 14:27:36 EST
Description of problem:
When paths go down and are removed, multipathd crashes in find_slot().

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-56.el6.x86_64

How reproducible:
Don't know.

Steps to Reproduce:
1. Don't know
2.
3.
  
Actual results:
segfault in find_slot()

Expected results:
No segfault


Additional info:
This report is coming in from Mike Christie at Fusion-io
Comment 2 Ben Marzinski 2012-11-09 14:36:38 EST
Created attachment 641787 [details]
check if the vector exists before dereferencing it.

This patch makes sure the vector in find_slot() is not NULL before dereferencing it.
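
The kind of guard described above can be sketched as follows. This is a minimal, self-contained illustration and not the attached patch: the `struct vector_s` layout here is a simplified stand-in assumed for the example, and the real definition in multipath-tools may differ.

```c
#include <stddef.h>

/* Simplified stand-in for the multipath-tools vector type.
 * Assumption for illustration only; the real libmultipath
 * definition may differ. */
struct vector_s {
	int allocated;   /* number of slots in use */
	void **slot;     /* array of stored pointers */
};
typedef struct vector_s *vector;

/* Return the index of addr in v, or -1 if v is NULL or addr is absent. */
int find_slot(vector v, void *addr)
{
	int i;

	if (!v)          /* the guard: never dereference a missing vector */
		return -1;

	for (i = 0; i < v->allocated; i++)
		if (v->slot[i] == addr)
			return i;

	return -1;
}
```

With a guard like this, a caller that passes a NULL vector gets the same "not found" result (-1) as a caller searching for a pointer that is not stored, instead of a segfault.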
Comment 3 mchristie 2012-11-09 16:42:51 EST
We hit this bug by forcing paths to be added/deleted.

- Set dev_loss_tmo relatively low, so we can replicate it faster. Maybe 15-20 secs.

- Run an IO test against the dm-multipath device. Have the dm-multipath device set up with queue_if_no_path.

- Inject transport problem for dev_loss_tmo seconds, so the paths (/dev/sdXs) are deleted by the scsi layer, and so multipathd handles the removal by removing the path.

- Correct transport problem, so paths are added back by the scsi layer and multipathd.

- Repeat. We run this test for several hours.
Comment 4 RHEL Product and Program Management 2012-12-14 03:50:18 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 6 Ben Marzinski 2013-06-10 17:31:56 EDT
applied patch.
Comment 9 Steven J. Levine 2013-07-19 17:17:23 EDT
It doesn't look as if this will affect the DM-Multipath document, so I'm marking this as docs_scoped-
Comment 10 mchristie 2013-09-16 13:06:15 EDT
Hi Ben,

We seem to have customers hitting this with RHEL 6.3. Would it be possible to add this to a zstream?
Comment 11 yanfu,wang 2013-10-18 03:09:09 EDT
@mchristie,
Could the customer help test the fixed version? The developer can provide a test build.
Comment 12 yanfu,wang 2013-10-18 04:53:16 EDT
QE couldn't trigger the issue. Could you share your multipath configuration and the I/O workload you ran? More detailed info would be welcome, thanks.
Comment 13 mchristie 2013-10-18 14:37:19 EDT
For multipath.conf we used

devices {
        device {
                vendor                  "FUSIONIO"
                features                "3 queue_if_no_path pg_init_retries 50"
                hardware_handler        "1 alua"
                path_grouping_policy    group_by_prio
                path_selector           "queue-length 0"
                failback                immediate
                path_checker            tur
                prio                    alua

                fast_io_fail_tmo       15
                dev_loss_tmo           60
        }
}

For IO we just ran fio

fio --filename=/dev/mapper/mpathaj --bs=256K --size=5G --name=mpathaj --refill_buffers --iodepth=128 --iodepth_batch=128  --numjobs=16 --thread  --rw=randwrite --time_based --runtime=13d  --ioengine=libaio
Comment 14 mchristie 2013-10-18 14:37:55 EDT
If you can provide a test build we can test here.
Comment 15 yanfu,wang 2013-10-20 22:03:16 EDT
(In reply to mchristie from comment #14)
> If you can provide a test build we can test here.
hi,

Below is the test build provided by the developer:
http://people.redhat.com/~bmarzins/device-mapper-multipath/rpms/RHEL6/x86_64/

You can use either -65 or -71; both are fixed versions. Thanks for your testing!
Comment 16 yanfu,wang 2013-10-24 23:07:26 EDT
Did a code review against device-mapper-multipath-0.4.9-72.el6 and verified that the patch is applied correctly.
Comment 17 mchristie 2013-11-05 16:54:59 EST
It works for me here. Not seeing segfault. Thanks.
Comment 19 errata-xmlrpc 2013-11-21 02:44:56 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1574.html
