Bug 875199
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | multipathd crash in find_slot() | | |
| Product | Red Hat Enterprise Linux 6 | Reporter | Ben Marzinski <bmarzins> |
| Component | device-mapper-multipath | Assignee | Ben Marzinski <bmarzins> |
| Status | CLOSED ERRATA | QA Contact | yanfu,wang <yanwang> |
| Severity | unspecified | Docs Contact | |
| Priority | unspecified | | |
| Version | 6.3 | CC | acathrow, agk, bdonahue, bmarzins, dwysocha, heinzm, mchristie, msnitzer, prajnoha, prockai, slevine, yanwang, zkabelac |
| Target Milestone | rc | | |
| Target Release | --- | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | device-mapper-multipath-0.4.9-65.el6 | Doc Type | Bug Fix |

Doc Text:
Cause: Multipath did not check whether a pointer was NULL before dereferencing it.
Consequence: Occasionally, when the SCSI layer deleted failed path devices, multipathd would crash.
Fix: Multipath now checks whether the pointer is NULL before dereferencing it.
Result: Multipath no longer crashes when the SCSI layer removes path devices.

| Field | Value | Field | Value |
|---|---|---|---|
| Story Points | --- | | |
| Clone Of | | Environment | |
| Last Closed | 2013-11-21 07:44:56 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |

Description
Ben Marzinski
2012-11-09 19:27:36 UTC
Created attachment 641787
check if the vector exists before dereferencing it.
This patch makes sure the vector in find_slot() is not NULL before dereferencing it.
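The guard described above can be sketched in C as follows. This is not the actual patch from attachment 641787: the `struct vector_s` layout, its `allocated` and `slot` fields, and the convention of returning -1 for "not found" are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified vector: a counted array of pointers.
 * The layout and field names are assumptions for this sketch, not
 * the real device-mapper-multipath definition. */
struct vector_s {
	int allocated;  /* number of slots currently in use */
	void **slot;    /* the stored element pointers */
};
typedef struct vector_s *vector;

/* Return the index of addr in v, or -1 if addr is not stored there.
 * The early NULL check is the kind of guard the patch adds: without
 * it, calling find_slot() with a NULL vector would read v->allocated
 * through a NULL pointer and crash the process. */
int find_slot(vector v, void *addr)
{
	int i;

	if (!v)
		return -1;  /* no vector: report "not found" instead of crashing */

	for (i = 0; i < v->allocated; i++)
		if (v->slot[i] == addr)
			return i;

	return -1;
}
```

Returning the same -1 for a NULL vector as for a missing element means callers that already handle "not found" need no extra case; that choice is an assumption of this sketch.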
We hit this bug by forcing paths to be added and deleted:

- Set dev_loss_tmo relatively low (perhaps 15-20 seconds) so the problem reproduces faster.
- Run an I/O test against the dm-multipath device, with the device set up with queue_if_no_path.
- Inject a transport problem lasting dev_loss_tmo seconds, so that the SCSI layer deletes the path devices (/dev/sdX) and multipathd handles the removal by removing the paths.
- Correct the transport problem, so that the SCSI layer and multipathd add the paths back.
- Repeat.

We run this test for several hours.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Applied patch.

It doesn't look as if this will affect the DM-Multipath document, so I'm marking this as docs_scoped-.

Hi Ben, we seem to have customers hitting this with RHEL 6.3. Would it be possible to add this to a z-stream?

@mchristie, could the customer help test against the fixed version? The developer can provide a test build. QE couldn't trigger the issue. Could you share your multipath configuration and the I/O workload you ran? More detailed information would be welcome, thanks.

For multipath.conf we used:

```
devices {
	device {
		vendor "FUSIONIO"
		features "3 queue_if_no_path pg_init_retries 50"
		hardware_handler "1 alua"
		path_grouping_policy group_by_prio
		path_selector "queue-length 0"
		failback immediate
		path_checker tur
		prio alua
		fast_io_fail_tmo 15
		dev_loss_tmo 60
	}
}
```

For I/O we just ran fio:

```
fio --filename=/dev/mapper/mpathaj --bs=256K --size=5G --name=mpathaj \
    --refill_buffers --iodepth=128 --iodepth_batch=128 --numjobs=16 --thread \
    --rw=randwrite --time_based --runtime=13d --ioengine=libaio
```

If you can provide a test build, we can test here.

(In reply to mchristie from comment #14)
> If you can provide a test build we can test here.

Hi, below is a test build provided by the developer: http://people.redhat.com/~bmarzins/device-mapper-multipath/rpms/RHEL6/x86_64/
You can use the -65 or -71 build; both are fixed versions. Thanks for your testing!

Code review done against device-mapper-multipath-0.4.9-72.el6; verified that the patch is applied correctly.

It works for me here. Not seeing the segfault. Thanks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1574.html