Bug 1459370
| Summary: | Segfault when reading all_devs device with no_path_retry 4 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nir Soffer <nsoffer> | ||||
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Lin Li <lilin> | ||||
| Severity: | urgent | Docs Contact: | Marek Suchánek <msuchane> | ||||
| Priority: | urgent | ||||||
| Version: | 7.3 | CC: | agk, bmarzins, bmcclain, heinzm, jbrassow, lilin, lmiksik, loberman, mgandhi, msnitzer, prajnoha, rhandlin, vanhoof, ylavi | ||||
| Target Milestone: | rc | Keywords: | ZStream | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | device-mapper-multipath-0.4.9-112.el7 | Doc Type: | Bug Fix | ||||
| Doc Text: |
DM Multipath no longer crashes when adding a feature to an empty string
Previously, the DM Multipath service terminated unexpectedly when it attempted to add a feature to the features string of a built-in device configuration that had no features string. With this update, DM Multipath first checks if the features string exists, and creates one if necessary. As a result, DM Multipath no longer crashes when trying to modify a nonexistent features string.
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 1510837 (view as bug list) | Environment: | |||||
| Last Closed: | 2018-04-10 16:10:28 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1298243, 1420851, 1469559, 1510837 | ||||||
| Attachments: |
|
||||||
Can you try the rpms at http://download-node-02.eng.bos.redhat.com/brewroot/scratch/bmarzins/task_13374131/ and see if they fix the issue?

There is a bug in add_feature when trying to add a new feature to a device configuration that doesn't already have a features string.

Created attachment 1286465 [details]
Output of "multipathd show config" with the scratch build
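The failure mode described above (appending a feature to a features string that does not exist yet) can be illustrated with a minimal sketch. This is not the libmultipath code: add_feature_sketch and has_word are hypothetical names, and the only assumption carried over from the report is that a features string has the form "<count> <word> <word> ..." and may be NULL for some built-in device configurations. The point of the sketch is the guard that runs before any strstr() call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if w occurs in s as a whole space-delimited word. */
static int has_word(const char *s, const char *w)
{
    size_t wl = strlen(w);
    const char *p = s;

    while ((p = strstr(p, w)) != NULL) {
        int start_ok = (p == s) || (p[-1] == ' ');
        int end_ok = (p[wl] == '\0') || (p[wl] == ' ');
        if (start_ok && end_ok)
            return 1;
        p++;
    }
    return 0;
}

/*
 * Hypothetical sketch, not the libmultipath implementation.
 * Appends the single-word feature n to the heap-allocated features
 * string *f ("<count> <word> ..."). A NULL or empty *f is treated as
 * having no features, which is the case that crashed in this report.
 * Returns 0 on success, 1 on bad input or allocation failure.
 */
int add_feature_sketch(char **f, const char *n)
{
    char *rest, *out;
    long count;
    size_t len;

    assert(f != NULL);
    if (n == NULL || *n == '\0' || strchr(n, ' ') != NULL)
        return 1;

    /* Guard: never pass a NULL haystack to strstr(). */
    if (*f == NULL || **f == '\0') {
        len = strlen(n) + 3;            /* "1 " + n + NUL */
        out = malloc(len);
        if (out == NULL)
            return 1;
        snprintf(out, len, "1 %s", n);
        free(*f);
        *f = out;
        return 0;
    }

    /* Split off the leading count, then look at the words only. */
    count = strtol(*f, &rest, 10);
    if (rest == *f || count < 0)
        return 1;
    while (*rest == ' ')
        rest++;
    if (has_word(rest, n))
        return 0;                       /* already present, nothing to do */

    len = strlen(rest) + strlen(n) + 32; /* room for count and separators */
    out = malloc(len);
    if (out == NULL)
        return 1;
    if (*rest != '\0')
        snprintf(out, len, "%ld %s %s", count + 1, rest, n);
    else
        snprintf(out, len, "%ld %s", count + 1, n);
    free(*f);
    *f = out;
    return 0;
}
```

Called with a NULL features pointer and "queue_if_no_path", the sketch allocates "1 queue_if_no_path" instead of dereferencing NULL. The caller owns the resulting string and must free() it.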
Great. We're in the blockers only phase of rhel-7.4, so how urgent is this bugfix for you?

(In reply to Ben Marzinski from comment #5)
> Great. We're in the blockers only phase of rhel-7.4, so how urgent is this
> bugfix for you?

This can wait until 7.4.z.

(In reply to Yaniv Lavi from comment #6)
> (In reply to Ben Marzinski from comment #5)
> > Great. We're in the blockers only phase of rhel-7.4, so how urgent is this
> > bugfix for you?
>
> This can wait to 7.4.z.

This configuration was tested by RHV QE on 2016-07-28:
https://bugzilla.redhat.com/show_bug.cgi?id=1335176#c31

We have been recommending the "no_path_retry 4" option for about a year on the users mailing list:
http://lists.ovirt.org/pipermail/users/2016-August/041949.html

So this seems to be a regression in 7.3.

I don't know about customer cases yet, but I don't think we should wait for them.

I would like this fix in 7.3.z.

(In reply to Nir Soffer from comment #7)
> (In reply to Yaniv Lavi from comment #6)
> > (In reply to Ben Marzinski from comment #5)
> > > Great. We're in the blockers only phase of rhel-7.4, so how urgent is this
> > > bugfix for you?
> >
> > This can wait to 7.4.z.
>
> This configuration was tested by RHV QE on 2016-07-28:
> https://bugzilla.redhat.com/show_bug.cgi?id=1335176#c31
>
> We are recommending the "no_path_retry 4" option for about a year in the
> users
> mailing list:
> http://lists.ovirt.org/pipermail/users/2016-August/041949.html
>
> So this seems to be a regression in 7.3.
>
> I don't know about customers cases yet, but I don't think we should wait for
> them.
>
> I would like this fix in 7.3.z.

Nir is correct. My comment was under the assumption that this isn't a regression.
Bronce, can you mark as blocker?

For what it's worth, this isn't a regression from rhel-7.3. It was broken there too. It was working in rhel-7.2, however.

*** Bug 1462134 has been marked as a duplicate of this bug. ***

multipath wasn't correctly adding features to a configuration if the current features string was NULL. It now handles this correctly.

Reproduced on device-mapper-multipath-0.4.9-111.el7
1, # rpm -qa | grep multipath
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64
2, edit /etc/multipath.conf
# cat /etc/multipath.conf
defaults {
    polling_interval 5
    no_path_retry 4
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}
devices {
    device {
        all_devs yes
        no_path_retry 4
    }
}
3, # multipath -ll
Segmentation fault
4, # service multipathd reload
Redirecting to /bin/systemctl reload multipathd.service
Job for multipathd.service failed because a fatal signal was delivered to the control process. See "systemctl status multipathd.service" and "journalctl -xe" for details.
5,# systemctl status multipathd.service
● multipathd.service - Device-Mapper Multipath Device Controller
Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Tue 2017-11-14 07:09:22 EST; 42s ago
Process: 13257 ExecReload=/sbin/multipathd reconfigure (code=exited, status=0/SUCCESS)
Main PID: 1189 (code=killed, signal=SEGV)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c56786f: stop event chec...84)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567871: stop event chec...16)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567873: stop event chec...48)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567875: stop event chec...80)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[13257]: error receiving packet
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: multipathd.service: main process exited, code=killed, s...SEGV
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: PID 1189 read from file /run/multipathd/multipathd.pid ...bie.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: Reload failed for Device-Mapper Multipath Device Controller.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: Unit multipathd.service entered failed state.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: multipathd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
6, # journalctl -xe
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com polkitd[1521]: Registered Authentication Agent for unix-process:13241:655805
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: reconfigure (operator)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c56786d: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c56786f: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567871: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567873: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567875: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com kernel: multipathd[1222]: segfault at 0 ip 00007f6091eb2f5b sp 00007f6093431
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[13257]: error receiving packet
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: multipathd.service: main process exited, code=killed, status=11/
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: PID 1189 read from file /run/multipathd/multipathd.pid does not
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: Reload failed for Device-Mapper Multipath Device Controller.
-- Subject: Unit multipathd.service has finished reloading its configuration
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit multipathd.service has finished reloading its configuration
--
-- The result is failed.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: Unit multipathd.service entered failed state.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: multipathd.service failed.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com polkitd[1521]: Unregistered Authentication Agent for unix-process:13241:6558
Nov 14 07:09:36 storageqe-06.rhts.eng.bos.redhat.com kernel: multipath[13271]: segfault at 0 ip 00007fb9904d1f5b sp 00007ffde827d
Verified on device-mapper-multipath-0.4.9-116
1, # rpm -qa | grep multipath
device-mapper-multipath-libs-0.4.9-116.el7.x86_64
device-mapper-multipath-devel-0.4.9-116.el7.x86_64
device-mapper-multipath-debuginfo-0.4.9-116.el7.x86_64
device-mapper-multipath-0.4.9-116.el7.x86_64
device-mapper-multipath-sysvinit-0.4.9-116.el7.x86_64
2, edit /etc/multipath.conf
# cat /etc/multipath.conf
defaults {
    polling_interval 5
    no_path_retry 4
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}
devices {
    device {
        all_devs yes
        no_path_retry 4
    }
}
3, # multipath -ll
360a98000324669436c2b45666c56786d dm-0 NETAPP ,LUN
size=20G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:0 sdg 8:96 active ready running
| `- 4:0:0:0 sdl 8:176 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:0 sdb 8:16 active ready running
`- 4:0:1:0 sdq 65:0 active ready running
360a98000324669436c2b45666c567875 dm-8 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:4 sdk 8:160 active ready running
| `- 4:0:0:4 sdp 8:240 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:4 sdf 8:80 active ready running
`- 4:0:1:4 sdu 65:64 active ready running
360a98000324669436c2b45666c567873 dm-7 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:3 sdj 8:144 active ready running
| `- 4:0:0:3 sdo 8:224 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:3 sde 8:64 active ready running
`- 4:0:1:3 sdt 65:48 active ready running
360a98000324669436c2b45666c567871 dm-6 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:2 sdi 8:128 active ready running
| `- 4:0:0:2 sdn 8:208 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:2 sdd 8:48 active ready running
`- 4:0:1:2 sds 65:32 active ready running
360a98000324669436c2b45666c56786f dm-5 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:1 sdh 8:112 active ready running
| `- 4:0:0:1 sdm 8:192 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:1 sdc 8:32 active ready running
`- 4:0:1:1 sdr 65:16 active ready running
4, # service multipathd reload
Reloading multipathd configuration (via systemctl): [ OK ]
5, # multipath -ll
360a98000324669436c2b45666c56786d dm-0 NETAPP ,LUN
size=20G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:0 sdg 8:96 active ready running
| `- 4:0:0:0 sdl 8:176 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:0 sdb 8:16 active ready running
`- 4:0:1:0 sdq 65:0 active ready running
360a98000324669436c2b45666c567875 dm-8 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:4 sdk 8:160 active ready running
| `- 4:0:0:4 sdp 8:240 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:4 sdf 8:80 active ready running
`- 4:0:1:4 sdu 65:64 active ready running
360a98000324669436c2b45666c567873 dm-7 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:3 sdj 8:144 active ready running
| `- 4:0:0:3 sdo 8:224 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:3 sde 8:64 active ready running
`- 4:0:1:3 sdt 65:48 active ready running
360a98000324669436c2b45666c567871 dm-6 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:2 sdi 8:128 active ready running
| `- 4:0:0:2 sdn 8:208 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:2 sdd 8:48 active ready running
`- 4:0:1:2 sds 65:32 active ready running
360a98000324669436c2b45666c56786f dm-5 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:1 sdh 8:112 active ready running
| `- 4:0:0:1 sdm 8:192 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:1 sdc 8:32 active ready running
`- 4:0:1:1 sdr 65:16 active ready running
Test result:
multipath no longer crashes when trying to modify the features string of built-in device configurations with no feature string.
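The features strings in the verified output (for example '4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle') start with a count followed by that many space-separated words. As a rough sanity check on such output, the sketch below compares the declared count against the number of words that follow. The function name features_count_matches is hypothetical and is not part of multipath-tools.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of multipath-tools.
 * Returns 1 if the leading number in a features string equals the
 * number of whitespace-separated words after it, 0 otherwise
 * (including for NULL input or a missing count).
 */
int features_count_matches(const char *features)
{
    char *rest;
    long declared;
    long words = 0;
    int in_word = 0;
    const char *p;

    if (features == NULL)
        return 0;

    declared = strtol(features, &rest, 10);
    if (rest == features || declared < 0)
        return 0;                       /* no usable leading count */
    if (*rest != '\0' && !isspace((unsigned char)*rest))
        return 0;                       /* count must end at a separator */

    /* Count the words that follow the declared count. */
    for (p = rest; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words == declared;
}
```

For the string shown in the output above, the four words after the count are queue_if_no_path, pg_init_retries, 50, and retain_attached_hw_handle, so the check passes. Note that the argument to pg_init_retries counts as a word of its own.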
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0884
Description of problem:

Using this multipath.conf:

# cat /etc/multipath.conf
# VDSM REVISION 1.4

defaults {
    polling_interval 5
    no_path_retry 4
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}

devices {
    device {
        all_devs yes
        no_path_retry 4
    }
}

# multipath -ll
Segmentation fault (core dumped)

# gdb multipath
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/multipath...Reading symbols from /usr/sbin/multipath...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install device-mapper-multipath-0.4.9-99.el7_3.1.x86_64
(gdb) run -ll
Starting program: /usr/sbin/multipath -ll
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff727b4ab in __strstr_sse42 () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff727b4ab in __strstr_sse42 () from /lib64/libc.so.6
#1 0x00007ffff751ff1c in add_feature () from /lib64/libmultipath.so.0
#2 0x00007ffff751e2e6 in factorize_hwtable () from /lib64/libmultipath.so.0
#3 0x00007ffff751efc8 in load_config () from /lib64/libmultipath.so.0
#4 0x0000000000402009 in main ()

Version-Release number of selected component (if applicable):

# rpm -qa | grep device-mapper
device-mapper-libs-1.02.135-1.el7_3.3.x86_64
device-mapper-persistent-data-0.6.3-1.el7.x86_64
device-mapper-multipath-libs-0.4.9-99.el7_3.1.x86_64
device-mapper-event-libs-1.02.135-1.el7_3.3.x86_64
device-mapper-1.02.135-1.el7_3.3.x86_64
device-mapper-event-1.02.135-1.el7_3.3.x86_64
device-mapper-multipath-0.4.9-99.el7_3.1.x86_64

How reproducible:
Always

When no_path_retry in the all_devs device is set to fail, multipath works normally.
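One reading that is consistent with the report (the crash appears with "no_path_retry 4" but not with "fail", and the backtrace passes through add_feature) is that a queueing value of no_path_retry causes the queue_if_no_path feature to be added, while "fail" does not. The sketch below models that assumption only. All names in it are hypothetical, the control flow is a guess rather than the libmultipath source, and the queueing-window estimate (retries multiplied by polling_interval) is a rough approximation rather than a documented guarantee.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the no_path_retry values accepted in multipath.conf. */
enum retry_kind { RETRY_UNDEF, RETRY_FAIL, RETRY_QUEUE, RETRY_COUNT };

struct retry_setting {
    enum retry_kind kind;
    int count;                  /* only meaningful for RETRY_COUNT */
};

/* Parse "fail", "queue", or a positive integer; anything else is UNDEF. */
struct retry_setting parse_no_path_retry(const char *v)
{
    struct retry_setting s = { RETRY_UNDEF, 0 };
    char *end;
    long n;

    if (v == NULL)
        return s;
    if (strcmp(v, "fail") == 0) {
        s.kind = RETRY_FAIL;
        return s;
    }
    if (strcmp(v, "queue") == 0) {
        s.kind = RETRY_QUEUE;
        return s;
    }
    n = strtol(v, &end, 10);
    if (end != v && *end == '\0' && n > 0 && n <= INT_MAX) {
        s.kind = RETRY_COUNT;
        s.count = (int)n;
    }
    return s;
}

/*
 * Assumption: only queueing settings need queue_if_no_path in the
 * features string, so only they reach the code path that crashed.
 */
int needs_queue_feature(struct retry_setting s)
{
    return s.kind == RETRY_QUEUE || s.kind == RETRY_COUNT;
}

/*
 * Rough estimate of how long I/O is queued after all paths fail:
 * 0 for fail/undefined, -1 for unbounded queueing, otherwise
 * count * polling_interval seconds.
 */
int approx_queue_seconds(struct retry_setting s, int polling_interval)
{
    if (s.kind == RETRY_QUEUE)
        return -1;
    if (s.kind == RETRY_COUNT)
        return s.count * polling_interval;
    return 0;
}
```

Under these assumptions, the configuration in this report (no_path_retry 4, polling_interval 5) would queue I/O for roughly 20 seconds before failing it, and would need the queue_if_no_path feature, whereas "fail" would need no feature added at all.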