Bug 1459370
Summary: | Segfault when reading all_devs device with no_path_retry 4 | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Nir Soffer <nsoffer>
Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins>
Status: | CLOSED ERRATA | QA Contact: | Lin Li <lilin>
Severity: | urgent | Docs Contact: | Marek Suchánek <msuchane>
Priority: | urgent | |
Version: | 7.3 | CC: | agk, bmarzins, bmcclain, heinzm, jbrassow, lilin, lmiksik, loberman, mgandhi, msnitzer, prajnoha, rhandlin, vanhoof, ylavi
Target Milestone: | rc | Keywords: | ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | device-mapper-multipath-0.4.9-112.el7 | Doc Type: | Bug Fix
Doc Text: | DM Multipath no longer crashes when adding a feature to an empty string. Previously, the DM Multipath service terminated unexpectedly when it attempted to add a feature to the features string of a built-in device configuration that had no features string. With this update, DM Multipath first checks if the features string exists, and creates one if necessary. As a result, DM Multipath no longer crashes when trying to modify a nonexistent features string. | |
Story Points: | --- | |
Clone Of: | | |
: | 1510837 | Environment: |
Last Closed: | 2018-04-10 16:10:28 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1298243, 1420851, 1469559, 1510837 | |
Attachments: | | |
Description
Nir Soffer
2017-06-07 00:37:30 UTC
Can you try the rpms at http://download-node-02.eng.bos.redhat.com/brewroot/scratch/bmarzins/task_13374131/ and see if they fix the issue? There is a bug in add_features when trying to add a new feature to a device configuration that doesn't already have a feature.

Created attachment 1286465
Output of "multipathd show config" with the scratch build

Great. We're in the blockers only phase of rhel-7.4, so how urgent is this bugfix for you?

(In reply to Ben Marzinski from comment #5)
> Great. We're in the blockers only phase of rhel-7.4, so how urgent is this
> bugfix for you?

This can wait until 7.4.z.

(In reply to Yaniv Lavi from comment #6)
> (In reply to Ben Marzinski from comment #5)
> > Great. We're in the blockers only phase of rhel-7.4, so how urgent is this
> > bugfix for you?
>
> This can wait until 7.4.z.

This configuration was tested by RHV QE on 2016-07-28:
https://bugzilla.redhat.com/show_bug.cgi?id=1335176#c31

We have been recommending the "no_path_retry 4" option for about a year on the users mailing list:
http://lists.ovirt.org/pipermail/users/2016-August/041949.html

So this seems to be a regression in 7.3.

I don't know about customer cases yet, but I don't think we should wait for them.

I would like this fix in 7.3.z.

(In reply to Nir Soffer from comment #7)
> (In reply to Yaniv Lavi from comment #6)
> > (In reply to Ben Marzinski from comment #5)
> > > Great. We're in the blockers only phase of rhel-7.4, so how urgent is this
> > > bugfix for you?
> >
> > This can wait until 7.4.z.
>
> This configuration was tested by RHV QE on 2016-07-28:
> https://bugzilla.redhat.com/show_bug.cgi?id=1335176#c31
>
> We have been recommending the "no_path_retry 4" option for about a year on the
> users mailing list:
> http://lists.ovirt.org/pipermail/users/2016-August/041949.html
>
> So this seems to be a regression in 7.3.
>
> I don't know about customer cases yet, but I don't think we should wait for
> them.
>
> I would like this fix in 7.3.z.

Nir is correct. My comment was under the assumption that this isn't a regression. Bronce, can you mark as blocker?

For what it's worth, this isn't a regression from rhel-7.3. It was broken there too. It was working in rhel-7.2, however.

*** Bug 1462134 has been marked as a duplicate of this bug. ***

multipath wasn't correctly adding features to a configuration if the current features string was NULL. It now handles this correctly.

Reproduced on device-mapper-multipath-0.4.9-111.el7

1, # rpm -qa | grep multipath
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64

2, edit /etc/multipath.conf
# cat /etc/multipath.conf
defaults {
    polling_interval 5
    no_path_retry 4
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}

devices {
    device {
        all_devs yes
        no_path_retry 4
    }
}

3, # multipath -ll
Segmentation fault

4, # service multipathd reload
Redirecting to /bin/systemctl reload multipathd.service
Job for multipathd.service failed because a fatal signal was delivered to the control process. See "systemctl status multipathd.service" and "journalctl -xe" for details.

5, # systemctl status multipathd.service
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Tue 2017-11-14 07:09:22 EST; 42s ago
  Process: 13257 ExecReload=/sbin/multipathd reconfigure (code=exited, status=0/SUCCESS)
 Main PID: 1189 (code=killed, signal=SEGV)

Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c56786f: stop event chec...84)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567871: stop event chec...16)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567873: stop event chec...48)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567875: stop event chec...80)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[13257]: error receiving packet
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: multipathd.service: main process exited, code=killed, s...SEGV
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: PID 1189 read from file /run/multipathd/multipathd.pid ...bie.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: Reload failed for Device-Mapper Multipath Device Controller.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: Unit multipathd.service entered failed state.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: multipathd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

6, # journalctl -xe
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com polkitd[1521]: Registered Authentication Agent for unix-process:13241:655805
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: reconfigure (operator)
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c56786d: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c56786f: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567871: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567873: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[1189]: 360a98000324669436c2b45666c567875: stop event checker thre
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com kernel: multipathd[1222]: segfault at 0 ip 00007f6091eb2f5b sp 00007f6093431
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com multipathd[13257]: error receiving packet
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: multipathd.service: main process exited, code=killed, status=11/
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: PID 1189 read from file /run/multipathd/multipathd.pid does not
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: Reload failed for Device-Mapper Multipath Device Controller.
-- Subject: Unit multipathd.service has finished reloading its configuration
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit multipathd.service has finished reloading its configuration
--
-- The result is failed.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: Unit multipathd.service entered failed state.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com systemd[1]: multipathd.service failed.
Nov 14 07:09:22 storageqe-06.rhts.eng.bos.redhat.com polkitd[1521]: Unregistered Authentication Agent for unix-process:13241:6558
Nov 14 07:09:36 storageqe-06.rhts.eng.bos.redhat.com kernel: multipath[13271]: segfault at 0 ip 00007fb9904d1f5b sp 00007ffde827d

Verified on device-mapper-multipath-0.4.9-116

1, # rpm -qa | grep multipath
device-mapper-multipath-libs-0.4.9-116.el7.x86_64
device-mapper-multipath-devel-0.4.9-116.el7.x86_64
device-mapper-multipath-debuginfo-0.4.9-116.el7.x86_64
device-mapper-multipath-0.4.9-116.el7.x86_64
device-mapper-multipath-sysvinit-0.4.9-116.el7.x86_64

2, edit /etc/multipath.conf
# cat /etc/multipath.conf
defaults {
    polling_interval 5
    no_path_retry 4
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}

devices {
    device {
        all_devs yes
        no_path_retry 4
    }
}

3, # multipath -ll
360a98000324669436c2b45666c56786d dm-0 NETAPP ,LUN
size=20G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:0 sdg 8:96 active ready running
| `- 4:0:0:0 sdl 8:176 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:0 sdb 8:16 active ready running
  `- 4:0:1:0 sdq 65:0 active ready running
360a98000324669436c2b45666c567875 dm-8 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:4 sdk 8:160 active ready running
| `- 4:0:0:4 sdp 8:240 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:4 sdf 8:80 active ready running
  `- 4:0:1:4 sdu 65:64 active ready running
360a98000324669436c2b45666c567873 dm-7 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:3 sdj 8:144 active ready running
| `- 4:0:0:3 sdo 8:224 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:3 sde 8:64 active ready running
  `- 4:0:1:3 sdt 65:48 active ready running
360a98000324669436c2b45666c567871 dm-6 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:2 sdi 8:128 active ready running
| `- 4:0:0:2 sdn 8:208 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:2 sdd 8:48 active ready running
  `- 4:0:1:2 sds 65:32 active ready running
360a98000324669436c2b45666c56786f dm-5 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:1 sdh 8:112 active ready running
| `- 4:0:0:1 sdm 8:192 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:1 sdc 8:32 active ready running
  `- 4:0:1:1 sdr 65:16 active ready running

4, # service multipathd reload
Reloading multipathd configuration (via systemctl): [ OK ]

5, # multipath -ll
360a98000324669436c2b45666c56786d dm-0 NETAPP ,LUN
size=20G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:0 sdg 8:96 active ready running
| `- 4:0:0:0 sdl 8:176 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:0 sdb 8:16 active ready running
  `- 4:0:1:0 sdq 65:0 active ready running
360a98000324669436c2b45666c567875 dm-8 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:4 sdk 8:160 active ready running
| `- 4:0:0:4 sdp 8:240 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:4 sdf 8:80 active ready running
  `- 4:0:1:4 sdu 65:64 active ready running
360a98000324669436c2b45666c567873 dm-7 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:3 sdj 8:144 active ready running
| `- 4:0:0:3 sdo 8:224 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:3 sde 8:64 active ready running
  `- 4:0:1:3 sdt 65:48 active ready running
360a98000324669436c2b45666c567871 dm-6 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:2 sdi 8:128 active ready running
| `- 4:0:0:2 sdn 8:208 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:2 sdd 8:48 active ready running
  `- 4:0:1:2 sds 65:32 active ready running
360a98000324669436c2b45666c56786f dm-5 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:1 sdh 8:112 active ready running
| `- 4:0:0:1 sdm 8:192 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:1 sdc 8:32 active ready running
  `- 4:0:1:1 sdr 65:16 active ready running

Test result: multipath no longer crashes when trying to modify the features string of built-in device configurations with no feature string.

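
The failure mode discussed in this bug (adding a feature when the current features string is NULL) can be illustrated with a small C sketch. This is not the multipath-tools code or the actual patch: add_feature_sketch is a hypothetical helper written for illustration. It assumes the convention visible in the multipath -ll output in this report, where the features string is a word count followed by that many space-separated words (for example '4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle'). It handles only single-word features and does not check for duplicates.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the multipath-tools implementation.
 * Appends one single-word feature to a features string of the form
 * "<count> <word> <word> ...", e.g. "1 queue_if_no_path".
 * A NULL or empty *features is treated as "0" (no features yet).
 * Returns 0 on success, -1 on bad input or allocation failure.
 */
int add_feature_sketch(char **features, const char *feature)
{
    if (features == NULL || feature == NULL || *feature == '\0')
        return -1;

    /* The check this report is about: never parse a NULL string. */
    const char *old =
        (*features != NULL && **features != '\0') ? *features : "0";

    /* Split "<count> <rest>" into the number and the remaining words. */
    char *rest;
    long count = strtol(old, &rest, 10);
    if (rest == old || count < 0)
        return -1;                      /* no valid leading count */
    while (*rest == ' ')
        rest++;

    /* 32 bytes cover the new count, separators, and the terminator. */
    size_t len = 32 + strlen(rest) + strlen(feature);
    char *out = malloc(len);
    if (out == NULL)
        return -1;

    int n;
    if (*rest != '\0')
        n = snprintf(out, len, "%ld %s %s", count + 1, rest, feature);
    else
        n = snprintf(out, len, "%ld %s", count + 1, feature);
    assert(n > 0 && (size_t)n < len);   /* buffer was sized to fit */

    free(*features);                    /* free(NULL) is a no-op */
    *features = out;
    return 0;
}
```

Starting from a NULL pointer, calling the helper with "queue_if_no_path" yields "1 queue_if_no_path" instead of dereferencing NULL; a second call appends another word and bumps the count to 2.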
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0884