Bug 1069584
| Summary: | multipathd core dumped if the fast_io_fail_tmo is set but no value is provided. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Xiaowei Li <xiaoli> | ||||||
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | yanfu,wang <yanwang> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 7.0 | CC: | agk, bdonahue, bmarzins, heinzm, jcastillo, msnitzer, prajnoha, qcai, zkabelac | ||||||
| Target Milestone: | rc | Keywords: | Triaged | ||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | device-mapper-multipath-0.4.9-68.el7 | Doc Type: | Bug Fix | ||||||
| Doc Text: |
Cause: Multipath wasn't checking whether the pointer for the fast_io_fail_tmo value was NULL before using it as a string.
Consequence: Multipath would crash while loading its configuration if the fast_io_fail_tmo option was added to /etc/multipath.conf with no value.
Fix: Multipath now checks these pointers before trying to use them.
Result: Multipath no longer crashes if users add the fast_io_fail_tmo option to /etc/multipath.conf and forget to add a value.
|
Story Points: | --- | ||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2015-03-05 08:25:38 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
Created attachment 867376 [details]
proposed patch v1
Created attachment 867494 [details]
proposed patch v2
Xiaowei, I've created a RHEL 7 x86_64 build with the latest version of the patch. It is here:

https://brewweb.devel.redhat.com/taskinfo?taskID=7105361

In case you want to try it. When I ran some tests, I got:

    [root@client abrt]# multipathd reconfigure
    fail
    [root@client abrt]# systemctl status multipathd.service
    multipathd.service - Device-Mapper Multipath Device Controller
       Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled)
       Active: active (running) since Tue 2014-02-25 19:11:26 CET; 30s ago
      Process: 4104 ExecStart=/sbin/multipathd (code=exited, status=0/SUCCESS)
      Process: 4103 ExecStartPre=/sbin/modprobe dm-multipath (code=exited, status=0/SUCCESS)
     Main PID: 4108 (multipathd)
       CGroup: /system.slice/multipathd.service
               └─4108 /sbin/multipathd

    Feb 25 19:11:25 client.example.org modprobe[4103]: Executing: /sbin/modprobe dm-multipath
    Feb 25 19:11:26 client.example.org multipathd[4104]: Executing: /sbin/multipathd
    Feb 25 19:11:44 client.example.org multipathd[4108]: path checkers start up
    Feb 25 19:11:44 client.example.org multipathd[4108]: reconfigure (operator)
    Feb 25 19:11:44 client.example.org multipathd[4108]: error parsing config file

(In reply to Jose Castillo from comment #2)
> I think the patch attached solves the problem for both fast_io_fail_tmo and
> dev_loss_tmo. Xiaowei, did you have any problems with dev_loss_tmo as well?
>
> The patch is based on the same approach for the options "rr_min_io_rq" and
> "flush_on_last_del", but let's see if Ben thinks it is correct.

when testing dev_loss_tmo, I got the following results. please let me know if it's the same issue and if you want me to file another bug for dev_loss_tmo.

1. do not provide a value for dev_loss_tmo -- pass

for example:

    defaults {
        user_friendly_names yes
        dev_loss_tmo
    }

    # multipathd reconfig
    fail
    # service multipathd status
    >>> snip >>>
    : error parsing config file
    >>> snip >>>

2. provide an invalid string value for dev_loss_tmo -- fail, coredump

for example:

    defaults {
        user_friendly_names yes
        dev_loss_tmo aaa
    }

    Core was generated by `/sbin/multipathd'.
    Program terminated with signal 11, Segmentation fault.
    #0  __GI___libc_free (mem=0x63687461706d) at malloc.c:2903
    2903      if (chunk_is_mmapped(p))    /* release mmapped memory. */
    (gdb) bt
    #0  __GI___libc_free (mem=0x63687461706d) at malloc.c:2903
    #1  0x00007fca831f4a9a in xfree (string=<optimized out>) at ../xfree.c:49
    #2  0x00007fca82b7d09e in free_config (conf=conf@entry=0x7fca7400f920) at config.c:473
    #3  0x00007fca83a8f033 in reconfigure (vecs=0x7fca840acf00) at main.c:1418
    #4  0x00007fca83a92157 in parse_cmd (cmd=cmd@entry=0x7fca7400cee0 "reconfig \n", reply=reply@entry=0x7fca83a3dd58, len=len@entry=0x7fca83a3dd40, data=data@entry=0x7fca840acf00) at cli.c:399
    #5  0x00007fca83a8d7f5 in uxsock_trigger (str=0x7fca7400cee0 "reconfig \n", reply=reply@entry=0x7fca83a3dd58, len=len@entry=0x7fca83a3dd40, trigger_data=trigger_data@entry=0x7fca840acf00) at main.c:744
    #6  0x00007fca83a91052 in uxsock_listen (uxsock_trigger=uxsock_trigger@entry=0x7fca83a8d7b0 <uxsock_trigger>, trigger_data=trigger_data@entry=0x7fca840acf00) at uxlsnr.c:168
    #7  0x00007fca83a8e19f in uxlsnrloop (ap=0x7fca840acf00) at main.c:905
    #8  0x00007fca83650df3 in start_thread (arg=0x7fca83a3e700) at pthread_create.c:308
    #9  0x00007fca8247e39d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    (gdb) l
    2898      if (mem == 0)    /* free(0) has no effect */
    2899        return;
    2900
    2901      p = mem2chunk(mem);
    2902
    2903      if (chunk_is_mmapped(p))    /* release mmapped memory. */
    2904        {
    2905          /* see if the dynamic brk/mmap threshold needs adjusting */
    2906          if (!mp_.no_dyn_threshold
    2907              && p->size > mp_.mmap_threshold

also hit a similar core dump when passing an invalid string value to fast_io_fail_tmo.

(In reply to Jose Castillo from comment #6)
> Xiaowei, I've created a RHEL 7 x86_64 build with the latest version of the
> patch. It is here:
>
> https://brewweb.devel.redhat.com/taskinfo?taskID=7105361
>
> In case you want to try it. When I ran some tests, I got:
>
> [root@client abrt]# multipathd reconfigure
> fail
> [root@client abrt]# systemctl status multipathd.service
> multipathd.service - Device-Mapper Multipath Device Controller
> Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled)
> Active: active (running) since Tue 2014-02-25 19:11:26 CET; 30s ago
> Process: 4104 ExecStart=/sbin/multipathd (code=exited, status=0/SUCCESS)
> Process: 4103 ExecStartPre=/sbin/modprobe dm-multipath (code=exited,
> status=0/SUCCESS)
> Main PID: 4108 (multipathd)
> CGroup: /system.slice/multipathd.service
> └─4108 /sbin/multipathd
>
> Feb 25 19:11:25 client.example.org modprobe[4103]: Executing: /sbin/modprobe
> dm-multipath
> Feb 25 19:11:26 client.example.org multipathd[4104]: Executing:
> /sbin/multipathd
> Feb 25 19:11:44 client.example.org multipathd[4108]: path checkers start up
> Feb 25 19:11:44 client.example.org multipathd[4108]: reconfigure (operator)
> Feb 25 19:11:44 client.example.org multipathd[4108]: error parsing config
> file

tested the rpm. the original issue was fixed. but we still have the following issues:

1. the patch cannot fix the issue when fast_io_fail_tmo is in the device section rather than the defaults section

    devices {
        device {
            vendor "DGC"
            product ".*"
            product_blacklist "LUNZ"
            fast_io_fail_tmo
        }
    }

2. the patch cannot fix the issue in comment 7.

also, let me know if you want me to file a new bug to track the new issues, since your patch fixes the original issue.

(In reply to Xiaowei Li from comment #9)
> tested the rpm. the original issue was fixed. but we still have the
> following issues:
>
> 1. the patch cannot fix the issue when fast_io_fail_tmo is in the device
> section rather than the defaults section
>
> devices {
> device {
> vendor "DGC"
> product ".*"
> product_blacklist "LUNZ"
> fast_io_fail_tmo
> }
>
> }
>

This may be a new, different problem, but see below first.

> 2. the patch cannot fix the issue in comment 7.
>

Were you using the patched version or the original one in this test? I ask because I have the feeling that it is the patch itself that causes the coredump in comment #7.

> also, let me know if you want me to file a new bug to track the new issues,
> since your patch fixes the original issue.

File a new bugzilla only if the test in #7 was done with the original multipath package, not the patched one. If that is not the case, then I'll fix the patch to avoid this problem.

since it's not a blocker and RHEL 7.0 is going to RC we should move it to 7.1.

Patch applied. Thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0367.html
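Comment 7 reports a second failure mode: a value that is present but is not a valid timeout (for example `aaa`). The thread does not show what the applied patch does in that case, so the following is only a minimal sketch of one way to reject such values before anything is stored. The function name `parse_timeout` and the sentinel `TMO_OFF` are hypothetical and are not taken from the multipath-tools source; accepting the literal string "off" is likewise an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define TMO_OFF (-1)  /* hypothetical sentinel meaning "off" */

/*
 * Parse a timeout value into *out.
 * Returns 0 on success and 1 on a configuration error; *out is left
 * untouched in the error case so the caller never stores a bad value.
 */
int parse_timeout(const char *value, int *out)
{
    char *end = NULL;
    long n;

    /* A keyword with no value arrives as NULL (or an empty string). */
    if (value == NULL || out == NULL || *value == '\0')
        return 1;

    if (strcmp(value, "off") == 0) {
        *out = TMO_OFF;
        return 0;
    }

    errno = 0;
    n = strtol(value, &end, 10);

    /* Reject non-numeric text ("aaa"), trailing junk ("5x"),
     * negative numbers, and values that do not fit in an int. */
    if (end == value || *end != '\0' || errno == ERANGE || n < 0 || n > INT_MAX)
        return 1;

    *out = (int)n;
    return 0;
}
```

A caller would treat a return value of 1 as "error parsing config file" and keep the previous configuration, which matches the behaviour reported for the missing-value case with the patched build.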
Description of problem:

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-64.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. In multipath.conf, set fast_io_fail_tmo but forget to provide the value.

    defaults {
        user_friendly_names yes
        fast_io_fail_tmo
    }

2. multipathd reconfig

    error receiving packet

3.

Actual results:

    Feb 25 05:48:52 sun-x2270m2-01 kernel: [  689.330054] multipathd[1592]: segfault at 0 ip 00007f14ea615a61 sp 00007f14ebb68ae8 error 4 in libc-2.17.so[7f14ea4b3000+1b6000]
    Feb 25 05:48:52 sun-x2270m2-01 kernel: multipathd[1592]: segfault at 0 ip 00007f14ea615a61 sp 00007f14ebb68ae8 error 4 in libc-2.17.so[7f14ea4b3000+1b6000]
    Feb 25 05:48:52 sun-x2270m2-01 abrt-hook-ccpp: Saved core dump of pid 1589 (/usr/sbin/multipathd) to /var/tmp/abrt/ccpp-2014-02-25-05:48:52-1589 (3534848 bytes)

    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `/sbin/multipathd'.
    Program terminated with signal 11, Segmentation fault.
    #0  __strlen_sse2_pminub () at ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:38
    38          movdqu  (%rdi), %xmm1
    (gdb) bt
    #0  __strlen_sse2_pminub () at ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:38
    #1  0x00007f0476634a32 in hw_fast_io_fail_handler (strvec=<optimized out>) at dict.c:964
    #2  0x00007f047662673c in process_stream (keywords=0x7f0468028fb0) at parser.c:515
    #3  0x00007f0476626756 in process_stream (keywords=0x7f0468038550) at parser.c:519
    #4  0x00007f0476626756 in process_stream (keywords=0x7f0468038530) at parser.c:519
    #5  0x00007f04766267ec in init_data (conf_file=conf_file@entry=0x7f0477543b2b "/etc/multipath.conf", init_keywords=0x7f0476637540 <init_keywords>) at parser.c:571
    #6  0x00007f047662c3b6 in load_config (file=file@entry=0x7f0477543b2b "/etc/multipath.conf", udev=<optimized out>) at config.c:581
    #7  0x00007f047753e006 in reconfigure (vecs=0x7f047796f000) at main.c:1414
    #8  0x00007f0477541157 in parse_cmd (cmd=cmd@entry=0x7f04680827d0 "reconfig \n", reply=reply@entry=0x7f04774ecd58, len=len@entry=0x7f04774ecd40, data=data@entry=0x7f047796f000) at cli.c:399
    #9  0x00007f047753c7f5 in uxsock_trigger (str=0x7f04680827d0 "reconfig \n", reply=reply@entry=0x7f04774ecd58, len=len@entry=0x7f04774ecd40, trigger_data=trigger_data@entry=0x7f047796f000) at main.c:744
    #10 0x00007f0477540052 in uxsock_listen (uxsock_trigger=uxsock_trigger@entry=0x7f047753c7b0 <uxsock_trigger>, trigger_data=trigger_data@entry=0x7f047796f000) at uxlsnr.c:168
    #11 0x00007f047753d19f in uxlsnrloop (ap=0x7f047796f000) at main.c:905
    #12 0x00007f04770ffdf3 in start_thread (arg=0x7f04774ed700) at pthread_create.c:308
    #13 0x00007f0475f2d39d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    (gdb) l
    33          mov     %edi, %ecx
    34          and     $0x3f, %ecx
    35          pxor    %xmm0, %xmm0
    36          cmp     $0x30, %ecx
    37          ja      L(next)
    38          movdqu  (%rdi), %xmm1
    39          pcmpeqb %xmm1, %xmm0
    40          pmovmskb %xmm0, %edx
    41          test    %edx, %edx
    42          jnz     L(exit_less16)

Expected results:

Additional info:
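The backtrace shows strlen() being reached from hw_fast_io_fail_handler with a NULL argument ("segfault at 0"), which matches the Doc Text: the value pointer was used as a string without a NULL check. The following is a minimal sketch of that kind of guard, not the actual patch. The `struct tokens` type and the functions `keyword_value` and `handle_keyword` are hypothetical stand-ins invented for illustration; the real handler works on multipath's own `strvec` type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parsed token list of one config line. */
struct tokens {
    const char **items;  /* items[0] is the keyword, items[1] its value */
    size_t count;        /* number of entries in items */
};

/* Return the value token, or NULL when the keyword has no value. */
static const char *keyword_value(const struct tokens *t)
{
    if (t == NULL || t->count < 2)
        return NULL;
    return t->items[1];
}

/*
 * Sketch of a keyword handler.
 * Returns 0 and stores the value length in *len when a value is present,
 * or 1 (configuration error) when the value is missing.
 */
int handle_keyword(const struct tokens *t, size_t *len)
{
    const char *value = keyword_value(t);

    /* Check before calling strlen(): strlen(NULL) is undefined behaviour
     * and is what the backtrace above shows crashing. */
    if (value == NULL)
        return 1;

    *len = strlen(value);
    return 0;
}
```

With a guard like this, a bare `fast_io_fail_tmo` line produces a parse error that the caller can report, instead of a segmentation fault inside libc.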