Bug 587336
| Summary: | Multipathd exits with -1 in main.c prepare_namespace() | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Shane Bradley <sbradley> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.3 | CC: | agk, bmarzins, bmr, christophe.varoqui, dwysocha, heinzm, junichi.nomura, kueda, lmb, mbroz, prajnoha, prockai, tao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-07-01 19:15:05 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Created attachment 410200
systemtap script to figure out what is causing the error

Run
# stap it800403.stp
When it prints "starting", run
# service multipathd start
If things work correctly, you should see something like
starting
multipathd entered sys_mount for <unknown> on /var/cache/multipathd
entered copy_mount_options for ramfs
exitted copy_mount_options with 0
entered getname for /var/cache/multipathd
exitted getname with -139636435095552
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered copy_mount_options for maxsize=3664746
exitted copy_mount_options with 0
entered do_mount for /var/cache/multipathd
exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /sbin
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered getname for /sbin
exitted getname with -139636435095552
entered copy_mount_options for /var/cache/multipathd
exitted copy_mount_options with 0
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered do_mount for /sbin
exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /bin
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered getname for /bin
exitted getname with -139636435095552
entered copy_mount_options for /var/cache/multipathd
exitted copy_mount_options with 0
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered do_mount for /bin
exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /tmp
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered getname for /tmp
exitted getname with -139636435095552
entered copy_mount_options for /var/cache/multipathd
exitted copy_mount_options with 0
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered do_mount for /tmp
exitted do_mount with 0
multipathd exitted sys_mount with 0
If it fails, you should see some "exitted" lines that return -14 (-EFAULT).
getname() returns a pointer, so its return value on your machine will most likely differ from the one shown here. As long as it doesn't return -14, it should be fine.
Please run this script on the machine that can reproduce the issue, and
copy the results into the bugzilla.
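
To make the "pointer versus -14" distinction above concrete, here is a minimal userspace sketch of the error-pointer convention the kernel uses: a return value that falls within the last few thousand values of the address space is read as a negated errno, and anything else is treated as a real pointer. The function name `looks_like_errno` and the constant `MAX_ERRNO_ASSUMED` are hypothetical, and the bound of 4095 is an assumption taken from current kernels (older kernels may use a different bound).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4095 mirrors MAX_ERRNO in current kernels; the exact
 * bound on the 2.6.18 kernel in this report may differ. */
#define MAX_ERRNO_ASSUMED 4095

/* Hypothetical helper: returns true when a raw return value, as printed
 * by the systemtap script, falls in the top MAX_ERRNO_ASSUMED values of
 * the 64-bit range and therefore encodes -errno rather than a pointer. */
bool looks_like_errno(int64_t raw)
{
    return (uint64_t)raw >= (uint64_t)(-(int64_t)MAX_ERRNO_ASSUMED);
}
```

Under this check, the getname() value shown in the sample output is classified as a pointer, while -14 is classified as an error code.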
Have you gotten a chance to reproduce this with the systemtap script?
Description of problem:
Multipathd fails to start in daemon mode. Running it in the foreground (without daemonizing) works with no issue. An strace was collected and showed an EFAULT error:

15926 10:05:47.818185 stat("/etc/localtime", <unfinished ...>
15925 10:05:47.818209 mount("/var/cache/multipathd", "/sbin", NULL, MS_BIND, NULL <unfinished ...>
15926 10:05:47.818242 <... stat resumed> {st_dev=makedev(253, 0), st_ino=98358, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=2945, st_atime=2010/04/12-10:05:47, st_mtime=2009/10/08-14:56:03, st_ctime=2009/10/08-14:56:03}) = 0
15925 10:05:47.818287 <... mount resumed> ) = -1 EFAULT (Bad address)
15926 10:05:47.818314 sendto(6, "<27>Apr 12 10:05:47 multipathd: "..., 71, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
15925 10:05:47.818341 rt_sigprocmask(SIG_BLOCK, [USR1], [], 8) = 0

EFAULT: One of the pointer arguments points outside the user address space.

Running through gdb determined that the failure is in prepare_namespace():

1349 vector_foreach_slot (conf->binvec, bin,i) {
1357 free_strvec(conf->binvec);
1358 conf->binvec = NULL;
1366 if (mount(CALLOUT_DIR, "/sbin", NULL, MS_BIND, NULL) < 0) {
1367 condlog(0, "cannot bind ramfs on /sbin");
1368 return -1;

The values in the strace look valid for the function call. The customer has tried running the mount command from the function and replicated the procedure manually. Everything seems to work just fine when run manually.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-23.el5-x86_64

How reproducible:
Every time on the customer's specific system. Have not been able to recreate it in the lab.

Steps to Reproduce:
1. /etc/init.d/multipathd start

Actual results:
Multipathd start fails.
$ /etc/init.d/multipathd status
multipathd dead but pid file exists

Expected results:
Multipathd should start up correctly.

Additional info:
Kernel: 2.6.18-128.el5
$ grep device-mapper installed-rpms
device-mapper-1.02.28-2.el5-x86_64
device-mapper-event-1.02.28-2.el5-x86_64
device-mapper-multipath-0.4.7-23.el5-x86_64
$ grep udev installed-rpms
udev-095-14.19.el5-x86_64
$ grep util-linux installed-rpms
util-linux-2.13-0.50.el5-x86_64
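
The bind mount quoted from prepare_namespace() logs only "cannot bind ramfs on /sbin", so the errno had to be recovered from strace. As a rough sketch of how the same call could report the reason itself, the example below wraps mount(2) and includes the errno text in the message. The names `format_bind_error` and `bind_callout_dir` are hypothetical and are not part of multipath-tools; it writes to stderr because condlog() is internal to multipathd, and it is not the fix applied in this bug.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: builds a failure message that carries the errno
 * text, e.g. the "Bad address" string seen in the strace for EFAULT. */
int format_bind_error(char *buf, size_t len, const char *target, int err)
{
    return snprintf(buf, len, "cannot bind ramfs on %s: %s (errno %d)",
                    target, strerror(err), err);
}

/* Hypothetical wrapper around the bind mount shown in the description.
 * Returns 0 on success and -1 on failure, like the quoted code path. */
int bind_callout_dir(const char *callout_dir, const char *target)
{
    char msg[256];

    if (mount(callout_dir, target, NULL, MS_BIND, NULL) < 0) {
        int saved = errno; /* save before any call that may overwrite it */
        format_bind_error(msg, sizeof(msg), target, saved);
        fprintf(stderr, "%s\n", msg);
        return -1;
    }
    return 0;
}
```

Calling `bind_callout_dir("/var/cache/multipathd", "/sbin")` requires root privileges, as the original mount does. With a message of this form, an EFAULT would be visible in the daemon's own log without needing strace.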