Description of problem:
Multipathd fails to start in daemon mode. Running it in the foreground, without daemonizing, works with no issue.

An strace was collected and showed an EFAULT error:

15926 10:05:47.818185 stat("/etc/localtime", <unfinished ...>
15925 10:05:47.818209 mount("/var/cache/multipathd", "/sbin", NULL, MS_BIND, NULL <unfinished ...>
15926 10:05:47.818242 <... stat resumed> {st_dev=makedev(253, 0), st_ino=98358, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=2945, st_atime=2010/04/12-10:05:47, st_mtime=2009/10/08-14:56:03, st_ctime=2009/10/08-14:56:03}) = 0
15925 10:05:47.818287 <... mount resumed> ) = -1 EFAULT (Bad address)
15926 10:05:47.818314 sendto(6, "<27>Apr 12 10:05:47 multipathd: "..., 71, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
15925 10:05:47.818341 rt_sigprocmask(SIG_BLOCK, [USR1], [], 8) = 0

The mount(2) man page describes EFAULT as:
EFAULT One of the pointer arguments points outside the user address space.

Stepping through with gdb showed that the failure is in prepare_namespace():

1349 vector_foreach_slot (conf->binvec, bin,i) {
1357 free_strvec(conf->binvec);
1358 conf->binvec = NULL;
1366 if (mount(CALLOUT_DIR, "/sbin", NULL, MS_BIND, NULL) < 0) {
1367 condlog(0, "cannot bind ramfs on /sbin");
1368 return -1;

The argument values in the strace look valid for the function call. The customer has tried running the same mount command by hand and replicated the procedure manually. Everything works fine when run manually.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-23.el5-x86_64

How reproducible:
Every time on the customer's specific system. We have not been able to reproduce it in the lab.

Steps to Reproduce:
1. /etc/init.d/multipathd start

Actual results:
Multipathd fails to start.
$ /etc/init.d/multipathd status
multipathd dead but pid file exists

Expected results:
Multipathd should start up correctly.

Additional info:
Kernel: 2.6.18-128.el5

$ grep device-mapper installed-rpms
device-mapper-1.02.28-2.el5-x86_64
device-mapper-event-1.02.28-2.el5-x86_64
device-mapper-multipath-0.4.7-23.el5-x86_64

$ grep udev installed-rpms
udev-095-14.19.el5-x86_64

$ grep util-linux installed-rpms
util-linux-2.13-0.50.el5-x86_64
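
For anyone trying to narrow this down outside of multipathd, the failing call can be isolated with something like the sketch below. It issues the same mount() arguments seen in the strace and reports the errno. try_bind_mount() is a hypothetical helper written for this bug, not code from multipathd, and it makes no attempt to reproduce the rest of prepare_namespace().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not part of multipathd): attempt the same bind
 * mount that prepare_namespace() issues and report the errno on failure.
 * Returns 0 on success or a negative errno value, e.g. -EFAULT.
 */
static int try_bind_mount(const char *src, const char *dst)
{
    /* Same argument shape as in the strace: no fstype, MS_BIND, no data. */
    if (mount(src, dst, NULL, MS_BIND, NULL) < 0) {
        int err = errno; /* save before calling anything that may change it */
        fprintf(stderr, "bind mount %s on %s failed: %s (errno %d)\n",
                src, dst, strerror(err), err);
        return -err;
    }
    return 0;
}
```

Calling try_bind_mount("/var/cache/multipathd", "/sbin") as root from a small main() would mimic the failing step. Unless it is run in a private mount namespace, the bind affects the whole system, so a scratch directory is a safer target when experimenting, and a successful mount should be undone with umount.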
Created attachment 410200
systemtap script to figure out what is causing the error

Run

# stap it800403.stp

When it prints "starting", run

# service multipathd start

If things work correctly, you should see something like

starting
multipathd entered sys_mount for <unknown> on /var/cache/multipathd
entered copy_mount_options for ramfs
exitted copy_mount_options with 0
entered getname for /var/cache/multipathd
exitted getname with -139636435095552
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered copy_mount_options for maxsize=3664746
exitted copy_mount_options with 0
entered do_mount for /var/cache/multipathd
exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /sbin
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered getname for /sbin
exitted getname with -139636435095552
entered copy_mount_options for /var/cache/multipathd
exitted copy_mount_options with 0
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered do_mount for /sbin
exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /bin
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered getname for /bin
exitted getname with -139636435095552
entered copy_mount_options for /var/cache/multipathd
exitted copy_mount_options with 0
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered do_mount for /bin
exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /tmp
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered getname for /tmp
exitted getname with -139636435095552
entered copy_mount_options for /var/cache/multipathd
exitted copy_mount_options with 0
entered copy_mount_options for <unknown>
exitted copy_mount_options with 0
entered do_mount for /tmp
exitted do_mount with 0
multipathd exitted sys_mount with 0

If it fails, you should see some "exitted" lines that return -14 (-EFAULT). getname() returns a pointer, so the value you see will most likely differ from the one shown here. As long as it isn't -14, it should be fine.

Please run this script on the machine that can reproduce the issue, and copy the results into the bugzilla.
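
To make the "pointer vs. -14" distinction concrete, here is a rough sketch of how the values printed by the script could be classified. It assumes the usual kernel convention that a failing function returns a small negative errno (possibly cast to a pointer), while a real pointer printed as a signed number comes out as a huge negative value. errno_from_ret(), describe_ret() and the ERR_LIMIT threshold are illustrative names made up for this comment; the exact threshold the RHEL 5 kernel uses is an assumption, but -14 is well inside any plausible range.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed upper bound on errno values encoded in a return value. */
#define ERR_LIMIT 4095

/*
 * Hypothetical helper: if ret looks like a negative errno
 * (-ERR_LIMIT..-1), return the positive errno; otherwise return 0.
 */
static int errno_from_ret(long long ret)
{
    if (ret < 0 && ret >= -ERR_LIMIT)
        return (int)-ret;
    return 0;
}

/* Print a one-line interpretation of a value from the stap output. */
static void describe_ret(const char *func, long long ret)
{
    int err = errno_from_ret(ret);

    if (err)
        printf("%s: error %d (%s)\n", func, err, strerror(err));
    else
        printf("%s: %lld (0x%llx), not an error code\n",
               func, ret, (unsigned long long)ret);
}
```

With this, describe_ret("getname", -139636435095552LL) reports the value as an ordinary kernel address rather than an error, while describe_ret("do_mount", -14) reports EFAULT.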
Have you gotten a chance to reproduce this with the systemtap script?