Bug 587336 - Multipathd exits with -1 in main.c prepare_namespace()
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Red Hat Kernel QE team
Reported: 2010-04-29 11:53 EDT by Shane Bradley
Modified: 2010-10-23 11:09 EDT
CC List: 13 users

Doc Type: Bug Fix
Last Closed: 2010-07-01 15:15:05 EDT


Attachments
systemtap script to figure out what is causing the error (1.05 KB, application/octet-stream)
2010-04-29 14:51 EDT, Ben Marzinski

Description Shane Bradley 2010-04-29 11:53:34 EDT
Description of problem:
Multipathd fails to start in daemon mode. When run in the foreground
(not daemonized), it works with no issue.

An strace was collected and showed mount() failing with EFAULT:
15926 10:05:47.818185 stat("/etc/localtime",  <unfinished ...>
15925 10:05:47.818209 mount("/var/cache/multipathd", "/sbin", NULL, MS_BIND, NULL <unfinished ...>
15926 10:05:47.818242 <... stat resumed> {st_dev=makedev(253, 0), st_ino=98358, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=2945, st_atime=2010/04/12-10:05:47, st_mtime=2009/10/08-14:56:03, st_ctime=2009/10/08-14:56:03}) = 0
15925 10:05:47.818287 <... mount resumed> ) = -1 EFAULT (Bad address)
15926 10:05:47.818314 sendto(6, "<27>Apr 12 10:05:47 multipathd: "..., 71, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
15925 10:05:47.818341 rt_sigprocmask(SIG_BLOCK, [USR1], [], 8) = 0

  EFAULT One of the pointer arguments points outside the user address space.

Running multipathd under gdb showed that the failure is in prepare_namespace():
1349            vector_foreach_slot (conf->binvec, bin,i) {
1357            free_strvec(conf->binvec);
1358            conf->binvec = NULL;
1366            if (mount(CALLOUT_DIR, "/sbin", NULL, MS_BIND, NULL) < 0) {
1367                    condlog(0, "cannot bind ramfs on /sbin");
1368                    return -1;

The argument values in the strace look valid for this call. The customer
has tried running the equivalent mount command and replicating the
procedure manually; everything works fine when run by hand.
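If the manual test used the mount(8) utility, it did not necessarily issue the same system call that multipathd makes. A minimal sketch of a closer reproducer, assuming root access on the affected host, calls mount(2) directly with the argument pattern shown in the strace (NULL filesystem type, MS_BIND, NULL data). The helper name try_bind_mount() and its error reporting are hypothetical and not part of multipathd.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not multipathd code): issue the same kind of
 * mount(2) call as prepare_namespace(), i.e. a bind mount with a NULL
 * filesystem type and NULL data, and report the resulting errno.
 * Returns 0 on success, or -1 with *err_out set to errno on failure.
 * On success this really bind-mounts src over dst, so use scratch
 * directories rather than /sbin on a live system.
 */
int try_bind_mount(const char *src, const char *dst, int *err_out)
{
    *err_out = 0;
    if (mount(src, dst, NULL, MS_BIND, NULL) < 0) {
        /* Save errno before any other library call can overwrite it. */
        *err_out = errno;
        fprintf(stderr, "mount(%s, %s, MS_BIND) failed: %s (errno %d)\n",
                src, dst, strerror(*err_out), *err_out);
        return -1;
    }
    return 0;
}
```

Comparing this helper's result, run as root against two empty scratch directories, with the daemon's behaviour could help show whether the EFAULT depends on the arguments themselves or on the context the daemon runs in.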

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-23.el5-x86_64

How reproducible:
Every time on the customer's specific system.
We have not been able to reproduce it in the lab.

Steps to Reproduce:
1. /etc/init.d/multipathd start
  
Actual results:
Multipathd start fails.
$ /etc/init.d/multipathd status
multipathd dead but pid file exists

Expected results:
Multipathd should start up correctly.

Additional info:

Kernel: 2.6.18-128.el5
$ grep device-mapper installed-rpms
device-mapper-1.02.28-2.el5-x86_64
device-mapper-event-1.02.28-2.el5-x86_64
device-mapper-multipath-0.4.7-23.el5-x86_64

$ grep udev installed-rpms
udev-095-14.19.el5-x86_64

$ grep util-linux installed-rpms
util-linux-2.13-0.50.el5-x86_64
Comment 2 Ben Marzinski 2010-04-29 14:51:15 EDT
Created attachment 410200
systemtap script to figure out what is causing the error

Run:

# stap it800403.stp

When it prints "starting", run:

# service multipathd start

If things work correctly, you should see something like:

starting
multipathd entered sys_mount for <unknown> on /var/cache/multipathd
 entered copy_mount_options for ramfs
 exitted copy_mount_options with 0
 entered getname for /var/cache/multipathd
 exitted getname with -139636435095552
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered copy_mount_options for maxsize=3664746
 exitted copy_mount_options with 0
 entered do_mount for /var/cache/multipathd
 exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /sbin
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered getname for /sbin
 exitted getname with -139636435095552
 entered copy_mount_options for /var/cache/multipathd
 exitted copy_mount_options with 0
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered do_mount for /sbin
 exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /bin
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered getname for /bin
 exitted getname with -139636435095552
 entered copy_mount_options for /var/cache/multipathd
 exitted copy_mount_options with 0
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered do_mount for /bin
 exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /tmp
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered getname for /tmp
 exitted getname with -139636435095552
 entered copy_mount_options for /var/cache/multipathd
 exitted copy_mount_options with 0
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered do_mount for /tmp
 exitted do_mount with 0
multipathd exitted sys_mount with 0

If it fails, you should see some "exitted" lines that return -14 (-EFAULT).
getname() returns a pointer, so its value will most likely differ from the one shown here. As long as it doesn't return -14, it should be fine.
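The full capture is long, so a small filter can help pick out the failing step. A minimal sketch, assuming each result line has the "exitted <function> with <value>" shape shown above: the hypothetical helper stap_line_is_efault() parses the number after the last " with " and compares it to -EFAULT (EFAULT is 14 on Linux). Pointer-valued results, such as the large negative numbers printed for getname(), do not match.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical filter for the systemtap output: returns 1 if the line is
 * an "exitted ... with <value>" line whose value is -EFAULT, else 0.
 */
int stap_line_is_efault(const char *line)
{
    const char *p, *last = NULL;
    char *end;
    long long rc;

    /* Only the "exitted" lines carry a return value. */
    if (strstr(line, "exitted") == NULL)
        return 0;

    /* Use the last " with " so the number parsed is the trailing one. */
    for (p = strstr(line, " with "); p != NULL; p = strstr(p + 1, " with "))
        last = p;
    if (last == NULL)
        return 0;

    errno = 0;
    rc = strtoll(last + 6, &end, 10);   /* 6 == strlen(" with ") */
    if (end == last + 6 || errno == ERANGE)
        return 0;

    /* Allow only trailing whitespace (e.g. a newline from fgets()). */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')
        return 0;

    return rc == -EFAULT;
}
```

It could be called on each line read with fgets() from the saved stap output, printing any line for which it returns 1.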

Please run this script on the machine that can reproduce the issue, and
copy the results into the bugzilla.
Comment 4 Ben Marzinski 2010-06-30 12:40:05 EDT
Have you had a chance to reproduce this with the systemtap script?
