Bug 587336 - Multipathd exits with -1 in main.c prepare_namespace()
Summary: Multipathd exits with -1 in main.c prepare_namespace()
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-04-29 15:53 UTC by Shane Bradley
Modified: 2018-10-27 14:53 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-01 19:15:05 UTC
Target Upstream Version:
Embargoed:


Attachments
systemtap script to figure out what is causing the error (1.05 KB, application/octet-stream)
2010-04-29 18:51 UTC, Ben Marzinski

Description Shane Bradley 2010-04-29 15:53:34 UTC
Description of problem:
multipathd fails to start in daemon mode. Running it in the
foreground, without daemonizing, works with no issues.

An strace was collected and showed an EFAULT error:
15926 10:05:47.818185 stat("/etc/localtime",  <unfinished ...>
15925 10:05:47.818209 mount("/var/cache/multipathd", "/sbin", NULL, MS_BIND, NULL <unfinished ...>
15926 10:05:47.818242 <... stat resumed> {st_dev=makedev(253, 0), st_ino=98358, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=2945, st_atime=2010/04/12-10:05:47, st_mtime=2009/10/08-14:56:03, st_ctime=2009/10/08-14:56:03}) = 0
15925 10:05:47.818287 <... mount resumed> ) = -1 EFAULT (Bad address)
15926 10:05:47.818314 sendto(6, "<27>Apr 12 10:05:47 multipathd: "..., 71, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
15925 10:05:47.818341 rt_sigprocmask(SIG_BLOCK, [USR1], [], 8) = 0

  EFAULT One of the pointer arguments points outside the user address space.

Stepping through with gdb showed that the failure is in prepare_namespace():
1349            vector_foreach_slot (conf->binvec, bin,i) {
1357            free_strvec(conf->binvec);
1358            conf->binvec = NULL;
1366            if (mount(CALLOUT_DIR, "/sbin", NULL, MS_BIND, NULL) < 0) {
1367                    condlog(0, "cannot bind ramfs on /sbin");
1368                    return -1;

The arguments shown in the strace look valid for the mount() call.
The customer has tried running the equivalent mount command and
replicating the procedure manually. Everything works fine when run
manually.
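
To see the exact errno from the same system call outside the daemon, a small test program can issue the bind mount directly. This is only a sketch: try_bind_mount() is a hypothetical helper, not part of multipath-tools, and it assumes Linux and root privileges. Point it at scratch directories (or run it in a private mount namespace), since a bind mount over /sbin hides the real directory until it is unmounted.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not from multipath-tools): issue the same kind
 * of bind mount that prepare_namespace() does and return 0 on success
 * or the errno value on failure, so the caller sees e.g. EFAULT (14)
 * instead of a bare -1.
 */
int try_bind_mount(const char *source, const char *target)
{
    if (mount(source, target, NULL, MS_BIND, NULL) < 0) {
        int err = errno;  /* save errno before any other libc call */
        fprintf(stderr, "bind mount %s on %s failed: %s (errno %d)\n",
                source, target, strerror(err), err);
        return err;
    }
    return 0;
}
```

Calling try_bind_mount() with two scratch directories as root should return 0. A nonzero return identifies the failing errno without needing strace.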

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-23.el5-x86_64

How reproducible:
Every time on the customer's specific system.
We have not been able to reproduce it in the lab.

Steps to Reproduce:
1. /etc/init.d/multipathd start
  
Actual results:
multipathd fails to start.
$ /etc/init.d/multipathd status
multipathd dead but pid file exists

Expected results:
Multipathd should start up correctly.

Additional info:

Kernel: 2.6.18-128.el5
$ grep device-mapper installed-rpms
device-mapper-1.02.28-2.el5-x86_64
device-mapper-event-1.02.28-2.el5-x86_64
device-mapper-multipath-0.4.7-23.el5-x86_64

$ grep udev installed-rpms
udev-095-14.19.el5-x86_64

$ grep util-linux installed-rpms
util-linux-2.13-0.50.el5-x86_64

Comment 2 Ben Marzinski 2010-04-29 18:51:15 UTC
Created attachment 410200
systemtap script to figure out what is causing the error

Run

# stap it800403.stp

When it prints "starting", run

# service multipathd start

If things work correctly, you should see something like

starting
multipathd entered sys_mount for <unknown> on /var/cache/multipathd
 entered copy_mount_options for ramfs
 exitted copy_mount_options with 0
 entered getname for /var/cache/multipathd
 exitted getname with -139636435095552
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered copy_mount_options for maxsize=3664746
 exitted copy_mount_options with 0
 entered do_mount for /var/cache/multipathd
 exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /sbin
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered getname for /sbin
 exitted getname with -139636435095552
 entered copy_mount_options for /var/cache/multipathd
 exitted copy_mount_options with 0
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered do_mount for /sbin
 exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /bin
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered getname for /bin
 exitted getname with -139636435095552
 entered copy_mount_options for /var/cache/multipathd
 exitted copy_mount_options with 0
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered do_mount for /bin
 exitted do_mount with 0
multipathd exitted sys_mount with 0
multipathd entered sys_mount for /var/cache/multipathd on /tmp
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered getname for /tmp
 exitted getname with -139636435095552
 entered copy_mount_options for /var/cache/multipathd
 exitted copy_mount_options with 0
 entered copy_mount_options for <unknown>
 exitted copy_mount_options with 0
 entered do_mount for /tmp
 exitted do_mount with 0
multipathd exitted sys_mount with 0

If it fails, you should see some "exitted" lines that return -14 (-EFAULT).
getname() returns a pointer, so its value most likely won't match the one shown here. As long as it doesn't return -14, it should be fine.
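
One way to tell an error code from a pointer in this output is the kernel's error-pointer convention, where only values from -4095 to -1 encode -errno. The sketch below applies that range check to a traced return value. looks_like_errno() and describe_ret() are illustrative names and are not part of the attached systemtap script.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ERRNO 4095  /* same bound the Linux kernel uses for error pointers */

/* Returns 1 if ret falls in [-MAX_ERRNO, -1], i.e. it encodes -errno. */
int looks_like_errno(long long ret)
{
    return ret < 0 && ret >= -MAX_ERRNO;
}

/* Hypothetical helper: print a readable verdict for one traced return value. */
void describe_ret(const char *func, long long ret)
{
    if (looks_like_errno(ret))
        printf("%s returned %lld: error (%s)\n",
               func, ret, strerror((int)-ret));
    else
        printf("%s returned %lld: not an errno value\n", func, ret);
}
```

With this check, -14 is classified as an error, while the getname() value in the sample output (-139636435095552) is not, because it lies far outside the errno range.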

Please run this script on the machine that can reproduce the issue, and
copy the results into the bugzilla.

Comment 4 Ben Marzinski 2010-06-30 16:40:05 UTC
Have you gotten a chance to reproduce this with the systemtap script?

