Bug 838081

Summary: ocaml/t/guestfs_500_parallel_mount_local crashes in caml_thread_reinitialize
Product: [Community] Virtualization Tools
Reporter: Richard W.M. Jones <rjones>
Component: libguestfs
Assignee: Richard W.M. Jones <rjones>
Status: CLOSED DEFERRED
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: unspecified
CC: dyasny, mbooth
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-03-06 11:39:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Richard W.M. Jones 2012-07-06 12:45:38 UTC
Description of problem:

In the test suite, ocaml/t/guestfs_500_parallel_mount_local (both
bytecode and native code) crashes occasionally.  With core dumps
enabled you will see a segfault in caml_thread_reinitialize called
in the child process after the program forks:

#0  0x00000000004370c8 in caml_thread_reinitialize ()
#1  0x00000030e12bd7c6 in __libc_fork ()
    at ../nptl/sysdeps/unix/sysv/linux/fork.c:188
#2  0x00000030e1a108c5 in __fork ()
    at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:25
#3  0x00000030ee619e86 in fuse_mount_fusermount (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7", 
    opts=0x7f5e70000f30 "rw,nosuid,nodev", quiet=quiet@entry=0) at mount.c:338
#4  0x00000030ee61aa18 in fuse_kern_mount (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7", 
    args=args@entry=0x7f5e7fffe9d0) at mount.c:581
#5  0x00000030ee616d75 in fuse_mount_compat25 (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7", 
    args=args@entry=0x7f5e7fffe9d0) at helper.c:447
#6  0x00000030ee616dc6 in fuse_mount_common (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7", 
    args=args@entry=0x7f5e7fffe9d0) at helper.c:210
#7  0x00000030ee617115 in fuse_mount (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7", 
    args=args@entry=0x7f5e7fffe9d0) at helper.c:223
#8  0x00007f5e98e374be in guestfs__mount_local (g=g@entry=0x7f5e700008c0, 
    localmountpoint=localmountpoint@entry=0x7f5e70000d00 "mp7", 
    optargs=optargs@entry=0x7f5e7fffeab0) at fuse.c:965
#9  0x00007f5e98dda7c6 in guestfs_mount_local_argv (g=g@entry=0x7f5e700008c0, 
    localmountpoint=localmountpoint@entry=0x7f5e70000d00 "mp7", 
    optargs=optargs@entry=0x7f5e7fffeab0) at actions.c:4191
#10 0x0000000000426e61 in ocaml_guestfs_mount_local (gv=-4485090715960753727, 
    readonlyv=5, optionsv=209939302224, cachetimeoutv=39024112, 
    debugcallsv=96, localmountpointv=72340172838076673)
    at guestfs_c_actions.c:9334
#11 0x0000000000451aa9 in caml_interprete ()
#12 0x000000000044db67 in caml_callbackN_exn ()
#13 0x000000000044dbc5 in caml_callback_exn ()
#14 0x0000000000436ec9 in caml_thread_start ()
#15 0x00000030e1a07ef5 in start_thread (arg=0x7f5e7ffff700)
    at pthread_create.c:308
#16 0x00000030e12f4ead in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:114

What happens here is that g#mount_local calls fuse_mount, which
eventually forks in order to run the fusermount subprocess:

http://fuse.git.sourceforge.net/git/gitweb.cgi?p=fuse/fuse;a=blob;f=lib/mount.c;h=6a9da9eefd5641fc2b790738a1e739525016e5b5;hb=HEAD#l359

*After* the fork (in the new subprocess), the OCaml runtime runs
caml_thread_reinitialize to delete the stacks of the other
threads, which no longer exist in the child:

http://caml.inria.fr/mantis/view.php?id=4577

This runtime code segfaults while traversing its list of threads.
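
To illustrate the mechanism (not the crash itself): the backtrace
shows caml_thread_reinitialize being called directly from
__libc_fork, which is what a child handler registered with
pthread_atfork looks like.  I am assuming that is how the OCaml
runtime hooks fork.  The sketch below is plain C, and
child_after_fork, fork_from_thread and handler_runs_in_forked_child
are hypothetical names, not OCaml runtime or libguestfs code.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the child-side atfork handler.  Hypothetical stand-in for the
   per-thread cleanup that the OCaml runtime does after fork. */
static volatile int child_handler_ran = 0;

/* Runs in the child process, inside fork(), before fork() returns. */
static void child_after_fork(void)
{
    child_handler_ran = 1;
}

/* Body of a secondary thread: fork, then let the child report through
   its exit status whether the handler ran. */
static void *fork_from_thread(void *result)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(child_handler_ran ? 0 : 1);   /* child: 0 means "handler ran" */

    int status = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        *(int *) result = (WEXITSTATUS(status) == 0);
    return NULL;
}

/* Returns 1 if the handler ran in the forked child, 0 if it did not,
   and -1 if the thread could not be created. */
int handler_runs_in_forked_child(void)
{
    static int registered = 0;
    if (!registered) {
        /* Only a child handler; no prepare or parent handlers. */
        int rc = pthread_atfork(NULL, NULL, child_after_fork);
        assert(rc == 0);
        (void) rc;
        registered = 1;
    }

    int ran_in_child = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, fork_from_thread, &ran_in_child) != 0)
        return -1;
    pthread_join(t, NULL);
    return ran_in_child;
}
```

Calling handler_runs_in_forked_child() from main (built with
-pthread) should return 1: the handler runs in the child even though
the fork was issued from a secondary thread by code that knows
nothing about the handler, much as fuse_mount does in the backtrace
above.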

Version-Release number of selected component (if applicable):

1.19.16

How reproducible:

Frequent

Steps to Reproduce:
1. ulimit -Hc unlimited
2. ulimit -Sc unlimited
3. Repeatedly run: make -C ocaml check

Actual results:

You'll see core dumps in the ocaml/ directory.

Comment 1 Richard W.M. Jones 2012-07-06 13:12:30 UTC
I pushed this hack which appears to work around the
problem:
https://github.com/libguestfs/libguestfs/commit/ad7c4498f66f37c4219242c6df04d28e9ee7877f

Comment 2 Richard W.M. Jones 2012-07-16 12:15:51 UTC
The workaround doesn't cure the problem, so I have reverted it.

Comment 3 Richard W.M. Jones 2012-07-16 12:22:29 UTC
(In reply to comment #2)
> The workaround doesn't cure the problem, so I have reverted it.

Ignore that.

I saw what seems to be the same bug in a different code path
(guestfs_launch calling popen).  Perhaps adding Gc.compact would
work around it there too?

#0  0x00000000004abcc8 in caml_thread_reinitialize ()
#1  0x0000003fc26bab9e in __libc_fork ()
    at ../nptl/sysdeps/unix/sysv/linux/fork.c:188
#2  0x0000003fc266cf34 in _IO_new_proc_open (fp=fp@entry=0x7f7868000f30,
    command=command@entry=0x7f787a7fa2c0 "LC_ALL=C '/bin/qemu-kvm' -nographic -version 2>/dev/null",
    mode=<optimized out>, mode@entry=0x7f7881ecd090 "r")
    at iopopen.c:187
#3  0x0000003fc266d1c7 in _IO_new_popen (
    command=0x7f787a7fa2c0 "LC_ALL=C '/bin/qemu-kvm' -nographic -version 2>/dev/null",
    mode=0x7f7881ecd090 "r") at iopopen.c:308
#4  0x00007f7881eab88d in test_qemu_cmd (g=g@entry=0x7f78680008c0,
    cmd=cmd@entry=0x7f787a7fa2c0 "LC_ALL=C '/bin/qemu-kvm' -nographic -version 2>/dev/null",
    ret=ret@entry=0x7f7868000910) at launch.c:1428
#5  0x00007f7881eaba04 in test_qemu (g=0x7f78680008c0) at launch.c:1410
#6  0x00007f7881eabaea in qemu_supports (g=g@entry=0x7f78680008c0,
    option=option@entry=0x0) at launch.c:1479
#7  0x00007f7881eac921 in launch_appliance (g=g@entry=0x7f78680008c0)
    at launch.c:586
#8  0x00007f7881ead991 in guestfs__launch (g=g@entry=0x7f78680008c0)
    at launch.c:530
#9  0x00007f7881e4fda8 in guestfs_launch (g=g@entry=0x7f78680008c0)
    at actions.c:1119
#10 0x0000000000496ad8 in ocaml_guestfs_launch (gv=578721382704613384)
    at guestfs_c_actions.c:7544
#11 0x00000000004c3f9a in caml_c_call ()
#12 0x0000000000000001 in ?? ()

Comment 4 Richard W.M. Jones 2013-03-06 11:39:07 UTC
I suspect this is a bug in OCaml itself, specifically in
the rather hairy fork handling (see
http://caml.inria.fr/mantis/view.php?id=4577).

In any case, we no longer do a multithreaded test of
mount-local in OCaml.  It was rewritten in C.  So this
bug doesn't apply to libguestfs any longer.