Description of problem:

In the test suite, ocaml/t/guestfs_500_parallel_mount_local (both bytecode and native code) crashes occasionally. With core dumps enabled you will see a segfault in caml_thread_reinitialize called in the child process after the program forks:

#0  0x00000000004370c8 in caml_thread_reinitialize ()
#1  0x00000030e12bd7c6 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:188
#2  0x00000030e1a108c5 in __fork () at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:25
#3  0x00000030ee619e86 in fuse_mount_fusermount (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7",
    opts=0x7f5e70000f30 "rw,nosuid,nodev", quiet=quiet@entry=0) at mount.c:338
#4  0x00000030ee61aa18 in fuse_kern_mount (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7",
    args=args@entry=0x7f5e7fffe9d0) at mount.c:581
#5  0x00000030ee616d75 in fuse_mount_compat25 (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7",
    args=args@entry=0x7f5e7fffe9d0) at helper.c:447
#6  0x00000030ee616dc6 in fuse_mount_common (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7",
    args=args@entry=0x7f5e7fffe9d0) at helper.c:210
#7  0x00000030ee617115 in fuse_mount (
    mountpoint=mountpoint@entry=0x7f5e70000d00 "mp7",
    args=args@entry=0x7f5e7fffe9d0) at helper.c:223
#8  0x00007f5e98e374be in guestfs__mount_local (g=g@entry=0x7f5e700008c0,
    localmountpoint=localmountpoint@entry=0x7f5e70000d00 "mp7",
    optargs=optargs@entry=0x7f5e7fffeab0) at fuse.c:965
#9  0x00007f5e98dda7c6 in guestfs_mount_local_argv (g=g@entry=0x7f5e700008c0,
    localmountpoint=localmountpoint@entry=0x7f5e70000d00 "mp7",
    optargs=optargs@entry=0x7f5e7fffeab0) at actions.c:4191
#10 0x0000000000426e61 in ocaml_guestfs_mount_local (gv=-4485090715960753727,
    readonlyv=5, optionsv=209939302224, cachetimeoutv=39024112,
    debugcallsv=96, localmountpointv=72340172838076673)
    at guestfs_c_actions.c:9334
#11 0x0000000000451aa9 in caml_interprete ()
#12 0x000000000044db67 in caml_callbackN_exn ()
#13 0x000000000044dbc5 in caml_callback_exn ()
#14 0x0000000000436ec9 in caml_thread_start ()
#15 0x00000030e1a07ef5 in start_thread (arg=0x7f5e7ffff700) at pthread_create.c:308
#16 0x00000030e12f4ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:114

Note what happens here is that g#mount_local calls fuse_mount which eventually forks the fusermount subprocess:

http://fuse.git.sourceforge.net/git/gitweb.cgi?p=fuse/fuse;a=blob;f=lib/mount.c;h=6a9da9eefd5641fc2b790738a1e739525016e5b5;hb=HEAD#l359

*After* the fork (in the new subprocess) there is some OCaml runtime code (caml_thread_reinitialize) to delete the stacks of the other threads which no longer exist:

http://caml.inria.fr/mantis/view.php?id=4577

This runtime code segfaults when traversing the list.

Version-Release number of selected component (if applicable):

1.19.16

How reproducible:

Frequent

Steps to Reproduce:

1. ulimit -Hc unlimited
2. ulimit -Sc unlimited
3. Repeatedly call: make -C ocaml check

Actual results:

You'll see core dumps in the ocaml/ directory.
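For anyone unfamiliar with the mechanism, here is a minimal C sketch of the code path described above: a non-main thread calls fork(), and a handler registered with pthread_atfork then runs in the child and walks a list of thread records, dropping every record except the forking thread's. This is not the OCaml runtime's code and it does not reproduce the segfault. The names thread_info, child_after_fork and fork_from_thread_demo are hypothetical, and the assumption that caml_thread_reinitialize runs as an atfork-style child handler is only inferred from frame #1 of the backtrace.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_IDLE 8

/* Hypothetical per-thread record, standing in for a runtime's thread list. */
struct thread_info {
    struct thread_info *next;
};

static struct thread_info *all_threads;            /* every registered thread */
static _Thread_local struct thread_info *current;  /* this thread's own record */
static int discarded;                              /* records dropped in the child */
static int fork_result;                            /* child's exit status, or -1 */

static pthread_once_t once = PTHREAD_ONCE_INIT;
static pthread_mutex_t stop_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t stop_cond = PTHREAD_COND_INITIALIZER;
static int stop;

/* Runs in the child inside fork(): only the forking thread survived,
   so unlink the records of all the other threads. */
static void child_after_fork(void)
{
    struct thread_info *t = all_threads;
    all_threads = NULL;
    while (t != NULL) {
        struct thread_info *next = t->next;  /* read before unlinking */
        if (t == current) {
            t->next = NULL;
            all_threads = t;
        } else {
            discarded++;  /* a real runtime would release the stack here */
        }
        t = next;
    }
}

static void register_handler(void)
{
    pthread_atfork(NULL, NULL, child_after_fork);
}

/* Idle thread: exists only so that the child has records to discard. */
static void *idle_worker(void *arg)
{
    current = arg;
    pthread_mutex_lock(&stop_lock);
    while (!stop)
        pthread_cond_wait(&stop_cond, &stop_lock);
    pthread_mutex_unlock(&stop_lock);
    return NULL;
}

/* Forks from a non-main thread, as fuse_mount does in the backtrace. */
static void *forking_worker(void *arg)
{
    current = arg;
    pid_t pid = fork();
    if (pid == 0)
        _exit(discarded);  /* child: report how many records were dropped */
    int status = 0;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        fork_result = WEXITSTATUS(status);
    return NULL;
}

/* Returns the number of records the child discarded, or -1 on failure. */
int fork_from_thread_demo(int n_idle)
{
    struct thread_info records[MAX_IDLE + 1];
    pthread_t idle[MAX_IDLE], forker;
    int started = 0;

    if (n_idle < 0) n_idle = 0;
    if (n_idle > MAX_IDLE) n_idle = MAX_IDLE;
    pthread_once(&once, register_handler);

    all_threads = NULL;
    stop = 0;
    fork_result = -1;
    for (int i = 0; i <= n_idle; i++) {  /* last record is the forker's */
        records[i].next = all_threads;
        all_threads = &records[i];
    }
    for (int i = 0; i < n_idle; i++)
        if (pthread_create(&idle[started], NULL, idle_worker, &records[i]) == 0)
            started++;
    if (pthread_create(&forker, NULL, forking_worker, &records[n_idle]) == 0)
        pthread_join(forker, NULL);

    pthread_mutex_lock(&stop_lock);
    stop = 1;
    pthread_cond_broadcast(&stop_cond);
    pthread_mutex_unlock(&stop_lock);
    for (int i = 0; i < started; i++)
        pthread_join(idle[i], NULL);
    all_threads = NULL;
    return fork_result;
}
```

Built with -pthread, fork_from_thread_demo(3) should return 3, since the child sees three records belonging to threads that do not exist there. The traversal reads each next pointer before touching the record, which is the step where a stale or half-updated list would fault.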
I pushed this hack which appears to work around the problem: https://github.com/libguestfs/libguestfs/commit/ad7c4498f66f37c4219242c6df04d28e9ee7877f
The workaround doesn't cure the problem, so I have reverted it.
(In reply to comment #2)
> The workaround doesn't cure the problem, so I have reverted it.

Ignore that.

I saw what seems to be the same bug, affecting a different piece of code. Perhaps adding Gc.compact could work around this place too?

#0  0x00000000004abcc8 in caml_thread_reinitialize ()
#1  0x0000003fc26bab9e in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:188
#2  0x0000003fc266cf34 in _IO_new_proc_open (fp=fp@entry=0x7f7868000f30,
    command=command@entry=0x7f787a7fa2c0 "LC_ALL=C '/bin/qemu-kvm' -nographic -version 2>/dev/null",
    mode=<optimized out>, mode@entry=0x7f7881ecd090 "r") at iopopen.c:187
#3  0x0000003fc266d1c7 in _IO_new_popen (
    command=0x7f787a7fa2c0 "LC_ALL=C '/bin/qemu-kvm' -nographic -version 2>/dev/null",
    mode=0x7f7881ecd090 "r") at iopopen.c:308
#4  0x00007f7881eab88d in test_qemu_cmd (g=g@entry=0x7f78680008c0,
    cmd=cmd@entry=0x7f787a7fa2c0 "LC_ALL=C '/bin/qemu-kvm' -nographic -version 2>/dev/null",
    ret=ret@entry=0x7f7868000910) at launch.c:1428
#5  0x00007f7881eaba04 in test_qemu (g=0x7f78680008c0) at launch.c:1410
#6  0x00007f7881eabaea in qemu_supports (g=g@entry=0x7f78680008c0,
    option=option@entry=0x0) at launch.c:1479
#7  0x00007f7881eac921 in launch_appliance (g=g@entry=0x7f78680008c0) at launch.c:586
#8  0x00007f7881ead991 in guestfs__launch (g=g@entry=0x7f78680008c0) at launch.c:530
#9  0x00007f7881e4fda8 in guestfs_launch (g=g@entry=0x7f78680008c0) at actions.c:1119
#10 0x0000000000496ad8 in ocaml_guestfs_launch (gv=578721382704613384)
    at guestfs_c_actions.c:7544
#11 0x00000000004c3f9a in caml_c_call ()
#12 0x0000000000000001 in ?? ()
I suspect this is a bug in OCaml itself, specifically in the rather hairy fork handling (see http://caml.inria.fr/mantis/view.php?id=4577). In any case, we no longer do a multithreaded test of mount-local in OCaml. It was rewritten in C. So this bug doesn't apply to libguestfs any longer.