Bug 909624 - Unexpected non-tail recursion in recv_from_daemon results in stack overflow in very long-running API calls that send progress messages
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
Blocks: 909666 909667
Reported: 2013-02-09 22:19 UTC by Eric Blake
Modified: 2013-02-12 09:09 UTC

Last Closed: 2013-02-11 19:06:09 UTC

Description Eric Blake 2013-02-09 22:19:25 UTC
Description of problem:
I was trying to resize a disk image containing a Windows XP guest (single NTFS partition under MBR, aligned only at 512 bytes), and expand it from 30GB to 40GiB in the process.  Both the original and the destination reside on a remote NFS mount.  The process started just fine, and estimated about 4 hours to completion; about three hours in, the process died with stack overflow.  I was able to reproduce the crash (slightly different percent complete on retry).

Version-Release number of selected component (if applicable):
libguestfs-1.20.1-3.fc18.x86_64
ocaml-4.00.1-1.fc18.x86_64

How reproducible:
100%

Steps to Reproduce:
1. qemu-img info lounge_c.qcow2
2. virt-filesystems -a lounge_c.qcow2 -l -h --parts --blkdevs
3. virt-resize --align-first auto --alignment 2048 --expand /dev/sda1 lounge_c.qcow2 windows_xp_c.img 
  
Actual results:
1. 
image: lounge_c.qcow2
file format: qcow2
virtual size: 28G (30000000000 bytes)
disk size: 17G
cluster_size: 65536
backing file: lounge_c.img
backing file format: raw
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         sp3                       0 2013-01-26 05:33:20   00:00:00.000
2         ie8                       0 2013-01-26 11:02:16   00:00:00.000

2.
Name       Type       MBR  Size  Parent
/dev/sda1  partition  07   28G   /dev/sda
/dev/sda   device     -    28G   -

3.
Examining lounge_c.qcow2 ...
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
**********

Summary of changes:

/dev/sda1: This partition will be resized from 27.9G to 40.0G.  The 
    filesystem ntfs on /dev/sda1 will be expanded using the 
    'ntfsresize' method.

**********
Setting up initial partition table on windows_xp_c.img ...
Copying /dev/sda1 ...
◐ 77% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒═══════════════⟧ 54:12
libguestfs: uncaught OCaml exception in event callback: Stack_overflowlibguestfs: uncaught OCaml exception in event callback: Stack_overflow
[... same message repeated many more times, run together without newlines ...]
libguestfs: uncaught OCaml exception in event callback: Stack_overflowSegmentation fault (core dumped)

Expected results:
No core dump, successful resize instead of dying 3 hours in

Additional info:

Comment 1 Eric Blake 2013-02-09 22:22:43 UTC
This was with fedora-virt-preview installed, so it was with:
libvirt-1.0.2-2.fc18.x86_64.rpm

Comment 2 Richard W.M. Jones 2013-02-09 22:32:02 UTC
This is bad!

libguestfs: uncaught OCaml exception in event callback: Stack_overflow

I will leave this program running overnight:

$ cat test.ml 
let () =
  let g = new Guestfs.guestfs () in
  g#add_drive_ro "/dev/null";
  g#launch ();

  let cb _ _ _ _ _ = Printf.printf "hello!\n" in
  ignore (g#set_event_callback cb [Guestfs.EVENT_PROGRESS]);

  ignore (g#debug "progress" [| "1000000" |]);

  g#close ();

  (* Try to force memory errors by running the GC. *)
  Gc.compact ()

$ ocamlfind ocamlopt -package guestfs -linkpkg test.ml -o test

Comment 3 Eric Blake 2013-02-09 22:39:12 UTC
The core file is over 26M, and over 1M when xz-compressed, so I'm having a hard time attaching it.  But from the core file:

Program terminated with signal 11, Segmentation fault.
#0  _IO_vfprintf_internal (s=0x7fff9d9a3420, 
    format=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", ap=0x7fff9d9a5b98) at vfprintf.c:237
237	  int save_errno = errno;

Thread 1 (Thread 0x7f353aa9f840 (LWP 15709)):
#0  _IO_vfprintf_internal (s=0x7fff9d9a3420, 
    format=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", ap=0x7fff9d9a5b98) at vfprintf.c:237
#1  0x00000032f964babf in buffered_vfprintf (
    s=s@entry=0x32f99b21a0 <_IO_2_1_stderr_>, 
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", args=args@entry=0x7fff9d9a5b98) at vfprintf.c:2299
#2  0x00000032f9646c1e in _IO_vfprintf_internal (
    s=s@entry=0x32f99b21a0 <_IO_2_1_stderr_>, 
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", ap=ap@entry=0x7fff9d9a5b98) at vfprintf.c:1269
#3  0x00000032f97081fe in ___fprintf_chk (fp=0x32f99b21a0 <_IO_2_1_stderr_>, 
    flag=flag@entry=1, 
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s") at fprintf_chk.c:36
#4  0x000000000051b314 in fprintf (
    __fmt=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", __stream=<optimized out>) at /usr/include/bits/stdio2.h:97
#5  event_callback_wrapper_locked (array_len=4, array=0x7fff9d9a5e20, 
    buf_len=0, buf=<optimized out>, event_handle=<optimized out>, 
    event=<optimized out>, data=0x202f6d0, g=<optimized out>, 
    flags=<optimized out>) at guestfs-c.c:379
#6  event_callback_wrapper (g=<optimized out>, data=0x202f6d0, 
    event=<optimized out>, event_handle=<optimized out>, 
    flags=<optimized out>, buf=<optimized out>, buf_len=0, 
    array=0x7fff9d9a5e20, array_len=4) at guestfs-c.c:400
#7  0x0000003cf9283e1b in guestfs___call_callbacks_array (g=g@entry=0x20417d0, 
    event=event@entry=8, array=array@entry=0x7fff9d9a5e20, 
    array_len=array_len@entry=4) at events.c:197
#8  0x0000003cf929c3c6 in guestfs___progress_message_callback (
    g=g@entry=0x20417d0, message=message@entry=0x7fff9d9a5e60) at proto.c:303
#9  0x0000003cf929d23c in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:657
#10 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#11 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#12 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
...
#13073 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#13074 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#13075 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#13076 0x0000003cf929dc6a in guestfs___recv (g=g@entry=0x20417d0, 
    fn=fn@entry=0x3cf92b8078 "copy_device_to_device", 
    hdr=hdr@entry=0x7fff9e19fab0, err=err@entry=0x7fff9e19fa50, 
    xdrp=xdrp@entry=0x0, ret=ret@entry=0x0) at proto.c:986
#13077 0x0000003cf922e9eb in guestfs_copy_device_to_device_argv (
    g=g@entry=0x20417d0, src=src@entry=0x2034f30 "/dev/sda1", 
    dest=dest@entry=0x2036480 "/dev/sdb1", optargs=<optimized out>, 
    optargs@entry=0x7fff9e19fb60) at actions-1.c:5280
#13078 0x00000000005223ff in ocaml_guestfs_copy_device_to_device (
    gv=140735837516832, srcoffsetv=5661240, destoffsetv=140735837526936, 
    sizev=33771424, srcv=11, destv=218933494512) at guestfs-c-actions.c:3129
#13079 0x00000000004c6c0d in camlGuestfs__fun_23824 ()
#13080 0x0000000000500671 in camlList__iter_1061 ()
#13081 0x00000000004ba596 in camlResize__entry () at resize.ml:1013
#13082 0x000000000044ff39 in caml_program ()
#13083 0x0000000000564fe6 in caml_start_program ()
#13084 0x0000000000552089 in caml_main ()
#13085 0x000000000044fb6c in main ()
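
As a rough cross-check of the trace above, the stack consumed by the recursion can be estimated from two pointer values it prints. The sketch below assumes that size_rtn (0x7fff9e19f9d4) points at a local in the outermost guestfs___recv frame, that array (0x7fff9d9a5e20) sits near the innermost frames, and that every frame elided between #12 and #13073 is also guestfs___recv_from_daemon. The macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer values copied from the backtrace in comment 3. */
#define OUTER_ADDR 0x7fff9e19f9d4ULL /* size_rtn: assumed to live in frame #13076 */
#define INNER_ADDR 0x7fff9d9a5e20ULL /* array: assumed to live near frame #8 */

/* First and last frame numbers showing guestfs___recv_from_daemon. */
#define FIRST_RECURSIVE_FRAME 9
#define LAST_RECURSIVE_FRAME 13075

/* Bytes of stack between the two pointers (the stack grows downwards on x86-64). */
static uint64_t stack_span_bytes(void)
{
    assert(OUTER_ADDR > INNER_ADDR);
    return OUTER_ADDR - INNER_ADDR;
}

/* Number of recursive frames, assuming the elided frames are all the same function. */
static uint64_t recursive_frames(void)
{
    return LAST_RECURSIVE_FRAME - FIRST_RECURSIVE_FRAME + 1;
}

/* Integer estimate of the stack cost of one recursive call. */
static uint64_t approx_bytes_per_frame(void)
{
    return stack_span_bytes() / recursive_frames();
}
```

Under those assumptions the span is 8,362,932 bytes, just under 8 MiB (a common default stack limit on Linux; the reporter's actual limit is not stated), spread over 13,067 recursive frames, or about 640 bytes per frame.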

Comment 4 Richard W.M. Jones 2013-02-09 22:41:08 UTC
> <a big number> 0x0000003cf929d260 in guestfs___recv_from_daemon

That's definitely not supposed to happen.  Thanks for the stack trace.
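
The repeated frame at proto.c:663 suggests that guestfs___recv_from_daemon calls itself once per progress message and that the compiler did not turn that call into a jump. A minimal, self-contained sketch of that pattern and of a loop-based alternative follows. The message source, type names and function names are hypothetical stand-ins, not libguestfs code, and the loop is one common way to remove such recursion rather than a description of the upstream fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds; the real wire protocol is not shown in this report. */
enum msg_kind { MSG_PROGRESS, MSG_REPLY };

/* Toy message source: yields progress_left progress messages, then one reply. */
struct source {
    long progress_left;
    long callbacks; /* how many progress callbacks were "delivered" */
};

static enum msg_kind next_message(struct source *s)
{
    assert(s != NULL);
    if (s->progress_left > 0) {
        s->progress_left--;
        return MSG_PROGRESS;
    }
    return MSG_REPLY;
}

/* Recursive shape: every progress message costs one more stack frame
 * unless the compiler happens to eliminate the tail call.
 * max_depth records the deepest logical nesting reached. */
static int recv_recursive(struct source *s, long depth, long *max_depth)
{
    if (depth > *max_depth)
        *max_depth = depth;
    if (next_message(s) == MSG_PROGRESS) {
        s->callbacks++; /* stands in for the progress callback */
        return recv_recursive(s, depth + 1, max_depth);
    }
    return 0; /* the real reply arrived */
}

/* Iterative shape: same behaviour, constant stack use. */
static int recv_iterative(struct source *s)
{
    for (;;) {
        if (next_message(s) == MSG_PROGRESS) {
            s->callbacks++;
            continue; /* wait for the next message without nesting */
        }
        return 0;
    }
}
```

In the recursive version the nesting depth equals the number of progress messages received before the reply, which matches a failure that only appears hours into a long copy.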

Comment 5 Richard W.M. Jones 2013-02-10 09:27:27 UTC
BTW my simple test reproducer (comment 2) eventually dumped
core:

helibguestfs: uncaught OCaml exception in event callback: Stack_overflowlibguestfs: uncaught OCaml exception in event callback: Stack_overflow
[... same message repeated many more times, run together without newlines ...]
libguestfs: uncaught OCaml exception in event callback: Stack_overflowSegmentation fault (core dumped)

although I can't exactly see where abrt hid the core dump ...

Comment 7 Eric Blake 2013-02-11 20:22:26 UTC
Is it worth using 'ulimit -s' to reduce the stack size and make the added test case trigger with less testing time?  After all, I was able to use ulimit to increase stack size and avoid the overflow in my particular case, without waiting for the new upstream build.

Comment 8 Richard W.M. Jones 2013-02-12 09:09:07 UTC
I thought about that, but it's sort of hard to choose a good
number (particularly since we one day hope to get libguestfs
working again on ppc64 and other odd architectures).

The 1000000 event test works, albeit slowly.
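
The trade-off discussed in comments 7 and 8 can be made concrete with a small calculation: the number of recursive frames that fit before overflow is roughly the stack limit divided by the per-frame cost, and the per-frame cost depends on architecture and compiler. The sketch below reads the soft limit with getrlimit(RLIMIT_STACK) (the value that 'ulimit -s' reports in KiB) and does that division. The names soft_stack_limit_bytes and frames_that_fit, and the 640-byte frame size used in the note afterwards, are illustrative assumptions, not measured libguestfs values.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* Soft stack limit in bytes, or 0 if it is unlimited or cannot be read. */
static uint64_t soft_stack_limit_bytes(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (uint64_t) rl.rlim_cur;
}

/* How many frames of frame_bytes each fit within limit_bytes of stack.
 * frame_bytes is a guess that varies by platform and compiler flags. */
static uint64_t frames_that_fit(uint64_t limit_bytes, uint64_t frame_bytes)
{
    assert(frame_bytes > 0);
    return limit_bytes / frame_bytes;
}
```

For example, frames_that_fit(8 * 1024 * 1024, 640) gives 13107, while a 1 MiB limit with the same hypothetical frame size gives 1638. A lower limit would therefore trip a regression test sooner, but the right threshold shifts with the frame size on each architecture, which is the difficulty raised in comment 8.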

