Bug 909666 - Unexpected non-tail recursion in recv_from_daemon results in stack overflow in very long-running API calls that send progress messages
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libguestfs
Version: 6.5
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Richard W.M. Jones
QA Contact: Virtualization Bugs
Depends On: 909624 958183
Blocks:
Reported: 2013-02-10 04:28 EST by Richard W.M. Jones
Modified: 2013-12-25 19:15 EST

See Also:
Fixed In Version: libguestfs-1.20.9-5.el6
Doc Type: Bug Fix
Doc Text:
Cause: Long-running libguestfs API calls that send progress messages (e.g. very large resize operations) made recv_from_daemon recurse once for each progress message. Consequence: The stack would eventually overflow, crashing the program using libguestfs. Fix: The code has been rewritten to make it tail-recursive, avoiding the stack overflow. Result: Long-running operations now work.
Story Points: ---
Clone Of: 909624
Environment:
Last Closed: 2013-11-20 23:42:52 EST
Type: Bug


Description Richard W.M. Jones 2013-02-10 04:28:29 EST
+++ This bug was initially created as a clone of Bug #909624 +++

Description of problem:
I was trying to resize a disk image containing a Windows XP guest (a single NTFS partition under MBR, aligned only to 512 bytes), expanding it from 30GB to 40GiB in the process.  Both the original and the destination reside on a remote NFS mount.  The process started just fine and estimated about 4 hours to completion; about three hours in, it died with a stack overflow.  I was able to reproduce the crash (at a slightly different percent complete on retry).

Version-Release number of selected component (if applicable):
libguestfs-1.20.1-3.fc18.x86_64
ocaml-4.00.1-1.fc18.x86_64

How reproducible:
100%

Steps to Reproduce:
1. qemu-img info lounge_c.qcow2
2. virt-filesystems -a lounge_c.qcow2 -l -h --parts --blkdevs
3. virt-resize --align-first auto --alignment 2048 --expand /dev/sda1 lounge_c.qcow2 windows_xp_c.img 
  
Actual results:
1. 
image: lounge_c.qcow2
file format: qcow2
virtual size: 28G (30000000000 bytes)
disk size: 17G
cluster_size: 65536
backing file: lounge_c.img
backing file format: raw
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         sp3                       0 2013-01-26 05:33:20   00:00:00.000
2         ie8                       0 2013-01-26 11:02:16   00:00:00.000

2.
Name       Type       MBR  Size  Parent
/dev/sda1  partition  07   28G   /dev/sda
/dev/sda   device     -    28G   -

3.
Examining lounge_c.qcow2 ...
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
**********

Summary of changes:

/dev/sda1: This partition will be resized from 27.9G to 40.0G.  The 
    filesystem ntfs on /dev/sda1 will be expanded using the 
    'ntfsresize' method.

**********
Setting up initial partition table on windows_xp_c.img ...
Copying /dev/sda1 ...
◐ 77% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒═══════════════⟧ 54:12
libguestfs: uncaught OCaml exception in event callback: Stack_overflowlibguestfs: uncaught OCaml exception in event callback: Stack_overflow[... same message repeated many more times, with no newlines ...]libguestfs: uncaught OCaml exception in event callback: Stack_overflowSegmentation fault (core dumped)

Expected results:
No core dump, successful resize instead of dying 3 hours in

Additional info:

--- Additional comment from Eric Blake on 2013-02-09 17:22:43 EST ---

This was with fedora-virt-preview installed, so it was with:
libvirt-1.0.2-2.fc18.x86_64.rpm

--- Additional comment from Richard W.M. Jones on 2013-02-09 17:32:02 EST ---

This is bad!

libguestfs: uncaught OCaml exception in event callback: Stack_overflow

I will leave this program running overnight:

$ cat test.ml 
let () =
  let g = new Guestfs.guestfs () in
  g#add_drive_ro "/dev/null";
  g#launch ();

  let cb _ _ _ _ _ = Printf.printf "hello!\n" in
  ignore (g#set_event_callback cb [Guestfs.EVENT_PROGRESS]);

  ignore (g#debug "progress" [| "1000000" |]);

  g#close ();

  (* Try to force memory errors by running the GC. *)
  Gc.compact ()

$ ocamlfind ocamlopt -package guestfs -linkpkg test.ml -o test

--- Additional comment from Eric Blake on 2013-02-09 17:39:12 EST ---

The core file is over 26M, and over 1M when xz-compressed, so I'm having a hard time attaching it.  But from the core file:

Program terminated with signal 11, Segmentation fault.
#0  _IO_vfprintf_internal (s=0x7fff9d9a3420, 
    format=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", ap=0x7fff9d9a5b98) at vfprintf.c:237
237	  int save_errno = errno;

Thread 1 (Thread 0x7f353aa9f840 (LWP 15709)):
#0  _IO_vfprintf_internal (s=0x7fff9d9a3420, 
    format=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", ap=0x7fff9d9a5b98) at vfprintf.c:237
#1  0x00000032f964babf in buffered_vfprintf (
    s=s@entry=0x32f99b21a0 <_IO_2_1_stderr_>, 
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", args=args@entry=0x7fff9d9a5b98) at vfprintf.c:2299
#2  0x00000032f9646c1e in _IO_vfprintf_internal (
    s=s@entry=0x32f99b21a0 <_IO_2_1_stderr_>, 
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", ap=ap@entry=0x7fff9d9a5b98) at vfprintf.c:1269
#3  0x00000032f97081fe in ___fprintf_chk (fp=0x32f99b21a0 <_IO_2_1_stderr_>, 
    flag=flag@entry=1, 
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s") at fprintf_chk.c:36
#4  0x000000000051b314 in fprintf (
    __fmt=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s", __stream=<optimized out>) at /usr/include/bits/stdio2.h:97
#5  event_callback_wrapper_locked (array_len=4, array=0x7fff9d9a5e20, 
    buf_len=0, buf=<optimized out>, event_handle=<optimized out>, 
    event=<optimized out>, data=0x202f6d0, g=<optimized out>, 
    flags=<optimized out>) at guestfs-c.c:379
#6  event_callback_wrapper (g=<optimized out>, data=0x202f6d0, 
    event=<optimized out>, event_handle=<optimized out>, 
    flags=<optimized out>, buf=<optimized out>, buf_len=0, 
    array=0x7fff9d9a5e20, array_len=4) at guestfs-c.c:400
#7  0x0000003cf9283e1b in guestfs___call_callbacks_array (g=g@entry=0x20417d0, 
    event=event@entry=8, array=array@entry=0x7fff9d9a5e20, 
    array_len=array_len@entry=4) at events.c:197
#8  0x0000003cf929c3c6 in guestfs___progress_message_callback (
    g=g@entry=0x20417d0, message=message@entry=0x7fff9d9a5e60) at proto.c:303
#9  0x0000003cf929d23c in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:657
#10 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#11 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#12 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
...
#13073 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#13074 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663

#13075 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0, 
    size_rtn=size_rtn@entry=0x7fff9e19f9d4, 
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663

#13076 0x0000003cf929dc6a in guestfs___recv (g=g@entry=0x20417d0, 
    fn=fn@entry=0x3cf92b8078 "copy_device_to_device", 
    hdr=hdr@entry=0x7fff9e19fab0, err=err@entry=0x7fff9e19fa50, 
    xdrp=xdrp@entry=0x0, ret=ret@entry=0x0) at proto.c:986
#13077 0x0000003cf922e9eb in guestfs_copy_device_to_device_argv (
    g=g@entry=0x20417d0, src=src@entry=0x2034f30 "/dev/sda1", 
    dest=dest@entry=0x2036480 "/dev/sdb1", optargs=<optimized out>, 
    optargs@entry=0x7fff9e19fb60) at actions-1.c:5280

#13078 0x00000000005223ff in ocaml_guestfs_copy_device_to_device (
    gv=140735837516832, srcoffsetv=5661240, destoffsetv=140735837526936, 
    sizev=33771424, srcv=11, destv=218933494512) at guestfs-c-actions.c:3129
#13079 0x00000000004c6c0d in camlGuestfs__fun_23824 ()
#13080 0x0000000000500671 in camlList__iter_1061 ()
#13081 0x00000000004ba596 in camlResize__entry () at resize.ml:1013
#13082 0x000000000044ff39 in caml_program ()
#13083 0x0000000000564fe6 in caml_start_program ()
#13084 0x0000000000552089 in caml_main ()
#13085 0x000000000044fb6c in main ()
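
A rough estimate of how much stack the recursion used can be read off the addresses in this trace. The sketch below is only arithmetic on values copied from the backtrace; the helper names are made up for illustration, and it assumes the two pointers roughly bracket the recursive frames (size_rtn lives in the caller of the outermost frame, array lives a couple of frames inside the innermost one).

```c
#include <assert.h>
#include <stdint.h>

/* Addresses copied from the backtrace above. */
#define OUTER_ADDR UINT64_C(0x7fff9e19f9d4) /* size_rtn, frames #9..#13075 */
#define INNER_ADDR UINT64_C(0x7fff9d9a5e20) /* array, frame #7 */

/* Frames #9 through #13075 are all guestfs___recv_from_daemon. */
#define FIRST_FRAME 9
#define LAST_FRAME 13075

/* Bytes of stack between the two addresses (the stack grows downwards). */
uint64_t stack_span_bytes(void)
{
  assert(OUTER_ADDR > INNER_ADDR);
  return OUTER_ADDR - INNER_ADDR;
}

/* Number of recursive recv_from_daemon frames in the trace. */
uint64_t recursive_frames(void)
{
  return LAST_FRAME - FIRST_FRAME + 1;
}

/* Approximate stack cost of one level of recursion, rounded down. */
uint64_t approx_bytes_per_frame(void)
{
  return stack_span_bytes() / recursive_frames();
}
```

That works out to 8362932 bytes across 13067 frames, about 640 bytes per progress message. The span is just under 8 MiB, which would match a default 8 MiB stack limit, although the limit in effect on this machine is not recorded here.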

--- Additional comment from Richard W.M. Jones on 2013-02-09 17:41:08 EST ---

> <a big number> 0x0000003cf929d260 in guestfs___recv_from_daemon

That's definitely not supposed to happen.  Thanks for the stack trace.
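
For anyone following along, the trace is what you would expect from a receive function that calls itself after every progress message and still has work to do when that call returns. A minimal sketch of that shape follows. It is not the libguestfs source: msg_kind, fake_daemon and recv_recursive are hypothetical stand-ins, and the "daemon" is just an array of messages. It only counts how deep the call chain gets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds; not the libguestfs wire protocol. */
enum msg_kind { MSG_PROGRESS, MSG_REPLY };

/* A fake "daemon": a fixed array of messages, read in order. */
struct fake_daemon {
  const enum msg_kind *msgs;
  size_t len;
  size_t pos;
  size_t depth;      /* current call depth of recv_recursive */
  size_t max_depth;  /* deepest call depth seen so far */
};

/* Returns 0 when a reply arrives, -1 if the messages run out first. */
int recv_recursive(struct fake_daemon *d)
{
  int r;

  d->depth++;
  if (d->depth > d->max_depth)
    d->max_depth = d->depth;

  if (d->pos >= d->len) {
    r = -1;
  } else if (d->msgs[d->pos++] == MSG_PROGRESS) {
    /* Bookkeeping still has to run after this call returns, so it is
     * not a tail call: every progress message costs one more frame. */
    r = recv_recursive(d);
  } else {
    r = 0;  /* the real reply */
  }

  assert(d->depth > 0);
  d->depth--;
  return r;
}
```

With 1000 progress messages followed by one reply, max_depth ends at 1001. Stack use grows linearly with the number of progress messages, so a long enough operation will hit the stack limit.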

--- Additional comment from Richard W.M. Jones on 2013-02-10 04:27:27 EST ---

BTW my simple test reproducer (comment 2) eventually dumped
core:

helibguestfs: uncaught OCaml exception in event callback: Stack_overflowlibguestfs: uncaught OCaml exception in event callback: Stack_overflow[... same message repeated many more times, with no newlines ...]libguestfs: uncaught OCaml exception in event callback: Stack_overflowSegmentation fault (core dumped)

although I can't exactly see where abrt hid the core dump ...
Comment 2 RHEL Product and Program Management 2013-02-14 01:48:59 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 4 Richard W.M. Jones 2013-04-11 13:54:42 EDT
Fix included in upstream libguestfs 1.16.35.
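
A sketch of the general shape of such a fix, for reference. This is not the actual patch: msg_kind, fake_daemon and recv_loop are hypothetical names, and the "daemon" is an array of messages. The Doc Text describes the fix as making the code tail-recursive; this sketch uses an explicit loop instead, which gives the same constant stack use without depending on the compiler to optimize tail calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds; not the libguestfs wire protocol. */
enum msg_kind { MSG_PROGRESS, MSG_REPLY };

/* A fake "daemon": a fixed array of messages, read in order. */
struct fake_daemon {
  const enum msg_kind *msgs;
  size_t len;
  size_t pos;
  size_t progress_seen;  /* progress messages handled so far */
};

/* Returns 0 when a reply arrives, -1 if the messages run out first. */
int recv_loop(struct fake_daemon *d)
{
  assert(d != NULL && d->msgs != NULL);

  for (;;) {
    if (d->pos >= d->len)
      return -1;
    if (d->msgs[d->pos++] == MSG_REPLY)
      return 0;
    /* Progress message: handle it, then go round the loop again
     * instead of calling ourselves. No new stack frame is used. */
    d->progress_seen++;
  }
}
```

However many progress messages arrive before the reply, the function stays in a single frame.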
Comment 5 Richard W.M. Jones 2013-06-28 07:04:21 EDT
Fixed by the rebase (bug 958183).
Comment 10 Can Zhang 2013-10-23 21:53:17 EDT
I reproduced and verified this bug using the `test.ml` program. Not sure if this is acceptable so I set "need info".

Reproduce:
1. Install libguestfs-1.16.34-2.el6 and compile `test.ml` against its libraries.
2. Run `test.ml` for nearly a day and it finally crashes:
hello!
hello!
hello!
hello!
hello!
hello!
hello!
hello!
hello!
hello!
helSegmentation fault (core dumped)

Verify:
1. Install the latest version from rhel6.5 repo and compile `test.ml` again.
2. Run `test.ml` for nearly 3 days and it doesn't crash.
Comment 11 Richard W.M. Jones 2013-10-28 09:25:55 EDT
(In reply to Can Zhang from comment #10)
> I reproduced and verified this bug using the `test.ml` program. Not sure if
> this is acceptable so I set "need info".

I'm slightly surprised it took so long to
get the core dump, but the verification seems
fine to me.
Comment 12 bfan 2013-10-30 02:50:25 EDT
Changed the status to VERIFIED according to comment 10 and comment 11.
Comment 14 errata-xmlrpc 2013-11-20 23:42:52 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1536.html
