Description of problem:
I was trying to resize a disk image containing a Windows XP guest (single NTFS partition under MBR, aligned only at 512 bytes), and expand it from 30GB to 40GiB in the process. Both the original and the destination reside on a remote NFS mount. The process started just fine, and estimated about 4 hours to completion; about three hours in, the process died with a stack overflow. I was able to reproduce the crash (slightly different percent complete on retry).

Version-Release number of selected component (if applicable):
libguestfs-1.20.1-3.fc18.x86_64
ocaml-4.00.1-1.fc18.x86_64

How reproducible:
100%

Steps to Reproduce:
1. qemu-img info lounge_c.qcow2
2. virt-filesystems -a lounge_c.qcow2 -l -h --parts --blkdevs
3. virt-resize --align-first auto --alignment 2048 --expand /dev/sda1 lounge_c.qcow2 windows_xp_c.img

Actual results:
1.
image: lounge_c.qcow2
file format: qcow2
virtual size: 28G (30000000000 bytes)
disk size: 17G
cluster_size: 65536
backing file: lounge_c.img
backing file format: raw
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         sp3                       0 2013-01-26 05:33:20   00:00:00.000
2         ie8                       0 2013-01-26 11:02:16   00:00:00.000

2.
Name       Type       MBR  Size  Parent
/dev/sda1  partition  07   28G   /dev/sda
/dev/sda   device     -    28G   -

3.
Examining lounge_c.qcow2 ...
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
**********

Summary of changes:

/dev/sda1: This partition will be resized from 27.9G to 40.0G. The filesystem ntfs on /dev/sda1 will be expanded using the 'ntfsresize' method.

**********
Setting up initial partition table on windows_xp_c.img ...
Copying /dev/sda1 ...
 ◐ 77% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒═══════════════⟧ 54:12
libguestfs: uncaught OCaml exception in event callback: Stack_overflowlibguestfs: uncaught OCaml exception in event callback: Stack_overflowlibguestfs: uncaught OCaml exception in event callback: Stack_overflow
[... the same message, printed without a trailing newline, repeated dozens more times ...]
libguestfs: uncaught OCaml exception in event callback: Stack_overflowSegmentation fault (core dumped)

Expected results:
No core dump, successful resize instead of dying 3 hours in

Additional info:
This was with fedora-virt-preview installed, so it was with: libvirt-1.0.2-2.fc18.x86_64.rpm
This is bad!

libguestfs: uncaught OCaml exception in event callback: Stack_overflow

I will leave this program running overnight:

$ cat test.ml
let () =
  let g = new Guestfs.guestfs () in
  g#add_drive_ro "/dev/null";
  g#launch ();
  let cb _ _ _ _ _ = Printf.printf "hello!\n" in
  ignore (g#set_event_callback cb [Guestfs.EVENT_PROGRESS]);
  ignore (g#debug "progress" [| "1000000" |]);
  g#close ();
  (* Try to force memory errors by running the GC. *)
  Gc.compact ()
$ ocamlfind ocamlopt -package guestfs -linkpkg test.ml -o test
The core file is over 26M, and over 1M when xz-compressed, so I'm having a hard time attaching it. But from the core file:

Program terminated with signal 11, Segmentation fault.
#0  _IO_vfprintf_internal (s=0x7fff9d9a3420,
    format=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s",
    ap=0x7fff9d9a5b98) at vfprintf.c:237
237       int save_errno = errno;

Thread 1 (Thread 0x7f353aa9f840 (LWP 15709)):
#0  _IO_vfprintf_internal (s=0x7fff9d9a3420,
    format=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s",
    ap=0x7fff9d9a5b98) at vfprintf.c:237
#1  0x00000032f964babf in buffered_vfprintf (
    s=s@entry=0x32f99b21a0 <_IO_2_1_stderr_>,
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s",
    args=args@entry=0x7fff9d9a5b98) at vfprintf.c:2299
#2  0x00000032f9646c1e in _IO_vfprintf_internal (
    s=s@entry=0x32f99b21a0 <_IO_2_1_stderr_>,
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s",
    ap=ap@entry=0x7fff9d9a5b98) at vfprintf.c:1269
#3  0x00000032f97081fe in ___fprintf_chk (fp=0x32f99b21a0 <_IO_2_1_stderr_>,
    flag=flag@entry=1,
    format=format@entry=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s")
    at fprintf_chk.c:36
#4  0x000000000051b314 in fprintf (
    __fmt=0x566238 "libguestfs: uncaught OCaml exception in event callback: %s",
    __stream=<optimized out>) at /usr/include/bits/stdio2.h:97
#5  event_callback_wrapper_locked (array_len=4, array=0x7fff9d9a5e20,
    buf_len=0, buf=<optimized out>, event_handle=<optimized out>,
    event=<optimized out>, data=0x202f6d0, g=<optimized out>,
    flags=<optimized out>) at guestfs-c.c:379
#6  event_callback_wrapper (g=<optimized out>, data=0x202f6d0,
    event=<optimized out>, event_handle=<optimized out>,
    flags=<optimized out>, buf=<optimized out>, buf_len=0,
    array=0x7fff9d9a5e20, array_len=4) at guestfs-c.c:400
#7  0x0000003cf9283e1b in guestfs___call_callbacks_array (g=g@entry=0x20417d0,
    event=event@entry=8, array=array@entry=0x7fff9d9a5e20,
    array_len=array_len@entry=4) at events.c:197
#8  0x0000003cf929c3c6 in guestfs___progress_message_callback (
    g=g@entry=0x20417d0, message=message@entry=0x7fff9d9a5e60) at proto.c:303
#9  0x0000003cf929d23c in guestfs___recv_from_daemon (g=g@entry=0x20417d0,
    size_rtn=size_rtn@entry=0x7fff9e19f9d4,
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:657
#10 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0,
    size_rtn=size_rtn@entry=0x7fff9e19f9d4,
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#11 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0,
    size_rtn=size_rtn@entry=0x7fff9e19f9d4,
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#12 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0,
    size_rtn=size_rtn@entry=0x7fff9e19f9d4,
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
...
#13073 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0,
    size_rtn=size_rtn@entry=0x7fff9e19f9d4,
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#13074 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0,
    size_rtn=size_rtn@entry=0x7fff9e19f9d4,
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#13075 0x0000003cf929d260 in guestfs___recv_from_daemon (g=g@entry=0x20417d0,
    size_rtn=size_rtn@entry=0x7fff9e19f9d4,
    buf_rtn=buf_rtn@entry=0x7fff9e19f9d8) at proto.c:663
#13076 0x0000003cf929dc6a in guestfs___recv (g=g@entry=0x20417d0,
    fn=fn@entry=0x3cf92b8078 "copy_device_to_device",
    hdr=hdr@entry=0x7fff9e19fab0, err=err@entry=0x7fff9e19fa50,
    xdrp=xdrp@entry=0x0, ret=ret@entry=0x0) at proto.c:986
#13077 0x0000003cf922e9eb in guestfs_copy_device_to_device_argv (
    g=g@entry=0x20417d0, src=src@entry=0x2034f30 "/dev/sda1",
    dest=dest@entry=0x2036480 "/dev/sdb1", optargs=<optimized out>,
    optargs@entry=0x7fff9e19fb60) at actions-1.c:5280
#13078 0x00000000005223ff in ocaml_guestfs_copy_device_to_device (
    gv=140735837516832, srcoffsetv=5661240, destoffsetv=140735837526936,
    sizev=33771424, srcv=11, destv=218933494512) at guestfs-c-actions.c:3129
#13079 0x00000000004c6c0d in camlGuestfs__fun_23824 ()
#13080 0x0000000000500671 in camlList__iter_1061 ()
#13081 0x00000000004ba596 in camlResize__entry () at resize.ml:1013
#13082 0x000000000044ff39 in caml_program ()
#13083 0x0000000000564fe6 in caml_start_program ()
#13084 0x0000000000552089 in caml_main ()
#13085 0x000000000044fb6c in main ()
> <a big number> 0x0000003cf929d260 in guestfs___recv_from_daemon

That's definitely not supposed to happen. Thanks for the stack trace.
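To make the failure mode in the backtrace concrete: guestfs___recv_from_daemon calls itself at proto.c:663 after dispatching a progress message, so the stack grows by one frame for every progress message that arrives before the real reply (roughly 13,000 frames in the trace above). The following is only a toy model of that shape, not the libguestfs source and not the upstream patch. Every name in it (daemon_stream, read_message, recv_recursive, recv_looping) is made up for illustration, and the depth counter stands in for real stack frames, since an optimizing compiler may turn this particular toy recursion into a jump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from libguestfs. */
enum msg_kind { MSG_PROGRESS, MSG_REPLY };

struct daemon_stream {
  size_t progress_pending;   /* progress messages queued before the reply */
};

/* Pretend to read one message from the daemon. */
static enum msg_kind read_message(struct daemon_stream *s)
{
  assert(s != NULL);
  if (s->progress_pending > 0) {
    s->progress_pending--;
    return MSG_PROGRESS;
  }
  return MSG_REPLY;
}

/* Recursive shape: each progress message adds one level of nesting.
 * Returns the nesting depth reached when the reply finally arrives. */
static size_t recv_recursive(struct daemon_stream *s, size_t depth)
{
  if (read_message(s) == MSG_PROGRESS) {
    /* a progress callback would be invoked here */
    return recv_recursive(s, depth + 1);
  }
  return depth;
}

/* Loop shape: progress messages are consumed in place, so the depth
 * stays at one frame no matter how many messages arrive. */
static size_t recv_looping(struct daemon_stream *s)
{
  size_t depth = 1;
  while (read_message(s) == MSG_PROGRESS) {
    /* a progress callback would be invoked here */
  }
  return depth;
}
```

With 1000 pending progress messages, recv_recursive(&s, 1) reports a depth of 1001 while recv_looping(&s) stays at 1. A copy that runs for hours emits far more progress messages than that, which would explain why the overflow only shows up late in a long virt-resize run.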
BTW my simple test reproducer (comment 2) eventually dumped core:

helibguestfs: uncaught OCaml exception in event callback: Stack_overflowlibguestfs: uncaught OCaml exception in event callback: Stack_overflowlibguestfs: uncaught OCaml exception in event callback: Stack_overflow
[... the same message, printed without a trailing newline, repeated dozens more times ...]
libguestfs: uncaught OCaml exception in event callback: Stack_overflowSegmentation fault (core dumped)

although I can't exactly see where abrt hid the core dump ...
This is fixed upstream in:

https://github.com/libguestfs/libguestfs/commit/b3cf5d1d9638313a656ee61f1bd5f06bbca21abf
https://github.com/libguestfs/libguestfs/commit/05444da98351c9269fc8c6278f8bb5c5cfd4bb83

I will backport these fixes to F18 and RHEL 7.
Is it worth using 'ulimit -s' to reduce the stack size and make the added test case trigger with less testing time? After all, I was able to use ulimit to increase stack size and avoid the overflow in my particular case, without waiting for the new upstream build.
I thought about that, but it's sort of hard to choose a good number (particularly since we one day hope to get libguestfs working again on ppc64 and other odd architectures). The 1000000 event test works, albeit slowly.
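For anyone who wants to try the 'ulimit -s' idea from a C test harness rather than from the shell, here is a minimal sketch using the POSIX getrlimit/setrlimit calls. The helper name lower_stack_limit is made up for this example, and any byte count passed to it is an arbitrary choice, which is exactly the difficulty described above: a value that triggers the overflow quickly on x86_64 may be wrong on another architecture.

```c
#include <sys/resource.h>

/* Hypothetical helper: lower the soft stack limit of the current process
 * to at most `bytes`. It never raises the limit. Returns 0 on success and
 * -1 on error (errno is set by getrlimit or setrlimit). */
static int lower_stack_limit(rlim_t bytes)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_STACK, &rl) != 0)
    return -1;

  /* Already at or below the requested size: nothing to do. */
  if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur <= bytes)
    return 0;

  /* Only the soft limit changes; the hard limit is left alone. */
  rl.rlim_cur = bytes;
  return setrlimit(RLIMIT_STACK, &rl);
}
```

A harness could call this and then exec the reproducer, since resource limits are inherited across fork and exec. Note that the helper takes bytes, whereas bash's 'ulimit -s' takes units of 1024 bytes.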