Created attachment 857793 [details] Complete stdout of libguestfs-test-tool Description of problem ---------------------- Running libguestfs-test-tool ( with 'direct' backend) on a machine with btrfs just hangs. # NOTE: Full stdout, refer attachment $ libguestfs-test-tool [. . .] guestfsd: main_loop: new request, len 0x34 guestfsd: main_loop: proc 3 (touch) took 0.06 seconds guestfsd: main_loop: new request, len 0x28 umount /sysroot fsync /dev/sda libguestfs: sending SIGTERM to process 48146 Contextual `strace` of qemu-system-x86 further below. (Full strace is 37MB) Version ------- $ uname -r; rpm -q qemu-system-x86 libvirt-daemon-kvm \ libguestfs btrfs-progs 3.14.0-0.rc0.git12.1.fc21.x86_64 qemu-system-x86-1.7.0-4.fc21.x86_64 libvirt-daemon-kvm-1.2.1-2.fc21.x86_64 libguestfs-1.25.29-1.fc21.x86_64 btrfs-progs-0.20.rc1.20131114git9f0c53f-1.fc20.x86_64 How reproducible: Consistently Steps to reproduce ------------------ - Make a qumu wrapper script to strace qemu process, set libguestfs backend as 'direct' (use QEMU) - Run libguestfs-test-tool with the above qemu wrapper to capture strace results of QEMU binary: $ cat qemu-strace.sh #!/bin/bash - exec strace -o /tmp/qemu-strace.log -f -s 1024 /usr/bin/qemu-system-x86_64 "$@" $ chmod +x qemu-strace.sh $ export LIBGUESTFS_QEMU=/root/qemu-strace.sh $ export LIBGUESTFS_BACKEND=direct $ guestfish get_backend direct $ libguestfs-test-tool Actual results -------------- libguestfs-test-tool Expected results ---------------- Appliance creation should be successful on a Btrfs machine. Additional info --------------- Contextual `strace` of qemu-system-x86 =========== [. . .] 48170 +++ exited with 0 +++ 48168 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 48168 madvise(0x7f7c14ffa000, 8368128, MADV_DONTNEED) = 0 48168 munmap(0x7f7c15ffc000, 8392704) = 0 48168 _exit(0) = ? 48168 +++ exited with 0 +++ 48161 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 48161 madvise(0x7f7c485fc000, 8368128, MADV_DONTNEED) = 0 48161 munmap(0x7f7c147f9000, 8392704) = 0 48161 _exit(0) = ? 48161 +++ exited with 0 +++ 48165 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 48165 madvise(0x7f7c167fd000, 8368128, MADV_DONTNEED) = 0 48165 munmap(0x7f7c243ff000, 8392704) = 0 48165 _exit(0) = ? 48165 +++ exited with 0 +++ 48154 <... ioctl resumed> , 0) = 0 48154 ioctl(13, KVM_RUN, 0) = 0 48154 ioctl(13, KVM_RUN, 0) = 0 48154 ioctl(13, KVM_RUN, 0) = 0 48154 ioctl(13, KVM_RUN, 0) = 0 48154 ioctl(13, KVM_RUN, 0) = 0 48154 ioctl(13, KVM_RUN, 0) = 0 48154 ioctl(13, KVM_RUN, 0) = 0 48154 ioctl(13, KVM_RUN <unfinished ...> 48152 <... ppoll resumed> ) = ? ERESTARTNOHAND (To be restarted if no handler) 48152 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} --- 48152 ppoll([{fd=80, events=POLLIN|POLLERR|POLLHUP}, {fd=79, events=POLLIN|POLLERR|POLLHUP}, {fd=78, events=POLLIN|POLLERR|POLLHUP}, {fd=77, events=POLLIN|POLLERR|POLLHUP}, {fd=76, events=POLLIN|POLLERR|POLLHUP}, {fd=75, events=POLLIN|POLLERR|POLLHUP}, {fd=74, events=POLLIN|POLLERR|POLLHUP}, {fd=73, events=POLLIN|POLLERR|POLLHUP}, {fd=72, events=POLLIN|POLLERR|POLLHUP}, {fd=71, events=POLLIN|POLLERR|POLLHUP}, {fd=70, events=POLLIN|POLLERR|POLLHUP}, {fd=69, events=POLLIN|POLLERR|POLLHUP}, {fd=68, events=POLLIN|POLLERR|POLLHUP}, {fd=67, events=POLLIN|POLLERR|POLLHUP}, {fd=66, events=POLLIN|POLLERR|POLLHUP}, {fd=65, events=POLLIN|POLLERR|POLLHUP}, {fd=64, events=POLLIN|POLLERR|POLLHUP}, {fd=63, events=POLLIN|POLLERR|POLLHUP}, {fd=62, events=POLLIN|POLLERR|POLLHUP}, {fd=61, events=POLLIN|POLLERR|POLLHUP}, {fd=60, events=POLLIN|POLLERR|POLLHUP}, {fd=59, events=POLLIN|POLLERR|POLLHUP}, {fd=58, events=POLLIN|POLLERR|POLLHUP}, {fd=57, events=POLLIN|POLLERR|POLLHUP}, {fd=56, events=POLLIN|POLLERR|POLLHUP}, {fd=55, events=POLLIN|POLLERR|POLLHUP}, {fd=54, events=POLLIN|POLLERR|POLLHUP}, {fd=53, events=POLLIN|POLLERR|POLLHUP}, {fd=52, events=POLLIN|POLLERR|POLLHUP}, {fd=51, events=POLLIN|POLLERR|POLLHUP}, {fd=50, events=POLLIN|POLLERR|POLLHUP}, {fd=49, events=POLLIN|POLLERR|POLLHUP}, {fd=48, events=POLLIN|POLLERR|POLLHUP}, {fd=47, events=POLLIN|POLLERR|POLLHUP}, {fd=46, events=POLLIN|POLLERR|POLLHUP}, {fd=45, events=POLLIN|POLLERR|POLLHUP}, {fd=44, events=POLLIN|POLLERR|POLLHUP}, {fd=43, events=POLLIN|POLLERR|POLLHUP}, {fd=42, events=POLLIN|POLLERR|POLLHUP}, {fd=41, events=POLLIN|POLLERR|POLLHUP}, {fd=40, events=POLLIN|POLLERR|POLLHUP}, {fd=39, events=POLLIN|POLLERR|POLLHUP}, {fd=38, events=POLLIN|POLLERR|POLLHUP}, {fd=37, events=POLLIN|POLLERR|POLLHUP}, {fd=36, events=POLLIN|POLLERR|POLLHUP}, {fd=35, events=POLLIN|POLLERR|POLLHUP}, {fd=34, events=POLLIN|POLLERR|POLLHUP}, {fd=33, events=POLLIN|POLLERR|POLLHUP}, {fd=32, events=POLLIN|POLLERR|POLLHUP}, {fd=31, events=POLLIN|POLLERR|POLLHUP}, {fd=30, events=POLLIN|POLLERR|POLLHUP}, {fd=29, events=POLLIN|POLLERR|POLLHUP}, {fd=28, events=POLLIN|POLLERR|POLLHUP}, {fd=27, events=POLLIN|POLLERR|POLLHUP}, {fd=26, events=POLLIN|POLLERR|POLLHUP}, {fd=25, events=POLLIN|POLLERR|POLLHUP}, {fd=24, events=POLLIN|POLLERR|POLLHUP}, {fd=23, events=POLLIN|POLLERR|POLLHUP}, {fd=22, events=POLLIN|POLLERR|POLLHUP}, {fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=20, events=POLLIN|POLLERR|POLLHUP}, {fd=19, events=POLLIN|POLLERR|POLLHUP}, {fd=18, events=POLLIN|POLLERR|POLLHUP}, {fd=17, events=POLLIN|POLLERR|POLLHUP}, {fd=16, events=POLLIN|POLLERR|POLLHUP}, {fd=15, events=POLLIN|POLLERR|POLLHUP}, {fd=14, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=5, events=POLLIN}, {fd=0, events=POLLIN}, {fd=6, events=POLLIN}, {fd=11, events=POLLIN}, {fd=4, events=POLLIN}], 73, NULL, NULL, 8 (END) =========== Just for reference, Btrfs partitioning scheme on the host: $ df -hT Filesystem Type Size Used Avail Use% Mounted on /dev/sda3 btrfs 123G 15G 107G 13% / devtmpfs devtmpfs 32G 0 32G 0% /dev tmpfs tmpfs 32G 0 32G 0% /dev/shm tmpfs tmpfs 32G 1.4M 32G 1% /run tmpfs tmpfs 32G 0 32G 0% /sys/fs/cgroup tmpfs tmpfs 32G 42M 32G 1% /tmp /dev/sda3 btrfs 123G 15G 107G 13% /home
(In reply to Kashyap Chamarthy from comment #0) > Created attachment 857793 [details] > Complete stdout of libguestfs-test-tool > > Description of problem > ---------------------- > > Running libguestfs-test-tool ( with 'direct' backend) on a machine with > btrfs just hangs. > > # NOTE: Full stdout, refer attachment > > $ libguestfs-test-tool > [. . .] > guestfsd: main_loop: new request, len 0x34 > guestfsd: main_loop: proc 3 (touch) took 0.06 seconds > guestfsd: main_loop: new request, len 0x28 > umount /sysroot > fsync /dev/sda > libguestfs: sending SIGTERM to process 48146 > > Contextual `strace` of qemu-system-x86 further below. (Full strace is > 37MB) > > > Version > ------- > > $ uname -r; rpm -q qemu-system-x86 libvirt-daemon-kvm \ > libguestfs btrfs-progs > 3.14.0-0.rc0.git12.1.fc21.x86_64 > qemu-system-x86-1.7.0-4.fc21.x86_64 > libvirt-daemon-kvm-1.2.1-2.fc21.x86_64 > libguestfs-1.25.29-1.fc21.x86_64 > btrfs-progs-0.20.rc1.20131114git9f0c53f-1.fc20.x86_64 > > > How reproducible: Consistently > > > Steps to reproduce > ------------------ > > - Make a qumu wrapper script to strace qemu process, set libguestfs > backend as 'direct' (use QEMU) > > - Run libguestfs-test-tool with the above qemu wrapper to capture strace > results of QEMU binary: > > $ cat qemu-strace.sh > #!/bin/bash - > exec strace -o /tmp/qemu-strace.log -f -s 1024 > /usr/bin/qemu-system-x86_64 "$@" > > $ chmod +x qemu-strace.sh > > $ export LIBGUESTFS_QEMU=/root/qemu-strace.sh > > $ export LIBGUESTFS_BACKEND=direct > > $ guestfish get_backend > direct > > $ libguestfs-test-tool > > > Actual results > -------------- > > libguestfs-test-tool s/libguestfs-test-tool/Appliance creation hangs as noted in the attachment https://bugzilla.redhat.com/attachment.cgi?id=857793
Complete `strace`: http://kashyapc.fedorapeople.org/temp/qemu-strace.log
I suspect this isn't btrfs, but just a general hang. I've seen Rawhide hanging quite a lot when running the tests, but it is frustratingly unreproducible. It indicates some kind of kernel/ qemu emulation bug. Firstly, I'm assuming because of previous conversations that this is running on your baremetal 48 core monster machine? So we're not talking about nested KVM. 'clocksource: kvm-clock' which indicates that it is using KVM. Also KVM_RUN ioctls appear in the strace. I notice that you've set the backend to 'direct'. Does it happen with the libvirt backend?
Also you could try: export LIBGUESTFS_BACKEND_SETTINGS=force_tcg (in combination with backend direct & libvirt).
(In reply to Richard W.M. Jones from comment #3) > I suspect this isn't btrfs, but just a general hang. I've seen > Rawhide hanging quite a lot when running the tests, but it is > frustratingly unreproducible. It indicates some kind of kernel/ > qemu emulation bug. I see. Even, I noticed some compiles were hanging in a unreliable way, but I haven't done a systematic analysis. Next time I compile, will try to keep notes. > > Firstly, I'm assuming because of previous conversations that this > is running on your baremetal 48 core monster machine? So we're > not talking about nested KVM. Correct, just bare-metal with: F20, with Kernel, QEMU, Libvirt from Rawhide. > > 'clocksource: kvm-clock' which indicates that it is using KVM. > Also KVM_RUN ioctls appear in the strace. > > I notice that you've set the backend to 'direct'. Does it happen > with the libvirt backend? IIRC, it did hang with libvirt back-end, let me double confirm. Yes, will try the 'force_tcg' back-end setting with both 'direct' and 'libvirt' back-end combinations.
With 'libvirt' backend (and KVM) -------------------------------- $ guestfish get_backend libvirt $ time libguestfs-test-tool |& tee stderr-libvirt-kvm.txt ************************************************************ * IMPORTANT NOTICE * * When reporting bugs, include the COMPLETE, UNEDITED * output below in your bug report. * ************************************************************ LIBGUESTFS_QEMU=/usr/local/bin/qemu-strace.sh LIBGUESTFS_BACKEND=libvirt PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/bin SELinux: Permissive guestfs_get_append: (null) guestfs_get_autosync: 1 guestfs_get_backend: libvirt guestfs_get_backend_settings: [] guestfs_get_cachedir: /var/tmp guestfs_get_direct: 0 guestfs_get_hv: /usr/local/bin/qemu-strace.sh guestfs_get_memsize: 500 guestfs_get_network: 0 guestfs_get_path: /usr/lib64/guestfs guestfs_get_pgroup: 0 guestfs_get_program: libguestfs-test-tool guestfs_get_recovery_proc: 1 guestfs_get_selinux: 0 guestfs_get_smp: 1 guestfs_get_tmpdir: /tmp guestfs_get_trace: 0 guestfs_get_verbose: 1 host_cpu: x86_64 Launching appliance, timeout set to 600 seconds. libguestfs: launch: program=libguestfs-test-tool libguestfs: launch: version=1.25.29fedora=21,release=1.fc21,libvirt libguestfs: launch: backend registered: unix libguestfs: launch: backend registered: uml libguestfs: launch: backend registered: libvirt libguestfs: launch: backend registered: direct libguestfs: launch: backend=libvirt libguestfs: launch: tmpdir=/tmp/libguestfsHhzoO1 libguestfs: launch: umask=0022 libguestfs: launch: euid=0 libguestfs: libvirt version = 1002001 (1.2.1) libguestfs: guest random name = guestfs-ehitwgt8t5q9o250 libguestfs: [00000ms] connect to libvirt libguestfs: opening libvirt handle: URI = qemu:///system, auth = default+wrapper, flags = 0 libguestfs: successfully opened libvirt handle: conn = 0x7f7547ee4880 libguestfs: [00015ms] get libvirt capabilities real 10m3.155s user 0m0.020s sys 0m0.322s $ echo $? 0 Note ---- The reason for 10 minutes to run w/ wrapper (from IRC discussion with Rich: From IRC discussion with Rich: I's basically because when you use a qemu wrapper w/ strace libvirt has trouble running qemu in order to query its capabilities.
Created attachment 858116 [details] Complete stderr of libguestfs test tool with direct backend and TCG.txt So, even with setting 'force, /direct/ backend, the appliance seems to be hung just as previously: ------ . . . [ 40.500245] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem [ 40.669298] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null) guestfsd: main_loop: proc 1 (mount) took 0.76 seconds guestfsd: main_loop: new request, len 0x34 guestfsd: main_loop: proc 3 (touch) took 0.02 seconds guestfsd: main_loop: new request, len 0x28 umount /sysroot fsync /dev/sda libguestfs: sending SIGTERM to process 40190 ------
Created attachment 858129 [details] Complete stderr of libguestfs-test-tool with direct backend and TCG Now for real, test with 'direct' backend and force_tcg. With TCG, it's hung here: $ export LIBGUESTFS_BACKEND=direct $ export LIBGUESTFS_BACKEND_SETTINGS=force_tcg $ guestfish get_backend_settings force_tcg $ libguestfs-test-tool |& tee stderr-direct-tcg.txt . . . [ 0.000000] \tRCU lockdep checking is enabled. [ 0.000000] \tRCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=1. [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1 [ 0.000000] NR_IRQS:524544 nr_irqs:256 16 [ 0.000000] Console: colour *CGA 80x25 [ 0.000000] console [ttyS0] enabled [ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8 [ 0.000000] ... MAX_LOCK_DEPTH: 48 [ 0.000000] ... MAX_LOCKDEP_KEYS: 8191 [ 0.000000] ... CLASSHASH_SIZE: 4096 [ 0.000000] ... MAX_LOCKDEP_ENTRIES: 16384 [ 0.000000] ... MAX_LOCKDEP_CHAINS: 32768 [ 0.000000] ... CHAINHASH_SIZE: 16384 [ 0.000000] memory used by lock dependency info: 6367 kB [ 0.000000] per task-struct memory footprint: 2688 bytes [ 0.000000] kmemleak: Kernel memory leak detector disabled main-loop: WARNING: I/O thread spun for 1000 iterations
(In reply to Kashyap Chamarthy from comment #8) > Created attachment 858129 [details] > Complete stderr of libguestfs-test-tool with direct backend and TCG > > Now for real, test with 'direct' backend and force_tcg. Here's the complete strace with 'direct' backend and force_tcg: http://kashyapc.fedorapeople.org/temp/qemu-strace-direct-backend-with-tcg.log PS: Forgot to note in the previous comment -- disregard comment #7]
Created attachment 861085 [details] Successful stderr of libguestfs-test-tool with direct backend Ok, I re-ran the test with "direct" and "libvirt" backends with (KVM), both are successful on this Btrfs machine with 3.14 rc1 Kernel -- 3.14.0-0.rc1.git3.1.fc21.x86_64 $ rpm -q libvirt-daemon-kvm qemu-system-x86 libguestfs libvirt-daemon-kvm-1.2.1-2.fc21.x86_64 qemu-system-x86-1.7.0-4.fc21.x86_64 libguestfs-1.25.29-1.fc21.x86_64
Created attachment 861097 [details] Successful stderr of libguestfs-test-tool with 'libvirt' backend
Since a mysterious rawhide Kernel fixed it, closing it with rationale: RAWHIDE I noted the fixed Kernel version in Comment #10