+++ This bug was initially created as a clone of Bug #1359324 +++

Description of problem:
This is bug 2 of 2. The original bug 1359324 covers the apparent crash of qemu-system-x86. This bug 2 of 2 covers a side effect of that crash: the qcow2 backing file becomes extraordinarily large (37 petabytes) and is also corrupt.

Version-Release number of selected component (if applicable):
qemu-system-x86-2.6.0-5.fc24.x86_64
kernel-4.6.4-301.fc24.x86_64

How reproducible:
Unknown, not attempted again yet.

Steps to Reproduce:
1. Create the two backing files:
# qemu-img create -f qcow2 -o nocow=on uefi_opensuseleap42.2a3-1.qcow2 50G
# qemu-img create -f qcow2 -o nocow=on uefi_opensuseleap42.2a3-2.qcow2 50G
2. Both of these back virtio disks, appearing as vda and vdb in the guest.
3. In the guest, ask YaST to use both vda and vdb, create an EFI System Partition on each drive, and make the rest of the free space on both drives md members set to RAID level 1. Then start the installation.
4. At some point well past midway, YaST reports an rpm I/O error. None of this environment state was saved, except what might appear in the host's journal. After the error YaST wouldn't continue, so I chose the power-off option, and it powered off the VM cleanly. The two obvious problems are the difference in qcow2 file sizes, as if the md RAID setup didn't work correctly. But I'm going to set aside the fact that the 2nd qcow2 was never written to after it was created, and deal with this 37 petabyte qcow2.
5. When I switch to a Fedora 24 Workstation ISO to boot the VM, it fails to start, complaining about qcow2 corruption.

Actual results:
There are two results:

Jul 22 13:24:30 f24m systemd-coredump[3914]: Process 3829 (qemu-system-x86) of user 107 dumped core.
                                             Stack trace of thread 3829:
                                             #0  0x00007f9faceec6f5 n/a (n/a)

And also:

[root@f24m images]# ll
total 59765472
-rw-r-----. 1 qemu qemu        1541406720 Jul 21 10:54 Fedora-Workstation-Live-x86_64-24-1.2.iso
-rw-r--r--. 1 qemu qemu        1433403392 Jul 20 13:28 Fedora-Workstation-Live-x86_64-Rawhide-20160718.n.0.iso
-rw-r-----. 1 qemu qemu        4647288832 Jul 22 10:43 openSUSE-Leap-42.2-DVD-x86_64-Build0109-Media.iso
-rw-r--r--. 1 root root 40537894204538880 Jul 22 13:23 uefi_opensuseleap42.2a3-1.qcow2
-rw-r--r--. 1 root root            197632 Jul 22 08:46 uefi_opensuseleap42.2a3-2.qcow2

Yes, that's a 37 petabyte file.

Expected results:
qemu shouldn't crash, nor should it corrupt its backing files (while also making them 37P in size).
Additional info: coredump file is ~185MiB
https://drive.google.com/open?id=0B_2Asp8DGjJ9UHNJSXJBUTBCTzg

[root@f24m images]# coredumpctl -o qemu-system-x86.coredump dump /usr/bin/qemu-system-x86_64
           PID: 3829 (qemu-system-x86)
           UID: 107 (qemu)
           GID: 107 (qemu)
        Signal: 6 (ABRT)
     Timestamp: Fri 2016-07-22 13:24:21 MDT (40min ago)
  Command Line: /usr/bin/qemu-system-x86_64 -machine accel=kvm -name UEFI,debug-threads=on -S -machine pc-i440fx-2.4,accel=kvm,usb=off,vmport=off -cpu SandyBridge -drive file=/usr/share/edk2/ovmf/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/UEFI_VARS.fd,if=pflash,format=raw,unit=1 -m 3072 -realtime mlock=off -smp 3,sockets=3,cores=1,threads=1 -uuid 11831a99-fad2-4e1f-8a31-f521cbf91ff3 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-UEFI/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot menu=off,strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device ahci,id=sata0,bus=pci.0,addr=0x8 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/Fedora-Workstation-Live-x86_64-24-1.2.iso,format=raw,if=none,media=cdrom,id=drive-sata0-0-1,readonly=on -device ide-cd,bus=sata0.1,drive=drive-sata0-0-1,id=sata0-0-1,bootindex=1 -drive file=/var/lib/libvirt/images/uefi_opensuseleap42.2a3-1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=unsafe,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/var/lib/libvirt/images/uefi_opensuseleap42.2a3-1.qcow2,format=qcow2,if=none,id=drive-virtio-disk1,cache=unsafe,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=3 -netdev tap,fd=25,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:fe:40:e3,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
    Executable: /usr/bin/qemu-system-x86_64
 Control Group: /machine.slice/machine-qemu\x2d2\x2dUEFI.scope
          Unit: machine-qemu\x2d2\x2dUEFI.scope
         Slice: machine.slice
       Boot ID: b91161300395440f96b49cd0b879488d
    Machine ID: 358f3fdc5df34832b44a6816f3b04881
      Hostname: f24m
      Coredump: /var/lib/systemd/coredump/core.qemu-system-x86.107.b91161300395440f96b49cd0b879488d.3829.1469215461000000000000.lz4
       Message: Process 3829 (qemu-system-x86) of user 107 dumped core.
                
                Stack trace of thread 3829:
                #0  0x00007f9faceec6f5 n/a (n/a)
More than one entry matches, ignoring rest.
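Side note: the frames in that trace are all unresolved (n/a). If someone wants a usable backtrace from the core, something like the following should work on the reporter's machine (a sketch; assumes the Fedora debuginfo repos are enabled):

# dnf debuginfo-install qemu-system-x86
# coredumpctl gdb 3829
(gdb) thread apply all bt full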
Created attachment 1182953 [details]
filefrag output

It gets suspicious at line 9753 of the attachment, between fragments 9746 and 9747.

Filesystem type is: 9123683e
File size of uefi_opensuseleap42.2a3-1.qcow2 is 40537894204538880 (9896946827280 blocks of 4096 bytes)
 ext:       logical_offset:          physical_offset:  length:   expected:  flags:
[...snip...]
9746:       13207824..    13212799:  153115392.. 153120367:    4976:
9747:  2279017021440..2279017021455: 150018353.. 150018368:      16:  153120368:

OK, so it's a sparse file with a rather large gap near the end.
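To put numbers on the jump: fragment 9746 ends at logical block 13212799 (13212800 × 4096 ≈ 54 GB, about where a nearly full 50G qcow2 plus metadata should end), while fragment 9747 starts at logical block 2279017021440 (2279017021440 × 4096 ≈ 9.3 PB into the file); the apparent size is 9896946827280 × 4096 = 40537894204538880 bytes ≈ 36 PiB. So virtually all of the 37P is hole, with a handful of blocks written at wild offsets. A quick way to confirm apparent size vs. real allocation on the host (a sketch, using the path from the listing above):

# stat -c 'apparent=%s bytes, allocated=%b blocks of %B bytes' uefi_opensuseleap42.2a3-1.qcow2
# du -h --apparent-size uefi_opensuseleap42.2a3-1.qcow2    # sparse (apparent) size
# du -h uefi_opensuseleap42.2a3-1.qcow2                    # actual disk usage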
kernel-4.6.4-301.fc24.x86_64
btrfs-progs-4.6.1-1.fc25.x86_64

The filesystem mounts, reads, and writes without Btrfs or device errors, before, during, and after the qemu crash. Scrub comes up clean. Offline btrfs check has no complaints.

Mount options:
/dev/sda5 on / type btrfs (rw,noatime,seclabel,ssd,space_cache,subvolid=583,subvol=/root24w)

There are four subvolumes (two are read-only snapshots). Anyway, if it's a Btrfs problem, it's not in the usual category that includes noisy+scary Btrfs messages. Asked upstream anyway:
http://article.gmane.org/gmane.comp.file-systems.btrfs/58557
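For reference, those checks amount to roughly the following (a sketch; device and mountpoint taken from the mount options above):

# btrfs scrub start -B /     # online scrub; -B blocks until done and prints stats
# btrfs check /dev/sda5      # offline check; run from rescue media with the fs unmounted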
Created attachment 1183144 [details]
bz1359325.pl

I wasn't able to reproduce this. My reproducer (attached) is still running, however, so I'll see how it goes.

What I did to attempt to recreate this situation in a reproducible manner:

(1) Created a 200GB logical volume on a Fedora 24 host:

$ sudo lvcreate -L 200G -n bz1359325-btrfs vg
  Logical volume "bz1359325-btrfs" created.

(2) Formatted it with btrfs, all default options:

$ sudo mkfs.btrfs /dev/vg/bz1359325-btrfs
btrfs-progs v4.5.2
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               c941bb1b-f8b0-482a-977a-1df47b5225bb
Node size:          16384
Sector size:        4096
Filesystem size:    200.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP               1.01GiB
  System:           DUP              12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1   200.00GiB  /dev/vg/bz1359325-btrfs

(3) Mounted it up:

$ mkdir /tmp/mnt
$ sudo mount /dev/vg/bz1359325-btrfs /tmp/mnt

(4) Created 2 x qcow2 nocow files:

$ sudo qemu-img create -f qcow2 -o nocow=on /tmp/mnt/disk1.qcow2 50G
Formatting '/tmp/mnt/disk1.qcow2', fmt=qcow2 size=53687091200 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 nocow=on
$ sudo qemu-img create -f qcow2 -o nocow=on /tmp/mnt/disk2.qcow2 50G
Formatting '/tmp/mnt/disk2.qcow2', fmt=qcow2 size=53687091200 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 nocow=on

(5) Used the attached guestfs script to format and hammer these disks with writes. Note this uses virtio-scsi, not virtio-blk.

$ sudo /tmp/bz1359325.pl
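For anyone without libguestfs handy, a rough shell-only way to hammer one of these images with writes is via qemu-nbd (a sketch of an alternative approach, not the attached script; assumes the nbd kernel module is available and /dev/nbd0 is free):

$ sudo modprobe nbd max_part=8
$ sudo qemu-nbd --connect=/dev/nbd0 /tmp/mnt/disk1.qcow2
$ sudo mkfs.ext4 /dev/nbd0
$ sudo mkdir -p /tmp/guest && sudo mount /dev/nbd0 /tmp/guest
$ sudo dd if=/dev/urandom of=/tmp/guest/hammer bs=1M count=20000 oflag=direct
$ sudo umount /tmp/guest
$ sudo qemu-nbd --disconnect /dev/nbd0

This exercises the qcow2 allocation path without a full guest, though it won't hit the virtio code paths the original setup did.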
So bug 1359324 determined that the corruption is due to user error: the same qcow2 file was attached to the guest twice, causing qemu to write to one backing file from two virtual disks. Ending up with a 37 petabyte sparse file certainly sounds strange, but maybe we just chalk it up to the unpredictability of two writers scribbling over a single disk image? Of course, the qemu crash in bug 1359324 should still be investigated.
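For the record, the double-attach is visible in the Command Line field of the coredumpctl output above: both virtio -drive lines point at uefi_opensuseleap42.2a3-1.qcow2, and the -2 image is never referenced. With the command line saved to a file (qemu-cmdline.txt here is just a hypothetical name), it's easy to spot:

$ grep -o 'file=[^,]*\.qcow2' qemu-cmdline.txt | sort | uniq -c
      2 file=/var/lib/libvirt/images/uefi_opensuseleap42.2a3-1.qcow2

Newer qemu (2.10+) would refuse to start this configuration outright, thanks to the image locking added in that release.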
I've tried reproducing this, but the image never grows beyond the size specified at qemu-img create time. For now this could be closed with insufficient data, I guess.