Bug 504376
Summary: | F-11 guest takes 45 minutes in mke2fs for 8gb fs on virtio disk with a sparse backing file | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | erikj |
Component: | qemu | Assignee: | Glauber Costa <gcosta> |
Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 11 | CC: | berrange, chellwig, clalance, dwmw2, ehabkost, gcosta, itamar, markmc, quintela, rjones, virt-maint |
Hardware: | x86_64 | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2009-07-09 01:29:11 UTC | Type: | --- |
Bug Blocks: | 480594 |
Description
erikj
2009-06-05 20:41:59 UTC
Interesting. I just switched it from 8 cpus to 1 cpu, and now it no longer hangs at mke2fs. If I don't get any ideas, I'll see what I can find by going ahead with this installation (with a 1 cpu guest). I'll switch it back to 8 cpus later and see if I can duplicate the mke2fs hang on the running/non-anaconda system. If so, I'll update to the latest Fedora 11 kernels on the guest and see if it persists. If folks have better debugging ideas, let me know. On the running system, I can also take a crash dump if that helps.

Sorry, my last comment (#2) was a mistake. In that pass, I mis-checked something in custom partitioning, so the filesystem that was created was tiny. Creating the tiny filesystem worked OK. The 10gb ones are not working. I found my mistake when anaconda told me I had more packages selected than would fit. Sorry about that. Wish I could delete Comment #2 :)

Very odd, I haven't seen this and can't reproduce it.

Things to try:

1) Confirm for sure that it happens with UP guests, and stick with that config while trying to narrow it down
2) Switch to testing with virt-install, maybe use kickstart too; it'll make testing easier
3) Try final F11 media; also confirm that it works with F10 media
4) Try with 'virt-install --disk pool=default,size=8,cache=none' - maybe try cache=writeback too, but we don't recommend that
5) Add "console=ttyS0 vnc" to the guest command line when installing, then use "virsh console" to get to the guest console and see if there's anything unusual in the anaconda logs

WRT #3 (use real F-11 media): I think I have 2 days to wait for that, right? I'll start using the other debugging ideas and hope to report something back later today. Thanks much.

I have found that it doesn't hang forever. It's just taking about 45 minutes to create the 8377MB filesystem. When using emulated devices (not virtio), it's really fast. I did do a run using the suggestion from #5.
I didn't notice anything abnormal, but I'm going to set aside the logs (in ~root) in case you think they are noteworthy.

Updating the summary to remove "forever". I just didn't imagine that waiting 45 minutes would have gotten me past the mke2fs. In the latest tests, I'm using the default 512mb memory (instead of 2048) and UP.

I wonder if this slowdown is possibly related to the use of a sparse file backing the disk.

There's a bug causing all disks to be sparse in F11:

https://bugzilla.redhat.com/show_bug.cgi?id=504605

I remember this made Xen PV catastrophically slow at I/O too, until the sparse file was fully allocated.

erikj: could you try pre-allocating the image with e.g. 'dd if=/dev/zero of=/var/lib/libvirt/images/foo.img bs=1M count=10240'

(In reply to comment #9)
> erikj: could you try pre-allocating the image with e.g. 'dd if=/dev/zero
> of=/var/lib/libvirt/images/foo.img bs=1M count=10240'

I followed this suggestion. It seemed to FIX the issue. I'd say the mke2fs operation only took 1 minute, maybe. That's still long-ish for a 10gb root but it's a lot better than 45!

This does make me wonder why, on the same Fedora 11 host, a sles11 guest with a similarly sized root filesystem and also using virtio (I forced it) didn't seem to suffer from this issue. But in any case, this suggestion made a big difference.

Details of test setup:
* New
* Name: f11-test
* Local install media
* Use ISO image
* OS Type Linux, Version Fedora 11
* Kept defaults 512 MB, 1 CPU
* Kept 'enable storage..' checked, kept 'Create disk image...' selected, kept default disk size 8gb, kept 'allocate entire disk now' checked, forward
* No advanced options adjusted (kept supplied mac address, virt type kvm, x86_64 arch), finish
* At this point, the virtual machine was going to start up in install mode.
* I did a force-shutdown
* Looked at current image:
# ls -lh f11-test-1.img
-rw------- 1 root root 8.0G 2009-06-08 13:00 f11-test-1.img
* Followed instructions from comment #9 using dd:
# dd if=/dev/zero of=f11-test-1.img bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 293.755 s, 36.6 MB/s
* Re-connected the DVD iso as the emulated DVD for installation
* Started up the virtual machine
* Told bios to boot from DVD

Updating the subject.

Even if we fix virt-manager to honour requests for non-sparse disks, we still need to get a better understanding of why virtio-blk is so negatively impacted here - it shouldn't be suffering worse than IDE emulation.

Thanks Erik - the strange thing is that I cannot reproduce this with the same setup.

Could you give us some more details on the host? /proc/cpuinfo, lspci, what filesystem the image file is on, etc.?

The host machine is an SGI-branded system, an XE310. This type of system is a 1U system that has two systems sharing a single power supply. I'm using one half of it for these tests. The mainboard is a Supermicro X7DBT.

Here is output from lsscsi. The 2nd disk is being used for some virtio test runs where the device is imported wholesale as a build work area.

[0:0:0:0] disk ATA HDS725050KLA360 K2AO /dev/sda
[1:0:0:0] disk ATA HDT722525DLA380 V44O /dev/sdb
[6:0:0:0] cd/dvd PepperC Virtual Disc 1 0.01 /dev/sr1
[7:0:0:0] cd/dvd PepperC Virtual Disc 2 0.01 /dev/sr0

The system has lots of roots configured and I'm just using one partition for these tests. This could be an important difference then. The root partition -- where I'm also placing the images for the guests -- is 46G. With two images in use (1 8gb sles11, 1 10 gb f11), there is 28G available -- utilization is 41%. Perhaps other test systems you have access to have a single root using 250gb of space with 200 free or something. Maybe that could make a difference in fragmentation; I'm not sure.
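Since the discussion above hinges on whether the backing image is sparse, here is a minimal sketch for checking that on the host. It assumes GNU coreutils `stat`; `is_sparse` is a hypothetical helper name introduced here for illustration, not something from virt-manager or qemu. Note that `ls -lh` reports only the apparent size, so it cannot distinguish a sparse image from a preallocated one.

```shell
#!/bin/sh
# is_sparse FILE
# Hypothetical helper (not part of any tool named in this bug).
# Compares the file's apparent size with the space actually allocated
# on disk. Assumes GNU stat: %s = size in bytes, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
# Returns 0 if less space is allocated than the apparent size (sparse),
# 1 if not, 2 if the file cannot be examined.
is_sparse() {
    img=$1
    apparent=$(stat -c %s "$img") || return 2
    allocated=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))
    echo "$img: apparent=$apparent bytes, allocated=$allocated bytes"
    [ "$allocated" -lt "$apparent" ]
}
```

For example, `is_sparse /var/lib/libvirt/images/f11-test-1.img && echo "image is sparse"` could be run before and after the dd pre-allocation step to confirm that the image really changed.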
As I noted earlier, a SLES11 guest (virtio) doesn't seem to be as sensitive, but I could try more experiments there -- especially now that more disk space is used -- if that would help. As already noted by others, that could be moot anyway if even ide emulated guests are doing better in this regard.

PS: I have access to other systems. I recently installed F11 on one of my roots on my desktop, a Dell optiplex 745 dual-core (core2 2.4ghz). I also have my notebook, with core 2 2.8ghz processors, and many other test machine resources in the labs at the office. I'm open to trying more hardware types if that would be helpful. I can also check with George Beshers; it's likely that the Westford RH office has equipment like the one I'm using in this bug.

lspci:
00:00.0 Host bridge: Intel Corporation 5000X Chipset Memory Controller Hub (rev 31)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev 31)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev 31)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev 31)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev 31)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
08:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4654.73
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.03
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 1
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.02
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 2
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.02
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

This just underscores the root disk utilization from my last add.

[root@nada1 ~]# df -h
Filesystem                               Size  Used Avail Use% Mounted on
/dev/sda11                                46G   18G   26G  42% /
tmpfs                                    2.0G     0  2.0G   0% /dev/shm
attica.americas.sgi.com:/mirrors/redhat  3.7T  2.9T  811G  79% /data/mirrors/redhat

[root@nada1 ~]# parted /dev/sda print
Model: ATA HDS725050KLA360 (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  16.5MB  16.4MB  primary   ext3
 2      16.5MB  50.0GB  50.0GB  primary   xfs          boot
 3      50.0GB  100GB   50.0GB  primary
 4      100GB   500GB   400GB   extended
 5      100GB   150GB   50.0GB  logical
 6      150GB   200GB   50.0GB  logical
 7      200GB   250GB   50.0GB  logical
 8      250GB   300GB   50.0GB  logical
 9      300GB   350GB   50.0GB  logical
10      350GB   400GB   50.0GB  logical
11      400GB   450GB   50.0GB  logical   ext3
12      450GB   500GB   50.0GB  logical   linux-swap

[root@nada1 ~]# ls -lh /var/lib/libvirt/images/
total 14G
-rw------- 1 root root 10G 2009-06-09 09:09 f11-test-1.img
-rw-------. 1 root root 10G 2009-06-05 09:41 g1-sles11-build.img

(In reply to comment #10)
> This does make me wonder why, using the same fedora 11 host, that a sles11
> guest with a similarly sized root filesystem and also using virtio (I forced
> it) didn't seem to suffer from this issue.

This is probably the best lead we have so far. Perhaps you could try the sles11 kernel in the f11 guest and, if that fixes the issue, build your own kernels and try to bisect the issue?

We can pretty reliably reproduce this bug using libguestfs. Changing the disk interface from IDE to virtio causes mkfs and zerofree operations (only) to take about 6 times longer. The backing is also a sparse file, the guest is UP, and cache=off in both cases.

Upstream discussion:
https://www.redhat.com/archives/fedora-virt/2009-July/thread.html#00028

For libguestfs at least, it's NOT sparseness which is the problem. I changed our test to use posix_fallocate to create the files and still see the same slowdown.

I created another bug for the non-sparse case: bug 509383

Can the reporter please try the suggestion here:
https://bugzilla.redhat.com/show_bug.cgi?id=509383#c3
to do this in the guest:
for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done

(In reply to comment #19)
> for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done

Erik, if that works for you, we should mark this bug as a dup of bug #509383

Still trying to get system access. Got some access today but they had BIOS redirection so I couldn't turn off HT. I want to turn off HT so the comparison is apples to apples. We work-ordered a system that I should have access to in a week or so.

PS: George Beshers tells me SGI-branded Nehalem XE270 systems have arrived in Westford and at least one will be in RHTS.

Conclusion: enabling the rotational flag as described made a huge difference: from 27 minutes down to just over 2 minutes.
Details:

HW: SGI XE270, 8gb host memory, 8 core, 2-socket, nehalem, Supermicro X8DTN
Intel(R) Xeon(R) CPU X5570 @ 2.93GHz

test.img (where I ran the mkfs.ext3 on) created like this on host:
qemu-img create -f raw test.img 10G

qemu command line:
/usr/bin/qemu-kvm -M pc -m 4096 -smp 8 -name f11-test -uuid b7b4b7e4-9c07-22aa-0c95-d5c8a24176c5 -monitor pty -pidfile /var/run/libvirt/qemu//f11-test.pid -drive file=/var/lib/libvirt/images/f11-test.img,if=virtio,index=0,boot=on -drive file=/dev/sdb,if=virtio,index=1 -drive file=/var/lib/libvirt/images/test.img,if=virtio,index=2 -net nic,macaddr=54:52:00:46:48:0e,model=virtio -net user -serial pty -parallel none -usb -usbdevice tablet -vnc cct201:1 -soundhw es1370 -redir tcp:5555::22

test case commands
------------------
# parted -s /dev/vdc mklabel msdos
# parted -s /dev/vdc mkpart primary ext2 0 10.7GB
# time mkfs.ext3 /dev/vdc1

test: create filesystem on 10gb virtio disk image (test.img), no rotate
-----------------------------------------------------------------------
timing: trial 1
real 27m38.213s
user 0m0.008s
sys 1m24.040s

timing: trial 2 (re-used host image, re-created fs on re-used test.img)
real 0m8.038s
user 0m0.009s
sys 0m0.589s

timing: trial 3 (proving that I can reset the test... shut down qemu, removed test.img, created it again)
real 28m4.008s
user 0m0.004s
sys 1m15.732s

So yes, we can reset the failure case in a reliable way.

test: create filesystem on 10gb virtio disk image (test.img), rotational
------------------------------------------------------------------------
** reset the test by shutting down qemu, removing test.img, and re-creating it.
** issued this command before performing test case commands in guest:
for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done

Then ran test case commands to create the partition table and filesystem.
timing: trial 1
real 2m14.717s
user 0m0.008s
sys 0m8.480s

timing: trial 2 (re-used host image, re-created fs on re-used test.img)
real 0m7.671s
user 0m0.005s
sys 0m0.618s

*** This bug has been marked as a duplicate of bug 509383 ***
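The workaround tested above is a one-line loop over the guest's virtio block devices. The following is a slightly more defensive sketch of the same idea. `set_rotational` and its optional sysfs-root argument are hypothetical names added here so the function can be tried against a scratch directory; they are not part of any tool mentioned in this bug. On a real guest it has to run as root, and the setting is assumed not to persist across reboots.

```shell
#!/bin/sh
# set_rotational VALUE [SYSFS_ROOT]
# Hypothetical helper: writes VALUE (0 or 1) to the queue/rotational
# attribute of every vd* block device, as the loop in this bug does,
# and prints the old and new value for each device it touched.
# SYSFS_ROOT defaults to /sys; the parameter exists only so the
# function can be exercised on a fake directory tree.
set_rotational() {
    value=$1
    sysfs=${2:-/sys}
    for f in "$sysfs"/block/vd*/queue/rotational; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        [ -e "$f" ] || continue
        old=$(cat "$f")
        echo "$value" > "$f"
        echo "$f: $old -> $value"
    done
}
```

In the guest, `set_rotational 1` followed by `time mkfs.ext3 /dev/vdc1` would correspond to the "rotational" test case reported above.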