I have a Fedora 11 virtualization host. When I use virt-manager to create a Fedora 11 guest (using the Preview media), the install hangs 100% of the time in the mke2fs operation (making the ext3 root on the guest). The guest is configured with mostly the defaults virt-manager offers. The settings result in an x86_64 guest using KVM and virtio. I give the guest 2048 MB of memory and 8 CPUs (the system has 8 cores). In anaconda, I'm able to do a custom partition just fine, but we hang forever in the mke2fs step.

Over on the virtualization host, I'm running the latest Fedora 11 bits available:

kernel: 2.6.29.4-167.fc11.x86_64
qemu-kvm-0.10.4-4.fc11.x86_64
libvirt-0.6.2-11.fc11.x86_64
virt-manager-0.7.0-5.fc11.x86_64

As mentioned before, the guest is at the Fedora 11 Preview, as the problem happens during an install. If I delete the problem guest and make a new guest using "Linux" for OS type and "Generic 2.6.x kernel" (this results in a guest that does not use virtio), I'm able to get past this point without any trouble. So the problem seems related to virtio. I'm very surprised I've hit this and that it hasn't been reported, so I'm suspicious; however, I can't think of anything I'm doing wrong. I'll note that I had created a virtio guest for SLES11 (I had to tell virt-manager that it was F11/F11 to get it to do virtio) and SLES11 with virtio didn't suffer from these issues. I'll also note that I saw the bug reports on the issue at the "eject dvd" point, and my problem is well before then.

Steps to reproduce:

* Start up virt-manager
* Click 'New'
* Use these values in step 1:
  - Name: pick a name (I used g1-fedora11-build)
  - Connection: QEMU/KVM
  - For 'Choose how you would like to install the operating system', I keep 'Local Install' selected
  - Click Forward
* Use these values in step 2:
  - Be sure 'Use ISO image' is selected
  - Enter the path (I use this long path, an NFS mount): /data/mirrors/redhat/fedora/releases/test/11-Preview/Fedora/x86_64/iso/Fedora-11-Preview-x86_64-DVD.iso
  - For 'Choose an operating system and version', use Linux for OS Type and Fedora 11 for Version. This ensures virtio is used.
  - Click Forward
* In 'Choose Memory and CPU settings' I used:
  - Memory (RAM): 2048 MB
  - CPUs: 8
* In step 4:
  - Disk image size: 10 GB
  - Leave 'allocate now' checked
  - Click Forward

Go ahead with those settings and then start a normal installation. In my case, anaconda told me to initialize /dev/vda, which I did. I used custom partitioning to create a 1 GB swap and gave the rest to root. Once I submitted from there, after it wrote the partition table to the disk, it started to create the filesystem. This is the step that never completed.

The best answer is to tell me I missed something that said this is broken. But I checked the most frequent bugs list and tried several searches here in bugzilla and didn't see anybody else with this problem, and that scares me :)

One thing I did try was to pre-create the root filesystem. I did this by creating partitions for the image (from the virt host) using kpartx and putting a filesystem on the root partition. In this scenario, I was able to get installation onto the existing filesystem to work OK using virtio. The issue there was that package installation seemed very slow, slower than my working kvm/no-virtio case, so I didn't bother finishing the install. (A rough sketch of those kpartx steps follows the sysrq trace below.)

During an install hung at mke2fs, I was able to flip to the virtual console and get some sysrq information, including a stack trace. I hope that it is somehow helpful.
  .yld_exp_empty : 0
  .yld_act_empty : 0
  .yld_both_empty : 0
  .yld_count : 0
  .sched_switch : 0
  .sched_count : 17192
  .sched_goidle : 7338
  .ttwu_count : 8154
  .ttwu_local : 3537
  .bkl_count : 929

cfs_rq[1]:/
  .exec_clock : 342822.845670
  .MIN_vruntime : 0.000001
  .min_vruntime : 299396.872726
  .max_vruntime : 0.000001
  .spread : 0.000000
  .spread0 : 48207.062369
  .nr_running : 1
  .load : 1024
  .nr_spread_over : 133
  .shares : 0

rt_rq[1]:/
  .rt_nr_running : 0
  .rt_throttled : 0
  .rt_time : 0.000000
  .rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
R sh 827 299396.872726 352 120 299396.872726 61182.805554 1147326.105595 /

cpu#2, 2327.474 MHz
  .nr_running : 0
  .load : 0
  .nr_switches : 16238
  .nr_load_updates : 365500
  .nr_uninterruptible : 49
  .jiffies : 4295890985
  .next_balance : 4295.890989
  .curr->pid : 0
  .clock : 1223689.082660
  .cpu_load[0] : 0
  .cpu_load[1] : 0
  .cpu_load[2] : 0
  .cpu_load[3] : 0
  .cpu_load[4] : 0
  .yld_exp_empty : 0
  .yld_act_empty : 0
  .yld_both_empty : 0
  .yld_count : 0
  .sched_switch : 0
  .sched_count : 16599
  .sched_goidle : 7040
  .ttwu_count : 8159
  .ttwu_local : 3317
  .bkl_count : 875

cfs_rq[2]:/
  .exec_clock : 305516.524580
  .MIN_vruntime : 0.000001
  .min_vruntime : 272070.022838
  .max_vruntime : 0.000001
  .spread : 0.000000
  .spread0 : 20880.212481
  .nr_running : 0
  .load : 0
  .nr_spread_over : 105
  .shares : 0

rt_rq[2]:/
  .rt_nr_running : 0
  .rt_throttled : 0
  .rt_time : 0.029356
  .rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#3, 2327.474 MHz
  .nr_running : 1
  .load : 1024
  .nr_switches : 17570
  .nr_load_updates : 279794
  .nr_uninterruptible : -33
  .jiffies : 4295890985
  .next_balance : 4295.891213
  .curr->pid : 0
  .clock : 1223689.519237
  .cpu_load[0] : 1024
  .cpu_load[1] : 512
  .cpu_load[2] : 256
  .cpu_load[3] : 128
  .cpu_load[4] : 72
  .yld_exp_empty : 0
  .yld_act_empty : 0
  .yld_both_empty : 0
  .yld_count : 0
  .sched_switch : 0
  .sched_count : 17776
  .sched_goidle : 7194
  .ttwu_count : 9127
  .ttwu_local : 3683
  .bkl_count : 519

cfs_rq[3]:/
  .exec_clock : 370361.009166
  .MIN_vruntime : 314497.179671
  .min_vruntime : 314497.179671
  .max_vruntime : 314497.179671
  .spread : 0.000000
  .spread0 : 63307.369314
  .nr_running : 1
  .load : 1024
  .nr_spread_over : 77
  .shares : 0

rt_rq[3]:/
  .rt_nr_running : 0
  .rt_throttled : 0
  .rt_time : 0.000000
  .rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
mke2fs 1306 314497.190525 13171 120 314497.190525 650228.119990 364031.669582 /

cpu#4, 2327.474 MHz
  .nr_running : 0
  .load : 0
  .nr_switches : 13649
  .nr_load_updates : 286808
  .nr_uninterruptible : -18
  .jiffies : 4295890985
  .next_balance : 4295.890989
  .curr->pid : 0
  .clock : 1223689.146358
  .cpu_load[0] : 0
  .cpu_load[1] : 0
  .cpu_load[2] : 0
  .cpu_load[3] : 0
  .cpu_load[4] : 0
  .yld_exp_empty : 0
  .yld_act_empty : 0
  .yld_both_empty : 0
  .yld_count : 0
  .sched_switch : 0
  .sched_count : 13791
  .sched_goidle : 5738
  .ttwu_count : 6999
  .ttwu_local : 3232
  .bkl_count : 553

cfs_rq[4]:/
  .exec_clock : 289442.108886
  .MIN_vruntime : 0.000001
  .min_vruntime : 251715.176949
  .max_vruntime : 0.000001
  .spread : 0.000000
  .spread0 : 525.366592
  .nr_running : 0
  .load : 0
  .nr_spread_over : 76
  .shares : 0

rt_rq[4]:/
  .rt_nr_running : 0
  .rt_throttled : 0
  .rt_time : 0.000000
  .rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#5, 2327.474 MHz
  .nr_running : 1
  .load : 1024
  .nr_switches : 18356
  .nr_load_updates : 282546
  .nr_uninterruptible : -57
  .jiffies : 4295890985
  .next_balance : 4295.891053
  .curr->pid : 809
  .clock : 1223689.471474
  .cpu_load[0] : 4145
  .cpu_load[1] : 3365
  .cpu_load[2] : 2390
  .cpu_load[3] : 1732
  .cpu_load[4] : 1231
  .yld_exp_empty : 0
  .yld_act_empty : 0
  .yld_both_empty : 0
  .yld_count : 0
  .sched_switch : 0
  .sched_count : 18435
  .sched_goidle : 6021
  .ttwu_count : 9757
  .ttwu_local : 5407
  .bkl_count : 457

cfs_rq[5]:/
  .exec_clock : 313457.780381
  .MIN_vruntime : 273306.348539
  .min_vruntime : 273306.345159
  .max_vruntime : 273306.348539
  .spread : 0.000000
  .spread0 : 22116.534802
  .nr_running : 2
  .load : 4145
  .nr_spread_over : 59
  .shares : 0

rt_rq[5]:/
  .rt_nr_running : 0
  .rt_throttled : 0
  .rt_time : 0.000000
  .rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
R ata/5 70 273280.097162 549 115 273280.097162 22607.806493 1159201.836821 /
hald-addon-stor 809 273306.348539 5194 120 273306.348539 333022.652845 825502.637703 /

cpu#6, 2327.474 MHz
  .nr_running : 0
  .load : 0
  .nr_switches : 11802
  .nr_load_updates : 253868
  .nr_uninterruptible : -121
  .jiffies : 4295890985
  .next_balance : 4295.890989
  .curr->pid : 0
  .clock : 1222784.763871
  .cpu_load[0] : 3121
  .cpu_load[1] : 4644
  .cpu_load[2] : 11377
  .cpu_load[3] : 17254
  .cpu_load[4] : 15102
  .yld_exp_empty : 0
  .yld_act_empty : 0
  .yld_both_empty : 0
  .yld_count : 0
  .sched_switch : 0
  .sched_count : 11924
  .sched_goidle : 4796
  .ttwu_count : 6167
  .ttwu_local : 3078
  .bkl_count : 476

cfs_rq[6]:/
  .exec_clock : 290990.783902
  .MIN_vruntime : 0.000001
  .min_vruntime : 244118.810216
  .max_vruntime : 0.000001
  .spread : 0.000000
  .spread0 : -7071.000141
  .nr_running : 0
  .load : 0
  .nr_spread_over : 76
  .shares : 0

rt_rq[6]:/
  .rt_nr_running : 0
  .rt_throttled : 0
  .rt_time : 0.025423
  .rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#7, 2327.474 MHz
  .nr_running : 1
  .load : 1024
  .nr_switches : 13067
  .nr_load_updates : 267506
  .nr_uninterruptible : -128
  .jiffies : 4295890985
  .next_balance : 4295.891130
  .curr->pid : 78
  .clock : 1223689.738343
  .cpu_load[0] : 1024
  .cpu_load[1] : 1024
  .cpu_load[2] : 1015
  .cpu_load[3] : 907
  .cpu_load[4] : 664
  .yld_exp_empty : 0
  .yld_act_empty : 0
  .yld_both_empty : 0
  .yld_count : 0
  .sched_switch : 0
  .sched_count : 13236
  .sched_goidle : 5593
  .ttwu_count : 6704
  .ttwu_local : 2964
  .bkl_count : 432

cfs_rq[7]:/
  .exec_clock : 225640.752143
  .MIN_vruntime : 171738.984674
  .min_vruntime : 171738.984674
  .max_vruntime : 171738.984674
  .spread : 0.000000
  .spread0 : -79450.825683
  .nr_running : 1
  .load : 1024
  .nr_spread_over : 50
  .shares : 0

rt_rq[7]:/
  .rt_nr_running : 0
  .rt_throttled : 0
  .rt_time : 0.000000
  .rt_runtime : 950.000000

runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
R pdflush 78 171738.984674 12964 120 171738.984674 672227.661463 485475.704782 /

SysRq : Show backtrace of all active CPUs
CPU6:
ffff88007b67ff00 ffff88007b67ff18 0000000000000046 ffff88007fbfcdc0
0000000000000006 ffffffff815f6300 0000000000000170 ffff88007b67ff48
ffffffff81014bae ffff88007b67ff68 ffffffff812416cd ffffffff813b02f8
Call Trace:
<IRQ> [<ffffffff81014bae>] ? show_stack+0x1c/0x1e
[<ffffffff812416cd>] ? showacpu+0x4d/0x60
[<ffffffff813b02f8>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[<ffffffff8106d8cb>] ? generic_smp_call_function_interrupt+0x46/0xf7
[<ffffffff810a34d8>] ? mempool_alloc_slab+0x15/0x17
[<ffffffff81022679>] ? smp_call_function_interrupt+0x24/0x34
[<ffffffff81012323>] ? call_function_interrupt+0x13/0x20
<EOI> [<ffffffff810cfedf>] ? kmem_cache_alloc+0xbc/0xf1
[<ffffffff813b02bc>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff810a34d8>] ? mempool_alloc_slab+0x15/0x17
[<ffffffff810a3605>] ? mempool_alloc+0x62/0x119
[<ffffffff811a5f32>] ? get_request+0x1a8/0x2b9
[<ffffffff810cf2d0>] ? new_slab+0x1ae/0x1d5
[<ffffffff811a6078>] ? get_request_wait+0x35/0x16b
[<ffffffff811a9165>] ? ll_new_hw_segment+0x1a/0x6f
[<ffffffff811a64c9>] ? __make_request+0x31b/0x420
[<ffffffff811a4e1c>] ? generic_make_request+0x256/0x2a7
[<ffffffff811a4f38>] ? submit_bio+0xcb/0xd4
[<ffffffff810f5bef>] ? submit_bh+0xea/0x10d
[<ffffffff810f8115>] ? __block_write_full_page+0x1c4/0x2a4
[<ffffffff810fa5a2>] ? blkdev_get_block+0x0/0x63
[<ffffffff810fa5a2>] ? blkdev_get_block+0x0/0x63
[<ffffffff810f8272>] ? block_write_full_page+0x7d/0x86
[<ffffffff810fbbfb>] ? blkdev_writepage+0x18/0x1a
[<ffffffff810a7dcf>] ? __writepage+0x17/0x34
[<ffffffff810a849d>] ? write_cache_pages+0x24c/0x3c9
[<ffffffff810a7db8>] ? __writepage+0x0/0x34
[<ffffffff810a863e>] ? generic_writepages+0x24/0x26
[<ffffffff810a866d>] ? do_writepages+0x2d/0x3c
[<ffffffff810f1983>] ? __writeback_single_inode+0x18d/0x2d5
[<ffffffff8103ba85>] ? update_curr+0xf6/0xff
[<ffffffff8103c293>] ? dequeue_entity+0x20/0x1b3
[<ffffffff810f1ea9>] ? generic_sync_sb_inodes+0x218/0x36d
[<ffffffff810f2211>] ? writeback_inodes+0xa9/0x103
[<ffffffff810a87db>] ? wb_kupdate+0xa8/0x11e
[<ffffffff810a951f>] ? pdflush+0x175/0x23a
[<ffffffff810a8733>] ? wb_kupdate+0x0/0x11e
[<ffffffff810a93aa>] ? pdflush+0x0/0x23a
[<ffffffff810a93aa>] ? pdflush+0x0/0x23a
[<ffffffff8105e881>] ? kthread+0x4d/0x78
[<ffffffff810126ca>] ? child_rip+0xa/0x20
[<ffffffff81011fe7>] ? restore_args+0x0/0x30
[<ffffffff8105e834>] ? kthread+0x0/0x78
[<ffffffff810126c0>] ? child_rip+0x0/0x20
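
As promised above, here is a rough sketch of the kpartx pre-creation workaround I mentioned. The loop device, image path, and partition layout are illustrative, not an exact transcript of what I ran:

# attach the raw guest image to a loop device (on the host)
losetup /dev/loop0 /var/lib/libvirt/images/guest.img
# partition it, e.g. a swap partition and a root partition
fdisk /dev/loop0
# map the new partitions as /dev/mapper/loop0p1, /dev/mapper/loop0p2, ...
kpartx -av /dev/loop0
# put a filesystem on what will become the guest's root
mkfs.ext3 /dev/mapper/loop0p2
# tear the mappings down again before booting the guest
kpartx -dv /dev/loop0
losetup -d /dev/loop0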
Interesting. I just switched the guest from 8 CPUs to 1 CPU, and then I don't hang at mke2fs any more.
If I don't get any ideas, I'll see what I can find by going ahead with this installation (with a 1-CPU guest). I'll switch it back to 8 CPUs later and see if I can duplicate the mke2fs hang on the running, non-anaconda system. If so, I'll update the guest to the latest Fedora 11 kernels and see if the problem persists. If folks have better debugging ideas, let me know. I guess, on the running system, I can also take a crash dump if that helps.
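
For the switch back to 8 CPUs, I expect something like the following to work (untested, and the guest name is made up):

# shut the guest down first; the vcpu count is read at startup
virsh shutdown f11-guest
# change <vcpu>1</vcpu> to <vcpu>8</vcpu> in the domain XML
virsh edit f11-guest
virsh start f11-guest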
Sorry, my last comment (#2) was a mistake. In that pass, I mis-checked something in custom partitioning, so the filesystem that was created was tiny. Creating the tiny filesystem worked OK; the 10 GB ones are not working. I found out my mistake when anaconda told me I had more packages selected than would fit. Sorry about that. Wish I could delete comment #2 :)
Very odd; I haven't seen this and can't reproduce it. Things to try:

1) Confirm for sure that it happens with UP guests, and stick with that config while trying to narrow it down
2) Switch to testing with virt-install, maybe use kickstart too; it'll make testing easier (see the sketch after this list)
3) Try final F11 media; also confirm that it works with F10 media
4) Try with 'virt-install --disk pool=default,size=8,cache=none' - maybe try cache=writeback too, but we don't recommend that
5) Add "console=ttyS0 vnc" to the guest command line when installing and then use "virsh console" to get to the guest console and see if there's anything unusual in the anaconda logs
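
For (2) and (4), a minimal virt-install sketch; the tree URL and kickstart location are placeholders, and only the --disk options above are the actual suggestion:

virt-install \
  --name f11-virtio-test \
  --ram 512 \
  --vcpus 1 \
  --disk pool=default,size=8,cache=none \
  --location http://example.com/fedora/releases/11/Fedora/x86_64/os/ \
  --extra-args "ks=http://example.com/ks/f11-minimal.ks console=ttyS0" \
  --os-type linux \
  --os-variant fedora11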
WRT #3 (Use real F-11 media) I think I have 2 days to wait for that, right? I'll start using the other debugging ideas and hope to report something back later today. Thanks much.
I have found that it doesn't hang forever; it's just taking about 45 minutes to create the 8377 MB filesystem. When not using virtio (i.e., with emulated devices), it's really fast. I did do a run using the suggestion from #5. I didn't notice anything abnormal, but I'm going to set aside the logs (in ~root) in case you think they are noteworthy.
Updating the summary to remove "forever". I just didn't imagine that waiting 45 minutes would get me past the mke2fs. In the latest tests, I'm using the default 512 MB of memory (instead of 2048) and a UP guest.
I wonder if this slowdown is possibly related to the use of a sparse file backing the disk. There's a bug causing all disks to be sparse in F11: https://bugzilla.redhat.com/show_bug.cgi?id=504605 I remember this made Xen PV catastrophically slow at I/O too, until the sparse file was fully allocated.
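
A quick way to check whether an image is sparse is to compare its apparent size against the blocks actually allocated; the path here is just an example:

# apparent size vs. space actually allocated; a big gap means the file is sparse
ls -lh /var/lib/libvirt/images/foo.img
du -h /var/lib/libvirt/images/foo.img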
erikj: could you try pre-allocating the image with e.g. 'dd if=/dev/zero of=/var/lib/libvirt/images/foo.img bs=1M count=10240'
(In reply to comment #9)
> erikj: could you try pre-allocating the image with e.g. 'dd if=/dev/zero
> of=/var/lib/libvirt/images/foo.img bs=1M count=10240'

I followed this suggestion, and it seemed to FIX the issue. I'd say the mke2fs operation only took a minute, maybe. That's still long-ish for a 10 GB root, but it's a lot better than 45 minutes!

This does make me wonder why, using the same Fedora 11 host, a SLES11 guest with a similarly sized root filesystem, also using virtio (I forced it), didn't seem to suffer from this issue. But in any case, this suggestion made a big difference.

Details of the test setup:

* New
* Name: f11-test
* Local install media
* Use ISO image
* OS Type Linux, Version Fedora 11
* Kept the defaults: 512 MB, 1 CPU
* Kept 'enable storage...' checked, kept 'Create disk image...' selected, kept the default disk size of 8 GB, kept 'allocate entire disk now' checked, clicked Forward
* No advanced options adjusted (kept the supplied MAC address, virt type kvm, x86_64 arch), clicked Finish
* At this point, the virtual machine was going to start up in install mode, so I did a force-shutdown
* Looked at the current image:

# ls -lh f11-test-1.img
-rw------- 1 root root 8.0G 2009-06-08 13:00 f11-test-1.img

* Followed the instructions from comment #9 using dd:

# dd if=/dev/zero of=f11-test-1.img bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 293.755 s, 36.6 MB/s

* Re-connected the DVD ISO as the emulated DVD for installation
* Started up the virtual machine
* Told the BIOS to boot from DVD
Updating the subject. Even if we fix virt-manager to honour requests for non-sparse disks, we still need to get a better understanding of why virtio-blk is so negatively impacted here; it shouldn't be suffering worse than IDE emulation.
Thanks Erik - the strange thing is that I cannot reproduce this with the same setup. Could you give us some more details on the host? /proc/cpuinfo, lspci, what filesystem the image file is on, etc.?
The host machine is an SGI-branded system, an XE310. This type of system is a 1U chassis that has two systems sharing a single power supply; I'm using one half of it for these tests. The mainboard is a Supermicro X7DBT.

Here is output from lsscsi. The second disk is being used for some virtio test runs where the device is imported wholesale as a build work area.

[0:0:0:0]   disk    ATA      HDS725050KLA360  K2AO  /dev/sda
[1:0:0:0]   disk    ATA      HDT722525DLA380  V44O  /dev/sdb
[6:0:0:0]   cd/dvd  PepperC  Virtual Disc 1   0.01  /dev/sr1
[7:0:0:0]   cd/dvd  PepperC  Virtual Disc 2   0.01  /dev/sr0

The system has lots of root partitions configured and I'm just using one partition for these tests. This could be an important difference: the root partition, where I'm also placing the images for the guests, is 46G. With two images in use (one 8 GB SLES11, one 10 GB F11), there is 28G available; utilization is 41%. Perhaps other test systems you have access to have a single root using 250 GB of space with 200 GB free or something. Maybe that could make a difference in fragmentation; I'm not sure.

As I noted earlier, a SLES11 guest (virtio) doesn't seem to be as sensitive, but I could try more experiments there, especially now that more disk space is used, if that would help. As already noted by others, that could be moot anyway if even IDE-emulated guests are doing better in this regard.

PS: I have access to other systems. I recently installed F11 on one of my roots on my desktop, a Dell OptiPlex 745 dual-core (Core 2, 2.4 GHz). My notebook has Core 2 2.8 GHz processors, and there are many other test machine resources in the labs at the office. I'm open to trying more hardware types if that would be helpful. I can also check with George Beshers; it's likely that the Westford RH office has equipment like the one I'm using in this bug.
lspci:

00:00.0 Host bridge: Intel Corporation 5000X Chipset Memory Controller Hub (rev 31)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev 31)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev 31)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev 31)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev 31)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
08:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

/proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4654.73
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.03
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 1
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.02
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 2
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.02
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
stepping : 7
cpu MHz : 2333.331
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4655.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
This just underscores the root disk utilization from my last update.

[root@nada1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda11             46G   18G   26G  42% /
tmpfs                 2.0G     0  2.0G   0% /dev/shm
attica.americas.sgi.com:/mirrors/redhat
                      3.7T  2.9T  811G  79% /data/mirrors/redhat

[root@nada1 ~]# parted /dev/sda print
Model: ATA HDS725050KLA360 (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  16.5MB  16.4MB  primary   ext3
 2      16.5MB  50.0GB  50.0GB  primary   xfs          boot
 3      50.0GB  100GB   50.0GB  primary
 4      100GB   500GB   400GB   extended
 5      100GB   150GB   50.0GB  logical
 6      150GB   200GB   50.0GB  logical
 7      200GB   250GB   50.0GB  logical
 8      250GB   300GB   50.0GB  logical
 9      300GB   350GB   50.0GB  logical
10      350GB   400GB   50.0GB  logical
11      400GB   450GB   50.0GB  logical   ext3
12      450GB   500GB   50.0GB  logical   linux-swap

[root@nada1 ~]# ls -lh /var/lib/libvirt/images/
total 14G
-rw------- 1 root root 10G 2009-06-09 09:09 f11-test-1.img
-rw-------. 1 root root 10G 2009-06-05 09:41 g1-sles11-build.img
(In reply to comment #10)
> This does make me wonder why, using the same fedora 11 host, that a sles11
> guest with a similarly sized root filesystem and also using virtio (I forced
> it) didn't seem to suffer from this issue.

This is probably the best lead we have so far. Perhaps you could try the SLES11 kernel in the F11 guest and, if that fixes the issue, build your own kernels and try to bisect the problem?
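
If it comes to that, a kernel bisect would look roughly like this; the good/bad endpoints here are only illustrative guesses, not known-good data points:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
cd linux-2.6
git bisect start
git bisect bad v2.6.29     # kernel showing the slow virtio mkfs (illustrative)
git bisect good v2.6.27    # a kernel believed to behave (illustrative)
# at each step: build, install the kernel in the guest, boot it, time mkfs
# on a fresh image, then report the result:
git bisect good    # or: git bisect bad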
We can pretty reliably reproduce this bug using libguestfs. Changing the disk interface from IDE to virtio causes mkfs and zerofree operations (only) to take about 6 times longer. The backing is also a sparse file, the guest is UP, and cache=off in both cases. Upstream discussion: https://www.redhat.com/archives/fedora-virt/2009-July/thread.html#00028
For libguestfs at least, it's NOT sparseness which is the problem. I changed our test to use posix_fallocate to create the files and still see the same slowdown.
I created another bug for the non-sparse case: bug 509383
Can the reporter please try the suggestion here: https://bugzilla.redhat.com/show_bug.cgi?id=509383#c3

That is, run this in the guest:

for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done
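
To verify the setting took effect, something like this should do (device names will vary by guest):

# 0 = non-rotational; check before and after flipping the flag
grep . /sys/block/vd*/queue/rotational
for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done
grep . /sys/block/vd*/queue/rotational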
(In reply to comment #19)
> for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done

Erik, if that works for you, we should mark this bug as a dup of bug #509383
Still trying to get system access. I got some access today, but they had BIOS redirection, so I couldn't turn off HT. I want to turn off HT so the comparison is apples to apples. We work-ordered a system that I should have access to in a week or so.

PS: George Beshers tells me SGI-branded Nehalem XE270 systems have arrived in Westford and at least one will be in RHTS.
Conclusion: enabling the rotational flag as described made a huge difference: 27 minutes down to just over 2 minutes.

Details:

HW: SGI XE270, 8 GB host memory, 8 cores, 2 sockets, Nehalem, Supermicro X8DTN, Intel(R) Xeon(R) CPU X5570 @ 2.93GHz

test.img (the disk I ran mkfs.ext3 on) was created like this on the host:

qemu-img create -f raw test.img 10G

qemu command line:

/usr/bin/qemu-kvm -M pc -m 4096 -smp 8 -name f11-test -uuid b7b4b7e4-9c07-22aa-0c95-d5c8a24176c5 -monitor pty -pidfile /var/run/libvirt/qemu//f11-test.pid -drive file=/var/lib/libvirt/images/f11-test.img,if=virtio,index=0,boot=on -drive file=/dev/sdb,if=virtio,index=1 -drive file=/var/lib/libvirt/images/test.img,if=virtio,index=2 -net nic,macaddr=54:52:00:46:48:0e,model=virtio -net user -serial pty -parallel none -usb -usbdevice tablet -vnc cct201:1 -soundhw es1370 -redir tcp:5555::22

test case commands
------------------
# parted -s /dev/vdc mklabel msdos
# parted -s /dev/vdc mkpart primary ext2 0 10.7GB
# time mkfs.ext3 /dev/vdc1

test: create filesystem on 10 GB virtio disk image (test.img), no rotational flag
----------------------------------------------------------------------------------
timing: trial 1
real 27m38.213s
user 0m0.008s
sys 1m24.040s

timing: trial 2 (re-used host image, re-created fs on re-used test.img)
real 0m8.038s
user 0m0.009s
sys 0m0.589s

timing: trial 3 (proving that I can reset the test: shut down qemu, removed test.img, created it again)
real 28m4.008s
user 0m0.004s
sys 1m15.732s

So yes, we can reset the failure case in a reliable way.

test: create filesystem on 10 GB virtio disk image (test.img), rotational
--------------------------------------------------------------------------
** reset the test by shutting down qemu, removing test.img, and re-creating it.
** issued this command before performing the test case commands in the guest:

for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done

Then ran the test case commands to create the partition table and filesystem.

timing: trial 1
real 2m14.717s
user 0m0.008s
sys 0m8.480s

timing: trial 2 (re-used host image, re-created fs on re-used test.img)
real 0m7.671s
user 0m0.005s
sys 0m0.618s
*** This bug has been marked as a duplicate of bug 509383 ***