Bug 504376

Summary: F-11 guest takes 45 minutes in mke2fs for 8gb fs on virtio disk with a sparse backing file
Product: Fedora
Reporter: erikj
Component: qemu
Assignee: Glauber Costa <gcosta>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: medium
Version: 11
CC: berrange, chellwig, clalance, dwmw2, ehabkost, gcosta, itamar, markmc, quintela, rjones, virt-maint
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-07-09 01:29:11 UTC
Bug Blocks: 480594

Description erikj 2009-06-05 20:41:59 UTC
I have a Fedora 11 virtualization host.  When I use virt-manager to create a Fedora 11 guest (using the Preview media), the install hangs 100% of the time in the mke2fs operation (making the ext3 root filesystem on the guest).

The guest is configured with mostly the defaults given by virt-manager. The settings result in an x86_64 guest using KVM and virtio.  I give the guest 2048 MB of memory and 8 CPUs (the system has 8 cores).

In anaconda, I'm able to do custom partitioning just fine, but it hangs forever in the mke2fs step.

Over on the virtualization host, I'm running the latest fedora11 bits available.

kernel: 2.6.29.4-167.fc11.x86_64
qemu-kvm-0.10.4-4.fc11.x86_64
libvirt-0.6.2-11.fc11.x86_64
virt-manager-0.7.0-5.fc11.x86_64

As mentioned before, the guest is at the fedora 11 preview as the problem happens during an install.

If I delete the problem guest and make a new guest using "Linux" for OS type and "Generic 2.6.x kernel" (this results in a guest that does not use virtio), I'm able to get past this point without any trouble.

So the problem seems related to virtio.

I'm very surprised I've hit this and that it hasn't been reported, so I'm suspicious.  However, I can't think of anything I'm doing wrong.  I'll note that I had created a virtio guest for SLES11 (I had to tell virt-manager that it was F11/F11 to get it to use virtio), and SLES11 with virtio didn't suffer from these issues.  I'll also note that I saw the bug reports on the issue at the "eject dvd" point, and my problem is well before then.

* Start up virt-manager
* Click 'New'
* Use these values in step 1:
  - Name: pick a name (I used g1-fedora11-build)
  - Connection: QEMU/KVM
  - For 'Choose how you would like to install the operating system', I keep
    'Local Install' selected
  - Click Forward
* Use these values in step 2:
  - Be sure 'Use ISO image' is selected
  - Enter the path (I use this long path, an nfs mount):
    /data/mirrors/redhat/fedora/releases/test/11-Preview/Fedora/x86_64/iso/Fedora-11-Preview-x86_64-DVD.iso
  - For 'Choose an operating system and version' use Linux for OS Type and Fedora 11 for Version. This ensures virtio is used.
  - Click Forward
* In 'Choose Memory and CPU settings' I used
  - Memory (RAM): 2048 MB
  - CPUS: 8
* In Step 4...
  - Disk image size: 10gb
  - Leave allocate now checked
  - Click forward

Go ahead with the settings and then you can start normal installation. In my case, anaconda told me to initialize /dev/vda, which I did. I used custom partitioning to create a 1gb swap and the rest for root.  Once I submitted from there, after it wrote the partition table to the disk, it started to create the filesystem.  This is the thing that never completed.

The best answer is to tell me I missed something that said this is broken.  But I checked the most frequent bugs list and tried several searches here in bugzilla and didn't see anybody else with this problem and that scares me :)

One thing I did try was to pre-create the root filesystem.  I did this by creating partitions for the image (from the virt host) using kpartx and putting a filesystem on the root partition.  In this scenario, I was able to get installation on to the existing filesystem to work ok using virtio.  The issue there was that package installation seemed very slow -- slower than my working kvm/no-virtio case.  I didn't bother finishing the install.

During an install hung at mke2fs, I was able to flip to the virtual console and get some sysrq information, including a stack trace.  I hope it is somehow helpful.

  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 17192
  .sched_goidle                  : 7338
  .ttwu_count                    : 8154
  .ttwu_local                    : 3537
  .bkl_count                     : 929

cfs_rq[1]:/
  .exec_clock                    : 342822.845670
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 299396.872726
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 48207.062369
  .nr_running                    : 1
  .load                          : 1024
  .nr_spread_over                : 133
  .shares                        : 0

rt_rq[1]:/
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.000000
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R             sh   827    299396.872726       352   120    299396.872726     61182.805554   1147326.105595 /

cpu#2, 2327.474 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 16238
  .nr_load_updates               : 365500
  .nr_uninterruptible            : 49
  .jiffies                       : 4295890985
  .next_balance                  : 4295.890989
  .curr->pid                     : 0
  .clock                         : 1223689.082660
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 16599
  .sched_goidle                  : 7040
  .ttwu_count                    : 8159
  .ttwu_local                    : 3317
  .bkl_count                     : 875

cfs_rq[2]:/
  .exec_clock                    : 305516.524580
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 272070.022838
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 20880.212481
  .nr_running                    : 0
  .load                          : 0
  .nr_spread_over                : 105
  .shares                        : 0

rt_rq[2]:/
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.029356
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#3, 2327.474 MHz
  .nr_running                    : 1
  .load                          : 1024
  .nr_switches                   : 17570
  .nr_load_updates               : 279794
  .nr_uninterruptible            : -33
  .jiffies                       : 4295890985
  .next_balance                  : 4295.891213
  .curr->pid                     : 0
  .clock                         : 1223689.519237
  .cpu_load[0]                   : 1024
  .cpu_load[1]                   : 512
  .cpu_load[2]                   : 256
  .cpu_load[3]                   : 128
  .cpu_load[4]                   : 72
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 17776
  .sched_goidle                  : 7194
  .ttwu_count                    : 9127
  .ttwu_local                    : 3683
  .bkl_count                     : 519

cfs_rq[3]:/
  .exec_clock                    : 370361.009166
  .MIN_vruntime                  : 314497.179671
  .min_vruntime                  : 314497.179671
  .max_vruntime                  : 314497.179671
  .spread                        : 0.000000
  .spread0                       : 63307.369314
  .nr_running                    : 1
  .load                          : 1024
  .nr_spread_over                : 77
  .shares                        : 0

rt_rq[3]:/
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.000000
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
          mke2fs  1306    314497.190525     13171   120    314497.190525    650228.119990    364031.669582 /

cpu#4, 2327.474 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 13649
  .nr_load_updates               : 286808
  .nr_uninterruptible            : -18
  .jiffies                       : 4295890985
  .next_balance                  : 4295.890989
  .curr->pid                     : 0
  .clock                         : 1223689.146358
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 13791
  .sched_goidle                  : 5738
  .ttwu_count                    : 6999
  .ttwu_local                    : 3232
  .bkl_count                     : 553

cfs_rq[4]:/
  .exec_clock                    : 289442.108886
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 251715.176949
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 525.366592
  .nr_running                    : 0
  .load                          : 0
  .nr_spread_over                : 76
  .shares                        : 0

rt_rq[4]:/
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.000000
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#5, 2327.474 MHz
  .nr_running                    : 1
  .load                          : 1024
  .nr_switches                   : 18356
  .nr_load_updates               : 282546
  .nr_uninterruptible            : -57
  .jiffies                       : 4295890985
  .next_balance                  : 4295.891053
  .curr->pid                     : 809
  .clock                         : 1223689.471474
  .cpu_load[0]                   : 4145
  .cpu_load[1]                   : 3365
  .cpu_load[2]                   : 2390
  .cpu_load[3]                   : 1732
  .cpu_load[4]                   : 1231
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 18435
  .sched_goidle                  : 6021
  .ttwu_count                    : 9757
  .ttwu_local                    : 5407
  .bkl_count                     : 457

cfs_rq[5]:/
  .exec_clock                    : 313457.780381
  .MIN_vruntime                  : 273306.348539
  .min_vruntime                  : 273306.345159
  .max_vruntime                  : 273306.348539
  .spread                        : 0.000000
  .spread0                       : 22116.534802
  .nr_running                    : 2
  .load                          : 4145
  .nr_spread_over                : 59
  .shares                        : 0

rt_rq[5]:/
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.000000
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R          ata/5    70    273280.097162       549   115    273280.097162     22607.806493   1159201.836821 /
 hald-addon-stor   809    273306.348539      5194   120    273306.348539    333022.652845    825502.637703 /

cpu#6, 2327.474 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 11802
  .nr_load_updates               : 253868
  .nr_uninterruptible            : -121
  .jiffies                       : 4295890985
  .next_balance                  : 4295.890989
  .curr->pid                     : 0
  .clock                         : 1222784.763871
  .cpu_load[0]                   : 3121
  .cpu_load[1]                   : 4644
  .cpu_load[2]                   : 11377
  .cpu_load[3]                   : 17254
  .cpu_load[4]                   : 15102
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 11924
  .sched_goidle                  : 4796
  .ttwu_count                    : 6167
  .ttwu_local                    : 3078
  .bkl_count                     : 476

cfs_rq[6]:/
  .exec_clock                    : 290990.783902
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 244118.810216
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -7071.000141
  .nr_running                    : 0
  .load                          : 0
  .nr_spread_over                : 76
  .shares                        : 0

rt_rq[6]:/
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.025423
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#7, 2327.474 MHz
  .nr_running                    : 1
  .load                          : 1024
  .nr_switches                   : 13067
  .nr_load_updates               : 267506
  .nr_uninterruptible            : -128
  .jiffies                       : 4295890985
  .next_balance                  : 4295.891130
  .curr->pid                     : 78
  .clock                         : 1223689.738343
  .cpu_load[0]                   : 1024
  .cpu_load[1]                   : 1024
  .cpu_load[2]                   : 1015
  .cpu_load[3]                   : 907
  .cpu_load[4]                   : 664
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 13236
  .sched_goidle                  : 5593
  .ttwu_count                    : 6704
  .ttwu_local                    : 2964
  .bkl_count                     : 432

cfs_rq[7]:/
  .exec_clock                    : 225640.752143
  .MIN_vruntime                  : 171738.984674
  .min_vruntime                  : 171738.984674
  .max_vruntime                  : 171738.984674
  .spread                        : 0.000000
  .spread0                       : -79450.825683
  .nr_running                    : 1
  .load                          : 1024
  .nr_spread_over                : 50
  .shares                        : 0

rt_rq[7]:/
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.000000
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R        pdflush    78    171738.984674     12964   120    171738.984674    672227.661463    485475.704782 /

SysRq : Show backtrace of all active CPUs
CPU6:
 ffff88007b67ff00 ffff88007b67ff18 0000000000000046 ffff88007fbfcdc0
 0000000000000006 ffffffff815f6300 0000000000000170 ffff88007b67ff48
 ffffffff81014bae ffff88007b67ff68 ffffffff812416cd ffffffff813b02f8
Call Trace:
 <IRQ>  [<ffffffff81014bae>] ? show_stack+0x1c/0x1e
 [<ffffffff812416cd>] ? showacpu+0x4d/0x60
 [<ffffffff813b02f8>] ? trace_hardirqs_off_thunk+0x3a/0x6c
 [<ffffffff8106d8cb>] ? generic_smp_call_function_interrupt+0x46/0xf7
 [<ffffffff810a34d8>] ? mempool_alloc_slab+0x15/0x17
 [<ffffffff81022679>] ? smp_call_function_interrupt+0x24/0x34
 [<ffffffff81012323>] ? call_function_interrupt+0x13/0x20
 <EOI>  [<ffffffff810cfedf>] ? kmem_cache_alloc+0xbc/0xf1
 [<ffffffff813b02bc>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff810a34d8>] ? mempool_alloc_slab+0x15/0x17
 [<ffffffff810a3605>] ? mempool_alloc+0x62/0x119
 [<ffffffff811a5f32>] ? get_request+0x1a8/0x2b9
 [<ffffffff810cf2d0>] ? new_slab+0x1ae/0x1d5
 [<ffffffff811a6078>] ? get_request_wait+0x35/0x16b
 [<ffffffff811a9165>] ? ll_new_hw_segment+0x1a/0x6f
 [<ffffffff811a64c9>] ? __make_request+0x31b/0x420
 [<ffffffff811a4e1c>] ? generic_make_request+0x256/0x2a7
 [<ffffffff811a4f38>] ? submit_bio+0xcb/0xd4
 [<ffffffff810f5bef>] ? submit_bh+0xea/0x10d
 [<ffffffff810f8115>] ? __block_write_full_page+0x1c4/0x2a4
 [<ffffffff810fa5a2>] ? blkdev_get_block+0x0/0x63
 [<ffffffff810fa5a2>] ? blkdev_get_block+0x0/0x63
 [<ffffffff810f8272>] ? block_write_full_page+0x7d/0x86
 [<ffffffff810fbbfb>] ? blkdev_writepage+0x18/0x1a
 [<ffffffff810a7dcf>] ? __writepage+0x17/0x34
 [<ffffffff810a849d>] ? write_cache_pages+0x24c/0x3c9
 [<ffffffff810a7db8>] ? __writepage+0x0/0x34
 [<ffffffff810a863e>] ? generic_writepages+0x24/0x26
 [<ffffffff810a866d>] ? do_writepages+0x2d/0x3c
 [<ffffffff810f1983>] ? __writeback_single_inode+0x18d/0x2d5
 [<ffffffff8103ba85>] ? update_curr+0xf6/0xff
 [<ffffffff8103c293>] ? dequeue_entity+0x20/0x1b3
 [<ffffffff810f1ea9>] ? generic_sync_sb_inodes+0x218/0x36d
 [<ffffffff810f2211>] ? writeback_inodes+0xa9/0x103
 [<ffffffff810a87db>] ? wb_kupdate+0xa8/0x11e
 [<ffffffff810a951f>] ? pdflush+0x175/0x23a
 [<ffffffff810a8733>] ? wb_kupdate+0x0/0x11e
 [<ffffffff810a93aa>] ? pdflush+0x0/0x23a
 [<ffffffff810a93aa>] ? pdflush+0x0/0x23a
 [<ffffffff8105e881>] ? kthread+0x4d/0x78
 [<ffffffff810126ca>] ? child_rip+0xa/0x20
 [<ffffffff81011fe7>] ? restore_args+0x0/0x30
 [<ffffffff8105e834>] ? kthread+0x0/0x78
 [<ffffffff810126c0>] ? child_rip+0x0/0x20

Comment 1 erikj 2009-06-05 20:58:47 UTC
Interesting.  I just switched it from 8 cpus to 1 cpu, and then I don't hang
at mke2fs any more.

Comment 2 erikj 2009-06-05 21:01:57 UTC
If I don't get any ideas, I'll see what I can find by going ahead with this
installation (with a 1 cpu guest).

I'll switch it later to 8 cpus again.

I'll see if I can duplicate the mke2fs hang on the running/non-anaconda system.
If so, I'll update to the latest fedora11 kernels on the guest and see if it
persists.  If folks have better debugging ideas, let me know.

I guess, on the running system, I can also take a crash dump if that helps.

Comment 3 erikj 2009-06-05 21:30:05 UTC
Sorry, my last comment (#2) was a mistake.  In that pass, I mis-checked
something in custom partitioning so that the filesystem that was created
was tiny.  The creation of the tiny filesystem worked ok.  The 10gb ones
are not working.  I found out my mistake when anaconda told me I had more
packages selected than would fit.  Sorry about that.  Wish I could delete
Comment # 2 :)

Comment 4 Mark McLoughlin 2009-06-08 14:40:46 UTC
Very odd, haven't seen this and can't reproduce it

Things to try:

  1) Confirm for sure that it happens with UP guests, and stick with that
     config while trying to narrow it down

  2) Switch to testing with virt-install, maybe use kickstart too, it'll
     make testing easier

  3) Try final F11 media; also confirm that it works with F10 media

  4) Try with 'virt-install --disk pool=default,size=8,cache=none' - maybe
     try cache=writeback too, but we don't recommend that

  5) Add "console=ttyS0 vnc" to the guest command line when installing
     and then use "virsh console" to get to the guest console and see
     if there's anything unusual in the anaconda logs

Comment 5 erikj 2009-06-08 14:47:54 UTC
WRT #3 (Use real F-11 media) I think I have 2 days to wait for that, right?

I'll start using the other debugging ideas and hope to report something back
later today.  Thanks much.

Comment 6 erikj 2009-06-08 16:55:56 UTC
I have found that it doesn't hang forever.  It's just taking about 45
minutes to create the 8377MB filesystem.  When using emulated devices
(not virtio), it's really fast.

I did do a run using the suggestion from #5.  I didn't notice anything abnormal
but I'm going to set aside the logs (in ~root) in case you think they are
noteworthy.

Comment 7 erikj 2009-06-08 17:00:20 UTC
Updated the summary to remove "forever".  I just didn't imagine that waiting
45 minutes would have gotten it past mke2fs.  In the latest tests, I'm using
the default 512 MB of memory (instead of 2048) and UP.

Comment 8 Daniel Berrangé 2009-06-08 17:05:25 UTC
I wonder if this slowdown is possibly related to use of a sparse file backing the disk. There's a bug causing all disks to be sparse in F11:

https://bugzilla.redhat.com/show_bug.cgi?id=504605

I remember this made Xen PV catastrophically slow at I/O too, until the sparse file was fully allocated

Comment 9 Mark McLoughlin 2009-06-08 17:07:55 UTC
erikj: could you try pre-allocating the image with e.g. 'dd if=/dev/zero of=/var/lib/libvirt/images/foo.img bs=1M count=10240'

Comment 10 erikj 2009-06-08 18:15:16 UTC
(In reply to comment #9)
> erikj: could you try pre-allocating the image with e.g. 'dd if=/dev/zero
> of=/var/lib/libvirt/images/foo.img bs=1M count=10240'  

I followed this suggestion.

It seemed to FIX the issue.  I'd say the mke2fs operation only took 1 minute,
maybe.  That's still long-ish for a 10gb root but it's a lot better than
45!

This does make me wonder why, on the same Fedora 11 host, a SLES11
guest with a similarly sized root filesystem, also using virtio (I forced
it), didn't seem to suffer from this issue.  But in any case, this suggestion
made a big difference.

Details of test setup:
 * New
 * Name: f11-test
 * Local install media
 * Use ISO image
 * OS Type Linux, Version Fedora 11
 * Kept defaults 512 MB, 1 CPU
 * Kept 'enable storage..' checked, kept 'Create disk image...' selected,
   kept default disk size 8gb, kept 'allocate entire disk now' checked, forward
 * No advanced options adjusted (kept supplied mac address, virt type kvm,
   x86_64 arch), finish
 * At this point, the virtual machine was going to start up in install mode.
 * I did a force-shutdown
 * Looked at current image:
   # ls -lh f11-test-1.img 
   -rw------- 1 root root 8.0G 2009-06-08 13:00 f11-test-1.img
 * Followed instructions from comment #9 using dd:
   # dd if=/dev/zero of=f11-test-1.img bs=1M count=10240
   10240+0 records in
   10240+0 records out
   10737418240 bytes (11 GB) copied, 293.755 s, 36.6 MB/s
 * re-connected the DVD iso as the emulated DVD for installation
 * Started up the virtual machine
 * Told bios to boot from DVD
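
As a small aside, the sparse-versus-preallocated distinction behind the dd
suggestion can be demonstrated at small scale on any Linux host.  This is an
illustrative sketch with made-up 100 MB files in a temp directory, not the
actual images from this bug:

```shell
# A sparse file has a large apparent size but almost no blocks allocated,
# so every guest write during mke2fs forces host-side block allocation.
tmpdir=$(mktemp -d)

# Sparse image: seek past 100 MB without writing any data.
dd if=/dev/zero of="$tmpdir/sparse.img" bs=1M count=0 seek=100 2>/dev/null

# Preallocated image: write every block up front, as in comment #9.
dd if=/dev/zero of="$tmpdir/prealloc.img" bs=1M count=100 2>/dev/null

# du reports blocks actually allocated, not apparent size.
sparse_kb=$(du -k "$tmpdir/sparse.img" | cut -f1)
prealloc_kb=$(du -k "$tmpdir/prealloc.img" | cut -f1)
echo "sparse: ${sparse_kb} KB allocated; preallocated: ${prealloc_kb} KB allocated"

rm -rf "$tmpdir"
```

Both files report the same size to `ls -l`, but `du` shows the sparse one
occupies almost nothing on disk until it is written to.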

Comment 11 Daniel Berrangé 2009-06-08 18:40:31 UTC
Updating the subject. Even if we fix virt-manager to honour requests for non-sparse disks, we still need to get a better understanding of why virtio-blk is so negatively impacted here - it shouldn't be suffering worse than IDE emulation.

Comment 12 Mark McLoughlin 2009-06-09 07:20:58 UTC
Thanks Erik - the strange thing is that I cannot reproduce this with the same setup

Could you give us some more details on the host? /proc/cpuinfo, lspci, what filesystem the image file is on etc. ?

Comment 13 erikj 2009-06-09 13:56:04 UTC
The host machine is an SGI-branded system, an XE310.  This type of system is
a 1U system that has two systems sharing a single power supply.  I'm using
one half of it for these tests.

The mainboard is a Supermicro X7DBT.

Here is output from lsscsi.  The 2nd disk is being used for some virtio test
runs where the device is imported wholesale as a build work area.

[0:0:0:0]    disk    ATA      HDS725050KLA360  K2AO  /dev/sda 
[1:0:0:0]    disk    ATA      HDT722525DLA380  V44O  /dev/sdb 
[6:0:0:0]    cd/dvd  PepperC  Virtual Disc 1   0.01  /dev/sr1 
[7:0:0:0]    cd/dvd  PepperC  Virtual Disc 2   0.01  /dev/sr0 

The system has lots of roots configured and I'm just using one partition for
these tests.  This could be an important difference, then.
The root partition -- where I'm also placing the images for the guests --
is 46G.  With two images in use (one 8 GB SLES11, one 10 GB F11), there is 28G
available -- utilization is 41%.  Perhaps other test systems you have access
to have a single root using 250 GB of space with 200 free or something.  Maybe
that could make a difference in fragmentation; I'm not sure.

As I noted earlier, a SLES11 guest (virtio) doesn't seem to be as sensitive 
but I could try more experiments there -- especially now that more disk 
space is used -- if that would help.  As already noted by others, that could
be moot anyway if even ide emulated guests are doing better in this regard.

PS: I have access to other systems.  I recently installed F11 on one of my
roots on my desktop, a Dell Optiplex 745 dual-core (Core 2, 2.4 GHz).  I also
have my notebook, with Core 2 2.8 GHz processors, and many other test machine
resources in the labs at the office.  I'm open to trying more hardware types
if that would be helpful.  I can also check with George Beshers; it's likely
that the Westford RH office has equipment like the one I'm using in this bug.

lspci:

00:00.0 Host bridge: Intel Corporation 5000X Chipset Memory Controller Hub (rev 31)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev 31)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev 31)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev 31)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev 31)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
08:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)


/proc/cpuinfo:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
stepping	: 7
cpu MHz		: 2333.331
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips	: 4654.73
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
stepping	: 7
cpu MHz		: 2333.331
cache size	: 4096 KB
physical id	: 1
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips	: 4655.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
stepping	: 7
cpu MHz		: 2333.331
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips	: 4655.03
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
stepping	: 7
cpu MHz		: 2333.331
cache size	: 4096 KB
physical id	: 1
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 5
initial apicid	: 5
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips	: 4655.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 4
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
stepping	: 7
cpu MHz		: 2333.331
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips	: 4655.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 5
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
stepping	: 7
cpu MHz		: 2333.331
cache size	: 4096 KB
physical id	: 1
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips	: 4655.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 6
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
stepping	: 7
cpu MHz		: 2333.331
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips	: 4655.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
stepping	: 7
cpu MHz		: 2333.331
cache size	: 4096 KB
physical id	: 1
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips	: 4655.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Comment 14 erikj 2009-06-09 14:11:01 UTC
This just underscores the root disk utilization from my last add.

[root@nada1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda11             46G   18G   26G  42% /
tmpfs                 2.0G     0  2.0G   0% /dev/shm
attica.americas.sgi.com:/mirrors/redhat
                      3.7T  2.9T  811G  79% /data/mirrors/redhat
[root@nada1 ~]# parted /dev/sda print
Model: ATA HDS725050KLA360 (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  16.5MB  16.4MB  primary   ext3              
 2      16.5MB  50.0GB  50.0GB  primary   xfs          boot 
 3      50.0GB  100GB   50.0GB  primary                     
 4      100GB   500GB   400GB   extended                    
 5      100GB   150GB   50.0GB  logical                     
 6      150GB   200GB   50.0GB  logical                     
 7      200GB   250GB   50.0GB  logical                     
 8      250GB   300GB   50.0GB  logical                     
 9      300GB   350GB   50.0GB  logical                     
10      350GB   400GB   50.0GB  logical                     
11      400GB   450GB   50.0GB  logical   ext3              
12      450GB   500GB   50.0GB  logical   linux-swap        

[root@nada1 ~]# ls -lh /var/lib/libvirt/images/
total 14G
-rw-------  1 root root 10G 2009-06-09 09:09 f11-test-1.img
-rw-------. 1 root root 10G 2009-06-05 09:41 g1-sles11-build.img

Comment 15 Mark McLoughlin 2009-06-19 07:50:02 UTC
(In reply to comment #10)

> This does make me wonder why, using the same fedora 11 host, that a sles11
> guest with a similarly sized root filesystem and also using virtio (I forced
> it) didn't seem to suffer from this issue.

This is probably the best lead we have so far. Perhaps you could try the sles11 kernel in the f11 guest and, if that fixes the issue, build your own kernels and try to bisect it?

Comment 16 Richard W.M. Jones 2009-07-02 14:16:17 UTC
We can pretty reliably reproduce this bug using libguestfs.

Changing the disk interface from IDE to virtio causes the mkfs
and zerofree operations (only) to take about 6 times longer.

In both cases the backing is a sparse file, the guest is UP
(uniprocessor), and cache=off.

Upstream discussion:
https://www.redhat.com/archives/fedora-virt/2009-July/thread.html#00028

Comment 17 Richard W.M. Jones 2009-07-02 14:26:40 UTC
For libguestfs at least, it's NOT sparseness that is
the problem.  I changed our test to use posix_fallocate
to create the files and still see the same slowdown.
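The sparse-vs-preallocated distinction above can be reproduced with plain shell tools; this is a sketch using truncate(1) for the sparse case and fallocate(1) as a stand-in for posix_fallocate(3). File names and sizes are illustrative, not from the bug report:

```shell
# Sparse file: the apparent size is 10 MB, but no data blocks are
# allocated until something writes to it.
truncate -s 10M sparse.img

# Preallocated file: fallocate(1) reserves all 10 MB of blocks up
# front, like the posix_fallocate() change described in comment 17.
fallocate -l 10M alloc.img

# du reports allocated blocks, ls reports apparent size; the two
# files have the same apparent size but very different allocation.
du -k sparse.img alloc.img
ls -l sparse.img alloc.img
```

On an ext4 or tmpfs host, `du -k` shows roughly 0 for the sparse file and the full size for the preallocated one, while `ls -l` shows identical sizes for both.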

Comment 18 Richard W.M. Jones 2009-07-02 14:46:06 UTC
I created another bug for the non-sparse case:
bug 509383

Comment 19 Richard W.M. Jones 2009-07-02 20:31:51 UTC
Can the reporter please try the suggestion here:

https://bugzilla.redhat.com/show_bug.cgi?id=509383#c3

to do this in the guest:

for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done

Comment 20 Mark McLoughlin 2009-07-03 13:20:34 UTC
(In reply to comment #19)

> for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done  

Erik, if that works for you, we should mark this bug as a dup of bug #509383
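The one-liner above only lasts until the guest reboots. A persistent variant could be expressed as a udev rule that sets the attribute whenever a virtio disk appears; the rule file name below is an assumption, and the sketch writes it to a staging file in the current directory rather than to /etc so it can be inspected first:

```shell
# Hypothetical udev rule: mark every virtio disk (vda, vdb, ...) as
# rotational when it is added, re-applying the workaround at each boot.
RULE='ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/rotational}="1"'

# Staged locally; on a real guest this file would be installed as
# /etc/udev/rules.d/99-virtio-rotational.rules (name is an assumption).
echo "$RULE" > ./99-virtio-rotational.rules
cat ./99-virtio-rotational.rules
```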

Comment 21 erikj 2009-07-08 20:55:42 UTC
Still trying to get system access.  Got some access today but they had BIOS redirection so I couldn't turn off HT.  I want to turn off HT so the
comparison is apples to apples.

We work-ordered a system that I should have access to in a week or so.

PS: George Beshers tells me SGI-branded Nehalem XE270 systems have arrived in
Westford, and at least one will be in RHTS.

Comment 22 erikj 2009-07-09 01:27:46 UTC
Conclusion: enabling rotational mode as described made a huge difference:
27 minutes down to just over 2 minutes.

Details:

HW: SGI XE270, 8gb host memory, 8 core, 2-socket, nehalem, Supermicro X8DTN
    Intel(R) Xeon(R) CPU X5570  @ 2.93GHz                                  

test.img (the disk image I ran mkfs.ext3 on) was created like this on the host:
qemu-img create -f raw test.img 10G
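For the raw format, `qemu-img create` simply produces a sparse file of the requested size, which is why test.img starts out with essentially nothing allocated. The sketch below uses truncate(1) to produce an equivalent file so it runs even without qemu installed:

```shell
# Equivalent of 'qemu-img create -f raw test.img 10G' for the raw
# format: a 10 GB sparse file with almost no blocks allocated until
# the guest writes to it.
truncate -s 10G test.img

ls -l test.img    # apparent size: 10 GB
du -h test.img    # allocated size: ~0 until written
```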


qemu command line:

/usr/bin/qemu-kvm -M pc -m 4096 -smp 8 -name f11-test -uuid b7b4b7e4-9c07-22aa-0c95-d5c8a24176c5 -monitor pty -pidfile /var/run/libvirt/qemu//f11-test.pid -drive file=/var/lib/libvirt/images/f11-test.img,if=virtio,index=0,boot=on -drive file=/dev/sdb,if=virtio,index=1 -drive file=/var/lib/libvirt/images/test.img,if=virtio,index=2 -net nic,macaddr=54:52:00:46:48:0e,model=virtio -net user -serial pty -parallel none -usb -usbdevice tablet -vnc cct201:1 -soundhw es1370 -redir tcp:5555::22                                                                       


test case commands
------------------
# parted -s /dev/vdc mklabel msdos
# parted -s /dev/vdc mkpart primary ext2 0 10.7GB
# time mkfs.ext3 /dev/vdc1                       


test: create filesystem on 10gb virtio disk image (test.img), no rotate
-----------------------------------------------------------------------
timing: trial 1                                                        
real    27m38.213s                                                     
user    0m0.008s                                                       
sys     1m24.040s                                                      

timing: trial 2 (re-used host image, re-created fs on re-used test.img)
real    0m8.038s                                                       
user    0m0.009s                                                       
sys     0m0.589s

timing: trial 3 (proving that I can reset the test...
                 shut down qemu, removed test.img, created it again)
real    28m4.008s
user    0m0.004s
sys     1m15.732s


So yes, we can reset the failure case in a reliable way.


test: create filesystem on 10gb virtio disk image (test.img), rotational
------------------------------------------------------------------------
** reset the test by shutting down qemu, removing test.img, and re-creating
   it.

** issued this command before performing test case commands in guest:
for f in /sys/block/vd*/queue/rotational; do echo 1 > $f; done

Then ran test case commands to create the partition table and filesystem.

timing: trial 1
real    2m14.717s
user    0m0.008s
sys     0m8.480s

timing: trial 2 (re-used host image, re-created fs on re-used test.img)
real    0m7.671s
user    0m0.005s
sys     0m0.618s

Comment 23 erikj 2009-07-09 01:29:11 UTC

*** This bug has been marked as a duplicate of bug 509383 ***