Bug 601497

Summary: RHEL4 guest diskdump with ide emulated device too slow
Product: Red Hat Enterprise Linux 6
Reporter: Qian Cai <qcai>
Component: qemu-kvm
Assignee: Gleb Natapov <gleb>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 6.0
CC: clalance, cye, knoel, mkenneth, rwheeler, tburke, virt-maint
Target Milestone: rc
Target Release: 6.1
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-03-24 10:25:52 UTC
Bug Blocks: 524819, 580953    
Attachments:
Description Flags
guest xml none

Description Qian Cai 2010-06-08 05:39:31 UTC
Description of problem:
It took between half an hour and an hour to dump 1G of memory in an IDE-based RHEL4.8 KVM guest.

localhost.localdomain login: SysRq : Crashing the kernel by request
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
<ffffffff8023f54c>{sysrq_handle_crash+0}
PML4 396ff067 PGD 39923067 PMD 0 
Oops: 0002 [1] SMP 
CPU 1 
Modules linked in: ide_dump scsi_dump diskdump md5 ipv6 parport_pc lp parport netconsole netdump autofs4 sunrpc iptable_filter ip_tables ds yenta_socket pcmcia_core cpufreq_powersave zlib_deflate dm_mirror dm_mod button battery ac uhci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ne2k_pci 8390 floppy ext3 jbd virtio_blk virtio_pci virtio virtio_ring sd_mod scsi_mod
Pid: 4243, comm: bash Not tainted 2.6.9-89.ELsmp
RIP: 0010:[<ffffffff8023f54c>] <ffffffff8023f54c>{sysrq_handle_crash+0}
RSP: 0018:000001003aba9eb0  EFLAGS: 00010012
RAX: 000000000000001f RBX: ffffffff80414220 RCX: ffffffff803f66e8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: 0000000000000063 R08: ffffffff803f66e8 R09: ffffffff80414220
R10: 0000000100000000 R11: ffffffff8011f688 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000246
FS:  0000002a95aac6e0(0000) GS:ffffffff80504580(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000003f128000 CR4: 00000000000006e0
Process bash (pid: 4243, threadinfo 000001003aba8000, task 000001003aab17f0)
Stack: ffffffff8023f70f 0000000000000000 000001003aba8000 0000000000000002 
       000001003aba9f50 0000000000000002 0000002a98a01000 0000000000000000 
       ffffffff801b46e1 0000000000000048 
Call Trace:<ffffffff8023f70f>{__handle_sysrq+115} <ffffffff801b46e1>{write_sysrq_trigger+43} 
       <ffffffff8017c4ce>{vfs_write+207} <ffffffff8017c5b6>{sys_write+69} 
       <ffffffff801102f6>{system_call+126} 

Code: c6 04 25 00 00 00 00 00 c3 e9 b8 e3 f3 ff e9 49 32 f4 ff 48 
RIP <ffffffff8023f54c>{sysrq_handle_crash+0} RSP <000001003aba9eb0>
CR2: 0000000000000000
CPU frozen: #0
CPU#1 is executing diskdump.
start dumping to hda4
check dump partition...
dumping memory..                       
34620/262042    3091 ETA \ 
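
For reference, the progress counter above can be read roughly as follows. This is only a sketch: it assumes the counter is in 4 KiB blocks, which the log does not state. Under that assumption the total matches the guest's 1G of memory.

```python
# Values copied from the diskdump progress line "34620/262042".
done_blocks = 34620
total_blocks = 262042

# Assumption: each block is one 4 KiB page; the log does not state the unit.
BLOCK_SIZE = 4096

total_mib = total_blocks * BLOCK_SIZE / 2**20
percent_done = 100 * done_blocks / total_blocks

print(f"total: {total_mib:.1f} MiB, done: {percent_done:.1f}%")
```

With these numbers the dump is about 13% complete at the point where the log was captured.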

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.69.el6.x86_64
libvirt-0.8.1-7.el6.x86_64
kernel-2.6.32-33.el6.x86_64

How reproducible:
Always; the dump takes between half an hour and an hour.

Steps to Reproduce:
1. Set up a RHEL4.8 x86_64 guest on my Intel-based x200 laptop.
2. Set up diskdump and crash the guest.
  
Actual results:
Dumping 1G of memory takes more than half an hour.

Expected results:
Should be quicker.

Comment 1 Qian Cai 2010-06-08 05:41:41 UTC
Created attachment 422036
guest xml

Comment 2 Qian Cai 2010-06-08 07:15:28 UTC
The vmcore is only around 300M, and it took around an hour to dump it...
# du -sh 127.0.0.1-2010-06-08-05\:37/vmcore
273M	127.0.0.1-2010-06-08-05:37/vmcore

Comment 3 Dor Laor 2010-06-09 12:24:03 UTC
Why is that a blocker? At least it works..

Comment 4 RHEL Program Management 2010-06-09 12:33:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 6 chellwig@redhat.com 2010-06-11 17:42:41 UTC
Is this the same timer/interrupt problem that causes problems with all kdump scenarios under KVM?

Comment 7 chellwig@redhat.com 2010-06-23 08:42:41 UTC
Chris, could this be the IRQ routing issue you fixed upstream?

Comment 8 RHEL Program Management 2010-07-15 14:07:42 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 9 Dor Laor 2010-09-26 09:27:58 UTC
CAI, please answer the above questions

Comment 10 Qian Cai 2010-09-26 09:48:37 UTC
I am not sure. I'll need to re-test it on the latest RHEL6 bits to make sure.

Comment 11 Chao Ye 2010-09-28 06:37:29 UTC
I tested with RHEL6-RC4 and installed RHEL4-U8-AS. An IDE disk was used as the dump device.
=====================================================================
The first time, I used hda1 as the dump target; its size is around 10GB.
[root@dhcp70-103 ~]# du -sh /var/crash/*
206M	/var/crash/127.0.0.1-2010-09-27-21:37 ===>Took around 30mins.
158M	/var/crash/127.0.0.1-2010-09-27-22:22 ===>Took around 8mins.
401M	/var/crash/127.0.0.1-2010-09-27-22:34 ===>Took around 8mins.

The second time, I resized hda1 to around 1GB. The process became quicker.
[root@dhcp70-103 ~]# du -sh /var/crash/*
...
492M	/var/crash/127.0.0.1-2010-09-27-22:53 ===>Took around 5mins.
494M	/var/crash/127.0.0.1-2010-09-27-23:04 ===>Took around 5mins.
495M	/var/crash/127.0.0.1-2010-09-27-23:21 ===>Took around 6mins.

The third time, I used a new dump target of around 10GB.
[root@dhcp70-103 ~]# du -sh /var/crash/*
...
499M	/var/crash/127.0.0.1-2010-09-28-02:20 ===>Took around 10mins.

The performance does not seem very stable.
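
To make the runs easier to compare, the sizes and times listed in this comment can be turned into rough throughput figures. This is a minimal sketch: the times are the approximate minutes quoted above, the sizes are taken as reported by du, and the labels and the function name are only illustrative.

```python
# (label, vmcore size in MB as reported by du, approximate minutes taken)
runs = [
    ("10GB hda1, run 1", 206, 30),
    ("10GB hda1, run 2", 158, 8),
    ("10GB hda1, run 3", 401, 8),
    ("1GB hda1, run 1", 492, 5),
    ("1GB hda1, run 2", 494, 5),
    ("1GB hda1, run 3", 495, 6),
    ("new 10GB target", 499, 10),
]


def throughput_mb_per_s(size_mb, minutes):
    """Average write rate for one dump, in MB per second."""
    return size_mb / (minutes * 60)


rates = [throughput_mb_per_s(size, mins) for _, size, mins in runs]

for (label, _, _), rate in zip(runs, rates):
    print(f"{label}: {rate:.2f} MB/s")

# Ratio between the fastest and the slowest run.
print(f"spread: {max(rates) / min(rates):.1f}x")
```

The rates range from roughly 0.11 MB/s to 1.65 MB/s, so the fastest run is more than ten times quicker than the slowest one.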

Comment 12 Dor Laor 2010-09-28 14:35:46 UTC
The question is whether the above is reasonable or too slow.
I'd rather optimize virtio-blk than invest time in ide.
Can you please measure the virtio time? Also, where is the bottleneck - cpu on the guest/host? IO? kvm_stat data?

Comment 13 Qian Cai 2010-09-28 15:03:16 UTC
There is a bug preventing diskdump from working with virtio - BZ#601491.