Bug 555649

Summary: no response for operation on rhel3 during dd
Product: Red Hat Enterprise Linux 5
Reporter: Suqin Huang <shuang>
Component: kvm
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 5.4.z
CC: ebachalo, ehabkost, gcosta, Jes.Sorensen, llim, mfranc, mkenneth, plyons, quintela, qzhang, tburke, virt-maint, xwei, ykaul
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clones: 596989
Last Closed: 2011-11-03 13:31:35 UTC
Bug Blocks: 580948
Attachments:
Description Flags
strace none

Description Suqin Huang 2010-01-15 04:45:00 UTC
Description of problem:
Operations on the rhel3 guest get no response after migration from rhel5.4.0 to rhel5.4.4 during dd (migration from kvm-83-105.el5_4.13 to kvm-83-105.el5_4.19).
After around 40 minutes the guest works well again.

Version-Release number of selected component (if applicable):
host A: kvm-83-105.el5_4.13 
host B: kvm-83-105.el5_4.19

How reproducible:
100%

Steps to Reproduce:
1. boot guest with following command:
# /usr/libexec/qemu-kvm  -smp 2 -m 2G -drive file=/mnt/bak/rhel3.9-64-intel.bak,media=disk,if=ide,cache=off,index=0,serial=fb-bde1-8bcf10f72b98 -net nic,vlan=0,macaddr=00:65:4a:01:00:37,model=rtl8139 -net tap,vlan=0,script=/etc/qemu-ifup  -uuid `uuidgen` -no-hpet   -rtc-td-hack  -startdate now -cpu qemu64,+sse2  -monitor stdio -vnc :5 -name 3.9-32
2. run dd on guest
# dd if=/dev/zero of=test count=8000 bs=512k
3. Start migration
# /usr/libexec/qemu-kvm  -smp 2 -m 2G -drive file=/mnt/bak/rhel3.9-64-intel.bak,media=disk,if=ide,cache=off,index=0,serial=fb-bde1-8bcf10f72b98 -net nic,vlan=0,macaddr=00:65:4a:01:00:37,model=rtl8139 -net tap,vlan=0,script=/etc/qemu-ifup  -uuid `uuidgen` -no-hpet -usbdevice tablet  -rtc-td-hack  -startdate now -cpu qemu64,+sse2  -monitor stdio -vnc :5 -name 3.9-32 -incoming tcp:0:5995 -M rhel5.4.0
  
Actual results:
The mouse can be used (for example right-click, or clicking Main Menu), but the operations get no response until around 40 minutes have passed. After those 40 minutes the guest works normally.

Expected results:


Additional info:

1. host A:
 rpm -qa | grep kvm
kvm-83-105.el5_4.13
etherboot-zroms-kvm-5.4.4-13.el5
kvm-qemu-img-83-105.el5_4.13
etherboot-roms-kvm-5.4.4-13.el5
kmod-kvm-83-105.el5_4.13
kvm-tools-83-105.el5_4.13
kvm-debuginfo-83-105.el5_4.13
[root@localhost ~]# uname -r
2.6.18-164.6.1.el5

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
stepping	: 4
cpu MHz		: 1600.000
cache size	: 8192 KB

2. host B:

[root@s145 ~]# uname -r
2.6.18-164.10.1.el5
[root@s145 ~]# rpm -qa | grep kvm
kvm-83-105.el5_4.19
kvm-qemu-img-83-105.el5_4.19
etherboot-zroms-kvm-5.4.4-13.el5
kmod-kvm-83-105.el5_4.19
kvm-tools-83-105.el5_4.19
etherboot-roms-kvm-5.4.4-13.el5
kvm-debuginfo-83-105.el5_4.19
processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
stepping	: 4
cpu MHz		: 1600.000
cache size	: 8192 KB

3. top
12753 root      15   0 2262m 2.0g 2472 S  5.0 17.4   0:16.25 qemu-kvm           
12754 root      15   0 2262m 2.0g 2472 S  3.7 17.4   0:15.59 qemu-kvm           
12745 root      15   0 2262m 2.0g 2472 S  1.7 17.4   0:11.61 qemu-kvm           
12763 root      15   0 2262m 2.0g 2472 S  1.0 17.4   0:02.49 qemu-kvm           
12752 root      15   0 2262m 2.0g 2472 S  0.0 17.4   0:00.43 qemu-kvm           
12756 root      15   0 2262m 2.0g 2472 S  0.0 17.4   0:00.00 qemu-kvm 

4. strace (attached)

5. rhel4 and rhel5 guests work normally 

6. rhel3 guests work normally when migrating from kvm-83-105.el5_4.19 to kvm-83-105.el5_4.19

Comment 1 Suqin Huang 2010-01-15 04:47:39 UTC
Created attachment 384517 [details]
strace

Comment 2 Dor Laor 2010-01-20 09:45:09 UTC
Is it always reproducible? from 5.4.0 to 5.4.4?
Is it always safe on 5.4.4->5.4.4?

Comment 3 Suqin Huang 2010-01-21 06:15:13 UTC
Retested 5 times with a 3.9-x86_64 guest, kernel 2.4.21-50.

Can reproduce from 5.4.0 to 5.4.4 every time:
1. Operations get a response only occasionally (clicking Main Menu, the command "clear") (1/5)
2. No response for any operation (4/5)

Can also reproduce with 5.4.4->5.4.4; the result is a little better than 5.4.0 to 5.4.4:
1. Can click Main Menu, issue the command "ls", and open a new terminal, but no response when opening an app such as openoffice. (3/5)
2. No response for any operation (1/5)
3. Operations get a response only occasionally (1/5)

Comment 4 Juan Quintela 2010-01-28 00:47:56 UTC
I am using here:

Host A:
[root@deus ~]# rpm -qa *kvm*
etherboot-zroms-kvm-5.4.4-10.el5
kvm-qemu-img-83-105.el5
kmod-kvm-83-105.el5
kvm-83-105.el5
[root@deus ~]# uname -a
Linux deus.mitica 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@deus ~]# 


Host B:
[root@gnomo ~]# rpm -qa *kvm*
etherboot-zroms-kvm-5.4.4-10.el5
kmod-kvm-83-105.el5_4.19
kvm-83-105.el5_4.19
kvm-qemu-img-83-105.el5_4.13
[root@gnomo ~]# unam e-a
-bash: unam: command not found
[root@gnomo ~]# uname -a
Linux gnomo.mitica 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux


With this setup the test works for me, and I can repeat it several times.  I also did several migrations on host B -> host B with all valid combinations of 5.4.0 <-> 5.4.4
as source and target.

My command line is:

/usr/libexec/qemu-kvm -m 1024 -smp 1 -name rhel3.9X-32 -uuid 0b17c0fd-db6a-2817-8dfb-a4371dbae29b -no-kvm-pit-reinjection -monitor stdio -boot c -drive file=/mnt/images/images/rhel3.9-32X.img,if=ide,index=0,boot=on,cache=none -net nic,macaddr=54:52:00:23:b1:a7,vlan=0 -net tap,script=/etc/kvm-ifup,vlan=0,ifname=vnet0 -serial none -parallel none -usb -vnc :0 -k es -M rhel5.4.4 -cpu qemu64,+sse2 -incoming tcp:0:4444 [-rtc-td-hack]

I have tried several combinations (normal CPU, no -rtc-td-hack, ...) and all work for me.


Could you test whether the issue is fixed for you?
If it is not, could you test on a host that is not a Nehalem?  The only remaining difference is that I am using Core Duos.

vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
stepping	: 6

Comment 5 Juan Quintela 2010-01-28 01:25:04 UTC
I read it again, and found the -smp 2 part.  UP (uniprocessor) works without any problem, but SMP fails from time to time. Investigating it.

Comment 6 Juan Quintela 2010-01-28 01:29:16 UTC
*** Bug 524761 has been marked as a duplicate of this bug. ***

Comment 7 Juan Quintela 2010-01-28 14:35:40 UTC
Updating host to 5.4.4 and using 5.4.4 -> 5.4.4 fails equally.
Same failures with 5.5.0 in 5.5.0 mode.

Comment 8 Juan Quintela 2010-01-28 14:51:19 UTC
This problem is not related to migration.  You can reproduce the problem without any migration involved.

Without smp, it works as expected.  An easy way to reproduce is to do:

- dd if=/dev/zero of=test count=8000 bs=512k
- while true; do date -u && sleep 1; done
- ping in other window

You will see that at times the date output and the ping stall for as long as 30 seconds.  During those stalls, nothing works: neither keyboard, mouse, nor ssh sessions.
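
The stalls in the date loop can be measured instead of watched by eye. The following is only a minimal sketch, assuming the guest's date supports +%s and that awk is available; detect_stalls is a hypothetical helper, not part of any KVM or RHEL tooling. It reads one epoch timestamp per line and reports every gap larger than a chosen limit.

```shell
# detect_stalls LIMIT
# Hypothetical helper: reads one epoch timestamp (seconds) per line on stdin
# and prints every gap between consecutive lines that exceeds LIMIT seconds
# (default 5).
detect_stalls() {
    awk -v limit="${1:-5}" '
        NR > 1 && ($1 - prev) > limit {
            printf "stall: %d s between %d and %d\n", $1 - prev, prev, $1
        }
        { prev = $1 }
    '
}

# Possible usage inside the guest, in place of the plain "date -u" loop:
#   while true; do date +%s; sleep 1; done | tee stamps.log
# and after the dd run has finished:
#   detect_stalls 5 < stamps.log
```

With a one-second loop, a limit of a few seconds filters out normal scheduling jitter while still catching stalls of the size described in this comment.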

Comment 9 Suqin Huang 2010-02-01 10:59:31 UTC
I can reproduce the problem without migration. 

The UP kernel does not work well either: there is no response when opening oowriter, gedit, and other apps.

Comment 12 Glauber Costa 2010-05-27 21:12:15 UTC
Ok, I do have some updates on this, although not yet a fix.

The cause of this behaviour is the cache=none flag.
It works perfectly for all other cache methods.

perf does not show any big hogs, so my current theory is that we're misreporting
something at the block layer.

It is also present on RHEL6/upstream. I will clone this bug to reflect this.
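
One way to confirm the cache dependency is to boot the same guest once per cache mode and repeat the dd test each time. This is a rough sketch only, assuming the installed qemu-kvm accepts cache=none, cache=writethrough and cache=writeback; drive_opt is a hypothetical helper, and the image path is the one from the original report.

```shell
# drive_opt IMAGE CACHE
# Hypothetical helper: prints a -drive option for IMAGE with the given
# cache mode, using the same disk settings as the original reproducer.
drive_opt() {
    printf '%s\n' "-drive file=$1,media=disk,if=ide,cache=$2,index=0"
}

# Print one candidate option per cache mode; each line can be pasted into
# the qemu-kvm command line in place of the original -drive option.
for mode in none writethrough writeback; do
    drive_opt /mnt/bak/rhel3.9-64-intel.bak "$mode"
done
```

Only the cache= value changes between runs, so any difference in guest responsiveness can be attributed to the cache mode.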

Comment 13 Glauber Costa 2010-06-22 14:54:45 UTC
This is the same bug described in rawhide's #563103.

It is a glibc bug. We should get the fix and backport it in glibc.

Comment 15 RHEL Program Management 2011-01-11 20:07:52 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 16 RHEL Program Management 2011-01-11 22:36:02 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 18 Xiaoqing Wei 2011-01-18 07:52:01 UTC
I have tried the steps from Comment 8 and can sometimes reproduce the problem.

On my host the guest does respond, but very, very slowly: up to 30 minutes for a response to a
command.

I can reproduce this with the following operations:

1) Start the guest in runlevel 3 with this command on host A:
 /usr/libexec/qemu-kvm  -smp 1 -m 2G -drive file=/media/live-migragion.qcow2,media=disk,if=ide,cache=none,index=0,serial=fb-bde1-8bcf10f72b98 -net nic,vlan=0,macaddr=00:65:4a:01:00:37,model=e1000 -net tap,vlan=0,script=/media/qemu-ifup-switch  -uuid `uuidgen` -no-hpet   -rtc-td-hack -startdate now -cpu qemu64,+sse2  -monitor stdio -vnc :0 -name 3.9  

On host B, in standby for the incoming migration:
 /usr/libexec/qemu-kvm  -smp 1 -m 2G -drive file=/media/live-migragion.qcow2,media=disk,if=ide,cache=none,index=0,serial=fb-bde1-8bcf10f72b98 -net nic,vlan=0,macaddr=00:65:4a:01:00:37,model=e1000 -net tap,vlan=0,script=/media/qemu-ifup-switch  -uuid `uuidgen` -no-hpet   -rtc-td-hack -startdate now -cpu qemu64,+sse2  -monitor stdio -vnc :0 -name 3.9   -incoming tcp:0:4444

2) Run in the guest:
dd if=/dev/zero of=file.img count=100 bs=512

SSH to the guest with ssh root@guestIP

# The guest works normally and finishes the operation.
dd if=/dev/zero of=file.img count=8000 bs=512
# While this dd is running, do the migration.

3) migrate -d tcp:hostB:4444
Send keys to the guest from host B's kvm monitor:
# sendkey alt-f2
# Now the dd process is frozen in the guest.
# ps aux|grep -i dd     shows that its status is D

4) Run in the guest's 2nd console: " while true; do date -u && sleep 1; done"
The output is normal, one line per second.


5) Get the 3rd console of the guest by sending a key from host B:
sendkey alt-f3

# Logging in to the guest can now take as long as 5 minutes to respond (just typing root and pressing Enter).
 
startx


It takes 35 minutes to fully start a GNOME desktop.
While GNOME is starting, the SSH session stays connected but stalls for several minutes.
Typing 'free -m' or 'top' over SSH takes 5 minutes to produce output.
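
The check for the frozen dd in step 3 can be made more precise than grepping ps aux, which also matches unrelated command names that contain "dd". A minimal sketch, assuming the guest's ps supports -eo pid,stat,comm; list_dstate is a hypothetical helper that keeps only processes whose state starts with D (uninterruptible sleep).

```shell
# list_dstate
# Hypothetical helper: reads "PID STAT COMMAND" lines on stdin, as printed by
# "ps -eo pid,stat,comm", and prints only the processes whose state starts
# with D (uninterruptible sleep, usually waiting on I/O).
list_dstate() {
    awk '$2 ~ /^D/ { print $1, $2, $3 }'
}

# Possible usage inside the guest:
#   ps -eo pid,stat,comm | list_dstate
```

The header line printed by ps is dropped automatically because its second column (STAT) does not start with D.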

Comment 21 Andreas Schwab 2011-11-02 16:54:31 UTC
RHEL-5 glibc does not provide preadv/pwritev, so I don't see how this can have anything to do with glibc.