Bug 550755

Summary: Hypercall driver doesn't reset device on power-down
Product: Red Hat Enterprise Linux 5 Reporter: Dor Laor <dlaor>
Component: kvm Assignee: Tim Burke <tburke>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: high    
Version: 5.4 CC: bazulay, cpelland, khong, lihuang, summer, tburke, virt-maint, ykaul
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kvm-83-144.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-03-30 07:53:06 UTC Type: ---
Bug Depends On:    
Bug Blocks: 481840, 550247, 552528    
Attachments:
Description Flags
screen shot none

Description Dor Laor 2009-12-27 08:24:41 UTC
Description of problem:
The driver/device pair has no reset operation to call when the driver is unloaded or the system is rebooted.
As a result, there can be cases where the IRQ line stays asserted and the system consumes 100% CPU.
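
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class ToyDevice, its methods, and the idea that the interrupt is level-triggered are hypothetical and are not taken from the real hypercall driver or from qemu-kvm.

```python
class ToyDevice:
    """Hypothetical stand-in for the hypercall device (not real driver code)."""

    def __init__(self):
        self.pending = 0           # bytes queued by the host for the guest
        self.irq_asserted = False  # assumed level-triggered interrupt line

    def host_write(self, nbytes):
        # Data arriving from the host raises the interrupt line.
        self.pending += nbytes
        self.irq_asserted = True

    def guest_read(self):
        # A loaded driver drains the data, which lowers the line.
        self.pending = 0
        self.irq_asserted = False

    def reset(self):
        # The operation the report says is missing: drop data, deassert the line.
        self.pending = 0
        self.irq_asserted = False


def power_down(device, call_reset):
    """Simulate driver unload; return True if the IRQ line is left asserted."""
    if call_reset:
        device.reset()
    return device.irq_asserted


if __name__ == "__main__":
    for call_reset in (False, True):
        dev = ToyDevice()
        dev.host_write(4096)  # host keeps writing while the guest shuts down
        print("reset called:", call_reset,
              "-> line left asserted:", power_down(dev, call_reset))
```

In this model, an unload without reset leaves the line asserted with no driver left to service it, which is the situation the report associates with the 100% CPU spin.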

How reproducible:

Run qemu with -vmchannel di:0200,tcp:0:4444,server and connect a telnet client (telnet 0 4444) with its input redirected from a huge file.
Then run winxp with the hypercall driver and reboot the system.

 
Actual results:
The system cannot power down during the reboot.

Expected results:
Successful reboot

Additional info:

Comment 1 Yaniv Kaul 2009-12-27 09:16:34 UTC
QA_ACK for 5.5.
Probably should be cloned for 5.4.z, if we want it there as well.

Comment 2 Dor Laor 2009-12-30 10:44:15 UTC
*** Bug 536835 has been marked as a duplicate of this bug. ***

Comment 12 Keqin Hong 2010-01-18 09:39:47 UTC
(In reply to comment #0)
> Description of problem:
> The driver/device pair does not have a reset option to call when the driver is
> unloaded / system is rebooted.
> As a result, there might be cases where the irq line will stay asserted and the
> system will consume 100% cpu
> 
> How reproducible:
> 
> run qemu with -vmchannel di:0200,tcp:0:4444,server and connect a telnet 0 4444
> client with input redirection of a huge file .

Does it mean "#telnet 0 4444 < hugefile"?

Comment 13 Dor Laor 2010-01-18 10:09:49 UTC
Yap

Comment 14 Keqin Hong 2010-01-18 11:00:29 UTC
(In reply to comment #0)
...
> How reproducible:
> 
> run qemu with -vmchannel di:0200,tcp:0:4444,server and connect a telnet 0 4444
> client with input redirection of a huge file .
> Then, run winxp with the hypercall driver, reboot the system
> 

Can't reproduce, need to confirm whether my steps are right.

1. boot a winXP guest (Red Hat Hypercall Device already installed) with vmchannel option.  
#/usr/libexec/qemu-kvm -m 2048 -smp 2 -drive file=winXP-32.raw,if=ide,cache=off,boot=on -net nic,model=rtl8139,vlan=1,macaddr=DE:AD:BE:EF:17:27 -net tap,vlan=1,script=/etc/qemu-ifup -boot c -uuid c9bdbfde-dd54-4e9d-8fbe-17228cd33a08 -usbdevice tablet -no-hpet -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -notify all -cpu qemu64,+sse2 -balloon none -vnc :1 -vmchannel di:0200,tcp:0:4444,server
2. connect through telnet.
$telnet $hostIP 4444 < RHEL5.4-Server-20090819.0-i386-DVD.iso
(alternatively, use "yes 'abc' | telnet $hostIP 4444")
3. make sure Red Hat Hypercall Device is installed and enabled (check "Device Manager">"System Devices" inside winXP).
4. restart winXP by clicking buttons inside winXP.

After the above steps, Windows can be restarted normally on kvm-83-105.el5_4.13 and kvm-83-105.el5_4.9.
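
For anyone scripting step 2 instead of using telnet, a minimal Python sketch follows. It is an illustration only: the function names stream_bytes and flood_vmchannel are hypothetical, and it assumes the vmchannel is reachable as a plain TCP listener on the port given to -vmchannel (4444 in the steps above).

```python
import socket


def stream_bytes(sock, chunk, total_bytes):
    """Send `chunk` repeatedly until at least `total_bytes` have been written."""
    if not chunk:
        raise ValueError("chunk must not be empty")
    sent = 0
    while sent < total_bytes:
        sock.sendall(chunk)  # blocks until the whole chunk is handed to the kernel
        sent += len(chunk)
    return sent


def flood_vmchannel(host, port, total_bytes=1 << 30):
    """Connect to the vmchannel TCP port and write a large stream into it."""
    # host and port are assumptions: use whatever -vmchannel tcp:... listens on.
    with socket.create_connection((host, port)) as sock:
        return stream_bytes(sock, b"abc\n" * 1024, total_bytes)


# Example call (not executed here), mirroring "yes 'abc' | telnet $hostIP 4444":
#   flood_vmchannel("127.0.0.1", 4444)
```

The payload mirrors the "yes 'abc'" variant rather than the ISO file. Either should keep the channel busy while the guest is restarted.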

Comment 15 Keqin Hong 2010-01-18 11:35:22 UTC
Additional info accompanying comment 14:
there were message lines such as
"
vmchannel_read: error: got read during interrupt disabled
vmchannel_read: error: got read during interrupt disabled
vmchannel_read: error: got read during interrupt disabled
...
" 
shown in the qemu console on host after executing step 2 in comment 14.

The message always appeared during guest start-up and shutdown.

Comment 16 Dor Laor 2010-01-18 11:46:34 UTC
It shouldn't reboot on kvm-83-105.el5_4.13. Do you have a userspace daemon using the driver in the guest? It is installed when you install the driver using the msi installer. Once you write into the vmchannel, it should consume lots of CPU in the guest.

Comment 17 Keqin Hong 2010-01-18 12:01:22 UTC
(In reply to comment #16)
> It shouldn't reboot on kvm-83-105.el5_4.13. Do you have a userspace daemon
> using the driver in the guest? It is installed when you install the driver
> using the msi installer. Once you write into the vmchannel it should consume
> lots of cpu in the guest    

I believe I have. In the "Windows Task Manager" I can see a
guestVdsAgentService.exe (SYSTEM proc) which uses around 20% of the CPU when I write into the vmchannel, and around 0% when I terminate the telnet client.

Comment 18 Keqin Hong 2010-01-18 12:13:43 UTC
Also checked on the RHEL host: the qemu-kvm process running the winXP guest used about 80% of the CPU when writing into the vmchannel, just as you described in comment 16.
BTW, did you notice comment 15? Is it something unusual? Thanks

Comment 19 Dor Laor 2010-01-18 12:35:25 UTC
I saw comment #15. That message was common in the original code; the fix does not have it.
Again, to be clear: this is fixed in kvm-83-105.el5_4.15. Before that version, the guest should fail to reboot and get stuck.

Comment 20 lihuang 2010-01-19 12:33:37 UTC
Retested many times (20x) with khong's steps. Still cannot reproduce the bug in kvm-83-105.el5_4.13.

Also tested in kvm-83-105.el5_4.19.

The only difference I have observed is the amount of debug messages (vmchannel_read: error: got read during interrupt disabled).
On kvm-83-105.el5_4.13, tons of debug messages are printed when booting the guest or doing operations in the guest.

On kvm-83-105.el5_4.19, I have rebooted the guest 3 times and loaded the guest with iometer and cpuhog; the number of debug messages is less than 10.
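
The message counts compared above could be taken from a saved copy of the qemu console output. A minimal sketch, assuming the console text has been captured to a file; the function name count_marker and the file name in the comment are hypothetical.

```python
# The exact message text is taken from comment 15.
MARKER = "vmchannel_read: error: got read during interrupt disabled"


def count_marker(lines, marker=MARKER):
    """Count how many lines contain the vmchannel debug message."""
    return sum(1 for line in lines if marker in line)


# Usage with a captured console log (file name is an assumption):
#   with open("qemu-console.log") as f:
#       print(count_marker(f))

sample = [
    "vmchannel_read: error: got read during interrupt disabled",
    "some unrelated console line",
    "vmchannel_read: error: got read during interrupt disabled",
]
print(count_marker(sample))
```

Running this against the logs from both builds would give concrete numbers to replace "tons" and "less than 10".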

Dor, can we verify the bug based on the current tests? (Cannot reproduce in 20 reboots.)

Comment 21 Dor Laor 2010-01-21 14:48:30 UTC
It is probably a very rare thing; together with https://bugzilla.redhat.com/show_bug.cgi?id=553249 it was easy to reproduce because of the latter bug.
I guess you shouldn't invest any more in reproducing it.

Comment 22 lihuang 2010-01-22 03:33:12 UTC
Created attachment 386084 [details]
screen shot

(In reply to comment #21)
> It is probably very rare thing, together with
> https://bugzilla.redhat.com/show_bug.cgi?id=553249 it was easy to reproduce
> because of the later bug.
> I guess you shouldn't invest no more in reproducing it.    

Can reproduce in kvm-83-105.el5_4.13 with spice 

CLI : 
[root@t199 t199]# ps -aef | grep kvm
root     29257  8655 94 22:01 pts/5    00:29:35 /usr/libexec/qemu-kvm -m 1024 -smp 1 -drive file=winXP-32-1.raw,if=ide -net nic,macaddr=DE:AD:BE:EF:99:01 -net tap -boot c -uuid 2f5bde3c-9721-4ff7-a41e-0ab4a28cea8c -usbdevice tablet -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -notify all -balloon none -vmchannel di:0200,tcp:0:7891,server,nowait -drive file=/data/t199/test1,if=ide -name t99.2 -spice host=0,ic=on,port=5911,disable-ticketing -qxl 1 -soundhw ac97 -drive file=rhevm-guest-tools-2.1-39917.iso,media=cdrom

top - 22:32:55 up 2 days, 22:26, 11 users,  load average: 1.80, 1.75, 2.37
Tasks: 166 total,   2 running, 164 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 44.0%sy,  0.0%ni, 55.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7912828k total,  4141532k used,  3771296k free,   188764k buffers
Swap: 10482404k total,      168k used, 10482236k free,  3429208k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                     
29257 root      15   0 1366m 1.1g 899m S 100.0 14.4  29:31.04 qemu-kvm                                                                                                                   
 1885 root      11  -5     0    0    0 R 76.0  0.0   2689:39 kksmd                                                                                                                       
  494 root      10  -5     0    0    0 S  0.0  0.0   0:52.16 scsi_eh_1

Comment 23 Dor Laor 2010-01-24 14:07:20 UTC
So is it verified?

Comment 24 lihuang 2010-01-24 16:25:01 UTC
Yes. Also ran the above test on kvm-83-147.el5: PASS.

Comment 27 errata-xmlrpc 2010-03-30 07:53:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0271.html