Bug 550755 - Hypercall driver doesn't reset device on power-down
Summary: Hypercall driver doesn't reset device on power-down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Tim Burke
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 536835
Depends On:
Blocks: 481840 550247 552528
 
Reported: 2009-12-27 08:24 UTC by Dor Laor
Modified: 2013-01-09 21:17 UTC (History)

Fixed In Version: kvm-83-144.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 07:53:06 UTC
Target Upstream Version:
Embargoed:


Attachments
screen shot (15.21 KB, image/png)
2010-01-22 03:33 UTC, lihuang


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0271 0 normal SHIPPED_LIVE Important: kvm security, bug fix and enhancement update 2010-03-29 13:19:48 UTC

Description Dor Laor 2009-12-27 08:24:41 UTC
Description of problem:
The driver/device pair has no reset operation to call when the driver is unloaded or the system is rebooted.
As a result, there may be cases where the IRQ line stays asserted and the system consumes 100% CPU.
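
The failure mode described above can be illustrated with a toy model. This is only a sketch under assumptions: the class, its fields, and the helper function are hypothetical names made up for illustration and are not taken from the qemu-kvm source or the Windows driver. It shows a level-triggered interrupt line that stays asserted after the driver goes away unless something returns the device to its power-on state.

```python
class ToyHypercallDevice:
    """Toy model of a device with a level-triggered interrupt line.

    All names are hypothetical; this is not the qemu-kvm device code.
    """

    def __init__(self):
        self.irq_enabled = False  # set by the guest driver when it loads
        self.irq_level = 0        # 1 while the interrupt line is asserted
        self.rx_pending = 0       # bytes queued from the host side

    def host_write(self, nbytes):
        # Data arriving from the host raises the line if interrupts are enabled.
        self.rx_pending += nbytes
        if self.irq_enabled:
            self.irq_level = 1

    def driver_read_all(self):
        # A loaded driver drains the queue, which lowers the line.
        self.rx_pending = 0
        self.irq_level = 0

    def reset(self):
        # The operation the report says is missing: back to power-on state.
        self.irq_enabled = False
        self.irq_level = 0
        self.rx_pending = 0


def line_after_driver_unload(call_reset):
    """Return the IRQ level left behind once the driver is gone."""
    dev = ToyHypercallDevice()
    dev.irq_enabled = True   # driver loaded
    dev.host_write(4096)     # host floods the channel
    # The driver is unloaded here and never drains the queue again.
    if call_reset:
        dev.reset()
    return dev.irq_level
```

In this model, line_after_driver_unload(False) leaves the line asserted, while line_after_driver_unload(True) leaves it deasserted.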

How reproducible:

Run qemu with -vmchannel di:0200,tcp:0:4444,server and connect a telnet client (telnet 0 4444) with its input redirected from a huge file.
Then run WinXP with the hypercall driver and reboot the system.

 
Actual results:
The system is unable to power down during the reboot.

Expected results:
Successful reboot

Additional info:

Comment 1 Yaniv Kaul 2009-12-27 09:16:34 UTC
QA_ACK for 5.5.
Probably should be cloned for 5.4.z, if we want it there as well.

Comment 2 Dor Laor 2009-12-30 10:44:15 UTC
*** Bug 536835 has been marked as a duplicate of this bug. ***

Comment 12 Keqin Hong 2010-01-18 09:39:47 UTC
(In reply to comment #0)
> Description of problem:
> The driver/device pair does not have a reset option to call when the driver is
> unloaded / system is rebooted.
> As a result, there might be cases where the irq line will stay asserted and the
> system will consume 100% cpu
> 
> How reproducible:
> 
> run qemu with -vmchannel di:0200,tcp:0:4444,server and connect a telnet 0 4444
> client with input redirection of a huge file .

Does it mean "#telnet 0 4444 < hugefile"?

Comment 13 Dor Laor 2010-01-18 10:09:49 UTC
Yap

Comment 14 Keqin Hong 2010-01-18 11:00:29 UTC
(In reply to comment #0)
...
> How reproducible:
> 
> run qemu with -vmchannel di:0200,tcp:0:4444,server and connect a telnet 0 4444
> client with input redirection of a huge file .
> Then, run winxp with the hypercall driver, reboot the system
> 

Can't reproduce, need to confirm whether my steps are right.

1. boot a winXP guest (Red Hat Hypercall Device already installed) with vmchannel option.  
#/usr/libexec/qemu-kvm -m 2048 -smp 2 -drive file=winXP-32.raw,if=ide,cache=off,boot=on -net nic,model=rtl8139,vlan=1,macaddr=DE:AD:BE:EF:17:27 -net tap,vlan=1,script=/etc/qemu-ifup -boot c -uuid c9bdbfde-dd54-4e9d-8fbe-17228cd33a08 -usbdevice tablet -no-hpet -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -notify all -cpu qemu64,+sse2 -balloon none -vnc :1 -vmchannel di:0200,tcp:0:4444,server
2. connect through telnet.
$telnet $hostIP 4444 < RHEL5.4-Server-20090819.0-i386-DVD.iso
(alternatively, use "yes 'abc' | telnet $hostIP 4444")
3. make sure Red Hat Hypercall Device is installed and enabled (check "Device Manager">"System Devices" inside winXP).
4. restart winXP by clicking buttons inside winXP.

After the above steps, Windows restarts normally on kvm-83-105.el5_4.13 and kvm-83-105.el5_4.9.
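
Step 2 can also be scripted. The following is a minimal sketch, assuming the vmchannel is exposed as a plain TCP listener as in the command line above; the function names flood and flood_channel are hypothetical. It keeps writing a small payload to the port, much like "yes 'abc' | telnet $hostIP 4444", until the peer closes the connection or an optional byte limit is reached.

```python
import socket


def flood(sock, payload=b"abc\n", max_bytes=None):
    """Write payload to sock repeatedly; return the number of bytes sent.

    Stops when max_bytes is reached (if given) or the peer goes away.
    """
    sent = 0
    try:
        while max_bytes is None or sent < max_bytes:
            sock.sendall(payload)
            sent += len(payload)
    except (BrokenPipeError, ConnectionResetError):
        pass  # the listener closed the connection
    return sent


def flood_channel(host, port, **kwargs):
    """Connect to host:port over TCP and flood the connection."""
    with socket.create_connection((host, port)) as sock:
        return flood(sock, **kwargs)
```

For example, flood_channel("127.0.0.1", 4444) would run against the listener created by -vmchannel di:0200,tcp:0:4444,server on the local host; pass max_bytes to bound the run.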

Comment 15 Keqin Hong 2010-01-18 11:35:22 UTC
Additional info accompanying comment 14:
After executing step 2 in comment 14, the qemu console on the host showed lines like these:
"
vmchannel_read: error: got read during interrupt disabled
vmchannel_read: error: got read during interrupt disabled
vmchannel_read: error: got read during interrupt disabled
...
"

The message always appeared during guest start-up and shutdown.

Comment 16 Dor Laor 2010-01-18 11:46:34 UTC
It shouldn't reboot on kvm-83-105.el5_4.13. Do you have a userspace daemon using the driver in the guest? It is installed when you install the driver using the msi installer. Once you write into the vmchannel it should consume lots of cpu in the guest

Comment 17 Keqin Hong 2010-01-18 12:01:22 UTC
(In reply to comment #16)
> It shouldn't reboot on kvm-83-105.el5_4.13. Do you have a userspace daemon
> using the driver in the guest? It is installed when you install the driver
> using the msi installer. Once you write into the vmchannel it should consume
> lots of cpu in the guest    

I believe I have. In the Windows Task Manager I can see guestVdsAgentService.exe (a SYSTEM process), which uses around 20% of the CPU while data is written into the vmchannel and around 0% after I terminate the telnet client.

Comment 18 Keqin Hong 2010-01-18 12:13:43 UTC
Also checked on the RHEL host: the qemu-kvm process running the winXP guest used about 80% of the CPU while data was written into the vmchannel, just as you described in comment 16.
BTW, did you notice comment 15? Is it something unusual? Thanks

Comment 19 Dor Laor 2010-01-18 12:35:25 UTC
I saw comment #15. That message was common with the original code; the fixed code does not produce it.
Again, to be clear: this is fixed in kvm-83-105.el5_4.15. With earlier versions the guest should fail to reboot and get stuck.

Comment 20 lihuang 2010-01-19 12:33:37 UTC
Retested many times (20 reboots) with khong's steps. Still cannot reproduce the bug on kvm-83-105.el5_4.13.

Also tested on kvm-83-105.el5_4.19.

The only difference I have observed is the number of debug messages (vmchannel_read: error: got read during interrupt disabled):
On kvm-83-105.el5_4.13, tons of debug messages are printed when booting the guest or operating inside the guest.

On kvm-83-105.el5_4.19, I rebooted the guest 3 times and loaded the guest with iometer and cpuhog; fewer than 10 debug messages were printed.

Dor, can we mark the bug as verified based on the current tests? (Cannot reproduce in 20 reboots.)
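
The message counts compared above can be made exact by counting occurrences in captured console output. A minimal sketch, assuming the qemu console output has been saved to a text file; the function name and the file name in the usage note are hypothetical.

```python
# The message text is copied from the console output quoted in comment 15.
MESSAGE = "vmchannel_read: error: got read during interrupt disabled"


def count_vmchannel_errors(lines):
    """Count the lines that contain the vmchannel_read error message."""
    return sum(1 for line in lines if MESSAGE in line)
```

With a saved log, usage would look like: with open("qemu-console.log") as f: print(count_vmchannel_errors(f)). Running it once per kvm build gives comparable numbers.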

Comment 21 Dor Laor 2010-01-21 14:48:30 UTC
It is probably a very rare thing; together with https://bugzilla.redhat.com/show_bug.cgi?id=553249 it was easy to reproduce because of the latter bug.
I guess you shouldn't invest any more in reproducing it.

Comment 22 lihuang 2010-01-22 03:33:12 UTC
Created attachment 386084 [details]
screen shot

(In reply to comment #21)
> It is probably very rare thing, together with
> https://bugzilla.redhat.com/show_bug.cgi?id=553249 it was easy to reproduce
> because of the later bug.
> I guess you shouldn't invest no more in reproducing it.    

Can reproduce on kvm-83-105.el5_4.13 with spice.

CLI:
[root@t199 t199]# ps -aef | grep kvm
root     29257  8655 94 22:01 pts/5    00:29:35 /usr/libexec/qemu-kvm -m 1024 -smp 1 -drive file=winXP-32-1.raw,if=ide -net nic,macaddr=DE:AD:BE:EF:99:01 -net tap -boot c -uuid 2f5bde3c-9721-4ff7-a41e-0ab4a28cea8c -usbdevice tablet -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -notify all -balloon none -vmchannel di:0200,tcp:0:7891,server,nowait -drive file=/data/t199/test1,if=ide -name t99.2 -spice host=0,ic=on,port=5911,disable-ticketing -qxl 1 -soundhw ac97 -drive file=rhevm-guest-tools-2.1-39917.iso,media=cdrom

top - 22:32:55 up 2 days, 22:26, 11 users,  load average: 1.80, 1.75, 2.37
Tasks: 166 total,   2 running, 164 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 44.0%sy,  0.0%ni, 55.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7912828k total,  4141532k used,  3771296k free,   188764k buffers
Swap: 10482404k total,      168k used, 10482236k free,  3429208k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                     
29257 root      15   0 1366m 1.1g 899m S 100.0 14.4  29:31.04 qemu-kvm                                                                                                                   
 1885 root      11  -5     0    0    0 R 76.0  0.0   2689:39 kksmd                                                                                                                       
  494 root      10  -5     0    0    0 S  0.0  0.0   0:52.16 scsi_eh_1
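
A check like the one in the top output above can be automated when repeating the test many times. This is a minimal sketch, assuming top output captured as text with a header row starting with PID and containing %CPU and COMMAND columns, as shown above; the function names and the 99% threshold are hypothetical choices.

```python
def cpu_by_command(top_output):
    """Map each COMMAND in captured top text to its list of %CPU values."""
    lines = top_output.splitlines()
    # Locate the column header row; it starts with "PID".
    header_idx = next(
        (i for i, line in enumerate(lines) if line.split()[:1] == ["PID"]),
        None,
    )
    if header_idx is None:
        return {}
    columns = lines[header_idx].split()
    cpu_col = columns.index("%CPU")
    cmd_col = columns.index("COMMAND")
    usage = {}
    for line in lines[header_idx + 1:]:
        fields = line.split()
        if len(fields) <= cmd_col:
            continue  # blank or truncated row
        usage.setdefault(fields[cmd_col], []).append(float(fields[cpu_col]))
    return usage


def is_pegged(top_output, command="qemu-kvm", threshold=99.0):
    """True if any process named command is at or above threshold %CPU."""
    values = cpu_by_command(top_output).get(command, [])
    return any(cpu >= threshold for cpu in values)
```

Fed the process table above, is_pegged reports the qemu-kvm process as pegged, which matches the stuck-reboot symptom in this report.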

Comment 23 Dor Laor 2010-01-24 14:07:20 UTC
So is it verified?

Comment 24 lihuang 2010-01-24 16:25:01 UTC
Yes. Also ran the above test on kvm-83-147.el5: PASS.

Comment 27 errata-xmlrpc 2010-03-30 07:53:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0271.html

