Bug 844583

Summary:	s3/s4 support for virtio-rng driver
Product:	Red Hat Enterprise Linux 6	Reporter:	Amit Shah <amit.shah>
Component:	kernel	Assignee:	Amit Shah <amit.shah>
Status:	CLOSED ERRATA	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	6.4	CC:	flang, juzhang, lnovich, michen, qzhang, rhod, shuang, virt-maint
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	kernel-2.6.32-298.el6	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-02-21 06:44:19 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	761491, 840816

Description Amit Shah 2012-07-31 06:27:38 UTC

Description of problem:

The virtio-rng driver doesn't delete its virtqueue (vq) on S3/S4 and restore it upon resume.  This can cause the 'guest moved used index from xxx to xxx' error and a guest abort.

Upstream commits 178d855e7810deecb7fa96afdf82ec45b0284233 and 0bc1a2ef19b45bb23617b203bc631b44609f17ba fix the issue.

For testing this: the virtio-rng device isn't yet available upstream or in a RHEL release.  I can give a qemu-kvm build to QE for testing and verification purposes; let me know when you need it.

Comment 1 RHEL Program Management 2012-08-06 10:11:24 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 2 juzhang 2012-08-09 02:50:30 UTC

Marked ack+; QE tried the virtio-rng build.

Comment 4 Jarod Wilson 2012-08-16 15:01:26 UTC

Patch(es) available on kernel-2.6.32-298.el6

Comment 7 Qunfang Zhang 2012-12-28 07:22:15 UTC

Tested this with both RHEL 6.3 (kernel-2.6.32-279.el6.x86_64) and RHEL 6.4 (kernel-2.6.32-348.el6.x86_64) guest kernels. Neither reproduces the 'guest moved used index from xxx to xxx' error after alternating S3 and S4 more than 20 times.

Steps:
1. Build the upstream qemu-kvm, because the current RHEL 6 and RHEL 7 builds have no virtio-rng-pci device.

2. Boot guest with "-device virtio-rng-pci,bus=pci.0,addr=0xa,id=rng0".
CLI:
#  ./x86_64-softmmu/qemu-system-x86_64  -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name t2-rhel6.4-32 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/home/rhel6.4-64-virtio-backup.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=disk0,id=disk0  -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,vhost=on,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c   -global  PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0  -device virtio-rng-pci,bus=pci.0,addr=0xa,id=rng0

3. #cat /dev/hwrng (inside guest)

4. Repeat S3 and S4 many times.

Result: The guest enters S3 and S4 correctly and resumes.
(For RHEL 6.3, there is a call trace in dmesg after resume; I remember it as an old 6.3 issue. For RHEL 6.4, there is no such error or call trace in dmesg.)

Hi Amit,
Is there something wrong with my steps? I cannot reproduce the issue with either the old or the latest RHEL 6 kernel. Or is the 'guest moved used index from xxx to xxx' error only theoretical?


Thanks,
Qunfang

Comment 8 Amit Shah 2013-01-02 11:54:06 UTC

To test, first transfer some data from host to guest (keep 'cat /dev/hwrng' running).  After the guest has read some data (i.e. when some data is shown in the terminal), put the guest into a sleep state.  Starting a new cat process after resume (or letting the old one continue) should cause the 'guest moved ...' message.
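
A minimal guest-side sketch of that sequence, assuming the guest exposes the standard /sys/power/state interface ("mem" for S3, "disk" for S4) and that waking the guest is handled from the host. RNG_DEV, POWER_STATE_FILE and the function names are hypothetical helpers introduced for illustration. As far as I can tell, the 'guest moved ...' message is reported by qemu on the host, so the sketch only checks that reads keep working inside the guest.

```shell
#!/bin/sh
# Sketch only: RNG_DEV, POWER_STATE_FILE and the function names are illustrative.
RNG_DEV="${RNG_DEV:-/dev/hwrng}"
POWER_STATE_FILE="${POWER_STATE_FILE:-/sys/power/state}"

# Read $1 bytes from the RNG device and print the number of bytes received.
rng_read() {
    dd if="$RNG_DEV" bs=1 count="$1" 2>/dev/null | wc -c | tr -d ' '
}

# Run one sleep cycle with a reader active. $1 is "mem" (S3) or "disk" (S4).
# Returns 0 if data flows before the sleep and again after resume.
rng_sleep_cycle() {
    state="$1"

    # Keep a long-running reader going across the sleep, like 'cat /dev/hwrng'.
    cat "$RNG_DEV" >/dev/null &
    reader=$!

    # Confirm that some data has reached the guest before sleeping.
    if [ "$(rng_read 16)" != 16 ]; then
        kill "$reader" 2>/dev/null
        return 1
    fi

    # Enter the sleep state; execution continues here after resume.
    echo "$state" > "$POWER_STATE_FILE" || { kill "$reader" 2>/dev/null; return 1; }

    # After resume, the old reader should be alive and a new read should work.
    kill -0 "$reader" 2>/dev/null; old_alive=$?
    got="$(rng_read 16)"

    kill "$reader" 2>/dev/null
    wait "$reader" 2>/dev/null
    [ "$old_alive" -eq 0 ] && [ "$got" = 16 ]
}
```

Run as root in the guest, 'rng_sleep_cycle mem' would cover S3 and 'rng_sleep_cycle disk' would cover S4; the host-side qemu output still has to be watched separately for the abort.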

Comment 10 Amit Shah 2013-01-09 19:32:21 UTC

If s3/s4 works fine, we can go ahead and mark this verified.  I don't remember exactly which errors could happen before the fix, but if

1. cat /dev/hwrng
2. s3/s4
3. resume
4. data is returned from /dev/hwrng

and

1. cat /dev/hwrng
2. s3/s4
3. resume
4. ^C

both work fine, then let's mark this verified.
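
The two checks above could be scripted inside the guest roughly as follows, to be run once the guest has resumed. This is a sketch under assumptions: RNG_DEV and the function names are hypothetical, and coreutils 'timeout' delivering SIGINT stands in for pressing ^C on a running cat.

```shell
#!/bin/sh
# Sketch only: RNG_DEV and the function names are illustrative.
RNG_DEV="${RNG_DEV:-/dev/hwrng}"

# Check 1: data is returned from the device (reads $1 bytes, default 16).
rng_returns_data() {
    want="${1:-16}"
    got="$(dd if="$RNG_DEV" bs=1 count="$want" 2>/dev/null | wc -c | tr -d ' ')"
    [ "$got" = "$want" ]
}

# Check 2: a running reader can be interrupted. timeout sends SIGINT (the
# signal ^C generates) after $1 seconds (default 2) and exits with status 124
# when the command had to be timed out.
rng_reader_interruptible() {
    timeout -s INT "${1:-2}" cat "$RNG_DEV" >/dev/null 2>&1
    [ "$?" -eq 124 ]
}

# Run both checks and print a one-line verdict.
rng_verify_after_resume() {
    if rng_returns_data && rng_reader_interruptible 2; then
        echo "PASS"
    else
        echo "FAIL"
        return 1
    fi
}
```

Calling 'rng_verify_after_resume' after each resume prints PASS or FAIL. If the reader cannot be interrupted at all, the call would simply hang, which is itself a failure signal.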

Comment 11 Qunfang Zhang 2013-01-10 03:10:38 UTC

OK, thanks for the confirmation. Marking this verified, since guest S3/resume and S4/resume work fine over repeated cycles.

Comment 13 errata-xmlrpc 2013-02-21 06:44:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html