Bug 858531

Summary: guest fails to S3/S4 with the virtio-rng driver if /dev/hwrng is open for reading
Product: Red Hat Enterprise Linux 6
Reporter: juzhang <juzhang>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.4
CC: amit.shah, flang, juzhang, mdeng, michen, nhorman, qzhang, virt-maint, xfu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 917960 (view as bug list)
Environment:
Last Closed: 2014-08-12 11:27:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 912287, 917960
Attachments:
- the guest full dmesg (flags: none)
- after second time S3/S4, guest show (flags: none)
- after second time S3/S4, guest show (flags: none)

Description juzhang 2012-09-19 03:14:15 UTC
Description of problem:
Boot a guest with the virtio-rng driver, open /dev/hwrng for reading, and do S3/S4 in the meantime. The guest fails to S3/S4 with a call trace.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.299.el6.bz786407.x86_64
kernel (host & guest):
# uname -r
2.6.32-304.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with virtio-rng enabled:
#/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu SandyBridge,-kvmclock -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel6.4 -vnc :10 -k en-us -rtc base=localtime,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/home/RHEL-Server-6.3-64-sluo-copy.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=9a:7d:6b:2e:28:f8,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -boot c -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtio-rng-pci,chardev=foo,id=virtio-rng -bios /usr/share/seabios/bios-pm.bin

2. On the host:
# nc -U /tmp/foo < /dev/urandom

3. In the guest:
# cat /sys/devices/virtual/misc/hw_random/rng_current
virtio
# cat /sys/devices/virtual/misc/hw_random/rng_available
virtio
# rngd -r /dev/hwrng
# cat /dev/hwrng
(no output)

4. In the guest:
# cat /dev/random
...
5. During step 4, do S4:
# pm-hibernate
  
Actual results:
Failed to S3/S4 with a call trace.

--snip of guest kernel call trace--
Call Trace:
[<ffffffff8131fc8a>] ? misc_open+0x1ca/0x320
[<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
[<ffffffff8117ef50>] ? chrdev_open+0x0/0x230
[<ffffffff811789bf>] ? __dentry_open+0x23f/0x360
[<ffffffff8121c262>] ? selinux_inode_permission+0x72/0xb0
[<ffffffff8121429f>] ? security_inode_permission+0x1f/0x30
[<ffffffff814fe323>] wait_for_common+0x123/0x180
[<ffffffff81060250>] ? default_wake_function+0x0/0x20
[<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
[<ffffffffa0149104>] virtio_data_present+0x24/0x40 [virtio_rng]
[<ffffffff81332621>] rng_dev_read+0x81/0x1b0
[<ffffffff81213136>] ? security_file_permission+0x16/0x20
[<ffffffff8117b8b5>] vfs_read+0xb5/0x1a0
[<ffffffff8117b9f1>] sys_read+0x51/0x90
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

Expected results:
s3/s4 works.

Additional info:
For the full call trace, please see the attachment.

Comment 1 juzhang 2012-09-19 03:21:56 UTC
Created attachment 614190 [details]
the guest full dmesg

Comment 2 langfang 2012-09-19 04:19:53 UTC
Tested with the following versions:

host:
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.299.el6.bz786407.x86_64
# uname -r
2.6.32-305.el6.x86_64

guest:
2.6.32-303.el6.x86_64



Steps: same as in the Description.

Results:
Did not hit the call trace, but after the first successful S3/S4, cannot do S3/S4 a second or third time.



guest :
[root@localhost ~]# echo mem >/sys/power/state    ----> first time S3: successful
[root@localhost ~]# echo disk >/sys/power/state   ----> first time S4: successful
[root@localhost ~]# echo disk >/sys/power/state   ----> second time S4
bash: echo: write error: Device or resource busy
[root@localhost ~]# echo disk >/sys/power/state   ----> third time S4
bash: echo: write error: Device or resource busy
[root@localhost ~]# echo disk >/sys/power/state

Message from syslogd@localhost at Sep 18 07:58:08 ...
 kernel:BUG: soft lockup - CPU#1 stuck for 64s! [bash:2507]
bash: echo: write error: Device or resource busy
[root@localhost ~]# bash: echo: write error: Device or resource busy
bash: bash:: command not found
[root@localhost ~]# echo mem >/sys/power/state    ----> second time S3
bash: echo: write error: Device or resource busy

Additional info: the attachment shows what the guest displays when S3/S4 can no longer be done.

Comment 3 langfang 2012-09-19 04:21:54 UTC
Created attachment 614195 [details]
after second time S3/S4 ,guest show

Comment 4 langfang 2012-09-19 04:22:50 UTC
Created attachment 614196 [details]
after second time S3/S4 ,guest show

Comment 5 Amit Shah 2013-02-21 11:27:12 UTC
Screenshots say rngd refuses to freeze.

Comment 6 Amit Shah 2013-04-16 10:45:52 UTC
Reassigning to rng-tools; please check why rngd doesn't freeze.

Comment 7 RHEL Program Management 2013-10-14 00:26:50 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 8 Neil Horman 2014-08-07 12:16:42 UTC
This really has nothing to do with rng-tools. It appears that rngd isn't freezing because it's stuck in a state where we get woken up from a wait_for_completion_killable call in virtio_data_present, but the return code isn't ERESTARTSYS in wait_for_common, so the virtio code falls through its return check and then looks at how much more data it needs to pull from the random device. Since this test puts such a drain on the entropy pools, we go straight back to waiting on the completion event. juzhang, do you have this set up somewhere where I can do some more investigation on it already, or do I need to set something up?

Comment 9 juzhang 2014-08-11 09:29:49 UTC
Hi Qzhang,

Could you give a help for handling this issue?

Best Regards,
Junyi

Comment 10 Qunfang Zhang 2014-08-12 02:36:12 UTC
(In reply to Neil Horman from comment #8)
> This really has nothing to do with rng-tools. It appears that rngd isn't
> freezing because it's stuck in a state where we get woken up from a
> wait_for_completion_killable call in virtio_data_present, but the return
> code isn't ERESTARTSYS in wait_for_common, so the virtio code falls through
> its return check and then looks at how much more data it needs to pull
> from the random device. Since this test puts such a drain on the entropy
> pools, we go straight back to waiting on the completion event. juzhang,
> do you have this set up somewhere where I can do some more investigation
> on it already, or do I need to set something up?

Hi, Neil

We could set up the environment for you to debug, but I would like to confirm with you first: do we need to fix this issue in RHEL6? RHEL6 S3/S4 will no longer be officially supported, and Ademar closed most of the S3/S4 bugs (refer to Bug 912287); we will only keep some basic S3/S4 cases in our test plan. If you think we need to fix this bug, or at least investigate it first, then I will prepare the setup. :)

Thanks,
Qunfang

Comment 11 Neil Horman 2014-08-12 11:27:47 UTC
Oh, that's a good point; I had not realized that we don't support S3/S4 in virt environments. If that's the case, then no, we probably don't need to fix this in RHEL6. That said, you probably do want to test it on RHEL7, where we do support it, as this is still likely a bug.

Comment 12 Qunfang Zhang 2014-08-13 01:30:53 UTC
(In reply to Neil Horman from comment #11)
> Oh, that's a good point; I had not realized that we don't support S3/S4 in
> virt environments. If that's the case, then no, we probably don't need to
> fix this in RHEL6. That said, you probably do want to test it on RHEL7,
> where we do support it, as this is still likely a bug.

Hi, Junyi

Could you have someone try it on a RHEL7 host to see whether the problem exists there?

Thanks,
Qunfang

Comment 13 juzhang 2014-08-13 01:49:22 UTC
(In reply to Qunfang Zhang from comment #12)
> (In reply to Neil Horman from comment #11)
> > Oh, that's a good point; I had not realized that we don't support S3/S4
> > in virt environments. If that's the case, then no, we probably don't
> > need to fix this in RHEL6. That said, you probably do want to test it on
> > RHEL7, where we do support it, as this is still likely a bug.
> 
> Hi, Junyi
> 
> Could you have someone try it on a RHEL7 host to see whether the problem
> exists there?
> 
> Thanks,
> Qunfang

Sure.

Hi Xiangchun,

Could you have a try and then update the result in the BZ?

Best Regards,
Junyi