Bug 1175968 - Guest hang(host cpu about 100%) after live throttle the guest(ide disk)
Summary: Guest hang(host cpu about 100%) after live throttle the guest(ide disk)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: John Snow
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1250747
 
Reported: 2014-12-19 01:57 UTC by langfang
Modified: 2015-08-05 21:49 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1250747
Environment:
Last Closed: 2015-08-05 21:49:07 UTC
Target Upstream Version:
Embargoed:



Description langfang 2014-12-19 01:57:44 UTC
Description of problem:
The guest hangs (host CPU at about 100%) after live-throttling the guest's IDE disk.

Version-Release number of selected component (if applicable):

Host:
# uname -r
3.10.0-217.el7.x86_64
# rpm -q qemu-kvm
qemu-kvm-1.5.3-84.el7.x86_64

Guest:
3.10.0-217.el7.x86_64

How reproducible:

100%


Steps to Reproduce:
1. Boot the guest with all block interfaces:
...
-drive file=/home/ide-img.qcow2,if=none,id=drive-ide-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device ide-drive,drive=drive-ide-disk,id=ide-disk,bootindex=1 -drive file=/home/sisi-disk-img.qcow2,if=none,id=drive-scsi-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0,addr=0x7,num_queues=4 -device scsi-disk,drive=drive-scsi-disk,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk -drive file=/home/cdrom_scsi.qcow2,if=none,media=cdrom,readonly=on,format=qcow2,id=cdrom1 -device scsi-cd,bus=scsi0.0,drive=cdrom1,id=scsi0-0 -device usb-ehci,id=ehci -drive file=/home/usb.qcow2,if=none,id=drive-usb-2-0,media=disk,format=qcow2,cache=none -device usb-storage,drive=drive-usb-2-0,id=usb-0-0,removable=on,bus=ehci.0,port=1 -drive file=/home/virtio-blk.qcow2,format=qcow2,if=none,id=block-virtio,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x8,drive=block-virtio,id=block-virtio -chardev socket,id=charserial0,path=/tmp/qzhang-test,server,nowait -device isa-serial,chardev=charserial0,id=serial0...

2. Live-throttle the guest's bps_rd and iops:
{ "execute": "block_set_io_throttle", "arguments": { "device": "drive-ide-disk","bps": 0,"bps_rd": 10000,"bps_wr": 0,"iops": 10000,"iops_rd": 0,"iops_wr": 0 } }

(qemu) block_set_io_throttle drive-ide-disk 0 10000 0 10000 0 0

3. Run fio in the guest:
# fio --filename=/dev/sdc --direct=1 --rw=read --bs=1M --size=10M --name=test --iodepth=1 
test: (g=0): rw=read, bs=1M-1M/1M-1M/1M-1M, ioengine=sync, iodepth=1
fio-2.1.10
Starting 1 process
Jobs: 1 (f=1): [R] [inf% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 1158050441d:06h:59m:45s]   
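
For reference, the QMP command from step 2 could also be sent from a script. The following is a minimal Python sketch and not part of the original report. It assumes the QMP socket from the command line in this report (-qmp tcp:0:4447,server,nowait) is reachable on 127.0.0.1, and that QEMU sends the greeting and each reply as a single JSON line. The helper names throttle_command and qmp_send are hypothetical.

```python
import json
import socket


def throttle_command(device, bps=0, bps_rd=0, bps_wr=0,
                     iops=0, iops_rd=0, iops_wr=0):
    """Build the QMP block_set_io_throttle command as a dict.

    A value of 0 means "no limit" for that field, as in step 2.
    """
    return {
        "execute": "block_set_io_throttle",
        "arguments": {
            "device": device,
            "bps": bps, "bps_rd": bps_rd, "bps_wr": bps_wr,
            "iops": iops, "iops_rd": iops_rd, "iops_wr": iops_wr,
        },
    }


def qmp_send(host, port, command, timeout=5.0):
    """Send one command over a QMP TCP socket and return the parsed reply.

    Assumption: replies arrive as one JSON document per line.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())  # greeting banner sent by QEMU
        reply = None
        # QMP requires capability negotiation before any other command.
        for msg in ({"execute": "qmp_capabilities"}, command):
            stream.write(json.dumps(msg) + "\n")
            stream.flush()
            reply = json.loads(stream.readline())
            while "event" in reply:  # skip asynchronous events
                reply = json.loads(stream.readline())
        return reply


# Same limits as step 2: bps_rd=10000 and iops=10000, everything else 0.
cmd = throttle_command("drive-ide-disk", bps_rd=10000, iops=10000)
print(json.dumps(cmd))
```

With a running guest, qmp_send("127.0.0.1", 4447, cmd) would deliver the command; the host and port are taken from the -qmp option above and may differ in other setups.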


Actual results:
On host:
#nc -U /tmp/flang-test
[   70.112398] systemd-journald[481]: Received request to flush runtime journal from PID 1
[root@dhcp-8-112 ~]# [  214.762388] ata8.00: exception Emask 0x0 SAct 0x7e000000 SErr 0x0 action 0x6 frozen
[  214.763416] ata8.00: failed command: READ FPDMA QUEUED
[  214.763880] ata8.00: cmd 60/00:c8:00:02:00/01:00:00:00:00/40 tag 25 ncq 131072 in
[  214.763880]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  214.765305] ata8.00: status: { DRDY }
[  214.765625] ata8.00: failed command: READ FPDMA QUEUED
[  214.766140] ata8.00: cmd 60/00:d0:00:03:00/01:00:00:00:00/40 tag 26 ncq 131072 in
[  214.766140]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  214.767523] ata8.00: status: { DRDY }
[  214.767842] ata8.00: failed command: READ FPDMA QUEUED
[  214.768363] ata8.00: cmd 60/00:d8:00:04:00/01:00:00:00:00/40 tag 27 ncq 131072 in
[  214.768363]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  214.769733] ata8.00: status: { DRDY }
[  214.770076] ata8.00: failed command: READ FPDMA QUEUED
[  214.770509] ata8.00: cmd 60/00:e0:00:05:00/01:00:00:00:00/40 tag 28 ncq 131072 in
[  214.770509]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  214.771914] ata8.00: status: { DRDY }
[  214.772260] ata8.00: failed command: READ FPDMA QUEUED
[  214.772695] ata8.00: cmd 60/00:e8:00:06:00/01:00:00:00:00/40 tag 29 ncq 131072 in
[  214.772695]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  214.774079] ata8.00: status: { DRDY }
[  214.774391] ata8.00: failed command: READ FPDMA QUEUED
[  214.774827] ata8.00: cmd 60/00:f0:00:07:00/01:00:00:00:00/40 tag 30 ncq 131072 in
[  214.774827]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  214.776243] ata8.00: status: { DRDY }

Host:
# top  --> about 100% CPU

The guest cannot perform any operation.
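
As a reading aid for the guest log above: SAct is a bitmask with one bit per outstanding NCQ tag, so 0x7e000000 should correspond to the tags named in the "failed command" entries. The Python sketch below, which is not part of the original report, decodes the mask and cross-checks it against the cmd lines copied from the log. The function name sact_tags is hypothetical.

```python
import re

# The six "cmd" lines from the guest log above, copied verbatim.
LOG = """\
[  214.763880] ata8.00: cmd 60/00:c8:00:02:00/01:00:00:00:00/40 tag 25 ncq 131072 in
[  214.766140] ata8.00: cmd 60/00:d0:00:03:00/01:00:00:00:00/40 tag 26 ncq 131072 in
[  214.768363] ata8.00: cmd 60/00:d8:00:04:00/01:00:00:00:00/40 tag 27 ncq 131072 in
[  214.770509] ata8.00: cmd 60/00:e0:00:05:00/01:00:00:00:00/40 tag 28 ncq 131072 in
[  214.772695] ata8.00: cmd 60/00:e8:00:06:00/01:00:00:00:00/40 tag 29 ncq 131072 in
[  214.774827] ata8.00: cmd 60/00:f0:00:07:00/01:00:00:00:00/40 tag 30 ncq 131072 in
"""


def sact_tags(sact):
    """Return the NCQ tag numbers whose bits are set in a 32-bit SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


# Tags according to the SAct bitmask in the "exception" line.
mask_tags = sact_tags(0x7E000000)

# Tags and transfer sizes according to the individual "cmd" lines.
entries = [(int(tag), int(size))
           for tag, size in re.findall(r"tag (\d+) ncq (\d+) in", LOG)]
log_tags = [tag for tag, _ in entries]
total_bytes = sum(size for _, size in entries)

print(mask_tags)    # [25, 26, 27, 28, 29, 30]
print(log_tags == mask_tags)
print(total_bytes)  # 786432
```

Both views agree: six 128 KiB reads (786432 bytes in total) were outstanding when the guest kernel reported the timeout.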

Expected results:

If this kind of test is not permitted, QEMU should report an error. Otherwise, it should work correctly.

Additional info:

1) My CLI:
 /usr/libexec/qemu-kvm -enable-kvm -m 4G --name VM -sandbox on -uuid d36fd1ad-3d8e-4e15-be0a-6bfd30fbfd4e -nodefaults -rtc base=utc -device ahci,id=ahci0 -drive file=/home/rhel7latestcp1.qcow2,if=none,id=drive-sata-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop,serial=QEMU-DISK2,iops=100,snapshot=on -device ide-drive,drive=drive-sata-disk,bus=ahci0.0,id=sata-disk,bootindex=0 -monitor stdio -boot menu=on,order=d -qmp tcp:0:4447,server,nowait -vnc :10 -vga std -netdev tap,id=hostnet0,fd=5  5<>/dev/tap5 -device virtio-net-pci,netdev=hostnet0,vectors=32,mq=on,id=virtio-net-pci0,mac=74:46:a0:8e:81:d9,bus=pci.0,addr=0x5 -drive file=/home/ide-img.qcow2,if=none,id=drive-ide-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device ide-drive,drive=drive-ide-disk,id=ide-disk,bootindex=1 -drive file=/home/sisi-disk-img.qcow2,if=none,id=drive-scsi-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0,addr=0x7,num_queues=4 -device scsi-disk,drive=drive-scsi-disk,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk -drive file=/home/cdrom_scsi.qcow2,if=none,media=cdrom,readonly=on,format=qcow2,id=cdrom1 -device scsi-cd,bus=scsi0.0,drive=cdrom1,id=scsi0-0 -device usb-ehci,id=ehci -drive file=/home/usb.qcow2,if=none,id=drive-usb-2-0,media=disk,format=qcow2,cache=none -device usb-storage,drive=drive-usb-2-0,id=usb-0-0,removable=on,bus=ehci.0,port=1 -drive file=/home/virtio-blk.qcow2,format=qcow2,if=none,id=block-virtio,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x8,drive=block-virtio,id=block-virtio -chardev socket,id=charserial0,path=/tmp/qzhang-test,server,nowait -device isa-serial,chardev=charserial0,id=serial0

2) I hit a guest call trace once but unfortunately did not collect the info. If I hit it again, I will paste the log.

Comment 4 John Snow 2015-06-23 22:33:54 UTC
First, Q35, and by extension AHCI and NCQ, are no longer supported in RHEL7 (they used to be tech preview, but now they're RHEV-only goodies), so this bug is not applicable here. Moving it to RHEV.

Second, I'm working on NCQ upstream at the moment (this appears to be an NCQ bug) so I will look at this while I test for NCQ feature completeness.

Comment 5 John Snow 2015-07-17 20:59:56 UTC
Misread a little; it doesn't require AHCI. This is valid. Actually managed to trigger some assertions with this one. We should probably guard against that...

Comment 6 John Snow 2015-08-04 15:56:14 UTC
Okay, two things:

(1) It's difficult to know what the "minimum throttle size" will be, because that value is determined by the guest. As far as I understand it, we won't be able to improve this functionality in the short term.

(2) The assertion I triggered is very hard to reproduce for me, but I will document it here in case someone else has better luck:

- Fedora 22 Workstation x86_64 guest installed on a Fedora 21 host with Linux 4.1.1 and QEMU 2.4-rc3.
- Log in and install fio
- Run fio as outlined above as root, writing to a local file instead of a device. (i.e.: reading/writing to the one and only system drive connected.)
- Allow fio to complete normally once
- Enable the bps_rd=10000 and iops=10000 throttling
- Run fio again and allow it to hang.
- Optionally, try to wedge the guest by clicking on a menu on the xterm session to lock everything.
- Change the throttling to bps_rd=10000 but iops and everything else set to 0.
- QEMU will very occasionally throw an assertion,

in bdrv_aligned_preadv,
line:     assert(!qiov || bytes == qiov->size);

I have been unable to determine why, and I have not been able to reproduce it more than twice since I discovered it, so I could not investigate further. Unknown if it occurs downstream or only upstream.

Bumping to 7.3 for now, until I can determine what, if any, improvements we can reasonably make upstream.
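
To illustrate point (1), here is a rough back-of-the-envelope calculation in Python, added as a sketch and not as a description of QEMU's throttling implementation. It uses a simplified model (transfer time = bytes / bps_rd) that ignores any burst allowance and the iops limit. The 30 second guest command timeout is an assumption (a common Linux default for disk commands); the request sizes come from the guest log and the fio command line in the description.

```python
def seconds_to_transfer(nbytes, bps_limit):
    """Time needed to move nbytes at a sustained limit of bps_limit bytes/s."""
    return nbytes / bps_limit


BPS_RD = 10000            # bps_rd value from the reproducer
NCQ_REQUEST = 131072      # size of each failed NCQ read in the guest log
QUEUED_REQUESTS = 6       # tags 25..30 were outstanding
FIO_BLOCK = 1024 * 1024   # fio --bs=1M
GUEST_TIMEOUT = 30.0      # ASSUMPTION: guest command timeout in seconds

one_request = seconds_to_transfer(NCQ_REQUEST, BPS_RD)
all_queued = seconds_to_transfer(QUEUED_REQUESTS * NCQ_REQUEST, BPS_RD)
fio_block = seconds_to_transfer(FIO_BLOCK, BPS_RD)

# Lowest sustained read rate at which the six queued reads would still
# complete within the assumed timeout.
min_bps = QUEUED_REQUESTS * NCQ_REQUEST / GUEST_TIMEOUT

print(f"one 128 KiB request: {one_request:.1f} s")
print(f"six queued requests: {all_queued:.1f} s")
print(f"one 1 MiB fio block: {fio_block:.1f} s")
print(f"minimum bps_rd for six requests in {GUEST_TIMEOUT:.0f} s: {min_bps:.1f}")
```

Under these assumptions the six queued reads need roughly 79 seconds at bps_rd=10000, well past a 30 second timeout. Because the request size, queue depth and timeout are all chosen by the guest, the host cannot know in advance which throttle values are too low.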

