Bug 1650272

Summary: Ballooning is incompatible with vfio assigned devices, but not prevented
Product: Red Hat Enterprise Linux 8
Reporter: Alex Williamson <alex.williamson>
Component: qemu-kvm
Assignee: Alex Williamson <alex.williamson>
Status: CLOSED CURRENTRELEASE
QA Contact: Yumei Huang <yuhuang>
Severity: unspecified
Priority: unspecified
Version: 8.0
CC: chayang, jinzhao, juzhang, knoel, mdeng, micai, pezhang, qzhang, rbalakri, virt-maint, yfu, yuhuang
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-2.12.0-45.module+el8+2313+d65431a0
Clone Of: 1619778
Last Closed: 2019-06-14 01:31:02 UTC
Type: Bug

Comment 2 Danilo de Paula 2018-12-06 16:09:39 UTC
Fix included in qemu-kvm-2.12.0-45.module+el8+2313+d65431a0

Comment 4 Yumei Huang 2018-12-27 05:55:44 UTC
Steps to reproduce and verify:
1. Boot the guest with a vfio-assigned NIC and a balloon device
 cli:
 -m 4096 \
 -device virtio-balloon-pci,id=balloon0 \
 -device vfio-pci,host=0000:44:00.0,id=nic1 \

2. Disable the NIC
   # ifconfig eth1 down

3. Inflate the balloon until the guest starts using swap, then check the qemu resident memory
   (qemu) balloon 1024
   # top -p `pgrep qemu`

4. Deflate the balloon to its original value
   (qemu) balloon 4096

5. Activate the NIC, then check the host and guest dmesg
   # ifconfig eth1 up
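
Step 3 reads the qemu resident memory from top. As a minimal sketch, the same figure can be read from the VmRSS line of /proc/<pid>/status on Linux, which is easier to script. The function names below are hypothetical and are not part of qemu or of the original test procedure.

```python
import re
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        # Kernel threads, for example, have no VmRSS line.
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kib(pid: int) -> int:
    """Read the resident set size of a running process (Linux only)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling read_rss_kib() with the qemu PID (for example the value printed by pgrep qemu) after each balloon command gives the number that top shows in its RES column.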


Reproduced with qemu-kvm-2.12.0-42.module+el8+2173+537e5cb5:
After step 3, qemu resident memory is around 1G. 
After step 5, DMA errors occur.

Host dmesg:
<3>[12123.377068] vfio-pci 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xa34fa56685400380 flags=0x0030]
<3>[12125.361549] vfio-pci 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xa34fa56685400210 flags=0x0030]
<3>[12125.362465] vfio-pci 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xa34fa56685400240 flags=0x0030]
<4>[12125.370860] amd_iommu_report_page_fault: 4 callbacks suppressed
<3>[12125.370864] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x0000 address=0xa34fa56685400220 flags=0x0030]
<3>[12125.379747] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x0000 address=0xa34fa56685400280 flags=0x0030]
<3>[12125.388403] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x0000 address=0xa34fa566854002c0 flags=0x0030]
<3>[12125.396852] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x0000 address=0xa34fa56685400300 flags=0x0030]
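
To compare runs it can help to pull the faulting device and address out of host dmesg lines like the ones above. The following is a rough sketch that handles only the two AMD-Vi line shapes seen in this log; the function name is hypothetical, and other IOMMU drivers or kernel versions may format these events differently.

```python
import re

# Key=value pairs inside "[IO_PAGE_FAULT ...]".
_EVENT = re.compile(r"\[IO_PAGE_FAULT ([^\]]*)\]")
# Device printed as a driver prefix, e.g. "vfio-pci 0000:44:00.0: AMD-Vi".
_PREFIX_DEVICE = re.compile(r"vfio-pci ([0-9a-fA-F:.]+): AMD-Vi")


def parse_io_page_faults(dmesg_text):
    """Return one dict of fields per IO_PAGE_FAULT line in dmesg_text."""
    faults = []
    for line in dmesg_text.splitlines():
        event = _EVENT.search(line)
        if event is None:
            continue  # e.g. the "callbacks suppressed" line
        fields = dict(
            pair.split("=", 1) for pair in event.group(1).split() if "=" in pair
        )
        if "device" not in fields:
            prefix = _PREFIX_DEVICE.search(line)
            if prefix is not None:
                # Kept as logged; this form includes the PCI domain.
                fields["device"] = prefix.group(1)
        faults.append(fields)
    return faults
```

Note that the device string is kept exactly as logged, so the prefixed form carries the PCI domain (0000:44:00.0) while the bracketed form does not (44:00.0).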

Guest dmesg:
[  487.856482] igb 0000:00:06.0: Detected Tx Unit Hang
[  487.856482]   Tx Queue             <2>
[  487.856482]   TDH                  <0>
[  487.856482]   TDT                  <1>
[  487.856482]   next_to_use          <1>
[  487.856482]   next_to_clean        <0>
[  487.856482] buffer_info[next_to_clean]
[  487.856482]   time_stamp           <10002d678>
[  487.856482]   next_to_watch        <00000000c71aaaa3>
[  487.856482]   jiffies              <10002dd48>
[  487.856482]   desc.status          <158000>
[  488.596549] igb 0000:00:06.0: Detected Tx Unit Hang
[  488.596549]   Tx Queue             <1>
[  488.596549]   TDH                  <0>
[  488.596549]   TDT                  <1>
[  488.596549]   next_to_use          <1>
[  488.596549]   next_to_clean        <0>
[  488.596549] buffer_info[next_to_clean]
[  488.596549]   time_stamp           <10002da81>
[  488.596549]   next_to_watch        <000000007d89803d>
[  488.596549]   jiffies              <10002e02c>
[  488.596549]   desc.status          <f8000>
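
On the guest side, the symptom is a series of igb "Detected Tx Unit Hang" reports. A small sketch for listing which device and Tx queue each report refers to, assuming the igb message layout shown above (the function name is hypothetical, and other drivers word this message differently):

```python
import re

_HANG = re.compile(r"igb ([0-9a-fA-F:.]+): Detected Tx Unit Hang")
_QUEUE = re.compile(r"Tx Queue\s+<(\d+)>")


def parse_tx_hangs(dmesg_text):
    """Return a list of {"device", "queue"} dicts, one per hang report."""
    hangs = []
    for line in dmesg_text.splitlines():
        hang = _HANG.search(line)
        if hang is not None:
            # Start a new report; the queue arrives on a following line.
            hangs.append({"device": hang.group(1), "queue": None})
            continue
        queue = _QUEUE.search(line)
        if queue is not None and hangs and hangs[-1]["queue"] is None:
            hangs[-1]["queue"] = int(queue.group(1))
    return hangs
```

An empty result after step 5 matches the "no error in dmesg" outcome expected on a fixed build.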


Verified with qemu-kvm-2.12.0-49.module+el8+2586+bf759444:
After step 3, qemu resident memory is around 4G. 
After step 5, both host and guest work normally, with no errors in dmesg even after running netperf.
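
The two runs differ in the resident memory seen after step 3: around 1G on the unfixed build (the balloon released guest pages) and around 4G on the fixed build (the balloon request had no effect on resident memory). A hedged sketch of that comparison follows. The function name and the midpoint threshold are assumptions made for illustration, not part of the original verification, and the check is only meaningful when the guest memory is fully resident before inflating, as it is here with an assigned device.

```python
def balloon_took_effect(rss_kib: int, guest_mem_mib: int, balloon_target_mib: int) -> bool:
    """Guess whether a balloon request released guest memory on the host.

    Assumption: resident memory below the midpoint between the balloon
    target and the full guest size means the balloon inflated.
    """
    rss_mib = rss_kib / 1024
    midpoint_mib = (guest_mem_mib + balloon_target_mib) / 2
    return rss_mib < midpoint_mib


# Values from this report: -m 4096 and "balloon 1024".
unfixed = balloon_took_effect(1 * 1024 * 1024, 4096, 1024)  # RSS around 1G
fixed = balloon_took_effect(4 * 1024 * 1024, 4096, 1024)    # RSS around 4G
print(unfixed, fixed)
```

With a vfio assigned device, the expected result on a fixed build is that the balloon does not take effect.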