Bug 2229357

Summary: [viommu/vhost] qemu-kvm: Fail to lookup the translated address
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm (sub component: Networking)
Reporter: Yanghang Liu <yanghliu>
Assignee: Eric Auger <eric.auger>
QA Contact: jinl
Status: CLOSED MIGRATED
Severity: low
Priority: low
CC: chayang, coli, eric.auger, jasowang, jinli, jinl, jinzhao, juzhang, leiyang, lvivier, mst, virt-maint, yanghliu, ymankad
Version: 9.3
Keywords: MigratedToJIRA
Target Milestone: rc
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Linux
Last Closed: 2023-09-22 16:10:09 UTC
Regression: ---

Description Yanghang Liu 2023-08-05 09:24:57 UTC
Description of problem:
When rebooting or shutting down a VM that has an Intel IOMMU device and a PF/VF assigned, qemu-kvm reports an error such as: Fail to lookup the translated address fff22000

Version-Release number of selected component (if applicable):
5.14.0-348.el9.x86_64
qemu-kvm-8.0.0-9.el9.x86_64
libvirt-9.5.0-5.el9.x86_64
edk2-ovmf-20230524-2.el9.noarch


How reproducible:
100%

Steps to Reproduce:
1. import a VM which has two VFs

# virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096 --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0 --osinfo detect=on,require=off --check all=off --memtune hard_limit=12582912 --memballoon virtio,driver.iommu=on,driver.ats=on --import --noautoconsole --check all=off --network bridge=switch,model=virtio,mac=52:54:00:03:93:93,driver.iommu=on,driver.ats=on --disk path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20,driver.iommu=on,driver.ats=on --features ioapic.driver=qemu --iommu model=intel,driver.intremap=on,driver.caching_mode=on,driver.iotlb=on --boot=uefi --hostdev pci_0000_3b_0e_0  --hostdev pci_0000_3b_0e_1 



2. make sure the VM kernel has enabled intel_iommu=on option
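Step 2 can be verified from inside the guest. A minimal sketch (the sample command line passed below is a made-up example; in the guest you would pass `$(cat /proc/cmdline)`):

```shell
# Check whether intel_iommu=on is present on a kernel command line.
# Inside the guest, call: check_intel_iommu "$(cat /proc/cmdline)"
check_intel_iommu() {
    # Pad with spaces so the option matches only as a whole word.
    case " $1 " in
        *" intel_iommu=on "*) echo "intel_iommu enabled" ;;
        *)                    echo "intel_iommu NOT enabled" ;;
    esac
}

check_intel_iommu "BOOT_IMAGE=/vmlinuz root=/dev/mapper/rhel-root ro intel_iommu=on"
```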


3. start a VM which has two VFs

# virsh start rhel93

4. reboot or shutdown the VM
run "reboot" or "shutdown -h now" in the VM

5. repeat step 4 five times
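The start/reboot loop in steps 3-5 can be sketched as a small host-side script. This is a dry-run version that only prints the virsh commands (the domain name and the fixed 60 s boot delay are assumptions), so the sequence can be inspected before piping it to sh:

```shell
# Print the virsh commands for one start plus N reboot cycles of a domain.
# Pipe the output to sh on the host to actually run them.
reboot_cycle() {
    domain=$1
    cycles=$2
    echo "virsh start $domain"
    i=0
    while [ "$i" -lt "$cycles" ]; do
        echo "sleep 60"            # allow the guest to finish booting
        echo "virsh reboot $domain"
        i=$((i + 1))
    done
}

reboot_cycle rhel93 5
```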

Actual results:
qemu-kvm reports an error such as: Fail to lookup the translated address fff22000

Expected results:
qemu-kvm does not report any errors

Additional info:
(1) The commands I used to reproduce this issue via automated tests:
# python3 /home/private_autocase/vfio/vfio_sriov_test.py --feature=vf --domain=rhel93 --device_name=MT2892  --machine_type=q35 --bios=ovmf  --test_list="intel_iommu_test"
or
# python3 /home/private_autocase/vfio/vfio_sriov_test.py --feature=pf --domain=rhel93 --device_name=82599ES --machine_type=q35 --bios=ovmf  --test_list="intel_iommu_test"

Comment 1 Alex Williamson 2023-08-09 19:30:23 UTC
The only such message I see in QEMU is from vhost:

hw/virtio/vhost.c:
            error_report("Fail to lookup the translated address "
                         "%"PRIx64, iotlb.translated_addr);

I suspect this might have more to do with the virtio network device than with the vfio devices.

Comment 2 Yanghang Liu 2023-08-21 07:57:04 UTC
Test with virtio-iommu in the same test env:

Test step:
(1) import a VM which has two MT2892(mlx5_core) VFs
virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096 --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0 --osinfo detect=on,require=off --check all=off --memtune hard_limit=12582912 --memballoon virtio,driver.iommu=on,driver.ats=on --import --noautoconsole --check all=off --network bridge=switch,model=virtio,mac=52:54:00:03:93:93,driver.iommu=on,driver.ats=on --disk path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20,driver.iommu=on,driver.ats=on --features ioapic.driver=qemu --iommu model=virtio --boot=uefi --hostdev pci_0000_60_01_2  --hostdev pci_0000_60_01_3 

(2) make sure the VM kernel has enabled intel_iommu=on option

(3) start a VM which has two VFs

# virsh start rhel93

(4) reboot or shutdown the VM
4.1 run "virsh reboot rhel93" or "virsh shutdown rhel93" on the host
4.2 run "reboot" or "shutdown -h now" in the VM

(5) repeat step (4) five times

(6) check the qemu-kvm log
2023-08-21T07:47:34.636238Z qemu-kvm: Fail to lookup the translated address fffc6000
2023-08-21T07:47:35.320271Z qemu-kvm: virtio_iommu_translate no mapping for 0xfffff242 for sid=256
2023-08-21T07:48:29.047660Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:48:29.130049Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:49:23.606585Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:49:23.606618Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:49:24.090760Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:50:27.523693Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:51:21.408694Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:51:21.735175Z qemu-kvm: Fail to lookup the translated address ffffe000
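When sifting such logs, it helps to collapse the repeated messages. A small sketch, using a few of the sample lines above (the sed pattern assumes the `<timestamp> qemu-kvm: <message>` layout shown here):

```shell
# Count each distinct qemu-kvm error message, ignoring the timestamp column.
summarize_qemu_log() {
    sed 's/^[^ ]* qemu-kvm: //' | sort | uniq -c | sort -rn
}

summarize_qemu_log <<'EOF'
2023-08-21T07:48:29.047660Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:48:29.130049Z qemu-kvm: Fail to lookup the translated address ffffe000
2023-08-21T07:47:35.320271Z qemu-kvm: virtio_iommu_translate no mapping for 0xfffff242 for sid=256
EOF
```

In practice you would feed it the real log, e.g. `summarize_qemu_log < /var/log/libvirt/qemu/rhel93.log` (path is an assumption based on the libvirt setup in this report).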

Comment 3 Yanghang Liu 2023-08-21 08:28:12 UTC
Hi Eric,

Could you please help check Comment 2?

I would also like to check with you whether I need to open a separate bug for "2023-08-21T07:47:35.320271Z qemu-kvm: virtio_iommu_translate no mapping for 0xfffff242 for sid=256".

Comment 4 Eric Auger 2023-08-23 14:26:26 UTC
(In reply to Yanghang Liu from comment #3)
> Hi Eric,
> 
> Could you please help check Comment 2 ?
> 
> I want to check with you to see if I need to open a separate bug for
> "2023-08-21T07:47:35.320271Z qemu-kvm: virtio_iommu_translate no mapping for
> 0xfffff242 for sid=256" ?

Hi, no, please leave it as is. Can I have access to your machine to take a look?

Comment 7 Eric Auger 2023-08-23 15:46:58 UTC
Could you please indicate whether this is a regression? Does your host include MR2876 ("Synchronize virtio ring, net, blk and scsi with upstream")?

Comment 8 Eric Auger 2023-09-04 14:50:18 UTC
So I removed the vfio-pci devices and I still hit the
qemu-kvm: Fail to lookup the translated address ffec7000
error, which comes from the vhost-net device. So to me this looks unrelated to VFIO. I also hit
virtio_iommu_translate no mapping for 0xffff2c00 for sid=24
on "shutdown -r now", on another machine/VM using vhost-net (upstream QEMU). All of this looks related to virtio-iommu/vhost integration. Adding Jason and Michael in CC.

Comment 9 Eric Auger 2023-09-04 16:05:11 UTC
(In reply to Yanghang Liu from comment #2)
> Test with virtio-iommu in the same test env:
> 
> Test step:
> (1) import a VM which has two MT2892(mlx5_core) VFs
> virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096 --vcpus=4
> --graphics type=vnc,port=5993,listen=0.0.0.0 --osinfo detect=on,require=off
> --check all=off --memtune hard_limit=12582912 --memballoon
> virtio,driver.iommu=on,driver.ats=on --import --noautoconsole --check
> all=off --network
> bridge=switch,model=virtio,mac=52:54:00:03:93:93,driver.iommu=on,driver.
> ats=on --disk
> path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> size=20,driver.iommu=on,driver.ats=on --features ioapic.driver=qemu --iommu
> model=virtio --boot=uefi --hostdev pci_0000_60_01_2  --hostdev
> pci_0000_60_01_3 
> 
> (2) make sure the VM kernel has enabled intel_iommu=on option

For virtio-iommu you don't need that option. Also, the ats=on settings are not required for virtio devices.

Comment 10 Yanghang Liu 2023-09-05 11:51:43 UTC
(In reply to Eric Auger from comment #8)
> So I remove the vfio-pci devices and I still hit the
> qemu-kvm: Fail to lookup the translated address ffec7000
> error which comes from the vhost-net device. So to me this looks unrelated
> to VFIO. 


> Also I hit virtio_iommu_translate no mapping for 0xffff2c00 for sid=24 on shutdown -r
> now, on another machine/VM using vhost-net (upstream qemu). 
> All those stuff look related to virtio-iommu/vhost integration.

Hi Jin, 

I am assigning the QA Contact to you, as this bug can be reproduced without any PF/VF.

Please feel free to let me know for any concerns.

Comment 11 Yanghang Liu 2023-09-05 11:54:48 UTC
(In reply to Eric Auger from comment #9)
> (In reply to Yanghang Liu from comment #2)
> > Test with virtio-iommu in the same test env:
> > 
> > Test step:
> > (1) import a VM which has two MT2892(mlx5_core) VFs
> > virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096 --vcpus=4
> > --graphics type=vnc,port=5993,listen=0.0.0.0 --osinfo detect=on,require=off
> > --check all=off --memtune hard_limit=12582912 --memballoon
> > virtio,driver.iommu=on,driver.ats=on --import --noautoconsole --check
> > all=off --network
> > bridge=switch,model=virtio,mac=52:54:00:03:93:93,driver.iommu=on,driver.
> > ats=on --disk
> > path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> > size=20,driver.iommu=on,driver.ats=on --features ioapic.driver=qemu --iommu
> > model=virtio --boot=uefi --hostdev pci_0000_60_01_2  --hostdev
> > pci_0000_60_01_3 
> > 
> > (2) make sure the VM kernel has enabled intel_iommu=on option
> 
> for the virtio-iommu you don't need that option. Also the ats=on settings
> are not mandated on virtio devices.

Thanks Eric for pointing this out. 

I have updated my auto test code based on your comment :)

Comment 12 jinl 2023-09-06 08:14:52 UTC
(In reply to Eric Auger from comment #7)
> Please can you indicate whether it is a regression. Does your host feature
> MR2876 ("Synchronize virtio ring, net, blk and scsi with upstream")?

Tested with qemu-kvm-7.2.0-14.el9_2.5 and kernel-5.14.0-284.30.1.el9_2.x86_64, I can reproduce this issue, so it shouldn't be a regression.

Comment 13 Eric Auger 2023-09-06 09:24:34 UTC
(In reply to jinl from comment #12)
> (In reply to Eric Auger from comment #7)
> > Please can you indicate whether it is a regression. Does your host feature
> > MR2876 ("Synchronize virtio ring, net, blk and scsi with upstream")?
> 
> Tested with qemu-kvm-7.2.0-14.el9_2.5 and
> kernel-5.14.0-284.30.1.el9_2.x86_64 can reproduce this issue, it shouldn't
> be a regression.

Do you confirm you hit it *without* VFIO assigned devices, i.e. only with vhost-net? It is important to reduce the test case if possible. On my end it is not that easy to reproduce. At some point I thought I was even able to reproduce it with vhost only and without a viommu, but I am not sure. I would be grateful if you could try this on your end too.

Comment 14 jinl 2023-09-08 08:22:37 UTC
(In reply to Eric Auger from comment #13)

> do you confirm you hit it *without* vfio assigned devices, ie. only with
> vhost-net? It is important to reduce the test case if possible. On my end it
> is not that easy to reproduce. At some point I thought I was even able to
> reproduce it with vhost only and without viommu. But I am not sure. I would
> be grateful to you if you could try this on your end too.

Yes, I reproduced it without VFIO assigned devices, and I can reproduce it every time.
With vhost only and without a viommu, I tried several times and could not hit the issue.

To summarize my test results:
The issue can be reproduced with both qemu-kvm-7.2 and qemu-kvm-8.0.

The steps to reproduce:
1. install a VM with intel_iommu and vhost-net
2. reboot the VM

qemu-kvm commandline:
/usr/libexec/qemu-kvm \
-name guest=ovmf,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-2-ovmf/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/ovmf_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.2.0,usb=off,smm=on,kernel_irqchip=split,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu Broadwell-IBRS,vme=on,ss=on,vmx=on,pdcm=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaveopt=on,pdpe1gb=on,abm=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on \
-global driver=cfi.pflash01,property=secure,value=on \
-m 8192 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":8589934592}' \
-overcommit mem-lock=off \
-smp 10,sockets=1,dies=1,cores=10,threads=1 \
-uuid f9c2980d-3dbc-4572-bbc1-67965b87474f \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=32,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"intel-iommu","id":"iommu0","intremap":"on","caching-mode":true,"device-iotlb":true}' \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
-device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
-device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
-device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
-device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"}' \
-blockdev '{"driver":"file","filename":"/home/rhel.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
-device '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}' \
-netdev '{"type":"tap","fd":"35","vhost":true,"vhostfd":"36","id":"hostnet0"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"netdev":"hostnet0","id":"net0","mac":"52:56:00:00:00:0b","bus":"pci.1","addr":"0x0"}' \
-add-fd set=0,fd=29,opaque=serial0-source \
-chardev file,id=charserial0,path=/dev/fdset/0,append=on \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev pty,id=charserial1 \
-device '{"driver":"isa-serial","chardev":"charserial1","id":"serial1","index":1}' \
-chardev socket,id=charchannel0,fd=31,server=on,wait=off \
-device '{"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \
-chardev socket,id=chrtpm,path=/run/libvirt/qemu/swtpm/2-ovmf-swtpm.sock \
-tpmdev emulator,id=tpm-tpm0,chardev=chrtpm \
-device '{"driver":"tpm-crb","tpmdev":"tpm-tpm0","id":"tpm0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 0.0.0.0:0,audiodev=audio1 \
-device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

Comment 15 RHEL Program Management 2023-09-22 16:08:23 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.