Bug 1956106 - VM fails on start with XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0
Summary: VM fails on start with XML error: Invalid PCI address 0000:12:01.0. slot must...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.4.6.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.7
Target Release: 4.4.7.1
Assignee: Lucia Jelinkova
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-05-02 19:22 UTC by Polina
Modified: 2023-01-15 08:46 UTC (History)
4 users (show)

Fixed In Version: ovirt-engine-4.4.7.1
Doc Type: Bug Fix
Doc Text:
```bash
lsscsi | grep disk | awk '{print $NF}'
parted /dev/sda --script -- mklabel gpt
parted -a optimal /dev/sda mkpart primary 0% 1024MB
mkfs.ext4 -F /dev/sda1
mkdir -p /disk_passthrough_mount_point
mount /dev/sda1 /disk_passthrough_mount_point
echo "/dev/sda1 /disk_passthrough_mount_point ext4 defaults 0 0" >> /etc/fstab
touch /disk_passthrough_mount_point/file_test
echo "content" > /disk_passthrough_mount_point/file_test
```
Clone Of:
Environment:
Last Closed: 2021-07-06 07:28:01 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+


Attachments (Terms of Use)
the second reproduce (1.25 MB, application/gzip)
2021-05-04 10:03 UTC, Polina
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 114765 0 master MERGED engine: update sound and usb on chipset change 2021-05-27 08:40:53 UTC

Comment 2 Polina 2021-05-04 10:03:42 UTC
Created attachment 1779320 [details]
the second reproduce

Comment 3 Polina 2021-05-04 11:32:32 UTC
Some additional information:

From test setup steps 3-16 we can see that after attaching the host device the VM could start at first (step 12).
It is then shut down at step 14 and fails to start again at step 16.
What happened between steps 12 and 14, while the VM was up:
the test connects over ssh to the VM and performs the following steps on it:

# find the device node of the passthrough disk
lsscsi | grep disk | awk '{print $NF}'
# partition, format, and mount it
parted /dev/sda --script -- mklabel gpt
parted -a optimal /dev/sda mkpart primary 0% 1024MB
mkfs.ext4 -F /dev/sda1
mkdir -p /disk_passthrough_mount_point
mount /dev/sda1 /disk_passthrough_mount_point
echo "/dev/sda1 /disk_passthrough_mount_point ext4 defaults 0 0" >> /etc/fstab
# write a test file to the new filesystem
touch /disk_passthrough_mount_point/file_test
echo "content" > /disk_passthrough_mount_point/file_test

shutdown and start => here it fails
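For clarity, the first pipeline in the steps above only extracts the device node of the passthrough disk. It can be isolated against a sample lsscsi line (the sample below is illustrative, not taken from this setup):

```shell
# Hypothetical sample of one lsscsi line for a QEMU passthrough disk:
sample='[0:0:0:0]    disk    QEMU     QEMU HARDDISK    2.5+  /dev/sda'
# Keep disk-type entries and print the last field (the device node):
printf '%s\n' "$sample" | grep disk | awk '{print $NF}'
# → /dev/sda
```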

Comment 4 Arik 2021-05-04 19:33:12 UTC
I see two problems here:

1. The sound device remained ich6 after the VM was changed to Q35 (it should have changed to ich9)

2. The real issue is that the following combination of:

   device     |    type    |                           address                            
--------------+------------+--------------------------------------------------------------
ich6          | sound      | {type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}
disk          | disk       | {type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}

seems to fail when no pcie-to-pci-bridge and pci-bridge are present (similar to bz 1770697)
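The error text in the bug summary reflects a libvirt-side constraint: a device plugged directly into a PCIe root port may only use slot 0. The sketch below models only that slot rule on a domain:bus:slot.function address; the helper name and parsing are assumptions for illustration, not engine or libvirt code:

```shell
# Hypothetical helper: validate the slot of a domain:bus:slot.function
# address against the PCIe-root-port rule (only slot 0 is allowed there).
check_pcie_root_port_slot() {
  local addr=$1
  local slot=${addr##*:}   # "01.0" from "0000:12:01.0"
  slot=${slot%%.*}         # "01"
  if [ "$((16#$slot))" -gt 0 ]; then
    echo "Invalid PCI address $addr. slot must be <= 0"
    return 1
  fi
  echo "OK: $addr"
}
check_pcie_root_port_slot 0000:12:01.0 || true   # the sound device above: rejected
check_pcie_root_port_slot 0000:18:00.0           # slot 0: accepted
```

Presumably, with the pcie-to-pci-bridge and pci-bridge present, bus 0x12 is a conventional PCI bus where slot 0x01 is valid; without them the address lands on a root port and trips this check.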

As for the scenario in which we drop the PCI bridges: it happened after we started the VM using run-once.
In ProcessDownVm we clear the unmanaged devices when a stateless VM, or a VM that was started using run-once, shuts down.

We probably shouldn't remove those PCI bridges in these flows.

Comment 5 Arik 2021-05-04 19:35:49 UTC
The devices of a VM that failed to run:

              device_id               |    device     |    type    |                           address                            
--------------------------------------+---------------+------------+--------------------------------------------------------------
 203aea20-7047-4536-b565-8b66fbda6c87 | virtio-scsi   | controller | {type=pci, slot=0x00, bus=0x04, domain=0x0000, function=0x0}
 655f3977-4dc6-42a1-9792-5353aee2a007 | usb           | controller | {type=pci, slot=0x00, bus=0x05, domain=0x0000, function=0x0}
 7522c13f-573d-4f16-a37b-704fa7827050 | memballoon    | balloon    | {type=pci, slot=0x00, bus=0x19, domain=0x0000, function=0x0}
 81955117-f37c-43f1-ba7f-3367943ffb6d | ich6          | sound      | {type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}
 9e3933b9-8af8-4fca-be75-c87733be1292 | bridge        | interface  | {type=pci, slot=0x00, bus=0x02, domain=0x0000, function=0x0}
 d39a881d-d6cb-4908-ad6f-afb91e2e7986 | disk          | disk       | {type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}
 ef749c63-9325-477e-a4f9-9131d3918c5b | cdrom         | disk       | {type=drive, bus=0, controller=0, target=0, unit=2}
 4f6ce940-e389-48dd-a3fa-41251fd4dfb9 | virtio        | rng        | {type=pci, slot=0x00, bus=0x06, domain=0x0000, function=0x0}
 6af0b5d5-a1c3-45ea-8ece-995b6adc1771 | virtio-serial | controller | {type=pci, slot=0x00, bus=0x03, domain=0x0000, function=0x0}

Comment 6 Lucia Jelinkova 2021-05-13 10:39:32 UTC
I tried to simulate this in my local environment: I created a VM with the I440FX chipset, a VirtIO disk and a sound card, then changed the chipset to Q35, ran the VM, and stopped it. My VM's disk device address was different from the one in the failed environment, so I changed it in the database to bus=0x18. Run once, power off, run -> it failed with "Exit message: XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0"

I can confirm Arik's conclusion that this particular combination of disk and sound card addresses prevents the VM from starting. When I removed the sound card, or when the correct ich9 sound card was used, the issue did not occur.

That is why I expect that fixing the sound card conversion from ich6 (I440FX) to ich9 (Q35) will prevent the "Invalid PCI address 0000:12:01.0. slot must be <= 0" error from occurring.
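A quick way to confirm which sound model a VM actually received is to read it from the domain XML (on the host: virsh dumpxml <vm>). The snippet below runs against an inlined sample of the relevant fragment rather than a live domain, so the XML shown is illustrative:

```shell
# Sample fragment of the domain XML from an affected VM (illustrative):
xml='<sound model="ich6"><address type="pci" domain="0x0000" bus="0x12" slot="0x01" function="0x0"/></sound>'
# Extract the sound model; seeing "ich6" after a Q35 conversion indicates this bug.
printf '%s\n' "$xml" | grep -o 'sound model="[^"]*"'
# → sound model="ich6"
```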

Comment 7 Polina 2021-06-21 10:36:23 UTC
Verified by an automation job upgrading 4.3 to 4.4.7.4 (ovirt-engine-4.4.7.4-0.9.el8ev.noarch).
After updating the chipset from I440FX to Q35 and then attaching an iSCSI host device, the VM starts successfully. Link to the automation test results: https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.4-ge-runner-tier1-after-upgrade/60/testReport/

Comment 8 Sandro Bonazzola 2021-07-06 07:28:01 UTC
This bugzilla is included in oVirt 4.4.7 release, published on July 6th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

