Bug 1956106 - VM fails on start with XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0
Summary: VM fails on start with XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0
Keywords:
Status: POST
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.4.6.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.7
Target Release: ---
Assignee: Lucia Jelinkova
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-02 19:22 UTC by Polina
Modified: 2021-05-13 10:39 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Virt
pm-rhel: ovirt-4.4+


Attachments
the second reproduce (1.25 MB, application/gzip)
2021-05-04 10:03 UTC, Polina


Links
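System       | ID     | Private | Priority | Status | Summary                                        | Last Updated
-------------+--------+---------+----------+--------+------------------------------------------------+------------------------
oVirt gerrit | 114765 | 0       | master   | POST   | engine: update sound and usb on chipset change | 2021-05-13 12:21:05 UTC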

Comment 2 Polina 2021-05-04 10:03:42 UTC
Created attachment 1779320
the second reproduce

Comment 3 Polina 2021-05-04 11:32:32 UTC
Some additional information:

Test Setup steps 3-16 show that after the hostdev was attached, the VM started successfully the first time (Test Setup 12).
It was then shut down at Test Setup 14 and failed to start at Test Setup 16.
Between steps 12 and 14, while the VM was up, the test connected to the VM over SSH and ran the following commands on it:

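# List SCSI disks; the commands below assume the passthrough disk is /dev/sda
lsscsi | grep disk | awk '{print $NF}'
# Create a GPT label and one partition ending at 1024MB, then format it as ext4
parted /dev/sda --script -- mklabel gpt
parted -a optimal /dev/sda mkpart primary 0% 1024MB
mkfs.ext4 -F /dev/sda1
# Mount the partition now and on every boot (via /etc/fstab)
mkdir -p /disk_passthrough_mount_point
mount /dev/sda1 /disk_passthrough_mount_point
echo "/dev/sda1 /disk_passthrough_mount_point ext4 defaults 0 0" >> /etc/fstab
# Write a test file to the new filesystem
touch /disk_passthrough_mount_point/file_test
echo "content" > /disk_passthrough_mount_point/file_test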

Then the VM was shut down and started again, and this start failed.

Comment 4 Arik 2021-05-04 19:33:12 UTC
I see two problems here:

1. The sound device remained ich6 after the VM was changed to Q35 (it should have been changed to ich9).

2. The real issue is that the following combination of device addresses:

   device     |    type    |                           address                            
--------------+------------+--------------------------------------------------------------
ich6          | sound      | {type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}
disk          | disk       | {type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}

seems to fail when the pcie-to-pci-bridge and pci-bridge controllers are missing (similar to bz 1770697).
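A small sketch can make item 2 easier to check on other VMs. It rests on an assumption consistent with the error text: once the bridges are gone, bus 0x12 ends up behind a PCIe port controller (such as libvirt's pcie-root-port) that exposes only slot 0, so a device addressed at a non-zero slot on that bus is rejected. The helper flag_nonzero_slots is hypothetical (not part of oVirt or libvirt); it reads rows in the psql layout used in this bug, with or without the leading device_id column, and prints each PCI device whose slot is not 0x00.

```shell
#!/bin/sh
# Hypothetical helper, not part of oVirt or libvirt.
# Assumption: without pci-bridge/pcie-to-pci-bridge, each non-root bus on a
# q35 VM is a PCIe port that only accepts slot 0, so any PCI address with a
# non-zero slot is a candidate for "slot must be <= 0".
# Input: rows like "device | type | {type=pci, slot=0x01, bus=0x12, ...}",
# optionally preceded by a device_id column.
flag_nonzero_slots() {
  awk -F'|' '
    /type=pci/ {
      addr = $NF
      dev = $(NF - 2)          # device column, with or without device_id
      sub(/^ +/, "", dev)
      sub(/ +$/, "", dev)
      if (match(addr, /slot=0x[0-9a-fA-F]+/)) {
        slot = substr(addr, RSTART + 5, RLENGTH - 5)   # strip "slot="
        if (slot !~ /^0x0+$/) {
          bus = "?"
          if (match(addr, /bus=0x[0-9a-fA-F]+/))
            bus = substr(addr, RSTART + 4, RLENGTH - 4) # strip "bus="
          printf "%s bus=%s slot=%s\n", dev, bus, slot
        }
      }
    }'
}
```

Piping the rows from Comment 5 into flag_nonzero_slots prints only `ich6 bus=0x12 slot=0x01`. The check deliberately ignores controller types: a real pci-bridge accepts devices in slots 1-31, so on a VM that still has its bridges the output is only a list of candidates to inspect.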

As for how the PCI bridges got dropped: this happened after the VM was started using Run Once.
In ProcessDownVm, we clear the unmanaged devices when a stateless VM, or a VM that was started using Run Once, shuts down.

We probably shouldn't remove those PCI bridges in these flows anymore.

Comment 5 Arik 2021-05-04 19:35:49 UTC
The devices of a VM that failed to run:

              device_id               |    device     |    type    |                           address                            
--------------------------------------+---------------+------------+--------------------------------------------------------------
 203aea20-7047-4536-b565-8b66fbda6c87 | virtio-scsi   | controller | {type=pci, slot=0x00, bus=0x04, domain=0x0000, function=0x0}
 655f3977-4dc6-42a1-9792-5353aee2a007 | usb           | controller | {type=pci, slot=0x00, bus=0x05, domain=0x0000, function=0x0}
 7522c13f-573d-4f16-a37b-704fa7827050 | memballoon    | balloon    | {type=pci, slot=0x00, bus=0x19, domain=0x0000, function=0x0}
 81955117-f37c-43f1-ba7f-3367943ffb6d | ich6          | sound      | {type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}
 9e3933b9-8af8-4fca-be75-c87733be1292 | bridge        | interface  | {type=pci, slot=0x00, bus=0x02, domain=0x0000, function=0x0}
 d39a881d-d6cb-4908-ad6f-afb91e2e7986 | disk          | disk       | {type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}
 ef749c63-9325-477e-a4f9-9131d3918c5b | cdrom         | disk       | {type=drive, bus=0, controller=0, target=0, unit=2}
 4f6ce940-e389-48dd-a3fa-41251fd4dfb9 | virtio        | rng        | {type=pci, slot=0x00, bus=0x06, domain=0x0000, function=0x0}
 6af0b5d5-a1c3-45ea-8ece-995b6adc1771 | virtio-serial | controller | {type=pci, slot=0x00, bus=0x03, domain=0x0000, function=0x0}

Comment 6 Lucia Jelinkova 2021-05-13 10:39:32 UTC
I tried to reproduce this in my local environment: I created a VM with the I440fx chipset, a VirtIO disk, and a sound card, then changed the chipset to Q35, ran the VM, and stopped it. My VM's disk device address differed from the one in the failed environment, so I changed it in the database to bus=0x18, as in the failed environment. Then: Run Once, power off, Run. The last run failed with "Exit message: XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0".

I can confirm Arik's conclusion that this particular combination of disk and sound card addresses prevents the VM from starting. When I removed the sound card, or when I used the correct ich9 sound card, the issue did not occur.

Therefore, I expect that fixing the sound card conversion from ich6 (I440fx) to ich9 (Q35) will prevent the "Invalid PCI address 0000:12:01.0. slot must be <= 0" error from occurring.
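The fix direction here comes down to keeping the sound model in step with the chipset. As an illustration only, the mapping stated in this bug (ich6 for I440fx, ich9 for Q35) can be written as a hypothetical shell check. expected_sound_model and check_sound_model are illustrative names, not oVirt APIs; the actual fix (gerrit 114765, "engine: update sound and usb on chipset change") lives in the engine and is not reproduced here, and the USB part of that change is not covered.

```shell
#!/bin/sh
# Hypothetical helpers, not part of oVirt. The chipset-to-sound-model
# mapping follows this bug: ich6 on I440fx, ich9 on Q35.

# Print the sound model expected for a chipset; return 1 if unknown.
expected_sound_model() {
  case "$1" in
    i440fx|I440fx|I440FX) echo ich6 ;;
    q35|Q35) echo ich9 ;;
    *) return 1 ;;
  esac
}

# Compare a VM's chipset with its sound model.
# Prints "ok" and returns 0 on a match, prints a mismatch message and
# returns 1 otherwise, and returns 2 for an unknown chipset.
check_sound_model() {
  chipset=$1
  model=$2
  if ! expected=$(expected_sound_model "$chipset"); then
    echo "unknown chipset: $chipset" >&2
    return 2
  fi
  if [ "$model" = "$expected" ]; then
    echo ok
    return 0
  fi
  echo "mismatch: $chipset expects $expected, VM has $model"
  return 1
}
```

For the VM in this bug, `check_sound_model q35 ich6` prints `mismatch: q35 expects ich9, VM has ich6` and returns 1.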

