Bug 1956106

Summary: VM fails on start with XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0
Product: [oVirt] ovirt-engine Reporter: Polina <pagranat>
Component: BLL.Virt Assignee: Lucia Jelinkova <ljelinko>
Status: ON_QA --- QA Contact: Polina <pagranat>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.4.6.6 CC: ahadas, bugs
Target Milestone: ovirt-4.4.7 Keywords: Automation, ZStream
Target Release: 4.4.7.1 Flags: pm-rhel: ovirt-4.4+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-engine-4.4.7.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description: the second reproduce    Flags: none

Comment 2 Polina 2021-05-04 10:03:42 UTC
Created attachment 1779320
the second reproduce

Comment 3 Polina 2021-05-04 11:32:32 UTC
Some additional information:

From Test Setup steps 3-16 we see that, after the hostdev was attached, the VM could start the first time (Test Setup 12).
It was then shut down at Test Setup 14 and failed to start at Test Setup 16.
This is what happened between steps 12 and 14, while the VM was up:
the test connects to the VM over SSH and runs the following commands on it:

lsscsi | grep disk | awk '{print $NF}'
parted /dev/sda --script -- mklabel gpt
parted -a optimal /dev/sda mkpart primary 0% 1024MB
mkfs.ext4 -F /dev/sda1
mkdir -p /disk_passthrough_mount_point
mount /dev/sda1 /disk_passthrough_mount_point
echo "/dev/sda1 /disk_passthrough_mount_point ext4 defaults     0   0" >> /etc/fstab
touch /disk_passthrough_mount_point/file_test 
echo "content" > /disk_passthrough_mount_point/file_test

Shut down and start the VM -> this is where it fails.

Comment 4 Arik 2021-05-04 19:33:12 UTC
I see two problems here:

1. The sound device remained ich6 after the VM's chipset changed to Q35 (it should have been changed to ich9).

2. The real issue is that the following combination of device addresses:

   device     |    type    |                           address                            
--------------+------------+--------------------------------------------------------------
ich6          | sound      | {type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}
disk          | disk       | {type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}

seems to fail without a pcie-to-pci-bridge and a pci-bridge (similar to bz 1770697).
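A hedged sketch of why only the ich6 address is a problem. It assumes that, once the two bridges are gone, a non-root bus such as 0x12 ends up backed by a pcie-root-port, on which libvirt accepts only slot 0; a device on a non-root bus with a non-zero slot then needs a conventional PCI bridge. The helpers `addr_field` and `needs_pci_bridge` are hypothetical and only parse address strings in the form printed in the table above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of oVirt or libvirt.
# Assumption: on a Q35 VM without pci-bridge / pcie-to-pci-bridge controllers,
# every non-root bus is a pcie-root-port, which only accepts slot 0.

# Extract one key (e.g. slot, bus) from an address string such as
# "{type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}".
addr_field() {
  local addr=$1 key=$2
  printf '%s\n' "$addr" | tr -d '{} ' | tr ',' '\n' \
    | awk -F= -v k="$key" '$1 == k { print $2 }'
}

# Print "bus:slot" for every PCI address with a non-zero slot on a
# non-root bus (bus 0x00 is the root complex, where slots 1-31 are valid).
needs_pci_bridge() {
  local addr type slot bus
  for addr in "$@"; do
    type=$(addr_field "$addr" type)
    [ "$type" = "pci" ] || continue   # skip drive addresses etc.
    slot=$(addr_field "$addr" slot)
    bus=$(addr_field "$addr" bus)
    # Bash arithmetic understands the 0x hex prefix.
    if (( bus != 0 && slot != 0 )); then
      printf '%s:%s\n' "$bus" "$slot"
    fi
  done
}

# The two addresses from the table above; prints 0x12:0x01
needs_pci_bridge \
  "{type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}" \
  "{type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}"
```

Run over all PCI addresses in the device list of Comment 5, this check would flag only the ich6 sound device, since every other PCI device there uses slot 0x00.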

As for the scenario in which the PCI bridges are dropped: it happened after the VM was started using Run Once.
In ProcessDownVm we clear the unmanaged devices when a stateless VM, or a VM that was started using Run Once, shuts down.

We probably shouldn't remove those PCI bridges in these flows now.
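The rule suggested here (keep the PCI bridges when clearing unmanaged devices) can be sketched as a simple filter. This is only an illustration under stated assumptions: `devices_to_clear` is a hypothetical name, devices are identified solely by libvirt PCI controller model names, and nothing here reflects the actual ProcessDownVm code or how the engine stores devices.

```bash
#!/usr/bin/env bash
# Hypothetical filter: given the unmanaged devices of a stateless or
# run-once VM that went down, print the ones that may be cleared.
# pci-bridge and pcie-to-pci-bridge are kept, because devices with a
# non-zero slot on a non-root bus may depend on them.
devices_to_clear() {
  local dev
  for dev in "$@"; do
    case "$dev" in
      pci-bridge|pcie-to-pci-bridge) ;;   # keep the bridges
      *) printf '%s\n' "$dev" ;;          # everything else may be cleared
    esac
  done
}

# prints: pcie-root-port
devices_to_clear pcie-root-port pci-bridge pcie-to-pci-bridge
```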

Comment 5 Arik 2021-05-04 19:35:49 UTC
The devices of a VM that failed to run:

              device_id               |    device     |    type    |                           address                            
--------------------------------------+---------------+------------+--------------------------------------------------------------
 203aea20-7047-4536-b565-8b66fbda6c87 | virtio-scsi   | controller | {type=pci, slot=0x00, bus=0x04, domain=0x0000, function=0x0}
 655f3977-4dc6-42a1-9792-5353aee2a007 | usb           | controller | {type=pci, slot=0x00, bus=0x05, domain=0x0000, function=0x0}
 7522c13f-573d-4f16-a37b-704fa7827050 | memballoon    | balloon    | {type=pci, slot=0x00, bus=0x19, domain=0x0000, function=0x0}
 81955117-f37c-43f1-ba7f-3367943ffb6d | ich6          | sound      | {type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}
 9e3933b9-8af8-4fca-be75-c87733be1292 | bridge        | interface  | {type=pci, slot=0x00, bus=0x02, domain=0x0000, function=0x0}
 d39a881d-d6cb-4908-ad6f-afb91e2e7986 | disk          | disk       | {type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}
 ef749c63-9325-477e-a4f9-9131d3918c5b | cdrom         | disk       | {type=drive, bus=0, controller=0, target=0, unit=2}
 4f6ce940-e389-48dd-a3fa-41251fd4dfb9 | virtio        | rng        | {type=pci, slot=0x00, bus=0x06, domain=0x0000, function=0x0}
 6af0b5d5-a1c3-45ea-8ece-995b6adc1771 | virtio-serial | controller | {type=pci, slot=0x00, bus=0x03, domain=0x0000, function=0x0}

Comment 6 Lucia Jelinkova 2021-05-13 10:39:32 UTC
I tried to simulate this in my local environment. I created a VM with the I440fx chipset, a VirtIO disk and a sound card, then changed the chipset to Q35, ran the VM and stopped it. My VM's disk device address was different from the one in the failed environment, so I changed it in the database to match it (bus=0x18). Run once, power off, run -> it failed with "Exit message: XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0"

I can confirm Arik's conclusion that this particular combination of disk and sound card addresses prevents the VM from starting. When I removed the sound card, or used the correct ich9 sound card, the issue did not occur.

That is why I expect that fixing the sound card conversion from ich6 (I440fx) to ich9 (Q35) will prevent the "Invalid PCI address 0000:12:01.0. slot must be <= 0" error from occurring.
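The conversion described in this bug (ich6 for I440fx, ich9 for Q35) can be sketched as a lookup plus a check on chipset change. This is a minimal illustration, not the engine's implementation: `expected_sound_model` and `convert_sound_model` are hypothetical names, and chipsets other than I440fx and Q35 are deliberately left unmapped.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: sound card model expected for a chipset.
# ${1,,} lower-cases the argument (bash 4+), so "Q35" and "q35" both match.
expected_sound_model() {
  case "${1,,}" in
    i440fx) echo ich6 ;;
    q35)    echo ich9 ;;
    *)      return 1 ;;   # unknown chipset: no mapping assumed
  esac
}

# Print the model the sound device should be converted to after a chipset
# change, or nothing if the current model already matches the chipset.
convert_sound_model() {
  local chipset=$1 current=$2 expected
  expected=$(expected_sound_model "$chipset") || return 1
  if [ "$current" != "$expected" ]; then
    echo "$expected"
  fi
}

convert_sound_model Q35 ich6     # prints ich9 (the case hit in this bug)
convert_sound_model I440fx ich6  # prints nothing
```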