Bug 1956106

Summary: VM fails on start with XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0
Product: [oVirt] ovirt-engine
Reporter: Polina <pagranat>
Component: BLL.Virt
Assignee: Lucia Jelinkova <ljelinko>
Status: CLOSED CURRENTRELEASE
QA Contact: Polina <pagranat>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4.6.6
CC: ahadas, bugs, lilyyoung129, vitto.foster98
Target Milestone: ovirt-4.4.7
Keywords: Automation, ZStream
Target Release: 4.4.7.1
Flags: pm-rhel: ovirt-4.4+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.4.7.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-06 07:28:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description: the second reproduce    Flags: none

Comment 2 Polina 2021-05-04 10:03:42 UTC
Created attachment 1779320 [details]
the second reproduce

Comment 3 Polina 2021-05-04 11:32:32 UTC
Some additional information:

From the Test Setup steps 3-16 we can see that after attaching the host device, the VM could start at first (Test Setup 12).
It is then shut down at Test Setup 14 and fails to start in Test Setup 16.
What happened between steps 12 and 14 while the VM was up: the test connects to the VM over ssh and performs the following steps on it:

# take the device path of the passthrough disk
lsscsi | grep disk | awk '{print $NF}'
# create a GPT label and a single ~1 GB partition on the passthrough disk
parted /dev/sda --script -- mklabel gpt
parted -a optimal /dev/sda mkpart primary 0% 1024MB
# format the partition, mount it and persist the mount in /etc/fstab
mkfs.ext4 -F /dev/sda1
mkdir -p /disk_passthrough_mount_point
mount /dev/sda1 /disk_passthrough_mount_point
echo "/dev/sda1 /disk_passthrough_mount_point ext4 defaults 0 0" >> /etc/fstab
# write a test file to the new filesystem
touch /disk_passthrough_mount_point/file_test
echo "content" > /disk_passthrough_mount_point/file_test

shutdown and start => here it fails

Comment 4 Arik 2021-05-04 19:33:12 UTC
I see two problems here:

1. The sound device remained ich6 after the VM was changed to q35 (it should have been changed to ich9)

2. The real issue is that the following combination of devices and addresses:

   device     |    type    |                           address                            
--------------+------------+--------------------------------------------------------------
ich6          | sound      | {type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}
disk          | disk       | {type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}

seems to fail without pcie-to-pci-bridge and pci-bridge (similar to bz 1770697)
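
One way to check whether libvirt received those bridge controllers is to inspect the domain XML on the host (a sketch only; <vm-name> is a placeholder):

# run on the host while the VM is defined/running
virsh -r dumpxml <vm-name> | grep -E 'pcie-to-pci-bridge|pci-bridge'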

As for the scenario in which we drop the pci bridges - it happened after we started the VM using run-once.
In ProcessDownVm we clear the unmanaged devices when a stateless VM, or a VM that was started using run-once, shuts down.

We probably shouldn't remove those pci bridges in these flows.
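
A rough way to see which unmanaged devices (the pci bridges among them) get cleared in that flow is to query the engine database directly (a sketch; it assumes the default "engine" database and the vm_device columns shown in the listings here, with <vm-id> as a placeholder):

# run on the engine host
sudo -u postgres psql engine -c "select device, type, address from vm_device where vm_id = '<vm-id>' and is_managed = false;"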

Comment 5 Arik 2021-05-04 19:35:49 UTC
The devices of a VM that failed to run:

              device_id               |    device     |    type    |                           address                            
--------------------------------------+---------------+------------+--------------------------------------------------------------
 203aea20-7047-4536-b565-8b66fbda6c87 | virtio-scsi   | controller | {type=pci, slot=0x00, bus=0x04, domain=0x0000, function=0x0}
 655f3977-4dc6-42a1-9792-5353aee2a007 | usb           | controller | {type=pci, slot=0x00, bus=0x05, domain=0x0000, function=0x0}
 7522c13f-573d-4f16-a37b-704fa7827050 | memballoon    | balloon    | {type=pci, slot=0x00, bus=0x19, domain=0x0000, function=0x0}
 81955117-f37c-43f1-ba7f-3367943ffb6d | ich6          | sound      | {type=pci, slot=0x01, bus=0x12, domain=0x0000, function=0x0}
 9e3933b9-8af8-4fca-be75-c87733be1292 | bridge        | interface  | {type=pci, slot=0x00, bus=0x02, domain=0x0000, function=0x0}
 d39a881d-d6cb-4908-ad6f-afb91e2e7986 | disk          | disk       | {type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}
 ef749c63-9325-477e-a4f9-9131d3918c5b | cdrom         | disk       | {type=drive, bus=0, controller=0, target=0, unit=2}
 4f6ce940-e389-48dd-a3fa-41251fd4dfb9 | virtio        | rng        | {type=pci, slot=0x00, bus=0x06, domain=0x0000, function=0x0}
 6af0b5d5-a1c3-45ea-8ece-995b6adc1771 | virtio-serial | controller | {type=pci, slot=0x00, bus=0x03, domain=0x0000, function=0x0}

Comment 6 Lucia Jelinkova 2021-05-13 10:39:32 UTC
I tried to simulate this in my local environment - I created a VM with the I440FX chipset, a VirtIO disk and a sound card. Then I changed the chipset to Q35, ran the VM and stopped it. My VM's disk device address was different from the one in the failed environment, so I changed its bus in the database to 0x18. Run once, power off, run -> it failed with "Exit message: XML error: Invalid PCI address 0000:12:01.0. slot must be <= 0"
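
For anyone repeating this reproduction, that database change amounts to an update of the device's address column (a sketch only; the table and address format are taken from the listings above, and <device-id> stands for the disk's device_id):

# run on the engine host against the engine database
sudo -u postgres psql engine -c "update vm_device set address = '{type=pci, slot=0x00, bus=0x18, domain=0x0000, function=0x0}' where device_id = '<device-id>';"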

I can confirm that Arik's conclusion is correct: this particular combination of disk and sound card addresses prevents the VM from starting. When I removed the sound card, or when I used the correct ich9 sound card, the issue did not occur.

That is why I suppose that fixing the sound card conversion from ich6 (I440FX) to ich9 (Q35) will prevent the "Invalid PCI address 0000:12:01.0. slot must be <= 0" error from occurring.
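
To verify the conversion after the fix, the sound device the engine keeps for the VM can be checked in the same way (a sketch; after switching to Q35 it should report ich9 rather than ich6):

sudo -u postgres psql engine -c "select device, type from vm_device where vm_id = '<vm-id>' and type = 'sound';"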

Comment 7 Polina 2021-06-21 10:36:23 UTC
Verified by an automation job upgrading 4.3 to 4.4.7.4 (ovirt-engine-4.4.7.4-0.9.el8ev.noarch).
After updating the chipset from I440FX to Q35 and then attaching an iSCSI host device, the VM starts successfully. Link to the automation test results: https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.4-ge-runner-tier1-after-upgrade/60/testReport/

Comment 8 Sandro Bonazzola 2021-07-06 07:28:01 UTC
This bugzilla is included in oVirt 4.4.7 release, published on July 6th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
