Bug 1004066 - Host: Exit message: internal error No more available PCI addresses
Summary: Host: Exit message: internal error No more available PCI addresses
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.3.0
Assignee: Omer Frenkel
QA Contact: Ilanit Stein
URL:
Whiteboard: virt
Duplicates: 1024833
Depends On:
Blocks: 1015134
 
Reported: 2013-09-03 20:29 UTC by baiesi
Modified: 2018-12-04 15:48 UTC
CC List: 17 users

Fixed In Version: is18
Doc Type: Bug Fix
Doc Text:
Previously, existing sound devices were not found when restoring the configuration of a stateless virtual machine, so a new sound device was added each time the machine was started. This eventually prevented those machines from starting. With this update, the code that searches for sound devices has been corrected so that existing sound devices are discovered when restoring stateless virtual machines and no new sound devices are created. Stateless virtual machines now have only a single sound device and no longer experience issues after being started multiple times.
Clone Of:
Clones: 1015134
Environment:
Last Closed: 2014-01-21 17:36:48 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 532923 0 None None None Never
Red Hat Product Errata RHSA-2014:0038 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Virtualization Manager 3.3.0 update 2014-01-21 22:03:06 UTC
oVirt gerrit 19561 0 None None None Never
oVirt gerrit 19677 0 None None None Never

Description baiesi 2013-09-03 20:29:57 UTC
Summary:
Host: Exit message: internal error No more available PCI addresses

Description of problem:
I first noticed VM migration failing from the Admin Portal with "Migration failed due to Error: Fatal error during migration". I then correlated this with an issue on the destination Host indicating "Exit message: internal error No more available PCI addresses". All VMs associated with the Host are in a frozen state in the UI: some show an hour-glass, some a migrating status, and some a powering-up status that never changes. Since this issue occurred, the Host system's memory usage has been slowly climbing and is now at 82.06%.

The test environment is currently in this condition and will remain in this state for a brief period of time in case developers wish to get access to it. I have not yet tried to recover the Host by putting it into maintenance mode. If the developers have no need for the current system state, I will try this and see if it recovers.

Current state:
-Unable to migrate VMs to this Host
-Unable to shut down the Host's VMs: a dialog indicates "Cannot shut-down VM. VM is not running."
-Unable to suspend the Host's VMs: a dialog indicates "Cannot hibernate VM. VM is not up."
-Unable to run the Host's VMs: a dialog indicates "Cannot run VM. VM is running."
-Unable to cancel migration of the Host's VMs: a dialog indicates "Cannot cancel migration for non migrating VM"
-The Admin Portal UI shows the VMs for the Host in a frozen state, as indicated above.

Version-Release number of selected component:
Host Info
OS       : RHEL6Server - 6.4.0.4.el6
Kernel   : 2.6.32 - 358.14.1.el6.x86_64
KVM Ver  : 0.12.1.2 - 2.355.el6_4.5
Libvirt  : libvirt-0.10.2-18.el6_4.9
vdsm     : vdsm-4.10.2-23.0.el6ev
spice    : 0.12.0 - 12.el6_4.2

How reproducible: Undetermined, since this was the first run
Steps to Reproduce:
1. Run system test load against the system for an extended period of time

Actual results:
Failed migrations, with the Host-generated event: "Exit message: internal error No more available PCI addresses."

Expected results:
Continued system operation and functionality

Additional info:
I have been running a 30-day test using RHEV-M 3.2.
Type            : System / Longevity
Target Duration : 30 days
Current Duration: 26 days / Run 1

System Test Env:
-Red Hat Enterprise Virtualization Manager Version: 3.2.1-0.39.el6ev
-Qty 1 RHEL 6.4 RHEV-M Server: high-end Dell PowerEdge R710, dual 8-core, 32 GB RAM, rhevm-3.2.1-0.39.el6ev.noarch
-Qty 4 RHEL 6.4 Hosts: all high-end Dell PowerEdge R710, dual 8-core, 16 GB RAM
-Qty 1 RHEL 6.4 IPA Directory Server
-Qty 3 RHEL 6.4 Load Client machines to drive user-simulated load

VM(s)
Total: 34 VMs created

Storage
-ISCSI Total 500G
-Name Type Storage Format Cross Data-Center-Status FreeSpace
-ISCIMainStorage Data (Master) iSCSI  V3 Active 263 GB

Data collection / monitoring:
All systems are being monitored for uptime, memory, swap, CPU, network I/O, disk I/O, and disk space during the test run (except for the IPA Server and Clients).

System Test Load:
1. VM_Crud client: a Python multi-threaded client that uses the SDK to cycle through a CRUD flow of VMs over a tester-defined period of time to drive load against the system (10 threads)

2. VM_Migration client: a Python multi-threaded client that uses the SDK to cycle through migrating running VMs from host to host in the test environment over a tester-defined period of time (2 threads); see the sketch after this list

3. VM_Cycling client: a Python multi-threaded client that uses the SDK to cycle through random run, suspend, and stop operations on existing VMs in the test environment over a tester-defined period of time (10 threads)

4. UserPortal client: a Python multi-threaded client that uses Selenium to drive the User Portal. The client cycles through unique users to run, stop, or start a remote-viewer console for existing VMs in the test environment over a tester-defined period of time (10 threads)
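
To give a feel for what these load clients do, here is a minimal sketch of a migration-load worker in the spirit of the VM_Migration client. The original clients used the RHEV 3.2-era Python SDK; this sketch uses the later ovirt-engine-sdk4 purely for illustration, and the URL and credentials are placeholders rather than values from this environment.

import random
import threading
import time

import ovirtsdk4 as sdk

def migration_worker(stop_event):
    # Placeholder connection details; adjust for a real environment.
    connection = sdk.Connection(
        url="https://rhevm.example.com/ovirt-engine/api",
        username="admin@internal",
        password="password",
        insecure=True,  # lab setup; prefer ca_file in production
    )
    vms_service = connection.system_service().vms_service()
    try:
        while not stop_event.is_set():
            running = vms_service.list(search="status=up")
            if running:
                vm = random.choice(running)
                # Let the engine choose the destination host.
                vms_service.vm_service(vm.id).migrate()
            time.sleep(60)
    finally:
        connection.close()

stop = threading.Event()
workers = [threading.Thread(target=migration_worker, args=(stop,)) for _ in range(2)]
for w in workers:
    w.start()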

Let me know if there are any additional logs or system access required.

Thanks
Bruce

Comment 2 baiesi 2013-09-09 15:02:30 UTC
Let me know if anyone needs access to the test env in its current state. I've already sent the information out to a few people who have requested access. The systems will be available until Wednesday 11:00 EST.

Comment 3 Barak 2013-09-09 20:22:45 UTC
Below is the create command sent to the destination host:


{'custom': {},
     'keyboardLayout':'en-us',
     'kvmEnable': 'true',
     'pitReinjection': 'false',
     'acpiEnable': 'true',
     'emulatedMachine': 'rhel6.4.0',
     'cpuType': 'Westmere',
     'vmId': '27f78875-8e50-4dc1-9c24-e519bd8683ce',
     'devices':
	[{'device': 'qxl',
	  'specParams': {'vram': '65536'},
	  'type': 'video',
	  'deviceId': 'c4b8cb2c-a27b-46c6-a08d-ee96175a60fe'},
	 {'index': '2',
	  'iface': 'ide',
	  'bootOrder': '2',
	  'specParams': {'path': ''},
	  'readonly': 'true',
	  'deviceId': '0113864d-9dfe-4b14-8cc8-d83a85db7f98',
	  'path': '',
	  'device': 'cdrom',
	  'shared': 'false',
	  'type': 'disk'},
	 {'index': 0,
	  'iface': 'virtio',
	  'format': 'cow',
	  'bootOrder': '1',
	  'poolID': '5849b030-626e-47cb-ad90-3ce782d831b3',
	  'volumeID': 'e8fe1b6e-b62c-487c-bf04-b9a6168131fb',
	  'imageID': '28e91ef5-a15b-421d-adda-da5a2294d1ae',
	  'specParams': {},
	  'readonly': 'false',
	  'domainID': '156a4b8c-f139-46c5-9e0b-fceaaaeaff4f',
	  'optional': 'false',
	  'deviceId': '28e91ef5-a15b-421d-adda-da5a2294d1ae',
	  'address': {'bus': '0x00', ' slot': '0x05', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'},
	  'device': 'disk',
	  'shared': 'false',
	  'propagateErrors': 'off',
	  'type': 'disk'},
	 {'nicModel': 'e1000',
	  'macAddr': '00:1a:4a:10:86:31',
	  'linkActive': 'true',
	  'network': 'rhevm',
	  'filter': 'vdsm-no-mac-spoofing',
	  'specParams': {},
	  'deviceId': '051fbde9-71ab-49c1-bdf6-a22a8e24355e',
	  'address': {'bus': '0x00', ' slot': '0x03', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'},
	  'device': 'bridge',
	  'type': 'interface'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '3524a274-3439-475b-93c5-d420cafb1be3'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'c0dbb199-3648-478f-b995-45180b3c1cba'},
	 {'device': 'ich6', 'specParams': {},
	  'type': 'sound',
	  'deviceId': 'a8a760f2-6b05-4390-b4f3-0b4c7eb1d0dc'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '4db9217e-6c80-4638-974b-863b3b6dae1a'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '625b6716-8855-419f-9952-f01a542aec7f'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '4adbc5d0-f5b5-4ac1-9df9-327561b7f4ca'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '579f84f5-ae32-49d7-9977-18d57ab90a41'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '17d51030-b43d-43fa-a413-e06c8b2d62bb'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '2d28d304-32eb-423d-baee-91f218eb43f2'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'ff4c9f6a-3e7e-4a25-b0d7-776f86e5a3e4'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'f05b4df4-52b9-48d4-8a7d-f32304df9da8'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'c2bf70af-48f7-4058-a611-d95dead221c7'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '85600e66-d6e9-4bea-ac97-e9fd358ff33b'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '1fa4a40b-d847-4b81-a572-9150135aa511'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '326bf1da-0a50-4da6-ad45-f4794ab6feea'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '824d1bf0-7fff-439b-8f7a-4c0989342a49'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '4d789bd1-6d3f-4599-8299-68467c4d591d'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'eb829e9a-ef2f-4401-a1f0-49d10483c097'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '48fe2b91-46e4-461c-ae0c-bb96cb4b2676'},
	 {'device':
	  'ich6', 'specParams': {},
	  'type': 'sound',
	  'deviceId': 'b5e910dd-f7b9-497b-a518-35f750c8e1c0'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '5cc0325a-03cd-408d-8e88-c5af29f108d8'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'ea36e407-91f6-43ea-8f41-1eebff6e4661'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '0212b3b5-5170-4877-87cb-40c1b290b619'}, 
	 {'device': 'ich6', 
	  'specParams': {}, 
	  'type': 'sound', 
	  'deviceId': '11c16b25-7851-4b3d-8866-cc4414721ee3'}, 
	 {'device': 'ich6', 
	  'specParams': {}, 
	  'type': 'sound', 
	  'deviceId': '5494e581-8823-4607-bfa0-e1419173567d'}, 
	 {'device': 'ich6', 
	  'specParams': {}, 
	  'type': 'sound', 
	  'deviceId': '7c74505d-6181-4e5b-bdfe-c03319bf468f'}, 
	 {'device': 'memballoon', 
	  'specParams': {'model': 'virtio'}, 
	  'type': 'balloon', 
	  'deviceId': 'b2c0ae77-dace-4539-bade-f2c776061cef'}], 
     'smp': '1', 
     'vmType': 'kvm', 
     'timeOffset': '0', 
     'memSize': 1024, 
     'spiceSslCipherSuite': 'DEFAULT', 
     'smpCoresPerSocket': '1', 
     'spiceSecureChannels': 'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', 
     'smartcardEnable': 'false', 
     'vmName': 'p_rhel6x64-29', 
     'display': 'qxl', 
     'transparentHugePages': 'true', 
     'nice': '0'}

If I counted correctly, it has a total of 31 devices;
the odd thing is that 26 of them are ich6 sound cards.

IIRC qemu has 32 PCI slots?
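
For reference, the device mix can be tallied directly from the create command above. A quick sketch, assuming the pasted dict has been loaded into a Python variable named vm_create (a hypothetical name):

from collections import Counter

# vm_create is the create-command dict pasted in comment 3.
counts = Counter(dev['device'] for dev in vm_create['devices'])
print(counts)                         # expected: 26 x 'ich6', plus qxl, cdrom, disk, bridge, memballoon
print(sum(counts.values()), 'devices total')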

Comment 4 Barak 2013-09-09 20:30:49 UTC
Can we please get the log of the creation of this VM on the source host?
'vmName': 'p_rhel6x64-29'

Comment 5 Barak 2013-09-09 20:42:08 UTC
Can we also get the DB backup (from the log collector)?

Comment 6 Michal Skrivanek 2013-09-13 07:12:38 UTC
Also, I suppose the VM was edited at some point, wasn't it? Or is it a freshly created VM in webadmin?

Comment 7 Robert McSwain 2013-09-13 15:37:42 UTC
I am providing a RHEV Database from my customer who is experiencing this same issue.

Comment 10 Omer Frenkel 2013-09-16 08:48:46 UTC
Please attach engine and vdsm logs.

Comment 14 Omer Frenkel 2013-09-25 06:39:56 UTC
I was able to reproduce this. Steps, on 3.2:
1. Create a stateless desktop VM.
2. Start it and, once it is up, stop it.
3. Observe the vm_device table in the DB and see an additional sound device,
or on the next run observe the creation XML in vdsm.log and see the multiple sound devices (see the query sketch below).
4. Repeat step 2 until the VM can no longer be started and fails with the 'No more PCI addresses' error. (How quickly this happens depends on the number of other devices in the VM: monitors, USB, disks, etc.)
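
To make step 3 concrete, here is a minimal sketch of how the sound-device accumulation could be spotted in the engine database. Assumptions: the engine DB is named 'engine', is reachable locally, and the vm_device table exposes vm_id and type columns as the step above suggests; the credentials are placeholders.

import psycopg2

conn = psycopg2.connect(dbname='engine', user='engine', host='localhost',
                        password='engine_db_password')  # placeholder credentials
cur = conn.cursor()
cur.execute(
    "SELECT vm_id, count(*) AS sound_devices "
    "FROM vm_device WHERE type = 'sound' "
    "GROUP BY vm_id ORDER BY sound_devices DESC LIMIT 20"
)
for vm_id, sound_devices in cur.fetchall():
    print(vm_id, sound_devices)
cur.close()
conn.close()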

On 3.3 this is reproducible only if the cluster compatibility level is 3.0.

The problem is that when restoring the VM configuration (as it is stateless), the code that looks for existing sound devices is wrong.
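
The gist of the fix, illustrated as a hedged Python sketch (the real change is in the ovirt-engine code that restores stateless VM configuration; the function and field names below are hypothetical): the restore path must recognize a sound device the VM already has and reuse it, rather than concluding that none exists and adding another one on every start/stop cycle.

def restore_devices(existing_devices, snapshot_devices):
    """Merge snapshot devices back into the VM, reusing devices that already exist."""
    restored = list(snapshot_devices)
    has_sound = (
        any(d['type'] == 'sound' for d in snapshot_devices)
        or any(d['type'] == 'sound' for d in existing_devices)
    )
    # Buggy behavior: the lookup for existing sound devices never matched, so a
    # new ich6 device was appended on every stateless start/stop cycle.
    # Corrected behavior: only add a sound device when neither the snapshot nor
    # the current configuration already contains one.
    if not has_sound:
        restored.append({'device': 'ich6', 'type': 'sound', 'specParams': {}})
    return restored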

Comment 16 Michal Skrivanek 2013-10-03 12:09:57 UTC
merged to ovirt-engine-3.3

Comment 18 Ilanit Stein 2013-10-13 10:18:37 UTC
Verified on is18. 

Followed the reproduction flow in comment 14, on a 3.0 cluster.

In the vdsm log, the device list in the XML did not increase after repeatedly stopping and starting the VM.
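
A rough way to double-check this on the host, assuming (as in this setup) the create command is logged as a single line in /var/log/vdsm/vdsm.log: count the sound entries per logged call and confirm the number stays at one across repeated starts.

# Sketch only: counts sound devices in each logged create call.
with open('/var/log/vdsm/vdsm.log') as log:
    for line in log:
        if "'device': 'ich6'" in line:
            print(line.count("'type': 'sound'"), 'sound devices in this call')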

Comment 19 Michal Skrivanek 2013-11-11 14:54:47 UTC
*** Bug 1024833 has been marked as a duplicate of this bug. ***

Comment 20 Charlie 2013-11-28 00:13:35 UTC
This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 21 errata-xmlrpc 2014-01-21 17:36:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0038.html

