Bug 1015134 - Host: Exit message: internal error No more available PCI addresses
Summary: Host: Exit message: internal error No more available PCI addresses
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.2.5
Assignee: Omer Frenkel
QA Contact: Pavel Novotny
URL:
Whiteboard: virt
Depends On: 1004066
Blocks:
 
Reported: 2013-10-03 13:47 UTC by rhev-integ
Modified: 2018-12-04 15:58 UTC (History)
19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Every restart of a stateless VM added a new sound device, until there were too many devices and the VM failed to start. When the configuration of a stateless VM was restored, the existing sound device was not found, so a new sound device was added. The code that looks for the sound device has been fixed: when the stateless configuration is restored, the existing sound device is found and no new sound device is created. A stateless VM now has only one sound device, as it should, and the VM can be restarted any number of times.
Clone Of: 1004066
Environment:
Last Closed: 2013-12-18 14:08:52 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 532923 0 None None None Never
Red Hat Product Errata RHBA-2013:1831 0 normal SHIPPED_LIVE rhevm 3.2.5 bug fix update 2013-12-18 19:06:59 UTC
oVirt gerrit 19561 0 None None None Never
oVirt gerrit 19677 0 None None None Never

Description rhev-integ 2013-10-03 13:47:30 UTC
+++ This bug is a RHEV-M zstream clone. The original bug is: +++
+++   https://bugzilla.redhat.com/show_bug.cgi?id=1004066. +++

======================================================================



----------------------------------------------------------------------
Following comment by baiesi on September 03 at 20:29:57, 2013

Summary:
Host: Exit message: internal error No more available PCI addresses

Description of problem:
I first noticed VM migration failing from the admin portal with the error "Migration failed due to Error: Fatal error during migration". I then correlated this with an issue on the destination host, which reported "Exit message: internal error No more available PCI addresses". All VMs associated with the host are frozen in the UI: some show an hour-glass, some show a migrating status, and some show a powering-up status that never changes. Since this issue occurred, the host's memory usage has been slowly climbing and is now at 82.06%.

The test environment is currently in this condition and will remain in this state for a brief period of time in case developers wish to get access to it. I have not yet tried to recover the host by putting it into maintenance mode; if the developers have no need for the current system state, I will try this and see whether the host recovers.

Current state:
-Unable to migrate VM(s) to this host
-Unable to shut down the host's VM(s): a dialog indicates "Cannot shut-down VM. VM is not running."
-Unable to suspend the host's VM(s): a dialog indicates "Cannot hibernate VM. VM is not up."
-Unable to run the host's VM(s): a dialog indicates "Cannot run VM. VM is running."
-Unable to cancel migration of the host's VM(s): a dialog indicates "Cannot cancel migration for non migrating VM"
-The Admin Portal UI shows the VM(s) for the host in a frozen state, as described above.

Version-Release number of selected component:
Host Info
OS       : RHEL6Server - 6.4.0.4.el6
Kernel   : 2.6.32 - 358.14.1.el6.x86_64
KVM Ver  : 0.12.1.2 - 2.355.el6_4.5
Libvirt  : libvirt-0.10.2-18.el6_4.9
vdsm     : vdsm-4.10.2-23.0.el6ev
spice    : 0.12.0 - 12.el6_4.2

How reproducible: Undetermined, since this was the first run
Steps to Reproduce:
1. Run the system test load against the system for an extended period of time

Actual results:
Failed migrations, with the host-generated event: "Exit message: internal error No more available PCI addresses."

Expected results:
Continued system operation and functionality

Additional info:
I have been running a 30-day test using RHEV-M 3.2.
Type            : System / Longevity
Target Duration : 30 days
Current Duration: 26 days / Run 1

System Test Env:
-Red Hat Enterprise Virtualization Manager Version: 3.2.1-0.39.el6ev
-Qty 1 Rhel6.4, RHEV-M server, high-end Dell PowerEdge R710, dual 8-core, 32 GB RAM, rhevm-3.2.1-0.39.el6ev.noarch
-Qty 4 Rhel6.4, hosts, all high-end Dell PowerEdge R710, dual 8-core, 16 GB RAM
-Qty 1 Rhel6.4, IPA Directory Server
-Qty 3 Rhel6.4, load client machines to drive user-simulated load

VM(s)
Total 34 Vms created

Storage
-ISCSI Total 500G
-Name Type Storage Format Cross Data-Center-Status FreeSpace
-ISCIMainStorage Data (Master) iSCSI  V3 Active 263 GB

Data collection / monitoring:
All systems are being monitored for uptime, memory, swap, CPU, network I/O, disk I/O and disk space during the test run (except for the IPA server and the load clients).

System Test Load:
1. VM_Crud client: a multi-threaded Python client that uses the SDK to cycle through a CRUD flow of VM(s), over a period of time defined by the tester, to drive load against the system (10 threads)

2. VM_Migration client: a multi-threaded Python client that uses the SDK to cycle through migrating running VMs from host to host in the test environment, over a period of time defined by the tester, to drive load against the system (2 threads)

3. VM_Cycling client: a multi-threaded Python client that uses the SDK to cycle through random run, suspend and stop operations on existing VM(s) in the test environment, over a period of time defined by the tester, to drive load against the system (10 threads); see the sketch after this list for the general shape of such a loop

4. UserPortal client: a multi-threaded Python client that uses Selenium to drive the User Portal. The client cycles through unique users to run, stop or start console remote-viewer sessions on existing VM(s) in the test environment, over a period of time defined by the tester, to drive load against the system (10 threads)
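For illustration only, below is a minimal sketch of this kind of SDK-driven start/stop load loop. It assumes the RHEV 3.2-era Python SDK (ovirtsdk); the URL, credentials, VM name, iteration count and timings are placeholders, and this is not the actual test client.

# Minimal start/stop load-loop sketch (assumption: ovirtsdk, the RHEV 3.2-era
# Python SDK, is installed; hostname, credentials and VM name are placeholders).
import time
from ovirtsdk.api import API

api = API(url='https://rhevm.example.com/api',
          username='admin@internal',
          password='password',
          insecure=True)  # lab setup; use a CA file in a real environment

VM_NAME = 'p_rhel6x64-29'  # one of the pool VMs mentioned in this report

def wait_for_state(name, state, timeout=600):
    # Poll the VM until it reaches the requested status, or give up.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if api.vms.get(name=name).status.state == state:
            return True
        time.sleep(10)
    return False

# Each start/stop of a stateless VM is one iteration of the flow that,
# before the fix, accumulated an extra ich6 sound device.
for _ in range(30):
    vm = api.vms.get(name=VM_NAME)
    vm.start()
    wait_for_state(VM_NAME, 'up')
    vm.stop()
    wait_for_state(VM_NAME, 'down')

api.disconnect()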

Let me know if there are any additional logs or system access required.

Thanks
Bruce

----------------------------------------------------------------------
Following comment by aberezin on September 09 at 14:45:30, 2013

Barak, can you take a look?

----------------------------------------------------------------------
Following comment by baiesi on September 09 at 15:02:30, 2013

Let me know if anyone needs access to the test env in its current state. I've already sent the information out to a few people who have requested access. The systems will be available until Wednesday 11:00 EST.

----------------------------------------------------------------------
Following comment by bazulay on September 09 at 20:22:45, 2013

Below is the command sent for the destination host,


{'custom': {},
     'keyboardLayout':'en-us',
     'kvmEnable': 'true',
     'pitReinjection': 'false',
     'acpiEnable': 'true',
     'emulatedMachine': 'rhel6.4.0',
     'cpuType': 'Westmere',
     'vmId': '27f78875-8e50-4dc1-9c24-e519bd8683ce',
     'devices':
	[{'device': 'qxl',
	  'specParams': {'vram': '65536'},
	  'type': 'video',
	  'deviceId': 'c4b8cb2c-a27b-46c6-a08d-ee96175a60fe'},
	 {'index': '2',
	  'iface': 'ide',
	  'bootOrder': '2',
	  'specParams': {'path': ''},
	  'readonly': 'true',
	  'deviceId': '0113864d-9dfe-4b14-8cc8-d83a85db7f98',
	  'path': '',
	  'device': 'cdrom',
	  'shared': 'false',
	  'type': 'disk'},
	 {'index': 0,
	  'iface': 'virtio',
	  'format': 'cow',
	  'bootOrder': '1',
	  'poolID': '5849b030-626e-47cb-ad90-3ce782d831b3',
	  'volumeID': 'e8fe1b6e-b62c-487c-bf04-b9a6168131fb',
	  'imageID': '28e91ef5-a15b-421d-adda-da5a2294d1ae',
	  'specParams': {},
	  'readonly': 'false',
	  'domainID': '156a4b8c-f139-46c5-9e0b-fceaaaeaff4f',
	  'optional': 'false',
	  'deviceId': '28e91ef5-a15b-421d-adda-da5a2294d1ae',
	  'address': {'bus': '0x00', ' slot': '0x05', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'},
	  'device': 'disk',
	  'shared': 'false',
	  'propagateErrors': 'off',
	  'type': 'disk'},
	 {'nicModel': 'e1000',
	  'macAddr': '00:1a:4a:10:86:31',
	  'linkActive': 'true',
	  'network': 'rhevm',
	  'filter': 'vdsm-no-mac-spoofing',
	  'specParams': {},
	  'deviceId': '051fbde9-71ab-49c1-bdf6-a22a8e24355e',
	  'address': {'bus': '0x00', ' slot': '0x03', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'},
	  'device': 'bridge',
	  'type': 'interface'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '3524a274-3439-475b-93c5-d420cafb1be3'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'c0dbb199-3648-478f-b995-45180b3c1cba'},
	 {'device': 'ich6', 'specParams': {},
	  'type': 'sound',
	  'deviceId': 'a8a760f2-6b05-4390-b4f3-0b4c7eb1d0dc'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '4db9217e-6c80-4638-974b-863b3b6dae1a'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '625b6716-8855-419f-9952-f01a542aec7f'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '4adbc5d0-f5b5-4ac1-9df9-327561b7f4ca'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '579f84f5-ae32-49d7-9977-18d57ab90a41'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '17d51030-b43d-43fa-a413-e06c8b2d62bb'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '2d28d304-32eb-423d-baee-91f218eb43f2'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'ff4c9f6a-3e7e-4a25-b0d7-776f86e5a3e4'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'f05b4df4-52b9-48d4-8a7d-f32304df9da8'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'c2bf70af-48f7-4058-a611-d95dead221c7'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '85600e66-d6e9-4bea-ac97-e9fd358ff33b'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '1fa4a40b-d847-4b81-a572-9150135aa511'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '326bf1da-0a50-4da6-ad45-f4794ab6feea'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '824d1bf0-7fff-439b-8f7a-4c0989342a49'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '4d789bd1-6d3f-4599-8299-68467c4d591d'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'eb829e9a-ef2f-4401-a1f0-49d10483c097'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '48fe2b91-46e4-461c-ae0c-bb96cb4b2676'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'b5e910dd-f7b9-497b-a518-35f750c8e1c0'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '5cc0325a-03cd-408d-8e88-c5af29f108d8'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': 'ea36e407-91f6-43ea-8f41-1eebff6e4661'},
	 {'device': 'ich6',
	  'specParams': {},
	  'type': 'sound',
	  'deviceId': '0212b3b5-5170-4877-87cb-40c1b290b619'}, 
	 {'device': 'ich6', 
	  'specParams': {}, 
	  'type': 'sound', 
	  'deviceId': '11c16b25-7851-4b3d-8866-cc4414721ee3'}, 
	 {'device': 'ich6', 
	  'specParams': {}, 
	  'type': 'sound', 
	  'deviceId': '5494e581-8823-4607-bfa0-e1419173567d'}, 
	 {'device': 'ich6', 
	  'specParams': {}, 
	  'type': 'sound', 
	  'deviceId': '7c74505d-6181-4e5b-bdfe-c03319bf468f'}, 
	 {'device': 'memballoon', 
	  'specParams': {'model': 'virtio'}, 
	  'type': 'balloon', 
	  'deviceId': 'b2c0ae77-dace-4539-bade-f2c776061cef'}], 
     'smp': '1', 
     'vmType': 'kvm', 
     'timeOffset': '0', 
     'memSize': 1024, 
     'spiceSslCipherSuite': 'DEFAULT', 
     'smpCoresPerSocket': '1', 
     'spiceSecureChannels': 'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', 
     'smartcardEnable': 'false', 
     'vmName': 'p_rhel6x64-29', 
     'display': 'qxl', 
     'transparentHugePages': 'true', 
     'nice': '0'}

If I counted correctly, it has a total of 31 devices;
the odd thing is that 26 of them are ich6 sound cards.

IIRC qemu has 32 slots?
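For reference, the device mix in the command above can be tallied with a few lines of Python. The filename below is hypothetical; the snippet only assumes the command has been saved verbatim to a file, since it is a valid Python dict literal.

# Count devices by type in the captured create-VM command (saved to a file;
# the filename is hypothetical). ast.literal_eval parses the dict safely.
import ast
from collections import Counter

with open('vm_create_cmd.txt') as f:
    vm_cmd = ast.literal_eval(f.read())

devices = vm_cmd['devices']
print(len(devices), 'devices total')
print(Counter(dev['device'] for dev in devices).most_common())
# Per the count above this should show 31 devices, 26 of them 'ich6' sound
# devices -- enough, together with the other devices, to exhaust the roughly
# 32 addresses available on the VM's single PCI bus.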

----------------------------------------------------------------------
Following comment by bazulay on September 09 at 20:30:49, 2013

Can we please get the log of the creation of this VM on the source host?
'vmName': 'p_rhel6x64-29'

----------------------------------------------------------------------
Following comment by bazulay on September 09 at 20:42:08, 2013

Can we also get the DB backup (from the log collector)?

----------------------------------------------------------------------
Following comment by michal.skrivanek on September 13 at 07:12:38, 2013

also - I suppose the VM was edited or something, wasn't it? Or is it a freshly created new VM in webadmin?

----------------------------------------------------------------------
Following comment by rmcswain on September 13 at 15:37:42, 2013

I am providing a RHEV Database from my customer who is experiencing this same issue.

----------------------------------------------------------------------
Following comment by rmcswain on September 13 at 15:42:18, 2013

Created attachment 797398 [details]
Customer's db from RHEV Environment

----------------------------------------------------------------------
Following comment by rmcswain on September 13 at 15:46:07, 2013

Customer provided his logcollector via dropbox.redhat.com/incoming.

filename = 00930084-sosreport-LogCollector-20130912153843.tar.xz

The MD5 for this file is df6d1c14e3aeb34def37d0bbba3f4b69 and its size is 530.6M

Note that, this afternoon (Sept 12, 2013), the customer did the following:

- removed some of the problematic Pool member VMs (including LinuxLabB-37)

- created new VMs in the pool

- renamed the new VMs to take the place of the removed VMs

----------------------------------------------------------------------
Following comment by ofrenkel on September 16 at 08:48:46, 2013

please attach engine and vdsm logs

----------------------------------------------------------------------
Following comment by ofrenkel on September 16 at 08:50:50, 2013

(In reply to Robert McSwain from comment #9)
> Customer provided his logcollector via dropbox.redhat.com/incoming.
> 
> filename = 00930084-sosreport-LogCollector-20130912153843.tar.xz
> 

I have no access to this link; is there another way I can get this file?

----------------------------------------------------------------------
Following comment by avoss on September 16 at 13:55:32, 2013

Omer: You'll be able to access the file here:

http://spacesphere.usersys.redhat.com/bz1004066/00930084-sosreport-LogCollector-20130912153843.tar.xz

----------------------------------------------------------------------
Following comment by avoss on September 19 at 19:15:32, 2013

Is there anything I can do to assist?

----------------------------------------------------------------------
Following comment by ofrenkel on September 25 at 06:39:56, 2013

I was able to reproduce this, steps:
on 3.2:
1. Create a stateless desktop VM.
2. Start it and, once it is up, stop it.
3. Observe the vm_device table in the DB and see an additional sound device,
or on the next run observe the creation XML in vdsm.log and see the multiple sound devices.
4. Repeat step 2 until the VM can no longer be started, failing with the 'No more PCI addresses' error. (This depends on the number of other devices in the VM: monitors, USB, disks, etc.)

On 3.3 this is reproducible only if the cluster compatibility level is 3.0.

The problem is that when restoring the VM configuration (as it is stateless), the code that looks for existing sound devices is wrong, so the existing sound device is not found and a new one is added on every restore. A sketch of the flawed check follows below.
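The actual fix is in the ovirt-engine device-handling code (the Java changes linked from the oVirt gerrit entries above); the Python sketch below only illustrates the shape of the bug and of the fix, with hypothetical names, and is not the engine code.

# Hypothetical illustration of the stateless-restore bug and fix; not the
# actual ovirt-engine code (which is Java).
ICH6 = {'type': 'sound', 'device': 'ich6', 'specParams': {}}

def restore_buggy(devices):
    # The engine's lookup for an existing sound device never matched, so
    # every stateless restore appended another ich6 device (26 of them in
    # the command captured earlier in this bug).
    existing_sound = []          # result of the flawed lookup: always empty
    if not existing_sound:
        devices.append(dict(ICH6))
    return devices

def restore_fixed(devices):
    # Fixed behaviour: find sound devices that are already present and only
    # add one if the VM genuinely has none.
    existing_sound = [d for d in devices if d.get('type') == 'sound']
    if not existing_sound:
        devices.append(dict(ICH6))
    return devices

if __name__ == '__main__':
    devs = [dict(ICH6)]                 # a VM that already has a sound device
    for _ in range(5):                  # five stateless restarts
        restore_buggy(devs)
    print(sum(d['type'] == 'sound' for d in devs))   # 6 -> keeps growing

    devs = [dict(ICH6)]
    for _ in range(5):
        restore_fixed(devs)
    print(sum(d['type'] == 'sound' for d in devs))   # stays 1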

----------------------------------------------------------------------
Following comment by michal.skrivanek on September 25 at 08:01:11, 2013

this seems to deserve 3.2.z

----------------------------------------------------------------------
Following comment by michal.skrivanek on October 03 at 12:09:57, 2013

merged to ovirt-engine-3.3

Comment 7 Charlie 2013-11-28 00:42:45 UTC
This bug is currently attached to errata RHBA-2013:16431. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to 
minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 8 Pavel Novotny 2013-12-04 16:09:37 UTC
Verified in rhevm-3.2.5-0.48.el6ev.noarch (sf22).

Verified according to bug 1004066, comment 14:
1. Create a new stateless desktop VM (I used the RHEL 6.5 OS type).
2. Run the VM; after it's up, check the vm_device DB table for the number of sound devices:

engine=> SELECT vm_id, type,device, alias from vm_device where type='sound' and vm_id='bcaee5b7-6983-43b6-ba22-5e0df795f910';
                vm_id                 | type  | device | alias  
--------------------------------------+-------+--------+--------
 bcaee5b7-6983-43b6-ba22-5e0df795f910 | sound | ich6   | sound0
(1 row)

3. Power off the VM and check the number of sound devices again:

SELECT vm_id, type,device, alias from vm_device where type='sound' and vm_id='bcaee5b7-6983-43b6-ba22-5e0df795f910';
                vm_id                 | type  | device | alias 
--------------------------------------+-------+--------+-------
 bcaee5b7-6983-43b6-ba22-5e0df795f910 | sound | ich6   | 
(1 row)

I have repeated steps 3 and 4 several times; the number of sound devices hasn't changed.

Comment 10 errata-xmlrpc 2013-12-18 14:08:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1831.html

