1551651 – Instance creation fails with "internal error: process exited while connecting to monitor"

Keywords:
Status:	CLOSED DUPLICATE of bug 1543914
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	James Slagle
QA Contact:	Arik Chernetsky
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-03-05 15:57 UTC by Joe H. Rahme
Modified:	2018-03-09 14:30 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-03-09 14:25:15 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1551617	0	unspecified	CLOSED	nova boot image results in ERROR state	2023-03-21 18:48:35 UTC

Internal Links: 1551617

Description Joe H. Rahme 2018-03-05 15:57:57 UTC

Description of problem:

Instance creation fails with "internal error: process exited while connecting to monitor"


How reproducible:
Every time.

Steps to Reproduce:
1. Deploy Red Hat OpenStack 13 with a customized deployment job (1 controller, 2 compute nodes)
2. Create flavor, image and launch an instance

3. Observe that the launch fails with "No valid host was found"

Actual results:

The instance isn't spawned. The following errors are found in /var/log/containers/nova/nova-conductor.log:

[root@controller-0 nova]# grep -ri error .
./nova-conductor.log:2018-03-02 21:32:03.051 22 ERROR nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Error from last host: compute-1.localdomain (node compute-1.localdomain): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1855, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2123, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a was re-scheduled: internal error: process exited while connecting to monitor\n']
./nova-conductor.log:2018-03-02 21:32:48.732 23 ERROR nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Error from last host: compute-0.localdomain (node compute-0.localdomain): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1855, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2123, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a was re-scheduled: internal error: process exited while connecting to monitor\n']
./nova-conductor.log:2018-03-02 21:32:48.734 23 WARNING nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Setting instance to ERROR state.: MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a.
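
The entries above are long, so a small helper can make it easier to see which compute host failed and why. The following is only a minimal sketch, assuming the log lines have the same shape as the excerpt above; the function name summarize_reschedules and both regular expressions are illustrative and are not part of nova.

```python
import re

# Matches the host name in "Error from last host: <host> (node ...)".
HOST_RE = re.compile(r"Error from last host: (?P<host>\S+) \(node ")

# The reason follows "was re-scheduled: " and ends at the escaped newline
# (a literal backslash followed by "n") inside the logged traceback list.
REASON_RE = re.compile(r"was re-scheduled: (?P<reason>.*?)\\n")


def summarize_reschedules(lines):
    """Return a list of (host, reason) pairs, one per rescheduled build.

    Lines that do not contain both a failing host and a re-schedule
    reason are skipped.
    """
    results = []
    for line in lines:
        host = HOST_RE.search(line)
        reason = REASON_RE.search(line)
        if host and reason:
            results.append((host.group("host"), reason.group("reason")))
    return results
```

Passing the lines of nova-conductor.log to summarize_reschedules should yield one pair per failed build attempt, which makes it easy to confirm that both compute nodes failed with the same reason.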

Expected results:

Instance is spawned

Additional info:

Interactive SSH from the undercloud to the compute node takes an unusually long time. SSH from the undercloud to the controller is instant, as usual. No idea if this is related.

Comment 2 Kashyap Chamarthy 2018-03-09 14:25:15 UTC

At first sight, the error message suggests this is a duplicate of:

https://bugzilla.redhat.com/show_bug.cgi?id=1543914

Second, Joe says on IRC that it is no longer reproducible with current builds, with the following versions:
-----------------------------------------------------------------------------
[heat-admin@compute-0 ~]$ yum list installed | grep selinux
ceph-selinux.x86_64                 2:12.2.1-40.el7cp  @rhelosp-ceph-3.0-mon    
container-selinux.noarch            2:2.48-1.el7       @rhelosp-rhel-7.5-extras 
libselinux.x86_64                   2.5-12.el7         @anaconda/7.5            
libselinux-python.x86_64            2.5-12.el7         @anaconda/7.5            
libselinux-ruby.x86_64              2.5-12.el7         @rhos-13.0-rhel-7-signed 
libselinux-utils.x86_64             2.5-12.el7         @anaconda/7.5            
openstack-selinux.noarch            0.8.14-0.20180221131810.4e6703e.el7ost
selinux-policy.noarch               3.13.1-192.el7     @anaconda/7.5            
selinux-policy-targeted.noarch      3.13.1-192.el7     @anaconda/7.5
-----------------------------------------------------------------------------

*** This bug has been marked as a duplicate of bug 1543914 ***
