Bug 1551651

Summary: Instance creation fails with "internal error: process exited while connecting to monitor"
Product: Red Hat OpenStack Reporter: Joe H. Rahme <jhakimra>
Component: openstack-tripleoAssignee: James Slagle <jslagle>
Status: CLOSED DUPLICATE QA Contact: Arik Chernetsky <achernet>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 13.0 (Queens)CC: aschultz, kchamart, mburns, rhel-osp-director-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2018-03-09 14:25:15 UTC
Type: Bug

Description Joe H. Rahme 2018-03-05 15:57:57 UTC
Description of problem:

Instance creation fails with "internal error: process exited while connecting to monitor"


How reproducible:
Every time.

Steps to Reproduce:
1. Deploy OSP 13 with a customized deployment job (controller: 1, compute: 2).
2. Create a flavor and an image, then launch an instance (a minimal CLI sequence is sketched below).
3. The boot fails with "No valid host was found".
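
For reference, a reproduction sequence with the openstack CLI might look like the following. The flavor sizing, image file, and network name are placeholders, not the values from the failing job:

-----------------------------------------------------------------------------
# Placeholder names/values; the actual flavor, image, and network used in
# the failing job are not recorded in this report.
$ openstack flavor create --ram 2048 --disk 20 --vcpus 2 m1.test
$ openstack image create --disk-format qcow2 --container-format bare \
      --file cirros.qcow2 cirros
$ openstack server create --flavor m1.test --image cirros \
      --network private test-instance
# When the boot fails, the "fault" field of the server output carries the
# scheduler/compute error.
$ openstack server show test-instance
-----------------------------------------------------------------------------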

Actual results:

The instance is not spawned, and the following errors appear in /var/log/containers/nova/nova-conductor.log:

[root@controller-0 nova]# grep -ri error .
./nova-conductor.log:2018-03-02 21:32:03.051 22 ERROR nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Error from last host: compute-1.localdomain (node compute-1.localdomain): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1855, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2123, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a was re-scheduled: internal error: process exited while connecting to monitor\n']
./nova-conductor.log:2018-03-02 21:32:48.732 23 ERROR nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Error from last host: compute-0.localdomain (node compute-0.localdomain): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1855, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2123, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a was re-scheduled: internal error: process exited while connecting to monitor\n']
./nova-conductor.log:2018-03-02 21:32:48.734 23 WARNING nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Setting instance to ERROR state.: MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a.
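
The conductor log only records the reschedule; the underlying libvirt/QEMU failure is logged on the compute nodes themselves. A sketch of how to dig it out, assuming the OSP 13 containerized log layout seen above (the libvirt path in particular is an assumption and may differ per deployment):

-----------------------------------------------------------------------------
# On each compute node, find the original failure in nova-compute ...
[root@compute-1 ~]# grep -i 'process exited while connecting to monitor' \
      /var/log/containers/nova/nova-compute.log
# ... then check the per-instance QEMU log for the real exit reason.
# Path assumed for containerized libvirt; on non-containerized hosts it
# is /var/log/libvirt/qemu/.
[root@compute-1 ~]# tail -n 50 /var/log/containers/libvirt/qemu/instance-*.log
-----------------------------------------------------------------------------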

Expected results:

Instance is spawned

Additional info:

Interactive SSH from the undercloud to the compute node takes an unusually long time, while SSH from the undercloud to the controller is instant, as usual. It is unclear whether this is related.
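
A slow interactive SSH handshake to only some hosts often points at reverse-DNS or GSSAPI lookups timing out; that is only a guess here, but it is cheap to check where the delay occurs:

-----------------------------------------------------------------------------
# Verbose client output shows which handshake step stalls (a long pause
# around the authentication exchange often means DNS/GSSAPI timeouts).
$ ssh -vvv heat-admin@compute-0 true
# Disabling GSSAPI authentication is a quick counter-test.
$ ssh -o GSSAPIAuthentication=no heat-admin@compute-0 true
-----------------------------------------------------------------------------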

Comment 2 Kashyap Chamarthy 2018-03-09 14:25:15 UTC
At first sight, the error message looks like a duplicate of:

https://bugzilla.redhat.com/show_bug.cgi?id=1543914

Second, per a chat with Joe on IRC, it is no longer reproducible with current builds, which carry the following versions:
-----------------------------------------------------------------------------
[heat-admin@compute-0 ~]$ yum list installed | grep selinux
ceph-selinux.x86_64                 2:12.2.1-40.el7cp  @rhelosp-ceph-3.0-mon    
container-selinux.noarch            2:2.48-1.el7       @rhelosp-rhel-7.5-extras 
libselinux.x86_64                   2.5-12.el7         @anaconda/7.5            
libselinux-python.x86_64            2.5-12.el7         @anaconda/7.5            
libselinux-ruby.x86_64              2.5-12.el7         @rhos-13.0-rhel-7-signed 
libselinux-utils.x86_64             2.5-12.el7         @anaconda/7.5            
openstack-selinux.noarch            0.8.14-0.20180221131810.4e6703e.el7ost
selinux-policy.noarch               3.13.1-192.el7     @anaconda/7.5            
selinux-policy-targeted.noarch      3.13.1-192.el7     @anaconda/7.5
-----------------------------------------------------------------------------
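
The package listing above suggests the duplicate was SELinux-related (an inference from the versions requested, not something stated in this report). If the failure reappears, checking for AVC denials around the instance start on the compute node would confirm or rule that out:

-----------------------------------------------------------------------------
# Look for recent AVC denials (requires auditd to be running).
[heat-admin@compute-0 ~]$ sudo ausearch -m AVC -ts recent
# Coarser fallback against the raw audit log:
[heat-admin@compute-0 ~]$ sudo grep -i denied /var/log/audit/audit.log | tail
-----------------------------------------------------------------------------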

*** This bug has been marked as a duplicate of bug 1543914 ***