Description of problem:
Instance creation fails with "internal error: process exited while connecting to monitor".

How reproducible:
Every time.

Steps to Reproduce:
1. Deploy OSP 13 with a customized deployment job (controller: 1, compute: 2).
2. Create a flavor and an image, then launch an instance.
3. The build fails with "No valid host was found".

Actual results:
The instance isn't spawned, and the following errors are found in /var/log/containers/nova/nova-conductor.log:

[root@controller-0 nova]# grep -ri error .
./nova-conductor.log:2018-03-02 21:32:03.051 22 ERROR nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Error from last host: compute-1.localdomain (node compute-1.localdomain): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1855, in _do_build_and_run_instance\n filter_properties, request_spec)\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2123, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a was re-scheduled: internal error: process exited while connecting to monitor\n']
./nova-conductor.log:2018-03-02 21:32:48.732 23 ERROR nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Error from last host: compute-0.localdomain (node compute-0.localdomain): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1855, in _do_build_and_run_instance\n filter_properties, request_spec)\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2123, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a was re-scheduled: internal error: process exited while connecting to monitor\n']
./nova-conductor.log:2018-03-02 21:32:48.734 23 WARNING nova.scheduler.utils [req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b 23f33b8630d74f13a1489afa1d71b774 - default default] [instance: 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a] Setting instance to ERROR state.: MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a.

Expected results:
Instance is spawned.

Additional info:
Interactive SSH from the undercloud to the compute node takes an unusually long time, while SSH from the undercloud to the controller is instant, as usual. It is unclear whether this is related.
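To see at a glance which compute hosts rejected the build and with what reason, the "Error from last host" entries in nova-conductor.log can be summarized with a short script. This is a minimal sketch, assuming the log line layout shown in the excerpt above; the regex, the function names (parse_failures, hosts_by_reason) and the sample lines are hypothetical illustrations, not part of nova, and the samples are abbreviated copies of the entries above with most traceback frames trimmed.

```python
import re
from collections import defaultdict

# Matches the "Error from last host" entries written by nova.scheduler.utils
# when a build is re-scheduled. The layout is assumed from the log excerpt in
# this report, not from a documented nova log format.
FAILURE_RE = re.compile(
    r"\[instance: (?P<instance>[0-9a-f-]{36})\] "
    r"Error from last host: (?P<host>\S+) \(node [^)]*\): "
    # The traceback is logged as a repr'd list, so "\n" appears as a literal
    # backslash followed by "n"; the reason ends at that backslash.
    r".*?was re-scheduled: (?P<reason>[^\\]+)"
)


def parse_failures(lines):
    """Return (instance, host, reason) tuples for every re-scheduled build."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            failures.append((match["instance"], match["host"], match["reason"]))
    return failures


def hosts_by_reason(failures):
    """Group the failing hosts by failure reason, hosts sorted by name."""
    grouped = defaultdict(set)
    for _instance, host, reason in failures:
        grouped[reason].add(host)
    return {reason: sorted(hosts) for reason, hosts in grouped.items()}


# Abbreviated copies of the conductor log entries quoted in this report.
REQ = (
    "[req-662703b6-3937-42ad-8391-bd730d5a2f81 91f0eee13d3b4233945ac84022afd38b "
    "23f33b8630d74f13a1489afa1d71b774 - default default]"
)
INSTANCE = "5d0102ef-f6c1-4320-845a-e5f9e7a3eb8a"
SAMPLE_LINES = [
    rf"2018-03-02 21:32:03.051 22 ERROR nova.scheduler.utils {REQ} [instance: {INSTANCE}] "
    rf"Error from last host: compute-1.localdomain (node compute-1.localdomain): "
    rf"[u'Traceback (most recent call last):\n', u'RescheduledException: Build of instance "
    rf"{INSTANCE} was re-scheduled: internal error: process exited while connecting to monitor\n']",
    rf"2018-03-02 21:32:48.732 23 ERROR nova.scheduler.utils {REQ} [instance: {INSTANCE}] "
    rf"Error from last host: compute-0.localdomain (node compute-0.localdomain): "
    rf"[u'Traceback (most recent call last):\n', u'RescheduledException: Build of instance "
    rf"{INSTANCE} was re-scheduled: internal error: process exited while connecting to monitor\n']",
    # The WARNING line also matches "grep -ri error" but is not a per-host failure.
    rf"2018-03-02 21:32:48.734 23 WARNING nova.scheduler.utils {REQ} [instance: {INSTANCE}] "
    rf"Setting instance to ERROR state.: MaxRetriesExceeded: Exceeded maximum number of retries.",
]

if __name__ == "__main__":
    # On a controller, the same functions could be fed the open log file instead,
    # e.g. open("/var/log/containers/nova/nova-conductor.log").
    for reason, hosts in hosts_by_reason(parse_failures(SAMPLE_LINES)).items():
        print(f"{reason}: {', '.join(hosts)}")
```

Run against the sample lines, it prints a single reason, "internal error: process exited while connecting to monitor", with both compute-0.localdomain and compute-1.localdomain listed under it. That matches the report: every compute node failed the same way before the scheduler gave up with MaxRetriesExceeded, which points at a node-level cause rather than a single bad host.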
At first sight, the error message suggests this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1543914. In addition, Joe said on IRC that it no longer reproduces with current builds, which have the following versions:
-----------------------------------------------------------------------------
[heat-admin@compute-0 ~]$ yum list installed | grep selinux
ceph-selinux.x86_64 2:12.2.1-40.el7cp @rhelosp-ceph-3.0-mon
container-selinux.noarch 2:2.48-1.el7 @rhelosp-rhel-7.5-extras
libselinux.x86_64 2.5-12.el7 @anaconda/7.5
libselinux-python.x86_64 2.5-12.el7 @anaconda/7.5
libselinux-ruby.x86_64 2.5-12.el7 @rhos-13.0-rhel-7-signed
libselinux-utils.x86_64 2.5-12.el7 @anaconda/7.5
openstack-selinux.noarch 0.8.14-0.20180221131810.4e6703e.el7ost
selinux-policy.noarch 3.13.1-192.el7 @anaconda/7.5
selinux-policy-targeted.noarch 3.13.1-192.el7 @anaconda/7.5
-----------------------------------------------------------------------------

*** This bug has been marked as a duplicate of bug 1543914 ***