Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1611994

Summary: nova_compute container stuck in restarting state after instance-ha deployment
Product: Red Hat OpenStack Reporter: MD Sufiyan <msufiyan>
Component: openstack-tripleo-heat-templatesAssignee: Emilien Macchi <emacchi>
Status: CLOSED DUPLICATE QA Contact: Gurenko Alex <agurenko>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens)CC: aschultz, mburns, michele
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-08-04 15:52:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
nova_compute logs none

Description MD Sufiyan 2018-08-03 08:14:36 UTC
Created attachment 1472906 [details]
nova_compute logs

Description of problem:

I have followed doc[1] for instance-ha deployment and deployed the overcloud successfully, however noticed nova_compute container stuck in restarting state due to which I'm not able to spawn any instance.

~~~
[root@compute-1 ~]# docker ps | grep -i nova_compute
580317c906ce        192.168.24.1:8787/rhosp13/openstack-nova-compute:2018-07-30.2                "kolla_start"       2 days ago          Restarting (127) 13 hours ago                       nova_compute
[root@compute-1 ~]# 
~~~


[1] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/instance_ha.html

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
Everytime

Steps to Reproduce:
==================

1) Create Composite role for compute,controller & CephStorage and the below registry in roles.yaml for additional instance-ha configuration under compute role

~~
    - OS::TripleO::Services::ComputeInstanceHA
    - OS::TripleO::Services::PacemakerRemote
~~

2) Create fencing.yaml 

~~
openstack overcloud generate fencing -a reboot --ipmi-lanplus --ipmi-level administrator --output fencing.yml instackenv.json
~~

3) Create compute-instanceha.yaml for enbling instance-ha

~~
resource_registry:
  OS::TripleO::Services::ComputeInstanceHA: /usr/share/openstack-tripleo-heat-templates/puppet/services/pacemaker/compute-instanceha.yaml

parameter_defaults:
  EnableInstanceHA: true
~~

4) Create a overcloud deployment script 

~~
openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-r /home/stack/virt/roles_data.yaml -e /home/stack/virt/compute-instanceha.yaml -e /home/stack/virt/fencing.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml \
--log-file overcloud_deployment_49.log
~~

5) overcloud deployment was successful, please refer the link for deployment result. However, nova_compute container remain stuck in restarting.

http://pastebin.test.redhat.com/626827

Expected result
===============

Nova_compute container should be up and remain healthy after deployment 

Actual result
=============

nova_compute container stuck in restarting due to which not able to spawn any instance

Additional info:-

[root@compute-1 ~]# docker logs nova_compute | tail
++ [[ ! -d /var/log/kolla/nova ]]
+++ stat -c %a /var/log/kolla/nova
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/nova
++ . /usr/local/bin/kolla_nova_extend_start
+++ [[ ! -d /var/lib/nova/instances ]]
+ echo 'Running command: '\''/var/lib/nova/instanceha/check-run-nova-compute '\'''
Running command: '/var/lib/nova/instanceha/check-run-nova-compute '
+ exec /var/lib/nova/instanceha/check-run-nova-compute
/usr/bin/env: python -utt: No such file or directory
[root@compute-1 ~]#

Comment 2 Michele Baldessari 2018-08-04 15:52:00 UTC

*** This bug has been marked as a duplicate of bug 1612088 ***