1520573 – Deployment FAILED: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]

Bug 1520573 - Deployment FAILED: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	12.0 (Pike)
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	ga
Target Release:	12.0 (Pike)
Assignee:	Michele Baldessari
QA Contact:	Gurenko Alex
Docs Contact:
URL:
Whiteboard:
Depends On:	1391554 1535967
Blocks:

Reported:	2017-12-04 17:58 UTC by Dan Trainor
Modified:	2018-04-02 19:05 UTC
CC List:	16 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1391554
Environment:
Last Closed:	2018-04-02 19:05:16 UTC
Target Upstream Version:
Embargoed:

Description Dan Trainor 2017-12-04 17:58:48 UTC

+++ This bug was initially created as a clone of Bug #1391554 +++

Description of problem:


    "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Exec[galera-ready]/returns: change from notrun to 0 failed: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]\u001b[0m\n", 
    "deploy_status_code": 6
  }, 
  "creation_time": "2016-11-03T08:52:58", 
  "updated_time": "2016-11-03T09:25:49", 
  "input_values": {}, 
  "action": "CREATE", 
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6", 
  "id": "7770a8d7-288b-4e9e-9106-31cca8cf855c"
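The deploy_stderr value above is hard to read because of the embedded ANSI color codes (\u001b[1;31m ... \u001b[0m). A minimal Python sketch for pulling out just the puppet "Error:" lines follows. The helper name extract_errors is hypothetical and is not part of any TripleO or Heat tooling.

```python
import re
from typing import List

# Matches ANSI SGR color sequences such as ESC[1;31m and ESC[0m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def extract_errors(deploy_stderr: str) -> List[str]:
    """Return the lines starting with 'Error:' after stripping color codes."""
    plain = ANSI_SGR.sub("", deploy_stderr)
    return [line for line in plain.splitlines() if line.startswith("Error:")]


if __name__ == "__main__":
    # Shortened sample built from the deploy_stderr text quoted in this report.
    sample = (
        "Could not retrieve fact='apache_version', resolution='<anonymous>': "
        "undefined method `[]' for nil:NilClass\n"
        "\u001b[1;31mError: /usr/bin/clustercheck >/dev/null returned 1 "
        "instead of one of [0]\u001b[0m\n"
    )
    for line in extract_errors(sample):
        print(line)
```

Applied to the full output above, this should leave only the two clustercheck / Exec[galera-ready] error lines.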


Version-Release number of selected component (if applicable):

openstack-tripleo-heat-templates-0.8.14-14.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-14.el7ost.noarch

How reproducible:
Always 

Steps to Reproduce:
1. 
openstack overcloud deploy ${DEBUGON} --templates \
  -e ${TEMPLATEDIR7}/nodeuserdata_env.yaml \
  -e ${TEMPLATEDIR7}/cloudname.yaml \
  -e ${TEMPLATEDIR7}/environments/network-isolation.yaml \
  -e ${TEMPLATEDIR7}/puppet-ceph-external.yaml \
  -e ${TEMPLATEDIR7}/ips-from-pool-all.yaml \
  -e ${TEMPLATEDIR7}/timezone.yaml \
  -e ${TEMPLATEDIR7}/network-environment.yaml \
  -e ${TEMPLATEDIR7}/network-management.yaml \
  -e ${TEMPLATEDIR7}/puppet-ceph-external.yaml \
  -e ${TEMPLATEDIR7}/scheduler_hints.yaml \
  -e ${TEMPLATEDIR7}/parameters/customer.yaml \
  --control-scale 3 --compute-scale 4 --ceph-storage-scale 0 \
  --control-flavor control --compute-flavor compute \
  --ntp-server ${NTPSRV} --validation-errors-fatal \
  --block-storage-scale 0 --swift-storage-scale 0


2. Wait for the deployment to finish
3. Watch Resources during deployment

Actual results:

Deployment FAILED. The nodes are up, but post-deployment never finishes. The environment is not operational.

Expected results:

Successful OSP deployment.


Additional info:

There exists a puppet ticket for the same message: 

https://tickets.puppetlabs.com/browse/MODULES-3476

Here puppet 3.8.6 was in use. 

Current version on OSP8 is 

puppet-3.6.2-4.el7sat.noarch

openstack-puppet-modules-7.0.19-1.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.5-1.el7ost.noarch

--- Additional comment from Francisco Javier Lopez Y Grueber on 2016-11-03 10:40:47 EDT ---

Hi, 

there was a change to the switch configuration. The network has been verified: all nodes are able to ping each other on all interfaces. So, I assume this is OK.

I am not investigating the irritating os-collect-config messages, as in

https://bugs.launchpad.net/os-collect-config/+bug/1437952

--- Additional comment from Chris Jones on 2017-11-14 09:26:09 EST ---

How reproducible is this issue? Do you have full deployment logs from a failed deployment? Or a deployment we could access that has this issue?

--- Additional comment from Dan Trainor on 2017-11-30 16:02:05 EST ---

I am able to consistently produce this in my environment, though for a different fact:

"stderr: \u001b[1;33mWarning: Facter: Could not retrieve fact='erl_ssl_path', resolution='<anonymous>': undefined method `gsub!' for false:FalseClass\u001b[0m",

Full 'openstack stack failures list overcloud --long' at http://pastebin.test.redhat.com/536719


I'm using the 2017-11-28.3 puddle, deploying via UI, with the following deployment plan options:

Base resources configuration, Containerized Deployment, environments/containers-default-parameters.yaml, environments/docker-ha.yaml, High Availability (Pacemaker)

The Overcloud deployment contains three controllers and one compute node.

I'll leave the environment up and allow access to it for Damien Ciabrini, on the suggestion of Chris Jones.

Comment 1 Dan Trainor 2017-12-04 18:02:46 UTC

This appears to be due to corosync expecting an MTU of 1500, which is not the case in some environments (such as this environment, rdocloud, where the ctlplane network is set to 1350).

The 'netmtu' parameter in corosync.conf responsible for configuring this value is available in the puppet-corosync module but not exposed in o-t-h-t.
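A quick way to check a node for this mismatch is to compare the interface MTU against the value corosync assumes. The following is a minimal Python sketch, assuming a Linux host that exposes the MTU under /sys/class/net/<iface>/mtu. The function names and the sysfs_root parameter are hypothetical; the 1500 and 1350 values are the ones mentioned above.

```python
from pathlib import Path

# MTU that corosync expects, per the comment above.
COROSYNC_EXPECTED_MTU = 1500


def read_interface_mtu(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the MTU of a Linux network interface from sysfs."""
    return int(Path(sysfs_root, iface, "mtu").read_text().strip())


def mtu_too_small(iface_mtu: int, expected: int = COROSYNC_EXPECTED_MTU) -> bool:
    """Return True when the interface MTU is below what corosync assumes."""
    return iface_mtu < expected


if __name__ == "__main__":
    # 1350 is the ctlplane MTU reported for rdocloud in this bug.
    ctlplane_mtu = 1350
    if mtu_too_small(ctlplane_mtu):
        print(f"MTU {ctlplane_mtu} is below {COROSYNC_EXPECTED_MTU}; "
              "netmtu in corosync.conf may need lowering")
```

On a real node, read_interface_mtu("<iface>") would supply the value in place of the hard-coded 1350.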

Comment 12 Dan Trainor 2018-04-02 19:05:16 UTC

Just doing some housekeeping, and wanted to chime in that I no longer have access to this environment to test these conditions.  I was reading through my deployment notes and realized that the last time I did, I was doing some QE testing that required three controllers in HA, where this problem did not occur.  

I'll close this out but re-visit it if I, or anyone else, comes across it again.
