During overcloud deployment, the logs show the following commands being executed:
++ pcs status --full
++ grep openstack-keystone
++ grep -v Clone
+ node_states=' openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started controller-2
 openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started controller-0
 openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started controller-1'
+ echo ' openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started controller-2
 openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started controller-0
 openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started controller-1'
+ grep -q Started
+ echo 'openstack-keystone not yet stopped, sleeping 3 seconds.'
+ sleep 3
^^ the block above repeats 37 times (about 111 seconds of 3-second sleeps), then:
+ echo 'openstack-keystone has stopped'
+ return
+ pcs status
+ grep haproxy-clone
+ pcs resource restart haproxy-clone
+ pcs resource restart redis-master
+ pcs resource restart mongod-clone
+ pcs resource restart rabbitmq-clone
Error: Could not complete shutdown of rabbitmq-clone, 1 resources remaining
Error performing operation: Timer expired
Set 'rabbitmq-clone' option: id=rabbitmq-clone-meta_attributes-target-role set=rabbitmq-clone-meta_attributes name=target-role=stopped
Waiting for 1 resources to stop:
 * rabbitmq-clone
 * rabbitmq-clone
Deleted 'rabbitmq-clone' option: id=rabbitmq-clone-meta_attributes-target-role name=target-role
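The polling visible in the trace can be sketched as a small shell function. This is a reconstruction from the xtrace output only, not the actual check_resource from pacemaker_resource_restart.sh; the name wait_for_stopped, the timeout handling and the failure message are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical reconstruction of the polling loop seen in the trace.
# wait_for_stopped is an illustrative name, not the real check_resource.

# Poll "pcs status --full" until no instance of the service is reported as
# Started, or until the timeout (in seconds) has been used up.
wait_for_stopped() {
    local service=$1
    local timeout=$2
    local interval=${3:-3}   # the trace shows 3-second sleeps
    local waited=0
    local node_states

    while [ "$waited" -lt "$timeout" ]; do
        # Same pipeline as in the trace: keep the per-node lines and drop
        # the "Clone Set" header line.
        node_states=$(pcs status --full | grep "$service" | grep -v Clone)
        if echo "$node_states" | grep -q Started; then
            echo "$service not yet stopped, sleeping $interval seconds."
            sleep "$interval"
            waited=$((waited + interval))
        else
            echo "$service has stopped"
            return 0
        fi
    done

    # Assumed behaviour: report the timeout on stderr and fail.
    echo "$service did not stop within $timeout seconds" >&2
    return 1
}
```

With this sketch, wait_for_stopped openstack-keystone 1200 would poll for at most 1200 seconds. Note that a line such as "(target-role:Stopped) Started controller-2" still counts as running, because the check only looks for the word Started.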
The commands in the log above are part of ControllerPostPuppetRestartDeployment, which is where the deployment failed:
  ControllerPostPuppetRestartConfig:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: {get_file: pacemaker_resource_restart.sh}
-- pacemaker_resource_restart.sh --
if [ "$pacemaker_status" = "active" -a \
     "$(hiera bootstrap_nodeid)" = "$(facter hostname)" ]; then

    #ensure neutron constraints like
    #https://review.openstack.org/#/c/245093/
    if pcs constraint order show | grep "start neutron-server-clone then start neutron-ovs-cleanup-clone"; then
        pcs constraint remove order-neutron-server-clone-neutron-ovs-cleanup-clone-mandatory
    fi

    pcs resource disable httpd
    check_resource httpd stopped 300
    pcs resource disable openstack-keystone
    check_resource openstack-keystone stopped 1200

    if pcs status | grep haproxy-clone; then
        pcs resource restart haproxy-clone
    fi
    pcs resource restart redis-master
    pcs resource restart mongod-clone
    pcs resource restart rabbitmq-clone
^^ The deployment failed at this step and did not continue. As a result
all the resources are stopped: they all depend on openstack-keystone,
which was never re-enabled.
    pcs resource restart memcached-clone
    pcs resource restart galera-master

    pcs resource enable openstack-keystone
    check_resource openstack-keystone started 300
    pcs resource enable httpd
    check_resource httpd started 800
^^ We were able to restart rabbitmq-clone by unmanaging it in Pacemaker
and starting it manually; this worked perfectly.
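The manual recovery described above could look roughly like the following sketch. The report does not list the commands that were actually run, so everything here is an assumption: start_unmanaged is a hypothetical helper, the manual start command is supplied by the caller, and handing the resource back to Pacemaker afterwards is a guess at the natural follow-up step. Only the pcs subcommands themselves (resource unmanage, resource manage, resource cleanup) are standard.

```shell
#!/bin/bash
# Hypothetical helper; the report does not show the commands that were run.
# Usage: start_unmanaged <resource> <manual start command...>
start_unmanaged() {
    local resource=$1
    shift

    # Take the resource out of Pacemaker's control so that Pacemaker does
    # not act on it while it is started by hand.
    pcs resource unmanage "$resource" || return 1

    # Run the caller-supplied manual start command. Which command was used
    # is not stated in the report, so it is left as a parameter.
    "$@" || return 1

    # Assumed follow-up: hand the resource back to Pacemaker and clear its
    # recorded failures.
    pcs resource manage "$resource" || return 1
    pcs resource cleanup "$resource"
}
```

A call would then be start_unmanaged rabbitmq-clone followed by whatever command starts the service on that node. If the manual start fails, the function returns early and the resource stays unmanaged, which has to be undone by hand.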
Most importantly, the deployment ends with:
Error: Could not prefetch keystone_tenant provider 'openstack': undefined method `collect' for nil:NilClass
Error: Could not prefetch keystone_role provider 'openstack': undefined method `collect' for nil:NilClass
Error: Could not prefetch keystone_user provider 'openstack': undefined method `collect' for nil:NilClass
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user_role[admin@admin]: Could not evaluate: undefined method `empty?' for nil:NilClass
Warning: /Stage[main]/Heat::Keystone::Domain/Exec[heat_domain_create]: Skipping because of failed dependencies
Hi team,
I increased the timeout to 300 and rabbitmq-clone now restarts successfully. Sorry for the noise; I am closing the bug.
Best Regards,
Chen