Bug 1464464 - L->M upgrade fails on installing aodh step
Summary: L->M upgrade fails on installing aodh step
Keywords:
Status: CLOSED DUPLICATE of bug 1451101
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 9.0 (Mitaka)
Assignee: Sofer Athlan-Guyot
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks: 1451101
Reported: 2017-06-23 13:42 UTC by Yolanda Robla
Modified: 2017-07-25 09:47 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-25 09:47:51 UTC
Target Upstream Version:
Embargoed:


Attachments
os-collect-config from controller02 (3.06 MB, text/plain)
2017-06-23 13:54 UTC, Yolanda Robla
os-collect-config from controller0 (3.22 MB, text/plain)
2017-06-23 14:25 UTC, Yolanda Robla
deployed.json on controller00 (14.83 KB, text/plain)
2017-06-23 14:42 UTC, Yolanda Robla
sosreport from controller00 (17.33 MB, application/x-xz)
2017-06-23 15:21 UTC, Yolanda Robla


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1465939 1 unspecified CLOSED when migrating from 8 to 9 openstack-ceilometer-alarm-evaluator_start_0 on overcloud-test-controller-0 'not installed' 2023-09-14 04:00:02 UTC

Internal Links: 1465939

Description Yolanda Robla 2017-06-23 13:42:01 UTC
Description of problem:

I'm running the first step of the overcloud upgrade from L->M (Liberty to Mitaka), using 3 baremetal controllers and 1 baremetal compute node.
After running the first step:

 openstack overcloud deploy --force-postconfig --control-scale 3 --compute-scale 1 --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /home/stack/network-environment.yaml  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-aodh.yaml  --ntp-server 135.248.16.241

I'm getting this failure:

  "status": "FAILED", 
  "server_id": "413b675e-2106-443d-9719-08e292196672", 
  "config_id": "3306a226-7a45-46ed-8d2e-142ec8b4157c", 
  "output_values": {
    "deploy_stdout": " Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]\n Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]\nERROR: cluster finished transition but openstack-ceilometer-alarm-evaluator was not in stopped state, exiting.\n", 
    "deploy_stderr": "", 
    "deploy_status_code": 1
  }, 
  "creation_time": "2017-06-23T13:05:51", 
  "updated_time": "2017-06-23T13:06:34", 
  "input_values": {}, 
  "action": "CREATE", 
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", 

On the controller, I can see that the ceilometer alarm resources are not started:

[heat-admin@overcloud-controller-2 ~]$ sudo pcs status
Cluster name: tripleo_cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: overcloud-controller-0 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Fri Jun 23 13:40:36 2017		Last change: Fri Jun 23 13:06:31 2017 by root via crm_resource on overcloud-controller-0

3 nodes and 112 resources configured: 3 resources DISABLED and 0 BLOCKED from being started due to failures

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 ip-172.17.1.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-172.17.4.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-10.5.195.91	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2
 ip-172.17.3.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-172.31.255.111	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 ip-172.17.1.11	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-0 ]
     Slaves: [ overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
     openstack-ceilometer-alarm-notifier	(systemd:openstack-ceilometer-alarm-notifier):	FAILED overcloud-controller-1
     openstack-ceilometer-alarm-notifier	(systemd:openstack-ceilometer-alarm-notifier):	FAILED overcloud-controller-2
     Started: [ overcloud-controller-0 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: delay-clone [delay]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: httpd-clone [httpd]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-server-clone [neutron-server]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
     openstack-ceilometer-alarm-evaluator	(systemd:openstack-ceilometer-alarm-evaluator):	FAILED overcloud-controller-1 (disabled)
     openstack-ceilometer-alarm-evaluator	(systemd:openstack-ceilometer-alarm-evaluator):	FAILED overcloud-controller-2 (disabled)
     Started: [ overcloud-controller-0 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started overcloud-controller-0
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Failed Actions:
* openstack-ceilometer-alarm-notifier_monitor_60000 on overcloud-controller-1 'not running' (7): call=488, status=complete, exitreason='none',
    last-rc-change='Fri Jun 23 13:07:30 2017', queued=0ms, exec=0ms
* openstack-ceilometer-alarm-evaluator_monitor_60000 on overcloud-controller-1 'not running' (7): call=486, status=complete, exitreason='none',
    last-rc-change='Fri Jun 23 13:07:28 2017', queued=0ms, exec=0ms
* openstack-ceilometer-alarm-evaluator_monitor_60000 on overcloud-controller-2 'not running' (7): call=507, status=complete, exitreason='none',
    last-rc-change='Fri Jun 23 13:07:28 2017', queued=0ms, exec=0ms
* openstack-ceilometer-alarm-notifier_monitor_60000 on overcloud-controller-2 'not running' (7): call=509, status=complete, exitreason='none',
    last-rc-change='Fri Jun 23 13:07:30 2017', queued=0ms, exec=0ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


sudo systemctl | grep openstack-ceilometer
  openstack-ceilometer-api.service                                                                                               loaded    active running   Cluster Controlled openstack-ceilometer-api
  openstack-ceilometer-central.service                                                                                           loaded    active running   Cluster Controlled openstack-ceilometer-central
  openstack-ceilometer-collector.service                                                                                         loaded    active running   Cluster Controlled openstack-ceilometer-collector
  openstack-ceilometer-notification.service                                                                                      loaded    active running   Cluster Controlled openstack-ceilometer-notification
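
For triage, the entries under "Failed Actions" in `pcs status` output like the above can be pulled out with a short script. This is a minimal sketch that assumes the line format shown in this report; the names `FAILED_ACTION` and `failed_actions` are hypothetical helpers, not part of pcs or TripleO.

```python
import re

# Matches "Failed Actions" entries of the form seen in this report:
# * <resource>_<operation>_<interval> on <node> '<reason>' (<rc>): ...
FAILED_ACTION = re.compile(
    r"^\*\s+(?P<resource>.+)_(?P<operation>[a-z]+)_(?P<interval>\d+)"
    r"\s+on\s+(?P<node>\S+)\s+'(?P<reason>[^']*)'\s+\((?P<rc>\d+)\)"
)


def failed_actions(pcs_status_text):
    """Return (resource, node, reason) tuples found in a `pcs status` dump."""
    found = []
    for line in pcs_status_text.splitlines():
        match = FAILED_ACTION.match(line.strip())
        if match:
            found.append((match["resource"], match["node"], match["reason"]))
    return found
```

Fed the output above, this would list the alarm-notifier and alarm-evaluator monitor failures on overcloud-controller-1 and overcloud-controller-2.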

Comment 3 Yolanda Robla 2017-06-23 13:54:53 UTC
Created attachment 1291099
os-collect-config from controller02

Comment 4 Yolanda Robla 2017-06-23 14:25:19 UTC
Created attachment 1291106
os-collect-config from controller0

Comment 5 Yolanda Robla 2017-06-23 14:42:54 UTC
Created attachment 1291108
deployed.json on controller00

Comment 6 Yolanda Robla 2017-06-23 15:21:16 UTC
Created attachment 1291135
sosreport from controller00

Comment 7 Sofer Athlan-Guyot 2017-06-23 17:35:38 UTC
OK,

so we may have a race condition here that causes the attempt to disable the openstack-ceilometer-alarm-evaluator resource to fail.

During the switch to aodh we remove the openstack-ceilometer-alarm package
on all three controller nodes.

If the package removal happens on any of the non-bootstrap controllers
before the pcs resource is disabled, the disable operation will fail on the
non-bootstrap node where the package has already been removed (as the
service can't be stopped there).
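
The ordering constraint described in this comment can be sketched as follows. This is a minimal illustration, not the patch that fixed the bug: the function name `stop_then_remove`, the injectable `run` parameter, and the 600-second wait are assumptions. In the real upgrade the disable runs on the bootstrap node while the removal runs on every controller, so the two steps would also have to be sequenced across nodes.

```python
import subprocess

# Resource and package names as they appear in this report.
RESOURCE = "openstack-ceilometer-alarm-evaluator"
PACKAGE = "openstack-ceilometer-alarm"


def stop_then_remove(run=subprocess.check_call, wait_seconds=600):
    """Disable the cluster resource, wait for it to stop, then remove the package.

    `run` is injectable so the ordering can be checked without a real cluster.
    """
    # Step 1: ask pacemaker to stop the resource and block until the
    # cluster reports it stopped (or the wait expires).
    run(["pcs", "resource", "disable", RESOURCE, "--wait=%d" % wait_seconds])
    # Step 2: only now remove the package, since pacemaker needs the
    # systemd unit it ships in order to stop the service.
    run(["yum", "-y", "remove", PACKAGE])
```

With the default `run`, a non-zero exit from the `pcs` command raises `subprocess.CalledProcessError`, so the package is left installed rather than removed from under a resource that is still running.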

Comment 13 Yolanda Robla 2017-06-27 13:05:54 UTC
Thanks for the patch! After applying it manually to the undercloud, I could successfully perform the aodh upgrade step.

