| Summary: | overcloud update from osp8 to osp9 fails when ceilometer is in a weird state | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jeremy <jmelvin> |
| Component: | rhosp-director | Assignee: | Angus Thomas <athomas> |
| Status: | CLOSED WORKSFORME | QA Contact: | Omri Hochman <ohochman> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 9.0 (Mitaka) | CC: | ablum, aschultz, athomas, augol, dbecker, mburns, morazi, rhel-osp-director-maint, sathlang |
| Target Milestone: | --- | Flags: | jmelvin: needinfo?(athomas) |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-02-09 15:13:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Jeremy
2016-10-05 21:27:26 UTC
I am seeing similar behavior for the keystone portion of the upgrade. The script tries to stop keystone through pcs, but the service never fully stops. The same thing happened with ceilometer, which led to the problem.
[root@overcloud-controller-0 heat-admin]# pcs status |grep -A3 keystone
Clone Set: openstack-keystone-clone [openstack-keystone] (unmanaged)
openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started overcloud-controller-1 (unmanaged)
openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started overcloud-controller-0 (unmanaged)
openstack-keystone (systemd:openstack-keystone): (target-role:Stopped) Started overcloud-controller-2 (unmanaged)
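The output above shows the symptom: each instance carries target-role:Stopped but is still reported as Started and unmanaged. A minimal sketch for spotting that combination follows. The helper name find_stuck_resources is hypothetical and is not part of the upgrade scripts; it only filters `pcs status` text and assumes the line layout shown above.

```shell
#!/bin/bash
# Hypothetical helper, not part of tripleo-heat-templates.
# Reads `pcs status` text on stdin and prints "<resource> <node>" for every
# instance that has target-role:Stopped but is still Started and unmanaged.
# Assumes the per-instance line layout shown in the pcs output above.
find_stuck_resources() {
    awk '/\(target-role:Stopped\)/ && / Started / && /\(unmanaged\)/ {
        for (i = 1; i <= NF; i++) {
            # The node name is the field right after "Started".
            if ($i == "Started") { print $1, $(i + 1) }
        }
    }'
}
```

On a controller this could be used as `pcs status | find_stuck_resources`; any output means a resource was asked to stop while pacemaker was not managing it.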
I am seeing the same error in our training environment. The heat templates put the whole cluster in maintenance (the unmanaged state):
./extraconfig/tasks/pacemaker_maintenance_mode.sh: pcs property set maintenance-mode=true
But then later try to disable a couple of ceilometer resources:
/usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh
if pcs status | grep openstack-ceilometer-alarm; then
# Disable pacemaker resources for ceilometer-alarms
pcs resource disable openstack-ceilometer-alarm-evaluator
check_resource openstack-ceilometer-alarm-evaluator stopped 600
pcs resource delete openstack-ceilometer-alarm-evaluator
pcs resource disable openstack-ceilometer-alarm-notifier
check_resource openstack-ceilometer-alarm-notifier stopped 600
pcs resource delete openstack-ceilometer-alarm-notifier
My workaround was to disable those ceilometer resources manually before running the upgrade so this validation check passed:
[stack@director ~]$ ssh heat-admin@control0 "sudo pcs resource disable openstack-ceilometer-alarm-evaluator"
[stack@director ~]$ ssh heat-admin@control0 "sudo pcs resource disable openstack-ceilometer-alarm-notifier"
This resulted in a successful upgrade. IMO this works around an issue in the upgrade logic, since you can't disable a resource if it's unmanaged.
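One way to make the ordering problem visible is to check for maintenance mode before trying to disable a resource. The following is only a sketch under stated assumptions: is_maintenance_mode is a hypothetical helper, and it assumes the property listing contains a line of the form `maintenance-mode: true`, as `pcs property show maintenance-mode` prints on the pcs versions I have seen.

```shell
#!/bin/bash
# Hypothetical helper, not part of the upgrade scripts.
# Reads a pcs property listing on stdin and succeeds (exit 0) only if it
# contains a "maintenance-mode: true" line. The output format is an assumption.
is_maintenance_mode() {
    grep -Eq '^[[:space:]]*maintenance-mode:[[:space:]]*true[[:space:]]*$'
}

# Possible use on a controller (requires pcs, so it is left as a comment):
#   if pcs property show maintenance-mode | is_maintenance_mode; then
#       echo "cluster is unmanaged; disable will not stop the resource" >&2
#   else
#       pcs resource disable openstack-ceilometer-alarm-evaluator
#   fi
```

With a guard like this, the disable step would report the problem right away instead of waiting for the stopped check to time out.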
This bugzilla has been removed from the release and needs to be reviewed and triaged for another Target Release.
Hi, I was able to upgrade aodh successfully, so I am closing this one. If you still have the issue, feel free to re-open it.
Regards,