Bug 1290572
Summary: | Scaling out an updated overcloud (7.1->7.2) fails
---|---
Product: | Red Hat OpenStack
Component: | rhosp-director
Status: | CLOSED ERRATA
Severity: | high
Priority: | high
Version: | 7.0 (Kilo)
Target Milestone: | y2
Target Release: | 7.0 (Kilo)
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Marius Cornea <mcornea>
Assignee: | Jiri Stransky <jstransk>
QA Contact: | Marius Cornea <mcornea>
CC: | emacchi, jcoufal, jstransk, mburns, mcornea, rhel-osp-director-maint, yeylon
Fixed In Version: | openstack-tripleo-heat-templates-0.8.6-93.el7ost
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2015-12-21 16:54:38 UTC

Puppet evidently fails because systemd returns 1 when trying to restart nova-scheduler. Looking at the few logs you provided, it looks like we might sometimes need to run 'systemctl daemon-reload' after yum update.

> "If I rerun the deploy command for a 2nd time the stack update finishes successfully."

I'm perplexed by this: it would mean my statement is wrong and that a second run provides what we missed during the first run. Can you try to run 'systemctl daemon-reload' at the end of step 2 and then run step 3?

Created attachment 1104509 [details]
os-collect-config
Attaching /var/log/messages and the os-collect-config journal in case they provide any helpful info for this. I'll run a fresh environment tomorrow and run systemctl daemon-reload after step 2 and let you know how it goes.
Created attachment 1104510 [details]
/var/log/messages
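The 'systemctl daemon-reload' suggestion above can be made conditional. The following is a minimal sketch, not something taken from the templates or from Puppet: it assumes that `systemctl show` reports the `NeedDaemonReload` unit property as `yes` when a unit file changed on disk, which matches the warning shown in the bug description. The function names `parse_need_daemon_reload` and `reload_if_stale` are hypothetical.

```python
import subprocess


def parse_need_daemon_reload(show_output: str) -> bool:
    """Return True if `systemctl show` output contains NeedDaemonReload=yes."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "NeedDaemonReload":
            return value.strip() == "yes"
    # Property not present in the output: assume no reload is needed.
    return False


def reload_if_stale(unit: str) -> bool:
    """Run `systemctl daemon-reload` only if the unit file changed on disk.

    Hypothetical helper for illustration; it must run as root on the node.
    Returns True if a reload was performed.
    """
    result = subprocess.run(
        ["systemctl", "show", "--property=NeedDaemonReload", unit],
        check=True,
        capture_output=True,
        text=True,
    )
    if parse_need_daemon_reload(result.stdout):
        subprocess.run(["systemctl", "daemon-reload"], check=True)
        return True
    return False
```

On a controller, this could be invoked as `reload_if_stale("openstack-nova-scheduler.service")` after step 2 and before step 3. It only tests the workaround discussed in this comment; it does not address the root cause identified later in the report.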
The issue was that pacemaker wasn't getting into maintenance mode for the duration of the puppet run on the second and subsequent `openstack overcloud deploy` calls. The heat resource executing this action didn't receive DeployIdentifier/UpdateIdentifier as an input, so it was executed only the first time it was introduced into the stack, and never again. The resource now receives the same re-apply trigger as the Puppet runs, which means it will run together with Puppet as intended.

Tested by deploying and then scaling up +1 compute. This is not the exact triggering scenario described by Marius, but I saw pacemaker get in and out of maintenance mode during the scale up.

openstack-tripleo-heat-templates-0.8.6-94.el7ost.noarch
I was able to successfully scale out according to the reproduce steps.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2015:2651
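The root cause described in the fix comment above can be illustrated with a toy model. This is a sketch under stated assumptions, not Heat's actual implementation: it assumes a deployment resource is re-executed on a stack update only when its inputs differ from the ones it last ran with. The names `ToyDeployment`, `simulate`, the `action` input, and the `deploy_identifier` key are all hypothetical and stand in for the real resource and its DeployIdentifier input.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyDeployment:
    """Toy stand-in for a deployment resource (not Heat code)."""
    name: str
    last_inputs: Optional[Dict[str, str]] = None
    runs: List[Dict[str, str]] = field(default_factory=list)

    def stack_update(self, inputs: Dict[str, str]) -> bool:
        """Re-run only if the inputs changed since the last run."""
        if inputs == self.last_inputs:
            return False  # unchanged inputs: the action is skipped
        self.last_inputs = dict(inputs)
        self.runs.append(dict(inputs))
        return True


def simulate(pass_identifier: bool, deploy_ids: List[str]) -> int:
    """Count how often the maintenance-mode action runs over several deploys."""
    dep = ToyDeployment("pacemaker_maintenance_mode")
    for deploy_id in deploy_ids:
        inputs = {"action": "enter-maintenance"}
        if pass_identifier:
            # A value that changes on every deploy forces a re-run.
            inputs["deploy_identifier"] = deploy_id
        dep.stack_update(inputs)
    return len(dep.runs)
```

In this model, `simulate(False, ["1", "2", "3"])` runs the action once, mirroring the behavior before the fix, while `simulate(True, ["1", "2", "3"])` runs it on every deploy, mirroring the behavior once the resource receives the same re-apply trigger as the Puppet runs.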
Created attachment 1104490 [details]
deployment error

Description of problem:
I'm trying to add an additional node to an overcloud updated from 7.1 to 7.2, but the stack update fails because puppet fails to restart openstack-nova-scheduler on one of the controller nodes. If I rerun the update command the stack update completes ok.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-91.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy 7.1 by using 7.1 templates:

    openstack overcloud deploy \
      --templates ~/templates/my-overcloud \
      --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 \
      --ntp-server clock.redhat.com \
      --libvirt-type qemu \
      -e ~/templates/my-overcloud/environments/network-isolation.yaml \
      -e ~/templates/network-environment.yaml \
      -e ~/templates/firstboot-environment.yaml \
      -e ~/templates/ceph.yaml

2. Update the undercloud to 7.2 and run the update procedure to 7.2 with 7.2 templates:

    /usr/bin/yes '' | openstack overcloud update stack overcloud -i \
      --templates ~/templates/my-overcloud \
      -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
      -e ~/templates/my-overcloud/environments/network-isolation.yaml \
      -e ~/templates/network-environment.yaml \
      -e ~/templates/firstboot-environment.yaml \
      -e ~/templates/ceph.yaml \
      -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
      -e ~/templates/ctrlport.yaml

   Wait for the update to complete.

3. Try to scale out with an additional node:

    openstack overcloud deploy \
      --templates ~/templates/my-overcloud \
      --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 \
      --ntp-server clock.redhat.com \
      --libvirt-type qemu \
      -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
      -e ~/templates/my-overcloud/environments/network-isolation.yaml \
      -e ~/templates/network-environment.yaml \
      -e ~/templates/firstboot-environment.yaml \
      -e ~/templates/ceph.yaml \
      -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
      -e ~/templates/ctrlport.yaml

Actual results:
Stack update fails. I'm attaching the deployment output. When I log in to the controller, nova-scheduler appears to be running:

    [root@overcloud-controller-1 heat-admin]# systemctl status openstack-nova-scheduler
    ● openstack-nova-scheduler.service - OpenStack Nova Scheduler Server
       Loaded: loaded (/usr/lib/systemd/system/openstack-nova-scheduler.service; disabled; vendor preset: disabled)
       Active: active (running) since Thu 2015-12-10 14:45:55 EST; 12min ago
     Main PID: 11340 (nova-scheduler)
       CGroup: /system.slice/openstack-nova-scheduler.service
               └─11340 /usr/bin/python /usr/bin/nova-scheduler

    Dec 10 14:45:53 overcloud-controller-1.localdomain systemd[1]: Starting OpenStack Nova Scheduler Server...
    Dec 10 14:45:55 overcloud-controller-1.localdomain systemd[1]: Started OpenStack Nova Scheduler Server.
    Warning: openstack-nova-scheduler.service changed on disk. Run 'systemctl daemon-reload' to reload units.

Expected results:
The stack update completes ok.

Additional info:
If I rerun the deploy command for a 2nd time the stack update finishes successfully.