Description of problem:
During the overcloud upgrade using director we encountered some issues in the keystone upgrade step. Command:

openstack overcloud deploy \
  --stack lab \
  --templates \
  --ntp-server time.ord1.rackspace.com \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e ~/templates/ips-from-pool-all.yaml \
  -e ~/templates/environments/network-environment.yaml \
  -e ~/templates/environments/storage-environment.yaml \
  -e ~/templates/wipe_disk_resource.yaml \
  -e ~/templates/rhel-registration/environment-rhel-registration.yaml \
  -e ~/templates/rhel-registration/rhel-registration-resource-registry.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-keystone-liberty-mitaka.yaml \
  --control-flavor control \
  --compute-flavor compute \
  --ceph-storage-flavor ceph-storage \
  --neutron-network-type vxlan \
  --neutron-tunnel-types vxlan \
  --control-scale 3 \
  --compute-scale 5 \
  --ceph-storage-scale 4

The problems were:

1. "keystone-manage bootstrap" [1] ran but failed, because "bootstrap" is not a valid subcommand in the OSP8 keystone package. We ran the step again and it passed, which is odd, since at that point keystone had still not been updated.

2. keystone was added as a WSGI module in Apache, but the openstack-keystone-clone service was still running in pacemaker, so when the httpd-clone resource was started it failed because the port was already in use [2].
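For problem 1, a quick way to confirm whether the installed keystone supports the "bootstrap" subcommand (it does not in OSP8/Liberty, which matches the "returned 2 instead of one of [0]" error) is a check along these lines. This is a sketch, not from the original report, and is guarded so it degrades gracefully on hosts where keystone is not installed:

```shell
# Sketch: does the installed keystone-manage support "bootstrap"?
# The Liberty (OSP8) package does not, so the puppet Exec fails with
# exit code 2; the Mitaka (OSP9) package does.
bootstrap_supported() {
  command -v keystone-manage >/dev/null 2>&1 &&
    keystone-manage bootstrap --help >/dev/null 2>&1
}

if bootstrap_supported; then
  echo "keystone-manage bootstrap is available on this host"
else
  echo "keystone-manage bootstrap is NOT available on this host"
fi
```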
We solved the second problem by stopping openstack-keystone-clone and starting httpd-clone manually, then re-ran the upgrade command. At that point the upgrade was able to continue and finalize the keystone upgrade step.

It looks like heat and puppet [3] take care of stopping keystone and starting Apache, but our suspicion is that a considerable sleep is needed between these tasks, because in OSP8 all OpenStack services depend on keystone and it takes about 2 minutes to stop them all.

I pulled the log entries generated during the upgrade, because the full sosreport is 255 MB and includes material from other tests we did after the upgrade. See the attached file upgrade.log:

# journalctl -u os-collect-config --since="2016-10-26 14:35" --until "2016-10-26 21:07" > upgrade.log

[1]
Oct 26 14:38:42 444729-controller00.localdomain os-collect-config[4033]: [2016-10-26 14:38:42,315] (heat-config) [INFO] Warning: Scope(Class[Keystone]): Execution of db_sync does not depend on $enabled anymore. Please use sync_db instead.
Oct 26 14:38:42 444729-controller00.localdomain os-collect-config[4033]: Error: /Stage[main]/Keystone/Exec[keystone-manage bootstrap]: Failed to call refresh: keystone-manage bootstrap --bootstrap-password p4g6PcMxyrNu9xgCCuRjEE9hX returned 2 instead of one of [0]
Oct 26 14:38:42 444729-controller00.localdomain os-collect-config[4033]: Error: /Stage[main]/Keystone/Exec[keystone-manage bootstrap]: keystone-manage bootstrap --bootstrap-password p4g6PcMxyrNu9xgCCuRjEE9hX returned 2 instead of one of [0]
Oct 26 14:38:42 444729-controller00.localdomain os-collect-config[4033]: [2016-10-26 14:38:42,315] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/56509a24-5608-4899-8c79-b9fd936d2366.pp. [6]

[2]
Oct 26 15:29:46 444729-controller00.localdomain os-collect-config[4033]: Error: Could not start Service[httpd]: Execution of '/usr/bin/systemctl start httpd' returned 1: Job for httpd.service failed because the control process exited with error code.
See "systemctl status httpd.service" and "journalctl -xe" for details.
Oct 26 15:29:46 444729-controller00.localdomain os-collect-config[4033]: Wrapped exception:
Oct 26 15:29:46 444729-controller00.localdomain os-collect-config[4033]: Execution of '/usr/bin/systemctl start httpd' returned 1: Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.
Oct 26 15:29:46 444729-controller00.localdomain os-collect-config[4033]: Error: /Stage[main]/Apache::Service/Service[httpd]/ensure: change from stopped to running failed: Could not start Service[httpd]: Execution of '/usr/bin/systemctl start httpd' returned 1: Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.
Oct 26 15:29:46 444729-controller00.localdomain os-collect-config[4033]: Warning: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Skipping because of failed dependencies
Oct 26 15:29:46 444729-controller00.localdomain os-collect-config[4033]: [2016-10-26 15:29:46,831] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/0baac45a-7e7d-4faa-ba84-326b44769b9b.pp. [6]

[3]
/usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/major_upgrade_keystone_liberty_mitaka.yaml
/usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/liberty_to_mitaka_keystone_upgrade.pp

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-2.0.0-34.el7ost.noarch
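The failure in [2] (httpd unable to start because keystone still held its listeners) can be confirmed before retrying with a port check along these lines. This is a sketch, not part of the original report, and assumes the default keystone ports 5000 (public) and 35357 (admin):

```shell
# Sketch: verify the default keystone ports are free before starting
# httpd-clone. If a port is still bound, openstack-keystone-clone is likely
# still running and should be stopped in pacemaker first
# (e.g. "pcs resource disable openstack-keystone-clone").
port_in_use() {
  # Returns 0 if something is listening on the given TCP port.
  ss -tln 2>/dev/null | awk '{print $4}' | grep -q ":$1\$"
}

for port in 5000 35357; do
  if port_in_use "$port"; then
    echo "port $port is still in use; stop openstack-keystone-clone first"
  else
    echo "port $port is free; safe to start httpd-clone"
  fi
done
```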
Seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1354046; however, they are already on the fixed version:
Indeed this looks like bug 1354046 (CCing Michele). We are already checking that keystone disappears from pacemaker before re-managing httpd:
https://github.com/openstack/tripleo-heat-templates/blob/stable/mitaka/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh#L67-L87
so it's not immediately obvious how the problem could have happened. More logs would help establish the exact time progression of events: ideally the relevant Apache logs showing the failure, /var/log/cluster/..., and the relevant portion of the os-collect-config log (the upgrade.log mentioned earlier).

Also, just to double-check regarding the workaround: you managed to work around the issue by removing the openstack-keystone-clone resource manually and running the major-upgrade-keystone-liberty-mitaka.yaml step again, right?
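For reference, the guard in the linked migrations script amounts to a poll-until-gone loop. A minimal sketch of that pattern follows; the status command is parameterized here so the loop can be exercised without a real cluster, whereas the real script checks "pcs status" output:

```shell
# Sketch: poll a status command until a given pacemaker resource no longer
# appears in its output, with a timeout. On a real controller node the
# check command would be something like "pcs status --full".
wait_resource_gone() {
  local resource="$1" check_cmd="$2" timeout="${3:-120}" waited=0
  while $check_cmd | grep -q "$resource"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $resource to disappear" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A timeout long enough to cover the ~2 minutes it takes to stop all keystone-dependent services (as reported above) would be the key tuning point.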
Jiri, yes, that's the workaround, or alternatively stopping keystone in pcs. I will collect those logs and update. Thanks!
Moving to the upgrades group; a workaround is available and this is not blocking any important cases. We will investigate a proper fix ASAP.
Hi,

I think I found the issue. I could reproduce the exact same error message:

Error: /Stage[main]/Keystone/Exec[keystone-manage bootstrap]: Failed to call refresh: keystone-manage bootstrap --bootstrap-password 39EnVE8U7QaxGXYzpKhH47kXh returned 2 instead of one of [0]
Error: /Stage[main]/Keystone/Exec[keystone-manage bootstrap]: keystone-manage bootstrap --bootstrap-password 39EnVE8U7QaxGXYzpKhH47kXh returned 2 instead of one of [0]

by upgrading the puppet modules to the OSP9 director version before running the keystone migration.

The fix is to completely ignore one command in the documentation:
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/paged/upgrading-red-hat-openstack-platform/chapter-3-director-based-environments-performing-upgrades-to-major-versions

In section "3.4.3. Upgrading Keystone", the step "Before running the upgrade, update the openstack-puppet-modules package on each node with the following command on the Undercloud:"

for i in $(nova list|grep ctlplane|awk -F' ' '{ print $12 }'|awk -F'=' '{ print $2 }'); do ssh -o StrictHostKeyChecking=no heat-admin@$i "sudo yum -y update openstack-puppet-modules" ; done

This is a documentation issue and has been raised:
- here: https://bugzilla.redhat.com/show_bug.cgi?id=1414917
- there: https://bugzilla.redhat.com/show_bug.cgi?id=1414784

Could you confirm that if you don't upgrade openstack-puppet-modules before doing the migration, the issue disappears and you no longer need to run the workaround?

Regards,
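Incidentally, the address extraction in the quoted loop depends on "nova list" column positions (awk field 12), which breaks if the table layout shifts. A less position-sensitive sketch, matching on the "ctlplane=" token directly (the sample table below is fabricated for illustration):

```shell
# Sketch: pull the ctlplane addresses out of "nova list" style output by
# matching the "ctlplane=" token rather than counting whitespace-separated
# fields. The sample input is made up for illustration.
extract_ctlplane_ips() {
  grep -o 'ctlplane=[0-9.]\+' | cut -d= -f2
}

sample='| 1 | overcloud-controller-0  | ACTIVE | ctlplane=192.0.2.10 |
| 2 | overcloud-novacompute-0 | ACTIVE | ctlplane=192.0.2.11 |'

echo "$sample" | extract_ctlplane_ips
# prints 192.0.2.10 and 192.0.2.11, one per line
```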
Can you confirm that, with the documentation fix mentioned above in place, we should close this BZ out?
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
Yes, it's only a documentation bug, and it has been fixed there: https://bugzilla.redhat.com/show_bug.cgi?id=1414784#c5
Closing, as it's fixed in the documentation.

*** This bug has been marked as a duplicate of bug 1414784 ***