[OSP-Director-10] Upgrade undercloud with SSL from OSP9 to OSP10 causes undercloud-upgrade failure.

Environment:
------------
instack-5.0.0-1.el7ost.noarch
instack-undercloud-5.0.0-0.4.0rc3.el7ost.noarch
openstack-heat-api-cfn-7.0.0-3.el7ost.noarch
openstack-heat-common-7.0.0-3.el7ost.noarch
openstack-heat-templates-0.0.1-0.20161011152629.40a4ed0.el7ost.noarch
puppet-heat-9.4.1-1.el7ost.noarch
python-heatclient-1.5.0-1.el7ost.noarch
python-heat-agent-0.0.1-0.20161011152629.40a4ed0.el7ost.noarch
openstack-heat-engine-7.0.0-3.el7ost.noarch
openstack-heat-api-7.0.0-3.el7ost.noarch
python-heat-tests-7.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.6.0rc3.el7ost.noarch
openstack-tripleo-heat-templates-compat-2.0.0-34.3.el7ost.noarch

Steps:
------
(1) Deploy OSP9 with SSL enabled on undercloud + overcloud
(2) Attempt to upgrade the undercloud according to the guide: https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade

Results:
--------
openstack undercloud upgrade

Errors:
-------
2016-10-20 19:50:03 - Notice: /Stage[main]/Main/Exec[stop_nova-api]: Triggered 'refresh' from 82 events
2016-10-20 19:50:04 - Notice: /Stage[main]/Apache::Service/Service[httpd]: Triggered 'refresh' from 2 events
2016-10-20 19:50:04 - Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Triggered 'refresh' from 4 events

Broadcast message from systemd-journald (Thu 2016-10-20 19:50:13 EDT):

haproxy[30998]: proxy aodh has no server available!
2016-10-20 19:53:03 - Error: /Stage[main]/Neutron::Keystone::Auth/Keystone::Resource::Service_identity[neutron]/Keystone_user[neutron]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 19:55:58 - Error: /Stage[main]/Heat::Keystone::Auth/Keystone::Resource::Service_identity[heat]/Keystone_user[heat]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 19:56:02 - Notice: /Stage[main]/Glance::Keystone::Auth/Keystone::Resource::Service_identity[glance]/Keystone_service[glance::image]/ensure: created
2016-10-20 19:56:03 - Notice: /Stage[main]/Zaqar::Keystone::Auth/Keystone::Resource::Service_identity[zaqar]/Keystone_service[zaqar::messaging]/ensure: created
2016-10-20 19:58:58 - Error: /Stage[main]/Nova::Keystone::Auth/Keystone::Resource::Service_identity[nova]/Keystone_user[nova]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 19:59:00 - Notice: /Stage[main]/Mistral::Keystone::Auth/Keystone::Resource::Service_identity[mistral]/Keystone_user[mistral]/ensure: created
2016-10-20 20:01:55 - Error: /Stage[main]/Glance::Keystone::Auth/Keystone::Resource::Service_identity[glance]/Keystone_user[glance]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 20:01:58 - Notice: /Stage[main]/Zaqar::Keystone::Auth_websocket/Keystone::Resource::Service_identity[zaqar-websocket]/Keystone_user[zaqar-websocket]/ensure: created
2016-10-20 20:02:00 - Notice: /Stage[main]/Mistral::Keystone::Auth/Keystone::Resource::Service_identity[mistral]/Keystone_service[mistral::workflowv2]/ensure: created
2016-10-20 20:04:54 - Error: /Stage[main]/Ironic::Keystone::Auth/Keystone::Resource::Service_identity[ironic]/Keystone_user[ironic]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 20:07:50 - Error: /Stage[main]/Ironic::Keystone::Auth_inspector/Keystone::Resource::Service_identity[ironic-inspector]/Keystone_user[ironic-inspector]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 20:07:53 - Notice: /Stage[main]/Zaqar::Keystone::Auth/Keystone::Resource::Service_identity[zaqar]/Keystone_user[zaqar]/ensure: created
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/notify_nova_on_port_status_changes]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/notify_nova_on_port_status_changes]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/auth_type]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/auth_type]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/region_name]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/region_name]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_name]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_name]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/send_events_interval]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/send_events_interval]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/username]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/username]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/password]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/password]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_domain_id]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_domain_id]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/notify_nova_on_port_data_changes]: Dependency Keystone_user[nova] has failures:
Hi,

this error:

    /bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)

is usually caused by keystone not working properly. The keystone and apache logs would be useful, as would checking whether anything is listening on port 13000. My guess would be that something is wrong with the apache SSL configuration.
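As a quick sketch of that check (the VIP 192.168.0.2 and port 13000 are taken from the error messages above; adjust both for your environment):

```shell
# Hedged diagnostic sketch -- the VIP and port below come from the errors
# quoted in this bug; substitute your own undercloud values.
VIP=192.168.0.2

# 1. Is anything bound to the public SSL port at all?
ss -tln 2>/dev/null | grep ':13000' || echo "nothing listening on 13000"

# 2. Probe the keystone endpoint through the SSL terminator (5 second timeout).
curl -ks --max-time 5 "https://${VIP}:13000/v3" || echo "keystone endpoint unreachable"
```

If nothing is bound to 13000, the problem is haproxy (the SSL terminator) rather than keystone itself, which matches the "proxy aodh has no server available!" broadcast above.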
Created attachment 1214050 [details] adding keystone.log
Created attachment 1214051 [details] adding apache.log
(In reply to Sofer Athlan-Guyot from comment #2)
> Hi,
>
> this error
>
> /bin/openstack token issue --format value' returned 1: Unable to
> establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22,
> for a total of 170 seconds)
>
> is usually caused by keystone not working properly. The keystone and apache
> log would be useful, checking what if the service is listening on 13000.
> My idea would be that something is wrong with the apache ssl configuration.

It might be a configuration issue - but then we have to explore it and document the right steps, answering: what are the steps to upgrade from OSP9 to OSP10 with SSL enabled?
Moving to DFG:Security. Keith - can you check whether the Security DFG can help us define the right steps for upgrading the undercloud with SSL? We're failing on the above ^^. My assumption is that we eventually just need to know how to fix the certificate before running the 'openstack undercloud upgrade' command - but I'm not sure that's the case.
I have reproduced the error locally. haproxy fails to restart:

Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [WARNING] 301/092953 (29179) : config : missing timeouts for proxy 'rabbitmq'.
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | While not properly invalid, you will certainly encounter various problems
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | with such a configuration. To fix this, please ensure that all following
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [WARNING] 301/092953 (29179) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy aodh started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy ceilometer started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy glance_api started.
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [ALERT] 301/092953 (29179) : Starting proxy ironic-inspector: cannot bind socket [192.0.2.3:5050]
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy glance_registry started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy haproxy.stats started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy heat_api started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy ironic started.
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: haproxy-systemd-wrapper: exit, haproxy RC=256

The return code 256 is a bug: puppet interprets it as a return code of 0, so the failed restart goes unnoticed. See this thread for more information: https://www.mail-archive.com/haproxy@formilux.org/msg23896.html

I'm now checking what is making haproxy fail.
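The wrap-around is easy to demonstrate: Unix exit statuses are only 8 bits wide, so a raw value of 256 is truncated to 0 and looks like success to the caller (here, puppet):

```shell
# Exit statuses are reduced modulo 256, so a child that tries to
# exit with 256 is observed by its parent as exiting with 0.
bash -c 'exit 256'
echo "observed exit status: $?"
```

This prints `observed exit status: 0`, which is exactly why puppet treated the failed haproxy restart as successful.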
This is because ironic-inspector used to bind to 0.0.0.0 (which was wrong), and now that it has been added as an endpoint to haproxy, haproxy cannot bind the port and the upgrade fails. As a workaround, one can temporarily shut off ironic-inspector and run the upgrade again.
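A sketch of that workaround (the systemd unit names below are the usual OSP ones, openstack-ironic-inspector and its dnsmasq helper, and may differ on your system):

```shell
# Temporarily stop ironic-inspector so haproxy can bind 192.0.2.3:5050,
# re-run the upgrade, then bring the services back.
sudo systemctl stop openstack-ironic-inspector openstack-ironic-inspector-dnsmasq
openstack undercloud upgrade
sudo systemctl start openstack-ironic-inspector-dnsmasq openstack-ironic-inspector
```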
So this actually turned out to be an orchestration issue: keepalived needs to run before haproxy, and no such constraint is specified in the puppet manifests.
Adding upstream review. This is currently in master, but waiting for backport.
Adding upstream launchpad.
I managed to get an upgraded undercloud with SSL using: https://review.openstack.org/#/c/391873/6
So this is definitely 'ASSIGNED', and given Omri's comment #17 the fix also works, so once it lands this goes to POST too.
The linked review has now landed in stable/newton (https://review.openstack.org/#/c/393361/), so moving this to POST.
Moving back to ASSIGNED because of an issue discovered by dev/engineering while testing the fix that was landed (comment #20).
The test was done on a hardcoded revision of the review, not the latest one. The latest revision does not solve the problem. I'm testing a new patch to correct it.
Adding new launchpad bug. Basically, os-net-config/config.yaml is updated (MTU added), then puppet runs os-net-config, which removes the IP configured by keepalived. Since the keepalived configuration itself is not modified, puppet does not restart keepalived, so the VIPs go missing, causing the error.
Adding a review, still WIP. Another way could be to run the undercloud upgrade, let it fail, run 'systemctl restart keepalived', and then re-run the undercloud upgrade to completion.
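That fallback workaround, as a sketch (assuming the keepalived systemd unit is simply named keepalived, as it is on RHEL):

```shell
# First run is expected to fail once os-net-config has removed the VIPs.
openstack undercloud upgrade || true
# Restarting keepalived restores the VIPs that os-net-config removed.
sudo systemctl restart keepalived
# Re-run the upgrade, which should now complete.
openstack undercloud upgrade
```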
Verified with: puppet-tripleo-5.3.0-9.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html