Bug 1388283 - [OSP-Director-10] Upgrade undercloud with SSL from OSP9 to OSP10 causes undercloud-upgrade failure.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Sofer Athlan-Guyot
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-25 00:35 UTC by Omri Hochman
Modified: 2016-12-29 17:01 UTC (History)

Fixed In Version: puppet-tripleo-5.3.0-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 16:24:57 UTC
Target Upstream Version:
Embargoed:


Attachments
adding keystone.log (5.05 MB, text/plain)
2016-10-25 19:12 UTC, Omri Hochman
adding apache.log (16.04 KB, application/x-bzip)
2016-10-25 19:12 UTC, Omri Hochman


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1638029 0 None None None 2016-10-31 15:01:29 UTC
Launchpad 1640213 0 None None None 2016-11-08 16:13:23 UTC
OpenStack gerrit 393361 0 None MERGED Make sure keepalived is restarted before haproxy. 2020-12-17 17:31:45 UTC
OpenStack gerrit 395633 0 None MERGED Better way to ensure keepalived before haproxy. 2020-12-17 17:32:15 UTC
OpenStack gerrit 396731 0 None MERGED Ensure keepalived is restarted when necessary. 2020-12-17 17:32:15 UTC
Red Hat Product Errata RHEA-2016:2948 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC

Description Omri Hochman 2016-10-25 00:35:28 UTC
[OSP-Director-10] Upgrade undercloud with SSL from OSP9 to OSP10 causes undercloud-upgrade failure.

Environment:
-------------
instack-5.0.0-1.el7ost.noarch
instack-undercloud-5.0.0-0.4.0rc3.el7ost.noarch
openstack-heat-api-cfn-7.0.0-3.el7ost.noarch
openstack-heat-common-7.0.0-3.el7ost.noarch
openstack-heat-templates-0.0.1-0.20161011152629.40a4ed0.el7ost.noarch
puppet-heat-9.4.1-1.el7ost.noarch
python-heatclient-1.5.0-1.el7ost.noarch
python-heat-agent-0.0.1-0.20161011152629.40a4ed0.el7ost.noarch
openstack-heat-engine-7.0.0-3.el7ost.noarch
openstack-heat-api-7.0.0-3.el7ost.noarch
python-heat-tests-7.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.6.0rc3.el7ost.noarch
openstack-tripleo-heat-templates-compat-2.0.0-34.3.el7ost.noarch


Steps:
-------
(1) Deploy OSP9 with SSL enabled on Undercloud + Overcloud
(2) Attempt to upgrade the undercloud according to the guide: https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade


Results : 
---------
openstack undercloud upgrade 


Errors: 
-------
2016-10-20 19:50:03 - Notice: /Stage[main]/Main/Exec[stop_nova-api]: Triggered 'refresh' from 82 events
2016-10-20 19:50:04 - Notice: /Stage[main]/Apache::Service/Service[httpd]: Triggered 'refresh' from 2 events
2016-10-20 19:50:04 - Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Triggered 'refresh' from 4 events

Broadcast message from systemd-journald (Thu 2016-10-20 19:50:13 EDT):

haproxy[30998]: proxy aodh has no server available!

2016-10-20 19:53:03 - Error: /Stage[main]/Neutron::Keystone::Auth/Keystone::Resource::Service_identity[neutron]/Keystone_user[neutron]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 19:55:58 - Error: /Stage[main]/Heat::Keystone::Auth/Keystone::Resource::Service_identity[heat]/Keystone_user[heat]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 19:56:02 - Notice: /Stage[main]/Glance::Keystone::Auth/Keystone::Resource::Service_identity[glance]/Keystone_service[glance::image]/ensure: created
2016-10-20 19:56:03 - Notice: /Stage[main]/Zaqar::Keystone::Auth/Keystone::Resource::Service_identity[zaqar]/Keystone_service[zaqar::messaging]/ensure: created
2016-10-20 19:58:58 - Error: /Stage[main]/Nova::Keystone::Auth/Keystone::Resource::Service_identity[nova]/Keystone_user[nova]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 19:59:00 - Notice: /Stage[main]/Mistral::Keystone::Auth/Keystone::Resource::Service_identity[mistral]/Keystone_user[mistral]/ensure: created
2016-10-20 20:01:55 - Error: /Stage[main]/Glance::Keystone::Auth/Keystone::Resource::Service_identity[glance]/Keystone_user[glance]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 20:01:58 - Notice: /Stage[main]/Zaqar::Keystone::Auth_websocket/Keystone::Resource::Service_identity[zaqar-websocket]/Keystone_user[zaqar-websocket]/ensure: created
2016-10-20 20:02:00 - Notice: /Stage[main]/Mistral::Keystone::Auth/Keystone::Resource::Service_identity[mistral]/Keystone_service[mistral::workflowv2]/ensure: created
2016-10-20 20:04:54 - Error: /Stage[main]/Ironic::Keystone::Auth/Keystone::Resource::Service_identity[ironic]/Keystone_user[ironic]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 20:07:50 - Error: /Stage[main]/Ironic::Keystone::Auth_inspector/Keystone::Resource::Service_identity[ironic-inspector]/Keystone_user[ironic-inspector]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)
2016-10-20 20:07:53 - Notice: /Stage[main]/Zaqar::Keystone::Auth/Keystone::Resource::Service_identity[zaqar]/Keystone_user[zaqar]/ensure: created
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/notify_nova_on_port_status_changes]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/notify_nova_on_port_status_changes]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/auth_type]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/auth_type]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/region_name]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/region_name]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_name]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_name]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/send_events_interval]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/send_events_interval]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/username]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/username]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/password]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/password]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_domain_id]: Dependency Keystone_user[nova] has failures: true
2016-10-20 20:07:53 - Warning: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_domain_id]: Skipping because of failed dependencies
2016-10-20 20:07:53 - Notice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/notify_nova_on_port_data_changes]: Dependency Keystone_user[nova] has failures:

Comment 2 Sofer Athlan-Guyot 2016-10-25 14:46:46 UTC
Hi,

this error 

   /bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)

is usually caused by keystone not working properly.  The keystone and apache logs would be useful, as would checking whether the service is listening on port 13000.  My guess is that something is wrong with the apache SSL configuration.
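
The listening check suggested above could be sketched roughly as follows. This is only an illustration and not part of the original report: the function name `is_listening` is hypothetical, and the host and port would be the ones from the error message (192.168.0.2 and 13000).

```python
import socket


def is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening("192.168.0.2", 13000)` on the undercloud would show whether anything accepts connections on the public keystone endpoint. A successful TCP connect says nothing about whether the SSL handshake works, so the logs are still needed.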

Comment 5 Omri Hochman 2016-10-25 19:12:13 UTC
Created attachment 1214050 [details]
adding keystone.log

Comment 6 Omri Hochman 2016-10-25 19:12:43 UTC
Created attachment 1214051 [details]
adding apache.log

Comment 7 Omri Hochman 2016-10-25 19:15:37 UTC
(In reply to Sofer Athlan-Guyot from comment #2)
> Hi,
> 
> this error 
> 
>    /bin/openstack token issue --format value' returned 1: Unable to
> establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22,
> for a total of 170 seconds)
> 
> is usually caused by keystone not working properly.  The keystone and apache
> log would be useful, checking what if the service is listening on 13000.  
> My idea would be that something is wrong with the apache ssl configuration.

It might be a configuration issue, but then we have to explore it and establish the right steps to be documented, namely the steps to upgrade from OSP9 to OSP10 with SSL enabled.

Comment 9 Omri Hochman 2016-10-27 15:35:04 UTC
Moving to DFG:Security.

Keith, can you check whether the Security DFG can help us establish the right procedure (steps) for upgrading the undercloud with SSL? We're failing on the above.

My assumption is that eventually we just need to know the steps to fix the certificate before running the 'openstack undercloud upgrade' command, but I'm not sure that's the case.

Comment 10 Sofer Athlan-Guyot 2016-10-28 14:17:12 UTC
I have reproduced the error locally.


So haproxy fails to restart:

    Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
    Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [WARNING] 301/092953 (29179) : config : missing timeouts for proxy 'rabbitmq'.
    Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | While not properly invalid, you will certainly encounter various problems
    Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | with such a configuration. To fix this, please ensure that all following
    Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
    Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [WARNING] 301/092953 (29179) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits
    Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy aodh started.
    Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy ceilometer started.
    Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy glance_api started.
    Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [ALERT] 301/092953 (29179) : Starting proxy ironic-inspector: cannot bind socket [192.0.2.3:5050]
    Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy glance_registry started.
    Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy haproxy.stats started.
    Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy heat_api started.
    Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy ironic started.
    Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: haproxy-systemd-wrapper: exit, haproxy RC=256

The return code 256 is a mistake, and it is interpreted as return
code 0 by puppet; see this thread for more information:
https://www.mail-archive.com/haproxy@formilux.org/msg23896.html

I'm now checking what is making haproxy fail.
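
One plausible reading of the `RC=256` line, stated here as an assumption rather than something confirmed in this report: 256 is a raw wait status (exit code 1 shifted left by 8 bits), and a process that itself exits with the value 256 is seen by its parent as status 0, because only the low 8 bits of an exit value are reported. The sketch below only shows that arithmetic; both helper names are hypothetical.

```python
import os


def exit_code_from_wait_status(status: int) -> int:
    """Decode the exit code from a raw POSIX wait status of a process that exited normally."""
    return os.WEXITSTATUS(status)


def status_seen_by_parent(exit_value: int) -> int:
    """Return what a parent process observes: only the low 8 bits of the exit value."""
    return exit_value & 0xFF
```

Under that assumption, `exit_code_from_wait_status(256)` gives the real haproxy failure code, while `status_seen_by_parent(256)` gives the success value that puppet ends up seeing.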

Comment 11 Juan Antonio Osorio 2016-10-31 13:48:24 UTC
This is because ironic-inspector used to bind to 0.0.0.0 (which was wrong), and now that it has been added as an endpoint to haproxy, the upgrade seems to fail. As a workaround one can temporarily shut off ironic-inspector and run the upgrade again.

Comment 12 Juan Antonio Osorio 2016-10-31 14:30:59 UTC
So this actually turned out to be an orchestration issue, where keepalived needs to run before haproxy, and we have no such constraint specified in the puppet manifests.

Comment 13 Sofer Athlan-Guyot 2016-10-31 15:00:26 UTC
Adding upstream review.  This is currently in master, but waiting for backport.

Comment 14 Sofer Athlan-Guyot 2016-10-31 15:01:29 UTC
Adding upstream launchpad.

Comment 17 Omri Hochman 2016-11-01 14:14:59 UTC
I managed to get an upgraded undercloud with SSL using: https://review.openstack.org/#/c/391873/6

Comment 18 Marios Andreou 2016-11-01 18:12:41 UTC
So this is definitely 'ASSIGNED', and given Omri's comment #17 the fix also works, so once it lands this goes to POST too.

Comment 20 Marios Andreou 2016-11-07 17:17:35 UTC
The linked review has now landed in stable/newton (https://review.openstack.org/#/c/393361/), so moving this to POST.

Comment 22 Marios Andreou 2016-11-08 09:50:50 UTC
Moving back to ASSIGNED because of an issue discovered by dev/engineering while testing the fix that landed (comment #20).

Comment 23 Sofer Athlan-Guyot 2016-11-08 09:56:41 UTC
The test was done on a hardcoded revision of the review, not the latest.  The latest revision does not solve the problem.  I'm testing a new patch to correct it.

Comment 24 Sofer Athlan-Guyot 2016-11-08 16:13:23 UTC
Adding a new Launchpad bug.  Basically, os-net-config/config.yaml is updated (mtu added), then puppet runs os-net-config, which removes the IP configured by keepalived.  As the keepalived configuration is not modified, puppet doesn't restart keepalived and the IP stays missing, causing the error.
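
A quick way to confirm that the keepalived-managed IP has gone missing might look like the sketch below. It is an illustration only: the function names are hypothetical, the VIP to look for depends on the environment, and the parsing assumes the usual one-line format of `ip -o -4 addr show`.

```python
import subprocess


def addresses_from_ip_output(output: str) -> set:
    """Extract IPv4 addresses (without prefix length) from `ip -o -4 addr show` output."""
    found = set()
    for line in output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <ifname> inet <addr>/<prefix> ..."
        if "inet" in fields:
            cidr = fields[fields.index("inet") + 1]
            found.add(cidr.split("/")[0])
    return found


def vip_is_configured(vip: str) -> bool:
    """Return True if vip is currently assigned to any local interface."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show"],
        capture_output=True, text=True, check=True,
    )
    return vip in addresses_from_ip_output(result.stdout)
```

If `vip_is_configured(...)` returns False for the address haproxy tries to bind after os-net-config has run, that would match the failure described here.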

Comment 25 Sofer Athlan-Guyot 2016-11-08 16:15:01 UTC
Adding a review, still WIP.

Another way could be to run the undercloud upgrade, let it fail, run systemctl restart keepalived, and rerun the undercloud upgrade, which should then succeed.
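
The manual workaround above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the function names are hypothetical, the commands are exactly the two mentioned in this bug, and `systemctl restart` will normally need root privileges.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def upgrade_with_keepalived_restart(run: Runner = run_command) -> bool:
    """Run the undercloud upgrade; on failure, restart keepalived and retry once."""
    upgrade = ["openstack", "undercloud", "upgrade"]
    if run(upgrade) == 0:
        return True
    # First attempt failed: restart keepalived so its IPs come back.
    if run(["systemctl", "restart", "keepalived"]) != 0:
        return False
    return run(upgrade) == 0
```

The `run` parameter exists only so the sequence can be exercised without touching a real undercloud; by default the commands are executed for real.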

Comment 30 Omri Hochman 2016-11-18 18:49:34 UTC
Verified with: puppet-tripleo-5.3.0-9.el7ost.noarch

Comment 32 errata-xmlrpc 2016-12-14 16:24:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

