Bug 1388283
| Field | Value |
|---|---|
| Summary | [OSP-Director-10] Upgrade undercloud with SSL from OSP9 to OSP10 causes undercloud-upgrade failure. |
| Product | Red Hat OpenStack |
| Component | puppet-tripleo |
| Reporter | Omri Hochman <ohochman> |
| Assignee | Sofer Athlan-Guyot <sathlang> |
| QA Contact | Omri Hochman <ohochman> |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Version | 10.0 (Newton) |
| Target Milestone | rc |
| Target Release | 10.0 (Newton) |
| Keywords | Triaged |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | puppet-tripleo-5.3.0-6.el7ost |
| Last Closed | 2016-12-14 16:24:57 UTC |
| Type | Bug |
| CC | dbecker, jcoufal, jjoyce, josorior, jschluet, jslagle, kbasil, mandreou, mburns, mcornea, morazi, rhel-osp-director-maint, sathlang, slinaber, tvignaud |

Description
Omri Hochman
2016-10-25 00:35:28 UTC
Hi,

this error

/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of 170 seconds)

is usually caused by keystone not working properly. The keystone and apache logs would be useful, as would checking whether the service is listening on port 13000. My guess is that something is wrong with the apache SSL configuration.

Created attachment 1214050 [details]
adding keystone.log
Created attachment 1214051 [details]
adding apache.log
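
The check suggested above (is anything listening on port 13000?) can be sketched in a few lines of Python. This is only an illustration, not a tool shipped with the product: the helper name `is_listening` is hypothetical, and it tests plain TCP reachability only, so it says nothing about whether the SSL handshake or keystone itself works.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration. A False result covers both
    "connection refused" and "timed out"; it does not distinguish them.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the address from the error message, the call would be `is_listening("192.168.0.2", 13000)`. A `False` result would point at the proxy in front of keystone rather than at keystone itself.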
(In reply to Sofer Athlan-Guyot from comment #2)
> Hi,
>
> this error
>
> /bin/openstack token issue --format value' returned 1: Unable to
> establish connection to https://192.168.0.2:13000/v3/auth/tokens (tried 22,
> for a total of 170 seconds)
>
> is usually caused by keystone not working properly. The keystone and apache
> log would be useful, checking what if the service is listening on 13000.
> My idea would be that something is wrong with the apache ssl configuration.

It might be a configuration issue, but then we have to explore it and establish the right steps to be documented.

Answering: steps to upgrade from OSP9 to OSP10 with SSL enabled.

Moving to DFG:Security. Keith, can you check whether the Security DFG can help us define the right procedure (steps) for upgrading the undercloud with SSL? We're failing on the above. My assumption is that eventually we just need to know the steps to fix the certificate before running the 'openstack undercloud upgrade' command, but I'm not sure that's the case.

I have reproduced the error locally.
So haproxy fails to restart:
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [WARNING] 301/092953 (29179) : config : missing timeouts for proxy 'rabbitmq'.
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | While not properly invalid, you will certainly encounter various problems
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | with such a configuration. To fix this, please ensure that all following
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [WARNING] 301/092953 (29179) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy aodh started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy ceilometer started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy glance_api started.
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: [ALERT] 301/092953 (29179) : Starting proxy ironic-inspector: cannot bind socket [192.0.2.3:5050]
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy glance_registry started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy haproxy.stats started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy heat_api started.
Oct 28 09:29:53 instack.localdomain haproxy[29179]: Proxy ironic started.
Oct 28 09:29:53 instack.localdomain haproxy-systemd-wrapper[29178]: haproxy-systemd-wrapper: exit, haproxy RC=256
The return code 256 is a bug: puppet interprets it as return code 0.
See this thread for more information:
https://www.mail-archive.com/haproxy@formilux.org/msg23896.html
I'm now checking what is making haproxy fail.
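
A short sketch of why a value of 256 can look like success. Process exit codes are 8 bits wide, so 256 is truncated to 0 by the time a parent process such as puppet sees it. The second function rests on an assumption not stated in this report: that the 256 printed by the wrapper is a raw Linux wait status, in which case the underlying haproxy exit code would have been 1. Both function names are hypothetical.

```python
def exit_code_seen_by_parent(value: int) -> int:
    """Exit codes are truncated to their low 8 bits."""
    return value & 0xFF


def exit_code_from_wait_status(status: int) -> int:
    """Decode a normal-exit wait status using the Linux layout.

    Assumption: the status describes a normal exit (not a signal), so the
    exit code sits in bits 8-15.
    """
    return (status >> 8) & 0xFF


print(exit_code_seen_by_parent(256))    # what the caller observes
print(exit_code_from_wait_status(256))  # what haproxy may have returned
```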
This is because ironic-inspector used to bind to 0.0.0.0 (which was wrong), and now that it has been added as an endpoint to haproxy, the upgrade fails. As a workaround, one can temporarily shut off ironic-inspector and run the upgrade again.

So this actually turned out to be an orchestration issue: keepalived needs to run before haproxy, and no such constraint is specified in the puppet manifests.

Adding upstream review.

This is currently in master, but waiting for backport.

Adding upstream launchpad.

I managed to get to an upgraded undercloud with SSL using: https://review.openstack.org/#/c/391873/6

So this is definitely 'ASSIGNED', and given Omri's comment #17 it also works, so once it lands it goes to POST too.

The linked review has now landed in stable/newton (https://review.openstack.org/#/c/393361/), so moving this to POST.

Moving back to ASSIGNED because of an issue discovered by dev/engineering while testing the fix that had landed (comment #20).

The test was done on a hardcoded revision of the review, not the latest. The latest revision does not solve the problem. I'm testing a new patch to correct it.

Adding new launchpad bug. Basically, os-net-config/config.yaml is updated (mtu added), then puppet runs os-net-config, which removes the IP address configured by keepalived. As the keepalived configuration is not modified, puppet doesn't restart keepalived and the address goes missing, causing the error.

Adding a review, still WIP. Another way could be to run the undercloud upgrade, let it fail, run systemctl restart keepalived, and then rerun the undercloud upgrade, which should succeed.

Verified with: puppet-tripleo-5.3.0-9.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHEA-2016-2948.html
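
The first diagnosis in this thread, a service holding 0.0.0.0 on a port that haproxy then tries to bind on a specific address, can be reproduced in miniature. The sketch below is an illustration only: it uses the loopback address and a kernel-chosen port instead of 192.0.2.3:5050, the function names are hypothetical, and it does not model the later keepalived finding (a missing address rather than a busy port).

```python
import errno
import socket


def bind_listener(address: str, port: int) -> socket.socket:
    """Bind and listen on address:port, closing the socket on failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def specific_bind_conflicts_with_wildcard() -> bool:
    """Return True if a wildcard listener blocks a specific-address bind."""
    # Stand-in for the service bound to 0.0.0.0; port 0 lets the kernel
    # pick a free port.
    wildcard = bind_listener("0.0.0.0", 0)
    port = wildcard.getsockname()[1]
    try:
        # Stand-in for the proxy binding one address on the same port.
        bind_listener("127.0.0.1", port).close()
        return False
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    finally:
        wildcard.close()
```

If the function returns `True`, the second bind was rejected because the address was already in use, which is the same class of failure as the "cannot bind socket" alert in the haproxy log.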