Bug 1287812
Summary: | During 7.1 -> 7.2 update all the HAProxy exposed services go down for approximately 3 minutes | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> | ||||||
Component: | rhosp-director | Assignee: | Hugh Brock <hbrock> | ||||||
Status: | CLOSED DEFERRED | QA Contact: | Shai Revivo <srevivo> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 7.0 (Kilo) | CC: | bperkins, ggillies, jcoufal, mandreou, mburns, mcornea, obaranov, rhel-osp-director-maint | ||||||
Target Milestone: | --- | ||||||||
Target Release: | 10.0 (Newton) | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2016-10-10 04:31:40 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
so is this (what we see in the haproxy log) a connection problem for *this* node... i mean, are the overcloud services themselves, neutron/nova etc actually down during this time (and especially on the nodes not being updated currently) as reported in the logs you attached for controller0? Can you check the state of the services on the other controllers, in particular "pcs status | grep -ni stop -C 2" to see what is stopped This was a virtual environment and I watched the services status on the HAProxy dashboard - http://${control_virtual_ip}:1993 where I noticed that all the backends went down. Note that this happened after yum update was ran on each individual controller, during this nested stack: overcloud-ControllerNodesPostDeployment-vlyhgsyrnidy-ControllerPostPuppet-haywpx4e65oq-ControllerPostPuppetRestartDeployment-fxgcdg4dqbhi Also during that time I ran a nova command on the overcloud and got a timeout. I'm going to catch the pcs status on all controller nodes during the next update and get back to you. Created attachment 1103181 [details] pcs status Attaching the output of pcs status. Here is the output of nova list during this period(looks like first it cannot reach nova-api and then it cannot reach keystone. stack@instack:~>>> nova list ERROR (ConnectionRefused): Unable to establish connection to http://172.16.23.10:8774/v2/5fc68ae206d449db88fd3cb3e8a9108c/servers/detail stack@instack:~>>> nova list No handlers could be found for logger "keystoneclient.auth.identity.generic.base" ERROR (ConnectionRefused): Unable to establish connection to http://172.16.23.10:5000/v2.0/tokens All the nodes have been updated so this happens post yum update(it looks to me like a cluster restart). I saw this happening on each of my update attempts. This bug did not make the OSP 8.0 release. It is being deferred to OSP 10. 7.2 is no longer available for updates. User goes to latest 7.x (7.3 at the moment). |
Created attachment 1101576 [details] haproxy.log Description of problem: During 7.1 -> 7.2 update in HA environment all the HAProxy exposed services go down for approximately 3 minutes getting the cloud unaccessible during this timeframe. How reproducible: 100% Steps to Reproduce: 1. Deploy 7.1 overcloud 2. Start update to 7.2 procedure Actual results: During overcloud-ControllerNodesPostDeployment-vlyhgsyrnidy-ControllerPostPuppet-haywpx4e65oq-ControllerPostPuppetRestartDeployment-fxgcdg4dqbhi all the haproxy services backends go down for a timeframe of approximately 3 minutes. Expected results: The overcloud is still accessible during the update. Additional info: Attaching the haproxy log from one of the controller during the timeframe the services were unavailable. Please let me know if further logs are needed.