Bug 1287812 - During 7.1 -> 7.2 update all the HAProxy exposed services go down for approximately 3 minutes
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assigned To: Hugh Brock
QA Contact: Shai Revivo
Reported: 2015-12-02 13:26 EST by Marius Cornea
Modified: 2016-10-10 00:31 EDT

Doc Type: Bug Fix
Last Closed: 2016-10-10 00:31:40 EDT
Type: Bug

Attachments
haproxy.log (50.99 KB, text/plain)
2015-12-02 13:26 EST, Marius Cornea
pcs status (14.43 KB, text/plain)
2015-12-07 06:33 EST, Marius Cornea
Description Marius Cornea 2015-12-02 13:26:53 EST
Created attachment 1101576
haproxy.log

Description of problem:
During the 7.1 -> 7.2 update in an HA environment, all the HAProxy-exposed services go down for approximately 3 minutes, leaving the cloud inaccessible during that time.

How reproducible:

Steps to Reproduce:
1. Deploy 7.1 overcloud
2. Start the update procedure to 7.2

Actual results:

All the HAProxy service backends go down for approximately 3 minutes.

Expected results:
The overcloud remains accessible during the update.

Additional info:
Attaching the haproxy log from one of the controllers, covering the time the services were unavailable. Please let me know if further logs are needed.
Comment 2 marios 2015-12-04 08:53:22 EST
So is this (what we see in the haproxy log) a connection problem for *this* node? I mean, are the overcloud services themselves (neutron, nova, etc.) actually down during this time, especially on the nodes not currently being updated, as reported in the logs you attached for controller0? Can you check the state of the services on the other controllers, in particular with "pcs status | grep -ni stop -C 2", to see what is stopped?
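For anyone collecting this from several controllers at once, the filter above can be approximated in Python. This is only a sketch: grep_context is a hypothetical helper name, it does a plain substring match rather than a regex, and the SAMPLE text is made up for illustration (it is not taken from the attached pcs status output).

```python
def grep_context(text, needle, context=2):
    """Rough equivalent of `grep -ni NEEDLE -C CONTEXT`.

    Returns (line_number, line) pairs for every line that contains
    `needle` (case-insensitive substring match) plus `context` lines
    before and after each match. Line numbers are 1-based.
    """
    lines = text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if needle.lower() in line.lower():
            first = max(0, i - context)
            last = min(len(lines), i + context + 1)
            keep.update(range(first, last))
    return [(i + 1, lines[i]) for i in sorted(keep)]


# Illustrative sample only; resource and node names are assumptions.
SAMPLE = """\
Cluster name: tripleo_cluster
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 ]
     Stopped: [ overcloud-controller-2 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-controller-0 ]
Daemon Status:
  pacemaker: active/enabled
"""

if __name__ == "__main__":
    for number, line in grep_context(SAMPLE, "stop"):
        print(f"{number}:{line}")
```

In practice the text would come from running pcs status on each controller and passing the captured output to the helper.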
Comment 3 Marius Cornea 2015-12-04 10:16:08 EST
This was a virtual environment, and I watched the service status on the HAProxy dashboard (http://${control_virtual_ip}:1993), where I noticed that all the backends went down.

Note that this happened after yum update was run on each individual controller, during this nested stack:

Also during that time I ran a nova command on the overcloud and got a timeout.

I'm going to capture the pcs status on all controller nodes during the next update and get back to you.
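Instead of watching the dashboard by eye, the backend state could also be recorded from HAProxy's CSV statistics export. The sketch below only parses CSV text that has already been fetched; it assumes the export is reachable from the same stats endpoint (typically by appending ";csv" to the stats URI, which has not been verified for this deployment). down_entries is a hypothetical helper, and the sample rows, including the proxy and server names, are invented and trimmed to three columns.

```python
import csv
import io


def down_entries(csv_text):
    """Return (proxy, server, status) for rows whose status starts with DOWN.

    HAProxy's CSV export begins its header row with "# ", so that marker
    is stripped before the text is handed to csv.DictReader.
    """
    cleaned = csv_text.lstrip("# ")
    reader = csv.DictReader(io.StringIO(cleaned))
    return [
        (row["pxname"], row["svname"], row["status"])
        for row in reader
        if row["status"].startswith("DOWN")
    ]


# Invented sample; a real export has many more columns per row.
SAMPLE_CSV = """\
# pxname,svname,status
nova_osapi,FRONTEND,OPEN
nova_osapi,overcloud-controller-0,DOWN
nova_osapi,overcloud-controller-1,UP
nova_osapi,BACKEND,DOWN
keystone_public,overcloud-controller-0,UP
"""

if __name__ == "__main__":
    for proxy, server, status in down_entries(SAMPLE_CSV):
        print(proxy, server, status)
```

Saving one such snapshot every few seconds during the update would show which backends go down first and in what order they recover.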
Comment 4 Marius Cornea 2015-12-07 06:33 EST
Created attachment 1103181
pcs status

Attaching the output of pcs status. Here is the output of nova list during this period (it looks like it first cannot reach nova-api and then cannot reach keystone):

stack@instack:~>>> nova list
ERROR (ConnectionRefused): Unable to establish connection to
stack@instack:~>>> nova list
No handlers could be found for logger "keystoneclient.auth.identity.generic.base"
ERROR (ConnectionRefused): Unable to establish connection to
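
To put a number on the outage rather than estimating it from repeated nova list calls, the results of a periodic probe could be reduced to the longest continuous failure window. This is a sketch under assumptions: longest_outage is a hypothetical helper, the probe itself (for example, a timed API call against the overcloud) is not shown, and the sample timestamps are invented.

```python
def longest_outage(samples):
    """Return the longest outage in seconds.

    `samples` is a chronological iterable of (timestamp_seconds, ok)
    pairs. An outage runs from the first failed probe of a run to the
    next successful probe. A run that is still failing at the end of
    the samples is not counted, since its end is unknown.
    """
    longest = 0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


# Invented probe results: failures from t=10s until recovery at t=190s.
SAMPLE_PROBES = [(0, True), (10, False), (20, False), (190, True), (200, True)]

if __name__ == "__main__":
    print(longest_outage(SAMPLE_PROBES), "seconds")
```

The resolution of the result is limited by the probe interval, so a short interval (a few seconds) gives a tighter estimate.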
Comment 8 Marius Cornea 2015-12-07 08:04:42 EST
All the nodes had been updated, so this happens after yum update (it looks to me like a cluster restart). I saw this happen on each of my update attempts.
Comment 12 Mike Burns 2016-04-07 17:00:12 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 14 Jaromir Coufal 2016-10-10 00:31:40 EDT
7.2 is no longer available for updates. Users now go to the latest 7.x (7.3 at the moment).
