Bug 1287812

Summary: During 7.1 -> 7.2 update all the HAProxy exposed services go down for approximately 3 minutes
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: rhosp-directorAssignee: Hugh Brock <hbrock>
Status: CLOSED DEFERRED QA Contact: Shai Revivo <srevivo>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.0 (Kilo)CC: bperkins, ggillies, jcoufal, mandreou, mburns, mcornea, obaranov, rhel-osp-director-maint
Target Milestone: ---   
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-10 04:31:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
haproxy.log
none
pcs status none

Description Marius Cornea 2015-12-02 18:26:53 UTC
Created attachment 1101576 [details]
haproxy.log

Description of problem:
During 7.1 -> 7.2 update in HA environment all the HAProxy exposed services go down for approximately 3 minutes getting the cloud unaccessible during this timeframe.


How reproducible:
100%

Steps to Reproduce:
1. Deploy 7.1 overcloud
2. Start update to 7.2 procedure 

Actual results:
During 
overcloud-ControllerNodesPostDeployment-vlyhgsyrnidy-ControllerPostPuppet-haywpx4e65oq-ControllerPostPuppetRestartDeployment-fxgcdg4dqbhi

all the haproxy services backends go down for a timeframe of approximately 3 minutes.

Expected results:
The overcloud is still accessible during the update. 

Additional info:
Attaching the haproxy log from one of the controller during the timeframe the services were unavailable. Please let me know if further logs are needed.

Comment 2 Marios Andreou 2015-12-04 13:53:22 UTC
so is this (what we see in the haproxy log) a connection problem for *this* node... i mean, are the overcloud services themselves, neutron/nova etc actually down during this time (and especially on the nodes not being updated currently) as reported in the logs you attached for controller0? Can you check the state of the services on the other controllers, in particular "pcs status | grep -ni stop -C 2" to see what is stopped

Comment 3 Marius Cornea 2015-12-04 15:16:08 UTC
This was a virtual environment and I watched the services status on the HAProxy dashboard - http://${control_virtual_ip}:1993 where I noticed that all the backends went down. 

Note that this happened after yum update was ran on each individual controller, during this nested stack:
overcloud-ControllerNodesPostDeployment-vlyhgsyrnidy-ControllerPostPuppet-haywpx4e65oq-ControllerPostPuppetRestartDeployment-fxgcdg4dqbhi

Also during that time I ran a nova command on the overcloud and got a timeout.

I'm going to catch the pcs status on all controller nodes during the next update and get back to you.

Comment 4 Marius Cornea 2015-12-07 11:33:28 UTC
Created attachment 1103181 [details]
pcs status

Attaching the output of pcs status. Here is the output of nova list during this period(looks like first it cannot reach nova-api and then it cannot reach keystone. 

stack@instack:~>>> nova list
ERROR (ConnectionRefused): Unable to establish connection to http://172.16.23.10:8774/v2/5fc68ae206d449db88fd3cb3e8a9108c/servers/detail
stack@instack:~>>> nova list
No handlers could be found for logger "keystoneclient.auth.identity.generic.base"
ERROR (ConnectionRefused): Unable to establish connection to http://172.16.23.10:5000/v2.0/tokens

Comment 8 Marius Cornea 2015-12-07 13:04:42 UTC
All the nodes have been updated so this happens post yum update(it looks to me like a cluster restart). I saw this happening on each of my update attempts.

Comment 12 Mike Burns 2016-04-07 21:00:12 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 14 Jaromir Coufal 2016-10-10 04:31:40 UTC
7.2 is no longer available for updates. User goes to latest 7.x (7.3 at the moment).