Bug 1287812 - During 7.1 -> 7.2 update all the HAProxy exposed services go down for approximately 3 minutes
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Hugh Brock
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-12-02 18:26 UTC by Marius Cornea
Modified: 2016-10-10 04:31 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-10 04:31:40 UTC
Target Upstream Version:
Embargoed:


Attachments
haproxy.log (50.99 KB, text/plain)
2015-12-02 18:26 UTC, Marius Cornea
no flags
pcs status (14.43 KB, text/plain)
2015-12-07 11:33 UTC, Marius Cornea
no flags

Description Marius Cornea 2015-12-02 18:26:53 UTC
Created attachment 1101576 [details]
haproxy.log

Description of problem:
During the 7.1 -> 7.2 update in an HA environment, all the HAProxy exposed services go down for approximately 3 minutes, making the cloud inaccessible during this timeframe.


How reproducible:
100%

Steps to Reproduce:
1. Deploy 7.1 overcloud
2. Start update to 7.2 procedure 
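
For step 2, a minimal sketch of the director minor update command, assuming the standard OSP 7 procedure; the exact invocation and environment files depend on the deployment and are an assumption here:

openstack overcloud update stack overcloud -i --templates [-e <environment files used at deploy time>]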

Actual results:
During 
overcloud-ControllerNodesPostDeployment-vlyhgsyrnidy-ControllerPostPuppet-haywpx4e65oq-ControllerPostPuppetRestartDeployment-fxgcdg4dqbhi

all the HAProxy backends go down for approximately 3 minutes.

Expected results:
The overcloud is still accessible during the update. 

Additional info:
Attaching the HAProxy log from one of the controllers during the timeframe when the services were unavailable. Please let me know if further logs are needed.

Comment 2 Marios Andreou 2015-12-04 13:53:22 UTC
So is this (what we see in the haproxy log) a connection problem for *this* node? I mean, are the overcloud services themselves (neutron/nova etc.) actually down during this time, especially on the nodes not currently being updated, as reported in the logs you attached for controller0? Can you check the state of the services on the other controllers, in particular "pcs status | grep -ni stop -C 2", to see what is stopped?
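
A minimal sketch for running that check on all controllers from the undercloud, assuming the standard heat-admin user and placeholder controller ctlplane IPs (look up the real addresses with nova list on the undercloud):

for ip in 192.0.2.10 192.0.2.11 192.0.2.12; do
    echo "== $ip =="
    ssh heat-admin@$ip "sudo pcs status | grep -ni stop -C 2"
done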

Comment 3 Marius Cornea 2015-12-04 15:16:08 UTC
This was a virtual environment and I watched the service status on the HAProxy dashboard - http://${control_virtual_ip}:1993 - where I noticed that all the backends went down.

Note that this happened after yum update was run on each individual controller, during this nested stack:
overcloud-ControllerNodesPostDeployment-vlyhgsyrnidy-ControllerPostPuppet-haywpx4e65oq-ControllerPostPuppetRestartDeployment-fxgcdg4dqbhi

Also during that time I ran a nova command on the overcloud and got a timeout.

I'm going to catch the pcs status on all controller nodes during the next update and get back to you.
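
For the record, a non-interactive way to capture the same backend states shown on the dashboard is the HAProxy admin socket; the socket path below is an assumption and should be checked against the "stats socket" line in /etc/haproxy/haproxy.cfg on the controllers:

echo "show stat" | sudo socat unix-connect:/var/lib/haproxy/stats stdio | cut -d, -f1,2,18

This prints the proxy name, server name and status (UP/DOWN/OPEN) for every frontend and backend, so it can be sampled in a loop during the update.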

Comment 4 Marius Cornea 2015-12-07 11:33:28 UTC
Created attachment 1103181 [details]
pcs status

Attaching the output of pcs status. Here is the output of nova list during this period (it looks like it first cannot reach nova-api and then cannot reach keystone):

stack@instack:~>>> nova list
ERROR (ConnectionRefused): Unable to establish connection to http://172.16.23.10:8774/v2/5fc68ae206d449db88fd3cb3e8a9108c/servers/detail
stack@instack:~>>> nova list
No handlers could be found for logger "keystoneclient.auth.identity.generic.base"
ERROR (ConnectionRefused): Unable to establish connection to http://172.16.23.10:5000/v2.0/tokens
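
A simple way to put timestamps on the outage window is to poll one of the affected VIP endpoints from the undercloud while the update runs, for example against the keystone endpoint from the output above (5 second timeout; the polling interval is arbitrary):

while true; do
    curl -sf -m 5 -o /dev/null http://172.16.23.10:5000/v2.0/ \
        || echo "$(date -u '+%F %T') keystone endpoint unreachable"
    sleep 5
done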

Comment 8 Marius Cornea 2015-12-07 13:04:42 UTC
All the nodes have been updated, so this happens post yum update (it looks to me like a cluster restart). I saw this happening on each of my update attempts.
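
One way to confirm whether this is a full cluster restart rather than individual resource restarts is to check the pacemaker/corosync logs on a controller around the outage; the time range below is a placeholder:

sudo journalctl -u pacemaker -u corosync --since "2015-12-07 11:30" --until "2015-12-07 12:00" | grep -iE "shutdown|stop|start"

A stop/start of corosync and pacemaker in that window would match all HAProxy backends dropping at once.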

Comment 12 Mike Burns 2016-04-07 21:00:12 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 14 Jaromir Coufal 2016-10-10 04:31:40 UTC
7.2 is no longer available for updates. Users should update to the latest 7.x (7.3 at the moment).

