Bug 1238336

Summary: 2/3 requests for Nova instance VNC console fail in HA overcloud
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates
Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: high
Priority: high
Version: 7.0 (Kilo)
CC: calfonso, dmacpher, gfidente, gkeegan, jstransk, mburns, ohochman, rhel-osp-director-maint, rohara, rrosa
Target Milestone: ga
Keywords: Triaged
Target Release: Director
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-43.el7ost
Doc Type: Bug Fix
Doc Text:
Controller nodes did not share consoleauth tokens, so a VNC console request succeeded only when it reached the controller holding the token, and the remaining requests failed. This fix configures the controllers to share consoleauth tokens through memcached. Console authentication requests now succeed on every controller.
Last Closed: 2015-08-05 13:58:09 UTC
Type: Bug
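
The fix described in the Doc Text shares consoleauth tokens through memcached. A minimal sketch of the resulting nova.conf setting follows, assuming nova's Kilo-era `memcached_servers` option in the `[DEFAULT]` section and memcached listening on its default port 11211 on each controller's internal API address. The helper `memcached_servers_value` is hypothetical, and this is not the content of the shipped templates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a comma-separated host:port list for nova's
# memcached_servers option. Port 11211 (memcached default) is an assumption.
memcached_servers_value() {
  local port="$1"
  shift
  local out="" ip
  for ip in "$@"; do
    # ${out:+,} expands to "," only once out is non-empty
    out+="${out:+,}${ip}:${port}"
  done
  printf '%s\n' "$out"
}

# Internal API addresses of overcloud-controller-0/1/2 from this report.
servers=$(memcached_servers_value 11211 10.35.169.14 10.35.169.11 10.35.169.12)

# nova.conf fragment every controller would carry, so that each
# nova-consoleauth stores and looks up tokens in the same memcached pool.
printf '[DEFAULT]\nmemcached_servers=%s\n' "$servers"
```

As we understand nova's option help, leaving `memcached_servers` unset makes each nova-consoleauth keep tokens in an in-process cache, which matches the per-controller behavior reported below.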

Description Marius Cornea 2015-07-01 15:46:15 UTC
Description of problem:
I'm running an HA setup with 3 controllers and 1 compute node on bare metal with network isolation. When I try to access an overcloud instance's console via Horizon, 2 out of 3 requests fail. The console appears to load only when the request hits one particular controller (overcloud-controller-1 in my tests).

How reproducible:
100%

Steps to Reproduce:
1. Deploy an overcloud with 3 controllers and network isolation.
2. Launch an instance on the overcloud.
3. Load the instance's VNC console from Horizon.
4. If you get a "Failed to connect to server (code: 1006)" message, hit refresh.

Actual results:
Only 1 of 3 requests loads the console.

Expected results:
The console loads on every request.

Additional info:
Addresses in the 10.35.169.0 network are on the internal API network, and 10.35.173.10 is the public IP on the external network.

[heat-admin@overcloud-controller-0 ~]$ sudo grep novncproxy_base_url /etc/nova/nova.conf  | grep -v ^#
novncproxy_base_url=http://10.35.169.14:6080/vnc_auto.html
[heat-admin@overcloud-controller-0 ~]$ sudo lsof -i :6080 -n -P
COMMAND     PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
haproxy   22719 haproxy    3u  IPv4 1656925      0t0  TCP 10.35.173.10:6080->10.34.131.165:38384 (ESTABLISHED)
haproxy   22719 haproxy   31u  IPv4   69189      0t0  TCP 10.35.169.10:6080 (LISTEN)
haproxy   22719 haproxy   32u  IPv4   69190      0t0  TCP 10.35.173.10:6080 (LISTEN)
haproxy   22719 haproxy   59u  IPv4 1656927      0t0  TCP 10.35.169.14:34645->10.35.169.11:6080 (ESTABLISHED)
nova-novn 51784    nova    4u  IPv4  152844      0t0  TCP 10.35.169.14:6080 (LISTEN)


[heat-admin@overcloud-controller-1 ~]$ sudo grep novncproxy_base_url /etc/nova/nova.conf  | grep -v ^#
novncproxy_base_url=http://10.35.169.11:6080/vnc_auto.html
[heat-admin@overcloud-controller-1 ~]$ sudo lsof -i :6080 -n -P
COMMAND     PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
haproxy   24322 haproxy   31u  IPv4   42687      0t0  TCP 10.35.169.10:6080 (LISTEN)
haproxy   24322 haproxy   32u  IPv4   42688      0t0  TCP 10.35.173.10:6080 (LISTEN)
nova-novn 49539    nova    4u  IPv4  101299      0t0  TCP 10.35.169.11:6080 (LISTEN)
nova-novn 83318    nova    4u  IPv4  101299      0t0  TCP 10.35.169.11:6080 (LISTEN)
nova-novn 83318    nova    6u  IPv4 1672483      0t0  TCP 10.35.169.11:6080->10.35.169.14:34645 (ESTABLISHED)


[heat-admin@overcloud-controller-2 ~]$ sudo grep novncproxy_base_url /etc/nova/nova.conf  | grep -v ^#
novncproxy_base_url=http://10.35.169.12:6080/vnc_auto.html
[heat-admin@overcloud-controller-2 ~]$ sudo lsof -i :6080 -n -P
COMMAND     PID    USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
haproxy   21934 haproxy   31u  IPv4  46511      0t0  TCP 10.35.169.10:6080 (LISTEN)
haproxy   21934 haproxy   32u  IPv4  46512      0t0  TCP 10.35.173.10:6080 (LISTEN)
nova-novn 49523    nova    4u  IPv4 145539      0t0  TCP 10.35.169.12:6080 (LISTEN)


[heat-admin@overcloud-compute-0 ~]$ sudo grep novncproxy_base_url /etc/nova/nova.conf  | grep -v ^#
novncproxy_base_url=http://10.35.173.10:6080/vnc_auto.html

[heat-admin@overcloud-controller-0 ~]$ grep -A6 nova_novncproxy /etc/haproxy/haproxy.cfg 
listen nova_novncproxy
  bind 10.35.169.10:6080 
  bind 10.35.173.10:6080 
  option httpchk GET /
  server overcloud-controller-0 10.35.169.14:6080 check fall 5 inter 2000 rise 2
  server overcloud-controller-1 10.35.169.11:6080 check fall 5 inter 2000 rise 2
  server overcloud-controller-2 10.35.169.12:6080 check fall 5 inter 2000 rise 2
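
For a sense of scale, the `check fall 5 inter 2000 rise 2` parameters above can be turned into rough timings. In HAProxy, `inter` is the delay between health checks in milliseconds, `fall` is the number of consecutive failed checks before a server is marked DOWN, and `rise` is the number of consecutive successful checks before it is marked UP again. The helper `check_window_ms` is hypothetical, and the result is only an approximation, because the first failing check can land anywhere within an interval.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate time (ms) for HAProxy to change a
# server's state, given the consecutive checks required and the interval.
check_window_ms() {
  local checks="$1" inter_ms="$2"
  echo $(( checks * inter_ms ))
}

# Values from the nova_novncproxy server lines: fall 5, rise 2, inter 2000.
down_ms=$(check_window_ms 5 2000)   # roughly 10000 ms before marked DOWN
up_ms=$(check_window_ms 2 2000)     # roughly 4000 ms before marked UP again
echo "down after ~${down_ms} ms, up after ~${up_ms} ms"
```

Note that `option httpchk GET /` only checks that each nova-novncproxy answers HTTP, so in this report all three backends would stay UP even though only one of them completes the console connection; the health check does not steer traffic away from the failing backends.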

Comment 3 chris alfonso 2015-07-01 17:42:52 UTC
Is there a way we can work around this for GA if the one controller with the vncproxy fails?

Comment 4 Jiri Stransky 2015-07-03 16:16:43 UTC
I'm still not sure about the root cause, but based on the observed behavior it may be related to consoleauth tokens being valid on only one backend server. I suspect this will need to be fixed rather than worked around. More investigation to follow.
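
The hypothesis above can be illustrated with a toy model (plain bash, not nova or HAProxy code): each backend keeps its own token store, and requests are spread over the three backends in round-robin order, which is HAProxy's default when no `balance` line is set, as in the nova_novncproxy section quoted in the description. The function name `successes` and the choice of controller-1 as the token holder (the controller that worked in the reporter's tests) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Toy model (not nova code): three backends, requests assigned round-robin.
backends=(controller-0 controller-1 controller-2)

# Hypothetical helper: count successful console connections.
#   $1 = number of requests
#   $2 = backend whose local store holds the token
#   $3 = "local" (per-backend token stores) or "shared" (one store for all)
successes() {
  local requests="$1" holder="$2" mode="$3"
  local ok=0 i backend
  for (( i = 0; i < requests; i++ )); do
    backend=${backends[i % ${#backends[@]}]}
    # A request succeeds if tokens are shared, or if it lands on the holder.
    if [[ $mode == shared || $backend == "$holder" ]]; then
      (( ok += 1 ))
    fi
  done
  echo "$ok"
}

echo "local tokens:  $(successes 3 controller-1 local)/3"
echo "shared tokens: $(successes 3 controller-1 shared)/3"
```

With per-backend stores the model reproduces the 1-in-3 success rate from the description; with a single shared store, which is what memcached provides in the eventual fix, every request succeeds.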

Comment 7 Ryan O'Hara 2015-07-16 19:02:11 UTC
Could you try adding 'balance source' right below the bind lines in haproxy.cfg for this particular proxy?
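
One way to apply this suggestion without hand-editing is sketched below. `balance source` makes HAProxy hash the client's source address, so a given client keeps reaching the same backend. The helper `add_balance_source` is hypothetical; it prints the modified configuration to stdout rather than changing the file, and it assumes the section layout shown in the description (bind lines followed by at least one other directive).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print haproxy.cfg with "balance source" inserted
# right after the bind lines of the nova_novncproxy section. Other
# sections pass through unchanged; the input file is not modified.
add_balance_source() {
  awk '
    /^listen / { in_vnc = ($2 == "nova_novncproxy") }
    # First non-bind line after the bind lines of the target section
    in_vnc && seen_bind && !/^[ \t]*bind / && !done {
      print "  balance source"
      done = 1
    }
    in_vnc && /^[ \t]*bind / { seen_bind = 1 }
    { print }
  ' "$1"
}
```

For example, `add_balance_source /etc/haproxy/haproxy.cfg > /tmp/haproxy.cfg.new` followed by a `diff` against the original lets you review the change before applying it on each controller. Source pinning keeps a client on one backend but does not by itself ensure that this backend can validate the client's token; per the Doc Text, the eventual fix shared tokens through memcached instead.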

Comment 10 errata-xmlrpc 2015-08-05 13:58:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549