Description of problem:
When >1 nova-consoleauth services are running on the same cloud (i.e., multiple controller systems for HA), nova-novncproxy fails to establish a connection to the VNC console. Only after right-clicking and selecting "Reload Frame" in the VNC iframe does a connection succeed.

Version-Release number of selected component (if applicable):
Folsom and Grizzly

How reproducible:
Every time

Steps to Reproduce:
1. Install a pair of controller nodes, each with the nova-consoleauth and nova-novncproxy services running.
2. Navigate to the dashboard, click on a running VM, and select the VNC (Folsom) or Console (Grizzly) tab.

Actual results:
Red bar with text: "Failed to connect to server (code: 1006)"

Expected results:
Grey bar with text: "Connected (encrypted) to: QEMU (instance-00000390)"

Additional info:
Known bug, apparently. Lame solution (e.g., don't run >1 consoleauth service): https://bugs.launchpad.net/horizon/+bug/1068602
Actually, the solution provided upstream is wrong. nova-consoleauth has supported storing the tokens in memcached since Folsom (see https://bugs.launchpad.net/nova/+bug/989337), which allows having multiple services in the same cluster. When this is not configured, the tokens are stored in memory, which causes the problem you and the linked bug describe.

To make consoleauth use memcached you'll need to:

1. Install and start memcached on one of the hosts:
   - yum install memcached
   - chkconfig memcached on && service memcached start

2. Set the key 'memcached_servers' under the DEFAULT section of nova.conf to point to the IP and port where memcached is listening (do this on all the hosts running consoleauth):
   - memcached_servers = <memcached ip>:11211

3. Restart all of the consoleauth services.

A consolidated example is sketched below. With this configuration, all the consoleauth services will know about all the tokens and it will be possible to have multiple services running in the same cluster.
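Putting those steps together, a minimal example (the memcached address 192.0.2.10 is only a placeholder, and the service name is assumed to be the RHEL/RDO one, so adjust for your packaging):

  # On one controller host:
  yum install memcached
  chkconfig memcached on && service memcached start

  # In /etc/nova/nova.conf on every host running nova-consoleauth:
  [DEFAULT]
  memcached_servers = 192.0.2.10:11211

  # Then restart consoleauth everywhere:
  service openstack-nova-consoleauth restart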
The goal is to make all services - including memcached - clustered and shared-nothing for high availability. Installing *a single* memcached server, again, does not solve the problem. Why is it that all the other services (glance, nova-api, etc.) appear not to have the problem that nova-consoleauth has? I think nova-consoleauth has a problem that needs to be fixed.
Ouch, I read the description too fast and missed the HA part, sorry. I agree with you that consoleauth has a problem, and it should be solved if we want to have several instances in the same cluster. The main problem here is that consoleauth is wrongly using memcached to store the tokens and the connection info, instead of using it only as a cache for data in the database (which is what memcached is meant for). I've just opened a blueprint upstream (see the BZ's URL) for this issue and I'll propose a patch soon.
But, let's back up a minute - I'm not using memcached anywhere for anything (I probably should be for the dashboard, but... that's a different topic), so you can leave memcached out of the equation.
(In reply to Dan Yocum from comment #5)
> But, let's back up a minute - I'm not using memcached anywhere for anything
> (I probably should be for the dashboard, but... that's a different topic),
> so you can leave memcached out of the equation.

Sure, I get what you mean. I was just explaining what consoleauth is doing wrong with memcached right now. What I am suggesting (see the linked blueprint) is to make consoleauth store the tokens and the connection info in the database when registering them, so other services of the same type can access them. This way we won't have to rely on only one consoleauth or memcached service. The caching part would stay the same: you could still optionally configure it to use memcached. I hope it makes sense now; a rough sketch of the idea follows below.
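To make the proposal a bit more concrete, a purely hypothetical sketch of what a shared, DB-backed token store could look like; the table and column names are made up for illustration and are not the actual blueprint code:

# Hypothetical illustration only -- not nova's actual schema or API.
import sqlalchemy as sa

metadata = sa.MetaData()

# Every consoleauth worker reads and writes the same table, so a token
# issued by one worker can be validated by any other.
console_tokens = sa.Table(
    'console_tokens', metadata,
    sa.Column('token', sa.String(64), primary_key=True),
    sa.Column('instance_uuid', sa.String(36), nullable=False),
    sa.Column('host', sa.String(255), nullable=False),   # compute host
    sa.Column('port', sa.Integer, nullable=False),       # VNC port
    sa.Column('expires_at', sa.DateTime, nullable=False),
)

def create_schema(db_url):
    """Create the table on the shared database all controllers point at."""
    engine = sa.create_engine(db_url)
    metadata.create_all(engine)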
Does that mean that the issue is resolved in Icehouse, then?
(In reply to Dan Yocum from comment #8)
> Does that mean that the issue is resolved in Icehouse, then?

TBD; code has been submitted but it has not yet been merged (then of course there is the question of whether it passes testing ;)).
Stephen, can you verify if this was merged into Icehouse?
Can't imagine it was, based on the upstream state. It made it to POST because a patch was submitted, but it never progressed to MODIFIED because it was not merged.
Pardon my ignorance on process, but where does that leave this BZ? Does it need to be moved to/approved for Target Release 6 at this point?
To summarize the points that were mentioned above before closing the bug:

1) The behavior originally described in the bug was due to misconfiguration (we need to be running memcached and configure all the consoleauth services to use it), as described in comment #2.

2) The HA side of things raised in comment #3 has not been fully addressed, so I will address it here. When running multiple memcached servers, the python memcached client we ship (python-memcached-1.48-4.el7.noarch), which nova-consoleauth uses, is smart enough to treat them as a simple consistent hashing ring. Basically, if one of the servers goes down, its tokens will be lost and all the sessions that were stored on it will be invalidated, but further writes, and thus authentication, will still work as long as there is at least one server running (see the illustration below). We have agreed that this is sufficient for us to consider this resilient.

Based on this, closing as NOTABUG; however, feel free to revisit in case you disagree with the above.
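To illustrate point 2, a small example with python-memcached (the two server addresses are placeholders):

# Illustration of the hashing-ring behaviour described above.
import memcache

# The client maps each key to one server in the list based on the key's hash.
mc = memcache.Client(['192.0.2.10:11211', '192.0.2.11:11211'])

# A console token lands on whichever server its key hashes to.
mc.set('console-token-abc123', '{"host": "compute-1", "port": 5900}', time=600)

# Any consoleauth service configured with the same server list will look the
# token up on the same server. If that server dies, this token is lost and the
# session has to be re-authenticated, but new tokens keep working because they
# are written to the surviving servers.
print(mc.get('console-token-abc123'))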
This bug is still an issue in Icehouse. No, consoleauth must not REQUIRE memcached in an HA environment - nothing else does. The blueprint referred to in comment #4 was unapproved. What happens now?
So we would really recommend not pursuing the direction of the patch (storing tokens in the DB). The fanout topic seems like a much better choice.
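Roughly, the fanout idea would mean broadcasting each authorized token to every consoleauth worker over RPC, along the lines of the following hypothetical sketch (topic, method and argument names are made up for illustration and are not nova's actual RPC API):

# Hypothetical sketch of a fanout cast with oslo.messaging.
import oslo_messaging
from oslo_config import cfg

transport = oslo_messaging.get_transport(cfg.CONF)
target = oslo_messaging.Target(topic='consoleauth')
client = oslo_messaging.RPCClient(transport, target)

# fanout=True delivers the cast to every service listening on the topic,
# so each consoleauth worker learns about the new token.
client.prepare(fanout=True).cast(
    {}, 'authorize_console',
    token='abc123', host='compute-1', port=5900)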
This however will require work upstream, so targeting this bug for RHOS 7 (although it is unlikely that it will merge in Kilo at this point). Pablo - is the customer fine with this being worked on for the next version?
This bug was closed as part of a backlog clean up. If you see value in tracking this bug please re-open it.