Description of problem:
The HAProxy health check for nova_ec2 fails for all backend nodes in an HA setup with 3 controller nodes. The backend nova_ec2 API is up but requires authentication, so it returns a 400 response to the HAProxy checks.

Version-Release number of selected component (if applicable):
openstack-puppet-modules-2015.1.7-5.el7ost.noarch

How reproducible:
100%

Actual results:
[root@overcloud-controller-2 ~]# journalctl -l -o cat -u haproxy | grep -i nova_ec2
Server nova_ec2/overcloud-controller-0 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server nova_ec2/overcloud-controller-1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server nova_ec2/overcloud-controller-2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
proxy nova_ec2 has no server available!
[root@overcloud-controller-2 ~]# grep -A5 nova_ec2 /etc/haproxy/haproxy.cfg
listen nova_ec2
  bind 192.0.2.17:8773
  option httpchk GET /
  server overcloud-controller-0 192.0.2.23:8773 check fall 5 inter 2000 rise 2
  server overcloud-controller-1 192.0.2.19:8773 check fall 5 inter 2000 rise 2
  server overcloud-controller-2 192.0.2.21:8773 check fall 5 inter 2000 rise 2

[root@overcloud-controller-2 ~]# curl -I http://192.0.2.23:8773
HTTP/1.1 400 Bad Request
Content-Type: text/xml
Content-Length: 203
Date: Sun, 28 Jun 2015 10:50:22 GMT

[root@overcloud-controller-2 ~]# curl -I http://192.0.2.23:8773
HTTP/1.1 400 Bad Request
Content-Type: text/xml
Content-Length: 203
Date: Sun, 28 Jun 2015 10:50:24 GMT

[root@overcloud-controller-2 ~]# curl -I http://192.0.2.23:8773
HTTP/1.1 400 Bad Request
Content-Type: text/xml
Content-Length: 203
Date: Sun, 28 Jun 2015 10:50:28 GMT

[root@overcloud-controller-2 ~]# lsof -i :8773 -n -P
COMMAND    PID    USER    FD   TYPE  DEVICE SIZE/OFF NODE NAME
nova-api 12241    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12643    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12644    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12645    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12646    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12766    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12767    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12770    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12772    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12821    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12822    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12823    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
nova-api 12824    nova    6u   IPv4  130484      0t0  TCP 192.0.2.21:8773 (LISTEN)
haproxy  20229 haproxy   18u   IPv4   48484      0t0  TCP 192.0.2.17:8773 (LISTEN)

Expected results:
HAProxy detects the valid nova_ec2 API status.
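Why the backends go DOWN with this listener configuration can be shown with a small model. This is a simplified Python sketch, not HAProxy's implementation: it assumes the documented `option httpchk` rule that only 2xx and 3xx replies count as healthy (no reply at all is a failure), and reads `fall 5` / `rise 2` as the number of consecutive failed / successful checks needed to flip a server's state (`inter 2000` is the 2000 ms interval between checks). The function names `httpchk_passes` and `run_checks` are hypothetical.

```python
from typing import Iterable, Optional


def httpchk_passes(status: Optional[int]) -> bool:
    """Return True if one `option httpchk` response counts as healthy.

    Only 2xx and 3xx are valid; any other status, or no response at all
    (modelled as None, e.g. "Connection refused"), is a failure.
    """
    return status is not None and 200 <= status < 400


def run_checks(results: Iterable[bool], rise: int = 2, fall: int = 5,
               up: bool = True) -> bool:
    """Feed check outcomes (True = pass) through a simplified rise/fall
    model and return the final state (True = UP, False = DOWN)."""
    streak = 0  # consecutive results that contradict the current state
    for ok in results:
        if ok == up:
            streak = 0  # result agrees with current state: reset counter
            continue
        streak += 1
        # Going DOWN needs `fall` failures, going UP needs `rise` passes.
        if streak >= (fall if up else rise):
            up = ok
            streak = 0
    return up


# Every unauthenticated GET / to the EC2 API returns 400.
statuses = [400] * 5
print(run_checks(httpchk_passes(s) for s in statuses))  # False (DOWN)
```

Under this model a backend that answers every check with 400 is marked DOWN after five checks, exactly like one that refuses the connection, so HAProxy cannot tell "running but unauthenticated" apart from "not running".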
Giulio, please let us know once you have had a chance to look at this and assess whether it's really a blocking issue.
JK. Jiri, please take a look and let us know.
Thanks, Marius, for the excellent info in the bug description. The EC2 API returns 400 unless authentication details are provided, and HAProxy then concludes that the service has a problem. However, the core issue for us here is that the EC2 API probably shouldn't be running at all, as it's not present in the Pacemaker HA docs and we don't have it pacemakerized. It's getting started and HAProxied because the puppet modules inherit the defaults from the non-Pacemaker deployment.
I don't think having it running would negatively affect the other services, but we probably shouldn't run services which we don't support. I think disabling it should be easy.
(Correction of my comment #5: we do pacemakerize the EC2 API, because the APIs are served by a single service, openstack-nova-api, which just listens on multiple ports.)

The Pacemaker HA docs probably don't disable the EC2 API because they don't set the enabled_apis config option, and the default is "ec2,osapi_compute,metadata" [1]. But there's no mention of port 8773 in the load balancer doc [2]. So in the reference architecture, is the EC2 API enabled (accessible on the physical IPs) but not HAProxied (not accessible on the VIP)? I can update the config to do exactly that, but I'm not sure whether that's the expected, correct state. Andrew, can you please shed some light on this? Maybe we should disable the EC2 API entirely, or add it to HAProxy?

[1] https://github.com/beekhof/osp-ha-deploy/blob/f73eec96ddd9c7f2c85dbb0348ff909f144631ec/pcmk/nova.scenario
[2] https://github.com/beekhof/osp-ha-deploy/blob/f73eec96ddd9c7f2c85dbb0348ff909f144631ec/pcmk/lb.scenario
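One way to spot the "enabled but not HAProxied" state is to compare the APIs nova-api serves with the ports HAProxy exposes. The Python sketch below assumes a nova.conf-style INI file, falls back to the "ec2,osapi_compute,metadata" default when `enabled_apis` is unset, and assumes Nova's usual listen ports (8773 for EC2, 8774 for the compute API, 8775 for metadata). The helper names and the example set of load-balanced ports are hypothetical.

```python
import configparser
from typing import List, Set

# Default used when enabled_apis is not set in nova.conf.
DEFAULT_ENABLED_APIS = "ec2,osapi_compute,metadata"

# Assumption: Nova's conventional listen ports, not overridden in nova.conf.
API_PORTS = {"ec2": 8773, "osapi_compute": 8774, "metadata": 8775}


def enabled_apis(nova_conf_text: str) -> List[str]:
    """Return the APIs nova-api would serve according to nova.conf."""
    parser = configparser.ConfigParser()
    parser.read_string(nova_conf_text)
    raw = parser.get("DEFAULT", "enabled_apis", fallback=DEFAULT_ENABLED_APIS)
    return [api.strip() for api in raw.split(",") if api.strip()]


def unproxied_apis(nova_conf_text: str, haproxy_ports: Set[int]) -> List[str]:
    """APIs reachable on the physical IPs but not exposed on the VIP."""
    return [api for api in enabled_apis(nova_conf_text)
            if API_PORTS.get(api) not in haproxy_ports]


conf = "[DEFAULT]\nverbose = True\n"  # enabled_apis left unset
print(unproxied_apis(conf, {8774, 8775}))  # ['ec2']
```

With `enabled_apis` unset and a load balancer that (hypothetically) exposes only 8774 and 8775, the EC2 API is the one left out, which is the situation described above.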
(In reply to Jiri Stransky from comment #7)
> (Correction of my comment #5 -- we do pacemakerize the EC2 API because the
> APIs are served by a single service openstack-nova-api, it's just listening
> on multiple ports.)
>
> The Pacemaker HA docs probably don't disable EC2 API because they don't set
> the enabled_apis config option and the default is
> "ec2,osapi_compute,metadata" [1]. But there's no mention of port 8773 in the
> loadbalancer doc [2]. So in the ref arch the EC2 API is enabled (accessible
> on physical IPs) but not HAProxied (not accessible on VIP)? I can update the
> config to do exactly that but i'm not sure if that's the expected correct
> state. Andrew can you please shed some light on this? Maybe we should
> disable the EC2 API entirely, or add it to HAProxy?

I can narrow the question down. Do we want to support the EC2 API?

yes - add port 8773 to the load balancer.
no - disable EC2 with the enabled_apis config option.

The fact that EC2 is enabled yet not load balanced indicates that this was an oversight on our part. The decision as to whether or not we are interested in exposing the EC2 API is not something I can answer.
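Both answers restore the same invariant: port 8773 is load balanced if and only if `ec2` is in `enabled_apis`. A hypothetical Python helper that applies either answer, treating the configuration as a plain list of API names plus a set of load-balanced ports (an assumption; the real change lives in the puppet modules and HAProxy config), could look like this:

```python
from typing import List, Set, Tuple

EC2_PORT = 8773  # EC2 API port, as in the haproxy.cfg listener above


def resolve_ec2(support_ec2: bool, apis: List[str],
                lb_ports: Set[int]) -> Tuple[List[str], Set[int]]:
    """Return (enabled_apis, load-balanced ports) made consistent for EC2.

    yes -> keep 'ec2' enabled and add port 8773 to the load balancer
    no  -> drop 'ec2' from enabled_apis and keep 8773 off the balancer
    """
    new_apis = list(apis)
    new_ports = set(lb_ports)
    if support_ec2:
        if "ec2" not in new_apis:
            new_apis.insert(0, "ec2")
        new_ports.add(EC2_PORT)
    else:
        new_apis = [api for api in new_apis if api != "ec2"]
        new_ports.discard(EC2_PORT)
    return new_apis, new_ports


apis, ports = resolve_ec2(False, ["ec2", "osapi_compute", "metadata"], {8774, 8775})
print("enabled_apis=" + ",".join(apis))  # enabled_apis=osapi_compute,metadata
```

Either branch removes the inconsistent middle state; which branch to take is the product decision discussed here.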
Talked to jayg at the daily scrum: Astapor doesn't expose the EC2 API because it doesn't behave well under HA. There's no port 8773 in Astapor's Nova load balancer manifest either [1], so maybe the right thing is to stay consistent with Astapor.

[1] https://github.com/redhat-openstack/astapor/blob/d793eeb5f559874bab95177189aacbdcf06c092e/puppet/modules/quickstack/manifests/load_balancer/nova.pp
> I can narrow the question down. Do we want to support the EC2 API.
>
> yes - add port 8773 to the load balancer.
> no - disable EC2 with the enabled_apis config option.
>
> The fact that EC2 is enabled yet not load balanced indicates that this was
> an oversight on our part. The decision as to whether or not we are
> interested in exposing the EC2 API is not something I can answer.

The EC2 API is supported from a Nova point of view, albeit deprecated at this point, in the hope that we will eventually move to the newer out-of-tree EC2 API implementation, which offers broader coverage of the relevant APIs. We do have a number of customers who use it. I do not, however, believe this is a release blocker, as long as, at worst, we release with a level of EC2 API support similar to what we had in the OpenStack Platform 6 deployment architecture.
Done upstream. Can be backported once we get acks.
Using openstack-puppet-modules-7.0.16-1.el7ost.noarch.rpm, the HAProxy listener for nova_ec2 no longer uses httpchk; the backends are still seen as DOWN, though, because we don't enable the EC2 API by default. I think the BZ can be closed, as the problem won't be seen anymore if the service is enabled.
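The effect of dropping `option httpchk` can be summarised with a simplified model of a single check (not HAProxy's code). Without httpchk, HAProxy's default check only requires the TCP connection to be accepted, so an EC2 API that answers 400 counts as healthy, while a port nobody listens on still fails. The function name and the `"tcp"` / `"http"` mode strings are hypothetical.

```python
from typing import Optional


def check_passes(mode: str, connected: bool,
                 status: Optional[int] = None) -> bool:
    """Outcome of one health check under a simplified model.

    mode "tcp":  default check, passes iff the TCP connection is accepted
    mode "http": option httpchk, also needs a 2xx or 3xx reply
    """
    if not connected:
        # e.g. "Layer4 connection problem, info: Connection refused"
        return False
    if mode == "tcp":
        return True
    if mode == "http":
        return status is not None and 200 <= status < 400
    raise ValueError(f"unknown check mode: {mode!r}")


# EC2 API enabled, unauthenticated request answered with 400:
print(check_passes("tcp", True, 400))   # True  -> backend can stay UP
print(check_passes("http", True, 400))  # False -> backend goes DOWN
# EC2 API disabled, nothing listening on 8773:
print(check_passes("tcp", False))       # False under either mode
```

This matches the behaviour reported above: with EC2 disabled by default the backends stay DOWN under either check mode, and once the service is enabled the plain TCP check passes despite the 400.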
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0604.html