Bug 1572807 - [Deployment] haproxy shows 'NO SERV' despite opendaylight up and operational
Summary: [Deployment] haproxy shows 'NO SERV' despite opendaylight up and operational
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: Tim Rozet
QA Contact: Tomas Jamrisko
URL:
Whiteboard: odl_deployment
Depends On:
Blocks: 1572808
 
Reported: 2018-04-27 23:18 UTC by Waldemar Znoinski
Modified: 2018-10-18 07:22 UTC (History)
10 users

Fixed In Version: puppet-tripleo-8.3.2-5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
N/A
Last Closed: 2018-06-27 13:53:50 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Launchpad 1768037 None None None 2018-04-30 14:18:44 UTC
OpenStack gerrit 565539 None MERGED Fixes HA Proxy backend check for ODL 2020-01-28 15:51:14 UTC
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 13:55:07 UTC

Description Waldemar Znoinski 2018-04-27 23:18:02 UTC
Description of problem:

I'm using the opendaylight osp13 puddle 2018-04-26.3 + opendaylight-8.0.0-7 with https://url.corp.redhat.com/a894030 included (to fix a bootstrap-features-related ODL startup failure seen with the regular 8.0.0-7 RPM).
My impression is that it's just a wrong configuration of the haproxy port 8081 entry, but let's see what the problem is first...

deployment error: overcloud deployment fails with:
    overcloud.AllNodesDeploySteps.ComputeDeployment_Step4.1:
      resource_type: OS::Heat::StructuredDeployment
      physical_resource_id: fa9ae9fc-3d07-4a08-98b5-bd0f7b255c01
      status: CREATE_FAILED
      status_reason: |
        Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
      deploy_stdout: |
        ...
                "Error: curl -k -o /dev/null --fail --silent --head -u odladmin:redhat http://172.17.1.10:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
                "Error: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Wait for NetVirt OVSDB to come up]/returns: change from notrun to 0 failed: curl -k -o /dev/null --fail --silent --head -u odladmin:redhat http://172.17.1.10:8081/r
    estconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
                "Warning: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Set OVS Manager to OpenDaylight]: Skipping because of failed dependencies"
            ]
        }
            to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/fb0303fd-6eb6-4986-a5e1-3e84ba47fcba_playbook.retry

        PLAY RECAP *********************************************************************
        localhost                  : ok=4    changed=1    unreachable=0    failed=1

        (truncated, view all with --long)
      deploy_stderr: |

checking manually from the controller (gives the same result as the failing overcloud task):
    [root@controller-0 ~]# curl -k -o /dev/null --fail --silent --head -u odladmin:redhat http://172.17.1.10:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    [root@controller-0 ~]# echo $?
    22
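(For reference: exit code 22 is simply what curl's `--fail` returns for any HTTP response >= 400, so haproxy's 503 below is what turns into the 22 here. A minimal local sketch, using a throwaway Python HTTP server and a hypothetical port, nothing from the actual deployment:)

```shell
# Local sketch (hypothetical server/port): with --fail, curl suppresses
# the HTTP error body and exits 22 for any status >= 400 -- the same
# exit code the failing deployment check reports.
python3 -m http.server 8099 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1
curl -o /dev/null --fail --silent --head http://127.0.0.1:8099/does-not-exist
rc=$?
kill "$srv"
echo "curl exit code: $rc"
```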

but opendaylight itself (when queried at the IP java is listening on) works:
    [root@controller-0 ~]# curl --head -u odladmin:redhat http://172.17.1.16:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    HTTP/1.1 200 OK
    Set-Cookie: JSESSIONID=1c2om024hogzs1ksv8cf836hkw;Path=/restconf
    Expires: Thu, 01 Jan 1970 00:00:00 GMT
    Set-Cookie: rememberMe=deleteMe; Path=/restconf; Max-Age=0; Expires=Thu, 26-Apr-2018 18:07:46 GMT
    Content-Type: application/yang.data+json
    Content-Length: 0

    [root@controller-0 ~]# curl -u odladmin:redhat http://172.17.1.16:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    {"topology":[{"topology-id":"netvirt:1"}]}

yet querying the same URL via the IP that haproxy is listening on fails:
    [root@controller-0 ~]# curl --head -u odladmin:redhat http://172.17.1.10:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    HTTP/1.0 503 Service Unavailable
    Cache-Control: no-cache
    Connection: close
    Content-Type: text/html

details of which process is listening on which IP/port:
    [root@controller-0 ~]# netstat -nap | grep 8081 | grep -i listen
    tcp        0      0 192.168.24.6:8081       0.0.0.0:*               LISTEN      63589/haproxy
    tcp        0      0 172.17.1.10:8081        0.0.0.0:*               LISTEN      63589/haproxy
    tcp        0      0 172.17.1.16:8081        0.0.0.0:*               LISTEN      36736/java

haproxy containers:
    [root@controller-0 log]# docker ps -a | grep -i haproxy
    3ca6e5f4dcd4        192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest                         "/bin/bash /usr/lo..."   4 hours ago         Up 4 hours                                   haproxy-bundle-docker-0
    129460f0fb25        192.168.24.1:8787/rhosp13/openstack-haproxy:2018-04-26.3                       "/docker_puppet_ap..."   4 hours ago         Exited (0) 4 hours ago                       haproxy_init_bundle
    a4edb453a267        192.168.24.1:8787/rhosp13/openstack-haproxy:2018-04-26.3                       "/bin/bash -c '/us..."   4 hours ago         Exited (0) 4 hours ago                       haproxy_image_tag


haproxy conf:
    listen opendaylight
      bind 172.17.1.10:8081 transparent
      bind 192.168.24.6:8081 transparent
      mode http
      http-request set-header X-Forwarded-Proto https if { ssl_fc }
      http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
      option httpchk
      option httplog
      server controller-0.internalapi.localdomain 172.17.1.16:8081 check fall 5 inter 2000 rise 2
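A note on why this section marks the backend down (my reading, to be confirmed): `option httpchk` with no arguments makes haproxy probe the backend with a bare, unauthenticated `OPTIONS / HTTP/1.0` request, and only a 2xx/3xx answer counts as healthy. ODL's restconf wants auth, so the probe presumably comes back 4xx/5xx; after `fall 5` failed probes the server is marked DOWN and every client request gets the 503/`<NOSRV>` seen below. If a check is to be kept at all, it would have to point at something ODL answers anonymously, along the lines of (hypothetical URI, not from this deployment):

```
  option httpchk GET /some-anonymous-endpoint HTTP/1.0
```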

opendaylight_api container status:
    290cad3b8471        192.168.24.1:8787/rhosp13/openstack-opendaylight:2018-04-26.3                  "kolla_start"            3 hours ago         Up 3 hours (healthy)                         opendaylight_api


double-checking on a different deployment with the same puddle + RPM - same story:
    [root@controller-0 ~]# netstat -nap | grep -i 8081  | grep -i listen
    tcp        0      0 192.168.24.12:8081      0.0.0.0:*               LISTEN      63451/haproxy
    tcp        0      0 172.17.1.13:8081        0.0.0.0:*               LISTEN      63451/haproxy
    tcp        0      0 172.17.1.18:8081        0.0.0.0:*               LISTEN      36669/java

    [root@controller-0 ~]# curl --head -u odladmin:redhat http://172.17.1.18:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    HTTP/1.1 200 OK
    Set-Cookie: JSESSIONID=dpvpjls9rrn094opsgaii8i4;Path=/restconf
    Expires: Thu, 01 Jan 1970 00:00:00 GMT
    Set-Cookie: rememberMe=deleteMe; Path=/restconf; Max-Age=0; Expires=Thu, 26-Apr-2018 21:11:43 GMT
    Content-Type: application/yang.data+json
    Content-Length: 0

    [root@controller-0 ~]# curl --head -u odladmin:redhat http://172.17.1.13:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    HTTP/1.0 503 Service Unavailable
    Cache-Control: no-cache
    Connection: close
    Content-Type: text/html

    [root@controller-0 ~]# curl --head -u odladmin:redhat http://192.168.24.12:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    HTTP/1.0 503 Service Unavailable
    Cache-Control: no-cache
    Connection: close
    Content-Type: text/html

    Apr 27 22:12:56 controller-0 haproxy[63451]: 172.17.1.13:54554 [27/Apr/2018:22:12:56.602] opendaylight opendaylight/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 0/0/0/0/0 0/0 "HEAD /restconf/operational/network-topology:network-topology/topology

    ()[root@controller-0 /]# curl http://172.17.1.13:8081
    <html><body><h1>503 Service Unavailable</h1>
    No server is available to handle this request.
    </body></html>
    curl (http://172.17.1.13:8081/): response: 503, time: 0.000, size: 107

##### on a working deployment with 2018-04-19.2 puddle and odl RPM

haproxy containers:
    [root@controller-0 ~]# docker ps -a | grep -i haproxy
    7e9ab960c557        192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest                         "/bin/bash /usr/lo..."   11 hours ago        Up 11 hours                                       haproxy-bundle-docker-0
    3f30d7dea216        192.168.24.1:8787/rhosp13/openstack-haproxy:2018-04-19.2                       "/docker_puppet_ap..."   7 days ago          Exited (0) 7 days ago                             haproxy_init_bundle
    be525f9a0917        192.168.24.1:8787/rhosp13/openstack-haproxy:2018-04-19.2                       "/bin/bash -c '/us..."   7 days ago          Exited (0) 7 days ago                             haproxy_image_tag


haproxy conf:
    listen opendaylight
      bind 172.17.1.12:8081 transparent
      bind 192.168.24.13:8081 transparent
      mode http
      balance source
      server controller-0.internalapi.localdomain 172.17.1.14:8081 check fall 5 inter 2000 rise 2


    [root@controller-0 ~]# netstat -nap | grep -i 8081.*listen
    tcp        0      0 192.168.24.13:8081      0.0.0.0:*               LISTEN      11710/haproxy
    tcp        0      0 172.17.1.12:8081        0.0.0.0:*               LISTEN      11710/haproxy
    tcp        0      0 172.17.1.14:8081        0.0.0.0:*               LISTEN      7079/java

running the same query against the ODL IP:
    [root@controller-0 ~]# curl --head -u odladmin:redhat http://172.17.1.14:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    HTTP/1.1 200 OK
    Set-Cookie: JSESSIONID=gdj0qg38kug97ql7ord7tg6b;Path=/restconf
    Expires: Thu, 01 Jan 1970 00:00:00 GMT
    Set-Cookie: rememberMe=deleteMe; Path=/restconf; Max-Age=0; Expires=Thu, 26-Apr-2018 21:56:04 GMT
    Content-Type: application/yang.data+json
    Content-Length: 0

querying the haproxy IP works:
    [root@controller-0 ~]# curl --head -u odladmin:redhat http://172.17.1.12:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1
    HTTP/1.1 200 OK
    Set-Cookie: JSESSIONID=1jc9tuibcb0qw1ja7za7y53n37;Path=/restconf
    Expires: Thu, 01 Jan 1970 00:00:00 GMT
    Set-Cookie: rememberMe=deleteMe; Path=/restconf; Max-Age=0; Expires=Thu, 26-Apr-2018 21:56:23 GMT
    Content-Type: application/yang.data+json
    Content-Length: 0

while it may look like a wrongly built RPM (I built it myself), the above suggests ODL itself is OK and starts fine


... after fiddling around, commenting out the line below and restarting the haproxy container, the curl that was failing started to work:
      option httpchk
confirming with the last known working puddle (2018-04-24.1): its haproxy conf doesn't have that option: https://url.corp.redhat.com/ca9c3f4
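This matches the httpchk theory: a backend that doesn't answer the default unauthenticated OPTIONS probe with 2xx/3xx gets marked DOWN. A local simulation (hypothetical, not the actual ODL setup: Python's stock http.server rejects OPTIONS with 501, standing in for an auth-protected backend):

```shell
# Local simulation: replay haproxy's default httpchk probe (an
# unauthenticated OPTIONS request) against a server that doesn't
# support it. Any status >= 400 here would count as a failed check
# and, after `fall 5`, mark the backend DOWN.
python3 -m http.server 8098 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1
status=$(curl -s -o /dev/null -w '%{http_code}' -X OPTIONS http://127.0.0.1:8098/)
kill "$srv"
echo "probe status: $status"
```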

Version-Release number of selected component (if applicable):
osp13 puddle 2018-04-26.3 + opendaylight-8.0.0-7 with https://url.corp.redhat.com/a894030

How reproducible:
100%

Steps to Reproduce:
1. deploy osp13 with the above puddle (the odl version is not relevant by the looks of things)

Actual results:
deployment fails, haproxy logs '<NOSRV>' for ODL's port 8081 backend (which actually works fine when curl'd directly)

Expected results:
deployment works; haproxy passes requests through to opendaylight's port 8081 with no problem

Additional info:
wasn't the above enough? :)

Comment 1 Waldemar Znoinski 2018-04-30 10:10:57 UTC
it looks like https://github.com/openstack/puppet-tripleo/commit/8f9a98888efdb75fd877b24c946214d60fdfacec, which brought in the default values (including httpchk), is causing it

Comment 2 Janki 2018-04-30 12:04:28 UTC
Can you please make the below changes to the manifest and try deploying?

listen_options  => {
        'balance' => 'roundrobin',
      },

so the whole ODL config section would look like

  if $opendaylight {
    ::tripleo::haproxy::endpoint { 'opendaylight':
      internal_ip     => unique([hiera('opendaylight_api_vip', $controller_virtual_ip), $controller_virtual_ip]),
      service_port    => $ports[opendaylight_api_port],
      ip_addresses    => hiera('opendaylight_api_node_ips', $controller_hosts_real),
      server_names    => hiera('opendaylight_api_node_names', $controller_hosts_names_real),
      mode            => 'http',
      member_options  => union($haproxy_member_options, $internal_tls_member_options),
      service_network => $opendaylight_network,
      listen_options  => {
        'balance' => 'roundrobin',
      },
    }

    ::tripleo::haproxy::endpoint { 'opendaylight_ws':
      internal_ip     => unique([hiera('opendaylight_api_vip', $controller_virtual_ip), $controller_virtual_ip]),
      service_port    => $ports[opendaylight_ws_port],
      ip_addresses    => hiera('opendaylight_api_node_ips', $controller_hosts_real),
      server_names    => hiera('opendaylight_api_node_names', $controller_hosts_names_real),
      mode            => 'http',
      service_network => $opendaylight_network,
      listen_options  => {
        # NOTE(jaosorior): Websockets have more overhead in establishing
        # connections than regular HTTP connections. Also, since it begins
        # as an HTTP connection and then "upgrades" to a TCP connection, some
        # timeouts get overridden by others at certain times of the connection.
        # The following values were taken from the following site:
        # http://blog.haproxy.com/2012/11/07/websockets-load-balancing-with-haproxy/
        'timeout' => ['connect 5s', 'client 25s', 'server 25s', 'tunnel 3600s'],
        'balance' => 'roundrobin',
      },
    }
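For comparison: with `listen_options` trimmed down to just `balance`, the rendered opendaylight section should (if I read the template right; this is my expectation, not verified output) come out roughly like the known-good 2018-04-19.2 config shown earlier, i.e. without `option httpchk`:

```
listen opendaylight
  bind 172.17.1.10:8081 transparent
  bind 192.168.24.6:8081 transparent
  mode http
  balance roundrobin
  server controller-0.internalapi.localdomain 172.17.1.16:8081 check fall 5 inter 2000 rise 2
```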

Comment 3 Mike Kolesnik 2018-04-30 12:22:15 UTC
(In reply to Waldemar Znoinski from comment #1)
> it looks like after
> https://github.com/openstack/puppet-tripleo/commit/
> 8f9a98888efdb75fd877b24c946214d60fdfacec which brought the default values
> (including httpck) is causing it

Just to clarify, the removal of the listen_options field caused puppet to revert to 'roundrobin', which is the default, but it also added the httpchk option, which is what's causing haproxy to flip out.

Comment 13 errata-xmlrpc 2018-06-27 13:53:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

