Bug 1565376
| Summary: | [Deployment] haproxy containers stopped on two controllers on fresh deployment | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Sai Sindhur Malleni <smalleni> |
| Component: | puppet-tripleo | Assignee: | Tim Rozet <trozet> |
| Status: | CLOSED ERRATA | QA Contact: | Tomas Jamrisko <tjamrisk> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 13.0 (Queens) | CC: | aadam, atelang, bperkins, jjoyce, josorior, jschluet, mkolesni, nyechiel, ojanas, rscarazz, slinaber, tvignaud |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | 13.0 (Queens) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | odl_deployment | | |
| Fixed In Version: | puppet-tripleo-8.3.2-2.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | N/A |
| Last Closed: | 2018-06-27 13:50:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description: Sai Sindhur Malleni, 2018-04-09 23:55:54 UTC
EDIT: I have noticed that the haproxy containers are stopped on controller-1 and controller-2 on a fresh deployment as well. On those machines the failure is quite specific, and you can see it by starting the container by hand:

    [ALERT] 099/133036 (10) : Starting proxy opendaylight_ws: cannot bind socket [172.16.0.15:8185]
    [ALERT] 099/133036 (10) : Starting proxy opendaylight_ws: cannot bind socket [192.168.24.59:8185]

This would normally mean that the ports haproxy wants to use are already occupied by something else, but what we actually see on the controller is:

    [root@overcloud-controller-1 heat-admin]# netstat -nlp | grep 8185
    tcp        0      0 172.16.0.20:8185        0.0.0.0:*               LISTEN      496289/java

So the local IP of the machine, 172.16.0.20, correctly listens with the opendaylight service (driven by the container), and nothing else uses the port. One notable point is that controller-1 does not have any VIP on it, and the problem does not happen on controller-0, where the VIP lives. Commenting out the opendaylight_ws section in /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg on the machine makes haproxy start, but it remains to be understood why it cannot bind the port.

Just a quick update: when the VIP moves to controller-2, the haproxy container is started on controller-2 and stopped on the others. So haproxy only seems to run on the controller that currently holds the VIP.

The reason it is not starting on the non-VIP controller nodes is that the haproxy bind for this service is not set to transparent. Since those nodes do not have the VIP configured locally, haproxy cannot bind to the VIP addresses and refuses to start. Most other services use transparent mode, which allows haproxy to start even when the referenced bind address is not present on the node. So the observed behavior is expected; the open question is whether it is correct. For both the Zaqar WebSocket and ODL WebSocket services we are not using transparent binding, with a note from Juan indicating this is intentional:

    if $zaqar_ws {
      ::tripleo::haproxy::endpoint { 'zaqar_ws':
        public_virtual_ip         => $public_virtual_ip,
        internal_ip               => hiera('zaqar_ws_vip', $controller_virtual_ip),
        service_port              => $ports[zaqar_ws_port],
        ip_addresses              => hiera('zaqar_ws_node_ips', $controller_hosts_real),
        server_names              => hiera('zaqar_ws_node_names', $controller_hosts_names_real),
        mode                      => 'http',
        haproxy_listen_bind_param => [], # We don't use a transparent proxy here

I'm guessing there is some issue with using a transparent proxy with WebSockets, but we need Juan to tell us what the original issue was.

Changed the HAProxy configuration for opendaylight_ws to include transparent on the controllers and restarted haproxy-bundle. Tried the VM boot/ping scenario; VMs go into ACTIVE as expected and are pingable.
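For reference, this is roughly what the adjusted opendaylight_ws section in /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg can look like once transparent binding is enabled. This is a hand-written sketch, not the generated config: the bind addresses are the two from the ALERT messages above, the single backend line reuses controller-1's internal API IP from the netstat output, and the remaining options and server entries are omitted.

    listen opendaylight_ws
      # 'transparent' lets HAProxy bind these addresses even on controllers
      # that do not currently hold the VIP, so the container can start everywhere
      bind 172.16.0.15:8185 transparent
      bind 192.168.24.59:8185 transparent
      mode http
      # one backend per controller; 172.16.0.20 is controller-1's internal API
      # IP seen in the netstat output above
      server overcloud-controller-1.internalapi 172.16.0.20:8185 check
      ...

After editing the puppet-generated config, the bundle can be restarted through pacemaker, for example with `pcs resource restart haproxy-bundle`, which is one way to perform the "restarted haproxy-bundle" step mentioned above.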
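The permanent fix belongs in puppet-tripleo rather than in a hand-edited config file. Below is a hedged sketch of what the opendaylight_ws endpoint in manifests/haproxy.pp might look like with a transparent bind parameter; the hiera keys and the port name simply mirror the zaqar_ws example above and are assumptions for illustration, not a copy of the merged change.

    if $opendaylight {
      ::tripleo::haproxy::endpoint { 'opendaylight_ws':
        internal_ip               => hiera('opendaylight_api_vip', $controller_virtual_ip),
        service_port              => $ports[opendaylight_ws_port],
        ip_addresses              => hiera('opendaylight_api_node_ips', $controller_hosts_real),
        server_names              => hiera('opendaylight_api_node_names', $controller_hosts_names_real),
        mode                      => 'http',
        # passing 'transparent' instead of [] makes this endpoint behave like the
        # other services whose haproxy frontends start on non-VIP controllers
        haproxy_listen_bind_param => ['transparent'],
      }
    }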
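A rough way to double-check the diagnosis and the fix on a controller that does not hold the internal API VIP (hypothetical session; 172.16.0.15 is the VIP address from the bind error above):

    # the VIP is not configured locally, so a non-transparent bind to it must fail
    ip -o addr | grep 172.16.0.15

    # confirm the opendaylight_ws bind lines now carry the transparent option
    grep -A3 'listen opendaylight_ws' /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg

    # after the change the haproxy container should stay up on this node too
    docker ps --filter name=haproxy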
Response Times (sec):

    +--------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
    | Action                         | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
    +--------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
    | neutron.create_router          | 1.62      | 1.899        | 3.281        | 3.89         | 4.327     | 2.259     | 100.0%  | 50    |
    | neutron.create_network         | 0.246     | 0.462        | 0.612        | 0.689        | 0.774     | 0.444     | 100.0%  | 50    |
    | neutron.create_subnet          | 0.582     | 0.849        | 1.014        | 1.038        | 1.421     | 0.854     | 100.0%  | 50    |
    | neutron.add_interface_router   | 2.029     | 2.42         | 2.859        | 2.966        | 3.152     | 2.453     | 100.0%  | 50    |
    | nova.boot_server               | 38.003    | 77.645       | 90.082       | 91.992       | 92.988    | 75.522    | 100.0%  | 50    |
    | vm.attach_floating_ip          | 3.779     | 5.075        | 5.671        | 5.786        | 6.557     | 5.051     | 100.0%  | 50    |
    | -> neutron.create_floating_ip  | 1.374     | 1.711        | 2.074        | 2.132        | 2.156     | 1.752     | 100.0%  | 50    |
    | -> nova.associate_floating_ip  | 2.029     | 3.24         | 3.978        | 4.175        | 4.655     | 3.298     | 100.0%  | 50    |
    | vm.wait_for_ping               | 0.019     | 0.023        | 0.028        | 0.029        | 121.23    | 4.851     | 96.0%   | 50    |
    | total                          | 47.474    | 88.715       | 101.95       | 103.039      | 215.239   | 91.436    | 96.0%   | 50    |
    | -> duration                    | 46.474    | 87.715       | 100.95       | 102.039      | 214.239   | 90.436    | 96.0%   | 50    |
    | -> idle_duration               | 1.0       | 1.0          | 1.0          | 1.0          | 1.0       | 1.0       | 96.0%   | 50    |
    +--------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

FYI, this test was to launch and delete 50 VMs at a concurrency of 8. haproxy-bundle is started on all 3 controllers after this change.
    [root@overcloud-controller-0 heat-admin]# pcs status
    Cluster name: tripleo_cluster
    Stack: corosync
    Current DC: overcloud-controller-0 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
    Last updated: Mon Apr 16 20:10:46 2018
    Last change: Mon Apr 16 19:49:34 2018 by hacluster via crmd on overcloud-controller-1

    12 nodes configured
    37 resources configured

    Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
    GuestOnline: [ galera-bundle-0@overcloud-controller-0 galera-bundle-1@overcloud-controller-1 galera-bundle-2@overcloud-controller-2 rabbitmq-bundle-0@overcloud-controller-0 rabbitmq-bundle-1@overcloud-controller-1 rabbitmq-bundle-2@overcloud-controller-2 redis-bundle-0@overcloud-controller-0 redis-bundle-1@overcloud-controller-1 redis-bundle-2@overcloud-controller-2 ]

    Full list of resources:

     Docker container set: rabbitmq-bundle [docker-registry.engineering.redhat.com/rhosp13/openstack-rabbitmq:pcmklatest]
       rabbitmq-bundle-0   (ocf::heartbeat:rabbitmq-cluster):   Started overcloud-controller-0
       rabbitmq-bundle-1   (ocf::heartbeat:rabbitmq-cluster):   Started overcloud-controller-1
       rabbitmq-bundle-2   (ocf::heartbeat:rabbitmq-cluster):   Started overcloud-controller-2
     Docker container set: galera-bundle [docker-registry.engineering.redhat.com/rhosp13/openstack-mariadb:pcmklatest]
       galera-bundle-0   (ocf::heartbeat:galera):   Master overcloud-controller-0
       galera-bundle-1   (ocf::heartbeat:galera):   Master overcloud-controller-1
       galera-bundle-2   (ocf::heartbeat:galera):   Master overcloud-controller-2
     Docker container set: redis-bundle [docker-registry.engineering.redhat.com/rhosp13/openstack-redis:pcmklatest]
       redis-bundle-0   (ocf::heartbeat:redis):   Master overcloud-controller-0
       redis-bundle-1   (ocf::heartbeat:redis):   Slave overcloud-controller-1
       redis-bundle-2   (ocf::heartbeat:redis):   Slave overcloud-controller-2
     ip-192.168.24.60   (ocf::heartbeat:IPaddr2):   Started overcloud-controller-0
     ip-172.21.0.100    (ocf::heartbeat:IPaddr2):   Started overcloud-controller-0
     ip-172.16.0.19     (ocf::heartbeat:IPaddr2):   Started overcloud-controller-0
     ip-172.16.0.10     (ocf::heartbeat:IPaddr2):   Started overcloud-controller-0
     ip-172.18.0.18     (ocf::heartbeat:IPaddr2):   Started overcloud-controller-0
     ip-172.19.0.12     (ocf::heartbeat:IPaddr2):   Started overcloud-controller-0
     Docker container set: haproxy-bundle [docker-registry.engineering.redhat.com/rhosp13/openstack-haproxy:pcmklatest]
       haproxy-bundle-docker-0   (ocf::heartbeat:docker):   Started overcloud-controller-0
       haproxy-bundle-docker-1   (ocf::heartbeat:docker):   Started overcloud-controller-1
       haproxy-bundle-docker-2   (ocf::heartbeat:docker):   Started overcloud-controller-2
     Docker container: openstack-cinder-volume [docker-registry.engineering.redhat.com/rhosp13/openstack-cinder-volume:pcmklatest]
       openstack-cinder-volume-docker-0   (ocf::heartbeat:docker):   Started overcloud-controller-1

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled

The upstream change was merged on Apr 20th, moving this to POST.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086