rhosp-director: composable roles deployment gets stuck: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle

Environment:
puppet-haproxy-1.5.0-0.20170728184739.6ffcb07.el7ost.noarch
haproxy-1.5.18-6.el7.x86_64
instack-undercloud-7.2.1-0.20170729010706.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170805163048.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch

Steps to reproduce:
Attempt to deploy OC deployment with composable roles.

Deployment command:
openstack overcloud deploy \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  -r /home/stack/roles_data.yaml \
  -e /home/stack/virt/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
  -e /home/stack/virt/debug.yaml \
  -e /home/stack/virt/nodes_data.yaml \
  -e /home/stack/virt/docker-images.yaml

(undercloud) [stack@undercloud-0 ~]$ cat roles_data.yaml
###############################################################################
# File generated by TripleO
###############################################################################
###############################################################################
# Role: ControllerOpenstack                                                   #
###############################################################################
- name: Controller
  description: |
    Controller role that does not contain the database, messaging and networking components. Use in combination with the Database, Messaging and Networker roles.
  tags:
    - primary
    - controller
  networks:
    - External
    - InternalApi
    - Storage
    - StorageMgmt
    - Tenant
  HostnameFormatDefault: '%stackname%-controller-%index%'
  ServicesDefault:
    - OS::TripleO::Services::AodhApi
    - OS::TripleO::Services::AodhEvaluator
    - OS::TripleO::Services::AodhListener
    - OS::TripleO::Services::AodhNotifier
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::BarbicanApi
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CeilometerAgentCentral
    - OS::TripleO::Services::CeilometerAgentNotification
    - OS::TripleO::Services::CeilometerApi
    - OS::TripleO::Services::CeilometerExpirer
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CephMds
    - OS::TripleO::Services::CephMon
    - OS::TripleO::Services::CephRbdMirror
    - OS::TripleO::Services::CephRgw
    - OS::TripleO::Services::CinderApi
    - OS::TripleO::Services::CinderBackup
    - OS::TripleO::Services::CinderHPELeftHandISCSI
    - OS::TripleO::Services::CinderScheduler
    - OS::TripleO::Services::CinderVolume
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::Congress
    - OS::TripleO::Services::Clustercheck
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Ec2Api
    - OS::TripleO::Services::Etcd
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::GlanceApi
    - OS::TripleO::Services::GnocchiApi
    - OS::TripleO::Services::GnocchiMetricd
    - OS::TripleO::Services::GnocchiStatsd
    - OS::TripleO::Services::HAproxy
    - OS::TripleO::Services::HeatApi
    - OS::TripleO::Services::HeatApiCfn
    - OS::TripleO::Services::HeatApiCloudwatch
    - OS::TripleO::Services::HeatEngine
    - OS::TripleO::Services::Horizon
    - OS::TripleO::Services::IronicApi
    - OS::TripleO::Services::IronicConductor
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Keepalived
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::Keystone
    - OS::TripleO::Services::ManilaApi
    - OS::TripleO::Services::ManilaBackendCephFs
    - OS::TripleO::Services::ManilaBackendGeneric
    - OS::TripleO::Services::ManilaBackendNetapp
    - OS::TripleO::Services::ManilaScheduler
    - OS::TripleO::Services::ManilaShare
    - OS::TripleO::Services::Memcached
    - OS::TripleO::Services::MongoDb
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NovaApi
    - OS::TripleO::Services::NovaConductor
    - OS::TripleO::Services::NovaConsoleauth
    - OS::TripleO::Services::NovaIronic
    - OS::TripleO::Services::NovaMetadata
    - OS::TripleO::Services::NovaPlacement
    - OS::TripleO::Services::NovaScheduler
    - OS::TripleO::Services::NovaVncProxy
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::OctaviaApi
    - OS::TripleO::Services::OctaviaHealthManager
    - OS::TripleO::Services::OctaviaHousekeeping
    - OS::TripleO::Services::OctaviaWorker
    - OS::TripleO::Services::OpenDaylightApi
    - OS::TripleO::Services::OpenDaylightOvs
    - OS::TripleO::Services::OVNDBs
    - OS::TripleO::Services::OVNController
    - OS::TripleO::Services::Pacemaker
    - OS::TripleO::Services::PankoApi
    - OS::TripleO::Services::Redis
    - OS::TripleO::Services::SaharaApi
    - OS::TripleO::Services::SaharaEngine
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::SwiftProxy
    - OS::TripleO::Services::SwiftRingBuilder
    - OS::TripleO::Services::SwiftStorage
    - OS::TripleO::Services::Tacker
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Vpp
    - OS::TripleO::Services::Zaqar
###############################################################################
# Role: Database                                                              #
###############################################################################
- name: Database
  description: |
    Standalone database role with the database being managed via Pacemaker
  networks:
    - InternalApi
  HostnameFormatDefault: '%stackname%-database-%index%'
  ServicesDefault:
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::MySQL
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::Pacemaker
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Sshd
###############################################################################
# Role: Messaging                                                             #
###############################################################################
- name: Messaging
  description: |
    Standalone messaging role with RabbitMQ being managed via Pacemaker
  networks:
    - InternalApi
  HostnameFormatDefault: '%stackname%-messaging-%index%'
  ServicesDefault:
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::Pacemaker
    - OS::TripleO::Services::RabbitMQ
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Docker
###############################################################################
# Role: Networker                                                             #
###############################################################################
- name: Networker
  description: |
    Standalone networking role to run Neutron services their own. Includes Pacemaker integration via PacemakerRemote
  networks:
    - InternalApi
    - External
  HostnameFormatDefault: '%stackname%-networker-%index%'
  ServicesDefault:
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronApi
    - OS::TripleO::Services::NeutronBgpVpnApi
    - OS::TripleO::Services::NeutronCorePlugin
    - OS::TripleO::Services::NeutronDhcpAgent
    - OS::TripleO::Services::NeutronL2gwAgent
    - OS::TripleO::Services::NeutronL2gwApi
    - OS::TripleO::Services::NeutronL3Agent
    - OS::TripleO::Services::NeutronLbaasv2Agent
    - OS::TripleO::Services::NeutronMetadataAgent
    - OS::TripleO::Services::NeutronML2FujitsuCfab
    - OS::TripleO::Services::NeutronML2FujitsuFossw
    - OS::TripleO::Services::NeutronOvsAgent
    - OS::TripleO::Services::NeutronVppAgent
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::OpenDaylightOvs
    - OS::TripleO::Services::PacemakerRemote
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Sshd
###############################################################################
# Role: Compute                                                               #
###############################################################################
- name: Compute
  description: |
    Basic Compute Node role
  CountDefault: 1
  networks:
    - InternalApi
    - Tenant
    - Storage
  HostnameFormatDefault: '%stackname%-novacompute-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::ComputeCeilometerAgent
    - OS::TripleO::Services::ComputeNeutronCorePlugin
    - OS::TripleO::Services::ComputeNeutronL3Agent
    - OS::TripleO::Services::ComputeNeutronMetadataAgent
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronLinuxbridgeAgent
    - OS::TripleO::Services::NeutronSriovAgent
    - OS::TripleO::Services::NeutronVppAgent
    - OS::TripleO::Services::NovaCompute
    - OS::TripleO::Services::NovaLibvirt
    - OS::TripleO::Services::NovaMigrationTarget
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::OpenDaylightOvs
    - OS::TripleO::Services::Securetty
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Vpp
    - OS::TripleO::Services::OVNController
###############################################################################
# Role: CephStorage                                                           #
###############################################################################
- name: CephStorage
  description: |
    Ceph OSD Storage node role
  networks:
    - Storage
    - StorageMgmt
  ServicesDefault:
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephOSD
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::Securetty
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned

Result:
The deployment gets stuck.
Looking at where it is stuck:

(undercloud) [stack@undercloud-0 ~]$ heat resource-list -n5 overcloud|grep -v COMPLE
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+----------------------------+--------------------------------------+-------------------------------------+--------------------+----------------------+------------------------------------------------------------------------------------+
| resource_name              | physical_resource_id                 | resource_type                       | resource_status    | updated_time         | stack_name                                                                         |
+----------------------------+--------------------------------------+-------------------------------------+--------------------+----------------------+------------------------------------------------------------------------------------+
| AllNodesDeploySteps        | 0cc75327-de9c-4e5f-b55b-2192ddead33b | OS::TripleO::PostDeploySteps        | CREATE_IN_PROGRESS | 2017-08-28T18:55:19Z | overcloud                                                                          |
| ControllerDeployment_Step3 | e0b575e8-9f1c-42bb-b199-4fd90b13b2fd | OS::Heat::StructuredDeploymentGroup | CREATE_IN_PROGRESS | 2017-08-28T19:11:45Z | overcloud-AllNodesDeploySteps-mnw7npmaa72j                                         |
| 0                          | 12b034ee-e017-4c05-ab92-8c3e344e9b4e | OS::Heat::StructuredDeployment      | CREATE_IN_PROGRESS | 2017-08-28T19:27:24Z | overcloud-AllNodesDeploySteps-mnw7npmaa72j-ControllerDeployment_Step3-vfhc4blmshe5 |
+----------------------------+--------------------------------------+-------------------------------------+--------------------+----------------------+------------------------------------------------------------------------------------+

(undercloud) [stack@undercloud-0 ~]$ heat deployment-show 12b034ee-e017-4c05-ab92-8c3e344e9b4e
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "7d3d7442-9278-4e0d-abe5-ed21609cf554",
  "config_id": "9fe0f172-fc74-475b-9c60-1f60a95cbe5d",
  "output_values": null,
  "creation_time": "2017-08-28T19:27:27Z",
  "input_values": {
    "update_identifier": "1503946498",
    "docker_puppet_debug": "",
    "role_name": "Controller",
    "step": 3,
    "bootstrap_server_id": "7d3d7442-9278-4e0d-abe5-ed21609cf554"
  },
  "action": "CREATE",
  "status_reason": "Deploy data available",
  "id": "12b034ee-e017-4c05-ab92-8c3e344e9b4e"
}

(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-----------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                  | Status | Task State | Power State | Networks               |
+--------------------------------------+-----------------------+--------+------------+-------------+------------------------+
| e6dce180-53dd-427e-9ac9-e0a92dd8d511 | ceph-0                | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| fcfde688-40ab-4dfd-b152-5f1a718d551b | ceph-1                | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| f9bbd65b-c781-4e6b-9912-dc536addfd6a | ceph-2                | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 69938854-4baa-47cd-9932-ca2afbc0b2e6 | compute-0             | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 7d3d7442-9278-4e0d-abe5-ed21609cf554 | controller-0          | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| a03a4294-2460-4cbd-a50d-f16c91f4329a | controller-1          | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| e4b7b58d-ea00-4533-bf07-7cbf5b844d74 | controller-2          | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| d91f4c00-a6c7-4b95-831e-6c4415ce0e5b | overcloud-database-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| e1c71bdf-43e4-48a1-b89a-5473a8818ab0 | overcloud-database-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 58aae23d-1088-48bb-8023-6b0c4caf0183 | overcloud-database-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| 7009b922-6e9f-440d-973a-9403abd159e8 | overcloud-messaging-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| c9f3f417-680d-4b05-96dd-d0c8f0da92c1 | overcloud-messaging-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.17 |
| ad7e4458-c53e-4c01-8005-0c71cd103278 | overcloud-messaging-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.24 |
| b7aedbfb-08c2-42db-a598-ebe0a93a0de6 | overcloud-networker-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| 51cfc60f-ca1b-47ea-8b02-8363d7f04ce7 | overcloud-networker-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
+--------------------------------------+-----------------------+--------+------------+-------------+------------------------+

So the task is on controller-0 (its ID matches the server_id of the in-progress deployment). Checking that node for errors shows many repeating error messages:

Aug 28 19:55:00 controller-0 pengine[21737]: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle
Aug 28 19:55:00 controller-0 pengine[21737]: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle
Aug 28 19:56:00 controller-0 pengine[21737]: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle
Aug 28 19:56:00 controller-0 pengine[21737]: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle
Aug 28 19:56:00 controller-0 pengine[21737]: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle
Aug 28 19:56:00 controller-0 pengine[21737]: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle
Aug 28 19:56:00 controller-0 pengine[21737]: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle
Aug 28 19:56:00 controller-0 pengine[21737]: error: Couldn't expand haproxy-bundle_stop_0 to haproxy-bundle_stopped_0 in haproxy-bundle
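
The step of matching the stuck deployment's server_id against the nova list output can be scripted. The following is only a minimal sketch: the helper names parse_nova_list and server_name_for_deployment are hypothetical, and the sample data is a trimmed copy of the output in this report.

```python
import json


def parse_nova_list(table_text):
    """Map server ID to server name from a `nova list` ASCII table.

    Hypothetical helper: assumes ID is the first column and Name the second.
    """
    servers = {}
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Border lines yield a single cell; the header row starts with "ID".
        if len(cells) >= 2 and cells[0] and cells[0] != "ID":
            servers[cells[0]] = cells[1]
    return servers


def server_name_for_deployment(deployment_json, servers):
    """Return the node name the deployment targets, or None if unknown."""
    server_id = json.loads(deployment_json)["server_id"]
    return servers.get(server_id)


# Trimmed sample data copied from the report above.
NOVA_LIST = """
+--------------------------------------+--------------+--------+
| ID                                   | Name         | Status |
+--------------------------------------+--------------+--------+
| 69938854-4baa-47cd-9932-ca2afbc0b2e6 | compute-0    | ACTIVE |
| 7d3d7442-9278-4e0d-abe5-ed21609cf554 | controller-0 | ACTIVE |
+--------------------------------------+--------------+--------+
"""
DEPLOYMENT = '{"status": "IN_PROGRESS", "server_id": "7d3d7442-9278-4e0d-abe5-ed21609cf554"}'

print(server_name_for_deployment(DEPLOYMENT, parse_nova_list(NOVA_LIST)))
```

With the sample data this resolves the deployment to controller-0, the node that was then inspected for errors.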
I was able to debug this environment a bit today with Sasha. It appears that the database syncs are failing. MySQL itself looks to be running: I was able to attach to MySQL via localhost (mysql -u root) and verify that all of the databases are getting created. However, some of the db syncs for services are timing out. I manually tried to connect and got this on the command line:

mysql -u heat -h 172.17.1.13 -p3UazsaeTC64V9UvEcJ3GZ9rbd
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0

I double-checked the HAProxy config and this is the correct VIP for MySQL. This could be related to firewall rules, given that this deployment has the controller and database servers split out. On the controller I see:

[root@controller-0 containers]# iptables-save | grep 3306
-A INPUT -p tcp -m multiport --dports 3306 -m state --state NEW -m comment --comment "100 mysql_haproxy ipv4" -j ACCEPT

And on the database node:

[root@overcloud-database-0 ~]# iptables-save | grep 3306
-A INPUT -p tcp -m multiport --dports 873,3123,3306,4444,4567,4568,9200 -m state --state NEW -m comment --comment "104 mysql galera-bundle ipv4" -j ACCEPT

It would seem that both services are accepting 3306 traffic. It would be good to have someone from the HA team review these configs and see if they line up correctly.
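
The two MySQL client errors seen in this report point at different TCP-level outcomes: ERROR 2013 at 'reading initial communication packet' means the TCP connection was established but ended before the server greeting arrived, while ERROR 2003 with errno 111 is a refused connection. A minimal sketch of a probe that tells these cases apart without a MySQL client follows; the function names are hypothetical and it only looks at the first bytes, it does not speak the MySQL protocol.

```python
import socket


def classify_first_read(sock, timeout=3.0):
    """Classify what an already connected peer does first.

    Returns "greeting" if any bytes arrive, "closed_before_greeting" if the
    peer closes or resets without sending anything, "timeout" otherwise.
    """
    sock.settimeout(timeout)
    try:
        data = sock.recv(4)
    except socket.timeout:
        return "timeout"
    except ConnectionResetError:
        return "closed_before_greeting"
    return "greeting" if data else "closed_before_greeting"


def probe_endpoint(host, port=3306, timeout=3.0):
    """Connect to host:port and report how the endpoint behaves."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return classify_first_read(sock, timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "unreachable"
```

Under these assumptions, probe_endpoint("172.17.1.13") against the VIP from this report would be expected to return "closed_before_greeting" if a proxy accepts the connection and then drops it, whereas a direct probe of a healthy database node should return "greeting".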
I am travelling, so I am a bit slow to respond, but do we have sosreports for this or a live env? From Dan's initial analysis in c#1 and the error messages, I would guess that haproxy is having some sort of issue (maybe the bundle is constantly restarting or the like). Sasha, if you can send me an env login or some sosreports, I can investigate a bit more. (NB: I deploy composable HA on a daily basis with galera/rabbit split out to 6 separate nodes, so my best guess without more data would be that we do not yet have a new pacemaker build with all the needed bundle fixes, but I'd like to take a deeper look in any case.)
Thanks Sasha! So the issue is that the clustercheck container is erroring out when talking to mysql, and hence haproxy refuses to accept connections on 3306 because all three backends are down.

A) Clustercheck not working:

[root@controller-2 log]# docker exec -it clustercheck /bin/bash
()[mysql@controller-2 /]$ mysql -h 127.0.0.1 -u clustercheck -pdrwh87rmM8KzWxyGcJWZ2TbGC
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)

B) Haproxy refusing connections:

[root@controller-2 log]# mysql -u heat -h 172.17.1.13 -p3UazsaeTC64V9UvEcJ3GZ9rbd
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0

C) Connections straight to mysql work correctly:

[root@controller-2 log]# mysql -u heat -h 172.17.1.22 -p3UazsaeTC64V9UvEcJ3GZ9rbd
MariaDB [(none)]> Bye
[root@controller-2 log]# mysql -u heat -h 172.17.1.21 -p3UazsaeTC64V9UvEcJ3GZ9rbd
MariaDB [(none)]> Bye
[root@controller-2 log]# mysql -u heat -h 172.17.1.16 -p3UazsaeTC64V9UvEcJ3GZ9rbd
MariaDB [(none)]> Bye

The reason this is not working in this environment is that the clustercheck container always needs to be deployed on the database role: here it runs on the controllers and checks 127.0.0.1, where no MySQL is listening. I will make sure that this gets fixed upstream. For the time being, you can just add OS::TripleO::Services::Clustercheck to your Database role and remove it from the ControllerOpenstack role.
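
The placement rule described above (Clustercheck must live on the same role as MySQL) can be checked before deploying. This is only a sketch under stated assumptions: misplaced_clustercheck is a hypothetical helper, and it expects roles_data.yaml to have already been loaded into a list of dicts by a YAML parser of your choice. The sample roles are trimmed versions of the ones in this report.

```python
CLUSTERCHECK = "OS::TripleO::Services::Clustercheck"
MYSQL = "OS::TripleO::Services::MySQL"


def misplaced_clustercheck(roles):
    """Return names of roles that carry only one of Clustercheck and MySQL.

    `roles` is the parsed content of roles_data.yaml: a list of dicts with
    "name" and "ServicesDefault" keys.
    """
    problems = []
    for role in roles:
        services = set(role.get("ServicesDefault", []))
        # A role is consistent when it has both services or neither.
        if (CLUSTERCHECK in services) != (MYSQL in services):
            problems.append(role["name"])
    return problems


# Trimmed roles as originally generated in this report.
broken = [
    {"name": "Controller", "ServicesDefault": [CLUSTERCHECK, "OS::TripleO::Services::HAproxy"]},
    {"name": "Database", "ServicesDefault": [MYSQL, "OS::TripleO::Services::Pacemaker"]},
]
# The same roles after moving Clustercheck to the Database role.
fixed = [
    {"name": "Controller", "ServicesDefault": ["OS::TripleO::Services::HAproxy"]},
    {"name": "Database", "ServicesDefault": [MYSQL, CLUSTERCHECK, "OS::TripleO::Services::Pacemaker"]},
]

print(misplaced_clustercheck(broken))
print(misplaced_clustercheck(fixed))
```

The first call flags both the Controller and Database roles; the second returns an empty list.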
Hi Michele, Since the roles_data.yaml was prepared with "openstack overcloud roles generate" - I added this comment https://bugzilla.redhat.com/show_bug.cgi?id=1485108#c9 in the respective bug. Thanks.
I confirm that I was able to deploy successfully once I moved "OS::TripleO::Services::Clustercheck" from the controller role to the database role.
Pike review merged; moving to POST and linking the right review.
Verified:

Environment:
openstack-tripleo-heat-templates-7.0.1-0.20170919183703.el7ost.noarch

Clustercheck is added by default to the database role:

###############################################################################
# Role: Database                                                              #
###############################################################################
- name: Database
  description: |
    Standalone database role with the database being managed via Pacemaker
  networks:
    - InternalApi
  HostnameFormatDefault: '%stackname%-database-%index%'
  ServicesDefault:
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::Clustercheck
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::MySQL
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::ContainersLogrotateCrond
    - OS::TripleO::Services::Pacemaker
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462