Description of problem: I was able to do a full OSP12 deployment without Ceph; now I am retrying with Ceph enabled. The deployment fails because "cinder-manage db sync" reaches a timeout. pcs status gives me this:

Failed Actions:
* haproxy-bundle-docker-0_start_0 on overcloud-controller-0 'unknown error' (1): call=105, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:07 2018', queued=0ms, exec=1313ms
* haproxy-bundle-docker-2_start_0 on overcloud-controller-0 'unknown error' (1): call=109, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:10 2018', queued=0ms, exec=1298ms
* haproxy-bundle-docker-1_start_0 on overcloud-controller-0 'unknown error' (1): call=107, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:09 2018', queued=0ms, exec=1363ms
* haproxy-bundle-docker-0_start_0 on overcloud-controller-2 'unknown error' (1): call=107, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:10 2018', queued=0ms, exec=1318ms
* haproxy-bundle-docker-1_start_0 on overcloud-controller-2 'unknown error' (1): call=109, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:12 2018', queued=0ms, exec=1323ms
* haproxy-bundle-docker-2_start_0 on overcloud-controller-2 'unknown error' (1): call=105, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:07 2018', queued=0ms, exec=1326ms
* haproxy-bundle-docker-2_start_0 on overcloud-controller-1 'unknown error' (1): call=109, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:12 2018', queued=0ms, exec=1322ms
* haproxy-bundle-docker-1_start_0 on overcloud-controller-1 'unknown error' (1): call=105, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:07 2018', queued=0ms, exec=1328ms
* haproxy-bundle-docker-0_start_0 on overcloud-controller-1 'unknown error' (1): call=107, status=complete, exitreason='Newly created docker container exited after start', last-rc-change='Fri Jan 26 19:52:09 2018', queued=0ms, exec=1310ms

[root@overcloud-controller-0 log]# pcs resource show haproxy-bundle
 Bundle: haproxy-bundle
  Docker: image=192.168.249.8:5000/rhosp12/openstack-haproxy:pcmklatest network=host options="--user=root --log-driver=journald -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS" replicas=3 run-command="/bin/bash /usr/local/bin/kolla_start"
  Storage Mapping:
   options=ro source-dir=/var/lib/kolla/config_files/haproxy.json target-dir=/var/lib/kolla/config_files/config.json (haproxy-cfg-files)
   options=ro source-dir=/var/lib/config-data/puppet-generated/haproxy/ target-dir=/var/lib/kolla/config_files/src (haproxy-cfg-data)
   options=ro source-dir=/etc/hosts target-dir=/etc/hosts (haproxy-hosts)
   options=ro source-dir=/etc/localtime target-dir=/etc/localtime (haproxy-localtime)
   options=ro source-dir=/etc/pki/ca-trust/extracted target-dir=/etc/pki/ca-trust/extracted (haproxy-pki-extracted)
   options=ro source-dir=/etc/pki/tls/certs/ca-bundle.crt target-dir=/etc/pki/tls/certs/ca-bundle.crt (haproxy-pki-ca-bundle-crt)
   options=ro source-dir=/etc/pki/tls/certs/ca-bundle.trust.crt target-dir=/etc/pki/tls/certs/ca-bundle.trust.crt (haproxy-pki-ca-bundle-trust-crt)
   options=ro source-dir=/etc/pki/tls/cert.pem target-dir=/etc/pki/tls/cert.pem (haproxy-pki-cert)
   options=rw source-dir=/dev/log target-dir=/dev/log (haproxy-dev-log)

[root@overcloud-controller-0 log]# docker logs haproxy-bundle
Error: No such container: haproxy-bundle

I've nothing in /var/log/containers:

[root@overcloud-controller-0 containers]# find /var/log/containers/ -type f
/var/log/containers/horizon/horizon.log
/var/log/containers/memcached/memcached.log
[root@overcloud-controller-0 containers]# cat /var/log/containers/horizon/horizon.log
[root@overcloud-controller-0 containers]# cat /var/log/containers/memcached/memcached.log

[root@overcloud-controller-0 containers]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f2e1cdba9d8c 192.168.249.8:5000/rhosp12/openstack-redis:pcmklatest "/bin/bash /usr/local" About an hour ago Up About an hour redis-bundle-docker-0
46ca7063f3a2 192.168.249.8:5000/rhosp12/openstack-haproxy:latest "/bin/bash -c 'cp -a " About an hour ago Exited (0) About an hour ago haproxy_init_bundle
209388242c1b 192.168.249.8:5000/rhosp12/openstack-redis:latest "/bin/bash -c 'cp -a " About an hour ago Exited (0) About an hour ago redis_init_bundle
3d46fed9ec53 192.168.249.8:5000/rhosp12/openstack-mariadb:pcmklatest "/bin/bash /usr/local" About an hour ago Up About an hour galera-bundle-docker-0
6dce70e18d54 192.168.249.8:5000/rhosp12/openstack-rabbitmq:pcmklatest "/bin/bash /usr/local" About an hour ago Up About an hour (healthy) rabbitmq-bundle-docker-0
3a03dbbc8a81 192.168.249.8:5000/rhosp12/openstack-mariadb:latest "/bin/bash -c 'cp -a " About an hour ago Exited (0) About an hour ago mysql_init_bundle
e46016adccff 192.168.249.8:5000/rhosp12/openstack-mariadb:latest "kolla_start" About an hour ago Up About an hour clustercheck
5f2cbff5bafe 192.168.249.8:5000/rhosp12/openstack-aodh-api:latest "/bin/bash -c 'chown " About an hour ago Exited (0) About an hour ago aodh_init_log
7aa812aa4511 192.168.249.8:5000/rhosp12/openstack-panko-api:latest "/bin/bash -c 'chown " About an hour ago Exited (0) About an hour ago panko_init_log
bec9d5d9805c 192.168.249.8:5000/rhosp12/openstack-horizon:latest "/bin/bash -c 'touch " About an hour ago Exited (0) About an hour ago horizon_fix_perms
d875072cbe0b 192.168.249.8:5000/rhosp12/openstack-keystone:latest "/bin/bash -c 'chown " About an hour ago Exited (0) About an hour ago keystone_init_log
6e0ba98027eb 192.168.249.8:5000/rhosp12/openstack-glance-api:latest "/bin/bash -c 'chown " About an hour ago Exited (0) About an hour ago glance_init_logs
88fc6f1844eb 192.168.249.8:5000/rhosp12/openstack-heat-engine:latest "/bin/bash -c 'chown " About an hour ago Exited (0) About an hour ago heat_init_log
6fcf97a39c83 192.168.249.8:5000/rhosp12/openstack-nova-api:latest "/bin/bash -c 'chown " About an hour ago Exited (0) About an hour ago nova_init_logs
454c3bdc3c17 192.168.249.8:5000/rhosp12/openstack-rabbitmq:latest "/bin/bash -c 'cp -a " About an hour ago Exited (0) About an hour ago rabbitmq_init_bundle
b9cd2596f74e 192.168.249.8:5000/rhosp12/openstack-gnocchi-api:latest "/bin/bash -c 'chown " About an hour ago Exited (0) About an hour ago gnocchi_init_log
4a2b2cbb4e3c registry.access.redhat.com/rhceph/rhceph-2-rhel7:latest "/entrypoint.sh" About an hour ago Up About an hour ceph-rgw-overcloud-controller-0
acd69513e728 registry.access.redhat.com/rhceph/rhceph-2-rhel7:latest "/entrypoint.sh" About an hour ago Up About an hour ceph-mon-overcloud-controller-0
8371566df993 192.168.249.8:5000/rhosp12/openstack-mariadb:latest "/bin/bash -c '/usr/b" About an hour ago Exited (0) About an hour ago mysql_image_tag
fb2ab111c595 192.168.249.8:5000/rhosp12/openstack-memcached:latest "/bin/bash -c 'source" About an hour ago Up About an hour memcached
8af6da99bfad 192.168.249.8:5000/rhosp12/openstack-haproxy:latest "/bin/bash -c '/usr/b" About an hour ago Exited (0) 16 minutes ago haproxy_image_tag
1be6aa4060dc 192.168.249.8:5000/rhosp12/openstack-mariadb:latest "bash -ecx 'if [ -e /" About an hour ago Exited (0) About an hour ago mysql_bootstrap
a2b23bc1e701 192.168.249.8:5000/rhosp12/openstack-redis:latest "/bin/bash -c '/usr/b" About an hour ago Exited (0) About an hour ago redis_image_tag
43027667f6a9 192.168.249.8:5000/rhosp12/openstack-rabbitmq:latest "/bin/bash -c '/usr/b" About an hour ago Exited (0) About an hour ago rabbitmq_image_tag
3734fa7c3ef2 192.168.249.8:5000/rhosp12/openstack-rabbitmq:latest "kolla_start" About an hour ago Exited (0) About an hour ago rabbitmq_bootstrap
cb7f05492c04 192.168.249.8:5000/rhosp12/openstack-memcached:latest "/bin/bash -c 'source" About an hour ago Exited (0) About an hour ago memcached_init_logs
3a69051db3c7 192.168.249.8:5000/rhosp12/openstack-mariadb:latest "chown -R mysql: /var" About an hour ago Exited (0) About an hour ago mysql_data_ownership

"docker logs haproxy_init_bundle" shows me the log of a successful puppet run.

[root@overcloud-controller-0 log]# cat /var/log/cinder/cinder-manage.log
2018-01-26 19:54:03.876 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:54:16.895 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:54:29.911 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:54:42.928 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:54:55.945 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:55:08.963 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -6 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:55:21.979 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -7 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:55:34.996 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -8 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:55:48.011 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -9 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:56:01.027 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -10 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:56:14.043 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -11 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:56:27.059 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -12 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:56:40.076 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -13 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:56:53.092 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -14 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:57:06.099 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -15 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:57:19.115 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -16 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:57:32.131 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -17 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:57:45.148 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -18 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:57:58.163 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -19 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:58:10.514 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -20 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:58:23.522 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -21 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:58:36.540 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -22 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")
2018-01-26 19:58:49.548 122212 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -23 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '192.168.140.252' ([Errno 113] No route to host)")

[root@overcloud-controller-0 log]# ping -c1 192.168.140.252
PING 192.168.140.252 (192.168.140.252) 56(84) bytes of data.
^C
--- 192.168.140.252 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

[root@overcloud-controller-0 log]# ip a | grep 192.168.140
    inet 192.168.140.152/24 brd 192.168.140.255 scope global vlan140

Any advice would be welcome.
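One way to get at the missing haproxy output: since the bundle is started with --log-driver=journald (see the pcs resource definition above), the stdout/stderr of the short-lived replicas should still be in the journal even though docker ps no longer shows the containers. A rough sketch, assuming the replica name haproxy-bundle-docker-0 reported by pcs status:

  # with the journald log driver, each container's output is tagged with a CONTAINER_NAME field
  journalctl CONTAINER_NAME=haproxy-bundle-docker-0 --no-pager
  # or search the whole journal around the failure timestamps from pcs status
  journalctl --since "2018-01-26 19:50:00" --until "2018-01-26 19:55:00" | grep -i haproxy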
This needs investigation by PIDONE as to why Pacemaker failed to configure/start haproxy.
It is not clear whether we are talking about an overcloud update here. Was this an overcloud in state CREATE_COMPLETE that was then updated to add Ceph and failed, or a fresh deployment? Is it reproducible? Do we have sosreports of the nodes?
The overcloud deployment has failed with this on the #1 controller:
"Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout",
"Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout",
Do you have any idea of where I should start to investigate?
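A reasonable first check is whether the database VIP itself is the problem rather than cinder. A rough sketch, assuming 192.168.140.252 is the InternalApi VIP that haproxy should be serving (the grep pattern and the VIP's role are assumptions based on the cinder-manage log above):

  # is the VIP plumbed on any controller at all?
  ip -o addr | grep 192.168.140.252
  # is a corresponding pacemaker IP resource started anywhere?
  pcs status resources | grep 192.168.140.252
  # does anything answer on the MySQL port through the VIP? (bash /dev/tcp probe)
  timeout 5 bash -c 'echo > /dev/tcp/192.168.140.252/3306' && echo "port 3306 reachable" || echo "port 3306 unreachable"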
I have managed to reproduce the same problem with puddle 2018-01-26.2.
So after digging around the journal enough, we can see the following:

<7>haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ds
[ALERT] 030/153522 (27) : Starting proxy ceph_rgw: cannot bind socket [192.168.170.251:8080]
[ALERT] 030/153522 (27) : Starting proxy ceph_rgw: cannot bind socket [192.168.249.251:8080]
<5>haproxy-systemd-wrapper: exit, haproxy RC=1

The problem is that haproxy is configured to proxy radosgw connections via:

listen ceph_rgw
  bind 192.168.170.251:8080 transparent
  bind 192.168.249.251:8080 transparent
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  option httpchk HEAD /
  server overcloud-controller-0.storage.fv3.net 192.168.170.158:8080 check fall 5 inter 2000 rise 2
  server overcloud-controller-1.storage.fv3.net 192.168.170.161:8080 check fall 5 inter 2000 rise 2
  server overcloud-controller-2.storage.fv3.net 192.168.170.157:8080 check fall 5 inter 2000 rise 2

But radosgw is binding to those two IPs (192.168.170.251:8080 and 192.168.249.251:8080) as well. In fact it is listening on all IPs, which rather violates network isolation:

[root@overcloud-controller-0 audit]# ss -tnlp | grep 8080
LISTEN 0 128 *:8080 *:* users:(("radosgw",pid=56628,fd=70))

I am no Ceph expert, but the ceph.conf that is mapped inside the ceph-rgw container seems to imply that it should be binding to a specific IP (see the rgw frontends lines):

[global]
cluster network = 192.168.180.0/24
fsid = 785dbf60-05f2-11e8-918f-5254008eef51
journal_collocation = False
journal_size = 10000
mon host = 192.168.170.157,192.168.170.158,192.168.170.161
mon initial members = overcloud-controller-2,overcloud-controller-0,overcloud-controller-1
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 3
public network = 192.168.170.0/24
raw_multi_journal = True
rgw_keystone_accepted_roles = Member, _member_, admin
rgw_keystone_admin_domain = default
rgw_keystone_admin_password = QceDmgCgm7wNw7BEgyEWAzXZj
rgw_keystone_admin_project = service
rgw_keystone_admin_user = swift
rgw_keystone_api_version = 3
rgw_keystone_url = http://192.168.140.252:5000
rgw_s3_auth_use_keystone = true

[client.rgw.overcloud-controller-2]
host = overcloud-controller-2
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-2/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-2.log
rgw frontends = civetweb port=192.168.170.157:8080 num_threads=100

[client.rgw.overcloud-controller-1]
host = overcloud-controller-1
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-1/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-1.log
rgw frontends = civetweb port=192.168.170.161:8080 num_threads=100

[client.rgw.overcloud-controller-0]
host = overcloud-controller-0
keyring = /var/lib/ceph/radosgw/ceph-rgw.overcloud-controller-0/keyring
log file = /var/log/ceph/ceph-rgw-overcloud-controller-0.log
rgw frontends = civetweb port=192.168.170.158:8080 num_threads=100

Moving to the Ceph DFG as this binding behaviour seems to be the root cause here.
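For completeness, the conflict can be double-checked straight from the controller. A rough sketch (the container name ceph-rgw-overcloud-controller-0 comes from docker ps above; the bind test assumes the VIP is currently plumbed on this node, otherwise it fails with a different error):

  # radosgw holds the wildcard socket, so any later bind on port 8080 gets EADDRINUSE
  ss -tnlp | grep ':8080'
  # confirm which rgw frontends line the running rgw container actually sees
  docker exec ceph-rgw-overcloud-controller-0 grep -A4 'client.rgw.overcloud-controller-0' /etc/ceph/ceph.conf
  # a manual bind on one of the haproxy VIPs reproduces the haproxy failure outside pacemaker
  python -c "import socket; s = socket.socket(); s.bind(('192.168.170.251', 8080))"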
*** This bug has been marked as a duplicate of bug 1509584 ***