Bug 1464588

Summary: [osp11][minor update] update of non-ha overcloud stuck on controller update stage
Product: Red Hat OpenStack
Reporter: Artem Hrechanychenko <ahrechan>
Component: rhosp-director
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED NOTABUG
QA Contact: Amit Ugol <augol>
Severity: high
Docs Contact:
Priority: high
Version: 11.0 (Ocata)
CC: dbecker, lbezdick, mburns, morazi, ohochman, rhel-osp-director-maint, sasha, sathlang, shardy, tvignaud
Target Milestone: rc
Keywords: Triaged
Target Release: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-18 09:09:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Artem Hrechanychenko 2017-06-23 20:49:59 UTC
Description of problem:
Minor update of the overcloud to osp11 + rhel7.4 testing repos + local mirrors got stuck.
From the controller's /var/log/messages:

Jun 23 16:30:24 controller-0 snmpd[72072]: Connection from UDP: [192.168.24.1]:42353->[192.168.24.10]:161
Jun 23 16:30:53 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:31:25 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:31:57 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:32:29 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:33:01 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:33:33 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:34:05 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:34:37 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:35:09 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:35:41 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:36:13 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:36:45 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:37:17 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:37:49 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:38:21 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:38:53 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:39:25 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:39:57 controller-0 proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[a59afc1f-2efd-4969-a268-a28450ccbb1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
Jun 23 16:40:17 controller-0 snmpd[72072]: Connection from UDP: [192.168.24.1]:48634->[192.168.24.10]:161



[heat-admin@controller-0 ~]$ systemctl --failed
  UNIT                              LOAD   ACTIVE SUB    DESCRIPTION
● dhcp-interface loaded failed failed DHCP interface ovs-system
● haproxy.service                   loaded failed failed HAProxy Load Balancer
● openstack-cinder-volume.service   loaded failed failed OpenStack Cinder Volume Server


[heat-admin@controller-0 ~]$ sudo systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-06-23 13:08:00 UTC; 7h ago
 Main PID: 31616 (code=exited, status=143)

Jun 23 13:07:29 controller-0.localdomain haproxy[116940]: Connect from 172.17.1.14:60200 to 172.17.1.18:3306 (mysql/TCP)
Jun 23 13:07:58 controller-0.localdomain haproxy[116940]: Connect from 172.17.1.21:56104 to 172.17.1.18:8778 (nova_placement/HTTP)
Jun 23 13:07:58 controller-0.localdomain haproxy[116940]: Connect from 192.168.24.14:43458 to 192.168.24.14:35357 (keystone_admin/HTTP)
Jun 23 13:08:00 controller-0.localdomain systemd[1]: Stopping HAProxy Load Balancer...
Jun 23 13:08:00 controller-0.localdomain haproxy-systemd-wrapper[31616]: haproxy-systemd-wrapper: SIGTERM -> 116940.
Jun 23 13:08:00 controller-0.localdomain haproxy-systemd-wrapper[31616]: haproxy-systemd-wrapper: exit, haproxy RC=143
Jun 23 13:08:00 controller-0.localdomain systemd[1]: haproxy.service: main process exited, code=exited, status=143/n/a
Jun 23 13:08:00 controller-0.localdomain systemd[1]: Stopped HAProxy Load Balancer.
Jun 23 13:08:00 controller-0.localdomain systemd[1]: Unit haproxy.service entered failed state.
Jun 23 13:08:00 controller-0.localdomain systemd[1]: haproxy.service failed.
[heat-admin@controller-0 ~]$ 


Jun 23 13:08:00 controller-0.localdomain pengine[15325]:   notice: Stop    haproxy:0        (controller-0)
Jun 23 13:08:00 controller-0.localdomain crmd[15326]:   notice: Initiating stop operation haproxy_stop_0 locally on controller-0
Jun 23 13:08:00 controller-0.localdomain haproxy-systemd-wrapper[31616]: haproxy-systemd-wrapper: SIGTERM -> 116940.
Jun 23 13:08:00 controller-0.localdomain haproxy-systemd-wrapper[31616]: haproxy-systemd-wrapper: exit, haproxy RC=143
Jun 23 13:08:00 controller-0.localdomain systemd[1]: haproxy.service: main process exited, code=exited, status=143/n/a
Jun 23 13:08:00 controller-0.localdomain systemd[1]: Unit haproxy.service entered failed state.
Jun 23 13:08:00 controller-0.localdomain systemd[1]: haproxy.service failed.
Jun 23 13:08:02 controller-0.localdomain crmd[15326]:   notice: Result of stop operation for haproxy on controller-0: 0 (ok)
Jun 23 13:13:14 controller-0.localdomain yum[347498]: Updated: haproxy-1.5.18-6.el7.x86_64


[heat-admin@controller-0 ~]$ yum -v repolist
Not loading "rhnplugin" plugin, as it is disabled
Loading "product-id" plugin
Loading "search-disabled-repos" plugin
Loading "subscription-manager" plugin
Not root, Subscription Management repositories not updated
Config time: 0.083
Yum version: 3.4.3
rhelosp-11.0-devtools-puddle                                                                                                                                                                                                              4/4
rhelosp-11.0-puddle                                                                                                                                                                                                                   737/737
Setting up Package Sacks
pkgsack time: 0.005
Repo-id      : rhelosp-11.0-ceph-2.0-mon/x86_64
Repo-name    : Ceph 2.0 MON
Repo-revision: 1497883293
Repo-updated : Mon Jun 19 14:41:33 2017
Repo-pkgs    : 133
Repo-size    : 729 M
Repo-baseurl : http://rhos-qe-mirror-qeos.usersys.redhat.com/pulp/content/dist/rhel/server/7/7Server/x86_64/ceph-mon/2/os/
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release-11.repo

Repo-id      : rhelosp-11.0-ceph-2.0-osd/x86_64
Repo-name    : Ceph 2.0 OSD
Repo-revision: 1497883292
Repo-updated : Mon Jun 19 14:41:32 2017
Repo-pkgs    : 115
Repo-size    : 672 M
Repo-baseurl : http://rhos-qe-mirror-qeos.usersys.redhat.com/pulp/content/dist/rhel/server/7/7Server/x86_64/ceph-osd/2/os/
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release-11.repo

Repo-id      : rhelosp-11.0-ceph-2.0-tools/x86_64
Repo-name    : Ceph 2.0 Tools
Repo-revision: 1497883292
Repo-updated : Mon Jun 19 14:41:32 2017
Repo-pkgs    : 153
Repo-size    : 230 M
Repo-baseurl : http://rhos-qe-mirror-qeos.usersys.redhat.com/pulp/content/dist/rhel/server/7/7Server/x86_64/ceph-tools/2/os/
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release-11.repo

Repo-id      : rhelosp-11.0-devtools-puddle/x86_64
Repo-name    : RHOS-11.0
Repo-revision: 1497972122
Repo-updated : Tue Jun 20 15:22:03 2017
Repo-pkgs    : 4
Repo-size    : 1.2 M
Repo-baseurl : http://rhos-qe-mirror-qeos.usersys.redhat.com/rcm-guest/puddles/OpenStack/11.0-RHEL-7/2017-06-20.2/RH7-RHOS-DEVTOOLS-11.0/x86_64/os
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release-11.repo

Repo-id      : rhelosp-11.0-puddle/x86_64
Repo-name    : RHOS-11.0
Repo-revision: 1497972077
Repo-updated : Tue Jun 20 15:21:56 2017
Repo-pkgs    : 737
Repo-size    : 2.0 G
Repo-baseurl : http://rhos-qe-mirror-qeos.usersys.redhat.com/rcm-guest/puddles/OpenStack/11.0-RHEL-7/2017-06-20.2/RH7-RHOS-11.0/x86_64/os
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release-11.repo

Repo-id      : rhelosp-rhel-7.4-extras/x86_64
Repo-name    : Red Hat Enterprise Linux 7Server - x86_64 - Extras
Repo-revision: 1498209942
Repo-updated : Fri Jun 23 09:25:41 2017
Repo-pkgs    : 1
Repo-size    : 452 k
Repo-baseurl : http://rhos-qe-mirror-qeos.usersys.redhat.com/composes/nightly/EXTRAS-RHEL-7.4/latest-EXTRAS-7-RHEL-7/compose/Server/x86_64/os/
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release-rhel-7.4.repo

Repo-id      : rhelosp-rhel-7.4-ha/x86_64
Repo-name    : Red Hat Enterprise Linux 7Server - x86_64 - HA
Repo-revision: 1498067830
Repo-updated : Wed Jun 21 17:57:10 2017
Repo-pkgs    : 35
Repo-size    : 13 M
Repo-baseurl : http://rhos-qe-mirror-qeos.usersys.redhat.com/composes/nightly/latest-RHEL-7/compose/Server/x86_64/os/addons/HighAvailability/
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release-rhel-7.4.repo

Repo-id      : rhelosp-rhel-7.4-server/x86_64
Repo-name    : Red Hat Enterprise Linux 7Server - x86_64 - Server
Repo-revision: 1498067805
Repo-updated : Wed Jun 21 17:56:45 2017
Repo-pkgs    : 5,142
Repo-size    : 3.7 G
Repo-baseurl : http://rhos-qe-mirror-qeos.usersys.redhat.com/composes/nightly/latest-RHEL-7/compose/Server/x86_64/os/
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release-rhel-7.4.repo

Repo-id      : rhos-release
Repo-name    : RHOS Release
Repo-revision: 1498149399
Repo-updated : Thu Jun 22 16:36:43 2017
Repo-pkgs    : 165
Repo-size    : 3.0 M
Repo-baseurl : http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:45 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release.repo

Repo-id      : rhos-release-extras/7Server
Repo-name    : RHOS Release Extras
Repo-revision: 1443035482
Repo-updated : Wed Sep 23 19:11:23 2015
Repo-pkgs    : 2
Repo-size    : 655 k
Repo-baseurl : http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/extras/7Server
Repo-expire  : 21,600 second(s) (last: Fri Jun 23 13:01:46 2017)
  Filter     : read-only:present
Repo-filename: /etc/yum.repos.d/rhos-release.repo

repolist: 6,487
[heat-admin@controller-0 ~]$ 


Version-Release number of selected component (if applicable):
osp11 -> osp11+ rhel7.4 testing repos

How reproducible:


Steps to Reproduce:
1. Deploy undercloud and overcloud osp11 using infrared
infrared virsh -v --host-address $HOST --host-key ~/.ssh/id_rsa --cleanup yes && infrared virsh -v --host-address $HOST --host-key ~/.ssh/id_rsa  --topology-nodes undercloud:1,controller:1,compute:1  -e  override.controller.cpu=6 -e override.controller.memory=16384 -e  override.undercloud.disks.disk1.size=100G && infrared tripleo-undercloud --version 11 --images-task=rpm && infrared tripleo-overcloud -v --introspect yes --tagging yes --post no --deployment-files virt --version 11 --deploy yes


2. Minor update of the undercloud node using local mirrors
ir tripleo-undercloud -v --update-undercloud yes --mirror qeos --build 7.4-testing --osrelease 7.4

3. Update the overcloud using ir and local mirrors
ir tripleo-overcloud -v --updateto 7.4-testing --deployment-files virt --mirror qeos --osrelease 7.4



Actual results:
overcloud update got stuck

Expected results:
overcloud update succeeded 

Additional info:

Comment 1 Red Hat Bugzilla Rules Engine 2017-06-23 20:50:05 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 2 Artem Hrechanychenko 2017-06-26 05:29:25 UTC
Update failed after 4 hours:
cmd: source ~/stackrc ;
 openstack stack failures list overcloud

start: 2017-06-25 22:57:03.016532

end: 2017-06-25 22:57:19.857321

delta: 0:00:16.840789

stdout: overcloud.Controller.0.UpdateDeployment:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 1951898e-23d8-4eb1-8c85-bbe787f9e4c0
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted
  deploy_stdout: |
    Started yum_update.sh on server 39200262-621f-427d-98cf-229b49140c83 at Sun Jun 25 17:50:56 EDT 2017
    Not running due to unset update_identifier
  deploy_stderr: |

overcloud.Compute.0:
  resource_type: OS::TripleO::Compute
  physical_resource_id: 1a9fa9f1-0049-4231-afc2-cf091e0c2f27
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted

[[ previous task time: 4:01:33.617169 = 14493.62s / 14808.38s ]]

Comment 3 Steven Hardy 2017-06-26 09:37:34 UTC
I had a look at the logs, and it seems rabbit crashed early in the update (or perhaps even before the update was started), with errors like:

=ERROR REPORT==== 25-Jun-2017::22:59:31 ===
Error on AMQP connection <0.4605.0> (172.17.1.13:47914 -> 172.17.1.13:5672 - neutron-server:109031:f8f7b06a-abd1-463e-928a-406867f0a948, vhost: '/', user: 'guest', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"


The update then got stuck because the yum update triggers service restarts, which failed because rabbit wasn't working.

So the question is why did rabbit fail, and was it working before the update was attempted?
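
One way to answer the "was it working before the update" part is to probe the AMQP port on the controller right before the update starts. The following is only a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; the helper name port_open is hypothetical, and the host and port in the example are simply the ones that appear in the log lines of this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
port_open() {
    local host="$1" port="$2"
    # Give up after 3 seconds so an unreachable host does not hang the check.
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (run on the controller before starting the update):
#   port_open controller-0.internalapi.localdomain 5672 \
#       && echo "AMQP reachable" || echo "AMQP unreachable"
```

A successful connect only shows that something is listening on the port; it does not prove the broker is healthy.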

Comment 4 Lukas Bezdicka 2017-06-26 09:51:12 UTC
This is quite common for OpenStack services; we should be able to handle a rabbitmq failure and die gracefully too.

Comment 5 Artem Hrechanychenko 2017-06-26 14:04:30 UTC
Before the update rabbit worked as expected.
During the update it was shut down and didn't recover:
http://pastebin.test.redhat.com/497588

Comment 6 Lukas Bezdicka 2017-06-26 14:56:39 UTC
The deployment uses pacemaker, which means we shut down pacemaker services on the controller node that runs yum; but as it is a single controller, this effectively takes down rabbitmq and mysql.

[root@controller-0 ~]# pcs status
Error: cluster is not currently running on this node

Doing so causes openstack services to loop on AMQP and mariadb connections, and even though they get a shutdown request they neither stop nor restart.

pcs cluster start and pcs resource cleanup could fix this, but I'm pretty sure galera won't survive it anyway. Is this supported by the HA/pacemaker team?
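
For reference, the manual recovery mentioned here could be scripted roughly as below. This is a sketch under stated assumptions, not a supported procedure: the helper name recover_cluster and the PCS override variable are hypothetical, only the pcs status, pcs cluster start and pcs resource cleanup subcommands are used, and nothing in it addresses whether galera comes back cleanly.

```shell
#!/usr/bin/env bash
# Sketch of the manual recovery discussed above; not an official procedure.
# PCS can be overridden (for example for a dry run); it defaults to the real pcs CLI.
PCS="${PCS:-pcs}"

# Hypothetical helper: if the cluster is not running on this node,
# start it and clear failed resource state so pacemaker retries the resources.
recover_cluster() {
    if $PCS status >/dev/null 2>&1; then
        echo "cluster already running"
        return 0
    fi
    echo "cluster not running, starting it"
    $PCS cluster start || return 1
    $PCS resource cleanup
}

# Example (as root on the controller, while the yum update is running):
#   recover_cluster
```

The status check comes first so that the helper does nothing on a node where the cluster is already up.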

Comment 7 Artem Hrechanychenko 2017-06-27 08:04:53 UTC
The seventh attempt at redeployment and update succeeded.

Comment 8 Artem Hrechanychenko 2017-06-27 21:13:13 UTC
Stuck again.
No w/a.
This blocks us in testing the non-ha upgrade from osp11->osp12.


2017-06-27 21:04:47.617 80982 ERROR heat.engine.service [req-2701d8db-688b-4696-be24-6199b41380c2 - - - - -] Service 62b80531-5a95-4554-9339-ef8d51b1be85 update failed: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.15' ([Errno 113] EHOSTUNREACH)")
2017-06-27 21:04:47.617 80979 ERROR heat.engine.service [req-aa1cfb5a-07a8-4bfb-a9cb-77832046e1bc - - - - -] Service 70739846-253a-4816-bde7-8fa639862eae update failed: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.15' ([Errno 113] EHOSTUNREACH)")
2017-06-27 21:04:47.618 80981 ERROR heat.engine.service [req-99cf4795-e6c6-46d0-8255-baae368813b7 - - - - -] Service 8eb3f8ef-06d0-4d8b-9686-62f737815ce5 update failed: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.15' ([Errno 113] EHOSTUNREACH)")
2017-06-27 21:04:50.621 80984 ERROR heat.engine.service [req-75774a56-aca8-4346-bf85-d3d43c4dec6c - - - - -] Service d50eafaa-2331-4ac7-bafe-db344f13b2b2 update failed: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.15' ([Errno 113] EHOSTUNREACH)")
2017-06-27 21:04:50.621 80980 ERROR heat.engine.service [req-431f04f5-28d2-4425-921b-f2a40c9f9edd - - - - -] Service 64658e0c-1d87-4d87-ac9b-16b3b1b31b91 update failed: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.15' ([Errno 113] EHOSTUNREACH)")
2017-06-27 21:04:50.621 80985 ERROR heat.engine.service [req-cba05be3-4214-494b-9b68-9b488467442f - - - - -] Service adf56641-803d-49f0-a264-ba88356dfa27 update failed: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.15' ([Errno 113] EHOSTUNREACH)")
2017-06-27 21:04:50.621 80978 ERROR heat.engine.service [req-2a580256-f07c-4d30-80fd-7850e663bdfa - - - - -] Service 1035c73f-d520-478b-8e66-8c6ac3ccf4d3 update failed: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.15' ([Errno 113] EHOSTUNREACH)")
2017-06-27 21:04:50.621 80983 ERROR heat.engine.service [req-e4615cf4-798c-4104-a70e-afaf259d8916 - - - - -] Service 6dafb60a-54ee-4cec-b1eb-1430441b9b9b update failed: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.15' ([Errno 113] EHOSTUNREACH)")
2017-06-27 21:04:59.785 80978 ERROR oslo.messaging._drivers.impl_rabbit [-] [eea6961b-ffb3-45a8-a4b7-23fda988df3e] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45358
2017-06-27 21:04:59.786 80978 ERROR oslo.messaging._drivers.impl_rabbit [-] [6f5c7f3a-d0be-4b61-aa37-1058f12b7353] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
2017-06-27 21:04:59.787 80978 ERROR oslo.messaging._drivers.impl_rabbit [-] [1cd0b13e-86b8-47e9-861f-6cff12f3149f] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45386
2017-06-27 21:04:59.792 80983 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd3be3e0-6288-47ea-bde9-ee01b5ec3832] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45384
2017-06-27 21:04:59.793 80983 ERROR oslo.messaging._drivers.impl_rabbit [-] [a533e0e3-6577-40e5-93fd-c7d74c649b1c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45364
2017-06-27 21:04:59.794 80983 ERROR oslo.messaging._drivers.impl_rabbit [-] [350b7603-6228-465e-bec8-012bdbc6131b] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45350
2017-06-27 21:04:59.836 80984 ERROR oslo.messaging._drivers.impl_rabbit [-] [71f28f53-986c-49de-b8e6-1b1f26a3fcf0] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45348
2017-06-27 21:04:59.836 80984 ERROR oslo.messaging._drivers.impl_rabbit [-] [c8e685ec-894a-45d6-95f8-e56372372b86] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45368
2017-06-27 21:04:59.837 80984 ERROR oslo.messaging._drivers.impl_rabbit [-] [6e5eb076-0e91-4a76-a6e2-669335cf40c3] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45382
2017-06-27 21:04:59.901 80982 ERROR oslo.messaging._drivers.impl_rabbit [-] [6da7814b-aebf-4503-9bf1-c4d89b65b0f8] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45374
2017-06-27 21:04:59.902 80980 ERROR oslo.messaging._drivers.impl_rabbit [-] [a47deabb-c688-4c91-88a2-5249b2ea7b8c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45344
2017-06-27 21:04:59.902 80982 ERROR oslo.messaging._drivers.impl_rabbit [-] [32b02e87-6008-42b9-b0af-fe20ebb06212] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45360
2017-06-27 21:04:59.902 80980 ERROR oslo.messaging._drivers.impl_rabbit [-] [d80d5bb1-675e-446c-b9df-37005bbba66b] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
2017-06-27 21:04:59.903 80982 ERROR oslo.messaging._drivers.impl_rabbit [-] [f1ae847e-d26f-4f50-b82d-93607d63fdc5] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45346
2017-06-27 21:04:59.903 80980 ERROR oslo.messaging._drivers.impl_rabbit [-] [e3584be9-76b4-4081-9e90-bbd79febdb7c] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45362
2017-06-27 21:04:59.907 80981 ERROR oslo.messaging._drivers.impl_rabbit [-] [4aff3f62-64d6-4235-b4f3-55102e87b8c0] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
2017-06-27 21:04:59.921 80981 ERROR oslo.messaging._drivers.impl_rabbit [-] [a9b8fea3-88b8-4f1c-850a-26889d2163d0] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
2017-06-27 21:04:59.922 80981 ERROR oslo.messaging._drivers.impl_rabbit [-] [ad9d7ac4-a016-47ac-82eb-84f015bfdfd5] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45342
2017-06-27 21:05:00.038 80985 ERROR oslo.messaging._drivers.impl_rabbit [-] [d0376174-50da-41e7-a298-f3d09fafb1f4] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45352
2017-06-27 21:05:00.039 80979 ERROR oslo.messaging._drivers.impl_rabbit [-] [2fd77107-0f57-4700-b09f-e00ea9e61366] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 45366
2017-06-27 21:05:00.047 80985 ERROR oslo.messaging._drivers.impl_rabbit [-] [6f1dba1a-906f-4405-885a-8cce351acc7b] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
2017-06-27 21:05:00.047 80979 ERROR oslo.messaging._drivers.impl_rabbit [-] [d4a2ac11-8b30-4c0a-9e5b-901e5e3c5d06] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
2017-06-27 21:05:00.055 80979 ERROR oslo.messaging._drivers.impl_rabbit [-] [2503334f-a607-4b04-8f26-698debe70e1d] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
2017-06-27 21:05:00.055 80985 ERROR oslo.messaging._drivers.impl_rabbit [-] [fc6d1799-4576-4890-9a65-10a2c9289845] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
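
To see at a glance how the two failure signatures in an excerpt like this are distributed, the lines can be tallied. A minimal sketch, assuming awk is available; the helper name tally_errors is hypothetical and it matches only the two message patterns visible in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally the two failure signatures seen in the log excerpt.
# Reads log text on stdin and prints one "label count" line per signature.
tally_errors() {
    awk '
        /ECONNREFUSED/ && /AMQP server/  { amqp++ }
        /EHOSTUNREACH/ && /MySQL server/ { mysql++ }
        END {
            printf "amqp_refused %d\n", amqp
            printf "mysql_unreachable %d\n", mysql
        }
    '
}

# Example (the log file name is an assumption; use whichever log was pasted):
#   tally_errors < heat-engine.log
```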

Comment 9 Lukas Bezdicka 2017-06-28 13:07:21 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1464588#c6: last time it passed only because pcs cluster start and pcs resource cleanup were run during the yum update on the controller. It will _always_ fail otherwise.

Comment 10 Artem Hrechanychenko 2017-06-29 09:10:53 UTC
Confirmed. Running pcs cluster start and pcs resource cleanup during the yum update on the controller helps.

Comment 11 Sofer Athlan-Guyot 2017-08-18 09:09:32 UTC
Hi Artem,

as Lukas pointed out, non-ha with pacemaker is not supported unless you do some manual workaround.  It's more of a quick dev platform.

I'm closing this as not a bug, but if you still think this should be supported then we can have it as an RFE for the next release, I guess, tracked in its own bz.

Thanks,

Comment 12 Sofer Athlan-Guyot 2017-08-18 10:26:23 UTC
*** Bug 1463287 has been marked as a duplicate of this bug. ***