Bug 1542936

Summary: [UPDATES] Docker restart during minor update causes issues to containers
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: puppet-tripleo
Assignee: Michele Baldessari <michele>
Status: CLOSED ERRATA
QA Contact: Raviv Bar-Tal <rbartal>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: augol, dciabrin, emacchi, jjoyce, jschluet, lbezdick, mbracho, mbultel, mburns, rbartal, slinaber, tvignaud
Target Milestone: z2
Keywords: Regression, Triaged, ZStream
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-tripleo-7.4.8-4.el7ost openstack-tripleo-heat-templates-7.0.9-6.el7ost
Last Closed: 2018-03-28 17:28:51 UTC
Type: Bug
Bug Depends On: 1545356

Description Yurii Prokulevych 2018-02-07 12:00:37 UTC
Description of problem:
-----------------------
The minor update of RHOS-12 failed with the following messages:

Error: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
Error: Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Exec[galera-ready]/returns: change from notrun to 0 failed: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
Info: Class[Tripleo::Profile::Pacemaker::Database::Mysql_bundle]: Unscheduling all events on Class[Tripleo::Profile::Pacemaker::Database::Mysql_bundle]
Info: Creating state file /var/lib/puppet/state/state.yaml
Error: Failed to apply catalog: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)
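
To confirm by hand that the failure comes from the Galera readiness check itself, the same command can be polled outside of Puppet. The following is only a minimal sketch, assuming Python 3 is available on the database node; the helper name `wait_for_exit_zero` and its retry parameters are illustrative and are not taken from puppet-tripleo.

```python
import subprocess
import time


def wait_for_exit_zero(cmd, tries=3, delay=1.0):
    """Run cmd repeatedly until it exits 0 or the tries are exhausted.

    Returns the 1-based attempt number that succeeded, or None if the
    command never returned 0. Output is discarded, mirroring the
    '>/dev/null' in the failing Exec.
    """
    for attempt in range(1, tries + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode == 0:
            return attempt
        if attempt < tries:
            time.sleep(delay)  # pause before the next attempt
    return None
```

For example, `wait_for_exit_zero(["/usr/bin/clustercheck"], tries=5, delay=2.0)` would return `None` for as long as clustercheck keeps exiting non-zero, which is the condition reported by Exec[galera-ready] above.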

From /var/log/messages it seems docker.service was restarted:
-------------------------------------------------------------
Feb  6 09:06:29 localhost puppet-user[471365]:   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Feb  6 09:06:29 localhost puppet-user[471365]: Compiled catalog for database-0.localdomain in environment production in 2.29 seconds
Feb  6 09:06:30 localhost puppet-user[471365]: (/Stage[main]/Tripleo::Profile::Base::Docker/Augeas[docker-sysconfig-registry]/returns) executed successfully
Feb  6 09:06:30 localhost systemd: Stopping Docker Application Container Engine...
Feb  6 09:06:30 localhost dockerd-current: time="2018-02-06T09:06:30.385375196-05:00" level=info msg="Processing signal 'terminated'"
Feb  6 09:06:30 localhost dockerd-current: time="2018-02-06T09:06:30.448329984-05:00" level=warning msg="libcontainerd: container b6ef627065e65634d7f7c5ca514b8a238a9c5b5a6045ab50dbc315c1e75d212c restart canceled"
Feb  6 09:06:40 localhost dockerd-current: time="2018-02-06T09:06:40.388903578-05:00" level=info msg="Container f956f7b5eab67a10b37e589d3f320c6de6cc98fc6e121849470618e47c21de7f failed to exit within 10 seconds of signal 15 - using the force"
Feb  6 09:06:40 localhost crmd[468289]:   error: Unexpected disconnect on remote-node galera-bundle-0
Feb  6 09:06:40 localhost crmd[468289]:   error: Result of monitor operation for galera-bundle-0 on database-0: Error
Feb  6 09:06:40 localhost dockerd-current: time="2018-02-06T09:06:40.562707692-05:00" level=info msg="stopping containerd after receiving terminated"
Feb  6 09:06:40 localhost crmd[468289]:  notice: Node galera-bundle-0 state is now lost
Feb  6 09:06:40 localhost crmd[468289]:  notice: Result of stop operation for galera-bundle-0 on database-0: 0 (ok)
Feb  6 09:06:40 localhost attrd[468287]:  notice: Removing all galera-bundle-0 attributes for peer database-0
Feb  6 09:06:40 localhost crmd[468289]:  notice: Result of stop operation for galera-bundle-docker-0 on database-0: 0 (ok)
Feb  6 09:06:40 localhost attrd[468287]:  notice: Removing all galera-bundle-0 attributes for peer messaging-1
Feb  6 09:06:40 localhost docker(galera-bundle-docker-0)[472094]: INFO: checking for nsenter, which is required when 'monitor_cmd' is specified
Feb  6 09:06:40 localhost docker(galera-bundle-docker-0)[472094]: NOTICE: Image (192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest) does not exist locally but will be pulled during start
Feb  6 09:06:40 localhost docker(galera-bundle-docker-0)[472094]: NOTICE: Beginning pull of image, 192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest
Feb  6 09:06:40 localhost docker(galera-bundle-docker-0)[472094]: ERROR: failed to pull image 192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest
Feb  6 09:06:40 localhost lrmd[468286]:  notice: galera-bundle-docker-0_start_0:472094:stderr [ Cannot connect to the Docker daemon. Is the docker daemon running on this host? ]
Feb  6 09:06:40 localhost lrmd[468286]:  notice: galera-bundle-docker-0_start_0:472094:stderr [ Cannot connect to the Docker daemon. Is the docker daemon running on this host? ]
Feb  6 09:06:40 localhost lrmd[468286]:  notice: galera-bundle-docker-0_start_0:472094:stderr [ ocf-exit-reason:failed to pull image 192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest ]
Feb  6 09:06:40 localhost crmd[468289]:  notice: Result of start operation for galera-bundle-docker-0 on database-0: 1 (unknown error)
Feb  6 09:06:40 localhost crmd[468289]:  notice: database-0-galera-bundle-docker-0_start_0:29 [ Cannot connect to the Docker daemon. Is the docker daemon running on this host?\nCannot connect to the Docker daemon. Is the docker daemon running on this host?\nocf-exit-reason:failed to pull image 192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest\n ]
Feb  6 09:06:41 localhost crmd[468289]:  notice: Result of stop operation for galera-bundle-docker-0 on database-0: 0 (ok)
Feb  6 09:06:41 localhost attrd[468287]:  notice: Removing all galera-bundle-0 attributes for peer messaging-1
Feb  6 09:06:41 localhost systemd: Starting Docker Storage Setup...
Feb  6 09:06:41 localhost systemd: Starting SystemWide Container Registries
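
The ordering visible in the excerpt above (Augeas change to the Docker sysconfig, then systemd stopping Docker, then Pacemaker losing galera-bundle-0) can be checked mechanically on other nodes. This is a rough sketch, assuming the syslog line format shown above; the function name and the choice of marker substrings are illustrative only.

```python
import re

# Substrings copied from the log excerpt above; the event labels are made up.
MARKERS = [
    ("augeas_change", "Augeas[docker-sysconfig-registry]"),
    ("docker_stop", "Stopping Docker Application Container Engine"),
    ("bundle_lost", "Unexpected disconnect on remote-node"),
    ("daemon_unreachable", "Cannot connect to the Docker daemon"),
]

# Matches a leading syslog timestamp such as "Feb  6 09:06:30".
TIMESTAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s")


def first_occurrences(lines):
    """Return [(event, timestamp)] in the order the events first appear."""
    found = []
    seen = set()
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # e.g. the tail of a wrapped line, which has no timestamp
        for event, needle in MARKERS:
            if event not in seen and needle in line:
                seen.add(event)
                found.append((event, match.group(1)))
    return found
```

Passing an open file handle for /var/log/messages to `first_occurrences` yields the events in file order, so a `docker_stop` entry directly after `augeas_change` would match the pattern reported here.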


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-tripleo-7.4.3-11.el7ost.noarch

Steps to Reproduce:
-------------------
1. Install RHOS-12 GA and attempt a minor update to 2018-01-26.2


Actual results:
---------------
Update failed

Additional info:
----------------
Virtual setup: 3 controllers + 3 database + 3 messaging + 3 ceph + 2 compute + 2 networker

Comment 12 errata-xmlrpc 2018-03-28 17:28:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0607