Bug 1542936 - [UPDATES] Docker restart during minor update causes issues to containers
Summary: [UPDATES] Docker restart during minor update causes issues to containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Target Release: 12.0 (Pike)
Assignee: Michele Baldessari
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On: 1545356
Blocks:
Reported: 2018-02-07 12:00 UTC by Yurii Prokulevych
Modified: 2018-03-28 17:29 UTC (History)
CC List: 12 users

Fixed In Version: puppet-tripleo-7.4.8-4.el7ost openstack-tripleo-heat-templates-7.0.9-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 17:28:51 UTC
Target Upstream Version:
Embargoed:

Links
Launchpad 1747851 (last updated 2018-02-07 13:32:14 UTC)
OpenStack gerrit 543001, MERGED: Use --live-restore when setting up the docker service (last updated 2020-09-17 10:30:50 UTC)
OpenStack gerrit 550239, MERGED: Add --live-restore to the docker_options in puppet/services/docker.yaml (last updated 2020-09-17 10:30:48 UTC)
OpenStack gerrit 550240, MERGED: Improve the minor update of the docker service (last updated 2020-09-17 10:30:48 UTC)
OpenStack gerrit 550241, MERGED: Make the minor update for docker idempotent (last updated 2020-09-17 10:30:50 UTC)
Red Hat Product Errata RHBA-2018:0607 (last updated 2018-03-28 17:29:25 UTC)
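
The merged changes above add --live-restore to the Docker daemon options. With live restore enabled, dockerd can be stopped and restarted without shutting down the containers it manages, so a daemon restart during the update no longer takes the galera bundle down with it. Below is a minimal Python sketch, not the merged puppet-tripleo code, of an idempotent edit that appends the flag to an OPTIONS='...' line like the one RHEL 7's docker packages read from /etc/sysconfig/docker. The function name ensure_live_restore and the assumed quoting of the OPTIONS line are hypothetical.

```python
import re

# Assumption: daemon flags live on a single OPTIONS='...' or OPTIONS="..." line,
# as in RHEL 7's /etc/sysconfig/docker. Multi-line or unquoted values are not handled.
_OPTIONS_RE = re.compile(r"""^(\s*OPTIONS=)(['"])(.*)\2\s*$""")


def _has_live_restore(opts: str) -> bool:
    """True if the flag is already set, either bare or as --live-restore=<value>."""
    return any(f == "--live-restore" or f.startswith("--live-restore=")
               for f in opts.split())


def ensure_live_restore(sysconfig_text: str) -> str:
    """Return the sysconfig text with --live-restore appended to OPTIONS.

    Idempotent: a line that already carries the flag is left exactly as it was,
    so running this twice (e.g. on a repeated minor update) changes nothing.
    """
    out = []
    for line in sysconfig_text.splitlines():
        m = _OPTIONS_RE.match(line)
        if m and not _has_live_restore(m.group(3)):
            prefix, quote, opts = m.groups()
            new_opts = (opts + " --live-restore").strip()
            line = f"{prefix}{quote}{new_opts}{quote}"
        out.append(line)
    # Preserve the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if sysconfig_text.endswith("\n") else "")
```

The daemon still has to be restarted once to pick up the flag; only later restarts benefit from it. Docker also accepts the equivalent "live-restore": true in /etc/docker/daemon.json, but setting the same option both there and on the command line makes dockerd refuse to start, so only one place should carry it.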

Description Yurii Prokulevych 2018-02-07 12:00:37 UTC
Description of problem:
-----------------------
The minor update of RHOS-12 failed with the following message:

Error: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
Error: Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Exec[galera-ready]/returns: change from notrun to 0 failed: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]
Info: Class[Tripleo::Profile::Pacemaker::Database::Mysql_bundle]: Unscheduling all events on Class[Tripleo::Profile::Pacemaker::Database::Mysql_bundle]
Info: Creating state file /var/lib/puppet/state/state.yaml
Error: Failed to apply catalog: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)

From /var/log/messages it seems docker.service was restarted:
-------------------------------------------------------------
Feb  6 09:06:29 localhost puppet-user[471365]:   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')
Feb  6 09:06:29 localhost puppet-user[471365]: Compiled catalog for database-0.localdomain in environment production in 2.29 seconds
Feb  6 09:06:30 localhost puppet-user[471365]: (/Stage[main]/Tripleo::Profile::Base::Docker/Augeas[docker-sysconfig-registry]/returns) executed successfully
Feb  6 09:06:30 localhost systemd: Stopping Docker Application Container Engine...
Feb  6 09:06:30 localhost dockerd-current: time="2018-02-06T09:06:30.385375196-05:00" level=info msg="Processing signal 'terminated'"
Feb  6 09:06:30 localhost dockerd-current: time="2018-02-06T09:06:30.448329984-05:00" level=warning msg="libcontainerd: container b6ef627065e65634d7f7c5ca514b8a238a9c5b5a6045ab50dbc315c1e75d212c restart canceled"
Feb  6 09:06:40 localhost dockerd-current: time="2018-02-06T09:06:40.388903578-05:00" level=info msg="Container f956f7b5eab67a10b37e589d3f320c6de6cc98fc6e121849470618e47c21de7f failed to exit within 10 seconds of signal 15 - using the force"
Feb  6 09:06:40 localhost crmd[468289]:   error: Unexpected disconnect on remote-node galera-bundle-0
Feb  6 09:06:40 localhost crmd[468289]:   error: Result of monitor operation for galera-bundle-0 on database-0: Error
Feb  6 09:06:40 localhost dockerd-current: time="2018-02-06T09:06:40.562707692-05:00" level=info msg="stopping containerd after receiving terminated"
Feb  6 09:06:40 localhost crmd[468289]:  notice: Node galera-bundle-0 state is now lost
Feb  6 09:06:40 localhost crmd[468289]:  notice: Result of stop operation for galera-bundle-0 on database-0: 0 (ok)
Feb  6 09:06:40 localhost attrd[468287]:  notice: Removing all galera-bundle-0 attributes for peer database-0
Feb  6 09:06:40 localhost crmd[468289]:  notice: Result of stop operation for galera-bundle-docker-0 on database-0: 0 (ok)
Feb  6 09:06:40 localhost attrd[468287]:  notice: Removing all galera-bundle-0 attributes for peer messaging-1
Feb  6 09:06:40 localhost docker(galera-bundle-docker-0)[472094]: INFO: checking for nsenter, which is required when 'monitor_cmd' is specified
Feb  6 09:06:40 localhost docker(galera-bundle-docker-0)[472094]: NOTICE: Image (192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest) does not exist locally but will be pulled during start
Feb  6 09:06:40 localhost docker(galera-bundle-docker-0)[472094]: NOTICE: Beginning pull of image, 192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest
Feb  6 09:06:40 localhost docker(galera-bundle-docker-0)[472094]: ERROR: failed to pull image 192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest
Feb  6 09:06:40 localhost lrmd[468286]:  notice: galera-bundle-docker-0_start_0:472094:stderr [ Cannot connect to the Docker daemon. Is the docker daemon running on this host? ]
Feb  6 09:06:40 localhost lrmd[468286]:  notice: galera-bundle-docker-0_start_0:472094:stderr [ Cannot connect to the Docker daemon. Is the docker daemon running on this host? ]
Feb  6 09:06:40 localhost lrmd[468286]:  notice: galera-bundle-docker-0_start_0:472094:stderr [ ocf-exit-reason:failed to pull image 192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest ]
Feb  6 09:06:40 localhost crmd[468289]:  notice: Result of start operation for galera-bundle-docker-0 on database-0: 1 (unknown error)
Feb  6 09:06:40 localhost crmd[468289]:  notice: database-0-galera-bundle-docker-0_start_0:29 [ Cannot connect to the Docker daemon. Is the docker daemon running on this host?\nCannot connect to the Docker daemon. Is the docker daemon running on this host?\nocf-exit-reason:failed to pull image 192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest\n ]
Feb  6 09:06:41 localhost crmd[468289]:  notice: Result of stop operation for galera-bundle-docker-0 on database-0: 0 (ok)
Feb  6 09:06:41 localhost attrd[468287]:  notice: Removing all galera-bundle-0 attributes for peer messaging-1
Feb  6 09:06:41 localhost systemd: Starting Docker Storage Setup...
Feb  6 09:06:41 localhost systemd: Starting SystemWide Container Registries
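
The excerpt shows the failure pattern: systemd stops dockerd, and within seconds crmd reports that the galera-bundle-0 remote node disconnected and was lost. A minimal Python sketch for spotting the same pattern in /var/log/messages on other nodes is below. It assumes the classic RHEL 7 syslog prefix shown above ("Mon DD HH:MM:SS host tag: message"); the function name find_docker_restart_impact is hypothetical, and the matched substrings are taken from this excerpt rather than from any official tool.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: classic syslog prefix, e.g. "Feb  6 09:06:40 localhost crmd[468289]: ...".
_LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+([^:]+):\s*(.*)$")


def find_docker_restart_impact(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) for Pacemaker remote-node losses that
    occur while dockerd is being stopped or restarted."""
    docker_down = False
    hits = []
    for raw in lines:
        m = _LINE_RE.match(raw)
        if not m:
            continue  # wrapped continuation lines have no syslog prefix
        ts, tag, msg = m.groups()
        if tag == "systemd" and msg.startswith(
                "Stopping Docker Application Container Engine"):
            docker_down = True
        elif tag == "systemd" and msg.startswith(
                "Started Docker Application Container Engine"):
            docker_down = False  # systemd's usual "Started <Description>." line
        elif docker_down and tag.startswith("crmd[") and (
                "Unexpected disconnect" in msg or "state is now lost" in msg):
            hits.append((ts, msg))
    return hits
```

It can be fed an open file directly, for example by iterating over open("/var/log/messages"), since the regex stops before each line's trailing newline.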


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-tripleo-7.4.3-11.el7ost.noarch

Steps to Reproduce:
-------------------
1. Install RHOS-12 GA and attempt a minor update to 2018-01-26.2


Actual results:
---------------
The minor update failed.

Additional info:
----------------
Virtual setup: 3 controllers + 3 database + 3 messaging + 3 ceph + 2 compute + 2 networker

Comment 12 errata-xmlrpc 2018-03-28 17:28:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0607
