Bug 1684646

Summary: docker daemon becomes unresponsive during redeploy
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Reporter: Lars Kellogg-Stedman <lars>
Assignee: Emilien Macchi <emacchi>
QA Contact: Gurenko Alex <agurenko>
CC: aschultz, dbecker, lars, mburns, morazi
Keywords: Triaged, ZStream
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-03-04 01:57:36 UTC
Type: Bug

Description Lars Kellogg-Stedman 2019-03-01 18:17:57 UTC
Description of problem:

Re-running the 'openstack overcloud deploy ...' command in an OSP 13 overcloud results in the Docker daemon on all overcloud hosts becoming unresponsive.  On all the systems, we see paunch hanging on this:

    /usr/bin/python2 /usr/bin/paunch --debug apply --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_3.json --config-id tripleo_step3 --managed-by tripleo-ComputeDeployedServer

At this point, commands that do not interact with containers ("docker info", "docker image ls") work fine, but commands that interact with containers ("docker container ls") hang indefinitely.

Restarting the daemon ("systemctl restart docker") seems to restore the expected behavior, but it's unclear whether that will allow the deploy operation to complete.
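
The symptom above (some docker commands return, others hang) can be checked on each host with a small probe. This is only a rough sketch, not part of the original report: it assumes Python 3 is available on the host, and the helper names `probe` and `check_docker` and the 10-second timeout are made up for illustration.

```python
import subprocess

# The three commands named in the description. The first two were reported
# to work; "docker container ls" was reported to hang.
DOCKER_CHECKS = [
    ["docker", "info"],
    ["docker", "image", "ls"],
    ["docker", "container", "ls"],
]


def probe(cmd, timeout=10.0):
    """Run cmd and classify the outcome as 'ok', 'error', 'hung' or 'missing'.

    'hung' only means the command did not finish within `timeout` seconds;
    the timeout value is an arbitrary choice, not something from the report.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        return "missing"
    return "ok" if result.returncode == 0 else "error"


def check_docker(timeout=10.0):
    """Probe each docker command and return {command string: status}."""
    return {" ".join(cmd): probe(cmd, timeout) for cmd in DOCKER_CHECKS}
```

Calling `check_docker()` on an affected host would be expected to report "hung" for "docker container ls" while the other two entries report "ok", if the behavior matches the description.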


Version-Release number of selected component (if applicable):

OSP 13

Comment 1 Lars Kellogg-Stedman 2019-03-01 18:21:34 UTC
Restarting docker caused the deploy to fail.  This is not the first time we have seen this behavior. I'm going to re-run the deploy to see if it reproduces immediately or not.

Comment 2 Alex Schultz 2019-03-01 20:19:31 UTC
Please provide a sosreport for the system and the steps to reproduce this issue. Does it happen all the time or randomly?

Comment 3 Lars Kellogg-Stedman 2019-03-04 01:57:36 UTC
It happens all the time, and appears to be a dupe of either https://bugzilla.redhat.com/show_bug.cgi?id=1671861 or https://bugzilla.redhat.com/show_bug.cgi?id=1678096.  Upgrading to docker-1.13.1-94.gitb2f74b2.el7.x86_64 seems to resolve the issue.
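
To see whether a host already runs the build mentioned above, the package string printed by "rpm -q docker" can be split into its parts. This is a hedged sketch: the function names are hypothetical, it only looks at the leading integer of the release field rather than doing a full RPM version comparison, and treating release 94 of 1.13.1 as the threshold is an assumption based solely on the build named in this comment.

```python
import re

# Build reported in this comment as resolving the issue.
REPORTED_BUILD = "docker-1.13.1-94.gitb2f74b2.el7.x86_64"


def parse_rpm_name(full_name):
    """Split 'name-version-release.arch' into (name, version, release, arch)."""
    rest, arch = full_name.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


def release_number(release):
    """Return the leading integer of a release string, or None if absent."""
    match = re.match(r"\d+", release)
    return int(match.group()) if match else None


def is_reported_build_or_later(full_name):
    """Assumption: same upstream version, leading release integer >= 94.

    This is a simplification and does not implement RPM's version ordering.
    """
    _, ref_version, ref_release, _ = parse_rpm_name(REPORTED_BUILD)
    name, version, release, _ = parse_rpm_name(full_name)
    number = release_number(release)
    if name != "docker" or version != ref_version or number is None:
        return False
    return number >= release_number(ref_release)
```

For the build named above, `parse_rpm_name` yields the name "docker", version "1.13.1", release "94.gitb2f74b2.el7" and architecture "x86_64".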

*** This bug has been marked as a duplicate of bug 1678096 ***