Bug 1286326

Summary: docker timeouts when waiting for mkfs to finish on devicemapper storage driver device
Product: Red Hat Enterprise Linux 7 Reporter: Jaroslav Henner <jhenner>
Component: docker-latestAssignee: Lokesh Mandvekar <lsm5>
Status: CLOSED ERRATA QA Contact: atomic-bugs <atomic-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.1CC: dwalsh, lsm5, lsu
Target Milestone: rcKeywords: Extras
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-12 14:54:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jaroslav Henner 2015-11-28 04:11:02 UTC
Description of problem:

By default on RHEL 7.1 docker creates sparse file one upon start (systemctl start docker). That file is used as a loopback-device (seemingly using dmsetup syscall eqivalents), then issues

mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base

This command may take several minutes to complete. Meanwhile the avgqu-sz gauge of sar shoots up to >25000. Then the systemctl start reports timeout.


Version-Release number of selected component (if applicable):
docker-1.7.1-115.el7.x86_64


How reproducible:
Something like 5 out of 10 cases on all-in-one Openstack VMs on machines I obtained from beaker. Setting unsafe caching of the VM disk seems to decrease the occurence


Steps to Reproduce:
# systemctl stop docker
#dmsetup remove /dev/dm-0
#dmsetup remove /dev/dm-1
# rm /var/lib/docker/devicemapper/devicemapper/*
# systemctl start docker


Actual results:
[root@test-node-compute-f429d openshift]# systemctl status -l docker.service
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─docker-sdn-ovs.conf
   Active: failed (Result: timeout) since Út 2015-09-29 15:30:25 UTC; 1min 53s ago
     Docs: http://docs.docker.com
 Main PID: 25025

zář 29 15:27:25 test-node-compute-f429d.example.com systemd[1]: Starting Docker Application Container Engine...
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.975880434Z" level=warning msg="--storage-opt dm.thinpooldev is preferred over --storage-opt dm.datadev or dm.metadatadev"
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.976662822Z" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.978128795Z" level=error msg="Unable to delete device: Error running DeleteDevice dm_task_run failed"
zář 29 15:28:55 test-node-compute-f429d.example.com systemd[1]: docker.service operation timed out. Terminating.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: docker.service stopping timed out (2). Killing.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: Failed to start Docker Application Container Engine.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: Unit docker.service entered failed state.


Expected results:
service started


Additional info:
Everything written in the Description in this BZ is just a copy of https://github.com/docker/docker/issues/16653. I suggest to discuss upstream. This bug is often triggered by using the openshift-ansible which is recommended way of deploying Openshift, but it is not packaged and branded, therefore I am quite unsure how to report it downstream to make sure it is fixed and documented somehow.

Comment 1 Daniel Walsh 2015-12-01 22:42:25 UTC
Lokesh setting the timout to 0 will fix this, Probably can close this as a duplicate if you have a RHEL bug for this now.

Comment 2 Daniel Walsh 2016-01-06 18:51:22 UTC
Lokesh any update.

Comment 3 Daniel Walsh 2016-02-22 21:25:06 UTC
Fixed in docker-1.10

Comment 4 Lokesh Mandvekar 2016-04-18 18:32:38 UTC
changing component based on docker version

Comment 6 Luwen Su 2016-05-04 06:36:15 UTC
This works for me in docker-latest-1.10.3-21.el7.x86_64,

https://github.com/docker/docker/pull/18310

Normally start the docker-latest service:
INFO[0001] devmapper: Successfully created filesystem xfs on device docker-8:3-48631329-base 


The command
#mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base  & service docker-latest restart

1.mkfs.ext4 will finish with no hang
2.docker-latest service will fail to start since the failure of creating filesystem, as the patch expected 

Move to verified

Comment 8 errata-xmlrpc 2016-05-12 14:54:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1057.html