Bug 1286326

Summary:	docker timeouts when waiting for mkfs to finish on devicemapper storage driver device
Product:	Red Hat Enterprise Linux 7	Reporter:	Jaroslav Henner <jhenner>
Component:	docker-latest	Assignee:	Lokesh Mandvekar <lsm5>
Status:	CLOSED ERRATA	QA Contact:	atomic-bugs <atomic-bugs>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	7.1	CC:	dwalsh, lsm5, lsu
Target Milestone:	rc	Keywords:	Extras
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-05-12 14:54:17 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jaroslav Henner 2015-11-28 04:11:02 UTC

Description of problem:

By default on RHEL 7.1 docker creates sparse file one upon start (systemctl start docker). That file is used as a loopback-device (seemingly using dmsetup syscall eqivalents), then issues

mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base

This command may take several minutes to complete. Meanwhile the avgqu-sz gauge of sar shoots up to >25000. Then the systemctl start reports timeout.


Version-Release number of selected component (if applicable):
docker-1.7.1-115.el7.x86_64


How reproducible:
Something like 5 out of 10 cases on all-in-one Openstack VMs on machines I obtained from beaker. Setting unsafe caching of the VM disk seems to decrease the occurence


Steps to Reproduce:
# systemctl stop docker
#dmsetup remove /dev/dm-0
#dmsetup remove /dev/dm-1
# rm /var/lib/docker/devicemapper/devicemapper/*
# systemctl start docker


Actual results:
[root@test-node-compute-f429d openshift]# systemctl status -l docker.service
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─docker-sdn-ovs.conf
   Active: failed (Result: timeout) since Út 2015-09-29 15:30:25 UTC; 1min 53s ago
     Docs: http://docs.docker.com
 Main PID: 25025

zář 29 15:27:25 test-node-compute-f429d.example.com systemd[1]: Starting Docker Application Container Engine...
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.975880434Z" level=warning msg="--storage-opt dm.thinpooldev is preferred over --storage-opt dm.datadev or dm.metadatadev"
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.976662822Z" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.978128795Z" level=error msg="Unable to delete device: Error running DeleteDevice dm_task_run failed"
zář 29 15:28:55 test-node-compute-f429d.example.com systemd[1]: docker.service operation timed out. Terminating.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: docker.service stopping timed out (2). Killing.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: Failed to start Docker Application Container Engine.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: Unit docker.service entered failed state.


Expected results:
service started


Additional info:
Everything written in the Description in this BZ is just a copy of https://github.com/docker/docker/issues/16653. I suggest to discuss upstream. This bug is often triggered by using the openshift-ansible which is recommended way of deploying Openshift, but it is not packaged and branded, therefore I am quite unsure how to report it downstream to make sure it is fixed and documented somehow.

Comment 1 Daniel Walsh 2015-12-01 22:42:25 UTC

Lokesh setting the timout to 0 will fix this, Probably can close this as a duplicate if you have a RHEL bug for this now.

Comment 2 Daniel Walsh 2016-01-06 18:51:22 UTC

Lokesh any update.

Comment 3 Daniel Walsh 2016-02-22 21:25:06 UTC

Fixed in docker-1.10

Comment 4 Lokesh Mandvekar 2016-04-18 18:32:38 UTC

changing component based on docker version

Comment 6 Luwen Su 2016-05-04 06:36:15 UTC

This works for me in docker-latest-1.10.3-21.el7.x86_64,

https://github.com/docker/docker/pull/18310

Normally start the docker-latest service:
INFO[0001] devmapper: Successfully created filesystem xfs on device docker-8:3-48631329-base 


The command
#mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base  & service docker-latest restart

1.mkfs.ext4 will finish with no hang
2.docker-latest service will fail to start since the failure of creating filesystem, as the patch expected 

Move to verified

Comment 8 errata-xmlrpc 2016-05-12 14:54:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1057.html