Bug 1286326 - docker timeouts when waiting for mkfs to finish on devicemapper storage driver device
docker timeouts when waiting for mkfs to finish on devicemapper storage drive...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker-latest (Show other bugs)
7.1
Unspecified Unspecified
unspecified Severity medium
: rc
: ---
Assigned To: Lokesh Mandvekar
atomic-bugs@redhat.com
: Extras
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-11-27 23:11 EST by Jaroslav Henner
Modified: 2016-05-12 10:54 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-12 10:54:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Jaroslav Henner 2015-11-27 23:11:02 EST
Description of problem:

By default on RHEL 7.1 docker creates sparse file one upon start (systemctl start docker). That file is used as a loopback-device (seemingly using dmsetup syscall eqivalents), then issues

mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base

This command may take several minutes to complete. Meanwhile the avgqu-sz gauge of sar shoots up to >25000. Then the systemctl start reports timeout.


Version-Release number of selected component (if applicable):
docker-1.7.1-115.el7.x86_64


How reproducible:
Something like 5 out of 10 cases on all-in-one Openstack VMs on machines I obtained from beaker. Setting unsafe caching of the VM disk seems to decrease the occurence


Steps to Reproduce:
# systemctl stop docker
#dmsetup remove /dev/dm-0
#dmsetup remove /dev/dm-1
# rm /var/lib/docker/devicemapper/devicemapper/*
# systemctl start docker


Actual results:
[root@test-node-compute-f429d openshift]# systemctl status -l docker.service
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─docker-sdn-ovs.conf
   Active: failed (Result: timeout) since Út 2015-09-29 15:30:25 UTC; 1min 53s ago
     Docs: http://docs.docker.com
 Main PID: 25025

zář 29 15:27:25 test-node-compute-f429d.example.com systemd[1]: Starting Docker Application Container Engine...
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.975880434Z" level=warning msg="--storage-opt dm.thinpooldev is preferred over --storage-opt dm.datadev or dm.metadatadev"
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.976662822Z" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
zář 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.978128795Z" level=error msg="Unable to delete device: Error running DeleteDevice dm_task_run failed"
zář 29 15:28:55 test-node-compute-f429d.example.com systemd[1]: docker.service operation timed out. Terminating.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: docker.service stopping timed out (2). Killing.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: Failed to start Docker Application Container Engine.
zář 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: Unit docker.service entered failed state.


Expected results:
service started


Additional info:
Everything written in the Description in this BZ is just a copy of https://github.com/docker/docker/issues/16653. I suggest to discuss upstream. This bug is often triggered by using the openshift-ansible which is recommended way of deploying Openshift, but it is not packaged and branded, therefore I am quite unsure how to report it downstream to make sure it is fixed and documented somehow.
Comment 1 Daniel Walsh 2015-12-01 17:42:25 EST
Lokesh setting the timout to 0 will fix this, Probably can close this as a duplicate if you have a RHEL bug for this now.
Comment 2 Daniel Walsh 2016-01-06 13:51:22 EST
Lokesh any update.
Comment 3 Daniel Walsh 2016-02-22 16:25:06 EST
Fixed in docker-1.10
Comment 4 Lokesh Mandvekar 2016-04-18 14:32:38 EDT
changing component based on docker version
Comment 6 Luwen Su 2016-05-04 02:36:15 EDT
This works for me in docker-latest-1.10.3-21.el7.x86_64,

https://github.com/docker/docker/pull/18310

Normally start the docker-latest service:
INFO[0001] devmapper: Successfully created filesystem xfs on device docker-8:3-48631329-base 


The command
#mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base  & service docker-latest restart

1.mkfs.ext4 will finish with no hang
2.docker-latest service will fail to start since the failure of creating filesystem, as the patch expected 

Move to verified
Comment 8 errata-xmlrpc 2016-05-12 10:54:17 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1057.html

Note You need to log in before you can comment on or make changes to this bug.