RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1263394 - Docker does not start up on the node
Summary: Docker does not start up on the node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.1
Hardware: Unspecified
OS: Linux
unspecified
high
Target Milestone: rc
: 7.1
Assignee: Lokesh Mandvekar
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
: 1230389 (view as bug list)
Depends On:
Blocks: 1264193
TreeView+ depends on / blocked
 
Reported: 2015-09-15 18:09 UTC by Veer Muchandi
Modified: 2019-03-06 01:04 UTC (History)
8 users (show)

Fixed In Version: docker-1.9.1-2.el7_2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1264193 (view as bug list)
Environment:
Last Closed: 2016-01-07 21:59:24 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Veer Muchandi 2015-09-15 18:09:06 UTC
Description of problem:
Nodes have been running for a few weeks now. Suddenly one node went down and Docker does not start. Again same thing happened to another node.

Version-Release number of selected component (if applicable):
3.0.1

How reproducible:
This happened in the existing OpenShift environment on its own. I cannot recreate the situation. I can provide the logs from the node that went down. 

Steps to Reproduce:
1.
2.
3.

Actual results:
Node is supposed to run without issues. Docker should start up. 

Expected results:
Docker does not start.

Additional info:

# systemctl start docker
Job for docker.service failed. See 'systemctl status docker.service' and 'journalctl -xn' for details.


[root@node1 cloud-user]# journalctl -xn -u docker -l
-- Logs begin at Mon 2015-09-14 20:42:58 EDT, end at Mon 2015-09-14 21:28:05 EDT. --
Sep 14 20:44:56 node1.demov3.osecloud.com docker[1137]: time="2015-09-14T20:44:56-04:00" level=info msg="Received signal 'terminated', starting shutdown of docker..."
Sep 14 20:44:56 node1.demov3.osecloud.com systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has failed.
-- 
-- The result is failed.
Sep 14 20:44:56 node1.demov3.osecloud.com systemd[1]: Unit docker.service entered failed state.
Sep 14 21:26:35 node1.demov3.osecloud.com systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has begun starting up.
Sep 14 21:26:35 node1.demov3.osecloud.com docker[1317]: time="2015-09-14T21:26:35-04:00" level=info msg="+job serveapi(unix:///var/run/docker.sock)"
Sep 14 21:26:35 node1.demov3.osecloud.com docker[1317]: time="2015-09-14T21:26:35-04:00" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
Sep 14 21:28:05 node1.demov3.osecloud.com systemd[1]: docker.service operation timed out. Terminating.
Sep 14 21:28:05 node1.demov3.osecloud.com docker[1317]: time="2015-09-14T21:28:05-04:00" level=info msg="Received signal 'terminated', starting shutdown of docker..."
Sep 14 21:28:05 node1.demov3.osecloud.com systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has failed.
-- 
-- The result is failed.
Sep 14 21:28:05 node1.demov3.osecloud.com systemd[1]: Unit docker.service entered failed state.

Comment 3 Daniel Walsh 2015-09-15 19:37:06 UTC
Could you be out of space?

Comment 4 Veer Muchandi 2015-09-15 20:24:16 UTC
There is space
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        20G  5.5G   15G  28% /
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G   17M  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup

Comment 5 Daniel Walsh 2015-09-15 20:46:42 UTC
What happens if you run 

docker -d

Comment 6 Veer Muchandi 2015-09-15 20:57:49 UTC
# docker -d
INFO[0000] +job serveapi(unix:///var/run/docker.sock)   
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock) 
ERRO[0000] WARNING: No --storage-opt dm.thinpooldev specified, using loopback; this configuration is strongly discouraged for production use

Comment 8 Daniel Walsh 2015-09-15 21:11:22 UTC
Ok So docker is working without all of the other flags.

vivek what are the standard docker flags on RHEL?

docker -d --selinux-enabled --storage-opt ...

Comment 9 Daniel Walsh 2015-09-15 21:12:53 UTC
Could this be running out of the 10 gig of space in the devicemapper?

Comment 10 Vivek Goyal 2015-09-15 21:21:40 UTC
Try --storage-opt dm.basesize=10G and see if same problem happens.

People have complained that mkfs.xfs is taking time with 100G thin devices when loopback devices are being used on top of cloud storage. And system times out.

In my testing it was taking 30 seoconds. Primarily it is slower storage problem.

Comment 11 Vivek Goyal 2015-09-15 21:23:12 UTC
For dm.basesize=10G to be effective, you will have to setup a fresh instance of docker. If you restart already setup instance with new parameter, it will not help.

Comment 12 Veer Muchandi 2015-09-15 21:39:14 UTC
# docker -d --storage-opt dm.basesize=10G
ERRO[0000] WARNING: No --storage-opt dm.thinpooldev specified, using loopback; this configuration is strongly discouraged for production use 
ERRO[0000] Unable to delete device: Error running DeleteDevice dm_task_run failed 
INFO[0000] +job serveapi(unix:///var/run/docker.sock)   
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock) 
INFO[0008] +job init_networkdriver()                    
INFO[0008] -job init_networkdriver() = OK (0)           
INFO[0008] Loading containers: start.                   
..........................................................................................................................................................................................................................................
INFO[0008] Loading containers: done.                    
INFO[0008] docker daemon: 1.6.2 ba1f6c3/1.6.2; execdriver: native-0.2; graphdriver: devicemapper 
INFO[0008] +job acceptconnections()                     
INFO[0008] -job acceptconnections() = OK (0)            
INFO[0008] Daemon has completed initialization

Comment 13 Veer Muchandi 2015-09-15 22:46:23 UTC
after running docker -d, I tried starting docker again with systemctl start docker and it starts
Not sure what the difference is but it at least starts up.

Comment 14 Vivek Goyal 2015-09-16 12:08:52 UTC
Delays happen only first time when mkfs.xfs is being done. If you have run docker with "-d" successfully, that means mkfs is done and when you run next time using systemd it succeeds as it does not go through mkfs step.

Comment 15 Daniel Walsh 2015-09-16 12:32:12 UTC
If you are running on a loopback device, you should really consider moving to a physical device, We are heavily recommending that people do not run loopback devmapper in production.

Comment 16 Veer Muchandi 2015-09-17 22:25:36 UTC
Inspite of docker service coming up after I run docker -d, it seems to be causing issues. I am not able to start any pods on openshift after this. They are all in pending status.

Comment 17 Daniel Walsh 2015-09-18 13:26:41 UTC
Does something like 

docker run --rm fedora echo hello

Work?

Comment 18 Veer Muchandi 2015-09-18 13:49:16 UTC
Dan, Last night I ended up scrapping these boxes as I need OpenShift. I am reinstalling the whole thing again,

Comment 19 Daniel Walsh 2015-09-28 17:52:22 UTC
Ok reopen if it happens again.

Comment 20 Scott Dodson 2015-10-22 16:05:53 UTC
docker-1.8.2-7.el7.x86_64 seems to have reduced the timeout to 60 seconds which is less than the systemd default of 90 seconds. When this happened I'm unable to start docker after cleaning up /var/lib/docker.

I believe the intention was to increase the timeout rather than decrease it so I think this should be increased to 2min or more.

On my laptop I can reliably create the 100G loopback device in 90 seconds but not 60 seconds. Cloud storage may be considerably slower than the SSD in my laptop however.

I'll try to run a few rounds of tests in AWS and an OpenStack environment but I may not have time to do so.

Comment 21 Daniel Walsh 2015-10-28 17:42:59 UTC
Lokesh lets set the timeout to 5 minutes.

Comment 22 Scott Dodson 2015-11-19 18:42:26 UTC
*** Bug 1230389 has been marked as a duplicate of this bug. ***


Note You need to log in before you can comment on or make changes to this bug.