Bug 1316786 - Docker can activate storage before LVM is ready, causing "Failed to start Docker Application Container Engine."
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Vivek Goyal
QA Contact: atomic-bugs@redhat.com
Keywords: Extras, UpcomingRelease
Depends On:
Blocks:
Reported: 2016-03-11 00:09 EST by Paul Wayper
Modified: 2017-07-03 11:13 EDT
CC List: 19 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Sometimes docker-storage-setup service can start early and thin pool might not be ready yet. Consequence: docker-storage-setup can fail early and then docker will fail. Fix: Now docker-storage-setup waits for thin pool to come up. Default wait time is 60 seconds and it is configurable. Result: docker-storage-setup and docker will start fine upon reboot.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-17 15:43:00 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None

Description Paul Wayper 2016-03-11 00:09:09 EST
Description of problem:

Because the docker-storage-setup.service does not wait on LVM2 activation, it is possible for the Docker service to be brought up before its storage is ready, causing the service to fail.

When the user restarts the docker service, the LVM storage is ready and the docker service starts and runs correctly.

Version-Release number of selected component (if applicable):

OpenShift Enterprise 3.1.1

How reproducible:

Always

Steps to Reproduce:
1. Configure machine with LVM LVs that take a long time to activate - e.g. using SAN or having many LVs.
2. Reboot machine

Actual results:

3. Docker service does not start.  The following error messages appear in the journal:

Mar 08 12:32:07 docker.example.com systemd[1]: docker.service failed.
Mar 08 12:32:07 docker.example.com systemd[1]: start request repeated too quickly for docker.service
Mar 08 12:32:07 docker.example.com systemd[1]: Failed to start Docker Application Container Engine.

Expected results:

3. Docker waits for LVM LVs to activate and starts after that.

Additional info:

To fix this, add:

Wants=lvm2-activation.service

to /usr/lib/systemd/system/docker-storage-setup.service
Comment 2 Alexander Koksharov 2016-03-15 09:36:32 EDT
The proposed solution did not help.

Additional info:
  The root filesystem is on LVM too, so all LVM-related services have already been triggered by the time the system reaches docker[-storage-setup].service.
  How can docker.service be made to depend on the device itself rather than on another service?
Comment 4 Vivek Goyal 2016-03-16 13:42:15 EDT
What is the customer's configuration? What are we waiting on to be up and running: the thin pool device, or the physical device on which the pool is set up?
Comment 5 Vivek Goyal 2016-03-16 13:42:43 EDT
Can you also provide the contents of /etc/sysconfig/docker-storage-setup?
Comment 6 Vivek Goyal 2016-03-16 13:43:16 EDT
These systemd restart messages hide the actual error message. How do we get rid of them?
Comment 7 Vivek Goyal 2016-03-17 11:40:46 EDT
There does not seem to be a generic service we can wait for that will fix the issue. We most likely will have to identify the device we want to wait for and somehow create a dependency on that device so that docker-storage-setup runs after that device is up.

That is why I am asking which device is involved.
Comment 8 Alexander Koksharov 2016-03-21 08:28:13 EDT
The device in question is:

> cat docker-storage-setup 
# Edit this file to override any configuration options specified in
# /usr/lib/docker-storage-setup/docker-storage-setup.
#
# For more details refer to "man docker-storage-setup"
DEVS=/dev/vdb
VG=docker-vg

The volume group has to be active before the docker service starts.
Comment 9 Vivek Goyal 2016-03-21 09:39:21 EDT
So /dev/vdb is the device in question here? Also, do you have the full journal logs? In the logs pasted above, I only see "start request repeated too quickly for docker.service". There is no real error message.
Comment 10 Vivek Goyal 2016-03-21 10:14:00 EDT
Can the customer disable Restart=on-failure? It forces docker to restart whenever it fails, and in the process the actual error message is lost.
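
One possible way to try this without editing the packaged unit file is a systemd drop-in. The following is only a sketch: the helper name write_no_restart_dropin and the file name no-restart.conf are made up for illustration, and it assumes docker.service is the unit that carries Restart=on-failure.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that turns off automatic restarts for a
# unit, so the first failure stays visible in the journal.
# The function name and the drop-in file name are illustrative choices.
write_no_restart_dropin() {
    # $1: drop-in directory, e.g. /etc/systemd/system/docker.service.d
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/no-restart.conf" <<'EOF'
[Service]
Restart=no
EOF
}

# Typical use (as root), followed by a reload so systemd picks it up:
#   write_no_restart_dropin /etc/systemd/system/docker.service.d
#   systemctl daemon-reload
```

Removing the drop-in file and running systemctl daemon-reload again restores the packaged restart behaviour.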
Comment 11 Vivek Goyal 2016-03-21 12:22:31 EDT
Can you try the following?

- Copy /usr/lib/systemd/system/docker-storage-setup.service to /etc/systemd/system/

- Edit /etc/systemd/system/docker-storage-setup.service and add the following:

  Wants=dev-sdb.service
  After=dev-sdb.service

- Run systemctl daemon-reload

Then see if this solves the problem.

This assumes that /dev/sdb is the name of the device.
Comment 12 Vivek Goyal 2016-03-21 12:23:52 EDT
Small correction: the suffix in the Wants and After lines above should be device, not service.

  Wants=dev-sdb.device
  After=dev-sdb.device
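
For reference, the unit name for a given block device path can be derived rather than typed by hand. This is a rough sketch: device_unit_for is a hypothetical helper, it relies on systemd-escape when that tool is installed, and the fallback branch only handles simple paths made of letters, digits and slashes.

```shell
#!/bin/sh
# Sketch: derive the systemd .device unit name for a block device path.
# device_unit_for is a hypothetical helper name.
device_unit_for() {
    dev_path=$1
    if command -v systemd-escape >/dev/null 2>&1; then
        # systemd-escape applies systemd's full path-escaping rules.
        systemd-escape --path --suffix=device "$dev_path"
    else
        # Fallback for simple paths only (letters, digits, slashes):
        # drop the leading slash and turn the remaining slashes into dashes.
        printf '%s.device\n' "$(printf '%s' "${dev_path#/}" | tr '/' '-')"
    fi
}

# Example:
#   device_unit_for /dev/vdb
```

The result can then be used as the value of Wants= and After= in the copied unit file.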
Comment 13 Vivek Goyal 2016-03-21 14:56:21 EDT
Actually, I think the real problem here might be that we should wait for the LVM thin pool to be ready. That way it will also work with configurations where the user has created a volume group and passed it to docker-storage-setup. During the first run a thin pool is created, but on subsequent reboots it can take a while for the thin pool to show up.

So waiting for the LVM thin pool to become active, with a timeout period, should help.

Right now I am not sure how to wait for an LVM thin pool. Wants=<pool>.device and After=<pool>.device directives don't seem to work the same way for lvm/dm devices. CCing some folks from the LVM team, who might have a better idea of how to create those dependencies.
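
The idea of waiting with a timeout can be sketched as a small polling loop. This is not the code that was eventually merged into docker-storage-setup; wait_until is a made-up helper that re-runs an arbitrary check command once per second, and the pool name in the usage comment is an assumption.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND once per second until it succeeds or the timeout expires.
# Returns 0 if COMMAND succeeded, 1 if the timeout was reached first.
# The function name is illustrative.
wait_until() {
    timeout_s=$1
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use (needs root; docker-vg/docker-pool is an assumed pool name).
# Note that lvs succeeding only shows the LV is known to LVM; a stricter
# check would also confirm that it is active.
#   wait_until 60 lvs docker-vg/docker-pool || echo "thin pool did not appear"
```

Polling keeps the sketch independent of any particular systemd unit, which matters here because no generic service or device unit has been found to order against.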
Comment 15 Vivek Goyal 2016-03-21 16:23:24 EDT
BTW, please do ask the customer to try putting a dependency on the underlying block device (/dev/vdb) in the docker-storage-setup.service file. I tested it and it worked for me. The thing is, it is racy w.r.t. LVM. In my test, by the time I check for the volume group and the LVM thin pool, they are up and running. But that might not be the case all the time.

Wants=dev-sdb.device
After=dev-sdb.device
Comment 16 Alexander Koksharov 2016-03-29 07:20:18 EDT
The solution does not work.

Here is the response from the customer:
cat /etc/systemd/system/docker-storage-setup.service
[Unit]
Description=Docker Storage Setup
Wants=dev-vdb.device
After=cloud-final.service dev-vdb.device
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker-storage-setup
EnvironmentFile=-/etc/sysconfig/docker-storage-setup

[Install]
WantedBy=multi-user.target


Does not help.
Comment 19 Alexander Koksharov 2016-04-01 05:28:49 EDT
# cat /etc/systemd/system/docker-storage-setup.service
[Unit]
Description=Docker Storage Setup
Wants=dev-vdb.device
After=cloud-final.service dev-vdb.device dm-event.service
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker-storage-setup
EnvironmentFile=-/etc/sysconfig/docker-storage-setup

[Install]
WantedBy=multi-user.target

This does not help either.

It looks like dm-event.service has already started before the LVM volumes are active.
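
To confirm whether a logical volume is actually active at a given moment, the lv_attr field reported by lvs can be inspected. A minimal sketch, assuming the usual lv_attr layout in which the fifth character is the state and "a" means active; lv_attr_is_active is a hypothetical helper and the pool name in the usage comment is an assumption.

```shell
#!/bin/sh
# lv_attr_is_active ATTR
# ATTR is the lv_attr string that "lvs -o lv_attr" prints, e.g. "twi-a-tz--".
# The fifth character is the state field; "a" means active.
lv_attr_is_active() {
    attr=$1
    state=$(printf '%s' "$attr" | cut -c5)
    [ "$state" = "a" ]
}

# Example use on a live system (needs root; the pool name is an assumption):
#   attr=$(lvs --noheadings -o lv_attr docker-vg/docker-pool | tr -d ' ')
#   lv_attr_is_active "$attr" && echo "thin pool is active"
```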
Comment 30 Josep 'Pep' Turro Mauri 2016-05-04 04:27:29 EDT
This seems to be the proposed fix:

 https://github.com/projectatomic/docker-storage-setup/pull/111

Moving the bz to the docker component in RHEL where we should get the changes.
Comment 55 Daniel Walsh 2016-10-18 10:46:07 EDT
Fixed in docker-1.10 release.
Comment 59 Luwen Su 2016-11-14 11:03:26 EST
I ran into a similar problem and can confirm it is fixed in the current version.
My environment is as simple as an sdb device in a VM, though, so feel free to re-open the bug if you encounter it in a production environment.
Comment 61 errata-xmlrpc 2017-01-17 15:43:00 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0116.html
