Red Hat Bugzilla – Bug 1316786
Docker can activate storage before LVM is ready, causing "Failed to start Docker Application Container Engine."
Last modified: 2017-07-03 11:13:41 EDT
Description of problem:
Because the docker-storage-setup.service does not wait on LVM2 activation, it is possible for the Docker service to be brought up before its storage is ready, causing the service to fail.
When the user restarts the docker service, the LVM storage is ready and the docker service starts and runs correctly.
Version-Release number of selected component (if applicable):
OpenShift Enterprise 3.1.1
Steps to Reproduce:
1. Configure machine with LVM LVs that take a long time to activate - e.g. using SAN or having many LVs.
2. Reboot machine
3. Docker service does not start. The following error messages appear in the journal:
Mar 08 12:32:07 docker.example.com systemd: docker.service failed.
Mar 08 12:32:07 docker.example.com systemd: start request repeated too quickly for docker.service
Mar 08 12:32:07 docker.example.com systemd: Failed to start Docker Application Container Engine.
Expected results: Docker waits for the LVM LVs to activate and starts only after that.
To fix this, add:
The proposed solution did not help.
The root filesystem is on LVM too, so all LVM-related services have already been triggered by the time the system reaches docker[-storage-setup].service.
How can docker.service be made to depend on the device itself rather than on another service?
What is the customer's configuration? What are we waiting for to be up and running: the thin pool device, or the physical device on which the pool is set up?
Can you also provide the contents of /etc/sysconfig/docker-storage-setup?
These systemd restart messages hide the actual error message. How do we get rid of them?
There does not seem to be a generic service we can wait for that will fix the issue. We will most likely have to identify the device we want to wait for and somehow create a dependency on it, so that docker-storage-setup runs after that device is up.
That is why I am asking which device is involved.
device in question is:
> cat docker-storage-setup
# Edit this file to override any configuration options specified in
# For more details refer to "man docker-storage-setup"
The volume group has to be active before the docker service starts.
So /dev/vdb is the device in question here? Also, do you have the full journal logs? In the logs you pasted above, I only see "start request repeated too quickly for docker.service". There is no real error message.
Can the customer disable Restart=on-failure? It forces docker to restart after every failure, and the actual error message gets lost in the process.
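One way to try this without editing the packaged unit is a systemd drop-in. The following is a minimal sketch, assuming the unit is named docker.service and that a drop-in directory under /etc/systemd/system/ is acceptable on the affected host. write_restart_override and the file name no-restart.conf are hypothetical names chosen for illustration.

```shell
# write_restart_override is a hypothetical helper: it writes a drop-in that
# turns off automatic restarts for the unit that owns the given directory.
write_restart_override() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # Restart=no takes precedence over Restart=on-failure in the packaged unit.
    printf '[Service]\nRestart=no\n' > "$dropin_dir/no-restart.conf"
}

# Assumed usage on the affected host (as root), followed by a reboot:
#   write_restart_override /etc/systemd/system/docker.service.d
#   systemctl daemon-reload
# After the failed boot, read the unit's messages for the current boot with:
#   journalctl -b -u docker.service --no-pager
```

With restarts disabled, the first failure should remain visible in the journal instead of being followed by repeated start attempts.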
Can you try the following?
- Copy /usr/lib/systemd/system/docker-storage-setup.service to /etc/systemd/system/
- Edit /etc/systemd/system/docker-storage-setup.service and add the following
- systemctl daemon-reload
And see if this solves the problem.
This is assuming that /dev/sdb is the name of the device.
A small correction: the suffix in the Wants and After lines above should be .device, not .service.
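As a sketch of the kind of device dependency described in the steps above, assuming the device is /dev/sdb as stated: the helper below prints a [Unit] fragment whose Wants= and After= lines point at the device unit. device_unit_fragment is a hypothetical helper, and its path escaping is deliberately simplified. For arbitrary paths, systemd-escape -p --suffix=device gives the exact unit name.

```shell
# device_unit_fragment is a hypothetical helper, not part of docker-storage-setup.
# It prints a [Unit] fragment that pulls in a block device unit and orders
# the service after it.
device_unit_fragment() {
    dev_path=$1
    # Simplified escaping: drop the leading "/" and turn each "/" into "-".
    # This is only correct for plain names such as /dev/sdb or /dev/vdb.
    unit="$(printf '%s' "${dev_path#/}" | tr '/' '-').device"
    printf '[Unit]\nWants=%s\nAfter=%s\n' "$unit" "$unit"
}

# Assumption: /dev/sdb is the device backing the docker volume group.
device_unit_fragment /dev/sdb
```

The printed lines would go into the [Unit] section of the copied /etc/systemd/system/docker-storage-setup.service, followed by systemctl daemon-reload.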
Actually, I think the real problem here might be that we should wait for the LVM thin pool to be ready. That way it will also work with configurations where the user has created a volume group and passed it to docker-storage-setup. During the first run a thin pool is created, but on subsequent reboots it can take a while for the thin pool to show up.
So waiting for the LVM thin pool to become active, with a timeout, should help.
Right now I am not sure how to wait for the LVM thin pool. The Wants=<pool>.device and After=<pool>.device directives don't seem to work the same way for lvm/dm devices. CCing some folks from the LVM team, who might have a better idea of how to create those dependencies.
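Short of a proper systemd dependency, a crude way to wait for the pool with a timeout is to poll for its device node. The following is a minimal sketch, assuming the active thin pool shows up as a node under /dev/mapper. wait_for_path is a hypothetical helper and the pool name in the usage comment is made up, so substitute the real one.

```shell
# wait_for_path is a hypothetical helper: poll once per second until the
# given path exists, and give up after timeout_secs seconds.
wait_for_path() {
    path=$1
    timeout_secs=$2
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout_secs" ]; then
            echo "timed out after ${timeout_secs}s waiting for $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Assumed usage before docker-storage-setup runs. The pool name is hypothetical;
# device-mapper doubles any "-" inside VG and LV names.
#   wait_for_path /dev/mapper/docker--vg-docker--pool 60 || exit 1
```

Polling only confirms that the node exists. It does not replace an ordering dependency, so it is a workaround rather than a fix.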
BTW, please do ask the customer to try putting a dependency on the underlying block device (/dev/vdb) in the docker-storage-setup.service file. I tested it and it worked for me. The catch is that it is racy with respect to LVM: in my test, the volume group and LVM thin pool were already up and running by the time I checked for them, but that might not always be the case.
Solution does not work.
Here is response from the Customer:
Description=Docker Storage Setup
Does not help.
# cat /etc/systemd/system/docker-storage-setup.service
Description=Docker Storage Setup
After=cloud-final.service dev-vdb.device dm-event.service
does not help.
It looks like dm-event.service is already started before the lvm is active.
This seems to be the proposed fix:
Moving the bz to the docker component in RHEL where we should get the changes.
Fixed in docker-1.10 release.
I hit a similar problem and can confirm it is fixed in the current version.
My environment is just a simple sdb device in a VM, though, so feel free to reopen the bug if you encounter it in a production environment.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.