Bug 1316786
| Summary: | Docker can activate storage before LVM is ready, causing "Failed to start Docker Application Container Engine." | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Paul Wayper <pwayper> |
| Component: | docker | Assignee: | Vivek Goyal <vgoyal> |
| Status: | CLOSED ERRATA | QA Contact: | atomic-bugs <atomic-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.2 | CC: | agk, akokshar, aos-bugs, bvincell, dwalsh, erich, ghelleks, jhonce, jokerman, lsm5, lsu, mmccomas, mmcgrath, mmillson, pep, prajnoha, pwayper, vgoyal, zkabelac |
| Target Milestone: | rc | Keywords: | Extras, UpcomingRelease |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: the docker-storage-setup service can start early, before the LVM thin pool is ready.<br>Consequence: docker-storage-setup fails early, and docker then fails to start.<br>Fix: docker-storage-setup now waits for the thin pool to come up; the default wait time is 60 seconds and is configurable.<br>Result: docker-storage-setup and docker start cleanly on reboot. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-17 20:43:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
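
The Doc Text above describes the eventual fix: wait for the thin pool to become active, with a configurable timeout defaulting to 60 seconds. A minimal shell sketch of that idea follows; the WAIT_TIMEOUT variable name and the pool path are assumptions for illustration, not the code actually shipped.

```sh
#!/bin/sh
# Illustrative sketch only: WAIT_TIMEOUT and the pool path are assumed
# names for this example, not the shipped docker-storage-setup code.
WAIT_TIMEOUT=${WAIT_TIMEOUT:-60}    # default 60 seconds, overridable
POOL="docker-vg/docker-pool"        # VG/LV name of the thin pool

waited=0
while :; do
    # lv_attr looks like "twi-aot---": 't' marks a thin pool and the
    # fifth character 'a' marks it active.
    attr=$(lvs --noheadings -o lv_attr "$POOL" 2>/dev/null | tr -d ' ')
    case "$attr" in
        t???a*) exit 0 ;;           # pool exists and is active
    esac
    if [ "$waited" -ge "$WAIT_TIMEOUT" ]; then
        echo "timed out waiting for thin pool $POOL" >&2
        exit 1
    fi
    sleep 1
    waited=$((waited + 1))
done
```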
Description
Paul Wayper 2016-03-11 05:09:09 UTC
The solution proposed did not help.

Additional info: the root filesystem is on LVM too, so all LVM-related services have already been triggered by the time the system reaches docker[-storage-setup].service. How can docker.service be made to depend on the device itself rather than on another service?

What is the customer's configuration? What exactly are we waiting on to be up and running: the thin pool device, or the physical device on which the pool is set up? Can you also provide the contents of /etc/sysconfig/docker-storage-setup?

These systemd restart messages kill the actual error message; how do we get rid of those? There does not seem to be a generic service we can wait for that will fix the issue. We will most likely have to identify the device we want to wait for and somehow create a dependency on that device, so that docker-storage-setup runs after that device is up. That is why I am asking which device is in question.

The device in question is:
```
> cat docker-storage-setup
# Edit this file to override any configuration options specified in
# /usr/lib/docker-storage-setup/docker-storage-setup.
#
# For more details refer to "man docker-storage-setup"
DEVS=/dev/vdb
VG=docker-vg
```
The volume group has to be active before the docker service starts.
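
(For reference, outside the bug's own comments: these generic LVM commands show how to verify that state by hand, assuming the docker-vg name from the config above.)

```sh
vgs docker-vg                        # is the volume group visible?
lvs -o lv_name,lv_attr docker-vg     # fifth lv_attr character 'a' = active
vgchange -ay docker-vg               # manually activate all LVs in the VG
```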
So /dev/vdb is the device in question here? Also, do you have full journal logs? In the logs you pasted above I only see "start request repeated too quickly for docker.service", with no real error message. Can the customer disable Restart=on-failure? It forces a restart of docker even when docker fails, and in the process the actual error message is lost.

Can you try the following and see if it solves the problem (this assumes /dev/sdb is the name of the device):

- Copy /usr/lib/systemd/system/docker-storage-setup.service to /etc/systemd/system/
- Edit /etc/systemd/system/docker-storage-setup.service and add the following:

  ```
  Wants=dev-sdb.service
  After=dev-sdb.service
  ```

- Run systemctl daemon-reload

A small correction: the Wants and After suffix above should be .device, not .service:

```
Wants=dev-sdb.device
After=dev-sdb.device
```

Actually, I think the real problem here might be that we should wait for the LVM thin pool to be ready. That way it will also work with configurations where the user has created a volume group and passed it to docker-storage-setup. During the first run a thin pool will be created, but over subsequent reboots it will take a while for the thin pool to show up. So waiting for the LVM thin pool to become active, with a timeout period, should help. Right now I am not sure how to wait for an LVM thin pool: Wants=<pool>.device and After=<pool>.device directives don't seem to work the same way for lvm/dm devices. CCing some folks from the LVM team; they might have a better idea of how to create those dependencies.

BTW, please do ask the customer to try putting a dependency on the underlying block device (/dev/vdb) in the docker-storage-setup.service file. I tested it and it worked for me. The thing is, it is racy with respect to LVM: in my test, by the time I check for the volume group and the thin pool, they are up and running, but that might not always be the case.

```
Wants=dev-sdb.device
After=dev-sdb.device
```

The solution does not work. Here is the response from the customer:

```
cat /etc/systemd/system/docker-storage-setup.service
[Unit]
Description=Docker Storage Setup
Wants=dev-vdb.device
After=cloud-final.service dev-vdb.device
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker-storage-setup
EnvironmentFile=-/etc/sysconfig/docker-storage-setup

[Install]
WantedBy=multi-user.target
```

Does not help.

```
# cat /etc/systemd/system/docker-storage-setup.service
[Unit]
Description=Docker Storage Setup
Wants=dev-vdb.device
After=cloud-final.service dev-vdb.device dm-event.service
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker-storage-setup
EnvironmentFile=-/etc/sysconfig/docker-storage-setup

[Install]
WantedBy=multi-user.target
```

Does not help either. It looks like dm-event.service is already started before LVM is active.

This seems to be the proposed fix: https://github.com/projectatomic/docker-storage-setup/pull/111

Moving the bz to the docker component in RHEL, where we should get the changes.

Fixed in the docker-1.10 release.

I hit a similar problem and can confirm it is fixed in the current version. Though my environment is as simple as an sdb device in a VM, so feel free to re-open the bug if you encounter it in a production environment.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0116.html
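
As a footnote to the Wants=/After= experiments above: the same device dependency can be expressed as a systemd drop-in rather than a full copy of the unit file. This is a generic systemd sketch using the /dev/vdb device from this bug, not something proposed in the report itself.

```ini
# /etc/systemd/system/docker-storage-setup.service.d/wait-device.conf
# dev-vdb.device is systemd's unit name for /dev/vdb.
[Unit]
Wants=dev-vdb.device
After=dev-vdb.device
```

After creating the drop-in, run systemctl daemon-reload, just as with the full-copy approach tried in the bug. Note that the report found a plain device dependency insufficient, which is why the final fix waits for the thin pool itself.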