1316786 – Docker can activate storage before LVM is ready, causing "Failed to start Docker Application Container Engine."

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	docker
Sub Component:
Version:	7.2
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Vivek Goyal
QA Contact:	atomic-bugs@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-03-11 05:09 UTC by Paul Wayper
Modified:	2019-11-14 07:35 UTC
CC List:	19 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The docker-storage-setup service can sometimes start before the thin pool is ready. Consequence: docker-storage-setup fails early, and docker then fails to start. Fix: docker-storage-setup now waits for the thin pool to come up. The default wait time is 60 seconds and is configurable. Result: docker-storage-setup and docker start correctly after a reboot.
Clone Of:
Environment:
Last Closed:	2017-01-17 20:43:00 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0116	0	normal	SHIPPED_LIVE	Moderate: docker security, bug fix, and enhancement update	2017-01-18 01:39:43 UTC

Description Paul Wayper 2016-03-11 05:09:09 UTC

Description of problem:

Because the docker-storage-setup.service does not wait on LVM2 activation, it is possible for the Docker service to be brought up before its storage is ready, causing the service to fail.

When the user restarts the docker service, the LVM storage is ready and the docker service starts and runs correctly.

Version-Release number of selected component (if applicable):

OpenShift Enterprise 3.1.1

How reproducible:

Always

Steps to Reproduce:
1. Configure machine with LVM LVs that take a long time to activate - e.g. using SAN or having many LVs.
2. Reboot machine

Actual results:

3. Docker service does not start.  The following error messages appear in the journal:

Mar 08 12:32:07 docker.example.com systemd[1]: docker.service failed.
Mar 08 12:32:07 docker.example.com systemd[1]: start request repeated too quickly for docker.service
Mar 08 12:32:07 docker.example.com systemd[1]: Failed to start Docker Application Container Engine.

Expected results:

3. Docker waits for LVM LVs to activate and starts after that.

Additional info:

To fix this, add:

Wants=lvm2-activation.service

to /usr/lib/systemd/system/docker-storage-setup.service

Comment 2 Alexander Koksharov 2016-03-15 13:36:32 UTC

The proposed solution did not help.

Additional info:
  The root filesystem is on LVM, too, so all LVM-related services have already been triggered by the time the system reaches docker[-storage-setup].service.
  How can docker.service be made to depend on the device itself rather than on another service?

Comment 4 Vivek Goyal 2016-03-16 17:42:15 UTC

What is the customer's configuration? What are we waiting for to be up and running? Are we waiting for the thin pool device, or for the physical device on which the pool is set up?

Comment 5 Vivek Goyal 2016-03-16 17:42:43 UTC

Can you also provide the contents of /etc/sysconfig/docker-storage-setup?

Comment 6 Vivek Goyal 2016-03-16 17:43:16 UTC

These systemd restart messages hide the actual error message. How do we get rid of them?

Comment 7 Vivek Goyal 2016-03-17 15:40:46 UTC

There does not seem to be a generic service we can wait for that will fix the issue. We will most likely have to identify the device we want to wait for and create a dependency on it, so that docker-storage-setup runs only after that device is up.

That is why I am asking which device is in question.

Comment 8 Alexander Koksharov 2016-03-21 12:28:13 UTC

The device in question is:

> cat docker-storage-setup 
# Edit this file to override any configuration options specified in
# /usr/lib/docker-storage-setup/docker-storage-setup.
#
# For more details refer to "man docker-storage-setup"
DEVS=/dev/vdb
VG=docker-vg

The volume group has to be active before the docker service starts.

Comment 9 Vivek Goyal 2016-03-21 13:39:21 UTC

So /dev/vdb is the device in question here? Also, do you have the full journal logs? In the logs pasted above, I only see "start request repeated too quickly for docker.service" and no real error message.

Comment 10 Vivek Goyal 2016-03-21 14:14:00 UTC

Can the customer disable Restart=on-failure? It forces docker to restart even when it fails, and the actual error message gets lost in the process.
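
One way to try this without editing the packaged unit file is a systemd drop-in that overrides the Restart= setting. The following is only a sketch: the function name write_no_restart_dropin and the file name 10-no-restart.conf are made up for illustration, and the unit directory is passed in as an argument so the function can be tried against a scratch directory first.

```shell
# Sketch: write a drop-in that turns off automatic restarts for docker.service.
# write_no_restart_dropin and 10-no-restart.conf are hypothetical names.
# $1 is the systemd unit directory (normally /etc/systemd/system).
write_no_restart_dropin() {
    dropin_dir="$1/docker.service.d"
    mkdir -p "$dropin_dir" || return 1
    # Restart=no overrides whatever Restart= value the packaged unit sets.
    cat > "$dropin_dir/10-no-restart.conf" <<'EOF'
[Service]
Restart=no
EOF
}

# Typical use as root, followed by a reload so systemd picks up the change:
#   write_no_restart_dropin /etc/systemd/system && systemctl daemon-reload
```

With restarts disabled, the first failure of docker.service should stay visible in the journal instead of being followed by repeated start attempts. Removing the drop-in and reloading restores the packaged behaviour.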

Comment 11 Vivek Goyal 2016-03-21 16:22:31 UTC

Can you try the following?

- Copy /usr/lib/systemd/system/docker-storage-setup.service to /etc/systemd/system/

- Edit /etc/systemd/system/docker-storage-setup.service and add the following:

  Wants=dev-sdb.service
  After=dev-sdb.service

- systemctl daemon-reload

Then see if this solves the problem.

This assumes that /dev/sdb is the name of the device.

Comment 12 Vivek Goyal 2016-03-21 16:23:52 UTC

A small correction: the suffix in the Wants and After lines above should be .device, not .service.

  Wants=dev-sdb.device
  After=dev-sdb.device
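
As an alternative to copying the whole unit file, the same dependency could be expressed as a drop-in. This is a sketch under assumptions: write_device_dependency and the file name 10-wait-for-device.conf are hypothetical, and the device unit name is passed in by the caller. Later comments in this bug report that this dependency alone did not resolve the customer's failure.

```shell
# Sketch: add Wants=/After= on a device unit to docker-storage-setup.service
# through a drop-in. write_device_dependency and 10-wait-for-device.conf are
# hypothetical names.
# $1 is the systemd unit directory (normally /etc/systemd/system).
# $2 is the device unit name, e.g. dev-sdb.device.
write_device_dependency() {
    unit_dir="$1"
    device_unit="$2"
    dropin_dir="$unit_dir/docker-storage-setup.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Unit]\nWants=%s\nAfter=%s\n' "$device_unit" "$device_unit" \
        > "$dropin_dir/10-wait-for-device.conf"
}

# The device unit name can be derived from the device path with systemd-escape:
#   systemd-escape -p --suffix=device /dev/vdb    # prints dev-vdb.device
# Typical use as root:
#   write_device_dependency /etc/systemd/system dev-vdb.device && systemctl daemon-reload
```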

Comment 13 Vivek Goyal 2016-03-21 18:56:21 UTC

Actually, I think the real problem here might be that we should wait for the LVM thin pool to be ready. That would also cover configurations where the user has created a volume group and passed it to docker-storage-setup. During the first run a thin pool is created, but on subsequent reboots it takes a while for the thin pool to show up.

So waiting for the LVM thin pool to become active, with a timeout, should help.

Right now I am not sure how to wait for an LVM thin pool. The Wants=<pool>.device and After=<pool>.device directives don't seem to work the same way for lvm/dm devices. CCing some folks from the LVM team, who might have a better idea of how to create those dependencies.
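
The idea of waiting for the thin pool with a timeout can be sketched as a polling loop. This is an illustration only, not the code from the eventual fix: the function names thin_pool_is_active and wait_for, and the pool name docker-vg/docker-pool in the usage line, are assumptions. The check relies on the lv_attr field reported by lvs, where the first character is "t" for a thin pool and the fifth is "a" when the volume is active.

```shell
# Sketch: succeed when the logical volume $1 (given as VG/LV) is an active
# thin pool. thin_pool_is_active is a hypothetical helper name.
thin_pool_is_active() {
    tp_attr=$(lvs --noheadings -o lv_attr "$1" 2>/dev/null | tr -d '[:space:]')
    case "$tp_attr" in
        t???a*) return 0 ;;   # char 1 = thin pool, char 5 = active
        *)      return 1 ;;   # missing, wrong type, or not active yet
    esac
}

# Sketch: run the given command once per second until it succeeds or until
# $1 seconds have passed. wait_for is a hypothetical helper name.
wait_for() {
    wf_timeout=$1
    shift
    wf_waited=0
    until "$@"; do
        [ "$wf_waited" -ge "$wf_timeout" ] && return 1
        sleep 1
        wf_waited=$((wf_waited + 1))
    done
    return 0
}

# Example use, with 60 seconds matching the default mentioned in the Doc Text
# (the pool name is an assumption):
#   wait_for 60 thin_pool_is_active docker-vg/docker-pool || echo "thin pool not ready" >&2
```

Polling is used here because, as noted above, unit dependencies on the pool device did not behave as expected for lvm/dm devices.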

Comment 15 Vivek Goyal 2016-03-21 20:23:24 UTC

BTW, please do ask the customer to try putting a dependency on the underlying block device (/dev/vdb) in the docker-storage-setup.service file. I tested it and it worked for me. The catch is that it is racy with respect to LVM. In my test, by the time I check for the volume group and the LVM thin pool, they are up and running, but that might not be the case every time.

Wants=dev-sdb.device
After=dev-sdb.device

Comment 16 Alexander Koksharov 2016-03-29 11:20:18 UTC

Solution does not work.

Here is the response from the customer:
cat /etc/systemd/system/docker-storage-setup.service
[Unit]
Description=Docker Storage Setup
Wants=dev-vdb.device
After=cloud-final.service dev-vdb.device
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker-storage-setup
EnvironmentFile=-/etc/sysconfig/docker-storage-setup

[Install]
WantedBy=multi-user.target


Does not help.

Comment 19 Alexander Koksharov 2016-04-01 09:28:49 UTC

# cat /etc/systemd/system/docker-storage-setup.service
[Unit]
Description=Docker Storage Setup
Wants=dev-vdb.device
After=cloud-final.service dev-vdb.device dm-event.service
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker-storage-setup
EnvironmentFile=-/etc/sysconfig/docker-storage-setup

[Install]
WantedBy=multi-user.target

This does not help either.

It looks like dm-event.service has already started before LVM is active.

Comment 30 Josep 'Pep' Turro Mauri 2016-05-04 08:27:29 UTC

This seems to be the proposed fix:

 https://github.com/projectatomic/docker-storage-setup/pull/111

Moving the bz to the docker component in RHEL where we should get the changes.

Comment 55 Daniel Walsh 2016-10-18 14:46:07 UTC

Fixed in docker-1.10 release.

Comment 59 Luwen Su 2016-11-14 16:03:26 UTC

I hit a similar problem and can confirm it is fixed in the current version.
My environment is only a simple sdb device in a VM, though, so feel free to reopen the bug if you encounter it in a production environment.

Comment 61 errata-xmlrpc 2017-01-17 20:43:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0116.html
