1122445 – device-mapper problem Unable to autorestart docker containers after unsuccessful docker daemon restart

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	docker
Sub Component:
Version:	7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Vivek Goyal
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-07-23 09:37 UTC by Jiri Zupka
Modified:	2023-09-14 02:11 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-04-06 18:43:26 UTC
Target Upstream Version:
Embargoed:

Description Jiri Zupka 2014-07-23 09:37:25 UTC

Description of problem:
Docker containers are not automatically restarted after an unsuccessful docker daemon restart.

The key information is in step 5 of the "Additional info" section.

Version-Release number of selected component (if applicable):
docker-0.10.0-10.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Start the docker daemon (systemctl start docker).
2. Run: docker run -d -i <image> cat
   This starts a process in the container that does not register a SIGTERM handler.
   An strace of the command started in the container lacks a line like this one:
     rt_sigaction(SIGTERM, {0x7fdb85c01bd0, [], SA_RESTORER|SA_INTERRUPT,
                  0x7fdb85216a00}, NULL, 8) = 0
3. List the running containers (docker ps).
4. Restart the docker daemon (systemctl restart docker).
5. List the containers again (docker ps).
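
The steps above could be scripted roughly as follows. This is only a sketch: the function names check_restart and is_running are made up for illustration, the image name is passed as an argument, and it assumes a root shell on a systemd host. It relies on "docker run -d" printing the full container ID and "docker ps -q" printing the 12-character short ID.

```shell
# is_running ID: succeed if the container's short ID shows up in "docker ps -q".
is_running() {
    short=$(printf '%s' "$1" | cut -c1-12)
    docker ps -q | grep -qx "$short"
}

# check_restart IMAGE: walk through steps 1-5.
# Returns 0 if the container is running after the daemon restart,
# 1 if it is not, and 2 if the setup itself failed.
check_restart() {
    image=$1
    systemctl start docker || return 2                  # step 1
    # step 2: cat blocks on stdin and installs no SIGTERM handler
    id=$(docker run -d -i "$image" cat) || return 2
    is_running "$id" || return 2                        # step 3
    systemctl restart docker || return 2                # step 4
    if is_running "$id"; then                           # step 5
        echo "container $id is running after the restart"
    else
        echo "container $id is NOT running after the restart"
        return 1
    fi
}
```

Usage would be something like "check_restart <image>"; on an affected version the function is expected to report that the container is not running.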

Actual results:
Step 3: the containers are listed as running.
Step 5: the containers are not running.

Expected results:
Step 3: the containers are listed as running.
Step 5: the containers have been restarted and are listed as running.

Additional info:
What happens during the restart:
  1) systemd sends SIGTERM to the docker daemon and waits for a timeout (10s).
  2) The docker daemon tries to shut down.
     a) The docker daemon sends SIGTERM to the docker containers.
     b) The docker daemon waits until all docker containers have died.
  3) systemd's wait for the docker daemon to terminate ends with a timeout.
  4) systemd sends SIGKILL to the docker daemon.
     a) At that point the docker daemon is still waiting for the containers:
        [debug] daemon.go:901 stopping 
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa

     b) The device-mapper devices cannot be unmounted until the containers have terminated, but the docker daemon has already been killed by SIGKILL.
  5) systemd starts the docker daemon.
  6) The docker daemon tries to autorestart the containers.
    a) It tries to kill the old containers:
      [debug] daemon.go:191 killing old running container 
          5be28d9aa9c06403b6852a83cb14e9499702d996656c658df708758cad75acd6
    b) It tries to unmount the container's device:
      [debug] deviceset.go:992 [devmapper] UnmountDevice
         (hash=5be28d9aa9c06403b6852a83cb14e9499702d996656c658df708758cad75acd6)
      [debug] deviceset.go:1028 [devmapper] UnmountDevice END
      [error] driver.go:140 Warning: error unmounting device 
          5be28d9aa9c06403b6852a83cb14e9499702d996656c658df708758cad75acd6: 
      UnmountDevice: device not-mounted id 
          5be28d9aa9c06403b6852a83cb14e9499702d996656c658df708758cad75acd6
      [debug] daemon.go:221 Container 
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa
          was  supposed to be running but is not.
      [debug] daemon.go:223 Marking as restarting
      [debug] deviceset.go:992 [devmapper] UnmountDevice(
          hash=5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa)
      [debug] deviceset.go:1028 [devmapper] UnmountDevice END
      [error] driver.go:140 Warning: error unmounting device   
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa: 
      UnmountDevice: device not-mounted id 
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa
    c) It tries to mount the device, which fails:
      [debug] daemon.go:381 Failed to start container
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa:
        Error getting container
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa
          from driver devicemapper:
        Error mounting
          '/dev/mapper/docker-253:0-638111-5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa'
        on
          '/var/lib/docker/devicemapper/mnt/5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa':
          device or resource busy
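
The sequence in steps 1 to 4 can be imitated without docker or systemd. The sketch below is only a model, not docker's actual code: a shell process plays the daemon, a "sleep" that ignores SIGTERM plays the container, and the calling function plays systemd with the timeout shortened to one second. The name simulate_restart is made up for illustration.

```shell
# simulate_restart: model of "SIGTERM, timeout, SIGKILL" hitting a daemon
# that is blocked waiting for a child which ignores SIGTERM.
simulate_restart() {
    pidfile=$(mktemp)
    # "daemon": starts the "container", records its PID, and on SIGTERM
    # forwards the signal and then blocks until the container exits.
    sh -c '
        sh -c "trap \"\" TERM; exec sleep 30" >/dev/null 2>&1 &
        child=$!
        echo "$child" > "$1"
        trap "kill -TERM $child; wait $child" TERM
        wait "$child"
    ' daemon "$pidfile" >/dev/null 2>&1 &
    daemon=$!
    sleep 1                               # let the daemon start up
    container=$(cat "$pidfile")
    rm -f "$pidfile"
    kill -TERM "$daemon"                  # 1) ask the daemon to stop
    sleep 1                               # 3) stop timeout (shortened)
    kill -KILL "$daemon"                  # 4) timeout expired
    wait "$daemon" 2>/dev/null || true
    if kill -0 "$container" 2>/dev/null; then
        echo "container still running after daemon was killed"
        kill -KILL "$container"           # clean up the leftover process
    else
        echo "container exited"
    fi
}
```

In this model the container outlives the killed daemon, which corresponds to the state in which the restarted daemon later finds the device still busy.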

Comment 2 Daniel Walsh 2014-09-12 19:19:54 UTC

Could you attempt this with docker-1.2?

Comment 3 Matthias Clasen 2014-09-30 14:25:56 UTC

moving docker bugs off alexl

Comment 4 Chris Evich 2014-11-12 19:17:10 UTC

Note: This problem still readily reproduces with 1.3.0 especially if there are a LOT of running containers at the time of shutdown.

Comment 6 Daniel Walsh 2015-01-19 15:05:55 UTC

Mike, any idea on this one?

Comment 7 Daniel Walsh 2015-03-09 17:12:12 UTC

Vivek, can you take a look?

Comment 8 Vivek Goyal 2015-04-02 19:36:37 UTC

Is this problem still reproducible? I am trying it on the latest upstream docker, and I see that the running container exits once the daemon exits. So after the daemon restarts, no containers are running, and one can start the container that was running previously.

Please try with the latest bits and see whether the issue is still reproducible.

Comment 9 Chris Evich 2015-04-06 18:23:21 UTC

Not sure, I'll give it a try with latest 1.5 packages...

Comment 10 Chris Evich 2015-04-06 18:36:36 UTC

Yep, it's working fine in docker-1.5.0-28 on RHEL7 for me.  I tried it with and without the --restart option.  In both cases, after about a 15-20 second delay, the daemon restarts properly.  With --restart always, the container also restarts properly.  Here's what I did:

[root@dockertest ~]# docker run -d --restart always registry.access.redhat.com/rhel7:latest bash -c 'trap "" TERM; while true; do sleep 1m; done'
fd3be67755a4d90b6c2755db5b752e707cb741dec66d2c8109bd999b347cfe10
[root@dockertest ~]# docker ps -a
CONTAINER ID        IMAGE                                     COMMAND                CREATED             STATUS              PORTS               NAMES
fd3be67755a4        registry.access.redhat.com/rhel7:latest   "\"bash -c 'trap \"\   7 seconds ago       Up 3 seconds                            distracted_engelbart   
[root@dockertest ~]# systemctl restart docker
1...3...5...7...9...11...13...15
-bash: 1...13...15: command not found
[root@dockertest ~]# docker ps -a
CONTAINER ID        IMAGE                                     COMMAND                CREATED             STATUS              PORTS               NAMES
fd3be67755a4        registry.access.redhat.com/rhel7:latest   "\"bash -c 'trap \"\   45 seconds ago      Up 20 seconds                           distracted_engelbart   
[root@dockertest ~]# docker stop distracted_engelbart
1...3...5...7...9...11...distracted_engelbart


So it seems to be working for me now.  I'm fine if you want to close this as CURRENTRELEASE.
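
For anyone re-checking this, the "docker ps -a" check after the restart could be replaced by a small helper along these lines. It is a sketch only: the function name assert_running is made up, and it assumes that "docker inspect --format" is available in the installed version.

```shell
# assert_running NAME: succeed if docker reports the container as running.
# Returns 2 if "docker inspect" itself fails (e.g. no such container).
assert_running() {
    state=$(docker inspect --format '{{.State.Running}}' "$1") || return 2
    [ "$state" = "true" ]
}
```

For example, "systemctl restart docker && assert_running distracted_engelbart" would succeed only if the container from the session above came back.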

Comment 11 Red Hat Bugzilla 2023-09-14 02:11:56 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
