Bug 1122445 - device-mapper problem Unable to autorestart docker containers after unsuccessful docker daemon restart [NEEDINFO]
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Vivek Goyal
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-07-23 09:37 UTC by Jiri Zupka
Modified: 2019-03-06 01:00 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-06 18:43:26 UTC
vgoyal: needinfo? (jzupka)



Description Jiri Zupka 2014-07-23 09:37:25 UTC
Description of problem:
Docker containers are not automatically restarted after an unsuccessful docker daemon restart.

The key information is in step 5 of the "Additional info" section below.

Version-Release number of selected component (if applicable):
docker-0.10.0-10.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Start the docker daemon (systemctl start docker).
2. Start a container: docker run -d -i <image> cat
     The process started in the container must not register a SIGTERM handler,
     i.e. an strace of the command started in the container has no line like this:
     rt_sigaction(SIGTERM, {0x7fdb85c01bd0, [], SA_RESTORER|SA_INTERRUPT,
                  0x7fdb85216a00}, NULL, 8) = 0
3. List the started container (docker ps).
4. Restart the docker daemon (systemctl restart docker).
5. List the restarted container (docker ps).
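The steps above can be scripted roughly as follows. This is a minimal sketch, not part of the original report: the function names (`reproduce`, `is_running`, `wait_for_daemon`) and the 30-second wait for the daemon are hypothetical. It assumes a root shell on a systemd host where `docker ps -q --no-trunc` prints full container IDs. Any extra arguments are passed through to `docker run`, so the `--restart always` variant from comment 10 can be tried too.

```shell
#!/usr/bin/env bash
# Hypothetical reproducer for steps 1-5 above (run as root).
# Prints "running" and returns 0 if the container survives the daemon
# restart (expected result); prints "not running" and returns 1 otherwise
# (actual result). Returns 2 if a setup step fails.

# Return 0 once the daemon answers, giving up after 30 one-second tries.
wait_for_daemon() {
  local i
  for i in $(seq 1 30); do
    docker info >/dev/null 2>&1 && return 0
    sleep 1
  done
  return 1
}

# Return 0 if the container with full ID $1 is listed by `docker ps`.
is_running() {
  docker ps -q --no-trunc | grep -qx "$1"
}

# Usage: reproduce IMAGE [extra docker run options...]
reproduce() {
  local image=${1:?usage: reproduce IMAGE [docker run options...]}
  shift
  local id

  systemctl start docker || return 2            # step 1
  wait_for_daemon || return 2
  # step 2: `cat` registers no SIGTERM handler
  id=$(docker run -d -i "$@" "$image" cat) || return 2
  is_running "$id" || return 2                  # step 3
  systemctl restart docker || return 2          # step 4
  wait_for_daemon || return 2
  if is_running "$id"; then                     # step 5
    echo "running"
    return 0
  fi
  echo "not running"
  return 1
}
```

For example, `reproduce registry.access.redhat.com/rhel7:latest` runs the steps as written, and `reproduce registry.access.redhat.com/rhel7:latest --restart always` matches the run in comment 10. Note that, per comment 8, newer daemons stop the container when they exit, so without a restart policy `not running` is expected there as well.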

Actual results:
In step 3 the container is running.
In step 5 the container is not running.

Expected results:
In step 3 the container is running.
In step 5 the container has been restarted and is running.

Additional info:
What happens during the restart:
  1) systemd sends SIGTERM to the docker daemon and waits up to a timeout (about 10s).
  2) The docker daemon tries to shut down:
     a) the docker daemon sends SIGTERM to the docker containers;
     b) the docker daemon waits until all docker containers have died.
  3) systemd waits for the docker daemon to terminate until the timeout expires.
  4) systemd sends SIGKILL to the docker daemon.
     a) At that point the docker daemon is still waiting for the containers:
        [debug] daemon.go:901 stopping
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa

     b) The device-mapper devices cannot be unmounted until the containers have
        terminated, but the docker daemon was killed by SIGKILL while the
        containers were still running.
  5) systemd starts the docker daemon again.
  6) The docker daemon tries to autorestart the containers:
    a) It tries to kill the old containers:
      [debug] daemon.go:191 killing old running container
          5be28d9aa9c06403b6852a83cb14e9499702d996656c658df708758cad75acd6
    b) It tries to unmount the container devices:
      [debug] deviceset.go:992 [devmapper] UnmountDevice
         (hash=5be28d9aa9c06403b6852a83cb14e9499702d996656c658df708758cad75acd6)
      [debug] deviceset.go:1028 [devmapper] UnmountDevice END
      [error] driver.go:140 Warning: error unmounting device
          5be28d9aa9c06403b6852a83cb14e9499702d996656c658df708758cad75acd6:
      UnmountDevice: device not-mounted id
          5be28d9aa9c06403b6852a83cb14e9499702d996656c658df708758cad75acd6
      [debug] daemon.go:221 Container
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa
          was  supposed to be running but is not.
      [debug] daemon.go:223 Marking as restarting
      [debug] deviceset.go:992 [devmapper] UnmountDevice(
          hash=5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa)
      [debug] deviceset.go:1028 [devmapper] UnmountDevice END
      [error] driver.go:140 Warning: error unmounting device
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa:
      UnmountDevice: device not-mounted id
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa
    c) It tries to mount the device, which fails because the device is still busy:
      [debug] daemon.go:381 Failed to start container
         5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa:
        Error getting container
          5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa from
          driver devicemapper: Error mounting '/dev/mapper/docker-253:0-638111-
           5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa'
        on '/var/lib/docker/devicemapper
        /mnt/5db88b7e52bc4a9c839b81425dec3e96b2b040098b229c45128961eeb273caaa':
           device or resource busy
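Steps 1 to 4 can be illustrated without docker at all. The sketch below is a hypothetical simulation, not docker's or systemd's code: the functions `container`, `daemon` and `simulate_restart` are made up, and `STOP_TIMEOUT` is shortened to 2 seconds (an assumption; the report mentions about 10s). The "container" ignores SIGTERM, the "daemon" forwards SIGTERM to it and then waits for it, and `simulate_restart` plays systemd by sending SIGTERM, waiting for the timeout, then sending SIGKILL. It needs bash 4 or later for `$BASHPID`.

```shell
#!/usr/bin/env bash
# Simulation only: no docker, systemd or device-mapper is involved.
STOP_TIMEOUT=${STOP_TIMEOUT:-2}     # hypothetical; the report mentions about 10s

# "Container": ignores SIGTERM, like `trap "" TERM` in comment 10.
container() {
  trap '' TERM
  echo "$BASHPID" > "$1"            # publish our PID once SIGTERM is ignored
  while true; do sleep 1; done
}

# "Daemon": forwards SIGTERM to its container, then waits for it to exit.
daemon() {
  local child
  container "$1" &
  child=$!
  trap 'kill -TERM "$child"' TERM   # step 2a
  while kill -0 "$child" 2>/dev/null; do
    wait "$child"                   # step 2b: returns early when the trap fires
  done
}

# "systemd": SIGTERM, wait for the stop timeout, then SIGKILL (steps 1, 3, 4).
simulate_restart() {
  local pidfile daemon_pid container_pid
  pidfile=$(mktemp)
  daemon "$pidfile" >/dev/null 2>&1 &
  daemon_pid=$!

  # Wait until the container has written its PID.
  while [ ! -s "$pidfile" ]; do sleep 0.1; done
  container_pid=$(cat "$pidfile")
  rm -f "$pidfile"

  kill -TERM "$daemon_pid"          # step 1
  sleep "$STOP_TIMEOUT"             # step 3
  if kill -0 "$daemon_pid" 2>/dev/null; then
    kill -KILL "$daemon_pid"        # step 4: the daemon is still waiting
  fi
  wait "$daemon_pid" 2>/dev/null

  # Step 4b: the container outlives the daemon, so anything it holds open
  # (in the real bug, its device-mapper mount) is still busy.
  if kill -0 "$container_pid" 2>/dev/null; then
    echo "orphaned container $container_pid still running"
    kill -KILL "$container_pid"     # clean up the simulation
    return 0
  fi
  echo "container exited with the daemon"
  return 1
}
```

Running `simulate_restart` reports that the "container" is still alive after the "daemon" has been SIGKILLed. That is the situation in step 4b: whatever the orphaned process holds stays busy when the new daemon starts, which in the real bug shows up as "device or resource busy" in step 6c.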

Comment 2 Daniel Walsh 2014-09-12 19:19:54 UTC
Could you attempt this with docker-1.2?

Comment 3 Matthias Clasen 2014-09-30 14:25:56 UTC
moving docker bugs off alexl

Comment 4 Chris Evich 2014-11-12 19:17:10 UTC
Note: This problem still readily reproduces with 1.3.0, especially if there are a LOT of running containers at the time of shutdown.

Comment 6 Daniel Walsh 2015-01-19 15:05:55 UTC
Mike, any idea on this one?

Comment 7 Daniel Walsh 2015-03-09 17:12:12 UTC
Vivek, can you take a look?

Comment 8 Vivek Goyal 2015-04-02 19:36:37 UTC
Is this problem still reproducible? I am trying it on the latest upstream docker, and I see that a running container exits once the daemon exits. So after the daemon restarts, no containers are running, and one can start the container that was running previously.

Please try with the latest bits and see if the issue is still reproducible.

Comment 9 Chris Evich 2015-04-06 18:23:21 UTC
Not sure, I'll give it a try with the latest 1.5 packages...

Comment 10 Chris Evich 2015-04-06 18:36:36 UTC
Yep, it's working fine in docker-1.5.0-28 on RHEL7 for me.  I tried it with and without the --restart option.  In both cases, after about a 15-20 second delay, the daemon restarts properly.  With --restart always, the container also restarts properly.  Here's what I did:

[root@dockertest ~]# docker run -d --restart always registry.access.redhat.com/rhel7:latest bash -c 'trap "" TERM; while true; do sleep 1m; done'
fd3be67755a4d90b6c2755db5b752e707cb741dec66d2c8109bd999b347cfe10
[root@dockertest ~]# docker ps -a
CONTAINER ID        IMAGE                                     COMMAND                CREATED             STATUS              PORTS               NAMES
fd3be67755a4        registry.access.redhat.com/rhel7:latest   "\"bash -c 'trap \"\   7 seconds ago       Up 3 seconds                            distracted_engelbart   
[root@dockertest ~]# systemctl restart docker
1...3...5...7...9...11...13...15
-bash: 1...13...15: command not found
[root@dockertest ~]# docker ps -a
CONTAINER ID        IMAGE                                     COMMAND                CREATED             STATUS              PORTS               NAMES
fd3be67755a4        registry.access.redhat.com/rhel7:latest   "\"bash -c 'trap \"\   45 seconds ago      Up 20 seconds                           distracted_engelbart   
[root@dockertest ~]# docker stop distracted_engelbart
1...3...5...7...9...11...distracted_engelbart


So seems to be working for me now.  I'm fine if you want to close this as CURRENTRELEASE.

