Bug 1456138

Summary: devicemapper error dm_task_set_cookie failed
Product: Red Hat Enterprise Linux 7
Reporter: liujia <jiajliu>
Component: docker
Assignee: Daniel Walsh <dwalsh>
Status: CLOSED ERRATA
QA Contact: atomic-bugs <atomic-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.3
CC: agk, amurdaca, anli, aos-bugs, bmeng, chaoyang, ddarrah, dmoessne, dwalsh, ghuang, giovanni.tirloni, gpei, hannsj_uhl, haowang, jhonce, jhou, jiajliu, jialiu, jligon, jokerman, lsm5, lsu, lxia, mmccomas, myllynen, nhorman, rkant, sdodson, vgoyal, wehe, wmeng, xtian, yuxzhu
Target Milestone: rc
Keywords: Extras, TestBlocker
Target Release: 7.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: docker-1.12.6-32.git88a4867.el7_3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1463003 (view as bug list) Environment:
Last Closed: 2017-06-28 15:39:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 1467350    
Attachments:
Description: devicemapper: Can't set cookie dm_task_set_cookie failed
Flags: none

Description liujia 2017-05-27 08:11:33 UTC
Description of problem:
Upgrading OCP 3.5 (containerized install) failed at task [Restart master] with a devicemapper error: Can't set cookie dm_task_set_cookie failed.

fatal: [openshift-109.x.x.x]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "daemon_reload": false,
            "enabled": null,
            "masked": null,
            "name": "atomic-openshift-master",
            "state": "restarted",
            "user": false


Unable to restart service atomic-openshift-master: Job for atomic-openshift-master.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master.service" and "journalctl -xe" for details.

# systemctl status atomic-openshift-master.service -l
● atomic-openshift-master.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-master.service; enabled; vendor preset: disabled)
   Active: activating (start-post) (Result: exit-code) since Sat 2017-05-27 01:31:53 EDT; 7s ago
  Process: 20005 ExecStop=/usr/bin/docker stop atomic-openshift-master (code=exited, status=1/FAILURE)
  Process: 20020 ExecStart=/usr/bin/docker run --rm --privileged --net=host --name atomic-openshift-master --env-file=/etc/sysconfig/atomic-openshift-master -v /var/lib/origin:/var/lib/origin -v /var/log:/var/log -v /var/run/docker.sock:/var/run/docker.sock -v /etc/origin:/etc/origin openshift3/ose:${IMAGE_VERSION} start master --config=${CONFIG_FILE} $OPTIONS (code=exited, status=125)
  Process: 20014 ExecStartPre=/usr/bin/docker rm -f atomic-openshift-master (code=exited, status=1/FAILURE)
 Main PID: 20020 (code=exited, status=125);         : 20021 (sleep)
   Memory: 92.0K
   CGroup: /system.slice/atomic-openshift-master.service
             └─20021 /usr/bin/sleep 10

May 27 01:31:53 openshift-109.x.x.x systemd[1]: Starting atomic-openshift-master.service...
May 27 01:31:53 openshift-109.x.x.x docker[20014]: Error response from daemon: No such container: atomic-openshift-master
May 27 01:31:54 openshift-109.x.x.x docker[20020]: /usr/bin/docker-current: Error response from daemon: devmapper: Error activating devmapper device for 'ed6dd8b37d073aedcb636d597c81437c02e84c3a9593923dc5ccd8569f01abab-init': devicemapper: Can't set cookie dm_task_set_cookie failed.
May 27 01:31:54 openshift-109.x.x.x docker[20020]: See '/usr/bin/docker-current run --help'.
May 27 01:31:54 openshift-109.x.x.x systemd[1]: atomic-openshift-master.service: main process exited, code=exited, status=125/n/a

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Containerized install of OCP 3.5 (one master/node/etcd + one NFS)
2. Upgrade OCP 3.5 to OCP 3.6
# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_6/upgrade.yml

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Tried to start the master service manually; it failed. Tried to restart docker; it failed. After rebooting the host, the master and docker services were restored. Re-ran the upgrade playbook; it failed again at the same task.

Comment 2 liujia 2017-05-27 09:33:05 UTC
All upgrade tests against containerized environments are now blocked. Adding the "TestBlocker" keyword.

Comment 3 Scott Dodson 2017-05-30 16:04:57 UTC
liujia,
If you `yum downgrade docker-1.12.6-16.el7` and restart prior to performing the upgrade, does it work? I suspect this may be a regression in docker.

Comment 4 liujia 2017-05-31 09:56:18 UTC
(In reply to Scott Dodson from comment #3)
> liujia,
> If you `yum downgrade docker-1.12.6-16.el7` and restart prior to performing
> the upgrade does it work? I suspect this may be a regression in docker.

I think you are right. After downgrading docker to 1.12.6-16, the upgrade succeeds, so this bug no longer blocks the remaining tests. Thanks.

Comment 5 Scott Dodson 2017-05-31 13:39:39 UTC
Assigning to containers as this looks to be a docker regression.

Comment 6 Wang Haoran 2017-06-01 07:08:19 UTC
This is not restricted to the upgrade scenario. I have a new cluster with the new docker installed; it worked at first, but after running some tests, docker broke with the error:
failed at task [Restart master] for a devicemapper error: Can't set cookie dm_task_set_cookie failed.

Comment 7 Scott Dodson 2017-06-01 13:17:39 UTC
For the containers team: the Restart master task just calls `systemctl restart atomic-openshift-master`, and this is the unit definition:


ExecStartPre=-/usr/bin/docker rm -f atomic-openshift-master
ExecStart=/usr/bin/docker run --rm --privileged --net=host --name atomic-openshift-master --env-file=/etc/sysconfig/atomic-openshift-master -v /var/lib/origin:/var/lib/origin -v /var/log:/var/log -v /var/run/docker.sock:/var/run/docker.sock -v /etc/origin:/etc/origin openshift3/ose:v3.6 start master --config=${CONFIG_FILE} $OPTIONS
ExecStartPost=/usr/bin/sleep 10
ExecStop=/usr/bin/docker stop atomic-openshift-master


Comment 8 Alasdair Kergon 2017-06-01 15:29:00 UTC
Anyone got a pointer to the bit of source code producing that message?

Comment 9 Alasdair Kergon 2017-06-01 15:36:14 UTC
(All failure modes of the dm_task_set_cookie function issue a low-level error message, so perhaps that can be extracted - or the logging fixed if it wasn't captured.)

Comment 10 Alasdair Kergon 2017-06-01 15:40:06 UTC
(I don't know what parameters are used by this caller's source code, but there can be several dependencies here including use of semaphores and /dev/urandom.)

Comment 11 Johnny Liu 2017-06-07 03:39:05 UTC
I also encountered this issue on 3.5.

containerized install on RHEL + openshift v3.5.5.24 + docker-1.12.6-28.git1398f24.el7.x86_64, failed at restart master.

containerized install on RHEL + openshift v3.4.1.32 + docker-1.12.6-28.git1398f24.el7.x86_64, PASS.

Comment 12 Johnny Liu 2017-06-14 07:45:39 UTC
This is blocking testing with the latest docker version.

Comment 13 Vivek Goyal 2017-06-14 12:19:23 UTC
I think this is probably a semaphore leak, where we have exhausted the maximum number of semaphores on the system.


Can you provide the output of the following commands?

- dmsetup udevcookies
- ipcs
- cat /proc/sys/kernel/sem

On the failing system, try running "dmsetup udevcomplete_all" and see if that gets you going.

I will also need an easy way to reproduce this problem to figure out why the leak is happening.

Can you attach the journal logs of the failing system? I want to see whether any messages there point to a possible udev issue or something else.
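
A minimal sketch of how the requested output could be reduced to one number, assuming the limit being hit is SEMMNI, the fourth field of /proc/sys/kernel/sem (the maximum number of semaphore arrays on the system). The function name sem_headroom is hypothetical and not part of dmsetup or docker.

```shell
# sem_headroom LIMITS USED
#   LIMITS: contents of /proc/sys/kernel/sem, i.e. "SEMMSL SEMMNS SEMOPM SEMMNI"
#   USED:   number of semaphore arrays currently allocated
# Prints how many more semaphore arrays can be created before the limit is hit.
# (Hypothetical helper for illustration only.)
sem_headroom() {
    # SEMMNI, the system-wide cap on semaphore arrays, is the fourth field.
    semmni=$(printf '%s\n' "$1" | awk '{ print $4 }')
    echo $(( semmni - $2 ))
}
```

On a live host it could be called as `sem_headroom "$(cat /proc/sys/kernel/sem)" "$(ipcs -s | grep -c '^0x')"`; a result at or near 0 would be consistent with the suspected exhaustion.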

Comment 37 Giovanni Tirloni 2017-06-19 16:48:19 UTC
Just faced this issue after I stress tested a Kubernetes 1.6.5 cluster (asked it to scale an nginx deployment to 800 replicas). I noticed containers were failing to get created. I tried to restart Docker but the same error ("devicemapper: Can't set cookie dm_task_set_cookie failed") continued. Providing logs in case they're useful here.

Comment 38 Giovanni Tirloni 2017-06-19 16:49:47 UTC
Created attachment 1289156
devicemapper: Can't set cookie dm_task_set_cookie failed

kernel: 3.10.0-514.21.1.el7.x86_64


Comment 40 Giovanni Tirloni 2017-06-19 17:25:45 UTC
Increasing the semaphore limits fixed the issue for me. Thanks for the insights.
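
The comment does not say which limit was raised or to what value. As a sketch only, assuming SEMMNI (the fourth field of kernel.sem) is the relevant one, a helper could build a new kernel.sem value that changes just that field. The name with_semmni is hypothetical, and 1024 in the usage note is an arbitrary illustration, not a recommended setting.

```shell
# with_semmni LIMITS NEW_SEMMNI
#   LIMITS: current kernel.sem value, "SEMMSL SEMMNS SEMOPM SEMMNI"
# Prints the same value with only the fourth field (SEMMNI) replaced.
# (Hypothetical helper for illustration only.)
with_semmni() {
    printf '%s\n' "$1" | awk -v n="$2" '{ print $1, $2, $3, n }'
}
```

It might be applied as root with `sysctl -w kernel.sem="$(with_semmni "$(cat /proc/sys/kernel/sem)" 1024)"`. Raising the limit only works around the exhaustion; if cookies keep leaking, the higher limit would presumably be reached as well.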

Comment 42 Daniel Walsh 2017-06-19 18:38:58 UTC
This is definitely a BLOCKER Bug.  Need to get this fixed as soon as possible.

Pull request is upstream


Hopefully it will be merged soon; we will need it backported to projectatomic/docker.

Comment 49 Luwen Su 2017-06-20 08:02:36 UTC
In docker-1.12.6-28.git1398f24.el7.x86_64

# docker run -it --rm rhel7 bash
# unshare bash


#  dmsetup udevcookies
Cookie       Semid      Value      Last semop time           Last change time
0xd4d95ea    1540096    1          Tue Jun 20 03:56:14 2017  Tue Jun 20 03:56:14 2017

and in docker-1.12.6-32.git88a4867.el7.x86_64

`dmsetup udevcookies` shows nothing.
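
The before/after check above can be sketched as a count of outstanding cookies, assuming the `dmsetup udevcookies` output keeps the layout shown in this comment (one header line, then one line per cookie starting with a 0x value). The function name count_udev_cookies is hypothetical.

```shell
# count_udev_cookies
#   Reads `dmsetup udevcookies` output on stdin and prints the number of
#   cookie rows, i.e. lines whose first column is a hex cookie (0x...).
# (Hypothetical helper for illustration only.)
count_udev_cookies() {
    # n+0 forces a numeric 0 when no row matched.
    awk '/^0x/ { n++ } END { print n + 0 }'
}
```

Usage on a host would be `dmsetup udevcookies | count_udev_cookies`; a count that keeps growing across container starts would suggest the leak, while 0 matches the docker-1.12.6-32 result reported here.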

Comment 50 liujia 2017-06-20 08:34:33 UTC

Scenario 1 (pass):
1. Containerized install of OCP 3.5 on docker-1.12.6-32
2. Run new-app to trigger an sti-build
3. Restart the atomic-openshift-master/atomic-openshift-node/docker services

Scenario 2 (pass):
1. Upgrade the above OCP 3.5 (with docker-1.12.6-32) to OCP 3.6
2. Run new-app after the upgrade

OCP 3.5 with docker-1.12.6-32 works well.
Upgrading OCP 3.5 with docker-1.12.6-32 works well.

Comment 51 Johnny Liu 2017-06-20 08:41:10 UTC
containerized install on RHEL + openshift v3.6.116 + docker-1.12.6-32.git88a4867.el7.x86_64, PASS.

Comment 53 errata-xmlrpc 2017-06-28 15:39:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.