Bug 1456138
| Summary: | devicemapper error dm_task_set_cookie failed | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | liujia <jiajliu> |
| Component: | docker | Assignee: | Daniel Walsh <dwalsh> |
| Status: | CLOSED ERRATA | QA Contact: | atomic-bugs <atomic-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.3 | CC: | agk, amurdaca, anli, aos-bugs, bmeng, chaoyang, ddarrah, dmoessne, dwalsh, ghuang, gpei, gtirloni, hannsj_uhl, haowang, jhonce, jhou, jiajliu, jialiu, jligon, jokerman, lsm5, lsu, lxia, mmccomas, myllynen, nhorman, rkant, sdodson, vgoyal, wehe, wmeng, xtian, yuxzhu |
| Target Milestone: | rc | Keywords: | Extras, TestBlocker |
| Target Release: | 7.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | docker-1.12.6-32.git88a4867.el7_3 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1463003 | Environment: | |
| Last Closed: | 2017-06-28 15:39:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1467350 | | |
| Attachments: | | | |

Now all upgrade tests against the container env are blocked. Adding the "TestBlocker" keyword.

liujia, if you `yum downgrade docker-1.12.6-16.el7` and restart prior to performing the upgrade, does it work? I suspect this may be a regression in docker.

(In reply to Scott Dodson from comment #3)
> liujia,
>
> If you `yum downgrade docker-1.12.6-16.el7` and restart prior to performing
> the upgrade does it work? I suspect this may be a regression in docker.

I think you are right. If I downgrade docker to 1.12.6-16, the upgrade succeeds. So this bug will not block the remaining tests. Thx~

Assigning to containers as this looks to be a docker regression.

This is not restricted to the upgrade scenario. I have a new cluster with the new docker installed; it works at first, but after running some tests, docker breaks with the error: failed at task [Restart master] for a devicemapper error: Can't set cookie dm_task_set_cookie failed.

For the containers team: the Restart master task just calls `systemctl restart atomic-openshift-master`, and this is the unit definition:
[Unit]
After=docker.service
Requires=docker.service
PartOf=docker.service
After=etcd_container.service
Wants=etcd_container.service
[Service]
EnvironmentFile=/etc/sysconfig/atomic-openshift-master
ExecStartPre=-/usr/bin/docker rm -f atomic-openshift-master
ExecStart=/usr/bin/docker run --rm --privileged --net=host --name atomic-openshift-master --env-file=/etc/sysconfig/atomic-openshift-master -v /var/lib/origin:/var/lib/origin -v /var/log:/var/log -v /var/run/docker.sock:/var/run/docker.sock -v /etc/origin:/etc/origin openshift3/ose:v3.6 start master --config=${CONFIG_FILE} $OPTIONS
ExecStartPost=/usr/bin/sleep 10
ExecStop=/usr/bin/docker stop atomic-openshift-master
Restart=always
RestartSec=5s
[Install]
WantedBy=docker.service
Anyone got a pointer to the bit of source code producing that message? (All failure modes of the dm_task_set_cookie function issue a low-level error message, so perhaps that can be extracted, or the logging fixed if it wasn't captured.)

(I don't know what parameters are used by this caller's source code, but there can be several dependencies here, including use of semaphores and /dev/urandom.)

I also encounter this issue on 3.5.
containerized install on RHEL + openshift v3.5.5.24 + docker-1.12.6-28.git1398f24.el7.x86_64, failed at restart master.
containerized install on RHEL + openshift v3.4.1.32 + docker-1.12.6-28.git1398f24.el7.x86_64, PASS.

This is blocking testing with the latest docker version.

I think this is probably a semaphore leak, where we have exhausted the maximum number of semaphores on the system.
https://github.com/moby/moby/issues/33603

Can you provide the output of the following commands?
- dmsetup udevcookies
- ipcs
- cat /proc/sys/kernel/sem

On the failing system, try running "dmsetup udevcomplete_all" and see if that gets you going.

I will also need an easy way to reproduce this problem to figure out why the leak is happening.

Can you attach journal logs from the failing system? I want to see if there are any messages there that point towards a possible udev issue or something else.

Just faced this issue after I stress tested a Kubernetes 1.6.5 cluster (asked it to scale an nginx deployment to 800 replicas). I noticed containers were failing to get created. I tried to restart Docker but the same error ("devicemapper: Can't set cookie dm_task_set_cookie failed") continued. Providing logs in case they are useful here.
Created attachment 1289156
devicemapper: Can't set cookie dm_task_set_cookie failed
kernel: 3.10.0-514.21.1.el7.x86_64
container-selinux-2.12-2.gite7096ce.el7.noarch
docker-1.12.6-28.git1398f24.el7.centos.x86_64
docker-client-1.12.6-28.git1398f24.el7.centos.x86_64
docker-common-1.12.6-28.git1398f24.el7.centos.x86_64
skopeo-containers-0.1.19-1.el7.x86_64
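
The semaphore-exhaustion theory raised earlier in this thread can be checked without reading `ipcs` output by eye. The following is a minimal sketch, not part of the original report. It assumes the usual Linux layout of `/proc/sys/kernel/sem` (four numbers: SEMMSL, SEMMNS, SEMOPM, SEMMNI) and that `/proc/sysvipc/sem` lists one semaphore array per line after a single header line. The function names are hypothetical, and the sample values used with them are illustrative only.

```python
from pathlib import Path
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # max semaphores per array
    semmns: int  # max semaphores system-wide
    semopm: int  # max operations per semop() call
    semmni: int  # max number of semaphore arrays


def parse_sem_limits(text: str) -> SemLimits:
    """Parse the contents of /proc/sys/kernel/sem (four whitespace-separated ints)."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    return SemLimits(*(int(f) for f in fields))


def count_sem_arrays(sysvipc_sem: str) -> int:
    """Count arrays in /proc/sysvipc/sem text (assumed: one header line, then one line per array)."""
    lines = [ln for ln in sysvipc_sem.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def arrays_exhausted(limits: SemLimits, in_use: int) -> bool:
    """True when no further semaphore array can be allocated."""
    return in_use >= limits.semmni


if __name__ == "__main__":
    sem_path = Path("/proc/sys/kernel/sem")
    ipc_path = Path("/proc/sysvipc/sem")
    if sem_path.exists() and ipc_path.exists():
        limits = parse_sem_limits(sem_path.read_text())
        in_use = count_sem_arrays(ipc_path.read_text())
        print(f"semaphore arrays in use: {in_use}/{limits.semmni}")
        print("exhausted" if arrays_exhausted(limits, in_use) else "ok")
```

If the in-use count sits at the SEMMNI limit while `dmsetup udevcookies` lists many stale cookies, that would be consistent with the leak described above. It does not prove it.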
Increasing the semaphore limits fixed the issue for me. Thanks for the insights.

This is definitely a BLOCKER bug. We need to get this fixed as soon as possible.

Pull request is upstream: https://github.com/moby/moby/pull/33732
Hopefully it gets merged soon; we will need this backported to projectatomic/docker.

In docker-1.12.6-28.git1398f24.el7.x86_64
# docker run -it --rm rhel7 bash
# unshare bash
# exit
# dmsetup udevcookies
Cookie       Semid      Value      Last semop time           Last change time
0xd4d95ea    1540096    1          Tue Jun 20 03:56:14 2017  Tue Jun 20 03:56:14 2017

and in docker-1.12.6-32.git88a4867.el7.x86_64
# dmsetup udevcookies
shows nothing.

Version: docker-1.12.6-32.git88a4867.el7.x86_64
scenario 1-pass:
1. Container install ocp3.5 on docker-1.12.6-32
2. New-app to trigger sti-build
3. restart atomic-openshift-master/atomic-openshift-node/docker service
scenario 2-pass:
1. Trigger upgrade above ocp3.5 (with docker-1.12.6-32) to ocp3.6
2. New-app after upgrade

OCP 3.5 with docker-1.12.6-32 works well.
Upgrade of ocp3.5 with docker-1.12.6-32 works well.

containerized install on RHEL + openshift v3.6.116 + docker-1.12.6-32.git88a4867.el7.x86_64, PASS.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:1620

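
The verification above compares `dmsetup udevcookies` output before and after the fix. A small parser makes that comparison scriptable. This is a sketch under assumptions: the column layout is taken from the single sample row quoted in this report, and `parse_udevcookies` and `UdevCookie` are hypothetical names, not part of any devicemapper tooling.

```python
from typing import List, NamedTuple


class UdevCookie(NamedTuple):
    cookie: str  # cookie id as printed, e.g. "0xd4d95ea"
    semid: int   # SysV semaphore id backing the cookie
    value: int   # current semaphore value


def parse_udevcookies(output: str) -> List[UdevCookie]:
    """Extract cookie rows; the header and any non-cookie lines are skipped."""
    cookies = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows are assumed to start with a hex cookie id.
        if len(fields) < 3 or not fields[0].startswith("0x"):
            continue
        cookies.append(UdevCookie(fields[0], int(fields[1]), int(fields[2])))
    return cookies


# Sample copied from the output quoted in this report.
SAMPLE = """\
Cookie       Semid      Value      Last semop time           Last change time
0xd4d95ea    1540096    1          Tue Jun 20 03:56:14 2017  Tue Jun 20 03:56:14 2017
"""

if __name__ == "__main__":
    leaked = parse_udevcookies(SAMPLE)
    print(f"{len(leaked)} outstanding cookie(s)")
    for c in leaked:
        print(f"  {c.cookie} semid={c.semid} value={c.value}")
```

On a live host the input would come from running `dmsetup udevcookies` as root. A list that keeps growing after containers exit would match the behaviour reported with docker-1.12.6-28.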
Description of problem:
Upgrade ocp 3.5(container install) failed at task [Restart master] for a devicemapper error: Can't set cookie dm_task_set_cookie failed.

fatal: [openshift-109.x.x.x]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "daemon_reload": false,
            "enabled": null,
            "masked": null,
            "name": "atomic-openshift-master",
            "state": "restarted",
            "user": false
        }
    }
}

MSG:

Unable to restart service atomic-openshift-master: Job for atomic-openshift-master.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master.service" and "journalctl -xe" for details.

# systemctl status atomic-openshift-master.service -l
● atomic-openshift-master.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-master.service; enabled; vendor preset: disabled)
   Active: activating (start-post) (Result: exit-code) since Sat 2017-05-27 01:31:53 EDT; 7s ago
  Process: 20005 ExecStop=/usr/bin/docker stop atomic-openshift-master (code=exited, status=1/FAILURE)
  Process: 20020 ExecStart=/usr/bin/docker run --rm --privileged --net=host --name atomic-openshift-master --env-file=/etc/sysconfig/atomic-openshift-master -v /var/lib/origin:/var/lib/origin -v /var/log:/var/log -v /var/run/docker.sock:/var/run/docker.sock -v /etc/origin:/etc/origin openshift3/ose:${IMAGE_VERSION} start master --config=${CONFIG_FILE} $OPTIONS (code=exited, status=125)
  Process: 20014 ExecStartPre=/usr/bin/docker rm -f atomic-openshift-master (code=exited, status=1/FAILURE)
 Main PID: 20020 (code=exited, status=125); : 20021 (sleep)
   Memory: 92.0K
   CGroup: /system.slice/atomic-openshift-master.service
           └─control
             └─20021 /usr/bin/sleep 10

May 27 01:31:53 openshift-109.x.x.x systemd[1]: Starting atomic-openshift-master.service...
May 27 01:31:53 openshift-109.x.x.x docker[20014]: Error response from daemon: No such container: atomic-openshift-master
May 27 01:31:54 openshift-109.x.x.x docker[20020]: /usr/bin/docker-current: Error response from daemon: devmapper: Error activating devmapper device for 'ed6dd8b37d073aedcb636d597c81437c02e84c3a9593923dc5ccd8569f01abab-init': devicemapper: Can't set cookie dm_task_set_cookie failed.
May 27 01:31:54 openshift-109.x.x.x docker[20020]: See '/usr/bin/docker-current run --help'.
May 27 01:31:54 openshift-109.x.x.x systemd[1]: atomic-openshift-master.service: main process exited, code=exited, status=125/n/a

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.6.85-1.git.0.109a54e.el7.noarch
docker-1.12.6-28.git1398f24.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Container install ocp3.5(one master/node/etcd + one nfs)
2. Upgrade ocp3.5 to ocp3.6
# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_6/upgrade.yml

Actual results:
Upgrade failed.

Expected results:
Upgrade succeed.

Additional info:
Tried to start master server manually, failed.
Tried to restart docker, failed.
Reboot host, then master and docker services restored.
Re-run upgrade playbook, failed again at the same task.