Bug 1368267 - [extras-rhel-7.2.7] docker-latest-1.12 daemon crashes unrecoverably
Summary: [extras-rhel-7.2.7] docker-latest-1.12 daemon crashes unrecoverably
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker-latest
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Antonio Murdaca
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-18 22:03 UTC by Ed Santiago
Modified: 2016-09-15 08:29 UTC
CC List: 6 users

Fixed In Version: docker-latest-1.12.0-15.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-15 08:29:34 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status        Summary                                       Last Updated
Red Hat Product Errata  RHBA-2016:1829  0        normal    SHIPPED_LIVE  docker-latest bug fix and enhancement update  2016-09-15 11:55:02 UTC

Comment 2 Ed Santiago 2016-08-19 01:26:24 UTC
Followup: one partially-successful docker-autotest run: various new errors, but no actual daemon crash. Now rerunning with thin pools. Will file BZs if new errors (cgroup-related, and possible problems with docker start) are reproducible.

Comment 3 Ed Santiago 2016-08-19 02:01:26 UTC
Another docker-autotest run without a daemon crash. I'll tentatively assume this is a problem with my old virt. But I'm starting another test run now.

Comment 4 Ed Santiago 2016-08-19 02:20:30 UTC
Spoke too soon (comment 3). This last run of docker-autotest killed the daemon. The symptoms were similar to those in the description, but FWIW the crash came in a much earlier test within docker-autotest, i.e. it does not seem to be correlated with any specific subtest.

Comment 5 Daniel Walsh 2016-08-19 12:11:32 UTC
Can you attach the journal logs about docker?

Comment 7 Daniel Walsh 2016-08-19 12:54:02 UTC
Ed, could you remove oci-register-machine?

You might have to rm -f /usr/libexec/oci/hooks.d/oci-register-machine

We are not going to support this for RHEL7.

Also make sure the docker-latest unit file runs with MountFlags=slave, that no docker-containerd.service file exists, and that docker-containerd is not running when you stop docker.service.
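The file-based parts of the checklist above can be scripted. What follows is a minimal bash sketch, not a Red Hat tool: `check_docker_latest_setup` is a hypothetical name, and its arguments default to the paths used in this bug. It only reports problems and does not fix them. It also leaves out the process check (docker-containerd not running after stopping docker.service), which remains a manual `ps auxww|grep container`, as in comment 8.

```shell
#!/bin/bash
# Hypothetical helper (not a shipped tool): report on the conditions from
# comment 7. Arguments are optional and default to the paths used in this bug.
check_docker_latest_setup() {
  local hooks_dir=${1:-/usr/libexec/oci/hooks.d}
  local unit=${2:-/usr/lib/systemd/system/docker-latest.service}
  local unit_dir=${3:-/usr/lib/systemd/system}
  local problems=0 f

  # 1) No oci-register-machine hook under any name: a renamed copy such as
  #    oci-register-machine.DELETE-ME still counts (see comment 12).
  for f in "$hooks_dir"/oci-register-machine*; do
    if [[ -e $f ]]; then
      echo "FAIL: hook still present: $f"
      problems=$((problems + 1))
    fi
  done

  # 2) The docker-latest unit must contain the line MountFlags=slave.
  if ! grep -qx 'MountFlags=slave' "$unit" 2>/dev/null; then
    echo "FAIL: MountFlags=slave not set in $unit"
    problems=$((problems + 1))
  fi

  # 3) No separate docker-containerd.service unit file.
  if [[ -e $unit_dir/docker-containerd.service ]]; then
    echo "FAIL: $unit_dir/docker-containerd.service exists"
    problems=$((problems + 1))
  fi

  echo "problems: $problems"
  # Exit status is 0 only when every check passed.
  (( problems == 0 ))
}
```

On the test machine it would be run as root with no arguments. A non-zero exit status means at least one FAIL line was printed.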

Comment 8 Ed Santiago 2016-08-19 13:02:07 UTC
   # mv /usr/libexec/oci/hooks.d/oci-register-machine{,.DELETE-ME}
   # grep MountFlags /usr/lib/systemd/system/docker-latest.service 
   MountFlags=slave

   # ls -l /usr/lib/systemd/system/*contain*
   -rw-r--r--. 1 root root 791 Jun 14 09:36 /usr/lib/systemd/system/container-getty@.service
   # grep -ir containerd /usr/lib/systemd/system/
   # ps auxww|grep container
   root     22919  0.0  0.0 112648   980 pts/2    R+   08:59   0:00 grep --color=auto container

   # systemctl restart docker-latest

...and restarting docker-autotest. Normal run takes ~40m, expect followup then (or, if things crash, earlier).

Comment 11 Daniel Walsh 2016-08-19 14:13:39 UTC
I still see oci-register-machine in that log?

Comment 12 Ed Santiago 2016-08-19 14:35:26 UTC
I rebooted 35 minutes ago, wiped /var/lib/docker-latest, ran docker-latest-storage-setup --reset, started a new run of docker-autotest. It completed with errors but no actual docker daemon crashes. Second run killed the daemon. And even after reboot, I too am seeing:

   # journalctl -b | grep oci-register|tail
   ...
   Aug 19 10:29:56 esm-rhel7-d12-3.localdomain oci-register-machine[15436]: 2016/08/19 10:29:56 Register machine: prestart eff8cfca5e7cc2674bcc1688c6cbb148848a9eb4a0e67e710d18c9ee67303134 15425 /var/lib/docker-latest/devicemapper/mnt/36924cbac77b8cc89ebf829157c47dbddcd0bf21321213a5028d2a3aafc0421f/rootfs

   # find / -xdev  -name '*oci*register*'
   /var/lib/yum/yumdb/o/46369ab3dbeecdced9196135840b73c6f41f86d6-oci-register-machine-0-1.7.git31bbcd2.el7-x86_64
   /usr/share/doc/oci-register-machine-0
   /usr/share/doc/oci-register-machine-0/oci-register-machine.1.md
   /usr/share/man/man1/oci-register-machine.1.gz
   /usr/share/licenses/oci-register-machine-0
   /usr/libexec/oci/hooks.d/oci-register-machine.DELETE-ME

I've just run rm -f on that last one, restarted docker-latest, and restarted docker-autotest.

Comment 13 Daniel Walsh 2016-08-19 14:39:08 UTC
Yes, docker will execute anything in the hooks.d directory, so get rid of it.
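Comment 12 shows the hook still running after it had been renamed to oci-register-machine.DELETE-ME. That is consistent with the point above: the file name does not matter. The bash sketch below lists what a "run everything in hooks.d" policy would pick up. It illustrates that policy under assumptions and is not docker's actual hook loader. The rule that only executable regular files count as runnable is this sketch's own assumption, and `list_runnable_hooks` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every executable regular file in a
# hooks directory, i.e. what a "run everything in hooks.d" policy would run.
# The file name is never inspected, so renaming a hook does not disable it.
list_runnable_hooks() {
  local dir=${1:-/usr/libexec/oci/hooks.d} f
  for f in "$dir"/*; do
    if [[ -f $f && -x $f ]]; then
      printf '%s\n' "${f##*/}"
    fi
  done
  return 0
}
```

Called with no argument, it checks the default directory. An empty result means no hook would be picked up. Removing the file from the directory is what actually stops it, as was done at the end of comment 12.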

Comment 15 Ed Santiago 2016-08-19 15:53:55 UTC
Three successful(*) docker-autotest runs since deleting oci-register-machine.

 (*) as in no daemon crashes

Comment 16 Daniel Walsh 2016-08-19 17:58:15 UTC
Yet another reason to remove oci-register-machine.

Comment 17 Ed Santiago 2016-08-19 20:19:44 UTC
Followup: many test runs later, on two machines, no further daemon crashes seen.

And FWIW no other instances of the sporadic "docker attach" failure that was happening on (some) runs. I'm tentatively going to conclude that those were related to oci-register-machine.

Comment 18 Daniel Walsh 2016-08-20 09:54:56 UTC
Sounds good, but my question is why? Especially the crashes. I can see oci-register-machine failing because of some race condition. But in docker-1.12, runc and oci-register-machine run as separate processes from docker and docker-containerd, so how would they cause the daemon to crash?

Comment 19 Ed Santiago 2016-08-22 11:21:03 UTC
I wish I knew! But the daemon crash hasn't happened since removing oci-register-machine. I've kept running tests through the weekend, still no daemon crash. I haven't tried reinstalling oci-register-machine but suspect that if I do the crash will come back. Is there anything I can do on my end to try to track down the connection?

Comment 20 Daniel Walsh 2016-08-22 11:35:26 UTC
Antonio and Mrunal, any ideas?

Comment 21 Ed Santiago 2016-08-22 13:59:56 UTC
   # yum reinstall oci-register-machine
   # systemctl restart docker-latest
   # ./autotest-local run docker
   ...

Failed within minutes.

   # systemctl status docker-latest -l
   ● docker-latest.service - Docker Application Container Engine
      Loaded: loaded (/usr/lib/systemd/system/docker-latest.service; disabled; vendor preset: disabled)
      Active: inactive (dead)
        Docs: http://docs.docker.com
   
   Aug 22 09:37:09 esm-rhel7-d12-3.localdomain dockerd-latest[15844]: time="2016-08-22T09:37:09.311440508-04:00" level=error msg="containerd: notify OOM events" error="no init process found"
   Aug 22 09:37:11 esm-rhel7-d12-3.localdomain dockerd-latest[15844]: time="2016-08-22T09:37:11.321688341-04:00" level=error msg="Create container failed with error: mkdir /var/run/docker/libcontainerd/containerd/ea859eefd708ee3b3827c50c73b7bc5368110113d4af2b541ef02e65b18f4864: file exists"
   Aug 22 09:37:11 esm-rhel7-d12-3.localdomain dockerd-latest[15844]: time="2016-08-22T09:37:11.986123078-04:00" level=error msg="Handler for POST /v1.24/containers/ea859eefd708ee3b3827c50c73b7bc5368110113d4af2b541ef02e65b18f4864/start returned error: mkdir /var/run/docker/libcontainerd/containerd/ea859eefd708ee3b3827c50c73b7bc5368110113d4af2b541ef02e65b18f4864: file exists"
   Aug 22 09:37:11 esm-rhel7-d12-3.localdomain dockerd-latest[15844]: time="2016-08-22T09:37:11.986836677-04:00" level=error msg="Handler for POST /v1.24/containers/ea859eefd708ee3b3827c50c73b7bc5368110113d4af2b541ef02e65b18f4864/start returned error: mkdir /var/run/docker/libcontainerd/containerd/ea859eefd708ee3b3827c50c73b7bc5368110113d4af2b541ef02e65b18f4864: file exists"
   Aug 22 09:37:11 esm-rhel7-d12-3.localdomain dockerd-latest[15844]: time="2016-08-22T09:37:11.988595236-04:00" level=info msg="{Action=remove, Username=root, LoginUID=0, PID=22083}"
   Aug 22 09:37:11 esm-rhel7-d12-3.localdomain dockerd-latest[15844]: time="2016-08-22T09:37:11.991408617-04:00" level=info msg="stopping containerd after receiving terminated"
   Aug 22 09:37:13 esm-rhel7-d12-3.localdomain systemd[1]: Stopped Docker Application Container Engine.
   Aug 22 09:49:38 esm-rhel7-d12-3.localdomain systemd[1]: [/usr/lib/systemd/system/docker-latest.service:19] Unknown lvalue 'TasksMax' in section 'Service'
   Aug 22 09:49:42 esm-rhel7-d12-3.localdomain systemd[1]: [/usr/lib/systemd/system/docker-latest.service:19] Unknown lvalue 'TasksMax' in section 'Service'
   Aug 22 09:58:53 esm-rhel7-d12-3.localdomain systemd[1]: [/usr/lib/systemd/system/docker-latest.service:19] Unknown lvalue 'TasksMax' in section 'Service'

Comment 23 Ed Santiago 2016-08-23 16:39:19 UTC
Removing the dependency on oci-register-machine isn't enough. docker and atomic both bring it in, so a /usr/libexec/oci/hooks.d/oci-register-machine will already exist by the time docker-latest gets installed.

Confirmed by installing and testing docker-latest-1.12.0-12.el7 on a fresh virt. (By "confirmed" I mean my sentence above, i.e. the crashing bug persists, with the docker daemon dying in the middle of a test run, until I manually rm -f /usr/libexec/oci/hooks.d/oci-register-machine. I cannot yum remove oci-register-machine because that uninstalls docker and atomic.)

Comment 24 Lokesh Mandvekar 2016-08-23 16:52:50 UTC
I can remove oci-register-machine from docker 1.10.3 if Dan agrees.

Also, atomic depends on docker (I think there's a separate bug to remove this dep) which then pulls in oci-register-machine, so removing it from docker should solve that.

Comment 25 Ed Santiago 2016-08-23 17:31:46 UTC
I don't think removing the Requires from docker 1.10 is enough: oci-register-machine will not be removed on upgrade. I suspect the only real solution[*] is:

    docker-1.10.spec:
      - Requires: oci-register-machine >= 1:0-1.7

    docker-latest-1.12.spec:
      + Conflicts: oci-register-machine

 [*] until someone figures out the true cause of the problem

Comment 26 Lokesh Mandvekar 2016-08-23 17:37:14 UTC
(In reply to Ed Santiago from comment #25)
> I don't think removing the Requires from docker 1.10 is enough:
> oci-register-machine will not be removed on upgrade. I suspect the only real
> solution[*] is:
> 
>     docker-1.10.spec:
>       - Requires: oci-register-machine >= 1:0-1.7
> 
>     docker-latest-1.12.spec:
>       + Conflicts: oci-register-machine
> 
>  [*] until someone figures out the true cause of the problem

True, I'll add this for now. Thanks

Dan, please let me know if you think this shouldn't be done

Comment 27 Ed Santiago 2016-08-23 19:41:29 UTC
No joy with the -13 build. I guess Conflicts doesn't work the way I thought it did:

   # yum update 
   Loaded plugins: product-id, search-disabled-repos, subscription-manager
   Resolving Dependencies
   --> Running transaction check
   ---> Package docker.x86_64 0:1.10.3-46.el7.11 will be updated
   ---> Package docker.x86_64 0:1.10.3-46.el7.12 will be an update
   ---> Package docker-common.x86_64 0:1.10.3-46.el7.11 will be updated
   ---> Package docker-common.x86_64 0:1.10.3-46.el7.12 will be an update
   ---> Package docker-latest.x86_64 0:1.12.0-12.el7 will be updated
   ---> Package docker-latest.x86_64 0:1.12.0-13.el7 will be an update
   ---> Package docker-rhel-push-plugin.x86_64 0:1.10.3-46.el7.11 will be updated
   ---> Package docker-rhel-push-plugin.x86_64 0:1.10.3-46.el7.12 will be an update
   ---> Package docker-selinux.x86_64 0:1.10.3-46.el7.11 will be updated
   ---> Package docker-selinux.x86_64 0:1.10.3-46.el7.12 will be an update
   --> Processing Conflict: docker-latest-1.12.0-13.el7.x86_64 conflicts oci-register-machine
   --> Finished Dependency Resolution
   Error: docker-latest conflicts with 1:oci-register-machine-0-1.7.git31bbcd2.el7.x86_64
    You could try using --skip-broken to work around the problem
    You could try running: rpm -Va --nofiles --nodigest

Basically: oci-register-machine is already installed, because of docker. Even though the yum-updated docker no longer requires oci-register-machine, it's installed, and yum is not removing it automatically.

Comment 28 Daniel Walsh 2016-08-24 12:40:55 UTC
Yes oci-register-machine should be removed from all dependencies, until we figure out these problems.

Comment 29 Daniel Walsh 2016-08-24 12:43:08 UTC
So oci-register-machine was shipped in a previous version of RHEL. I think the easiest fix for now would be to add

    Obsoletes: oci-register-machine

which should cause it to be removed if installed. We can remove this in a future release.

Comment 30 Lokesh Mandvekar 2016-08-24 14:09:10 UTC
(In reply to Ed Santiago from comment #27)
> No joy with -13 build. I guess Conflicts doesn't work the way I thought it
> did:
> [...]
> Basically: oci-register-machine is already installed, because of docker.
> Even though the yum-updated docker no longer requires oci-register-machine,
> it's installed, and yum is not removing it automatically.

Ha yup, we should be adding Obsoletes. I thought you meant to say "let the user deal with removing oci-register-machine if they want".

I'm adding Obsoletes in the next build.

Comment 31 Ed Santiago 2016-08-25 18:34:49 UTC
-14: not quite right yet. On a fresh virt, yum install docker docker-latest is fine because it doesn't bring in oci-register-machine. On an existing install, though, oci-register-machine is not removed:

    # rpm -qa|grep oci-register
    oci-register-machine-0-1.7.git31bbcd2.el7.x86_64

    # rpm -q --obsoletes docker-latest-1.12.0-14.el7
    oci-register-machine <= 1:0-1.7
    docker-storage-setup <= 0.5-3
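A likely explanation for the output above, though the thread does not spell it out: the Obsoletes range specifies a release (1.7), so rpm compares the installed release 1.7.git31bbcd2.el7 against it. rpm compares versions segment by segment. The extra .git31bbcd2.el7 segments make the installed release sort after 1.7, so 1:0-1.7.git31bbcd2.el7 falls outside `<= 1:0-1.7` and is not obsoleted. The bash function below re-implements that segment comparison in simplified form, for illustration only. It omits the `~` and `^` rules of newer rpm releases, and `rpm_vercmp` is a hypothetical name, not an rpm command.

```shell
#!/bin/bash
# Simplified sketch of rpm's segment-wise version comparison (rpmvercmp).
# Assumption: the '~' and '^' rules of newer rpm releases are omitted.
# Prints 1 if $1 is newer, -1 if $2 is newer, 0 if they compare equal.
rpm_vercmp() {
  local LC_ALL=C   # byte-wise ordering for the [[ < ]] string comparisons
  local a=$1 b=$2 sa sb
  [[ $a == "$b" ]] && { echo 0; return; }

  while [[ -n $a && -n $b ]]; do
    # Separators (anything that is not a letter or digit) are skipped.
    [[ $a =~ ^[^[:alnum:]]*(.*)$ ]] && a=${BASH_REMATCH[1]}
    [[ $b =~ ^[^[:alnum:]]*(.*)$ ]] && b=${BASH_REMATCH[1]}
    [[ -z $a || -z $b ]] && break

    if [[ $a =~ ^[0-9] ]]; then
      # A numeric segment is always newer than an alphabetic one.
      [[ $b =~ ^[0-9] ]] || { echo 1; return; }
      # Take the digit runs, dropping leading zeros.
      [[ $a =~ ^0*([0-9]*)(.*)$ ]]; sa=${BASH_REMATCH[1]}; a=${BASH_REMATCH[2]}
      [[ $b =~ ^0*([0-9]*)(.*)$ ]]; sb=${BASH_REMATCH[1]}; b=${BASH_REMATCH[2]}
      # More significant digits means a larger number.
      if (( ${#sa} != ${#sb} )); then
        (( ${#sa} > ${#sb} )) && echo 1 || echo -1
        return
      fi
    else
      # Alphabetic segment: it loses against a numeric one.
      [[ $b =~ ^[[:alpha:]] ]] || { echo -1; return; }
      [[ $a =~ ^([[:alpha:]]*)(.*)$ ]]; sa=${BASH_REMATCH[1]}; a=${BASH_REMATCH[2]}
      [[ $b =~ ^([[:alpha:]]*)(.*)$ ]]; sb=${BASH_REMATCH[1]}; b=${BASH_REMATCH[2]}
    fi
    # Same kind (and, for numbers, same length): compare as strings.
    [[ $sa < $sb ]] && { echo -1; return; }
    [[ $sa > $sb ]] && { echo 1; return; }
  done

  # All shared segments were equal: whichever side has text left is newer.
  if [[ -z $a && -z $b ]]; then echo 0
  elif [[ -z $a ]]; then echo -1
  else echo 1
  fi
}
```

For example, `rpm_vercmp 1.7.git31bbcd2.el7 1.7` prints `1`. By the same rule, a `>= 1:0-1.8` requirement, the form proposed in comment 33, is satisfied by 1:0-1.8.gitaf6c129.el7. On a real system, `rpmdev-vercmp` from the rpmdevtools package performs the full epoch:version-release comparison.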

Comment 32 Daniel Walsh 2016-08-25 18:37:05 UTC
I want to move this back to ASSIGNED. I want to update oci-register-machine to a version that includes a config file that can be used to disable the hook. Then we can remove this hacking around Obsoletes and Requires, turn the hook back on in a future release, and continue debugging what is causing the problem.

Comment 33 Ed Santiago 2016-08-25 19:01:56 UTC
SGTM. To make sure I understand correctly:

  1) New build of oci-register-machine
     a) V-R = 0-1.8.el7 I presume?
     b) will the config file be set to disable by default? If not, will there be a prominent release note?

  2) New build of docker
     - Obsoletes: oci-register-machine <= 1:0-1.7
     + Requires: oci-register-machine >= 1:0-1.8
       ** or **
     + Obsoletes: oci-register-machine < 1:0-1.8 ?

     (The first keeps oci-register-machine mandatory; the second removes
     the non-config-file version but allows the config-file version without
     necessarily installing it. Assuming I've understood correctly). Which
     approach do you suggest?

Comment 34 Daniel Walsh 2016-08-26 10:38:16 UTC
Yes, the oci-register-machine package should set it to disabled by default.

Use Requires, not Obsoletes.

We want to keep this package in Atomic Host but disable it by default, for now.

Comment 35 Ed Santiago 2016-08-26 11:53:28 UTC
I think there's a typo in the requires: <= should be >= :

   # yum install docker-latest
   ...
   Error: Package: docker-latest-1.12.0-15.el7.x86_64 (Internal-Extras-Repository-2)
              Requires: oci-register-machine <= 1:0-1.8
              Removing: 1:oci-register-machine-0-1.7.git31bbcd2.el7.x86_64 (@rhel-7-server-extras-rpms)
                  oci-register-machine = 1:0-1.7.git31bbcd2.el7
              Updated By: 1:oci-register-machine-0-1.8.gitaf6c129.el7.x86_64 (Internal-Extras-Repository-2)
                  oci-register-machine = 1:0-1.8.gitaf6c129.el7
              Available: oci-register-machine-1.10.3-44.el7.x86_64 (rhel-7-server-extras-rpms)
                  oci-register-machine = 1.10.3-44.el7

Comment 36 Ed Santiago 2016-08-26 13:21:28 UTC
-16 installed fine on update. Now running docker-autotest.

Comment 37 Daniel Walsh 2016-08-26 14:37:22 UTC
cat /etc/oci-register-machine.conf

Comment 38 Ed Santiago 2016-08-26 14:41:42 UTC
   # cat /etc/oci-register-machine.conf    
   # Disable oci-register-machine by setting the disabled field to true
   disabled : true

I also ran 'journalctl -f | grep oci-r' during docker-autotest. No hits. Test run successful, other than already-filed BZs.

Comment 39 Ed Santiago 2016-08-26 18:41:28 UTC
I've run the following tests, all starting with a fresh rhel-7.2-server-x86_64-updated image:

  1) Fresh install of docker-latest

       # [add Internal-Extras-Repository-2 repo]
       # yum install docker docker-latest

  2) Update of older docker-latest

      # yum install docker docker-latest
      # [add Internal-Extras-Repository-2 repo]
      # yum update

  3) Install new docker-latest on system with older docker

      # yum install docker   (brings in older docker & oci-register-machine)
      # [add Internal-Extras-Repository-2 repo]
      # yum install docker-latest   (updates docker, oci-register-machine)

In all cases, docker-autotest ran with only the known exceptions for existing docker-1.12 bugs.

Comment 40 Daniel Walsh 2016-08-26 19:05:39 UTC
And this is with oci-register-machine installed but disabled, correct?

Comment 41 Ed Santiago 2016-08-26 19:20:50 UTC
Yes.

To be precise: oci-register-machine is installed in all cases because docker brings it in. In all three situations listed in comment 39, once yum install or yum update completes, the file /etc/oci-register-machine.conf exists on the system and contains the line 'disabled : true'. In all cases, after docker-autotest completes, 'journalctl | grep oci-r' produces no results, which suggests that oci-register-machine is not being run.

Comment 42 Daniel Walsh 2016-08-26 19:29:22 UTC
In truth it is being run long enough to read its config file which says do nothing.

BTW, kick off a case with disabled: false and see if docker still fails.

Comment 43 Ed Santiago 2016-08-26 20:23:10 UTC
> BTW Kick off a case with the disabled: false and see if docker is still failing.

Done. It took an hour to fail, but fail it did.

For precision: I did not set disabled: false. I commented out the 'disabled' line instead. I confirmed via journalctl | grep oci-r that oci-register-machine is being run (many, many matches).
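Comments 38 and 43 together suggest a simple rule: an uncommented `disabled : true` line turns the hook off, and commenting that line out turns it back on. The bash check below tests for that rule. It is a hypothetical plain-text match written for illustration, not the parser oci-register-machine itself uses. It also treats a missing file as "not disabled", which is an assumption of the sketch.

```shell
#!/bin/bash
# Hypothetical check, not oci-register-machine's own config parser.
# Succeeds (status 0) only if an uncommented line sets "disabled : true";
# spaces around ':' are optional. A missing or unreadable file counts as
# "not disabled", which is an assumption of this sketch.
hook_disabled() {
  local conf=${1:-/etc/oci-register-machine.conf}
  grep -Eq '^[[:space:]]*disabled[[:space:]]*:[[:space:]]*true[[:space:]]*$' \
    "$conf" 2>/dev/null
}
```

Typical use would be `hook_disabled && echo disabled || echo enabled`. As comment 42 notes, even a disabled hook is still started briefly so that it can read this file.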

Comment 44 Daniel Walsh 2016-08-27 09:23:25 UTC
When it fails, it is the docker daemon that fails, correct? I.e. dockerd is no longer running. Is docker-containerd still running?

One last test, if you could look at it, would be to take systemd-machined.service, remove the PrivateTmp and PrivateNetwork lines from it, and then run:

   systemctl daemon-reload
   systemctl restart systemd-machined.service

Now rerun your tests and see if this kills the docker daemon.
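One way to run the test above without editing the packaged unit in place is to write a filtered copy into /etc/systemd/system. systemd gives a unit there precedence over the one of the same name in /usr/lib/systemd/system, and deleting the copy undoes the change. The bash function below is a hypothetical helper that only performs the copy-and-filter step. The reload and restart commands are the ones from the comment above.

```shell
#!/bin/bash
# Hypothetical helper: copy a unit file, dropping its PrivateTmp= and
# PrivateNetwork= lines. Other Private*= settings are left untouched.
strip_private_settings() {
  local src=$1 dst=$2
  grep -Ev '^[[:space:]]*Private(Tmp|Network)=' "$src" > "$dst"
}

# Intended use on the test machine (as root), following the steps above:
#   strip_private_settings /usr/lib/systemd/system/systemd-machined.service \
#                          /etc/systemd/system/systemd-machined.service
#   systemctl daemon-reload
#   systemctl restart systemd-machined.service
```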

Comment 45 Ed Santiago 2016-08-29 13:23:07 UTC
(In reply to Daniel Walsh from comment #44)
> When it fails, it is the docker daemon that fails correct?  IE dockerd is
> not longer running, is docker-containerd running?

After the crash, neither dockerd-latest nor docker-containerd is running.

> One last test, if you could look at it would be to take
> systemd-machined.service and remove the PrivateTmp and PrivateNetwork lines
> from the service.
> systemctl daemon-reload
> systemctl restart systemd-machined.service
> 
> Now rerun your tests and see if this kills the docker daemon.

First test run completed, with unexpected failures in tests relating to docker start or docker restart. (I'm not planning to investigate those further unless requested; I'll chalk them up to the above changes).

The second run crashed with what, at first glance, looks like the usual crash. dockerd and docker-containerd are down.

Comment 48 Ed Santiago 2016-08-30 23:09:46 UTC
Verified as fixed in docker-latest-1.12.1-2.el7

Comment 50 errata-xmlrpc 2016-09-15 08:29:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1829.html

