Bug 1966968
Summary: | Updating docker package to 1.13.1-206.git7d71120.el7_9.x86_64 breaks host OS | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Takashi Kajinami <tkajinam> |
Component: | docker | Assignee: | Jindrich Novy <jnovy> |
Status: | CLOSED ERRATA | QA Contact: | atomic-bugs <atomic-bugs> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.9 | CC: | ajia, amurdaca, bhubbard, ddarrah, dornelas, dwalsh, fsayyed, hyunpark, jbiao, jnovy, jpretori, kir, knoha, lars, lbezdick, lsm5, mori, pthomas, qguo, rbarrott, rmanes, sathlang, tkimura, tkubota, tsweeney, vcojot, wwurzbac, yuokada |
Target Milestone: | rc | Keywords: | Extras, Regression, Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | docker-1.13.1-208.git7d71120.el7_9 or newer | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-06-07 15:36:50 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1186913, 1945588 |
Description
Takashi Kajinami
2021-06-02 09:19:23 UTC
Looking at the current update steps in RHOSP, TripleO (the deployment tool) is supposed to stop the docker daemon before updating the docker package (which means the situation is a little different from the described steps to reproduce). We tried the same steps by hand but couldn't reproduce the issue so far.

Hi, just FYI we're seeing the same problem. During a minor OSP13 update started on May 31st, we managed to update controllers, ceph nodes and most compute nodes. Two failed because of the OOM-killer. The next day we re-ran these two; they failed again, now with neither nova, neutron, ssh nor console login working. Reboot (actually reset) not helpful. I guess we were lucky to start this update when we did or maybe we would've bombed the entire (prod) cluster...

After some non-trivial single-user repairs we managed to downgrade to 205, which re-enabled the mentioned features (and we're continuing the minor update).

I would be interested in understanding how updating an 'extras' RPM like docker (and only a patch/build update of it at that) manages to disturb both sshd and console logins? It seems those features should be better isolated / stable than this, which is why I'm curious.

Thanks,
Håkan

Hi Håkan,
Thank you for your information.
I reproduced the same issue in our local env (that's why I initially reported the bug), but
what I observed there was that the host OS lost access to critical files under /dev or /proc
after docker was updated to the current latest (-206).
It is unlikely that any process on the node can survive without these files, and I'm confident
that sshd and console login would be affected and become nonfunctional in such a case.
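
A quick way to see whether a host is in this state is to compare its mount table against the mount points every process depends on. The following is a minimal sketch, assuming text in the /proc/mounts format (source, mount point, type, options, two numbers); the function name and the sample snapshot are hypothetical and not taken from an affected node.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingMounts returns the entries of required that do not appear as a
// mount point in text formatted like /proc/mounts. Hypothetical helper.
func missingMounts(procMounts string, required []string) []string {
	mounted := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(procMounts))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 {
			mounted[fields[1]] = true // second field is the mount point
		}
	}
	var missing []string
	for _, mp := range required {
		if !mounted[mp] {
			missing = append(missing, mp)
		}
	}
	return missing
}

func main() {
	// Made-up snapshot in which the /proc and /dev entries are gone.
	snapshot := "sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0\n" +
		"/dev/mapper/rhel_docker1-root / xfs rw,relatime 0 0\n"
	fmt.Println(missingMounts(snapshot, []string{"/proc", "/sys", "/dev"}))
}
```

On a real host the input would come from /proc/mounts; if that file cannot be read at all, /proc itself is likely missing, which is already the symptom described here.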
> I would be interested in understanding how updating an 'extras' RPM like docker
> (and only a patch/build update of it at that) manages to disturb both sshd and console logins?
I understand your expectation for "extras" RPMs, but unfortunately some tests indicate that
updating docker to the current latest (-206) causes an issue with vital files on RHOSP hosts.
My personal guess is that specific usage in RHOSP results in the issue when combined
with the latest CVE fix in docker.
There are several settings, like the privileged flag, bind mounts and so on, that we use in RHOSP
to allow applications in containers to have access to host resources, and they might be the factor.
Please keep in mind that this issue is still under investigation and the actual mechanism
is not yet revealed, so my guess can be completely wrong.
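
Since bind mounts are among the suspected factors, one thing that can be inspected on a host is how its mounts propagate. The sketch below parses text in the /proc/self/mountinfo format and flags entries carrying a shared:N optional field, which means mount and unmount events on them propagate to peer mounts. The type and function names are hypothetical and the sample lines are made up; this is an inspection aid, not a diagnosis.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// mountPropagation describes one mountinfo entry. Hypothetical type.
type mountPropagation struct {
	MountPoint string
	Shared     bool // true if the entry has a "shared:N" optional field
}

// parsePropagation reads mountinfo-formatted text and reports, for each
// mount point, whether it belongs to a shared peer group.
func parsePropagation(mountinfo string) []mountPropagation {
	var out []mountPropagation
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Six fixed fields come first; optional fields follow up to a lone "-".
		if len(fields) < 7 {
			continue
		}
		entry := mountPropagation{MountPoint: fields[4]}
		for _, f := range fields[6:] {
			if f == "-" {
				break
			}
			if strings.HasPrefix(f, "shared:") {
				entry.Shared = true
			}
		}
		out = append(out, entry)
	}
	return out
}

func main() {
	// Made-up sample; on a real host read /proc/self/mountinfo instead.
	sample := `17 40 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
18 40 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
40 0 253:0 / / rw,relatime - xfs /dev/mapper/rhel_docker1-root rw,seclabel`
	for _, m := range parsePropagation(sample) {
		fmt.Printf("%-6s shared=%v\n", m.MountPoint, m.Shared)
	}
}
```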
I think I found the issue -- the original CVE fix patch had this hunk:

@@ -681,10 +692,6 @@ func pivotRoot(rootfs string) error {
 	// Make oldroot rprivate to make sure our unmounts don't propagate to the
 	// host (and thus bork the machine).
-	if err := syscall.Mount("", ".", "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
-		return err
-	}
-
 	// Preform the unmount. MNT_DETACH allows us to unmount /proc/self/cwd.
 	if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
 		return err
 	}

Surely, with this code absent the mounts are propagated to the host, exactly as described in the comment.

Lukas is testing the updated patch (which I will attach).

I think we need to remove docker-1.13.1-206.git7d71120 from the repo(s).

The updated CVE fix is here: https://github.com/projectatomic/runc/pull/54
Plain text patch: https://github.com/projectatomic/runc/commit/2a572d2d825735493a981b03b48877940ea2c17a.patch

Manual build with the patch and reboot - the node seems to be working fine now.

Do we know what the trigger is?
I can't reproduce

# yum update
Loaded plugins: product-id, search-disabled-repos, subscription-manager
rhel-7-server-extras-rpms                                  | 3.4 kB  00:00:00
rhel-7-server-rh-common-rpms                               | 3.8 kB  00:00:00
rhel-7-server-rpms                                         | 3.5 kB  00:00:00
(1/4): rhel-7-server-extras-rpms/x86_64/primary_db         | 673 kB  00:00:00
(2/4): rhel-7-server-extras-rpms/x86_64/updateinfo         | 244 kB  00:00:00
(3/4): rhel-7-server-rpms/7Server/x86_64/updateinfo        | 4.0 MB  00:00:00
(4/4): rhel-7-server-rpms/7Server/x86_64/primary_db        |  81 MB  00:00:05
Resolving Dependencies
--> Running transaction check
---> Package docker.x86_64 2:1.13.1-203.git0be3e21.el7_9 will be updated
---> Package docker.x86_64 2:1.13.1-206.git7d71120.el7_9 will be an update
---> Package docker-client.x86_64 2:1.13.1-203.git0be3e21.el7_9 will be updated
---> Package docker-client.x86_64 2:1.13.1-206.git7d71120.el7_9 will be an update
---> Package docker-common.x86_64 2:1.13.1-203.git0be3e21.el7_9 will be updated
---> Package docker-common.x86_64 2:1.13.1-206.git7d71120.el7_9 will be an update
---> Package docker-rhel-push-plugin.x86_64 2:1.13.1-203.git0be3e21.el7_9 will be updated
---> Package docker-rhel-push-plugin.x86_64 2:1.13.1-206.git7d71120.el7_9 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================
 Package                      Arch      Version                          Repository                      Size
==============================================================================================================
Updating:
 docker                       x86_64    2:1.13.1-206.git7d71120.el7_9    rhel-7-server-extras-rpms       17 M
 docker-client                x86_64    2:1.13.1-206.git7d71120.el7_9    rhel-7-server-extras-rpms      3.9 M
 docker-common                x86_64    2:1.13.1-206.git7d71120.el7_9    rhel-7-server-extras-rpms      100 k
 docker-rhel-push-plugin      x86_64    2:1.13.1-206.git7d71120.el7_9    rhel-7-server-extras-rpms      2.0 M

Transaction Summary
==============================================================================================================
Upgrade  4 Packages

Total download size: 23 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
(1/4): docker-client-1.13.1-206.git7d71120.el7_9.x86_64.rpm              | 3.9 MB  00:00:00
(2/4): docker-common-1.13.1-206.git7d71120.el7_9.x86_64.rpm              | 100 kB  00:00:00
(3/4): docker-1.13.1-206.git7d71120.el7_9.x86_64.rpm                     |  17 MB  00:00:01
(4/4): docker-rhel-push-plugin-1.13.1-206.git7d71120.el7_9.x86_64.rpm    | 2.0 MB  00:00:00
--------------------------------------------------------------------------------------------------------------
Total                                                           13 MB/s |  23 MB  00:00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : 2:docker-rhel-push-plugin-1.13.1-206.git7d71120.el7_9.x86_64    1/8
  Updating   : 2:docker-common-1.13.1-206.git7d71120.el7_9.x86_64              2/8
  Updating   : 2:docker-client-1.13.1-206.git7d71120.el7_9.x86_64              3/8
  Updating   : 2:docker-1.13.1-206.git7d71120.el7_9.x86_64                     4/8
  Cleanup    : 2:docker-1.13.1-203.git0be3e21.el7_9.x86_64                     5/8
  Cleanup    : 2:docker-client-1.13.1-203.git0be3e21.el7_9.x86_64              6/8
  Cleanup    : 2:docker-common-1.13.1-203.git0be3e21.el7_9.x86_64              7/8
  Cleanup    : 2:docker-rhel-push-plugin-1.13.1-203.git0be3e21.el7_9.x86_64    8/8
  Verifying  : 2:docker-1.13.1-206.git7d71120.el7_9.x86_64                     1/8
  Verifying  : 2:docker-client-1.13.1-206.git7d71120.el7_9.x86_64              2/8
  Verifying  : 2:docker-rhel-push-plugin-1.13.1-206.git7d71120.el7_9.x86_64    3/8
  Verifying  : 2:docker-common-1.13.1-206.git7d71120.el7_9.x86_64              4/8
  Verifying  : 2:docker-1.13.1-203.git0be3e21.el7_9.x86_64                     5/8
  Verifying  : 2:docker-common-1.13.1-203.git0be3e21.el7_9.x86_64              6/8
  Verifying  : 2:docker-rhel-push-plugin-1.13.1-203.git0be3e21.el7_9.x86_64    7/8
  Verifying  : 2:docker-client-1.13.1-203.git0be3e21.el7_9.x86_64              8/8

Updated:
  docker.x86_64 2:1.13.1-206.git7d71120.el7_9
  docker-client.x86_64 2:1.13.1-206.git7d71120.el7_9
  docker-common.x86_64 2:1.13.1-206.git7d71120.el7_9
  docker-rhel-push-plugin.x86_64 2:1.13.1-206.git7d71120.el7_9

Complete!
# reboot
# uptime
 18:05:42 up 3 min,  1 user,  load average: 0.08, 0.16, 0.08
# ls -ld /proc/ /sys/
dr-xr-xr-x. 141 root root 0 Jun  3 18:02 /proc/
dr-xr-xr-x.  13 root root 0 Jun  3 18:02 /sys/
# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=977516k,nr_inodes=244379,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/rhel_docker1-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13150)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
/dev/sda1 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/sdb1 on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/mapper/rhel_docker1-root on /var/lib/docker/containers type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/mapper/rhel_docker1-root on /var/lib/docker/devicemapper type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=197864k,mode=700)
# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
832512fde036        ubi7                "bash"              7 months ago        Created                                 test

I think trying to start/stop a container should be sufficient to repro.

Hi,

thanks to Lukas and Kir, we have a new version of the cve patch[1]

As far as I can tell the difference is that those four lines are not removed anymore:

@@ -681,10 +692,6 @@ func pivotRoot(rootfs string) error {
 	// Make oldroot rprivate to make sure our unmounts don't propagate to the
 	// host (and thus bork the machine).
-	if err := syscall.Mount("", ".", "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
-		return err
-	}
-
 	// Preform the unmount. MNT_DETACH allows us to unmount /proc/self/cwd.
 	if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
 		return err
 	}

[1] see comments 23 and 24.

This bug has been verified on docker-1.13.1-208.git7d71120.el7_9.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (docker bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2276
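
For scripting a pre-update check, the build numbers named in this report (-206 affected, -208 or newer fixed) can be compared against an installed package string. This is a minimal sketch, assuming the RPM release field always begins with an integer build number; the function names are hypothetical, and builds other than those named here are reported as unclassified rather than guessed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildNumber returns the leading integer of the RPM release field, e.g.
// 206 for "1.13.1-206.git7d71120.el7_9". Hypothetical helper.
func buildNumber(versionRelease string) (int, error) {
	i := strings.LastIndex(versionRelease, "-")
	if i < 0 {
		return 0, fmt.Errorf("no release field in %q", versionRelease)
	}
	head := strings.SplitN(versionRelease[i+1:], ".", 2)[0]
	return strconv.Atoi(head)
}

// classify maps a build number onto what this bug report states and
// nothing more.
func classify(build int) string {
	switch {
	case build >= 208:
		return "fixed"
	case build == 206:
		return "affected"
	default:
		return "unclassified"
	}
}

func main() {
	for _, v := range []string{
		"2:1.13.1-203.git0be3e21.el7_9",
		"2:1.13.1-206.git7d71120.el7_9",
		"1.13.1-208.git7d71120.el7_9",
	} {
		n, err := buildNumber(v)
		if err != nil {
			fmt.Println(v, "->", err)
			continue
		}
		fmt.Println(v, "->", classify(n))
	}
}
```

The version strings in main are the ones that appear in this report; on a host the input would typically come from a package query such as rpm -q docker.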