Bug 1966968

Summary: Updating docker package to 1.13.1-206.git7d71120.el7_9.x86_64 breaks host OS
Product: Red Hat Enterprise Linux 7
Reporter: Takashi Kajinami <tkajinam>
Component: docker
Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA
QA Contact: atomic-bugs <atomic-bugs>
Severity: urgent
Priority: urgent
Version: 7.9
CC: ajia, amurdaca, bhubbard, ddarrah, dornelas, dwalsh, fsayyed, hyunpark, jbiao, jnovy, jpretori, kir, knoha, lars, lbezdick, lsm5, mori, pthomas, qguo, rbarrott, rmanes, sathlang, tkimura, tkubota, tsweeney, vcojot, wwurzbac, yuokada
Target Milestone: rc
Keywords: Extras, Regression, Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: docker-1.13.1-208.git7d71120.el7_9 or newer
Last Closed: 2021-06-07 15:36:50 UTC
Type: Bug
Bug Blocks: 1186913, 1945588

Description Takashi Kajinami 2021-06-02 09:19:23 UTC
Description of problem:

While testing a minor update of RHOSP 13, we noticed that the update fails just after the docker packages are updated,
and the host OS is no longer functional. The update installed the following packages:

docker-rhel-push-plugin-1.13.1-206.git7d71120.el7_9.x86_64
docker-common-1.13.1-206.git7d71120.el7_9.x86_64
docker-client-1.13.1-206.git7d71120.el7_9.x86_64
docker-1.13.1-206.git7d71120.el7_9.x86_64    

We found that the problem can be reproduced with the simple steps below, and it seems that
a recent change in docker causes it.
(We didn't hit this issue before the latest build, -206, was released.)

After the update, some vital filesystems are no longer accessible:

~~~
[root@controller-0 log]# ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
[root@controller-0 log]# ls /dev
null
[root@controller-0 log]# ls /proc
[root@controller-0 log]# ls /sys
[root@controller-0 log]# 
[root@controller-0 log]# mount
mount: failed to read mtab: No such file or directory 
~~~

Version-Release number of selected component (if applicable):
docker-1.13.1-206.git7d71120.el7_9.x86_64
kernel-3.10.0-1160.15.2.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. stop all docker containers
  # pcs cluster stop (This is because some containers are managed by pacemaker)
  # docker ps --format {{.ID}} | xargs docker stop
  (but keeps docker daemon running)

2. update docker packages
  # yum update -y docker


Actual results:
Package update completes but the host OS becomes non-functional

Expected results:
Package update completes and host OS stays functional

Additional info:

Comment 2 Takashi Kajinami 2021-06-02 09:21:21 UTC
Looking at the current update steps in RHOSP, TripleO (the deployment tool) is supposed to stop the docker daemon
before updating the docker package, which means the situation is slightly different from the
steps to reproduce described above.

We tried the same steps by hand but have not been able to reproduce the issue so far.

Comment 12 Håkan Olsson 2021-06-03 09:37:22 UTC
Hi,

just FYI, we're seeing the same problem. During a minor OSP13 update started on May 31st, we managed to update the controllers, Ceph nodes and most compute nodes. Two nodes failed because of the OOM killer.

The next day we re-ran the update on these two and it failed again, now with neither nova, neutron, ssh nor console login working. A reboot (actually a reset) did not help. I guess we were lucky to start this update when we did, or we might have bombed the entire (prod) cluster...

After some non-trivial single-user-mode repairs, we managed to downgrade to 205, which re-enabled the features mentioned above (and we're continuing the minor update).


I would be interested in understanding how updating an 'extras' RPM like docker (and only a patch/build update of it at that) manages to disturb both sshd and console logins. It seems those features should be better isolated and more stable than this, which is why I'm curious.

Thanks,
  Håkan

Comment 14 Takashi Kajinami 2021-06-03 13:45:59 UTC
Hi Håkan,

Thank you for your information.

I reproduced the same issue in our local environment (that's why I initially reported the bug), and
what I observed there was that the host OS lost access to critical files under /dev and /proc
after docker was updated to the current latest build (-206).
It is unlikely that any process on the node can survive without these files, so I'm confident
that sshd and console logins would be affected and become non-functional in such a case.


> I would be interested in understanding how updating an 'extras' RPM like docker
> (and only a patch/build update of it at that) manages to disturb both sshd and console logins? 

I understand your expectation for "extras" RPMs, but unfortunately some tests indicate that
updating docker to the current latest build (-206) causes an issue with vital files on RHOSP hosts.

My personal guess is that the specific usage in RHOSP results in the issue when combined
with the latest CVE fix in docker.
There are several settings, such as the privileged flag and bind mounts, that we use in RHOSP
to give applications in containers access to host resources, and they might be a factor.

Please keep in mind that this issue is still under investigation and the actual mechanism
has not been identified yet, so my guess may be completely wrong.

Comment 22 Kir Kolyshkin 2021-06-03 21:29:55 UTC
I think I found the issue -- the original CVE fix patch had this hunk:

~~~
@@ -681,10 +692,6 @@ func pivotRoot(rootfs string) error {
 
        // Make oldroot rprivate to make sure our unmounts don't propagate to the
        // host (and thus bork the machine).
-       if err := syscall.Mount("", ".", "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
-               return err
-       }
-       // Preform the unmount. MNT_DETACH allows us to unmount /proc/self/cwd.
        if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
                return err
        }
~~~

Sure enough, with this code absent the unmount of the old root propagates to the host, exactly
as the comment warns.

Lukas is testing the updated patch (which I will attach).

I think we need to remove docker-1.13.1-206.git7d71120 from the repo(s).

Comment 23 Kir Kolyshkin 2021-06-03 21:34:59 UTC
The updated CVE fix is here: https://github.com/projectatomic/runc/pull/54

Comment 25 Lukas Bezdicka 2021-06-03 21:59:41 UTC
Manually built with the patch and rebooted; the node seems to be working fine now.

Comment 26 Derrick Ornelas 2021-06-03 22:08:11 UTC
Do we know what the trigger is? I can't reproduce it:


# yum update 
Loaded plugins: product-id, search-disabled-repos, subscription-manager
rhel-7-server-extras-rpms                                                                                                                | 3.4 kB  00:00:00     
rhel-7-server-rh-common-rpms                                                                                                             | 3.8 kB  00:00:00     
rhel-7-server-rpms                                                                                                                       | 3.5 kB  00:00:00     
(1/4): rhel-7-server-extras-rpms/x86_64/primary_db                                                                                       | 673 kB  00:00:00     
(2/4): rhel-7-server-extras-rpms/x86_64/updateinfo                                                                                       | 244 kB  00:00:00     
(3/4): rhel-7-server-rpms/7Server/x86_64/updateinfo                                                                                      | 4.0 MB  00:00:00     
(4/4): rhel-7-server-rpms/7Server/x86_64/primary_db                                                                                      |  81 MB  00:00:05     
Resolving Dependencies
--> Running transaction check
---> Package docker.x86_64 2:1.13.1-203.git0be3e21.el7_9 will be updated
---> Package docker.x86_64 2:1.13.1-206.git7d71120.el7_9 will be an update
---> Package docker-client.x86_64 2:1.13.1-203.git0be3e21.el7_9 will be updated
---> Package docker-client.x86_64 2:1.13.1-206.git7d71120.el7_9 will be an update
---> Package docker-common.x86_64 2:1.13.1-203.git0be3e21.el7_9 will be updated
---> Package docker-common.x86_64 2:1.13.1-206.git7d71120.el7_9 will be an update
---> Package docker-rhel-push-plugin.x86_64 2:1.13.1-203.git0be3e21.el7_9 will be updated
---> Package docker-rhel-push-plugin.x86_64 2:1.13.1-206.git7d71120.el7_9 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================================================
 Package                                 Arch                   Version                                         Repository                                 Size
================================================================================================================================================================
Updating:
 docker                                  x86_64                 2:1.13.1-206.git7d71120.el7_9                   rhel-7-server-extras-rpms                  17 M
 docker-client                           x86_64                 2:1.13.1-206.git7d71120.el7_9                   rhel-7-server-extras-rpms                 3.9 M
 docker-common                           x86_64                 2:1.13.1-206.git7d71120.el7_9                   rhel-7-server-extras-rpms                 100 k
 docker-rhel-push-plugin                 x86_64                 2:1.13.1-206.git7d71120.el7_9                   rhel-7-server-extras-rpms                 2.0 M

Transaction Summary
================================================================================================================================================================
Upgrade  4 Packages

Total download size: 23 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
(1/4): docker-client-1.13.1-206.git7d71120.el7_9.x86_64.rpm                                                                              | 3.9 MB  00:00:00     
(2/4): docker-common-1.13.1-206.git7d71120.el7_9.x86_64.rpm                                                                              | 100 kB  00:00:00     
(3/4): docker-1.13.1-206.git7d71120.el7_9.x86_64.rpm                                                                                     |  17 MB  00:00:01     
(4/4): docker-rhel-push-plugin-1.13.1-206.git7d71120.el7_9.x86_64.rpm                                                                    | 2.0 MB  00:00:00     
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                            13 MB/s |  23 MB  00:00:01     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : 2:docker-rhel-push-plugin-1.13.1-206.git7d71120.el7_9.x86_64                                                                                 1/8 
  Updating   : 2:docker-common-1.13.1-206.git7d71120.el7_9.x86_64                                                                                           2/8 
  Updating   : 2:docker-client-1.13.1-206.git7d71120.el7_9.x86_64                                                                                           3/8 
  Updating   : 2:docker-1.13.1-206.git7d71120.el7_9.x86_64                                                                                                  4/8 
  Cleanup    : 2:docker-1.13.1-203.git0be3e21.el7_9.x86_64                                                                                                  5/8 
  Cleanup    : 2:docker-client-1.13.1-203.git0be3e21.el7_9.x86_64                                                                                           6/8 
  Cleanup    : 2:docker-common-1.13.1-203.git0be3e21.el7_9.x86_64                                                                                           7/8 
  Cleanup    : 2:docker-rhel-push-plugin-1.13.1-203.git0be3e21.el7_9.x86_64                                                                                 8/8 
  Verifying  : 2:docker-1.13.1-206.git7d71120.el7_9.x86_64                                                                                                  1/8 
  Verifying  : 2:docker-client-1.13.1-206.git7d71120.el7_9.x86_64                                                                                           2/8 
  Verifying  : 2:docker-rhel-push-plugin-1.13.1-206.git7d71120.el7_9.x86_64                                                                                 3/8 
  Verifying  : 2:docker-common-1.13.1-206.git7d71120.el7_9.x86_64                                                                                           4/8 
  Verifying  : 2:docker-1.13.1-203.git0be3e21.el7_9.x86_64                                                                                                  5/8 
  Verifying  : 2:docker-common-1.13.1-203.git0be3e21.el7_9.x86_64                                                                                           6/8 
  Verifying  : 2:docker-rhel-push-plugin-1.13.1-203.git0be3e21.el7_9.x86_64                                                                                 7/8 
  Verifying  : 2:docker-client-1.13.1-203.git0be3e21.el7_9.x86_64                                                                                           8/8 

Updated:
  docker.x86_64 2:1.13.1-206.git7d71120.el7_9                               docker-client.x86_64 2:1.13.1-206.git7d71120.el7_9                                 
  docker-common.x86_64 2:1.13.1-206.git7d71120.el7_9                        docker-rhel-push-plugin.x86_64 2:1.13.1-206.git7d71120.el7_9                       

Complete!


# reboot

# uptime
 18:05:42 up 3 min,  1 user,  load average: 0.08, 0.16, 0.08

# ls -ld /proc/ /sys/
dr-xr-xr-x. 141 root root 0 Jun  3 18:02 /proc/
dr-xr-xr-x.  13 root root 0 Jun  3 18:02 /sys/


# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=977516k,nr_inodes=244379,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/rhel_docker1-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13150)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
/dev/sda1 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/sdb1 on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/mapper/rhel_docker1-root on /var/lib/docker/containers type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/mapper/rhel_docker1-root on /var/lib/docker/devicemapper type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=197864k,mode=700)


# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
832512fde036        ubi7                "bash"              7 months ago        Created                                 test

Comment 27 Kir Kolyshkin 2021-06-03 22:14:09 UTC
I think trying to start/stop a container should be sufficient to repro.

Comment 33 Sofer Athlan-Guyot 2021-06-04 09:00:15 UTC
Hi,

Thanks to Lukas and Kir, we have a new version of the CVE patch [1].

As far as I can tell, the difference is that these four lines are no longer removed:

    @@ -681,10 +692,6 @@ func pivotRoot(rootfs string) error {
    
           // Make oldroot rprivate to make sure our unmounts don't propagate to the
           // host (and thus bork the machine).
    -      if err := syscall.Mount("", ".", "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
    -              return err
    -      }
    -      // Preform the unmount. MNT_DETACH allows us to unmount /proc/self/cwd.
           if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
                   return err
           }

[1] See comments 23 and 24.

Comment 49 Alex Jia 2021-06-07 09:35:46 UTC
This bug has been verified on docker-1.13.1-208.git7d71120.el7_9.
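Build 208 matches the "Fixed In Version" field above. A small check like the following can tell whether a host already carries the fixed build. It is a hypothetical helper that assumes the `VERSION-RELEASE` string comes from something like `rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' docker`. A lower build is not necessarily broken (builds before 206, such as 205, did not show the problem in this report); the check only answers whether the fix is present.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fixedBuild is the first build named in "Fixed In Version":
// docker-1.13.1-208.git7d71120.el7_9 or newer.
const fixedBuild = 208

// hasFix reports whether a VERSION-RELEASE string such as
// "1.13.1-206.git7d71120.el7_9" (an optional "2:" epoch is tolerated) is at or
// past fixedBuild. It only compares the leading build number of the release
// and assumes the 1.13.1 version line used throughout this bug.
func hasFix(versionRelease string) (bool, error) {
	dash := strings.LastIndexByte(versionRelease, '-')
	if dash < 0 {
		return false, fmt.Errorf("no release in %q", versionRelease)
	}
	release := versionRelease[dash+1:]
	build := release
	if dot := strings.IndexByte(release, '.'); dot >= 0 {
		build = release[:dot]
	}
	n, err := strconv.Atoi(build)
	if err != nil {
		return false, fmt.Errorf("unexpected release %q: %w", release, err)
	}
	return n >= fixedBuild, nil
}

func main() {
	for _, vr := range []string{"1.13.1-206.git7d71120.el7_9", "1.13.1-208.git7d71120.el7_9"} {
		ok, err := hasFix(vr)
		fmt.Println(vr, ok, err)
	}
}
```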

Comment 51 errata-xmlrpc 2021-06-07 15:36:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (docker bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2276