Description of problem:
A "permission denied" error is reported when backing up the etcd database with:

    docker exec etcd_container etcdctl backup --data-dir=/var/lib/etcd/ --backup-dir=/var/lib/etcd//openshift-backup-etcd_backup_tag20170613061409

The issue does not occur when the containerized etcd database is installed on the masters.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.99

How reproducible:
Always

Steps to Reproduce:
1. Install OCP with an external containerized etcd database.
2. Run the upgrade playbook.

Actual results:

    2017-06-13 06:20:28.692395 I | failed creating backup snapshot dir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409/member/snap: mkdir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409: permission denied

    fatal: [qe-auto-etcd-3.0613-9xf.qe.rhcloud.com]: FAILED! => {
        "changed": true,
        "cmd": [
            "docker",
            "exec",
            "etcd_container",
            "etcdctl",
            "backup",
            "--data-dir=/var/lib/etcd/",
            "--backup-dir=/var/lib/etcd//openshift-backup-etcd_backup_tag20170613061409"
        ],
        "delta": "0:00:00.157376",
        "end": "2017-06-13 02:20:28.707119",
        "failed": true,
        "rc": 1,
        "start": "2017-06-13 02:20:28.549743",
        "warnings": []
    }

    STDERR:

    2017-06-13 06:20:28.698878 I | failed creating backup snapshot dir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409/member/snap: mkdir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409: permission denied

    fatal: [qe-auto-etcd-1.0613-9xf.qe.rhcloud.com]: FAILED! => {
        "changed": true,
        "cmd": [
            "docker",
            "exec",
            "etcd_container",
            "etcdctl",
            "backup",
            "--data-dir=/var/lib/etcd/",
            "--backup-dir=/var/lib/etcd//openshift-backup-etcd_backup_tag20170613061409"
        ],
        "delta": "0:00:00.126676",
        "end": "2017-06-13 02:20:28.703062",
        "failed": true,
        "rc": 1,
        "start": "2017-06-13 02:20:28.576386",
        "warnings": []
    }

    STDERR:

    2017-06-13 06:20:28.696791 I | failed creating backup snapshot dir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409/member/snap: mkdir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409: permission denied

Expected results:

Additional info:
Created attachment 1287194: the inventory and upgrade logs. Note: the instance had been deleted.
The etcd migrate/upgrade with dedicated containerized etcd is blocked.
Can you let us know if there are any SELinux denials? Run: ausearch -m AVC
[root@container--2 ~]# ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today
----
type=USER_AVC msg=audit(06/22/2017 06:27:21.592:934) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='avc: received policyload notice (seqno=2) exe=/usr/lib/systemd/systemd sauid=root hostname=? addr=? terminal=?'
----
type=USER_AVC msg=audit(06/22/2017 06:27:21.592:935) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='avc: received policyload notice (seqno=3) exe=/usr/lib/systemd/systemd sauid=root hostname=? addr=? terminal=?'
----
type=USER_AVC msg=audit(06/22/2017 06:29:22.142:4302) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='avc: denied { disable } for auid=cloud-user uid=root gid=root cmdline="/bin/systemctl mask etcd" scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:system_r:init_t:s0 tclass=service exe=/usr/lib/systemd/systemd sauid=root hostname=? addr=? terminal=?'
----
type=SYSCALL msg=audit(06/22/2017 08:34:05.553:6707) : arch=x86_64 syscall=mkdirat success=no exit=EACCES(Permission denied) a0=0xffffffffffffff9c a1=0xc4201d1d40 a2=0700 a3=0x0 items=0 ppid=5637 pid=5651 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=etcdctl exe=/usr/bin/etcdctl subj=system_u:system_r:svirt_lxc_net_t:s0:c169,c201 key=(null)
type=AVC msg=audit(06/22/2017 08:34:05.553:6707) : avc: denied { write } for pid=5651 comm=etcdctl name=etcd dev="dm-0" ino=848 scontext=system_u:system_r:svirt_lxc_net_t:s0:c169,c201 tcontext=system_u:object_r:var_lib_t:s0 tclass=dir
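The last AVC record is the relevant one: it names the process context (scontext) that was denied and the object context (tcontext) it tried to write to. As a rough sketch of how to read those two types out of such a record, the snippet below uses two helper functions, `avc_field` and `selinux_type`, which are hypothetical names introduced here for illustration and are not part of ausearch or any SELinux tool.

```shell
#!/bin/sh
# Hypothetical helper: print the SELinux type, i.e. the third
# colon-separated field of a context such as user:role:type:level.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical helper: extract the value of a key=value field
# (for example scontext or tcontext) from an AVC record.
# $1 = field name, $2 = AVC record text
avc_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}

# The AVC denial from the ausearch output in this comment.
avc='type=AVC msg=audit(06/22/2017 08:34:05.553:6707) : avc: denied { write } for pid=5651 comm=etcdctl name=etcd dev="dm-0" ino=848 scontext=system_u:system_r:svirt_lxc_net_t:s0:c169,c201 tcontext=system_u:object_r:var_lib_t:s0 tclass=dir'

src=$(selinux_type "$(avc_field scontext "$avc")")
tgt=$(selinux_type "$(avc_field tcontext "$avc")")
echo "process type $src was denied on object type $tgt"
```

Read this way, the denial says that a process of type svirt_lxc_net_t (the container) was refused write access to a directory of type var_lib_t (/var/lib/etcd).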
We either need the etcd and etcdctl containers to run as etcd_t (as they would from an RPM install), or we need to make sure the directory where their files live is labeled so that the etcd container can make these changes. Either way, I do think the fix is in the playbook. I'm sure Dan Walsh could help talk through either option. I tend to think running the container as etcd_t is the best idea, but I don't know what takes care of the labeling for /var/lib/etcd/ in a containerized install.
Upstream PR: https://github.com/openshift/openshift-ansible/pull/4653
Tested: running the etcd_container with the label solves the problem. Thanks Eric!!!
@Jan, the backup succeeds when etcd runs with the label --security-opt label=type:spc_t. The concern is ensuring that the containerized etcd is running with this label before the upgrade.
Once etcd_container.service is deployed, we cannot simply replace it with a new one, since a user may have modified it. This needs to be documented for the 3.5 -> 3.6 OCP upgrade and done manually by an operator. Unfortunately, this affects all 3.5 -> 3.6 containerized upgrades. I will extend the pre-upgrade verification to check whether the etcd_container container is running with the proper label.
Upstream PR for the check: https://github.com/openshift/openshift-ansible/pull/4665
What is the SELinux label of /var/lib/etcd when this is failing (ls -lZ /var/lib/etcd/)? I am not sure how this can happen as we have :z for /var/lib/etcd and Docker relabels it when the container starts. How/when is it then changed?
[root@container--2 ~]# ls -laZ /var/lib/etcd
drwxr-xr-x. etcd etcd system_u:object_r:var_lib_t:s0 .
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0 ..
drwx------. root root system_u:object_r:svirt_sandbox_file_t:s0 member
Waiting for PR pull/4665.
That looks like the wrong label: /var/lib/etcd should have the label "system_u:object_r:svirt_sandbox_file_t:s0", not "system_u:object_r:var_lib_t:s0", so that the etcd container can write there. IMHO, it is better to ensure /var/lib/etcd has the proper label than to give etcd more privileges than it requires (with spc_t). Does a "systemctl restart etcd_container" change the label of "/var/lib/etcd"? Docker should relabel /var/lib/etcd when the container starts and the bind mount is created. If possible, could you verify whether this works?

    chcon -R system_u:object_r:svirt_sandbox_file_t:s0 /var/lib/etcd/
    docker exec etcd_container etcdctl backup --data-dir=/var/lib/etcd/ --backup-dir=/var/lib/etcd/openshift-backup-etcd_backup_foo
@Giuseppe,
If the etcd_container is on a master, the initial label is system_u:object_r:svirt_sandbox_file_t:s0:
drwx------. etcd etcd system_u:object_r:svirt_sandbox_file_t:s0 etcd
If the etcd_container is on a dedicated host, the initial label is system_u:object_r:var_lib_t:s0:
[root@container--2 ~]# ls -laZ /var/lib/etcd/
drwxr-xr-x. etcd etcd system_u:object_r:var_lib_t:s0 .
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0 ..
drwx------. root root system_u:object_r:svirt_sandbox_file_t:s0 member
The backup succeeded once I labelled /var/lib/etcd with system_u:object_r:svirt_sandbox_file_t:s0.
The backup succeeds too when the label is system_u:object_r:var_lib_t:s0.
@Anping, does the backup succeed as well when etcd runs as spc_t? If we don't specify spc_t for etcd, wouldn't the backup fail when /var/lib/etcd is labeled "system_u:object_r:var_lib_t:s0"?
I don't think etcd is running as spc_t.
ExecStart=/usr/bin/docker run --name etcd_container --rm -v /var/lib/etcd/:/var/lib/etcd/:z -v /etc/etcd:/etc/etcd:ro --env-file=/etc/etcd/etcd.conf --net=host --entrypoint=/usr/bin/etcd registry.access.redhat.com/rhel7/etcd
[root@container--2 ~]# docker inspect f087fabe4e34 |grep ProcessLabel
"ProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c292,c680",
This environment isn't exactly the one that hit this issue. I will prepare another environment and update the comment later.
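The process-label check done by hand in this comment could be scripted roughly as follows. This is only a sketch: `label_type` and `container_has_type` are hypothetical helper names, and it assumes the container's label is available through the `ProcessLabel` field that the docker inspect output in this comment shows.

```shell
#!/bin/sh
# Hypothetical helper: print the SELinux type (third colon-separated
# field) of a process label such as user:role:type:level.
label_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical check: succeed only if the named container runs with the
# expected SELinux process type.
# $1 = container name or ID, $2 = expected type (for example spc_t)
# Returns 2 if docker inspect itself fails.
container_has_type() {
    label=$(docker inspect --format '{{.ProcessLabel}}' "$1") || return 2
    [ "$(label_type "$label")" = "$2" ]
}

# The label reported by docker inspect in this comment.
label_type 'system_u:system_r:svirt_lxc_net_t:s0:c292,c680'
```

On the etcd host, `container_has_type etcd_container spc_t` would then fail for the container shown here, since its type is svirt_lxc_net_t rather than spc_t.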
Upstream PR fixing the original issue: https://github.com/openshift/openshift-ansible/pull/4674
Removing the spc_t label as it is no longer needed. Still, I wonder why it has shown up now: the etcdctl dropping task has been present since the 3.4 deployment. Before running the upgrade, we will need to check whether /var/lib/etcd is properly labelled and re-label it if not.
@Anping, Jan could reproduce the issue, and https://github.com/openshift/openshift-ansible/pull/4674 fixes the cause of the wrong label on /var/lib/etcd. Could you try with that change?
More changes merged.
@Jan, Scott, Giuseppe: The fix won't take effect for upgrades. Since this issue only exists on docker-1.12.6-28.git1398f24.el7.x86_64, I have set the Severity to medium.
With https://github.com/openshift/openshift-ansible/pull/4680 merged, the etcd working directory is now re-labeled with svirt_sandbox_file_t before the docker exec etcdctl backup command is run.
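The ordering described here (relabel first, then back up) can be sketched as a small shell function. This is not the playbook's actual task; `run`, `relabel_then_backup`, and the `DRY_RUN` variable are hypothetical names used for illustration, while the label, container name, and etcdctl options are the ones quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command instead of executing it
# when DRY_RUN=1, so the sequence can be reviewed safely.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Relabel the etcd data directory, then run the backup inside the
# container. The backup is attempted only if the relabel succeeds.
# $1 = etcd data directory, $2 = backup directory
relabel_then_backup() {
    data_dir=$1
    backup_dir=$2
    run chcon -R "system_u:object_r:svirt_sandbox_file_t:s0" "$data_dir" &&
    run docker exec etcd_container etcdctl backup \
        "--data-dir=$data_dir" "--backup-dir=$backup_dir"
}
```

With `DRY_RUN=1` set, calling `relabel_then_backup /var/lib/etcd /var/lib/etcd/openshift-backup-foo` only prints the two commands in order; the backup directory name here is a placeholder.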
@Jan, the fix works well: the etcd data can be backed up during the upgrade, and the etcd data_dir is labeled system_u:object_r:svirt_sandbox_file_t:s0 by the install playbook.
Moving back to ON_QA, should be fixed in openshift-ansible-3.6.137-1
Passes on openshift-ansible-3.6.139.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716