1460959 – Failed to backup external containerized etcd database

Summary: Failed to backup external containerized etcd database

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	3.6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Jan Chaloupka
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-06-13 09:07 UTC by Anping Li
Modified:	2017-08-16 19:51 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-08-10 05:26:47 UTC
Target Upstream Version:
Embargoed:

Attachments
The inventory and upgrade logs (140.00 KB, application/x-tar) 2017-06-13 09:13 UTC, Anping Li	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2017:1716	0	normal	SHIPPED_LIVE	Red Hat OpenShift Container Platform 3.6 RPM Release Advisory	2017-08-10 09:02:50 UTC

Description Anping Li 2017-06-13 09:07:11 UTC

Description of problem:
A "permission denied" error is reported when backing up the etcd database with 'docker exec etcd_container etcdctl backup --data-dir=/var/lib/etcd/ --backup-dir=/var/lib/etcd//openshift-backup-etcd_backup_tag20170613061409'.

If the containerized etcd database is installed on the masters, the issue does not occur.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.99

How reproducible:
always

Steps to Reproduce:
1. Install OCP with an external containerized etcd database.
2. Run the upgrade playbook.

Actual results:
2017-06-13 06:20:28.692395 I | failed creating backup snapshot dir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409/member/snap: mkdir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409: permission denied
fatal: [qe-auto-etcd-3.0613-9xf.qe.rhcloud.com]: FAILED! => {
    "changed": true, 
    "cmd": [
        "docker", 
        "exec", 
        "etcd_container", 
        "etcdctl", 
        "backup", 
        "--data-dir=/var/lib/etcd/", 
        "--backup-dir=/var/lib/etcd//openshift-backup-etcd_backup_tag20170613061409"
    ], 
    "delta": "0:00:00.157376", 
    "end": "2017-06-13 02:20:28.707119", 
    "failed": true, 
    "rc": 1, 
    "start": "2017-06-13 02:20:28.549743", 
    "warnings": []
}

STDERR:

2017-06-13 06:20:28.698878 I | failed creating backup snapshot dir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409/member/snap: mkdir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409: permission denied
fatal: [qe-auto-etcd-1.0613-9xf.qe.rhcloud.com]: FAILED! => {
    "changed": true, 
    "cmd": [
        "docker", 
        "exec", 
        "etcd_container", 
        "etcdctl", 
        "backup", 
        "--data-dir=/var/lib/etcd/", 
        "--backup-dir=/var/lib/etcd//openshift-backup-etcd_backup_tag20170613061409"
    ], 
    "delta": "0:00:00.126676", 
    "end": "2017-06-13 02:20:28.703062", 
    "failed": true, 
    "rc": 1, 
    "start": "2017-06-13 02:20:28.576386", 
    "warnings": []
}

STDERR:

2017-06-13 06:20:28.696791 I | failed creating backup snapshot dir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409/member/snap: mkdir /var/lib/etcd/openshift-backup-etcd_backup_tag20170613061409: permission denied


Expected results:


Additional info:

Comment 1 Anping Li 2017-06-13 09:13:19 UTC

Created attachment 1287194
The inventory and upgrade logs

Note: the instance has since been deleted.

Comment 2 Anping Li 2017-06-21 02:14:25 UTC

The etcd migration/upgrade with a dedicated containerized etcd is blocked.

Comment 4 Eric Paris 2017-06-21 18:30:10 UTC

Can you let us know if there are any SELinux denials?

ausearch -m AVC

Comment 5 Anping Li 2017-06-22 12:41:32 UTC

[root@container--2 ~]# ausearch -m avc -m user_avc -m selinux_err -m user_selinux_err -i -ts today 
----
type=USER_AVC msg=audit(06/22/2017 06:27:21.592:934) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='avc:  received policyload notice (seqno=2)  exe=/usr/lib/systemd/systemd sauid=root hostname=? addr=? terminal=?' 
----
type=USER_AVC msg=audit(06/22/2017 06:27:21.592:935) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='avc:  received policyload notice (seqno=3)  exe=/usr/lib/systemd/systemd sauid=root hostname=? addr=? terminal=?' 
----
type=USER_AVC msg=audit(06/22/2017 06:29:22.142:4302) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='avc:  denied  { disable } for auid=cloud-user uid=root gid=root cmdline="/bin/systemctl mask etcd" scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:system_r:init_t:s0 tclass=service  exe=/usr/lib/systemd/systemd sauid=root hostname=? addr=? terminal=?' 
----
type=SYSCALL msg=audit(06/22/2017 08:34:05.553:6707) : arch=x86_64 syscall=mkdirat success=no exit=EACCES(Permission denied) a0=0xffffffffffffff9c a1=0xc4201d1d40 a2=0700 a3=0x0 items=0 ppid=5637 pid=5651 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=etcdctl exe=/usr/bin/etcdctl subj=system_u:system_r:svirt_lxc_net_t:s0:c169,c201 key=(null) 
type=AVC msg=audit(06/22/2017 08:34:05.553:6707) : avc:  denied  { write } for  pid=5651 comm=etcdctl name=etcd dev="dm-0" ino=848 scontext=system_u:system_r:svirt_lxc_net_t:s0:c169,c201 tcontext=system_u:object_r:var_lib_t:s0 tclass=dir
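The SYSCALL/AVC pair above is the relevant part: etcdctl, running in the container domain svirt_lxc_net_t, is denied write on the directory named etcd, which is labeled var_lib_t, so mkdirat fails with EACCES. As a minimal sketch, assuming records in the `ausearch -i` format shown here, those fields can be pulled out of such a record as follows (the helper names are hypothetical and not part of the audit tools):

```python
import re
from typing import Dict

# Matches the "avc:  denied  { perm ... }" part of an audit record.
AVC_DENIED_RE = re.compile(r"avc:\s+denied\s+\{\s*(?P<perms>[^}]+?)\s*\}")
# Matches key=value pairs; values may be double-quoted (e.g. dev="dm-0").
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def selinux_type(context: str) -> str:
    """Return the type field of a user:role:type:level SELinux context."""
    return context.split(":")[2]


def parse_avc_denial(record: str) -> Dict[str, str]:
    """Extract which process was denied which permission on which object."""
    match = AVC_DENIED_RE.search(record)
    if match is None:
        raise ValueError("not an AVC denial record")
    # Only look at the key=value pairs after the "{ ... }" permission list.
    fields = {k: v.strip('"') for k, v in FIELD_RE.findall(record[match.end():])}
    return {
        "perms": match.group("perms"),
        "comm": fields["comm"],
        "name": fields["name"],
        "source_type": selinux_type(fields["scontext"]),
        "target_type": selinux_type(fields["tcontext"]),
        "tclass": fields["tclass"],
    }
```

For the type=AVC record above, `parse_avc_denial` returns perms `write`, comm `etcdctl`, source type `svirt_lxc_net_t`, target type `var_lib_t` and tclass `dir`.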

Comment 6 Eric Paris 2017-06-22 19:39:37 UTC

We either need the etcd and etcdctl containers to run as etcd_t (as they would from an RPM install), or we need to make sure the directory where their files live is labeled in such a way that the etcd container can make these changes...

Either way, I do think the fix is in the playbook...

I'm sure Dan Walsh could help talk through either option. I tend to think running the container as etcd_t is the best idea, but I don't know what takes care of labeling /var/lib/etcd/ in a containerized install.

Comment 14 Jan Chaloupka 2017-06-30 11:20:58 UTC

Upstream PR: https://github.com/openshift/openshift-ansible/pull/4653

Tested, running the etcd_container with the label solves the problem. Thanks Eric!!!

Comment 15 Anping Li 2017-07-03 06:16:21 UTC

@Jan, the backup succeeds when etcd is running with --security-opt label=type:spc_t. The concern is ensuring that the containerized etcd is already running with this label before the upgrade.

Comment 17 Jan Chaloupka 2017-07-03 12:12:05 UTC

Once etcd_container.service is deployed, we cannot simply replace it with a new one, since it may have been modified by a user. This needs to be documented for the 3.5 -> 3.6 OCP upgrade and done manually by an operator.

Unfortunately, this affects all 3.5 -> 3.6 containerized upgrades.

I will extend the pre-upgrade verification to check whether the etcd_container container is running with the proper label.
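A pre-upgrade check along these lines could read the container's process label through `docker inspect`, whose `ProcessLabel` field is the one quoted in comment 26. This is only a sketch: the helper names are hypothetical, the check in PR 4665 may be implemented differently (as an Ansible task), and treating `spc_t` as the expected type reflects comment 15, not the final fix.

```python
import subprocess


def process_type(process_label: str) -> str:
    """Return the SELinux type from a ProcessLabel (user:role:type:level)."""
    parts = process_label.split(":")
    if len(parts) < 3:
        # An empty label, e.g. when SELinux is disabled, ends up here.
        raise ValueError(f"unexpected ProcessLabel: {process_label!r}")
    return parts[2]


def container_process_label(name: str = "etcd_container") -> str:
    """Ask Docker for the ProcessLabel of a running container."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.ProcessLabel}}", name],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


def runs_with_type(process_label: str, expected: str = "spc_t") -> bool:
    """Pre-upgrade check: is the container running with the expected type?"""
    return process_type(process_label) == expected
```

`runs_with_type(container_process_label())` would return False for the `system_u:system_r:svirt_lxc_net_t:s0:c292,c680` label that comment 26 reports.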

Comment 18 Jan Chaloupka 2017-07-03 12:41:13 UTC

Upstream PR for the check: https://github.com/openshift/openshift-ansible/pull/4665

Comment 19 Giuseppe Scrivano 2017-07-03 16:13:18 UTC

What is the SELinux label of /var/lib/etcd when this is failing (ls -lZ /var/lib/etcd/)?

I am not sure how this can happen as we have :z for /var/lib/etcd and Docker relabels it when the container starts.  How/when is it then changed?

Comment 20 Anping Li 2017-07-04 04:31:49 UTC

[root@container--2 ~]# ls -laZ /var/lib/etcd
drwxr-xr-x. etcd etcd system_u:object_r:var_lib_t:s0   .
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   ..
drwx------. root root system_u:object_r:svirt_sandbox_file_t:s0 member

Comment 21 Anping Li 2017-07-04 04:51:19 UTC

Waiting PR pull/4665

Comment 22 Giuseppe Scrivano 2017-07-04 08:04:18 UTC

That looks like the wrong label: /var/lib/etcd should have the label "system_u:object_r:svirt_sandbox_file_t:s0", not "system_u:object_r:var_lib_t:s0", so that the etcd container can write there.

IMHO, it would be better to ensure /var/lib/etcd has the proper label instead of giving etcd more privileges than required (with spc_t).

Does a "systemctl restart etcd_container" change the label of "/var/lib/etcd"? Docker should relabel /var/lib/etcd when the container starts and the bind mount is created.

If possible, could you verify whether this works?

chcon -R system_u:object_r:svirt_sandbox_file_t:s0 /var/lib/etcd/
docker exec etcd_container etcdctl backup --data-dir=/var/lib/etcd/ --backup-dir=/var/lib/etcd/openshift-backup-etcd_backup_foo
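Giuseppe's question comes down to which SELinux type each entry under /var/lib/etcd carries. A minimal sketch, assuming the five-column RHEL 7 `ls -laZ` layout shown in comment 20 (newer coreutils print a different layout), that lists entries not labeled svirt_sandbox_file_t; the function names are hypothetical:

```python
from typing import List, Tuple

WANTED_TYPE = "svirt_sandbox_file_t"


def parse_ls_z(output: str) -> List[Tuple[str, str]]:
    """Turn RHEL 7 `ls -laZ` lines into (name, selinux_type) pairs.

    Expects the five-column layout: mode, owner, group, context, name.
    """
    entries = []
    for line in output.splitlines():
        cols = line.split(None, 4)
        if len(cols) < 5:
            continue  # skip blank or unexpected lines
        context, name = cols[3], cols[4].strip()
        entries.append((name, context.split(":")[2]))
    return entries


def mislabeled(output: str, wanted: str = WANTED_TYPE) -> List[str]:
    """Names whose type differs from `wanted`; the parent entry '..' is ignored."""
    return [name for name, typ in parse_ls_z(output) if name != ".." and typ != wanted]
```

On the listing from comment 20 this reports only `.`, i.e. /var/lib/etcd itself, while `member` already carries the container type; `..` is skipped because the parent directory is not expected to be relabeled.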

Comment 23 Anping Li 2017-07-04 10:24:03 UTC

@Giuseppe, 
If etcd_container runs on a master, the initial label is system_u:object_r:svirt_sandbox_file_t:s0:
drwx------. etcd    etcd    system_u:object_r:svirt_sandbox_file_t:s0 etcd

If etcd_container runs on a dedicated host, the initial label is system_u:object_r:var_lib_t:s0:

[root@container--2 ~]# ls -laZ /var/lib/etcd/
drwxr-xr-x. etcd etcd system_u:object_r:var_lib_t:s0   .
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   ..
drwx------. root root system_u:object_r:svirt_sandbox_file_t:s0 member

The backup succeeds once I label /var/lib/etcd with system_u:object_r:svirt_sandbox_file_t:s0.

Comment 24 Anping Li 2017-07-04 10:31:27 UTC

The backup also succeeds when the label is system_u:object_r:var_lib_t:s0.

Comment 25 Giuseppe Scrivano 2017-07-04 11:03:29 UTC

@Anping, does the backup succeed as well when etcd runs as spc_t?

If we don't specify spc_t for etcd, wouldn't the backup fail when /var/lib/etcd is "system_u:object_r:var_lib_t:s0"?

Comment 26 Anping Li 2017-07-04 11:20:14 UTC

I don't think etcd is running as spc_t.

ExecStart=/usr/bin/docker run --name etcd_container --rm -v /var/lib/etcd/:/var/lib/etcd/:z -v /etc/etcd:/etc/etcd:ro --env-file=/etc/etcd/etcd.conf --net=host --entrypoint=/usr/bin/etcd registry.access.redhat.com/rhel7/etcd


[root@container--2 ~]# docker inspect f087fabe4e34 |grep ProcessLabel
        "ProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c292,c680",


This environment isn't exactly the one that hit this issue. I will prepare another environment and update this comment later.
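To confirm from the unit file alone that etcd is not started as spc_t, the `docker run` line can be scanned for `-v` and `--security-opt` arguments. A rough sketch, assuming only the flag spellings that appear in the ExecStart above (it is not a complete Docker CLI parser, and the helper names are hypothetical):

```python
import shlex
from typing import Dict, List


def docker_run_options(exec_start: str) -> Dict[str, List[str]]:
    """Collect -v and --security-opt values from a `docker run` command line.

    Handles only the flag spellings used in the unit above; this is not a
    complete Docker CLI parser.
    """
    if exec_start.startswith("ExecStart="):
        exec_start = exec_start[len("ExecStart="):]
    args = shlex.split(exec_start)
    opts: Dict[str, List[str]] = {"volumes": [], "security_opts": []}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-v", "--security-opt") and i + 1 < len(args):
            key = "volumes" if arg == "-v" else "security_opts"
            opts[key].append(args[i + 1])
            i += 2
            continue
        if arg.startswith("--security-opt="):
            opts["security_opts"].append(arg.split("=", 1)[1])
        i += 1
    return opts


def volume_relabel_mode(volumes: List[str], container_path: str) -> str:
    """Return 'z', 'Z' or '' for the volume mounted at container_path."""
    for spec in volumes:
        parts = spec.split(":")
        if len(parts) >= 2 and parts[1].rstrip("/") == container_path.rstrip("/"):
            flags = parts[2].split(",") if len(parts) > 2 else []
            for flag in ("z", "Z"):
                if flag in flags:
                    return flag
            return ""
    raise KeyError(container_path)
```

For the ExecStart above, `security_opts` is empty and the /var/lib/etcd volume carries the `z` (shared relabel) option, which is consistent with the `svirt_lxc_net_t` ProcessLabel that `docker inspect` reports.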

Comment 27 Jan Chaloupka 2017-07-04 15:26:50 UTC

Upstream PR fixing the original issue: https://github.com/openshift/openshift-ansible/pull/4674

Removing the spc_t label as it is no longer needed.

Still, I wonder why it has shown up only now. The etcdctl dropping task has been present since the 3.4 deployment.

Before running the upgrade we will need to check if the /var/lib/etcd is properly labelled and re-label it if not.
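The check-then-relabel step described here might look like the following sketch. It assumes the target context is the svirt_sandbox_file_t one from comment 22 and compares only the type field; `current_context` reads the `security.selinux` extended attribute and works only on Linux with SELinux enabled. Both helper names are hypothetical.

```python
import os
from typing import List, Optional

SANDBOX_CONTEXT = "system_u:object_r:svirt_sandbox_file_t:s0"


def current_context(path: str) -> str:
    """Read a path's SELinux context from its security.selinux xattr (Linux only)."""
    raw = os.getxattr(path, "security.selinux")
    return raw.rstrip(b"\x00").decode()


def relabel_command(context: str, path: str = "/var/lib/etcd",
                    target: str = SANDBOX_CONTEXT) -> Optional[List[str]]:
    """Return a chcon command if the path's type differs from the target's, else None."""
    if context.split(":")[2] == target.split(":")[2]:
        return None  # already labeled for container access; nothing to do
    return ["chcon", "-R", target, path]
```

Feeding it the `var_lib_t` context seen on the dedicated etcd hosts yields the same `chcon -R` command Giuseppe suggested in comment 22; an already-correct directory yields None, so running the check repeatedly is harmless.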

Comment 30 Giuseppe Scrivano 2017-07-05 13:02:24 UTC

@Anping, Jan could reproduce the issue, and https://github.com/openshift/openshift-ansible/pull/4674 fixes the cause of the wrong label on /var/lib/etcd.

Could you try with that change?

Comment 31 Scott Dodson 2017-07-05 16:44:08 UTC

More changes merged.

Comment 32 Anping Li 2017-07-06 06:39:09 UTC

@Jan, Scott, Giuseppe: the fix doesn't take effect for upgrades.

This issue only exists on docker-1.12.6-28.git1398f24.el7.x86_64, so I set the severity to medium.

Comment 33 Jan Chaloupka 2017-07-06 06:50:01 UTC

With https://github.com/openshift/openshift-ansible/pull/4680 merged, the etcd working directory is now re-labeled with svirt_sandbox_file_t before the docker exec etcdctl backup command is run.
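The ordering described here (relabel first, then back up) can be illustrated as two command lines. This is a hypothetical Python rendering, not the playbook's actual Ansible tasks; it reuses the chcon context from comment 22 and the `openshift-backup-<tag>` naming from the failing task, and `backup_commands` is an invented name.

```python
import posixpath
from typing import List

DATA_DIR = "/var/lib/etcd"
SANDBOX_CONTEXT = "system_u:object_r:svirt_sandbox_file_t:s0"


def backup_commands(tag: str, data_dir: str = DATA_DIR,
                    container: str = "etcd_container") -> List[List[str]]:
    """Relabel the etcd data dir, then run etcdctl backup inside the container."""
    # posixpath.join avoids the doubled slash seen in the original task.
    backup_dir = posixpath.join(data_dir, f"openshift-backup-{tag}")
    return [
        # Give the directory a type the container domain may write to.
        ["chcon", "-R", SANDBOX_CONTEXT, data_dir],
        # Same backup invocation as in the failing task.
        ["docker", "exec", container, "etcdctl", "backup",
         f"--data-dir={data_dir}", f"--backup-dir={backup_dir}"],
    ]
```

Building the path with `posixpath.join` is a cosmetic choice: the `/var/lib/etcd//openshift-backup-...` form in the original command resolves to the same directory on Linux.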

Comment 35 Anping Li 2017-07-07 05:24:17 UTC

@Jan, the fix works well: the etcd data can be backed up during the upgrade, and the etcd data_dir is labeled system_u:object_r:svirt_sandbox_file_t:s0 by the install playbook.

Comment 36 Scott Dodson 2017-07-07 14:12:08 UTC

Moving back to ON_QA, should be fixed in openshift-ansible-3.6.137-1

Comment 37 Anping Li 2017-07-10 02:59:01 UTC

Pass on openshift-ansible-3.6.139

Comment 39 errata-xmlrpc 2017-08-10 05:26:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
