Created attachment 1636583 [details]
simple-full-utilization.yml file (for reproducer)

Description of problem
======================

When I run a k8s Job on an OCP/OCS cluster, I see additional rbd block devices in the lsblk output inside its container. These devices come from the worker node where the pod is running.

This bug is created based on BZ 1769037 and the engineering feedback I got there:

From https://bugzilla.redhat.com/show_bug.cgi?id=1769037#c11

> 'lsblk' reads from sysfs, so if "/sys/dev/block" is available in the
> container, you will get to see all the host-level details. This is not an
> OCS-unique issue -- this is a general issue w/ mounting sysfs within the
> container.

From https://bugzilla.redhat.com/show_bug.cgi?id=1769037#c13:

> At least in some configurations prometheus pods are given host access (and in
> particular read-only /sys and /proc) on purpose, to allow it to collect
> metrics from the host. We shouldn't be allowing that for workload pods.

If you believe this is a problem on the OCS side, reopen BZ 1769037 with a suggestion of what is broken.

Version-Release number of selected component
============================================

cluster channel: stable-4.2
cluster version: 4.2.0-0.nightly-2019-11-13-203727
cluster image: registry.svc.ci.openshift.org/ocp/release@sha256:008d3abd2d12fa4cebf2eed2784faabd8b4ed304cdef9694ad86acb638d36d09

storage namespace openshift-cluster-storage-operator

image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e758e330315ad3022be0ec4b077f91dbda38f8d7b2659f3bb1d89e9787f70b6
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e758e330315ad3022be0ec4b077f91dbda38f8d7b2659f3bb1d89e9787f70b6

storage namespace openshift-storage

image quay.io/rhceph-dev/cephcsi:4.2-206.72ac53b6.release_4.2
 * quay.io/rhceph-dev/cephcsi@sha256:0571be219fc6537c2ce8b1343ae511182f699b8439d2593f77c7dbfc681d6c00
image quay.io/openshift/origin-csi-node-driver-registrar:4.2
 * quay.io/openshift/origin-csi-node-driver-registrar@sha256:6671a4a02a9bf5abfa58087b6d2fea430278d7bc5017aab6b88a84203c0dad06
image quay.io/openshift/origin-csi-external-provisioner:4.2
 * quay.io/openshift/origin-csi-external-provisioner@sha256:dbe8b5e1bebfed1e8f68be1d968ff0356270a0311c21f3920e0f3bec3b4d97ea
image quay.io/openshift/origin-csi-external-snapshotter:4.2
 * quay.io/openshift/origin-csi-external-snapshotter@sha256:910f54893ac1ae42b80735b0def153ee62aa7a73d5090d2955afc007b663ec79
image registry.redhat.io/openshift4/ose-local-storage-operator@sha256:f9fd77be0aeb4d61137c5b7d03519e495b31bc253c1b5275f536ae94a54643dd
 * registry.redhat.io/openshift4/ose-local-storage-operator@sha256:f9fd77be0aeb4d61137c5b7d03519e495b31bc253c1b5275f536ae94a54643dd
image quay.io/rhceph-dev/mcg-core:v5.2.9-14.5ab57cf51.5.2
 * quay.io/rhceph-dev/mcg-core@sha256:e5edb6f3b5bcbfd2c98a7a3ba2515d96335ea5821a814e41cd5d75b056b6fc07
image registry.access.redhat.com/rhscl/mongodb-36-rhel7:latest
 * registry.access.redhat.com/rhscl/mongodb-36-rhel7@sha256:d8e9c9738b248bf20f9649e5cb6b564a00079e449df32f0bd304519c5dd0739e
image quay.io/rhceph-dev/mcg-operator:v2.0.7-23.ebea58a.2.0
 * quay.io/rhceph-dev/mcg-operator@sha256:6928c5f48175d001ca507685aeccabbf3cdaad0491ad5fada078f1e027f83956
image quay.io/rhceph-dev/ocs-operator:4.2-237.d00f48f.release_4.2
 * quay.io/rhceph-dev/ocs-operator@sha256:444bc9ae9c313c45c34f894023fea881aa8e0abfa126d0cdef90fc09d242003e
image quay.io/rhceph-dev/rook:4.2-234.c907d72d.ocs_4.2
 * quay.io/rhceph-dev/rook@sha256:429d2fb249b097a1bacd6dd1e1a05be07f30dfe761736ccd18e595cf0eccf020
image quay.io/rhceph-dev/rhceph:4-20191113.ci.1
 * quay.io/rhceph-dev/rhceph@sha256:9e5e6455a655e5128f515fde77f460ae92de24bcdeedc21a11c4172421f31afd

How reproducible
================

100%

Steps to Reproduce
==================

1. Install an OCP/OCS cluster (I did this via red-hat-storage/ocs-ci, using downstream OCS images, ocs-ci commit b913e40).
2. Create a new project "fio".
3. Create a simple Job which runs fio on an RBD volume in this namespace:

```
oc create -f simple-full-utilization.yml -n fio
```

The full yaml is attached to this BZ (a rough illustrative sketch of a comparable Job is shown at the end of this comment).

4. While the job is running, connect to its container and run the `lsblk` command.

Actual results
==============

When I check the container of the job:

```
$ oc rsh -n fio fio-zwf8x bash
bash-4.2$ lsblk
lsblk: dm-0: failed to get device path
lsblk: dm-0: failed to get device path
NAME    MAJ:MIN   RM SIZE RO TYPE MOUNTPOINT
rbd0    252:0      0  50G  0 disk
xvda    202:0      0 120G  0 disk
|-xvda2 202:2      0   1G  0 part
|-xvda3 202:3      0 119G  0 part /dev/termination-log
`-xvda1 202:1      0   1M  0 part
xvdcz   202:26368  0 100G  0 disk
rbd3    252:48     0  10G  0 disk /target
rbd1    252:16     0  40G  0 disk
loop0   7:0        0 100G  0 loop
xvdco   202:23552  0  10G  0 disk
rbd2    252:32     0  40G  0 disk
```

Multiple extra rbd block devices are visible in the lsblk output. These devices are there because of the current state of the worker node where the pod is running.

Expected results
================

Only a single rbd block device is visible in the container, because only a single RBD-based volume is specified in the yaml file.

Additional info
===============

The device is not actually present in the container:

```
bash-4.2$ ls -l /dev/rbd0
ls: cannot access /dev/rbd0: No such file or directory
```

And it can't be created (though I wasn't trying very hard to check this):

```
bash-4.2$ mknod rbd0 b 252 0
mknod: 'rbd0': Permission denied
bash-4.2$ cd /tmp
bash-4.2$ mknod rbd0 b 252 0
mknod: 'rbd0': Operation not permitted
```

So this is merely an information leak: the container pod can see which block devices are attached to the worker node.
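For orientation only, a minimal sketch of what a Job with a single RBD-backed PVC can look like. This is NOT the attached simple-full-utilization.yml; the object names, storage class, image, and fio arguments are illustrative assumptions:

```
# Illustrative sketch only -- not the attached simple-full-utilization.yml.
# Storage class, image, and fio arguments below are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-target                                # hypothetical PVC name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ocs-storagecluster-ceph-rbd   # assumed RBD storage class
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: fio
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fio
        image: quay.io/example/fio:latest         # placeholder image
        command: ["fio", "--name=fill", "--directory=/target", "--size=8G", "--rw=write"]
        volumeMounts:
        - name: target
          mountPath: /target
      volumes:
      - name: target
        persistentVolumeClaim:
          claimName: fio-target
```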
Full output of the mount command:

```
overlay on / type overlay (rw,relatime,context="system_u:object_r:container_file_t:s0:c22,c23",lowerdir=/var/lib/containers/storage/overlay/l/UGUA2N7SIOLQUKSP6KIFB5333R:/var/lib/containers/storage/overlay/l/3PIL5HU553D6WMMFF2BSGSZROH:/var/lib/containers/storage/overlay/l/ANJXBP5QG7LMQEFBAN4F4OF42H:/var/lib/containers/storage/overlay/l/67AGGYPSDDQIGL5OPVN4D7PHKB:/var/lib/containers/storage/overlay/l/NC3RQN6LWBKXWHQ2ZG4B7WHTQG:/var/lib/containers/storage/overlay/l/ER4RUCFSOFHFVBULRFDKXG5YLQ:/var/lib/containers/storage/overlay/l/57DDZNBXO4VONMSOQN2ZWVL7MM:/var/lib/containers/storage/overlay/l/XEWZFMMH2ONZFZXAGQHKLRRKMP,upperdir=/var/lib/containers/storage/overlay/6ff58d4013614306489b5f6b5aebaa9c968034905f37b316343e2ee6d898d89e/diff,workdir=/var/lib/containers/storage/overlay/6ff58d4013614306489b5f6b5aebaa9c968034905f37b316343e2ee6d898d89e/work)
proc on /proc type proc (rw,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,context="system_u:object_r:container_file_t:s0:c22,c23",size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,context="system_u:object_r:container_file_t:s0:c22,c23",gid=5,mode=620,ptmxmode=666)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime,seclabel)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime,seclabel)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,context="system_u:object_r:container_file_t:s0:c22,c23",mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/rdma type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,rdma)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,cpu,cpuacct)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,seclabel,memory)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:container_file_t:s0:c22,c23",size=65536k)
tmpfs on /etc/resolv.conf type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755)
tmpfs on /etc/hostname type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /etc/passwd type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755)
/dev/rbd3 on /target type ext4 (rw,relatime,seclabel,stripe=1024)
/dev/xvda3 on /etc/fio type xfs (ro,relatime,seclabel,attr2,inode64,prjquota)
/dev/xvda3 on /etc/hosts type xfs (rw,relatime,seclabel,attr2,inode64,prjquota)
/dev/xvda3 on /dev/termination-log type xfs (rw,relatime,seclabel,attr2,inode64,prjquota)
tmpfs on /run/secrets type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /run/secrets type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /run/secrets/kubernetes.io/serviceaccount type tmpfs (ro,relatime,seclabel)
proc on /proc/bus type proc (ro,relatime)
proc on /proc/fs type proc (ro,relatime)
proc on /proc/irq type proc (ro,relatime)
proc on /proc/sys type proc (ro,relatime)
proc on /proc/sysrq-trigger type proc (ro,relatime)
tmpfs on /proc/acpi type tmpfs (ro,relatime,context="system_u:object_r:container_file_t:s0:c22,c23")
tmpfs on /proc/kcore type tmpfs (rw,nosuid,context="system_u:object_r:container_file_t:s0:c22,c23",size=65536k,mode=755)
tmpfs on /proc/keys type tmpfs (rw,nosuid,context="system_u:object_r:container_file_t:s0:c22,c23",size=65536k,mode=755)
tmpfs on /proc/timer_list type tmpfs (rw,nosuid,context="system_u:object_r:container_file_t:s0:c22,c23",size=65536k,mode=755)
tmpfs on /proc/sched_debug type tmpfs (rw,nosuid,context="system_u:object_r:container_file_t:s0:c22,c23",size=65536k,mode=755)
tmpfs on /proc/scsi type tmpfs (ro,relatime,context="system_u:object_r:container_file_t:s0:c22,c23")
tmpfs on /sys/firmware type tmpfs (ro,relatime,context="system_u:object_r:container_file_t:s0:c22,c23")
```
I think the problem has already been explained in bug #1769037: CRI-O bind-mounts the host's /sys into the container, and lsblk then displays whatever it finds there. I think this should belong to the container runtimes component.
PR with fix https://github.com/cri-o/cri-o/pull/4072
@Martin Bukatovic Can you tell me in detail how you install OCP/OCS? For example, via a Jenkins job, an image site, or some other install method such as installing a repo via the command line? I can't find the OCS operator in the latest OCP 4.6 OperatorHub; perhaps OCS 4.6 has not been released yet.
(In reply to MinLi from comment #6)
> Can you tell me in detail how you install OCP/OCS? For example, via a
> Jenkins job, an image site, or some other install method such as installing
> a repo via the command line? I can't find the OCS operator in the latest
> OCP 4.6 OperatorHub; perhaps OCS 4.6 has not been released yet.

I'm asking about OCS operator availability in OCP 4.6 on the OCS eng list:

http://post-office.corp.redhat.com/archives/rhocs-eng/2020-August/msg00218.html

Based on the answer, I can help you with using OCS CI builds for 4.5 or 4.6.
Hi Martin Bukatovic, do you know what the OCS 4.5 CI build image tag is? As described in https://ocs-ci.readthedocs.io/en/latest/docs/deployment_without_ocs.html#enabling-catalog-source-with-development-builds-of-ocs, I need to replace the tag "latest" with an available version in catalog-source.yaml.
(In reply to MinLi from comment #8)
> do you know what the OCS 4.5 CI build image tag is? As described in
> https://ocs-ci.readthedocs.io/en/latest/docs/deployment_without_ocs.html#enabling-catalog-source-with-development-builds-of-ocs,
> I need to replace the tag "latest" with an available version in
> catalog-source.yaml

To get the latest OCS 4.5 CI build, use the latest-4.5 tag.
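For illustration only, a sketch of what the adjusted catalog-source.yaml could look like; the object name and the registry image path are assumptions based on the ocs-ci documentation, not values taken from this bug:

```
# Hypothetical CatalogSource pointing at an OCS 4.5 CI build.
# The registry image path is an assumption; use whatever the ocs-ci
# documentation specifies and only replace the tag with latest-4.5.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ocs-catalogsource               # illustrative name
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/rhceph-dev/ocs-registry:latest-4.5   # tag replaced per the comment above
  displayName: OpenShift Container Storage (CI builds)
  publisher: Red Hat
```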
The bug reproduces on OCP 4.6 with an OCS 4.5 CI build.

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-01-070508   True        False         164m    Cluster version is 4.6.0-0.nightly-2020-09-01-070508

$ oc get csv --all-namespaces
NAMESPACE           NAME                         DISPLAY                       VERSION        REPLACES   PHASE
openshift-storage   ocs-operator.v4.5.0-543.ci   OpenShift Container Storage   4.5.0-543.ci              Succeeded

$ oc rsh fio-8lksh
sh-4.2$ df -h
Filesystem                             Size  Used Avail Use% Mounted on
overlay                                120G   11G  110G   9% /
tmpfs                                   64M     0   64M   0% /dev
tmpfs                                   16G     0   16G   0% /sys/fs/cgroup
shm                                     64M     0   64M   0% /dev/shm
tmpfs                                   16G   51M   16G   1% /etc/passwd
/dev/rbd1                              9.8G  6.4G  3.5G  65% /target
/dev/mapper/coreos-luks-root-nocrypt   120G   11G  110G   9% /etc/fio
tmpfs                                   16G   28K   16G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                   16G     0   16G   0% /proc/acpi
tmpfs                                   16G     0   16G   0% /proc/scsi
tmpfs                                   16G     0   16G   0% /sys/firmware

sh-4.2$ lsblk
lsblk: dm-0: failed to get device path
lsblk: dm-0: failed to get device path
NAME    MAJ:MIN   RM   SIZE RO TYPE MOUNTPOINT
rbd0    252:0      0    50G  0 disk
xvda    202:0      0   120G  0 disk
|-xvda4 202:4      0 119.5G  0 part
|-xvda2 202:2      0   127M  0 part
|-xvda3 202:3      0     1M  0 part
`-xvda1 202:1      0   384M  0 part
xvdcs   202:24576  0     2T  0 disk
xvdbg   202:14848  0    10G  0 disk
rbd1    252:16     0    10G  0 disk /target
loop0   7:0        0     2T  0 loop

sh-4.2$ ls -l /dev/rbd1
ls: cannot access /dev/rbd1: No such file or directory
sh-4.2$ ls -l /dev/rbd0
ls: cannot access /dev/rbd0: No such file or directory
```
I was unable to reproduce this with crictl:

```container_config.json
{
  "metadata": { "name": "container1", "attempt": 1 },
  "image": { "image": "registry.fedoraproject.org/fedora-minimal:latest" },
  "command": [ "lsblk" ],
  "args": [],
  "working_dir": "/",
  "envs": [
    { "key": "PATH", "value": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" },
    { "key": "TERM", "value": "xterm" },
    { "key": "TESTDIR", "value": "test/dir1" },
    { "key": "TESTFILE", "value": "test/file1" }
  ],
  "labels": { "type": "small", "batch": "no" },
  "annotations": { "owner": "dragon", "daemon": "crio" },
  "log_path": "",
  "stdin": false,
  "stdin_once": false,
  "tty": false,
  "linux": {
    "resources": {
      "cpu_period": 10000,
      "cpu_quota": 20000,
      "cpu_shares": 512,
      "oom_score_adj": 30,
      "memory_limit_in_bytes": 268435456
    },
    "security_context": {
      "run_as_user": { "value": 0 },
      "namespace_options": { "pid": 1 },
      "readonly_rootfs": false,
      "selinux_options": {
        "user": "system_u",
        "role": "system_r",
        "type": "svirt_lxc_net_t",
        "level": "s0:c4,c5"
      },
      "capabilities": {
        "add_capabilities": [ "setuid", "setgid" ],
        "drop_capabilities": []
      }
    }
  }
}
```

```sandbox_config.json
{
  "metadata": {
    "name": "podsandbox1",
    "uid": "redhat-test-crio",
    "namespace": "redhat.test.crio",
    "attempt": 1
  },
  "hostname": "crictl_host",
  "log_directory": "",
  "dns_config": { "searches": [ "8.8.8.8" ] },
  "port_mappings": [],
  "resources": {
    "cpu": { "limits": 3, "requests": 2 },
    "memory": { "limits": 50000000, "requests": 2000000 }
  },
  "labels": { "group": "test" },
  "annotations": {
    "owner": "hmeng",
    "security.alpha.kubernetes.io/seccomp/pod": "unconfined"
  },
  "linux": {
    "cgroup_parent": "pod_123-456.slice",
    "security_context": {
      "namespace_options": { "network": 0, "pid": 1, "ipc": 0 },
      "selinux_options": {
        "user": "system_u",
        "role": "system_r",
        "type": "svirt_lxc_net_t",
        "level": "s0:c4,c5"
      }
    }
  }
}
```

and running:

```
$ sudo crictl run container_config.json sandbox_config.json
10f976a96da86ee57b4bcecc8472990b281082153c832ecf878b2bd50d2bddb2
```

yields:

```
$ sudo crictl logs 10f976a96da86ee57b4bcecc8472990b281082153c832ecf878b2bd50d2bddb2
lsblk: failed to access sysfs directory: /sys/dev/block: No such file or directory
```

Note the lack of "privileged" in both `linux` objects. Adding `privileged: true` to both json files makes this work.

Are we sure the container in question is not privileged? Moving back to MODIFIED; if it still doesn't work on 4.8, I'll need the pod manifest of the failing pod :)
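For comparison at the Kubernetes level, and purely as an illustrative sketch (the pod name is hypothetical and this is not the Job attached to this bug), this is what a pod whose container runs privileged looks like; per the previous comment, the privileged setting is what makes the host's /sys/dev/block, and therefore the host block devices, visible to lsblk:

```
# Hypothetical pod spec, shown only to illustrate the "privileged" condition
# discussed above; it is not the Job attached to this bug.
apiVersion: v1
kind: Pod
metadata:
  name: lsblk-privileged-check          # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: check
    image: registry.fedoraproject.org/fedora-minimal:latest
    command: ["lsblk"]
    securityContext:
      # Per the previous comment, without privileged, lsblk fails with
      # "failed to access sysfs directory: /sys/dev/block".
      privileged: true
```

Checking the failing fio pod's manifest for securityContext.privileged (for example via `oc get pod <name> -o yaml`) should tell whether this is the case here.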
Verified with 4.8.0-0.nightly-2021-06-16-020345.

```
$ oc rsh fio-fclhn bash
bash-4.2$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         120G   12G  109G  10% /
tmpfs            64M     0   64M   0% /dev
tmpfs            16G     0   16G   0% /sys/fs/cgroup
shm              64M     0   64M   0% /dev/shm
tmpfs            16G   56M   16G   1% /etc/passwd
/dev/rbd0       9.8G  8.8G  1.1G  90% /target
/dev/nvme0n1p4  120G   12G  109G  10% /etc/fio
tmpfs            16G   20K   16G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            16G     0   16G   0% /proc/acpi
tmpfs            16G     0   16G   0% /proc/scsi
tmpfs            16G     0   16G   0% /sys/firmware

bash-4.2$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop1         7:1    0   512G  0 loop
nvme0n1     259:0    0   120G  0 disk
|-nvme0n1p3 259:3    0   384M  0 part
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p4 259:4    0 119.5G  0 part /dev/termination-log
`-nvme0n1p2 259:2    0   127M  0 part
rbd0        252:0    0    10G  0 disk /target
nvme2n1     259:6    0   512G  0 disk
nvme1n1     259:5    0    10G  0 disk
```

There is only one rbd device (rbd0) visible.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438