Description of problem:

When installing an OCP 4.2 UPI bare-metal cluster using RHCOS 4.2 as the OS, with a vdb1 partition used as cri-o storage for /var/lib/containers, the cri-o service is unable to start on the installed nodes.

The following is the ignition config used to mount the vdb1 partition at /var/lib/containers for cri-o storage on the bootstrap host:

# cat bootstrap.ign
...
  "storage": {
    ...
    "disks": [
      {
        "device": "/dev/vdb",
        "wipeTable": true,
        "partitions": [
          {
            "label": "data01",
            "number": 1,
            "size": 0
          }
        ]
      }
    ],
    "filesystems": [
      {
        "mount": {
          "device": "/dev/vdb1",
          "format": "xfs",
          "label": "data01"
        }
      }
    ]
  },
  "systemd": {
    "units": [
      ...
      {
        "name": "var-lib-containers.mount",
        "enabled": true,
        "contents": "[Mount]\nWhat=/dev/vdb1\nWhere=/var/lib/containers\nType=xfs\nOptions=defaults\n\n[Install]\nWantedBy=local-fs.target"
      },
      {
        "name": "var-lib-containers-relabel.service",
        "enabled": true,
        "contents": "[Unit]\nAfter=var-lib-containers.mount\n[Service]\nType=oneshot\nExecStart=/sbin/restorecon /var/lib/containers\n\n[Install]\nWantedBy=local-fs.target"
      }
      ...

The RHCOS installation finishes successfully on the bootstrap node and vdb1 is mounted on /var/lib/containers as expected:

[core@ocp4-bootstrap ~]$ mount | grep vdb1
/dev/vdb1 on /var/lib/containers type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

[core@ocp4-bootstrap ~]$ sudo df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        7.8G     0  7.8G   0% /dev
tmpfs           7.9G   84K  7.9G   1% /dev/shm
tmpfs           7.9G  6.6M  7.9G   1% /run
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/vda3        49G  2.3G   47G   5% /sysroot
/dev/vda2       976M   69M  841M   8% /boot
/dev/vda1        94M  6.6M   88M   8% /boot/efi
/dev/vdb1        20G  176M   20G   1% /var/lib/containers
tmpfs           1.6G     0  1.6G   0% /run/user/1000

BUT the cri-o service is unable to start on the installed bootstrap node, because the crio-wipe service (a dependency) fails to start:

[core@ocp4-bootstrap ~]$ sudo systemctl status cri-o
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: https://github.com/cri-o/cri-o
Oct 11 14:20:00 ocp4-bootstrap.info.net systemd[1]: Dependency failed for Open Container Initiative Daemon.
Oct 11 14:20:00 ocp4-bootstrap.info.net systemd[1]: crio.service: Job crio.service/start failed with result 'dependency'.

[core@ocp4-bootstrap ~]$ sudo systemctl status crio-wipe
● crio-wipe.service - CRI-O Auto Update Script
   Loaded: loaded (/usr/lib/systemd/system/crio-wipe.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2019-10-11 14:26:25 UTC; 16s ago
  Process: 2929 ExecStart=/bin/bash /usr/libexec/crio/crio-wipe/crio-wipe.bash (code=exited, status=1/FAILURE)
 Main PID: 2929 (code=exited, status=1/FAILURE)
Oct 11 14:26:25 ocp4-bootstrap.info.net systemd[1]: Starting CRI-O Auto Update Script...
Oct 11 14:26:25 ocp4-bootstrap.info.net bash[2929]: Old version not found
Oct 11 14:26:25 ocp4-bootstrap.info.net bash[2929]: Wiping storage
Oct 11 14:26:25 ocp4-bootstrap.info.net bash[2929]: rm: cannot remove '/var/lib/containers': Device or resource busy
Oct 11 14:26:25 ocp4-bootstrap.info.net systemd[1]: crio-wipe.service: Main process exited, code=exited, status=1/FAILURE
Oct 11 14:26:25 ocp4-bootstrap.info.net systemd[1]: crio-wipe.service: Failed with result 'exit-code'.
Oct 11 14:26:25 ocp4-bootstrap.info.net systemd[1]: Failed to start CRI-O Auto Update Script.
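For completeness, a few commands that can help sanity-check this kind of setup (these are not part of the failure output above; the unit names and SELinux type are inferred from the ignition config):

# The mount unit name must be the systemd-escaped form of the mount path;
# for /var/lib/containers this prints "var-lib-containers.mount":
systemd-escape -p --suffix=mount /var/lib/containers

# Confirm the mount unit and the relabel oneshot ran:
sudo systemctl status var-lib-containers.mount var-lib-containers-relabel.service

# After restorecon the directory should carry the expected SELinux type
# (e.g. container_var_lib_t):
ls -Zd /var/lib/containers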
As the crio-wipe status output above shows, crio-wipe.service cannot start because the script /usr/libexec/crio/crio-wipe/crio-wipe.bash tries to remove the /var/lib/containers directory, which is the mount point where vdb1 is mounted. crio-wipe apparently expects /var/lib/containers to be a regular directory rather than a mount point (a mount point cannot be removed while it is mounted). This inconsistency causes the OCP 4.2 cluster installation to fail whenever /var/lib/containers storage is configured on a secondary disk (vdb) on masters and workers. I believe the use case described in this BZ will be common when installing OCP 4.2 on bare-metal servers.

Possible solution: use 'rm -rf /var/lib/containers/*' instead of 'rm -rf /var/lib/containers' in crio-wipe.service (see the sketch at the end of this report).

Version-Release number of selected component (if applicable):
rhcos-4.2.0-0.nightly-2019-08-28-152644-x86_64

[core@ocp4-bootstrap ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.2

[core@ocp4-bootstrap ~]$ rpm -q cri-o
cri-o-1.14.10-0.8.dev.rhaos4.2.gitaf00350.el8.x86_64

How reproducible:

Steps to Reproduce:
1. Follow the standard procedure to install an OCP 4.2 UPI bare-metal cluster.
2. Modify the bootstrap, worker and master ignition files to use custom storage for cri-o (/var/lib/containers) as described above.
3. The OCP 4.2 installation fails on the bootstrap node because the cri-o and crio-wipe services cannot start (crio-wipe cannot remove /var/lib/containers, which is a mount point).

Actual results:
The OCP 4.2 installation fails on the bootstrap node.

Expected results:
The OCP 4.2 installation succeeds.

Additional info:
Related bugzillas:
https://bugzilla.redhat.com/show_bug.cgi?id=1699107
https://bugzilla.redhat.com/show_bug.cgi?id=1692513
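As a rough illustration of that suggestion (a sketch only, not the actual contents of crio-wipe.bash), the wipe step would need to remove the contents of the storage directory without removing the directory itself, so that it keeps working when /var/lib/containers is a mount point:

# hypothetical replacement for the 'rm -rf /var/lib/containers' step:
# delete everything inside the directory (including dotfiles) but keep the
# directory / mount point itself in place
find /var/lib/containers -mindepth 1 -delete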
> Use 'rm -rf /var/lib/containers/*' instead of 'rm -rf /var/lib/containers' in crio-wipe.service

Agreed!
This should have been fixed about a month ago; it seems your build is a bit older than that. Can you try a newer OCP version? I can't find the exact date the fix was merged, but every release listed at https://releases-rhcos-art.cloud.privileged.psi.redhat.com/ has it (it merged somewhere before 1.14.10-0.18.dev.rhaos4.2.git3725006.el8 but after cri-o-1.14.10-0.8.dev.rhaos4.2.gitaf00350.el8.x86_64).
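A quick way to check whether a given node already carries the fixed build (version boundaries taken from the comment above) is to query the installed package:

# the fix is present in cri-o-1.14.10-0.18.dev.rhaos4.2.git3725006.el8 and later,
# and absent in cri-o-1.14.10-0.8.dev.rhaos4.2.gitaf00350.el8
rpm -q cri-o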
(In reply to Peter Hunt from comment #2)
> This should have been fixed about a month ago; it seems your build is a bit
> older than that. Can you try a newer OCP version? I can't find the exact
> date the fix was merged, but every release listed at
> https://releases-rhcos-art.cloud.privileged.psi.redhat.com/ has it (it
> merged somewhere before 1.14.10-0.18.dev.rhaos4.2.git3725006.el8 but after
> cri-o-1.14.10-0.8.dev.rhaos4.2.gitaf00350.el8.x86_64).

Tested with the newest OCP 4.2 and RHCOS 4.2 releases and it works as expected:

https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/4.2.0-rc.5
https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.2.0-rc.5

In addition, the workaround used to relabel the mount point /var/lib/containers is no longer needed; the ignition-relabel service does the trick. So the only ignition config needed is:

# cat bootstrap.ign
...
  "storage": {
    ...
    "disks": [
      {
        "device": "/dev/vdb",
        "wipeTable": true,
        "partitions": [
          {
            "label": "data01",
            "number": 1,
            "size": 0
          }
        ]
      }
    ],
    "filesystems": [
      {
        "mount": {
          "device": "/dev/vdb1",
          "format": "xfs",
          "label": "data01"
        }
      }
    ]
  },
  "systemd": {
    "units": [
      ...
      {
        "name": "var-lib-containers.mount",
        "enabled": true,
        "contents": "[Mount]\nWhat=/dev/vdb1\nWhere=/var/lib/containers\nType=xfs\nOptions=defaults\n\n[Install]\nWantedBy=local-fs.target"
      },
      ...

From my point of view this case can be closed. Many thanks!
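For anyone retesting this, the result can be verified on the bootstrap node with something along these lines (a minimal sketch, not output captured during the retest above):

# crio-wipe should now exit successfully (no 'Device or resource busy' error)
# and crio should be active
sudo systemctl status crio-wipe.service crio.service

# the secondary-disk mount should still be in place
findmnt /var/lib/containers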
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922