Bug 1934113 - mcd panic when there's not enough free disk space
Summary: mcd panic when there's not enough free disk space
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Yu Qi Zhang
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-02 14:28 UTC by Yuval Kashtan
Modified: 2021-07-27 22:51 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:49:00 UTC
Target Upstream Version:
Embargoed:


Attachments
mcd log (8.12 KB, text/plain)
2021-03-02 14:28 UTC, Yuval Kashtan


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2449 0 None open Bug 1934113: Improve error handling for os updates 2021-03-03 00:57:07 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:51:26 UTC

Description Yuval Kashtan 2021-03-02 14:28:29 UTC
Created attachment 1760219
mcd log

Description of problem:
When enabling NBDE according to the doc https://github.com/openshift/openshift-docs/blob/enterprise-4.7/modules/installation-special-config-encrypt-disk-tang.adoc
the rootfs is too small (3G).

This then causes the MCD to panic when it tries to rebase the OS (see attached log).

The hidden error message is:
```
# rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-257500977/srv/repo:c6ccbc4764826ef8ddecf083945a3b0172b015494d7b59c09b7c840045bd4565 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96be060a7824bed1eae6431f2209457a0263aa7cb70f495a68d23f734ba384d3 --custom-origin-description "Managed by machine-config-operator"
error: Pulling commit c6ccbc4764826ef8ddecf083945a3b0172b015494d7b59c09b7c840045bd4565 from local repo: Writing content object: min-free-space-percent '3%' would be exceeded, at least 5.3 MB requested
```
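
For context, the rpm-ostree error above is raised by the `min-free-space-percent` guard, which refuses a write that would leave less than the configured percentage of the filesystem free. A minimal Python sketch of that kind of check follows, assuming a simplified byte-based calculation. The real logic lives in libostree and may differ in detail; the function names and parameters here are hypothetical.

```python
import os


def would_exceed_min_free(total_bytes, free_bytes, requested_bytes, min_free_percent=3):
    """Return True if writing requested_bytes would leave less than
    min_free_percent of the filesystem free (simplified illustration)."""
    # Bytes that must stay free to honor the percentage floor.
    reserved_bytes = total_bytes * min_free_percent // 100
    return free_bytes - requested_bytes < reserved_bytes


def path_would_exceed(path, requested_bytes, min_free_percent=3):
    """Apply the same check to a mounted filesystem via os.statvfs (Unix only)."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bavail * st.f_frsize
    return would_exceed_min_free(total, free, requested_bytes, min_free_percent)
```

On a 3 GiB root filesystem, 3% is roughly 92 MiB held in reserve, so even a write of a few MB is rejected once free space gets close to that floor.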

Version-Release number of selected component (if applicable):
4.8-nightly

How reproducible:
Every time

Steps to Reproduce:
1. Follow the Tang encryption doc linked above

Comment 1 Yu Qi Zhang 2021-03-02 20:58:50 UTC
I will look into fixing the panic. In the meantime, reassigning to RHCOS to check whether the script is correct for general use.

Comment 2 Yuval Kashtan 2021-03-02 21:29:09 UTC
Note that I've also opened:
https://bugzilla.redhat.com/show_bug.cgi?id=1934174

Comment 3 Yu Qi Zhang 2021-03-02 22:50:32 UTC
Ah sorry, moving back. The panic fix is in https://github.com/openshift/machine-config-operator/pull/2449

Comment 5 Michael Nguyen 2021-03-08 17:22:38 UTC
Verified on 4.8.0-0.nightly-2021-03-08-092651.  MCD no longer panics when hitting the error.  The RHCOS resize BZ mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1934113#c2 is still present, so I was able to capture the error (with no panic), but once that fix is in a build, this error should not happen anymore.

Verification steps:
- Create a Tang Server
- openshift-install create manifests
- Add the following two files, replacing tang server and thumbprint with yours

$ cat << EOF > ./99-openshift-worker-tang-encryption.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: worker-tang
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      luks:
        - name: root
          device: /dev/disk/by-partlabel/root
          clevis:
            tang:
              - url: https://tang.example.com
                thumbprint: PLjNyRdGw03zlRoGjQYMahSZGu9
          options: [--cipher, aes-cbc-essiv:sha256]
          wipeVolume: true
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          wipeFilesystem: true
          label: root
    kernelArguments:
      - rd.neednet=1
EOF

$ cat << EOF > ./99-openshift-master-tang-encryption.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: master-tang
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      luks:
        - name: root
          device: /dev/disk/by-partlabel/root
          clevis:
            tang:
              - url: https://tang.example.com
                thumbprint: PLjNyRdGw03zlRoGjQYMahSZGu9
          options: [--cipher, aes-cbc-essiv:sha256]
          wipeVolume: true
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          wipeFilesystem: true
          label: root
    kernelArguments:
      - rd.neednet=1
EOF

- openshift-install create cluster
- Verify there are no panics in the MCD logs

-- Logs begin at Mon 2021-03-08 16:14:27 UTC, end at Mon 2021-03-08 17:09:42 UTC. --
Mar 08 16:24:18 ip-10-0-131-245 systemd[1]: Starting Machine Config Daemon Firstboot...
Mar 08 16:24:18 ip-10-0-131-245 sh[4198]: sed: can't read /etc/yum.repos.d/*.repo: No such file or directory
Mar 08 16:24:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:24:18.094481    4200 rpm-ostree.go:258] Running captured: rpm-ostree status --json
Mar 08 16:24:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:24:18.317745    4200 daemon.go:218] Booted osImageURL:  (47.83.202102090044-0)
Mar 08 16:24:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:24:18.319066    4200 update.go:597] Checking Reconcilable for config mco-empty-mc to rendered-master-123e469c42eaa65bce01288e5c7aa6fc
Mar 08 16:24:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:24:18.319601    4200 update.go:1905] Starting update from mco-empty-mc to rendered-master-123e469c42eaa65bce01288e5c7aa6fc: &{osUpdate:true kargs:true fips:false passwd:false files:false units:false kernelType:false extensions:false}
Mar 08 16:24:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:24:18.323664    4200 update.go:1220] Updating files
Mar 08 16:24:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:24:18.324298    4200 update.go:1293] Deleting stale data
Mar 08 16:24:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:24:18.326720    4200 run.go:18] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-965836767 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e6a29805478181c58ee8922085fc919cd19a15617f6e32ca9b0580c086fcfb41
Mar 08 16:25:16 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:25:16.601183    4200 update.go:1783] Updating OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e6a29805478181c58ee8922085fc919cd19a15617f6e32ca9b0580c086fcfb41
Mar 08 16:25:16 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:25:16.601396    4200 rpm-ostree.go:258] Running captured: rpm-ostree status --json
Mar 08 16:25:16 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:25:16.630625    4200 rpm-ostree.go:184] Current origin is not custom
Mar 08 16:25:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:25:18.221356    4200 rpm-ostree.go:211] Pivoting to: 48.83.202103080317-0 (5633f70d06713fab5da5b884c1637b2bc6b0de7cc76967e9b7d75fcde315692e)
Mar 08 16:25:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:25:18.221377    4200 rpm-ostree.go:243] Executing rebase from repo path /run/mco-machine-os-content/os-content-965836767/srv/repo with customImageURL pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e6a29805478181c58ee8922085fc919cd19a15617f6e32ca9b0580c086fcfb41 and checksum 5633f70d06713fab5da5b884c1637b2bc6b0de7cc76967e9b7d75fcde315692e
Mar 08 16:25:18 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:25:18.221391    4200 rpm-ostree.go:258] Running captured: rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-965836767/srv/repo:5633f70d06713fab5da5b884c1637b2bc6b0de7cc76967e9b7d75fcde315692e --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e6a29805478181c58ee8922085fc919cd19a15617f6e32ca9b0580c086fcfb41 --custom-origin-description Managed by machine-config-operator
Mar 08 16:25:19 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:25:19.153797    4200 update.go:1220] Updating files
Mar 08 16:25:19 ip-10-0-131-245 machine-config-daemon[4200]: I0308 16:25:19.154211    4200 update.go:1293] Deleting stale data
Mar 08 16:25:19 ip-10-0-131-245 machine-config-daemon[4200]: error: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e6a29805478181c58ee8922085fc919cd19a15617f6e32ca9b0580c086fcfb41 : error running rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-965836767/srv/repo:5633f70d06713fab5da5b884c1637b2bc6b0de7cc76967e9b7d75fcde315692e --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e6a29805478181c58ee8922085fc919cd19a15617f6e32ca9b0580c086fcfb41 --custom-origin-description Managed by machine-config-operator: error: Pulling commit 5633f70d06713fab5da5b884c1637b2bc6b0de7cc76967e9b7d75fcde315692e from local repo: Writing content object: min-free-space-percent '3%' would be exceeded, at least 123.8 MB requested
Mar 08 16:25:19 ip-10-0-131-245 machine-config-daemon[4200]: : exit status 1
Mar 08 16:25:19 ip-10-0-131-245 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=exited, status=1/FAILURE
Mar 08 16:25:19 ip-10-0-131-245 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'exit-code'.
Mar 08 16:25:19 ip-10-0-131-245 systemd[1]: Failed to start Machine Config Daemon Firstboot.
Mar 08 16:25:19 ip-10-0-131-245 systemd[1]: machine-config-daemon-firstboot.service: Consumed 16.260s CPU time
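
The "no panics" check on a log like the one above can be scripted. Below is a rough Python sketch, assuming the journal output has been saved as text lines. It relies on two assumptions: that a Go panic shows up as a `panic: ` line or a `goroutine N [running]` trace header, and that the free-space error keeps the wording seen in this log. The function name is hypothetical.

```python
import re

# Markers a Go runtime panic normally leaves in the log (assumption).
PANIC_RE = re.compile(r"\bpanic: |goroutine \d+ \[running\]")

# The ostree free-space error as it appears in this bug's log.
SPACE_RE = re.compile(
    r"min-free-space-percent '(\d+)%' would be exceeded, "
    r"at least ([\d.]+) (\w+) requested"
)


def scan_mcd_log(lines):
    """Return (panic_lines, space_errors) found in the given log lines.

    space_errors is a list of (percent, amount, unit) tuples.
    """
    panics = []
    space_errors = []
    for line in lines:
        if PANIC_RE.search(line):
            panics.append(line)
        match = SPACE_RE.search(line)
        if match:
            space_errors.append(
                (int(match.group(1)), float(match.group(2)), match.group(3))
            )
    return panics, space_errors
```

For the log above, this would report no panic lines and one free-space error (3%, 123.8 MB), which matches the expected post-fix behavior: the error is still surfaced, but the daemon exits with status 1 instead of panicking.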

Comment 8 errata-xmlrpc 2021-07-27 22:49:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

