Bug 2104619 - Upgrade from 4.11.0-rc0 -> 4.11.0-rc.1 failed. rpm-ostree status shows No space left on device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.11
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Timothée Ravier
QA Contact: Douglas Slavens
URL:
Whiteboard:
Depends On:
Blocks: 2106723
 
Reported: 2022-07-06 17:34 UTC by pdsilva
Modified: 2023-01-17 19:52 UTC
CC: 10 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:51:47 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 3243 0 None open Bug 2104619: Remove rollback deployment 2022-07-11 15:09:58 UTC
IBM Linux Technology Center 198945 0 None None None 2022-07-11 18:48:34 UTC
Red Hat Issue Tracker MULTIARCH-2693 0 None None None 2022-07-06 17:40:15 UTC
Red Hat Product Errata RHSA-2022:7399 0 None None None 2023-01-17 19:52:07 UTC

Description pdsilva 2022-07-06 17:34:02 UTC
Description of problem:
We're testing upgrades from 4.9.41 -> 4.10.21 -> 4.11.0-rc.0 -> 4.11.0-rc.1

Failure is seen during upgrade from 4.11.0-rc.0 to 4.11.0-rc.1. The 4.9 -> 4.10 -> 4.11-rc.0 upgrades had no issues. 

Must gather logs: https://drive.google.com/file/d/1vLMStKyA6z1yyRYwOV9eUaPU53rVo8Pw/view?usp=sharing

The error reported in the MachineConfigPool status is:
    message: 'Node master-0 is reporting: "unexpected on-disk state validating against
      rendered-master-c75d108f5b1bd9301a67a1fbd8be19cc: expected target osImageURL
      \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e657044831a9bd296b20d37b693698adb1d7eb2d3cd9090db7724779ecbf608\",
      have \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba\""'
    reason: 1 nodes are reporting degraded status on sync

    message: 'Node worker-1 is reporting: "unexpected on-disk state validating against
      rendered-worker-44da19ca71d1d08ed7aa28e94b7421e1: expected target osImageURL
      \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e657044831a9bd296b20d37b693698adb1d7eb2d3cd9090db7724779ecbf608\",
      have \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba\""'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded


and the `rpm-ostree status` command shows "No space left on device". The node's disk is 120GB, the same size we have always used, and no such issue has been seen previously.

sh-4.4# rpm-ostree status
State: idle
Warning: failed to finalize previous deployment
         error: Installing kernel: regfile copy: No space left on device
         check `journalctl -b -1 -u ostree-finalize-staged.service`
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202206280018-0 (2022-06-28T00:25:24Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:737dda6647e7925bbe4fb5e4b8cddebedcf83c8bc233fee38bb4c5cdd97a1cd5
              CustomOrigin: Managed by machine-config-operator
                   Version: 410.84.202206240419-0 (2022-06-24T04:25:30Z)
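The warning and deployment list above can also be extracted programmatically. A minimal sketch (a hypothetical helper that parses the plain-text `rpm-ostree status` output shown here, rather than relying on the `--json` schema):

```python
# Sketch: scan plain-text `rpm-ostree status` output for a failed-finalization
# warning and the deployment image references. Hypothetical helper, not part
# of rpm-ostree or the machine-config-operator.

def summarize_rpm_ostree_status(text: str) -> dict:
    finalize_failed = "failed to finalize previous deployment" in text
    deployments = []
    for line in text.splitlines():
        line = line.strip()
        # Deployment lines start with the image reference; "*" marks the booted one.
        if line.startswith(("* pivot://", "pivot://")):
            deployments.append({
                "booted": line.startswith("*"),
                "ref": line.lstrip("* ").strip(),
            })
    return {"finalize_failed": finalize_failed, "deployments": deployments}

sample = """\
State: idle
Warning: failed to finalize previous deployment
         error: Installing kernel: regfile copy: No space left on device
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a7...
  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:737dda...
"""

summary = summarize_rpm_ostree_status(sample)
print(summary["finalize_failed"], len(summary["deployments"]))  # True 2
```

Here the booted (`*`) deployment is already the 4.11 image, while the staged 4.11.0-rc.1 deployment failed to finalize.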

Version-Release number of selected component (if applicable):
# oc version
Client Version: 4.9.41
Server Version: 4.11.0-rc.1
Kubernetes Version: v1.24.0+2dd8bb1

 
Steps to Reproduce:
1. Perform upgrades from 4.9.41 -> 4.10.21 -> 4.11.0-rc.0 -> 4.11.0-rc.1.
   The 4.9.41 -> 4.10.21 -> 4.11.0-rc.0 upgrades succeed, but a failure is seen when upgrading to 4.11.0-rc.1.


Actual results:
ClusterVersion: Updating to "4.11.0-rc.1" from "4.11.0-rc.0" for 11 hours: Unable to apply 4.11.0-rc.1: an unknown error has occurred: MultipleErrors

Expected results:
The upgrade should complete successfully.

Additional info:

# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.0   True        True          10h     Unable to apply 4.11.0-rc.1: an unknown error has occurred: MultipleErrors

# oc get nodes
NAME       STATUS                     ROLES    AGE    VERSION
master-0   Ready,SchedulingDisabled   master   2d8h   v1.24.0+9ddc8b1
master-1   Ready                      master   2d8h   v1.24.0+9ddc8b1
master-2   Ready                      master   2d8h   v1.24.0+9ddc8b1
worker-0   Ready                      worker   2d8h   v1.24.0+9ddc8b1
worker-1   Ready,SchedulingDisabled   worker   2d8h   v1.24.0+9ddc8b1

# oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-rc.1   True        False         True       130m    APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
baremetal                                  4.11.0-rc.1   True        False         False      2d8h
cloud-controller-manager                   4.11.0-rc.1   True        False         False      2d8h
cloud-credential                           4.11.0-rc.1   True        False         False      2d8h
cluster-autoscaler                         4.11.0-rc.1   True        False         False      2d8h
config-operator                            4.11.0-rc.1   True        False         False      2d8h
console                                    4.11.0-rc.1   True        False         False      9h
csi-snapshot-controller                    4.11.0-rc.1   True        False         False      2d2h
dns                                        4.11.0-rc.1   True        False         False      2d8h
etcd                                       4.11.0-rc.1   True        False         False      2d8h
image-registry                             4.11.0-rc.1   True        False         False      10h
ingress                                    4.11.0-rc.1   True        False         True       29h     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-56d77b4f4d-b9mvv" cannot be scheduled: 0/5 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) were unschedulable, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 4 Preemption is not helpful for scheduling. Make sure you have sufficient worker nodes.), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 1/2 of replicas are available)
insights                                   4.11.0-rc.1   True        False         False      2d8h
kube-apiserver                             4.11.0-rc.1   True        False         False      2d8h
kube-controller-manager                    4.11.0-rc.1   True        False         False      2d8h
kube-scheduler                             4.11.0-rc.1   True        False         False      2d8h
kube-storage-version-migrator              4.11.0-rc.1   True        False         False      10h
machine-api                                4.11.0-rc.1   True        False         False      2d8h
machine-approver                           4.11.0-rc.1   True        False         False      2d8h
machine-config                             4.11.0-rc.0   True        True          True       9h      Unable to apply 4.11.0-rc.1: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 2, ready 0, updated: 0, unavailable: 1)]
marketplace                                4.11.0-rc.1   True        False         False      2d8h
monitoring                                 4.11.0-rc.1   False       True          True       9h      Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
network                                    4.11.0-rc.1   True        False         False      2d8h
node-tuning                                4.11.0-rc.1   True        False         False      10h
openshift-apiserver                        4.11.0-rc.1   True        False         True       2d8h    APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
openshift-controller-manager               4.11.0-rc.1   True        False         False      2d8h
openshift-samples                          4.11.0-rc.1   True        False         False      10h
operator-lifecycle-manager                 4.11.0-rc.1   True        False         False      2d8h
operator-lifecycle-manager-catalog         4.11.0-rc.1   True        False         False      2d8h
operator-lifecycle-manager-packageserver   4.11.0-rc.1   True        False         False      30h
service-ca                                 4.11.0-rc.1   True        False         False      2d8h
storage                                    4.11.0-rc.1   True        False         False



# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-679e46b83af12a9f90a61c64f086cbc6   False     True       True       3              0                   0                     1                      2d8h
worker   rendered-worker-ce0f663b8f5c37a8bde873ecbb8311e8   False     True       True       2              0                   0                     1                      2d8h


# oc get mcp master -oyaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: "2022-07-04T08:45:58Z"
  generation: 7
  labels:
    machineconfiguration.openshift.io/mco-built-in: ""
    operator.machineconfiguration.openshift.io/required-for-upgrade: ""
    pools.operator.machineconfiguration.openshift.io/master: ""
  name: master
  resourceVersion: "1360931"
  uid: 7ffe8daa-08de-4abb-b38d-55db9a5adee8
spec:
  configuration:
    name: rendered-master-c75d108f5b1bd9301a67a1fbd8be19cc
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-chrony-configuration
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-generated-crio-seccomp-use-default
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-generated-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: master
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master: ""
  paused: false
status:
  conditions:
  - lastTransitionTime: "2022-07-04T08:47:31Z"
    message: ""
    reason: ""
    status: "False"
    type: RenderDegraded
  - lastTransitionTime: "2022-07-06T07:09:12Z"
    message: ""
    reason: ""
    status: "False"
    type: Updated
  - lastTransitionTime: "2022-07-06T07:09:12Z"
    message: All nodes are updating to rendered-master-c75d108f5b1bd9301a67a1fbd8be19cc
    reason: ""
    status: "True"
    type: Updating
  - lastTransitionTime: "2022-07-06T07:16:11Z"
    message: 'Node master-0 is reporting: "unexpected on-disk state validating against
      rendered-master-c75d108f5b1bd9301a67a1fbd8be19cc: expected target osImageURL
      \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e657044831a9bd296b20d37b693698adb1d7eb2d3cd9090db7724779ecbf608\",
      have \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba\""'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded
  - lastTransitionTime: "2022-07-06T07:16:11Z"
    message: ""
    reason: ""
    status: "True"
    type: Degraded
  configuration:
    name: rendered-master-679e46b83af12a9f90a61c64f086cbc6
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-chrony-configuration
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-generated-crio-seccomp-use-default
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-generated-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
  degradedMachineCount: 1
  machineCount: 3
  observedGeneration: 7
  readyMachineCount: 0
  unavailableMachineCount: 1
  updatedMachineCount: 0
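The Degraded/NodeDegraded entries above follow the standard Kubernetes condition pattern (type/status/reason/message). A minimal sketch for pulling out the active degraded conditions (a hypothetical helper using an inline dict; a real script would load `oc get mcp master -ojson` instead):

```python
# Sketch: find degraded conditions in a MachineConfigPool status, mirroring
# the `oc get mcp master -oyaml` output above. Hypothetical helper.

def degraded_conditions(status: dict) -> list[dict]:
    return [
        c for c in status.get("conditions", [])
        if c.get("type") in ("Degraded", "NodeDegraded") and c.get("status") == "True"
    ]

status = {
    "conditions": [
        {"type": "Updated", "status": "False", "message": ""},
        {"type": "Updating", "status": "True",
         "message": "All nodes are updating to rendered-master-c75d108f..."},
        {"type": "NodeDegraded", "status": "True",
         "reason": "1 nodes are reporting degraded status on sync",
         "message": "Node master-0 is reporting: unexpected on-disk state ..."},
        {"type": "Degraded", "status": "True", "message": ""},
    ],
}

for cond in degraded_conditions(status):
    print(cond["type"])  # NodeDegraded, then Degraded
```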


On the master node that shows SchedulingDisabled, `rpm-ostree status` reports "No space left on device":

sh-4.4# rpm-ostree status
State: idle
Warning: failed to finalize previous deployment
         error: Installing kernel: regfile copy: No space left on device
         check `journalctl -b -1 -u ostree-finalize-staged.service`
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202206280018-0 (2022-06-28T00:25:24Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:737dda6647e7925bbe4fb5e4b8cddebedcf83c8bc233fee38bb4c5cdd97a1cd5
              CustomOrigin: Managed by machine-config-operator
                   Version: 410.84.202206240419-0 (2022-06-24T04:25:30Z)


# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       120G   29G   92G  24% /
tmpfs           8.0G     0  8.0G   0% /sys/fs/cgroup
devtmpfs        7.9G     0  7.9G   0% /dev
tmpfs           8.0G  128K  8.0G   1% /dev/shm
tmpfs           8.0G   93M  7.9G   2% /run
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/7649fea8a9f22c2fa4d7d454380c261bca15a34b3b2c75e8503919cdb65755ae/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/7dbcf0b1e7c0725f91eed644fbfa3f30c4d41165ccb43d560faf995da79b1451/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/9b39f3ef0687628c5f7d6639a18b38147f31013e455ab22b68043418c2e953c3/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/c7b6f9393e55d8270550ab8a4ccb2e42a4784623e1d2a0901f30d662a8291a5f/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/23dcaaa0cf8459be2034ef76a658782b51da69d2979c6e06f96e534dc7fce55d/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/d106800bf4ea805fe5df5d2dc6a1416f17b983d1eac08d6ca41addba72a5f2b6/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/6fd657ea28e3335de406979dd4e052f3ceef5bf62e50073fb51331dc0cd9f858/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/c0127e0d16e86b97f802921c336b2549246f967b03ce642b62e00e14dfbdcddb/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/13248f323be7115007c5797fc79374d8383fb2c824bee4e10879ed89979ae325/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/ae24555c8ac30a5689a6b2a12e2a51a69b40bddb7b69267747c194cffcc42104/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/072999891e0af7625c4a69456ad28003d94214a8929c6d631cd09062595e98e4/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/6c502811e9dc0ed12c3768964a5d5bd5a06d673265dccfc972ecc22dea816047/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/9c875c2647fb15f438654895d77b297e40a4baa4f53278499c8a12de9e5d0071/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/2067db47b62476ce457bfcb0f2b36b944d9191991fed5efa225bf2581c5e92b1/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/3f33deb9494424a0d0653dbd171243785a65a535a36960ee92924c27813a670b/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/6d68b8669f7c956a1c399e315b931087c0a306d6875d4a3f54eeece93dee90b8/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/d613e523b5ae6ce901224f4c018d8691e1a0b196f097d078d96dc5c611f64dac/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/a9db656e85403abf52c9c8b940621d92ae1c0291355cea50beb9762dc56f212e/userdata/shm
shm              64M     0   64M   0% /run/containers/storage/overlay-containers/7849996a3ba2e0c582f1d561638bb6e79d4d7b04a109ef7893d4653a51d5ab9d/userdata/shm
shm              64M     0   64M   0% /var/lib/containers/storage/overlay/58c70b88804fe53bb3f6abd09068d797d60b1de50718c8c9a7317336080f4cd9/merged/dev/shm
tmpfs           8.0G   64K  8.0G   1% /tmp
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/74cbd66f09a019daa6b380fad95e1cf137588f9427fa7a09e609c17cb4913d5a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/a2bf51318544322cdab4dbdec432c6a44fdbfe1941521ded6e42de8f20dcc9a5/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/0d6a12c4cedd397ba548756a3394ea3fa0a8d92e293c74ac894a69bd255bfd8a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/207d6350ff9cddc1bc927d7246ccf1cdf8d4d2a5da8d4e7fda40fc7f5c586eb1/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/35fa96660585c22869604bc2adfe7ba5dae5e98264248a22618ba3a77b9b01c8/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/c78ba99d0466bbb0fdc527097516bc3afa394b976d1af2bf98f6a1a00f0d36d5/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/44e54c4bdc8256882dc6c552ff9bcd13b14c8c9137d5735685c877bae0b16898/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/9246569565c2f6853fbae1459e0bdfde24dc70ee705f2b7959ca3a4d580f8614/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/d1784f9508fd316e038f3de86a3a0d94671710978b442f7320a6dba35c59d097/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/7d49aa9d690330efbca62b29382d8b98ac37c212ba246991f919a8f7122a8f9a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/810a309f9c28435dccf28b6bbf2e0b478987da168f302d5c38ffc63cec0cca1d/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/4ebd116b4fd0c1b6dcfee481ee99e79e6f58363a3339ba8a6e9856afc91ada1d/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/87c0507e33fdf3aee4bf360e1dfd6e8cac8d47c4366c415c8cb5223e15818cda/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/36e44bf7bd05c1f29d09f6b5631813113d5d7c7ba9b1974fd1babf18682e0ce7/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/6d64637eaa711cb4adccd09d798176088e2b5c5c350216bdb81913ef28c5a26a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/1e42f32a853da1d1aa69064d87e1ba80a30512d101b60c8032e954809a50e2e3/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/38bcadb922b144e48d3c2468d4d907b1003271d7827ecde6d0152642a7aff9dd/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/8860a2acce06c63616753f70297dc6f7a535dc4b057619b2f9c3d54306e28f11/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/219f50bd78b55073f67c07d5c42f61504b694baae4c4b2a71729a561ae7dc754/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/76a57b9167e99079d28d8cbd1d4b6c272dbc739dec5d14aa35d4f9595eb56fe7/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/db5c1db06cacb60a21d7a837b1b2dd95ce649b01c0fa815a74784d7a2770ab4b/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/617ca0bf2b66921ab96c449b36cd5795993f699148b39bfc21c8c8b0d7182463/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/d7825f691375e46d90e8a4f8f274d991714851f1b28de8adda3509d6cae5582e/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/7fcad1d9aedd38ce29648ed42ceb2700af56269f00f48a1e1075fb5a6ae791d4/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/a5bba405f1ade715b1c1c262d8a9fc1996260bb2a57f72d591105195ff053ff0/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/b597c10fa290a32262f1a234f0c3459006db360dd59fffcb8ad018e1e13481d1/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/5cd0896a0cea72fe0d9d8f5e4541ac85d04c0bc7223458c6e0ecd28b429af92b/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/e113aef1a3925c4addd4f4dee2c9c4c869d13d1f4a9254abdbe9f75d73200761/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/cccf8df999427e00b0cfe9251eb42c480c82934e5c9a067fca7fc598a5fcc65a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/6a09208e93c9640981d81df25adf39e3e3aa1442c40829bb436a0d449b8e6789/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/c28a84fa2dd2d7e1b316088575a9ba95091106b3ae2fec81f1011b42b4988d76/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/d49a521d159caae9e35c5cd61027c37cedbe31d90773c353444023964fccbddc/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/2628c2ddb7cada7f45414ceec1ac7f6d838533988bd84d97bc06b280677ea90a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/2381711c2666ebce5e0d843d048b4e72451405b9938666f21f51eb782a62dd9b/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/1651cd13c13c71211bd8e1bd00b03e524184f7cd70a342d11842546c7b41a316/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/008b9c6a387186aefacb6e66efb0b62ad2522ee3e82fff13f1154743589b17ec/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/6fcf532d999fe7a8c5f650fb1732f49c1fd05d32860dbb863dc238b5efdb71df/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/0a60f82a8e4302128e9649673f5a2960bf096e4e9c1ba75684d0decefa17474e/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/1d39309225432518d77f9d88350c386a9a588f883aaefd53d82a6f47af23f3cc/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/862d52a13a97bda79f6b0034466d6fd57860dfea9491813c86faef825d5ecd6e/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/a61929c80480b2b73cd9ec4473ebf2a7bbc1679f0210b38e1c73e8156460cd9a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/5930d82feba22075c77a790431fef5e6b328d505b5dcf4722203e61cb9c08c7e/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/699a26ddb9cc963785865db12ec43df8300d9fa163de5eee2e0acfa7ef01c757/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/e4ccccb65ad4582606f0f2f3c47a0e171cad51f635aef7be1ff7d9ec1607efde/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/0f7459e495f8a413bdeca09c4f0b308c027e99233b65ff2ca1c4cc00173d4d42/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/2e98e461a51e752ea04fa6a4a05fb90f83a3a3b1b0d098138d5716890d4b8c9e/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/6e802c011f43f0365f05bdccfa3ee63f1be9aa18bcb6cf8e72e820d7be26e53c/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/570e86287b39f0243fc3e2f4f5d5de2ae9928f980690c00b7063d738bc5feb7b/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/ed6379bc5468fc0e5cd99c5d2a66d9db90d54a5d0d30627aca51845be950a6b8/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/390c28bc760426445be7e1134fac7752cff70ac131b8a8174931051b6f3ee28c/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/e39468566dc0728b058ff80692671c936636dc1d928bf1d2c2576365dcf8602e/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/94d7dbf34119f65b24d8f79cb09218ba5a14ae03ddcd4609ea00c2f1f8ec34b3/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/99671df43387654349867ad680bbd4e089430d2830fa568bb3d91d2c24deb645/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/a19026519e1fe43740b0a3aa08bd927f9d85293ff33461a61d2e83573feb5e4c/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/40e29e08a6714713ea91ab9bac89f16700d9a97dc4285b6d06e6e20e55bc9033/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/748afa3ae2a928d646c3b8168f9e210f4db662b1a344f2998a49b6b0367d7c3f/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/f12a7cbe53d3e3c439a4137e2307032c77108d527fd7affd4de38c109ac07b12/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/58af3dbb8abd5aaaa9cce160da06a1193deb8cf7b0c01992255b1a301d348675/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/bdfbd69d4745a66d8990598d97d90b8beee3bd22f90290514b3be036df5bbcf9/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/b42ae7e7e33f5ddc9700806f4aa45870984ee60d3efe834f4744017b8b934234/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/c3d0d89cea7e5429e9088a8cf4789b516841837771a1c08b7400204cd1ea0829/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/a37d1b59abe259b5cfe29dc00d3af0bd4d256eb01fa917573c1210272551a299/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/9e0601986463bdf98ed4c811c5021324330d18380e4f2b68f58e8cd2190e6adf/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/db5bbea59e6a897a0d7afb9fdb1dee3d8a9444c82b731d43f5a9511528947921/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/b75af4b99cd974864d0794f2c94f0068d0229836df659c38f36630187ad463a1/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/2810ba50f210316dbf2ec97bbc4a1b1ccfd77c2b17c7a0f8b8637ffed365e582/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/63bcdf65d9e066ea62a4068c739c715f4556a344cbb8e327670ac6470c867d2c/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/9a056c95c186e8a86c5a1d54cf38d6ee82bf150cf71711398d9950ae63e965b2/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/10216edeb11cd9086ce89ccff3b94e5c651b8ff7928b2bed06e2735e75e0d790/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/af8804ab535ee457b8b2d97e085b3e6b65278c9fa896045ffed398047f4b647a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/7b581a408892eb681cbc05c8d480cdc78a8bffe628f30efa2ce8b66909b520e3/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/d1792accc713b7f3c5e47aa77b13b4b385947a4349765cbd7f5afccd39b7d707/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/33e2a26543733f65c7eb5b3e7f8c43dac6471a8976ea9a36d4de673ddf0b74ba/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/555aab703ec997f75d71738cf30587b42fb99890d001d217fcefd3a8ff27fa80/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/4543abaef27655d54e25b08d867075f3bfa164137d130f94e9593aa32b3a7833/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/68eb567e7678693b66ff1d378bfc065fb8f50958b0158df9e06fa3526aed6d4a/merged
overlay         120G   29G   92G  24% /var/lib/containers/storage/overlay/58c70b88804fe53bb3f6abd09068d797d60b1de50718c8c9a7317336080f4cd9/merged
tmpfs            64M     0   64M   0% /var/lib/containers/storage/overlay/58c70b88804fe53bb3f6abd09068d797d60b1de50718c8c9a7317336080f4cd9/merged/dev
tmpfs           8.0G     0  8.0G   0% /var/lib/containers/storage/overlay/58c70b88804fe53bb3f6abd09068d797d60b1de50718c8c9a7317336080f4cd9/merged/sys/fs/cgroup
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/45651ff9-3f73-4925-b696-8255c59bf8c2/volumes/kubernetes.io~secret/node-bootstrap-token
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/45651ff9-3f73-4925-b696-8255c59bf8c2/volumes/kubernetes.io~secret/certs
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/af9ccda7-a5f1-47fe-b359-3bab5d9a7b3f/volumes/kubernetes.io~projected/kube-api-access-92wl5
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/45651ff9-3f73-4925-b696-8255c59bf8c2/volumes/kubernetes.io~projected/kube-api-access-t5tk9
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/6145dbf0-605f-4380-9358-5f7bf971f25e/volumes/kubernetes.io~projected/kube-api-access-vclcj
tmpfs            15G   64K   15G   1% /var/lib/kubelet/pods/6145dbf0-605f-4380-9358-5f7bf971f25e/volumes/kubernetes.io~secret/node-exporter-kube-rbac-proxy-config
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/6145dbf0-605f-4380-9358-5f7bf971f25e/volumes/kubernetes.io~secret/node-exporter-tls
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/0d0e1319-1f01-4423-9541-90d95791a8d0/volumes/kubernetes.io~secret/sdn-controller-metrics-certs
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/5a5b814d-7d1e-49e4-8007-5b655371fc1c/volumes/kubernetes.io~secret/metrics-tls
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/58cf6b33-d5f1-496e-bd8b-fbf49c4aac73/volumes/kubernetes.io~secret/serving-cert
tmpfs            15G   64K   15G   1% /var/lib/kubelet/pods/c7581191-e489-488b-9e17-8e148a024820/volumes/kubernetes.io~secret/cookie-secret
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/4d231826-0fe8-4bab-91a4-9380094b92e0/volumes/kubernetes.io~secret/webhook-certs
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/02c26524-bbc7-44c2-bfb0-6187b4f2abe0/volumes/kubernetes.io~secret/metrics-certs
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/c7581191-e489-488b-9e17-8e148a024820/volumes/kubernetes.io~secret/proxy-tls
tmpfs            15G  128K   15G   1% /var/lib/kubelet/pods/91517de1-83fd-4bc1-9329-91b436f18088/volumes/kubernetes.io~secret/sdn-metrics-certs
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/02c26524-bbc7-44c2-bfb0-6187b4f2abe0/volumes/kubernetes.io~projected/kube-api-access-v82nt
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/4d231826-0fe8-4bab-91a4-9380094b92e0/volumes/kubernetes.io~projected/kube-api-access-skhct
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/c7581191-e489-488b-9e17-8e148a024820/volumes/kubernetes.io~projected/kube-api-access-kg4xg
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/fbe012b0-d7d6-4647-a29d-17e832cdbcec/volumes/kubernetes.io~projected/kube-api-access-j5hbr
tmpfs           1.0G  256K  1.0G   1% /var/lib/kubelet/pods/996dbdf7-9f61-4d1d-82ed-44514836a980/volumes/kubernetes.io~projected/kube-api-access-5rlll
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/aa36428d-9ec2-4601-a993-f7e34a335d7e/volumes/kubernetes.io~projected/kube-api-access-qjxxj
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/91517de1-83fd-4bc1-9329-91b436f18088/volumes/kubernetes.io~projected/kube-api-access-q54ch
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/a876c756-76ac-460b-915e-43b23f5fe01e/volumes/kubernetes.io~projected/kube-api-access-qw5qz
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/df98d7d5-1ee5-4a73-8f46-cb7851b59c6a/volumes/kubernetes.io~projected/kube-api-access-xmqbr
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/0d0e1319-1f01-4423-9541-90d95791a8d0/volumes/kubernetes.io~projected/kube-api-access-kbs75
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/58cf6b33-d5f1-496e-bd8b-fbf49c4aac73/volumes/kubernetes.io~projected/kube-api-access-4wdhz
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/e49617fc-8b23-40d3-9699-6a32e460d993/volumes/kubernetes.io~projected/kube-api-access-jn2bz
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/5a5b814d-7d1e-49e4-8007-5b655371fc1c/volumes/kubernetes.io~projected/kube-api-access-kpnkr
tmpfs            15G  256K   15G   1% /var/lib/kubelet/pods/2a00c1c3-2818-4fd7-8040-7810d495bdf6/volumes/kubernetes.io~projected/kube-api-access-lkrmz
/dev/sda3       364M  277M   65M  82% /boot


sh-4.4# ls -lart /boot/*
lrwxrwxrwx. 1 root root     1 Oct  8  2021 /boot/boot -> .
lrwxrwxrwx. 1 root root     8 Jul  5 11:40 /boot/loader -> loader.0

/boot/lost+found:
total 14
drwx------. 2 root root 12288 Oct  8  2021 .
drwxr-xr-x. 7 root root  1024 Jul  6 07:13 ..

/boot/grub2:
total 185
drwxr-xr-x. 2 root root   8192 Oct  8  2021 powerpc-ieee1275
drwxr-xr-x. 2 root root   1024 Oct  8  2021 locale
-rw-r--r--. 1 root root   1024 Oct  8  2021 grubenv
-rwxr-xr-x. 1 root root   2625 Oct  8  2021 grub.cfg
-rw-r--r--. 1 root root 164668 Oct  8  2021 grub
drwxr-xr-x. 2 root root   1024 Oct  8  2021 fonts
drwxr-xr-x. 5 root root   1024 Oct  8  2021 .
drwxr-xr-x. 7 root root   1024 Jul  6 07:13 ..

/boot/loader.0:
total 6
drwxr-xr-x. 3 root root 1024 Jul  5 11:40 .
drwxr-xr-x. 2 root root 1024 Jul  5 11:40 entries
drwxr-xr-x. 7 root root 1024 Jul  6 07:13 ..

/boot/ostree:
total 10
drwxr-xr-x. 2 root root 1024 Jul  4 14:44 rhcos-0bfcbbf7c37c79bff574f3829112a64ecee6c084597b1d93e90bc00f07954a73
drwxr-xr-x. 2 root root 1024 Jul  5 11:40 rhcos-952c9677e944c268239aec2429e421d2a4fd4c751550a1c5f74d83bd2c5ddb1f
drwxr-xr-x. 2 root root 1024 Jul  6 07:13 rhcos-1d101bb28fbd71daf2b6ffb8cf1261576cfd670cec4604a91b20a713523370a2
drwxr-xr-x. 7 root root 1024 Jul  6 07:13 ..
drwxr-xr-x. 5 root root 1024 Jul  6 07:13 .

/boot/loader.1:
total 6
drwxr-xr-x. 2 root root 1024 Jul  6 07:13 entries
drwxr-xr-x. 7 root root 1024 Jul  6 07:13 ..
drwxr-xr-x. 3 root root 1024 Jul  6 07:13 .

Must gather logs: https://drive.google.com/file/d/1vLMStKyA6z1yyRYwOV9eUaPU53rVo8Pw/view?usp=sharing

Comment 1 pdsilva 2022-07-06 17:49:02 UTC
Also adding the output of journalctl -b -1 -u ostree-finalize-staged.service

sh-4.4# journalctl -b -1 -u ostree-finalize-staged.service
-- Logs begin at Mon 2022-07-04 08:40:42 UTC, end at Wed 2022-07-06 17:47:26 UTC. --
Jul 06 07:12:22 master-0 systemd[1]: Started OSTree Finalize Staged Deployment.
Jul 06 07:13:50 master-0 systemd[1]: Stopping OSTree Finalize Staged Deployment...
Jul 06 07:13:50 master-0 ostree[1256867]: Finalizing staged deployment
Jul 06 07:13:56 master-0 ostree[1256867]: Copying /etc changes: 15 modified, 0 removed, 1471 added
Jul 06 07:13:56 master-0 ostree[1256867]: Copying /etc changes: 15 modified, 0 removed, 1471 added
Jul 06 07:13:59 master-0 ostree[1256867]: error: Installing kernel: regfile copy: No space left on device
Jul 06 07:13:59 master-0 systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited status=1
Jul 06 07:13:59 master-0 systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'.
Jul 06 07:13:59 master-0 systemd[1]: Stopped OSTree Finalize Staged Deployment.
Jul 06 07:13:59 master-0 systemd[1]: ostree-finalize-staged.service: Consumed 2.164s CPU time

Comment 2 Prashanth Sundararaman 2022-07-06 18:12:08 UTC
Preparing the kernel and copying it into /boot/ostree is failing because space is exhausted in /boot:

/dev/sda3       364M  277M   65M  82% /boot

Moving to the CoreOS team for further investigation.

Comment 3 Joseph Marrero 2022-07-06 20:06:06 UTC
Looks like a real issue with /boot running out of space.

rpm-ostree cleanup -rp

would clean up the pending and rollback deployments and free up space. Running this took me from 28% to 18% usage in my environment.

I would guess that if you run an upgrade directly from 4.11.0-rc.0 -> 4.11.0-rc.1, this would not happen because there is enough space on the device?
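A minimal sketch of how one might gate this workaround on /boot usage. The 80% threshold and the parsing are illustrative, not from the bug; the sample df line is copied from master-0 above.

```shell
# Sketch: decide whether to free space before an upgrade.
# `rpm-ostree cleanup -rp` removes pending (-p) and rollback (-r) deployments,
# which deletes their kernel/initramfs sets from /boot.
# Sample line taken from master-0 above; on a live node use: df -h /boot
df_line="/dev/sda3       364M  277M   65M  82% /boot"
usage=$(echo "$df_line" | awk '{sub("%", "", $5); print $5}')
if [ "$usage" -ge 80 ]; then
  echo "boot at ${usage}% - consider: rpm-ostree cleanup -rp"
fi
```

On the failing master-0 this would trip at 82%; the healthy master-1 (72%) would pass.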

Comment 4 Prashanth Sundararaman 2022-07-06 20:59:53 UTC
The default size of the boot partition is 384M (https://github.com/coreos/coreos-assembler/blob/main/src/create_disk.sh) - should that be increased?

Comment 5 Timothée Ravier 2022-07-07 10:27:40 UTC
We've moved the discussion about the size of the boot partition to https://github.com/coreos/fedora-coreos-tracker/issues/1247.
However, there are 3 boot entries here when, if I'm not mistaken, there should only be two. Could we get the full journal log for the node?

Comment 6 pdsilva 2022-07-07 11:26:27 UTC
Master-0 journalctl log: https://drive.google.com/file/d/1FyftBN9WWzYjtBlACYQLpAnRows-FeWW/view?usp=sharing


On master-1 which does not have any error, there are only 2 entries in /boot/ostree

master-1:
sh-4.4# df -h | grep /boot
/dev/sdb3       364M  243M   99M  72% /boot


sh-4.4# ls -lart /boot/*
lrwxrwxrwx. 1 root root     1 Oct  8  2021 /boot/boot -> .
lrwxrwxrwx. 1 root root     8 Jul  5 11:49 /boot/loader -> loader.0

/boot/lost+found:
total 14
drwx------. 2 root root 12288 Oct  8  2021 .
drwxr-xr-x. 6 root root  1024 Jul  5 11:49 ..

/boot/grub2:
total 185
drwxr-xr-x. 2 root root   8192 Oct  8  2021 powerpc-ieee1275
drwxr-xr-x. 2 root root   1024 Oct  8  2021 locale
-rw-r--r--. 1 root root   1024 Oct  8  2021 grubenv
-rwxr-xr-x. 1 root root   2625 Oct  8  2021 grub.cfg
-rw-r--r--. 1 root root 164668 Oct  8  2021 grub
drwxr-xr-x. 2 root root   1024 Oct  8  2021 fonts
drwxr-xr-x. 5 root root   1024 Oct  8  2021 .
drwxr-xr-x. 6 root root   1024 Jul  5 11:49 ..

/boot/loader.0:
total 6
drwxr-xr-x. 3 root root 1024 Jul  5 11:49 .
drwxr-xr-x. 2 root root 1024 Jul  5 11:49 entries
drwxr-xr-x. 6 root root 1024 Jul  5 11:49 ..

/boot/ostree:
total 8
drwxr-xr-x. 2 root root 1024 Jul  4 14:53 rhcos-0bfcbbf7c37c79bff574f3829112a64ecee6c084597b1d93e90bc00f07954a73
drwxr-xr-x. 2 root root 1024 Jul  5 11:49 rhcos-952c9677e944c268239aec2429e421d2a4fd4c751550a1c5f74d83bd2c5ddb1f
drwxr-xr-x. 6 root root 1024 Jul  5 11:49 ..
drwxr-xr-x. 4 root root 1024 Jul  5 11:50 .

Comment 7 Joseph Marrero 2022-07-07 13:19:17 UTC
Yeah, the default behavior is to have two boot entries. I can't replicate getting 3 on my system without pinning a deployment, but no deployment is pinned here. I wonder if an error during one of the upgrades prevented one of them from being cleaned up. I am checking the logs for any indication of this.

Comment 8 Colin Walters 2022-07-07 18:24:41 UTC
ostree doesn't clean up until *after* a new upgrade is complete, so transiently we will have 3.

Another way to say this: we always keep two deployments by default, which we implement by temporarily having 3. We don't remove the rollback before trying to pull the new version.
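A sketch of that retention behavior: two deployments are kept in steady state, and the new one is staged before the old rollback is pruned, so /boot transiently holds three kernel/initramfs sets. The deployment names here are illustrative placeholders.

```shell
# Steady state: the booted deployment plus one rollback.
deployments="rollback booted"
echo "steady state:    $deployments"

# Upgrade: the new deployment is staged first -> peak /boot usage (3 sets).
deployments="$deployments staged"
echo "during upgrade:  $deployments"

# Only after finalize succeeds is the oldest deployment pruned.
set -- $deployments
shift
deployments="$*"
echo "after finalize:  $deployments"
```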

Comment 9 pdsilva 2022-07-07 19:20:47 UTC
This time I did a direct deployment of 4.11.0-rc.0 and could reproduce this issue while doing an upgrade to 4.11.0-rc.1

=========================BEFORE UPGRADE===================================
# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.0   True        False         12m     Cluster version is 4.11.0-rc.0

# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-e40f9114a24d4f2d98b9b9581bcd07c5   True      False      False      3              3                   3                     0                      136m
worker   rendered-worker-7cc9dba1f9fcc7271b0f999e2284ea1e   True      False      False      3              3                   3                     0                      136m

On one of the masters:
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202206280018-0 (2022-06-28T00:25:24Z)

  02573b714e58cac3fce9c9d97f8d10227de38dd9c46d197b47619afd0a7ad57d
                   Version: 411.85.202203090210-0 (2022-03-09T02:17:14Z)

sh-4.4# df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdg3       362M  243M   96M  72% /boot

/boot/ostree:
total 8
drwxr-xr-x. 2 root root 1024 Mar  9 02:20 rhcos-7022fef0ce6fe2dbc74dc84855f917bc677928b7d1f75ac0eb422b3d52472e29
drwxr-xr-x. 2 root root 1024 Jul  7 15:21 rhcos-952c9677e944c268239aec2429e421d2a4fd4c751550a1c5f74d83bd2c5ddb1f
drwxr-xr-x. 4 root root 1024 Jul  7 15:21 .
drwxr-xr-x. 6 root root 1024 Jul  7 15:21 ..


============================================AFTER UPGRADE========================================
# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.0   True        True          106m    Unable to apply 4.11.0-rc.1: an unknown error has occurred: MultipleErrors

# oc get nodes
NAME                                               STATUS                     ROLES    AGE     VERSION
tor01-master-0.rdr-ocp-pravin-upi-707.redhat.com   Ready                      master   3h42m   v1.24.0+9ddc8b1
tor01-master-1.rdr-ocp-pravin-upi-707.redhat.com   Ready                      master   3h36m   v1.24.0+9ddc8b1
tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com   Ready,SchedulingDisabled   master   3h43m   v1.24.0+9ddc8b1
tor01-worker-0.rdr-ocp-pravin-upi-707.redhat.com   Ready,SchedulingDisabled   worker   3h6m    v1.24.0+9ddc8b1
tor01-worker-1.rdr-ocp-pravin-upi-707.redhat.com   Ready                      worker   3h3m    v1.24.0+9ddc8b1
tor01-worker-2.rdr-ocp-pravin-upi-707.redhat.com   Ready                      worker   3h3m    v1.24.0+9ddc8b1

# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-e40f9114a24d4f2d98b9b9581bcd07c5   False     True       True       3              0                   0                     1                      3h39m
worker   rendered-worker-7cc9dba1f9fcc7271b0f999e2284ea1e   False     True       True       3              0                   0                     1                      3h39m

    message: 'Node tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com is reporting:
      "unexpected on-disk state validating against rendered-master-2d9c1ab445025b7d3ff2bdffd1875f7b:
      expected target osImageURL \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e657044831a9bd296b20d37b693698adb1d7eb2d3cd9090db7724779ecbf608\",
      have \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba\""'
    reason: 1 nodes are reporting degraded status on sync

on master-2:

sh-4.4# rpm-ostree status
State: idle
Warning: failed to finalize previous deployment
         error: Installing kernel: regfile copy: No space left on device
         check `journalctl -b -1 -u ostree-finalize-staged.service`
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202206280018-0 (2022-06-28T00:25:24Z)

  02573b714e58cac3fce9c9d97f8d10227de38dd9c46d197b47619afd0a7ad57d
                   Version: 411.85.202203090210-0 (2022-03-09T02:17:14Z)

sh-4.4# df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdh3       362M  277M   62M  82% /boot

sh-4.4# ls -lart /boot/ostree/
total 10
drwxr-xr-x. 2 root root 1024 Mar  9 02:20 rhcos-7022fef0ce6fe2dbc74dc84855f917bc677928b7d1f75ac0eb422b3d52472e29
drwxr-xr-x. 2 root root 1024 Jul  7 15:20 rhcos-952c9677e944c268239aec2429e421d2a4fd4c751550a1c5f74d83bd2c5ddb1f
drwxr-xr-x. 7 root root 1024 Jul  7 18:13 ..
drwxr-xr-x. 2 root root 1024 Jul  7 18:13 rhcos-1d101bb28fbd71daf2b6ffb8cf1261576cfd670cec4604a91b20a713523370a2
drwxr-xr-x. 5 root root 1024 Jul  7 18:13 .



Journalctl of failing master-2 node: https://drive.google.com/file/d/1jkptdv1eDF9mA1toeovKYQw4RS2UzAR3/view?usp=sharing
Must gather logs: https://drive.google.com/file/d/1h_2i8SnNQWLV00jZBZAU1daB9XxbUhkv/view?usp=sharing

Comment 10 Joseph Marrero 2022-07-07 19:38:23 UTC
Interesting that before the upgrade you already have two boot entries and /boot is at 72%. I'm looking at the new logs to see if there is anything else I can identify there. I have been grepping for "ostree/rhcos-" and "Resolved OSTree target to" to try to see what is going wrong. Nothing jumps out at me yet.

Comment 11 Joseph Marrero 2022-07-07 20:48:57 UTC
On the Journalctl of failing master-2 node journalctl-master-2-rc0-rc1.log

I see an rhcos update happen here:

Jul 07 15:20:03 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com machine-config-daemon[2859]: I0707 15:20:03.786114    2859 rpm-ostree.go:296] Executing rebase from repo path /run/mco-machine-os-content/os-content-1715853007/srv/repo with customImageURL pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b8a768d5b354263cab962106cc7faa391014acfd4897356de625667ebac5ba and checksum 22d10ba43839f4f4f5f7ab51cb112a1b96448a3371e0adfe592201e8f78fe332


The kernel upgrade:
Jul 07 15:20:14 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com machine-config-daemon[2859]:   kernel 4.18.0-348.12.2.el8_5 -> 4.18.0-372.9.1.el8

Followed by a first deployment that is successful here that adds a new boot entry:
```
Jul 07 15:20:15 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[3803]: Finalizing staged deployment
Jul 07 15:20:15 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com kernel: EXT4-fs (sdh3): re-mounted. Opts: 
Jul 07 15:20:17 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[3803]: Copying /etc changes: 14 modified, 0 removed, 110 added
Jul 07 15:20:17 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[3803]: Copying /etc changes: 14 modified, 0 removed, 110 added
Jul 07 15:20:18 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[3803]: Bootloader updated; bootconfig swap: yes; bootversion: boot.0.1, deployment count change: 1
Jul 07 15:20:18 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[3803]: Bootloader updated; bootconfig swap: yes; bootversion: boot.0.1, deployment count change: 1
Jul 07 15:20:18 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com systemd[1]: ostree-finalize-staged.service: Succeeded.
```

Then I see a second deployment that fails here:
```
Jul 07 18:13:36 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[335838]: Finalizing staged deployment
Jul 07 18:13:38 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[335838]: Copying /etc changes: 15 modified, 0 removed, 1431 added
Jul 07 18:13:38 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[335838]: Copying /etc changes: 15 modified, 0 removed, 1431 added
Jul 07 18:13:39 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com ostree[335838]: error: Installing kernel: regfile copy: No space left on device
Jul 07 18:13:39 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited status=1
Jul 07 18:13:39 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'.
Jul 07 18:13:39 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com systemd[1]: Stopped OSTree Finalize Staged Deployment.
Jul 07 18:13:39 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com systemd[1]: ostree-finalize-staged.service: Consumed 1.867s CPU time
Jul 07 18:13:39 tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com systemd[1]: ostree-finalize-staged.path: Succeeded.
```

Still looking

Comment 12 Joseph Marrero 2022-07-08 11:56:21 UTC
I don't see any issues in the journal other than the failure we already know about, which happens because of the lack of space. The initial package updates finish successfully and look normal.
Unless one of the initramfs images is bigger than we expect, or something else has changed size significantly, the 2 deployments should be OK. The third one we see after the failure is expected, as Colin explained, but we should have enough space for that one too.

Can you please share the full sizes of the files in the directory?

To get the full picture you might need to be root.

for example:

➜  ~ sudo -i
[root@silverblue ~]# find /boot -type f | xargs du -b | sort -g
110	/boot/efi/EFI/fedora/BOOTX64.CSV
112	/boot/efi/EFI/fedora/BOOTIA32.CSV
144	/boot/efi/EFI/fedora/grub.cfg
161	/boot/ostree/fedora-c47d0c01d852ff47ad9699dfdd4348e0ae25b7ea639dfe4a8bdea1ebb0eeb6a1/.vmlinuz-5.17.9-300.fc36.x86_64.hmac
547	/boot/loader.1/entries/ostree-1-fedora.conf
547	/boot/loader.1/entries/ostree-2-fedora.conf
1024	/boot/grub2/grubenv
8206	/boot/loader.1/grub.cfg
61561	/boot/efi/EFI/fedora/fwupdx64.efi
68136	/boot/efi/EFI/BOOT/fbia32.efi
87152	/boot/efi/EFI/BOOT/fbx64.efi
676040	/boot/efi/EFI/fedora/mmia32.efi
740344	/boot/efi/EFI/BOOT/BOOTIA32.EFI
740344	/boot/efi/EFI/fedora/shimia32.efi
850032	/boot/efi/EFI/fedora/mmx64.efi
928592	/boot/efi/EFI/BOOT/BOOTX64.EFI
928592	/boot/efi/EFI/fedora/shim.efi
928592	/boot/efi/EFI/fedora/shimx64.efi
1639688	/boot/efi/EFI/fedora/grubia32.efi
2394108	/boot/grub2/fonts/unicode.pf2
2598152	/boot/efi/EFI/fedora/grubx64.efi
11802608	/boot/ostree/fedora-c47d0c01d852ff47ad9699dfdd4348e0ae25b7ea639dfe4a8bdea1ebb0eeb6a1/vmlinuz-5.17.9-300.fc36.x86_64
25039792	/boot/efi/EFI/fedora/fw/fwupd-55d04ffc-714a-4457-b982-d244343e1958.cap
71097429	/boot/ostree/fedora-c47d0c01d852ff47ad9699dfdd4348e0ae25b7ea639dfe4a8bdea1ebb0eeb6a1/initramfs-5.17.9-300.fc36.x86_64.img
[root@silverblue ~]#

Comment 13 pdsilva 2022-07-08 12:04:55 UTC
On master-2 = tor01-master-2.rdr-ocp-pravin-upi-707.redhat.com 

[root@tor01-master-2 core]#  find /boot -type f | xargs du -b | sort -g
12	/boot/grub2/powerpc-ieee1275/video.lst
17	/boot/grub2/powerpc-ieee1275/parttool.lst
37	/boot/.root_uuid
53	/boot/grub2/bootuuid.cfg
54	/boot/grub2/powerpc-ieee1275/terminal.lst
111	/boot/grub2/powerpc-ieee1275/partmap.lst
165	/boot/ostree/rhcos-952c9677e944c268239aec2429e421d2a4fd4c751550a1c5f74d83bd2c5ddb1f/.vmlinuz-4.18.0-372.9.1.el8.ppc64le.hmac
168	/boot/ostree/rhcos-7022fef0ce6fe2dbc74dc84855f917bc677928b7d1f75ac0eb422b3d52472e29/.vmlinuz-4.18.0-348.12.2.el8_5.ppc64le.hmac
219	/boot/grub2/powerpc-ieee1275/fs.lst
472	/boot/grub2/powerpc-ieee1275/all_video.mod
661	/boot/loader.0/entries/ostree-2-rhcos.conf
667	/boot/loader.0/entries/ostree-1-rhcos.conf
788	/boot/grub2/powerpc-ieee1275/setjmp.mod
936	/boot/grub2/powerpc-ieee1275/crypto.lst
1024	/boot/grub2/grubenv
1388	/boot/grub2/powerpc-ieee1275/pkcs1_v15.mod
1420	/boot/grub2/powerpc-ieee1275/hello.mod
1500	/boot/grub2/powerpc-ieee1275/div.mod
1556	/boot/grub2/powerpc-ieee1275/suspend.mod
1628	/boot/grub2/powerpc-ieee1275/halt.mod
1640	/boot/grub2/powerpc-ieee1275/reboot.mod
1656	/boot/grub2/powerpc-ieee1275/trig.mod
1712	/boot/grub2/powerpc-ieee1275/test_blockarg.mod
1732	/boot/grub2/powerpc-ieee1275/true.mod
1776	/boot/grub2/powerpc-ieee1275/exfctest.mod
1780	/boot/grub2/powerpc-ieee1275/pbkdf2.mod
1856	/boot/grub2/powerpc-ieee1275/adler32.mod
1896	/boot/grub2/powerpc-ieee1275/raid5rec.mod
2044	/boot/grub2/powerpc-ieee1275/eval.mod
2048	/boot/grub2/powerpc-ieee1275/cmosdump.mod
2048	/boot/grub2/powerpc-ieee1275/offsetio.mod
2124	/boot/grub2/powerpc-ieee1275/read.mod
2168	/boot/grub2/powerpc-ieee1275/afsplitter.mod
2272	/boot/grub2/powerpc-ieee1275/part_dvh.mod
2272	/boot/grub2/powerpc-ieee1275/time.mod
2372	/boot/grub2/powerpc-ieee1275/setjmp_test.mod
2408	/boot/grub2/powerpc-ieee1275/version.mod
2484	/boot/grub2/powerpc-ieee1275/lsmmap.mod
2492	/boot/grub2/powerpc-ieee1275/part_sun.mod
2512	/boot/grub2/powerpc-ieee1275/datehook.mod
2604	/boot/grub2/powerpc-ieee1275/part_acorn.mod
2646	/boot/grub2/powerpc-ieee1275/modinfo.sh
2672	/boot/grub2/powerpc-ieee1275/keystatus.mod
2692	/boot/grub2/powerpc-ieee1275/part_sunpc.mod
2740	/boot/grub2/powerpc-ieee1275/command.lst
2744	/boot/grub2/powerpc-ieee1275/password.mod
2760	/boot/grub2/powerpc-ieee1275/gcry_arcfour.mod
2772	/boot/grub2/powerpc-ieee1275/part_amiga.mod
2820	/boot/grub2/powerpc-ieee1275/sleep.mod
2828	/boot/grub2/powerpc-ieee1275/xnu_uuid_test.mod
2831	/boot/grub2/grub.cfg
2860	/boot/grub2/powerpc-ieee1275/dm_nv.mod
2860	/boot/grub2/powerpc-ieee1275/part_dfly.mod
2868	/boot/grub2/powerpc-ieee1275/gcry_rsa.mod
2868	/boot/grub2/powerpc-ieee1275/priority_queue.mod
2920	/boot/grub2/powerpc-ieee1275/pbkdf2_test.mod
2928	/boot/grub2/powerpc-ieee1275/crc64.mod
2992	/boot/grub2/powerpc-ieee1275/mdraid09_be.mod
3024	/boot/grub2/powerpc-ieee1275/echo.mod
3028	/boot/grub2/powerpc-ieee1275/strtoull_test.mod
3040	/boot/grub2/powerpc-ieee1275/part_plan.mod
3068	/boot/grub2/powerpc-ieee1275/cmp.mod
3112	/boot/grub2/powerpc-ieee1275/mdraid09.mod
3112	/boot/grub2/powerpc-ieee1275/part_apple.mod
3216	/boot/grub2/powerpc-ieee1275/videotest_checksum.mod
3220	/boot/grub2/powerpc-ieee1275/mul_test.mod
3240	/boot/grub2/powerpc-ieee1275/testspeed.mod
3248	/boot/grub2/powerpc-ieee1275/mdraid1x.mod
3312	/boot/grub2/powerpc-ieee1275/configfile.mod
3324	/boot/grub2/powerpc-ieee1275/gcry_dsa.mod
3360	/boot/grub2/powerpc-ieee1275/memdisk.mod
3372	/boot/grub2/powerpc-ieee1275/date.mod
3380	/boot/grub2/powerpc-ieee1275/increment.mod
3396	/boot/grub2/powerpc-ieee1275/tr.mod
3412	/boot/grub2/powerpc-ieee1275/xnu_uuid.mod
3572	/boot/grub2/powerpc-ieee1275/blocklist.mod
3636	/boot/grub2/powerpc-ieee1275/ctz_test.mod
3680	/boot/grub2/powerpc-ieee1275/msdospart.mod
3732	/boot/grub2/powerpc-ieee1275/boot.mod
3736	/boot/grub2/powerpc-ieee1275/sleep_test.mod
3788	/boot/grub2/powerpc-ieee1275/cmostest.mod
3800	/boot/grub2/powerpc-ieee1275/part_gpt.mod
3884	/boot/grub2/powerpc-ieee1275/procfs.mod
3908	/boot/grub2/powerpc-ieee1275/cmdline_cat_test.mod
3908	/boot/grub2/powerpc-ieee1275/progress.mod
3930	/boot/grub2/powerpc-ieee1275/moddep.lst
3944	/boot/grub2/powerpc-ieee1275/bufio.mod
3980	/boot/grub2/powerpc-ieee1275/raid6rec.mod
4044	/boot/grub2/powerpc-ieee1275/help.mod
4120	/boot/grub2/powerpc-ieee1275/part_msdos.mod
4224	/boot/grub2/powerpc-ieee1275/testload.mod
4372	/boot/grub2/powerpc-ieee1275/cat.mod
4388	/boot/grub2/powerpc-ieee1275/gfxterm_background.mod
4400	/boot/grub2/powerpc-ieee1275/bitmap.mod
4428	/boot/grub2/powerpc-ieee1275/memrw.mod
4504	/boot/grub2/powerpc-ieee1275/bswap_test.mod
4612	/boot/grub2/powerpc-ieee1275/hexdump.mod
4860	/boot/grub2/powerpc-ieee1275/cpio_be.mod
4860	/boot/grub2/powerpc-ieee1275/odc.mod
4872	/boot/grub2/powerpc-ieee1275/password_pbkdf2.mod
4900	/boot/grub2/powerpc-ieee1275/disk.mod
4940	/boot/grub2/powerpc-ieee1275/shift_test.mod
4956	/boot/grub2/powerpc-ieee1275/loopback.mod
4992	/boot/grub2/powerpc-ieee1275/part_bsd.mod
4996	/boot/grub2/powerpc-ieee1275/cpio.mod
5180	/boot/grub2/powerpc-ieee1275/search.mod
5184	/boot/grub2/powerpc-ieee1275/gcry_rfc2268.mod
5212	/boot/grub2/powerpc-ieee1275/search_fs_file.mod
5224	/boot/grub2/powerpc-ieee1275/gcry_md4.mod
5256	/boot/grub2/powerpc-ieee1275/search_label.mod
5320	/boot/grub2/powerpc-ieee1275/newc.mod
5416	/boot/grub2/powerpc-ieee1275/hfspluscomp.mod
5452	/boot/grub2/powerpc-ieee1275/archelp.mod
5532	/boot/grub2/powerpc-ieee1275/search_fs_uuid.mod
5544	/boot/grub2/powerpc-ieee1275/videotest.mod
5704	/boot/grub2/powerpc-ieee1275/macbless.mod
5744	/boot/grub2/powerpc-ieee1275/cbfs.mod
5840	/boot/grub2/powerpc-ieee1275/escc.mod
5864	/boot/grub2/powerpc-ieee1275/gcry_idea.mod
5872	/boot/grub2/powerpc-ieee1275/tar.mod
6120	/boot/grub2/powerpc-ieee1275/ls.mod
6192	/boot/grub2/powerpc-ieee1275/probe.mod
6280	/boot/grub2/powerpc-ieee1275/gptsync.mod
6360	/boot/grub2/powerpc-ieee1275/gcry_md5.mod
6444	/boot/grub2/powerpc-ieee1275/fshelp.mod
6476	/boot/grub2/powerpc-ieee1275/videoinfo.mod
6680	/boot/grub2/powerpc-ieee1275/minicmd.mod
6796	/boot/grub2/powerpc-ieee1275/gcry_sha256.mod
6856	/boot/grub2/powerpc-ieee1275/luks.mod
7132	/boot/grub2/powerpc-ieee1275/gfxterm_menu.mod
7776	/boot/grub2/powerpc-ieee1275/json.mod
7832	/boot/grub2/powerpc-ieee1275/video_colors.mod
8080	/boot/grub2/powerpc-ieee1275/bitmap_scale.mod
8128	/boot/grub2/powerpc-ieee1275/terminal.mod
8240	/boot/grub2/powerpc-ieee1275/romfs.mod
8252	/boot/grub2/powerpc-ieee1275/signature_test.mod
8288	/boot/grub2/powerpc-ieee1275/parttool.mod
8296	/boot/grub2/powerpc-ieee1275/minix_be.mod
8608	/boot/grub2/powerpc-ieee1275/datetime.mod
8668	/boot/grub2/powerpc-ieee1275/minix2_be.mod
8704	/boot/grub2/powerpc-ieee1275/minix3_be.mod
8752	/boot/grub2/powerpc-ieee1275/tga.mod
8820	/boot/grub2/powerpc-ieee1275/minix.mod
8864	/boot/grub2/powerpc-ieee1275/hashsum.mod
9140	/boot/grub2/powerpc-ieee1275/minix3.mod
9168	/boot/grub2/powerpc-ieee1275/minix2.mod
9280	/boot/grub2/powerpc-ieee1275/tftp.mod
9288	/boot/grub2/powerpc-ieee1275/gcry_sha1.mod
9556	/boot/grub2/powerpc-ieee1275/gettext.mod
9620	/boot/grub2/powerpc-ieee1275/ntfscomp.mod
9660	/boot/grub2/powerpc-ieee1275/crypto.mod
9876	/boot/grub2/powerpc-ieee1275/scsi.mod
9996	/boot/grub2/powerpc-ieee1275/cmp_test.mod
10048	/boot/grub2/powerpc-ieee1275/zfscrypt.mod
10344	/boot/grub2/powerpc-ieee1275/ieee1275_fb.mod
10568	/boot/grub2/powerpc-ieee1275/geli.mod
10600	/boot/grub2/powerpc-ieee1275/zfsinfo.mod
10776	/boot/grub2/powerpc-ieee1275/lzopio.mod
10876	/boot/grub2/powerpc-ieee1275/sfs.mod
10888	/boot/grub2/powerpc-ieee1275/affs.mod
10928	/boot/grub2/powerpc-ieee1275/elf.mod
11184	/boot/grub2/powerpc-ieee1275/http.mod
11236	/boot/grub2/powerpc-ieee1275/ofnet.mod
11644	/boot/grub2/powerpc-ieee1275/test.mod
11720	/boot/grub2/powerpc-ieee1275/gcry_sha512.mod
11848	/boot/grub2/powerpc-ieee1275/div_test.mod
12176	/boot/grub2/powerpc-ieee1275/gcry_rmd160.mod
12388	/boot/grub2/powerpc-ieee1275/ufs1_be.mod
12528	/boot/grub2/powerpc-ieee1275/loadenv.mod
12676	/boot/grub2/powerpc-ieee1275/gcry_crc.mod
13124	/boot/grub2/powerpc-ieee1275/ufs2.mod
13256	/boot/grub2/powerpc-ieee1275/ufs1.mod
13296	/boot/grub2/powerpc-ieee1275/exfat.mod
13568	/boot/grub2/powerpc-ieee1275/serial.mod
13712	/boot/grub2/powerpc-ieee1275/hfs.mod
13812	/boot/grub2/powerpc-ieee1275/fat.mod
13948	/boot/grub2/powerpc-ieee1275/video.mod
14576	/boot/grub2/powerpc-ieee1275/gcry_blowfish.mod
14768	/boot/grub2/powerpc-ieee1275/jfs.mod
14864	/boot/grub2/powerpc-ieee1275/afs.mod
14956	/boot/grub2/powerpc-ieee1275/ext2.mod
15376	/boot/grub2/powerpc-ieee1275/ldm.mod
15428	/boot/grub2/powerpc-ieee1275/jpeg.mod
15936	/boot/grub2/powerpc-ieee1275/lvm.mod
16092	/boot/grub2/powerpc-ieee1275/nilfs2.mod
16144	/boot/grub2/powerpc-ieee1275/bfs.mod
16336	/boot/grub2/powerpc-ieee1275/hfsplus.mod
16392	/boot/grub2/powerpc-ieee1275/png.mod
16628	/boot/grub2/powerpc-ieee1275/f2fs.mod
16824	/boot/grub2/powerpc-ieee1275/squash4.mod
17044	/boot/grub2/powerpc-ieee1275/gzio.mod
17228	/boot/grub2/powerpc-ieee1275/xfs.mod
17364	/boot/grub2/powerpc-ieee1275/linux.mod
18120	/boot/grub2/powerpc-ieee1275/macho.mod
19016	/boot/grub2/powerpc-ieee1275/iso9660.mod
19476	/boot/grub2/powerpc-ieee1275/appended_signature_test.mod
20048	/boot/grub2/powerpc-ieee1275/udf.mod
20172	/boot/grub2/powerpc-ieee1275/reiserfs.mod
22864	/boot/grub2/powerpc-ieee1275/gcry_tiger.mod
23332	/boot/grub2/powerpc-ieee1275/pgp.mod
23348	/boot/grub2/powerpc-ieee1275/gfxterm.mod
23480	/boot/grub2/powerpc-ieee1275/blscfg.mod
23876	/boot/grub2/powerpc-ieee1275/ntfs.mod
24376	/boot/grub2/powerpc-ieee1275/diskfilter.mod
24448	/boot/grub2/powerpc-ieee1275/cryptodisk.mod
27516	/boot/grub2/powerpc-ieee1275/font.mod
27604	/boot/grub2/powerpc-ieee1275/gcry_rijndael.mod
28056	/boot/grub2/powerpc-ieee1275/relocator.mod
28408	/boot/grub2/powerpc-ieee1275/luks2.mod
32472	/boot/grub2/powerpc-ieee1275/file.mod
32516	/boot/grub2/powerpc-ieee1275/gcry_cast5.mod
34836	/boot/grub2/powerpc-ieee1275/gcry_seed.mod
36756	/boot/grub2/powerpc-ieee1275/syslinuxcfg.mod
37108	/boot/grub2/powerpc-ieee1275/gcry_whirlpool.mod
37504	/boot/grub2/powerpc-ieee1275/xzio.mod
45120	/boot/grub2/powerpc-ieee1275/functional_test.mod
49780	/boot/grub2/powerpc-ieee1275/gcry_des.mod
50980	/boot/grub2/powerpc-ieee1275/test_asn1.mod
55644	/boot/grub2/powerpc-ieee1275/video_fb.mod
58880	/boot/grub2/powerpc-ieee1275/gcry_serpent.mod
58908	/boot/grub2/powerpc-ieee1275/asn1.mod
60344	/boot/grub2/powerpc-ieee1275/appendedsig.mod
64664	/boot/grub2/powerpc-ieee1275/btrfs.mod
69020	/boot/grub2/powerpc-ieee1275/mpi.mod
88536	/boot/grub2/powerpc-ieee1275/gfxmenu.mod
90436	/boot/grub2/powerpc-ieee1275/zfs.mod
90548	/boot/grub2/powerpc-ieee1275/gcry_camellia.mod
95652	/boot/grub2/powerpc-ieee1275/gcry_twofish.mod
127780	/boot/grub2/powerpc-ieee1275/net.mod
129948	/boot/grub2/powerpc-ieee1275/regexp.mod
141492	/boot/grub2/powerpc-ieee1275/zstd.mod
165660	/boot/grub2/grub
165660	/boot/grub2/powerpc-ieee1275/core.elf
235748	/boot/grub2/powerpc-ieee1275/normal.mod
2394108	/boot/grub2/fonts/unicode.pf2
30914365	/boot/ostree/rhcos-7022fef0ce6fe2dbc74dc84855f917bc677928b7d1f75ac0eb422b3d52472e29/vmlinuz-4.18.0-348.12.2.el8_5.ppc64le
35634133	/boot/ostree/rhcos-952c9677e944c268239aec2429e421d2a4fd4c751550a1c5f74d83bd2c5ddb1f/vmlinuz-4.18.0-372.9.1.el8.ppc64le
35634157	/boot/ostree/rhcos-1d101bb28fbd71daf2b6ffb8cf1261576cfd670cec4604a91b20a713523370a2/vmlinuz-4.18.0-372.13.1.el8_6.ppc64le
89933506	/boot/ostree/rhcos-952c9677e944c268239aec2429e421d2a4fd4c751550a1c5f74d83bd2c5ddb1f/initramfs-4.18.0-372.9.1.el8.ppc64le.img
91958667	/boot/ostree/rhcos-7022fef0ce6fe2dbc74dc84855f917bc677928b7d1f75ac0eb422b3d52472e29/initramfs-4.18.0-348.12.2.el8_5.ppc64le.img

Comment 14 Joseph Marrero 2022-07-08 14:02:44 UTC
It looks like space is consumed by the kernel binaries and the current initramfs images, at about 284 MB (284,074,828 bytes) total. The grub modules and other smaller files take about another 6 MB.

If the new initramfs is roughly the same size as the current ones it will not fit, since each is around 91 MB, and the new kernel binary is a bit bigger, which will likely mean a bigger initramfs too. Even if there were no files other than the vmlinuz and initramfs images, there would not be space for a third initramfs even transiently, because we would be sitting at only about ~78 MB free.

It looks like the kernel grew about 5 MB from 4.18.0-348 to 4.18.0-372 and the initramfs about 2 MB; across two deployments that is roughly 14 MB of additional space needed, which might be what pushes us out of space here.
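The tally above can be sanity-checked with a rough sketch. The byte counts are copied from the `find /boot` output in comment 13; the grub total is the approximate figure from this comment, and "nominal free" ignores filesystem overhead.

```shell
# Back-of-the-envelope check of /boot at failure time:
# three vmlinuz files are present, but only two initramfs images.
kernels=$((30914365 + 35634133 + 35634157))
initramfs=$((89933506 + 91958667))
grub=6001679                          # grub modules/fonts, approximate
boot_bytes=$((364 * 1024 * 1024))     # 364M partition per df
free_mb=$(( (boot_bytes - kernels - initramfs - grub) / 1024 / 1024 ))
echo "nominal free: ${free_mb} MiB"
# df actually reports only ~62M available (ext4 metadata and reserved
# blocks consume the rest), well short of a third ~90 MiB initramfs.
```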

Comment 15 Joseph Marrero 2022-07-08 17:07:22 UTC
I think the path forward will likely be: https://github.com/ostreedev/ostree/issues/2670

Comment 16 Colin Walters 2022-07-11 14:35:03 UTC
The ostree side is the most elegant fix.

However, as https://github.com/coreos/fedora-coreos-tracker/issues/1247#issuecomment-1180469010 notes it will need to be deployed on the version we're upgrading *from*.  Concretely we'd need to ship an updated ostree back to RHEL 8.4.z, get that shipped in OCP 4.10.Z, and then ensure the upgrade graph requires going through 4.10.Z before 4.11.

Whereas if we patch the MCO as in https://github.com/openshift/machine-config-operator/pull/2302#issuecomment-1179091466
then we will not need any upgrade graph changes: the updated MCD will land and operate on older systems *before* we upgrade the host.

While I personally think the ostree fix is the most elegant and the most sustainable long term, the above may argue for reassigning this to the MCO for now.

Comment 17 Joseph Marrero 2022-07-11 14:55:37 UTC
I am OK with that. I can keep the ostree change on my plate, and we can solve this for now with: https://github.com/openshift/machine-config-operator/pull/3243/
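For context, the MCO pull request linked above frees /boot space by pruning the rollback deployment before the update is staged. The equivalent manual steps on a node would look roughly like this (illustrative only; the PR performs this from the MCD rather than via these exact commands):

```shell
# Show current deployments; a rollback deployment keeps an extra
# kernel + initramfs pair in /boot.
rpm-ostree status

# Remove the rollback deployment, freeing its kernel and initramfs
# from /boot before the new update is staged.
rpm-ostree cleanup --rollback
```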

Comment 20 Timothée Ravier 2022-07-18 11:36:36 UTC
A workaround for this issue has been pushed in the MCO. Could you retry the 4.10 -> 4.11 upgrade, and upgrades between 4.11 releases, using the latest builds? Thanks!

Comment 21 pdsilva 2022-07-19 12:52:25 UTC
Thanks. Yes, I verified that the upgrade works fine with the latest RC, 4.11.0-rc.3.
The test scenario was 4.10.23 -> 4.11.0-rc.2 -> 4.11.0-rc.3 - PASS

Post upgrade:
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0615fd76754f79d3da5d5aa3627d5321e9846e276e87bb8567e32e1ac65f4fdb
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202207150037-0 (2022-07-15T00:44:41Z)

# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.3   True        False         3h51m   Cluster version is 4.11.0-rc.3

# oc get nodes
NAME       STATUS   ROLES    AGE     VERSION
master-0   Ready    master   7h54m   v1.24.0+9546431
master-1   Ready    master   7h55m   v1.24.0+9546431
master-2   Ready    master   7h54m   v1.24.0+9546431
worker-0   Ready    worker   7h43m   v1.24.0+9546431
worker-1   Ready    worker   7h46m   v1.24.0+9546431

# oc get clusterversion -o json |jq ".items[0].status.history"
[
  {
    "completionTime": "2022-07-19T08:51:59Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:e870c897fd9d87a6839a7747c640cbe862d81f377eb2315b166bb489f2f5abf6",
    "startedTime": "2022-07-19T08:11:35Z",
    "state": "Completed",
    "verified": true,
    "version": "4.11.0-rc.3"
  },
  {
    "completionTime": "2022-07-19T08:01:28Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:1d1a712818baf48944e248922c1be350535347d64d8a9e27c74d38a1a84c9846",
    "startedTime": "2022-07-19T06:51:54Z",
    "state": "Completed",
    "verified": true,
    "version": "4.11.0-rc.2"
  },
  {
    "completionTime": "2022-07-19T05:02:47Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:0b5822f70610ff8f624a8b83378000908f25701dfea83c377abc84d82d636099",
    "startedTime": "2022-07-19T04:41:07Z",
    "state": "Completed",
    "verified": false,
    "version": "4.10.23"
  }
]

# oc get mcp -A
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-aa68b9dd37f3324224072db1bc89b694   True      False      False      3              3                   3                     0                      8h
worker   rendered-worker-00ce21b598e67079e83b697268e00b14   True      False      False      2              2                   2                     0                      8h

Comment 22 Timothée Ravier 2022-07-20 14:49:29 UTC
Setting as verified as we have https://bugzilla.redhat.com/show_bug.cgi?id=2104619#c21

Comment 26 errata-xmlrpc 2023-01-17 19:51:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
