Bug 1873383 - [4.4] Need to upgrade host and kernel-rt layer atomically
Summary: [4.4] Need to upgrade host and kernel-rt layer atomically
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.z
Assignee: Sinny Kumari
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1861026
Blocks:
Reported: 2020-08-28 05:40 UTC by Sinny Kumari
Modified: 2020-09-22 06:58 UTC
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1861026
Environment:
Last Closed: 2020-09-22 06:58:40 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift/machine-config-operator pull 2036 | 0 | None | closed | Bug 1873383: daemon: perform other rpm-ostree operations after OS rebase | 2021-02-10 18:57:31 UTC
Red Hat Product Errata | RHBA-2020:3715 | 0 | None | None | None | 2020-09-22 06:58:45 UTC

Comment 1 Sinny Kumari 2020-09-02 08:52:56 UTC
We have seen another variant of this issue in CI test runs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.4/1300515922993221632
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.4/1300334532586639360
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.4/1300153085259157504

In these tests, we switch to the realtime kernel as a day-1 operation. The rt-kernel switch succeeded, but the next operation, the OS rebase, failed with an SELinux denial:
/var/log/audit/audit.log:type=AVC msg=audit(1599035650.660:160): avc:  denied  { nnp_transition nosuid_transition } for  pid=25425 comm="dracut" scontext=system_u:system_r:install_t:s0 tcontext=system_u:system_r:setfiles_mac_t:s0 tclass=process2 permissive=0
/var/log/audit/audit.log:type=SELINUX_ERR msg=audit(1599035650.660:160): op=security_bounded_transition seresult=denied oldcontext=system_u:system_r:install_t:s0 newcontext=system_u:system_r:setfiles_mac_t:s0
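
When triaging failures like this, it can help to pull the denied permissions and the source/target contexts out of the raw audit lines. Below is a minimal Python sketch, assuming the `avc: denied { ... } for key=value ...` record shape shown above and that no field value contains spaces; `parse_avc_denial` is a hypothetical helper, not part of any tool mentioned in this bug.

```python
import re

# Matches the "avc:  denied  { perm ... } for  key=value ..." part of an
# audit record; group 1 holds the permissions, group 2 the key=value fields.
AVC_DENIED = re.compile(r"avc:\s+denied\s+\{\s*([^}]*?)\s*\}\s+for\s+(.*)$")


def parse_avc_denial(line):
    """Return a dict describing one AVC denial line, or None for other lines."""
    match = AVC_DENIED.search(line)
    if match is None:
        return None
    result = {"permissions": match.group(1).split()}
    # Assumes whitespace-free values (true for the record above).
    for token in match.group(2).split():
        key, sep, value = token.partition("=")
        if sep:
            # comm="dracut" and similar fields are quoted in the raw record.
            result[key] = value.strip('"')
    return result


line = ('type=AVC msg=audit(1599035650.660:160): avc:  denied  '
        '{ nnp_transition nosuid_transition } for  pid=25425 comm="dracut" '
        'scontext=system_u:system_r:install_t:s0 '
        'tcontext=system_u:system_r:setfiles_mac_t:s0 tclass=process2 permissive=0')
denial = parse_avc_denial(line)
print(denial["comm"], denial["permissions"], denial["permissive"])
# prints: dracut ['nnp_transition', 'nosuid_transition'] 0
```

Read this way, the record says that `dracut`, running in `install_t`, was refused the `nnp_transition`/`nosuid_transition` into `setfiles_mac_t`, and `permissive=0` means SELinux was enforcing, so the denial actually blocked the operation.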

https://github.com/openshift/machine-config-operator/pull/2036 should fix the issue.
Since the CI failure we are hitting here involves a day-1 realtime kernel switch, we will also need to update the bootimage once a machine-config-daemon (m-c-d) package containing the fixes from https://github.com/openshift/machine-config-operator/pull/2036 is available.
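
The ordering named in the PR title ("perform other rpm-ostree operations after OS rebase") can be sketched as follows. This is not the machine-config-daemon's code: `PendingChanges`, `plan_rpm_ostree_steps`, and the step names are hypothetical and only model the rule "rebase onto the new OS image first, then apply the remaining rpm-ostree operations such as the kernel-rt switch".

```python
from dataclasses import dataclass


@dataclass
class PendingChanges:
    """Hypothetical summary of what differs between current and desired config."""
    os_image_changed: bool
    kernel_type_changed: bool


def plan_rpm_ostree_steps(changes):
    """Return hypothetical step names in the order they would be run."""
    steps = []
    # Rebase onto the new OS image first ...
    if changes.os_image_changed:
        steps.append("rebase")
    # ... and only then apply other rpm-ostree operations, such as
    # replacing the stock kernel packages with kernel-rt.
    if changes.kernel_type_changed:
        steps.append("switch-to-kernel-rt")
    return steps


# Day-1 case from this comment: the node needs both an OS rebase and a
# switch to the realtime kernel.
print(plan_rpm_ostree_steps(PendingChanges(os_image_changed=True, kernel_type_changed=True)))
# prints: ['rebase', 'switch-to-kernel-rt']
```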

Comment 4 Michael Nguyen 2020-09-17 18:32:46 UTC
Verified on 4.4.0-0.nightly-2020-09-17-023939

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        False         37m     Cluster version is 4.4.5
$ cat rt-kernel.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kerneltype
spec:
  config:
    ignition:
      version: 2.2.0
  kernelType: realtime
$ oc create -f rt-kernel.yaml 
machineconfig.machineconfiguration.openshift.io/99-worker-kerneltype created
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
00-worker                                                   480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
01-master-container-runtime                                 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
01-master-kubelet                                           480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
01-worker-container-runtime                                 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
01-worker-kubelet                                           480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
99-master-7fc006b6-c3b7-46f2-80d0-1cc83ffbee9c-registries   480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
99-master-ssh                                                                                          2.2.0             53m
99-worker-aa0cefe2-2628-4ece-b672-830e624f1e5b-registries   480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
99-worker-kerneltype                                                                                   2.2.0             3s
99-worker-ssh                                                                                          2.2.0             53m
rendered-master-e15c9746ca76a23bf464876183b10b6b            480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
rendered-worker-5fdac24dfa23ffc8c93186cb77a2f10d            480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-5fdac24dfa23ffc8c93186cb77a2f10d   True      False      False      3              3                   3                     0                      53m
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-5fdac24dfa23ffc8c93186cb77a2f10d   False     True       False      3              0                   0                     0                      53m
$ watch oc get mcp/worker
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-65ceeb8c35e9a1a85b7b43bf851e1105   True      False      False      3              3                   3                     0                      86m
$ oc get nodes
NAME                                                    STATUS   ROLES    AGE   VERSION
mnguye-xrzfw-m-0.c.openshift-gce-devel.internal         Ready    master   91m   v1.17.1
mnguye-xrzfw-m-1.c.openshift-gce-devel.internal         Ready    master   91m   v1.17.1
mnguye-xrzfw-m-2.c.openshift-gce-devel.internal         Ready    master   91m   v1.17.1
mnguye-xrzfw-w-a-fsxd4.c.openshift-gce-devel.internal   Ready    worker   79m   v1.17.1
mnguye-xrzfw-w-b-bgzw9.c.openshift-gce-devel.internal   Ready    worker   79m   v1.17.1
mnguye-xrzfw-w-c-lmm94.c.openshift-gce-devel.internal   Ready    worker   79m   v1.17.1
$ oc debug node/mnguye-xrzfw-w-a-fsxd4.c.openshift-gce-devel.internal -- chroot /host rpm -qa kernel*
Starting pod/mnguye-xrzfw-w-a-fsxd4copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64

Removing debug pod ...
$ oc adm upgrade --to=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-09-17-023939 --force
error: --to must be a semantic version (e.g. 4.0.1 or 4.1.0-nightly-20181104): Invalid character(s) found in major number "registry"
$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-09-17-023939 --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-09-17-023939
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        False         73m     Cluster version is 4.4.5
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        False         73m     Cluster version is 4.4.5
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        True          1s      Working towards registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-09-17-023939: downloading update
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-09-17-023939   True        False         74s     Cluster version is 4.4.0-0.nightly-2020-09-17-023939
$ oc debug node/mnguye-xrzfw-w-a-fsxd4.c.openshift-gce-devel.internal -- chroot /host rpm -qa kernel*
Starting pod/mnguye-xrzfw-w-a-fsxd4copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64

Removing debug pod ...
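
The manual check in the session above (the worker still has only kernel-rt packages after the upgrade) could be scripted. A minimal Python sketch, assuming the `oc debug node/... -- chroot /host rpm -qa kernel*` output format shown; `kernel_packages`, `is_realtime_only`, and `kernel_versions` are hypothetical helpers. It does not know which kernel-rt version the target release ships, so comparing against that is left to the caller.

```python
def kernel_packages(rpm_output):
    """Return the kernel* NVRAs from `oc debug ... rpm -qa kernel*` output.

    Lines printed by `oc debug` itself ("Starting pod/...", "Removing debug
    pod ...") do not start with "kernel" and are skipped.
    """
    return [line.strip() for line in rpm_output.splitlines()
            if line.strip().startswith("kernel")]


def is_realtime_only(rpm_output):
    """True if at least one kernel package is installed and all are kernel-rt."""
    pkgs = kernel_packages(rpm_output)
    return bool(pkgs) and all(p.startswith("kernel-rt-") for p in pkgs)


def kernel_versions(rpm_output):
    """Return the set of version-release.arch strings of the kernel packages."""
    # RPM versions and releases cannot contain '-', so the last two
    # '-'-separated fields of an NVRA are the version and release.arch.
    return {"-".join(p.rsplit("-", 2)[1:]) for p in kernel_packages(rpm_output)}


after_upgrade = """Starting pod/mnguye-xrzfw-w-a-fsxd4copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64

Removing debug pod ..."""
print(is_realtime_only(after_upgrade), kernel_versions(after_upgrade))
# prints: True {'4.18.0-147.8.1.rt24.101.el8_1.x86_64'}
```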

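The `watch oc get mcp/worker` step above waits for the worker pool to finish rolling out the realtime MachineConfig. A minimal Python sketch of that "done" condition, parsing the default table output; `parse_oc_table` and `pool_rolled_out` are hypothetical helpers, and in practice `oc get mcp/worker -o json` would be a sturdier input than the table. The CONFIG comparison matters because, right after `oc create`, the pool still reports UPDATED True with the old rendered config, as the first `oc get mcp/worker` in the session shows.

```python
def parse_oc_table(text):
    """Split default `oc get` table output into one dict per row.

    Assumes every column header and every cell is a single
    whitespace-free token, which holds for `oc get mcp` output.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def pool_rolled_out(row, previous_config):
    """True once the pool reports a new rendered config on every machine."""
    return (row["CONFIG"] != previous_config
            and row["UPDATED"] == "True"
            and row["UPDATING"] == "False"
            and row["DEGRADED"] == "False"
            and row["UPDATEDMACHINECOUNT"] == row["MACHINECOUNT"])


HEADER = ("NAME     CONFIG                                             UPDATED   UPDATING   "
          "DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   "
          "DEGRADEDMACHINECOUNT   AGE")
old = "rendered-worker-5fdac24dfa23ffc8c93186cb77a2f10d"
during = HEADER + "\nworker   " + old + "   False   True   False   3   0   0   0   53m"
done = (HEADER + "\nworker   rendered-worker-65ceeb8c35e9a1a85b7b43bf851e1105"
        "   True   False   False   3   3   3   0   86m")
print(pool_rolled_out(parse_oc_table(during)[0], old),
      pool_rolled_out(parse_oc_table(done)[0], old))
# prints: False True
```
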
Comment 6 errata-xmlrpc 2020-09-22 06:58:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.23 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3715

