Bug 1873383

Summary: [4.4] Need to upgrade host and kernel-rt layer atomically
Product: OpenShift Container Platform
Reporter: Sinny Kumari <skumari>
Component: Machine Config Operator
Assignee: Sinny Kumari <skumari>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Priority: high
Version: 4.4
CC: alukiano, amurdaca, bbreard, dshchedr, imcleod, jlebon, jligon, lcapitulino, mnguyen, mtosatti, nilal, nstielau, ocohen, skumari, smilner, walters
Target Release: 4.4.z
Hardware: x86_64
OS: Linux
Clone Of: 1861026
Last Closed: 2020-09-22 06:58:40 UTC
Bug Depends On: 1861026

Comment 1 Sinny Kumari 2020-09-02 08:52:56 UTC
We have seen another variant of this issue in CI test runs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.4/1300515922993221632
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.4/1300334532586639360
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.4/1300153085259157504

In these tests, we switch to the realtime kernel as a day-1 operation. The RT kernel switch succeeded, but the next operation, the OS rebase, failed with an SELinux denial:
/var/log/audit/audit.log:type=AVC msg=audit(1599035650.660:160): avc:  denied  { nnp_transition nosuid_transition } for  pid=25425 comm="dracut" scontext=system_u:system_r:install_t:s0 tcontext=system_u:system_r:setfiles_mac_t:s0 tclass=process2 permissive=0
/var/log/audit/audit.log:type=SELINUX_ERR msg=audit(1599035650.660:160): op=security_bounded_transition seresult=denied oldcontext=system_u:system_r:install_t:s0 newcontext=system_u:system_r:setfiles_mac_t:s0
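
Each audit record is a flat list of key=value fields, so the denial can be read mechanically. The minimal Python sketch below pulls out the denied permissions and the SELinux types involved; the helper names `parse_avc` and `selinux_type` are hypothetical, and it assumes the grep-style `path:record` lines shown above. Read this way, `dracut`, running as `install_t`, was refused the transition into `setfiles_mac_t`. The `nnp_transition` and `nosuid_transition` permissions (class `process2`) govern domain transitions for a process running with no_new_privs or executing from a nosuid mount, and `permissive=0` means the denial was enforced rather than only logged.

```python
import re

# Matches "denied  { perm1 perm2 }" inside an AVC record.
DENIED_RE = re.compile(r"denied\s+\{\s*([^}]*?)\s*\}")
# Matches key=value pairs; a value runs until the next whitespace.
KV_RE = re.compile(r"(\w+)=(\S+)")


def parse_avc(line: str) -> dict:
    """Split one audit record into its key=value fields plus denied permissions."""
    # Drop a grep-style "path:" prefix such as "/var/log/audit/audit.log:".
    if line.startswith("/"):
        line = line.split(":", 1)[1]
    fields = dict(KV_RE.findall(line))
    match = DENIED_RE.search(line)
    # Records without a "denied { ... }" block (e.g. SELINUX_ERR) get an empty list.
    fields["denied"] = match.group(1).split() if match else []
    return fields


def selinux_type(context: str) -> str:
    """Return the type field of a user:role:type:level SELinux context."""
    return context.split(":")[2]


if __name__ == "__main__":
    avc = parse_avc(
        "/var/log/audit/audit.log:type=AVC msg=audit(1599035650.660:160): avc:  "
        "denied  { nnp_transition nosuid_transition } for  pid=25425 comm=\"dracut\" "
        "scontext=system_u:system_r:install_t:s0 "
        "tcontext=system_u:system_r:setfiles_mac_t:s0 tclass=process2 permissive=0"
    )
    print(avc["denied"])  # ['nnp_transition', 'nosuid_transition']
    print(selinux_type(avc["scontext"]), "->", selinux_type(avc["tcontext"]))
    # install_t -> setfiles_mac_t
```

Applied to the SELINUX_ERR line, the same parser yields `op=security_bounded_transition` and `seresult=denied` for the same `install_t` to `setfiles_mac_t` transition.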

https://github.com/openshift/machine-config-operator/pull/2036 should fix the issue.
Since the CI failure we are hitting here occurs during a day-1 realtime kernel switch, we will also need to update the bootimage once a machine-config-daemon package containing the fixes from https://github.com/openshift/machine-config-operator/pull/2036 is available.

Comment 4 Michael Nguyen 2020-09-17 18:32:46 UTC
Verified on 4.4.0-0.nightly-2020-09-17-023939

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        False         37m     Cluster version is 4.4.5
$ cat rt-kernel.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kerneltype
spec:
  config:
    ignition:
      version: 2.2.0
  kernelType: realtime
$ oc create -f rt-kernel.yaml 
machineconfig.machineconfiguration.openshift.io/99-worker-kerneltype created
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
00-worker                                                   480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
01-master-container-runtime                                 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
01-master-kubelet                                           480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
01-worker-container-runtime                                 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
01-worker-kubelet                                           480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
99-master-7fc006b6-c3b7-46f2-80d0-1cc83ffbee9c-registries   480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
99-master-ssh                                                                                          2.2.0             53m
99-worker-aa0cefe2-2628-4ece-b672-830e624f1e5b-registries   480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
99-worker-kerneltype                                                                                   2.2.0             3s
99-worker-ssh                                                                                          2.2.0             53m
rendered-master-e15c9746ca76a23bf464876183b10b6b            480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
rendered-worker-5fdac24dfa23ffc8c93186cb77a2f10d            480accd5d4f631d34e560aa5c8a3dfab0c7bbe27   2.2.0             51m
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-5fdac24dfa23ffc8c93186cb77a2f10d   True      False      False      3              3                   3                     0                      53m
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-5fdac24dfa23ffc8c93186cb77a2f10d   False     True       False      3              0                   0                     0                      53m
$ watch oc get mcp/worker
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-65ceeb8c35e9a1a85b7b43bf851e1105   True      False      False      3              3                   3                     0                      86m
$ oc get nodes
NAME                                                    STATUS   ROLES    AGE   VERSION
mnguye-xrzfw-m-0.c.openshift-gce-devel.internal         Ready    master   91m   v1.17.1
mnguye-xrzfw-m-1.c.openshift-gce-devel.internal         Ready    master   91m   v1.17.1
mnguye-xrzfw-m-2.c.openshift-gce-devel.internal         Ready    master   91m   v1.17.1
mnguye-xrzfw-w-a-fsxd4.c.openshift-gce-devel.internal   Ready    worker   79m   v1.17.1
mnguye-xrzfw-w-b-bgzw9.c.openshift-gce-devel.internal   Ready    worker   79m   v1.17.1
mnguye-xrzfw-w-c-lmm94.c.openshift-gce-devel.internal   Ready    worker   79m   v1.17.1
$ oc debug node/mnguye-xrzfw-w-a-fsxd4.c.openshift-gce-devel.internal -- chroot /host rpm -qa kernel*
Starting pod/mnguye-xrzfw-w-a-fsxd4copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64

Removing debug pod ...
$ oc adm upgrade --to=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-09-17-023939 --force
error: --to must be a semantic version (e.g. 4.0.1 or 4.1.0-nightly-20181104): Invalid character(s) found in major number "registry"
$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-09-17-023939 --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-09-17-023939
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        False         73m     Cluster version is 4.4.5
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        True          1s      Working towards registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-09-17-023939: downloading update
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-09-17-023939   True        False         74s     Cluster version is 4.4.0-0.nightly-2020-09-17-023939
$ oc debug node/mnguye-xrzfw-w-a-fsxd4.c.openshift-gce-devel.internal -- chroot /host rpm -qa kernel*
Starting pod/mnguye-xrzfw-w-a-fsxd4copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64

Removing debug pod ...
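
The manual checks in this comment, waiting for the worker pool to settle and then confirming that only kernel-rt packages are installed before and after the upgrade, can be scripted. A minimal Python sketch, assuming the `oc get mcp` and `oc debug ... rpm -qa kernel*` output has already been captured as text in the format shown above; `parse_table`, `pool_settled`, and `rt_kernel_versions` are hypothetical helpers, not part of `oc` or the Machine Config Operator:

```python
from __future__ import annotations


def parse_table(text: str) -> list[dict[str, str]]:
    """Parse `oc get` table output, assuming no cell contains spaces."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def pool_settled(row: dict[str, str]) -> bool:
    """True once a MachineConfigPool row shows a finished, healthy rollout."""
    return (
        row["UPDATED"] == "True"
        and row["UPDATING"] == "False"
        and row["DEGRADED"] == "False"
        and row["UPDATEDMACHINECOUNT"] == row["MACHINECOUNT"]
    )


def rt_kernel_versions(rpm_qa: str) -> set[str]:
    """Return the version-release.arch strings of installed kernel packages.

    Raises ValueError if any kernel package is not a kernel-rt package.
    """
    versions: set[str] = set()
    for line in rpm_qa.splitlines():
        pkg = line.strip()
        if not pkg.startswith("kernel"):
            continue  # skip `oc debug` banner lines and blank lines
        # rpm -qa prints name-version-release.arch; the name may contain dashes.
        name, version, release_arch = pkg.rsplit("-", 2)
        if not name.startswith("kernel-rt"):
            raise ValueError(f"non-RT kernel package installed: {pkg}")
        versions.add(f"{version}-{release_arch}")
    return versions


# Sample output from the transcript above (column spacing collapsed).
MCP_AFTER = """
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-65ceeb8c35e9a1a85b7b43bf851e1105 True False False 3 3 3 0 86m
"""

RPMS_AFTER = """
Starting pod/mnguye-xrzfw-w-a-fsxd4copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64

Removing debug pod ...
"""

if __name__ == "__main__":
    worker = parse_table(MCP_AFTER)[0]
    print(pool_settled(worker))  # True
    print(rt_kernel_versions(RPMS_AFTER))  # {'4.18.0-147.8.1.rt24.101.el8_1.x86_64'}
```

Running `rt_kernel_versions` on the package lists captured before and after `oc adm upgrade` and comparing the two sets is one way to confirm the realtime layer survived the OS update; in this transcript both lists report `4.18.0-147.8.1.rt24.101.el8_1.x86_64`. `parse_table` splits on whitespace, so it only suits tables like `oc get mcp`; the `STATUS` column of `oc get clusterversion` contains spaces and is better read with `-o json` or `-o jsonpath`.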

Comment 6 errata-xmlrpc 2020-09-22 06:58:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.23 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3715