Bug 2029371 - patch pipeline--worker nodes unexpectedly reboot during scale out
Summary: patch pipeline--worker nodes unexpectedly reboot during scale out
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jiří Mencák
QA Contact: liqcui
URL:
Whiteboard:
Depends On:
Blocks: 2029693
Reported: 2021-12-06 10:29 UTC by Jiří Mencák
Modified: 2022-03-10 16:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2024682
Environment:
Last Closed: 2022-03-10 16:32:18 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-node-tuning-operator pull 293 0 None open Bug 2029371: controller: update MC after application by TuneD 2021-12-06 10:30:13 UTC
Red Hat Product Errata RHSA-2022:0056 0 None Waiting on Customer multiple objects listed with "VersionId": "null" "IsLatest": true 2022-06-23 14:00:22 UTC

Comment 2 Jiří Mencák 2021-12-07 13:29:21 UTC
A note to QE: I believe this is fixed as of 4.10.0-0.nightly-2021-12-07-095056. I just tested it using the method described in
https://bugzilla.redhat.com/show_bug.cgi?id=2024682#c7

None of the nodes sharing the same MCP rebooted, apart from the one being scaled up, and the logs contained no trace of the message

updated MachineConfig 50-nto-worker-rt with ignition and kernel parameters: []

Can we please get this VERIFIED so that we can backport the fix down to 4.8 ASAP? Thank you!
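
The log check described above can be scripted. The following is a minimal sketch, not part of the fix: the helper name no_empty_mc_update is hypothetical, and the usage comment assumes the message is emitted by the operator deployment cluster-node-tuning-operator, which this report does not state explicitly.

```shell
# Hypothetical helper (not part of the operator or of oc).
# Reads log text on stdin and succeeds only if no line reports a
# MachineConfig update with an empty kernel parameter list.
no_empty_mc_update() {
  ! grep -Eq 'updated MachineConfig [^ ]+ with ignition and kernel parameters: \[\]'
}

# Example usage (assumption: the message appears in the operator log):
#   oc logs deployment/cluster-node-tuning-operator \
#     -n openshift-cluster-node-tuning-operator | no_empty_mc_update \
#     && echo "no empty MachineConfig update logged"
```

The pattern matches any MachineConfig name, so the same check should apply to pools other than worker-rt.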

Comment 4 liqcui 2021-12-08 07:13:32 UTC
Verified Result:

oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-06-201335   True        False         175m    Cluster version is 4.10.0-0.nightly-2021-12-06-201335

node1=ip-10-0-170-206.us-east-2.compute.internal
node2=ip-10-0-221-240.us-east-2.compute.internal
profileRealtime="../testing_manifests/stalld.yaml"
mcpRealtime="../../../examples/realtime-mcp.yaml"
nodeLabelRealtime="node-role.kubernetes.io/worker-rt"

oc label no $node1 ${nodeLabelRealtime}= --overwrite
node/ip-10-0-170-206.us-east-2.compute.internal not labeled

oc label no $node2 ${nodeLabelRealtime}= --overwrite

oc create -f $profileRealtime
oc create -f $mcpRealtime
machineconfigpool.machineconfiguration.openshift.io/worker-rt created

oc scale machineset/liqcui-ocaws410-v9vjb-worker-us-east-2a --replicas=1 -n openshift-machine-api

 oc logs tuned-zn9wd  -n openshift-cluster-node-tuning-operator |tail -5
E1208 06:54:40.904915    1833 tuned.go:776] unable to sync(daemon/) requeued (3)
E1208 06:54:40.905075    1833 tuned.go:776] unable to sync(daemon/) requeued (4)
2021-12-08 06:54:40,953 INFO     tuned.plugins.plugin_script: calling script '/usr/lib/tuned/realtime/script.sh' with arguments '['start']'
E1208 06:54:41.190305    1833 tuned.go:776] unable to sync(daemon/) requeued (5)
2021-12-08 06:54:41,372 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-realtime' applied
[ocpadmin@ec2-18-217-45-133 ~]$ oc logs tuned-86lpp  -n openshift-cluster-node-tuning-operator |tail -5
E1208 06:29:10.384597    1789 tuned.go:776] unable to sync(daemon/) requeued (3)
E1208 06:29:10.384706    1789 tuned.go:776] unable to sync(daemon/) requeued (4)
2021-12-08 06:29:10,433 INFO     tuned.plugins.plugin_script: calling script '/usr/lib/tuned/realtime/script.sh' with arguments '['start']'
E1208 06:29:10.727631    1789 tuned.go:776] unable to sync(daemon/) requeued (5)
2021-12-08 06:29:11,064 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-realtime' applied
[ocpadmin@ec2-18-217-45-133 ~]$ oc logs tuned-6txlp  -n openshift-cluster-node-tuning-operator |tail -5
E1208 06:26:51.002255    2135 tuned.go:776] unable to sync(daemon/) requeued (4)
E1208 06:26:51.002400    2135 tuned.go:776] unable to sync(daemon/) requeued (5)
2021-12-08 06:26:51,029 INFO     tuned.plugins.plugin_script: calling script '/usr/lib/tuned/realtime/script.sh' with arguments '['start']'
E1208 06:26:51.618703    2135 tuned.go:776] unable to sync(daemon/) requeued (6)
2021-12-08 06:26:52,039 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-realtime' applied

No existing node rebooted; only the newly scaled-up node rebooted.
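
One way to confirm this result without reading logs is to compare node boot IDs before and after the scale-out. This is a sketch under stated assumptions: the helper rebooted_nodes and the snapshot file names are hypothetical, and the snapshots are assumed to contain one "<node> <bootID>" pair per line, taken from the node's status.nodeInfo.bootID field.

```shell
# Hypothetical helper: prints the names of nodes whose boot ID differs
# between two snapshots. Nodes present only in the second snapshot
# (for example, the newly scaled-up node) are ignored.
#   $1 = snapshot taken before the scale-out
#   $2 = snapshot taken after the scale-out
rebooted_nodes() {
  awk 'FILENAME == ARGV[1] { before[$1] = $2; next }
       ($1 in before) && before[$1] != $2 { print $1 }' "$1" "$2"
}

# Example usage (snapshot file names are illustrative):
#   snap='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.bootID}{"\n"}{end}'
#   oc get nodes -o jsonpath="$snap" > before.txt
#   oc scale machineset/<name> --replicas=<n> -n openshift-machine-api
#   oc get nodes -o jsonpath="$snap" > after.txt
#   rebooted_nodes before.txt after.txt
```

Empty output would indicate that no pre-existing node rebooted, which is the outcome reported here.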

Comment 7 errata-xmlrpc 2022-03-10 16:32:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

