Bug 1960205 - The kubelet log flooded with reconcileState message once CPU manager enabled
Summary: The kubelet log flooded with reconcileState message once CPU manager enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
: 4.8.0
Assignee: Jiří Mencák
QA Contact: Walid A.
URL:
Whiteboard:
: 1960206 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-05-13 10:24 UTC by Artyom
Modified: 2021-07-27 23:08 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:08:23 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes pull 771 0 None open Bug 1960205: UPSTREAM: <carry>: verbosity of managed workloads logging 2021-05-25 13:59:50 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:08:37 UTC

Description Artyom 2021-05-13 10:24:51 UTC
Description of problem:
Our recent change to support managed workloads, introduced a log message under the CPU manager reconcile loop that flood the kubelet log.

May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.516646 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: node-ca-clqfn)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.516543 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: service-ca-5d4d467cc6-k4pmz)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.516406 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: openshift-config-operator-5755486445-ndcj7)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.515829 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: openshift-state-metrics-576bf897b6-mm74t)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.470596 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: alertmanager-main-0)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.470583 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: apiserver-84dcc57bbb-mvfwf)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.470569 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: network-operator-655cff7b5c-82vgj)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.470555 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: openshift-controller-manager-operator-7f7d985788-kgqps)

Version-Release number of selected component (if applicable):
master

How reproducible:
Always

Steps to Reproduce:
1. Configure the kubelet with the CPU manager static policy
2. Check kubelet logs
3.

Actual results:
The log is flooded with
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.516646 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: node-ca-clqfn)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.516543 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: service-ca-5d4d467cc6-k4pmz)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.516406 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: openshift-config-operator-5755486445-ndcj7)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.515829 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: openshift-state-metrics-576bf897b6-mm74t)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.470596 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: alertmanager-main-0)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.470583 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: apiserver-84dcc57bbb-mvfwf)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.470569 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: network-operator-655cff7b5c-82vgj)
May 08 00:18:10 cnfde5.ptp.lab.eng.bos.redhat.com hyperkube[1815127]: W0508 00:18:10.470555 1815127 cpu_manager.go:375] [cpumanager] reconcileState: skipping pod; pod is managed (pod: openshift-controller-manager-operator-7f7d985788-kgqps)

Expected results:
We should reduce the log verbosity or remove it at all.

Additional info:

Comment 1 Ryan Phillips 2021-05-13 15:17:11 UTC
*** Bug 1960206 has been marked as a duplicate of this bug. ***

Comment 3 Walid A. 2021-06-02 04:46:23 UTC
On OCP 4.8 IPI cluster on AWS, 4.8.0-0.nightly-2021-06-01-043518, I am still seeing several thousands of messages with CPU manager enabled:

sh-4.4# 
sh-4.4# journalctl --no-pager --since yesterday | grep -c "cpu_manager"
33263
sh-4.4# 
sh-4.4# journalctl --no-pager --since yesterday | grep -c "cpu_manager.go:412"
33242
sh-4.4# 
sh-4.4# journalctl --no-pager | grep -c "cpu_manager.go:412"
33260
sh-4.4# 
sh-4.4# journalctl --no-pager | grep -c "cpu_manager.go:412] \"ReconcileState: ignoring terminated container"
33326
sh-4.4# 
sh-4.4# journalctl --no-pager -f
-- Logs begin at Tue 2021-06-01 13:59:02 UTC. --
Jun 02 04:27:52 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:52.328126    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="aa63191d5ebbdd39d28e28f42813723663b0fbfc45705664e94bf973d678ed65"
Jun 02 04:27:52 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:52.328139    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="349ffa6e1d82d090d954b5b63a68996cacf21e75f7f39eff75a0455ddd698b1f"
Jun 02 04:27:52 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:52.328152    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="8ed635204e177dc16bef900dae11ebbba332615a47b182e13785810cfb1dade3"
Jun 02 04:27:52 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:52.328167    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="227fc478587b72b1531cd90eb9e20c9f87d87a403b824f37f6be983a303e0d82"
Jun 02 04:27:57 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:57.857372    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-monitoring/node-exporter-95llb" containerID="c1e1f6232872768f0fec6677a3910d912ef3d81903cfd0302d0ad029b45fd94e"
Jun 02 04:27:58 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:58.208880    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="783c24dcb86857d3e7a2ad9c5edf32cc567919e40c7465e687eeff48df3ddb0d"
Jun 02 04:27:58 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:58.208912    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="aa63191d5ebbdd39d28e28f42813723663b0fbfc45705664e94bf973d678ed65"
Jun 02 04:27:58 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:58.208931    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="349ffa6e1d82d090d954b5b63a68996cacf21e75f7f39eff75a0455ddd698b1f"
Jun 02 04:27:58 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:58.208946    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="8ed635204e177dc16bef900dae11ebbba332615a47b182e13785810cfb1dade3"
Jun 02 04:27:58 ip-10-0-149-94 hyperkube[1395]: I0602 04:27:58.208961    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="227fc478587b72b1531cd90eb9e20c9f87d87a403b824f37f6be983a303e0d82"
Jun 02 04:28:03 ip-10-0-149-94 hyperkube[1395]: I0602 04:28:03.676608    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-monitoring/node-exporter-95llb" containerID="c1e1f6232872768f0fec6677a3910d912ef3d81903cfd0302d0ad029b45fd94e"
Jun 02 04:28:04 ip-10-0-149-94 hyperkube[1395]: I0602 04:28:04.061887    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="783c24dcb86857d3e7a2ad9c5edf32cc567919e40c7465e687eeff48df3ddb0d"
Jun 02 04:28:04 ip-10-0-149-94 hyperkube[1395]: I0602 04:28:04.061917    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="aa63191d5ebbdd39d28e28f42813723663b0fbfc45705664e94bf973d678ed65"
Jun 02 04:28:04 ip-10-0-149-94 hyperkube[1395]: I0602 04:28:04.061931    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="349ffa6e1d82d090d954b5b63a68996cacf21e75f7f39eff75a0455ddd698b1f"
Jun 02 04:28:04 ip-10-0-149-94 hyperkube[1395]: I0602 04:28:04.061946    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="8ed635204e177dc16bef900dae11ebbba332615a47b182e13785810cfb1dade3"
Jun 02 04:28:04 ip-10-0-149-94 hyperkube[1395]: I0602 04:28:04.061959    1395 cpu_manager.go:412] "ReconcileState: ignoring terminated container" pod="openshift-multus/multus-additional-cni-plugins-sgbgb" containerID="227fc478587b72b1531cd90eb9e20c9f87d87a403b824f37f6be983a303e0d82"
^C
sh-4.4#

Comment 4 Jiří Mencák 2021-06-02 05:53:10 UTC
Thanks for looking into this, Walid.  This is expected and I also noticed and discussed this particular log message with Artyom.
For the moment we decided to address only the '[cpumanager] reconcileState: skipping pod; pod is managed ' message,
introduced in the OpenShift-specific (openshift/kubernetes) repo by https://github.com/openshift/kubernetes/pull/627

The messages you are seeing have not been introduced by PR#627 and could potentially be addressed in the upstream kubernetes/kubernetes repo.
However, this particular BZ report is not about them.

Comment 7 errata-xmlrpc 2021-07-27 23:08:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438


Note You need to log in before you can comment on or make changes to this bug.