Bug 2060997

Summary: should support sysctls: kernel.shm_rmid_forced = 0
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: Node Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED WORKSFORME Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aos-bugs, pehunt, sippy
Version: 4.10   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-07 17:13:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description W. Trevor King 2022-03-04 21:11:37 UTC
[sig-node] Sysctls [LinuxOnly] [NodeConformance] should support sysctls [MinimumKubeletVersion:1.21] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]

is failing frequently in CI, see:
https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=%5Bsig-node%5D%20Sysctls%20%5BLinuxOnly%5D%20%5BNodeConformance%5D%20should%20support%20sysctls%20%5BMinimumKubeletVersion%3A1.21%5D%20%5BConformance%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%2Fminimal%5D%20%5BSuite%3Ak8s%5D

Plenty of failures; seems like they're 4.10 and later:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=build-log&search=kernel.shm_rmid_forced+=+0' | grep 'failures match' | grep -v 'pull-ci\|rehearse' | sort
periodic-ci-openshift-hypershift-main-periodics-e2e-aws-pooled-periodic-conformance (all) - 11 runs, 45% failed, 40% of failures match = 18% impact
periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-e2e-aws-arm64 (all) - 6 runs, 33% failed, 50% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-e2e-aws-arm64-single-node (all) - 6 runs, 83% failed, 20% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-e2e-aws-arm64-techpreview (all) - 6 runs, 33% failed, 100% of failures match = 33% impact
periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-e2e-aws-ovn-arm64 (all) - 6 runs, 50% failed, 33% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-e2e-remote-libvirt-s390x (all) - 5 runs, 100% failed, 20% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.10-upgrade-from-nightly-4.9-ocp-e2e-aws-arm64 (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-remote-libvirt-ppc64le (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-aws-ovn (all) - 6 runs, 50% failed, 67% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-aws-ovn-upgrade (all) - 120 runs, 48% failed, 42% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview (all) - 6 runs, 83% failed, 40% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn (all) - 6 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 60 runs, 65% failed, 46% of failures match = 30% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview (all) - 6 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade-single-node (all) - 6 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp (all) - 12 runs, 33% failed, 50% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-ovn (all) - 6 runs, 83% failed, 40% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview (all) - 6 runs, 33% failed, 50% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade (all) - 132 runs, 42% failed, 45% of failures match = 19% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade (all) - 132 runs, 61% failed, 29% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade (all) - 12 runs, 58% failed, 29% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-azure-upgrade (all) - 120 runs, 93% failed, 21% of failures match = 19% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-upgrade (all) - 60 runs, 55% failed, 48% of failures match = 27% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn-upgrade (all) - 130 runs, 46% failed, 45% of failures match = 21% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-gcp (all) - 14 runs, 36% failed, 60% of failures match = 21% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade (all) - 143 runs, 52% failed, 34% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade (all) - 145 runs, 47% failed, 37% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade (all) - 13 runs, 23% failed, 67% of failures match = 15% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade (all) - 131 runs, 79% failed, 17% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws (all) - 12 runs, 67% failed, 75% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-fips (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node (all) - 6 runs, 83% failed, 40% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-upgrade (all) - 66 runs, 58% failed, 42% of failures match = 24% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-azure (all) - 6 runs, 83% failed, 40% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp (all) - 6 runs, 50% failed, 67% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp-rt (all) - 6 runs, 33% failed, 50% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi (all) - 15 runs, 60% failed, 56% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-dualstack (all) - 6 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-ipv6 (all) - 18 runs, 83% failed, 27% of failures match = 22% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 5 runs, 80% failed, 25% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-virtualmedia (all) - 6 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-ovirt (all) - 13 runs, 77% failed, 10% of failures match = 8% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview (all) - 10 runs, 60% failed, 17% of failures match = 10% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-upi (all) - 10 runs, 60% failed, 33% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade (all) - 6 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-metal-ipi-upgrade (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
promote-release-openshift-machine-os-content-e2e-aws-4.10 (all) - 174 runs, 1% failed, 100% of failures match = 1% impact
promote-release-openshift-machine-os-content-e2e-aws-4.11 (all) - 161 runs, 2% failed, 75% of failures match = 2% impact
release-openshift-ocp-installer-e2e-aws-upi-4.10 (all) - 6 runs, 83% failed, 20% of failures match = 17% impact
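
To rank jobs like those listed above by impact, the summary lines can be parsed with a short script. This is only a sketch: the regular expression is inferred from the line shape shown above, and the names parse_line and approx_impact are hypothetical helpers, not part of any CI tooling. The reported impact is close to the failure rate multiplied by the match rate, but because the listed percentages are rounded, the product can differ from the reported figure by about a point.

```python
import re

# Pattern inferred from the "failures match" lines in this report (an assumption).
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


def parse_line(line):
    """Parse one summary line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "job": m["job"],
        "runs": int(m["runs"]),
        "failed_pct": int(m["failed"]),
        "match_pct": int(m["match"]),
        "impact_pct": int(m["impact"]),
    }


def approx_impact(rec):
    """Approximate impact as failure rate times match rate (rounded inputs)."""
    return rec["failed_pct"] * rec["match_pct"] / 100


# Three lines copied from the listing above.
sample = [
    "periodic-ci-openshift-hypershift-main-periodics-e2e-aws-pooled-periodic-conformance (all) - 11 runs, 45% failed, 40% of failures match = 18% impact",
    "periodic-ci-openshift-release-master-nightly-4.10-e2e-aws (all) - 12 runs, 67% failed, 75% of failures match = 50% impact",
    "promote-release-openshift-machine-os-content-e2e-aws-4.10 (all) - 174 runs, 1% failed, 100% of failures match = 1% impact",
]

records = [r for r in (parse_line(s) for s in sample) if r is not None]
ranked = sorted(records, key=lambda r: r["impact_pct"], reverse=True)

for rec in ranked:
    print(rec["impact_pct"], rec["runs"], rec["job"])
```

Sorting by the reported impact rather than the recomputed product avoids compounding the rounding already present in the listing.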

Picking periodic-ci-openshift-release-master-nightly-4.1*-e2e-aws as a core job with a bunch of runs and a high failure rate for this mode, [1] gives recent hits, including [2]:

  : [sig-node] Sysctls [LinuxOnly] [NodeConformance] should support sysctls [MinimumKubeletVersion:1.21] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
  Run #0: Failed	13s
  fail [k8s.io/kubernetes.0/test/e2e/common/node/sysctl.go:112]: Expected
      <string>: kernel.shm_rmid_forced = 0
    
  to contain substring
      <string>: kernel.shm_rmid_forced = 1
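
The failure message above is a substring assertion: the output read back from the pod reported the sysctl as 0, while the test expected it to contain the value 1. A minimal illustration of that kind of check follows; the function names are hypothetical and this is not the upstream test code, only a sketch of the comparison it reports.

```python
def sysctl_line(name, value):
    """Format a sysctl the way it appears in the failure message: 'name = value'."""
    return f"{name} = {value}"


def output_has_sysctl(output, name, expected):
    """Return True if the captured output contains 'name = expected' as a substring."""
    return sysctl_line(name, expected) in output


# Output as reported in the failing run above.
observed = "kernel.shm_rmid_forced = 0\n"

if not output_has_sysctl(observed, "kernel.shm_rmid_forced", 1):
    print("sysctl was not applied: expected", sysctl_line("kernel.shm_rmid_forced", 1))
```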

[3] gives brackets on the regression; more on that in a future comment.

[1]: https://search.ci.openshift.org/?search=kernel.shm_rmid_forced+%3D+0&maxAge=48h&type=junit&name=periodic-ci-openshift-release-master-nightly-4.1%5B0-9%5D*-e2e-aws%24
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws/1499822740935282688
[3]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-nightly-4.10-e2e-aws

Comment 1 Peter Hunt 2022-03-04 21:17:22 UTC
This should have been fixed by cri-o-1.23.1-12.rhaos4.10.git1607c6e.el8