Bug 1905492
| Summary: | The stalld service has a higher scheduler priority than ksoftirq and rcu{b, c} threads | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Artyom <alukiano> | |
| Component: | Performance Addon Operator | Assignee: | Martin Sivák <msivak> | |
| Status: | CLOSED ERRATA | QA Contact: | Gowrishankar Rajaiyan <grajaiya> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.7 | CC: | aos-bugs, grajaiya, mniranja, shajmakh | |
| Target Milestone: | --- | |||
| Target Release: | 4.7.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | No Doc Update | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1960386 1985365 | Environment: | ||
| Last Closed: | 2021-02-24 15:41:14 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1960386, 1985371 | |||
Version:
--------
[root@dell-r730-009 cnf-internal-deploy]# oc version
Client Version: 4.7.0-0.nightly-2020-12-04-013308
Server Version: 4.7.0-0.nightly-2020-12-21-131655
Kubernetes Version: v1.20.0+87544c5

Version of Performance Operator:
---------------------------------
rh-osbs/openshift4-performance-addon-operator-bundle-registry-container-rhel8:v4.7.0-285
Index image v4.7: registry-proxy.engineering.redhat.com/rh-osbs/iib:34633

[root@dell-r730-009 cnf-internal-deploy]# oc logs pods/performance-operator-767ddb449-lknqr -n openshift-performance-addon-operator
I1228 07:50:49.908341 1 main.go:72] Operator Version:
I1228 07:50:49.908453 1 main.go:73] Git Commit:
I1228 07:50:49.908461 1 main.go:74] Build Date: 2020-12-22T13:00:05+0000
I1228 07:50:49.908475 1 main.go:75] Go Version: go1.13.15
I1228 07:50:49.908481 1 main.go:76] Go OS/Arch: linux/amd64
I1228 07:50:50.962708 1 request.go:621] Throttling request took 1.036451752s, request: GET:https://172.30.0.1:443/apis/tuned.openshift.io/v1?timeout=32s

Deployed the performance profile and checked the tuned profile pod; it contains the relevant information.

[main]
summary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)
include=openshift-node,cpu-partitioning

[variables]
isolated_cores=2-15
not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}

[cpu]
force_latency=cstate.id:1|3 # latency-performance (override)
governor=performance # latency-performance
energy_perf_bias=performance # latency-performance
min_perf_pct=100 # latency-performance

[service]
service.stalld=start,enable

[vm]
transparent_hugepages=never # network-latency

[scheduler]
group.ksoftirqd=0:f:11:*:ksoftirqd.*
group.rcuc=0:f:11:*:rcuc.*

Marking it verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633
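
The two `[scheduler]` entries in the tuned profile quoted above are the part of the profile that concerns the ksoftirqd and rcuc threads. As a minimal sketch for reading them, assuming the tuned scheduler plugin's field order is `group.<name>=<rule_prio>:<sched>:<prio>:<affinity>:<regex>` and that `f` stands for SCHED_FIFO, the lines can be decoded as below. `GroupRule` and `parse_group_rule` are hypothetical helpers written for this illustration; they are not part of tuned, and the regex matching via `re.search` is likewise an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Policy letters as assumed for the tuned scheduler plugin (illustrative mapping).
POLICIES = {
    "f": "SCHED_FIFO",
    "r": "SCHED_RR",
    "o": "SCHED_OTHER",
    "b": "SCHED_BATCH",
    "i": "SCHED_IDLE",
    "*": "unchanged",
}


@dataclass
class GroupRule:
    name: str
    rule_prio: int
    policy: str
    prio: Optional[int]  # None when the profile uses '*' (leave unchanged)
    affinity: str
    regex: str

    def matches(self, comm: str) -> bool:
        # Assumption: the rule regex is applied to the thread name with re.search.
        return re.search(self.regex, comm) is not None


def parse_group_rule(line: str) -> GroupRule:
    """Parse one 'group.<name>=a:b:c:d:regex' line (hypothetical helper)."""
    key, value = line.strip().split("=", 1)
    if not key.startswith("group."):
        raise ValueError(f"not a group rule: {line!r}")
    # Split at most four times so that colons inside the regex survive.
    rule_prio, sched, prio, affinity, regex = value.split(":", 4)
    return GroupRule(
        name=key[len("group."):],
        rule_prio=int(rule_prio),
        policy=POLICIES.get(sched, sched),
        prio=None if prio == "*" else int(prio),
        affinity=affinity,
        regex=regex,
    )


for text in ("group.ksoftirqd=0:f:11:*:ksoftirqd.*", "group.rcuc=0:f:11:*:rcuc.*"):
    rule = parse_group_rule(text)
    print(rule.name, rule.policy, rule.prio)
```

Under these assumptions both rules request FIFO scheduling at priority 11 and leave CPU affinity untouched (`*`).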
Description of problem:
The stalld service has a higher scheduler priority than the ksoftirqd and rcu{b, c} threads, which can lead to a system freeze when running load-intensive processes such as oslat.

Version-Release number of selected component (if applicable):
Client Version: 4.6.0-0.nightly-2020-07-25-091217
Server Version: 4.7.0-0.ci-2020-12-08-050547
Kubernetes Version: v1.19.2-1007+ad738ba548b6d6-dirty

How reproducible:
Always

Steps to Reproduce:
1. Run the oslat tool on the node to create pressure on the real-time kernel environment.

Actual results:
The host freezes until oslat finishes running.

Expected results:
The host should continue to work as expected.

Additional info:
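
The condition described in this report can be checked on a node by comparing the real-time priority of stalld with that of the ksoftirqd and rcu{b, c} threads. The following is a rough sketch only, assuming a Linux host with Python 3 and assuming the threads can be recognised by the `comm` names `stalld`, `ksoftirqd/N`, `rcuc/N` and `rcub/N`. The function names `rt_priorities` and `threads_below_stalld` are hypothetical and exist only for this illustration.

```python
import os

# Thread-name prefixes assumed for the kernel threads named in this report.
KERNEL_PREFIXES = ("ksoftirqd/", "rcuc/", "rcub/")


def rt_priorities():
    """Yield (comm, rt_priority) for each process listed in /proc (Linux only)."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/comm") as fh:
                comm = fh.read().strip()
            prio = os.sched_getparam(pid).sched_priority
        except OSError:
            continue  # the process exited or could not be inspected
        yield comm, prio


def threads_below_stalld(samples, stalld_name="stalld"):
    """Return the kernel threads whose priority is lower than stalld's.

    `samples` is an iterable of (comm, rt_priority) pairs. An empty result
    means either that stalld is not running or that none of the listed
    kernel threads sits below it.
    """
    samples = list(samples)
    stalld = [prio for comm, prio in samples if comm == stalld_name]
    if not stalld:
        return []
    ceiling = max(stalld)
    return sorted(
        (comm, prio)
        for comm, prio in samples
        if comm.startswith(KERNEL_PREFIXES) and prio < ceiling
    )


# On a node, the live check would be: print(threads_below_stalld(rt_priorities()))
```

The comparison is kept separate from the `/proc` scan so that it can also be run against priorities collected by other means, for example from saved `ps` output.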