Bug 1900196

Summary: stalld is not restarted after crash
Product: OpenShift Container Platform
Component: Node Tuning Operator
Reporter: Martin Sivák <msivak>
Assignee: Jiří Mencák <jmencak>
QA Contact: Simon <skordas>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 4.6.z
CC: sejug
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2021-02-24 15:35:07 UTC
Type: Bug
Bug Blocks: 1900261

Description Martin Sivák 2020-11-21 10:21:14 UTC
Description of problem:

The systemd unit file shipped with the Node Tuning Operator (NTO) does not configure stalld to be restarted if it exits.

This is a problem when the node becomes overloaded or stalld crashes: the user starts seeing NMIs and there is no automatic recovery. The node has to be rebooted, or the user must run "systemctl start stalld" on the node by hand.
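For illustration only (not part of NTO or systemd), the desired supervision semantics can be sketched in a few lines of Python; the supervise helper below is hypothetical:

```python
import subprocess

def supervise(cmd, restarts=2):
    """Respawn `cmd` every time it exits, `restarts` times in total.

    A toy model of systemd's Restart=always policy: any exit, clean
    or crashed, triggers a respawn. Returns the PID of each
    incarnation so the restarts are observable.
    """
    pids = []
    for _ in range(restarts + 1):
        proc = subprocess.Popen(cmd)
        pids.append(proc.pid)
        proc.terminate()  # simulate the daemon dying
        proc.wait()
    return pids

# Every "kill" is followed by a fresh process with a new PID, which
# is exactly what the verification transcript checks for stalld.
print(supervise(["sleep", "60"]))
```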

Version-Release number of selected component (if applicable):

OCP 4.6.3

How reproducible:

kill stalld process, observe it stays down

Expected results:

systemd always restarts stalld, except when it is stopped intentionally by the user.
Use the Restart=always unit option to request this behavior.
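The requested policy could be expressed in the unit file (or a drop-in) roughly as follows; this is a minimal sketch, and the drop-in path and RestartSec value are illustrative, not what NTO actually ships:

```ini
# /etc/systemd/system/stalld.service.d/10-restart.conf (illustrative path)
[Service]
# Respawn stalld on any exit, clean or crashed.
# An explicit "systemctl stop stalld" still keeps it stopped.
Restart=always
# Optional: pause briefly before respawning to avoid a tight crash loop.
RestartSec=5
```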

Comment 2 Simon 2020-11-24 18:54:55 UTC
$ oc get clusterversions.config.openshift.io
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-24-113830   True        False         75m     Cluster version is 4.7.0-0.nightly-2020-11-24-113830
$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas2411.qe.devcluster.openshift.com:6443".

$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-149-21.us-east-2.compute.internal    Ready    master   101m   v1.19.2+13d6aa9
ip-10-0-159-175.us-east-2.compute.internal   Ready    worker   94m    v1.19.2+13d6aa9
ip-10-0-170-112.us-east-2.compute.internal   Ready    master   101m   v1.19.2+13d6aa9
ip-10-0-177-142.us-east-2.compute.internal   Ready    worker   92m    v1.19.2+13d6aa9
ip-10-0-210-52.us-east-2.compute.internal    Ready    master   101m   v1.19.2+13d6aa9
ip-10-0-223-201.us-east-2.compute.internal   Ready    worker   93m    v1.19.2+13d6aa9

$ # Using worker node
$ node=ip-10-0-159-175.us-east-2.compute.internal
$ echo $node

$ oc label node $node node-role.kubernetes.io/worker-rt=
node/ip-10-0-159-175.us-east-2.compute.internal labeled

$ oc create -f- <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-rt
  labels:
    worker-rt: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-rt]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-rt: ""
EOF
machineconfigpool.machineconfiguration.openshift.io/worker-rt created

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-realtime
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift realtime profile
      include=openshift-node,realtime
      [variables]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
    name: openshift-realtime
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-rt"
    priority: 20
    profile: openshift-realtime
EOF
tuned.tuned.openshift.io/openshift-realtime created

$ oc get nodes
NAME                                         STATUS   ROLES              AGE    VERSION
ip-10-0-149-21.us-east-2.compute.internal    Ready    master             115m   v1.19.2+13d6aa9
ip-10-0-159-175.us-east-2.compute.internal   Ready    worker,worker-rt   107m   v1.19.2+13d6aa9
ip-10-0-170-112.us-east-2.compute.internal   Ready    master             114m   v1.19.2+13d6aa9
ip-10-0-177-142.us-east-2.compute.internal   Ready    worker             106m   v1.19.2+13d6aa9
ip-10-0-210-52.us-east-2.compute.internal    Ready    master             114m   v1.19.2+13d6aa9
ip-10-0-223-201.us-east-2.compute.internal   Ready    worker             106m   v1.19.2+13d6aa9
$ oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master      rendered-master-7fc779fc3075d82c9dc6e66f4a7da331      True      False      False      3              3                   3                     0                      114m
worker      rendered-worker-db825c5f533a49125e760e8a24e1be69      True      False      False      2              2                   2                     0                      114m
worker-rt   rendered-worker-rt-ba47b802db57ed1656d6aa35b68f6aee   True      False      False      1              1                   1                     0                      10m

$ oc debug node/$node
Starting pod/ip-10-0-159-175us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP:
If you don't see a command prompt, try pressing enter.
sh-4.4# ps auxww | grep stalld
root        3425  0.5  0.0   8140  2596 ?        Ss   18:37   0:02 /usr/local/bin/stalld -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stal
root        8765  0.0  0.0   9184  1080 pts/0    S+   18:46   0:00 grep stalld
sh-4.4# kill 3425
sh-4.4# ps auxww | grep stalld
root       10691  0.7  0.0   7568  2348 ?        Ss   18:49   0:00 /usr/local/bin/stalld -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid
root       10765  0.0  0.0   9184   976 pts/0    S+   18:49   0:00 grep stalld
sh-4.4# kill 10691
sh-4.4# ps auxww | grep stalld
root       11127  1.0  0.0   7260  2396 ?        Ss   18:50   0:00 /usr/local/bin/stalld -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid
root       11148  0.0  0.0   9184  1092 pts/0    S+   18:50   0:00 grep stalld
sh-4.4# ps auxww | grep stalld
root       11127  0.7  0.0   7568  2396 ?        Ss   18:50   0:00 /usr/local/bin/stalld -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid
root       11167  0.0  0.0   9184  1036 pts/0    S+   18:50   0:00 grep stalld
sh-4.4# exit

Removing debug pod ...
$ # ^^ stalld comes back with a new PID after each kill, so systemd restarted it

Comment 5 errata-xmlrpc 2021-02-24 15:35:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.