Bug 1900666
| Summary: | Increased etcd fsync latency as of OCP 4.6 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.6 | CC: | amurdaca, jeder, jhopper, jhou, kgarriso, miabbott, mifiedle, nelluri, oarribas, sbatsche, sdodson, wking, wlewis |
| Target Milestone: | --- | Keywords: | Performance, Regression, ServiceDeliveryBlocker |
| Target Release: | 4.6.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-14 13:51:18 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1899600 | | |
| Bug Blocks: | | | |

Comment 3
Sam Batschelet
2020-11-30 17:09:16 UTC
Verified on 4.6.0-0.nightly-2020-12-04-033739. Scheduler is set to bfq on master

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-12-04-033739   True        False         12m     Cluster version is 4.6.0-0.nightly-2020-12-04-033739
$ oc get nodes | grep master
ip-10-0-157-110.us-west-2.compute.internal   Ready   master   37m   v1.19.0+1348ff8
ip-10-0-161-30.us-west-2.compute.internal    Ready   master   37m   v1.19.0+1348ff8
ip-10-0-216-159.us-west-2.compute.internal   Ready   master   37m   v1.19.0+1348ff8
$ oc debug node/ip-10-0-157-110.us-west-2.compute.internal
Starting pod/ip-10-0-157-110us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq
```

I also tested this with 4.6.7 on GCP.
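In these transcripts the kernel lists every available I/O scheduler for a device and marks the active one with square brackets (e.g. `[none] mq-deadline kyber bfq`). A minimal sketch of extracting the active entry from such a line; the `active_scheduler` helper name is illustrative, not part of any OpenShift tooling:

```shell
# Extract the bracketed (active) entry from a scheduler line such as
# "[none] mq-deadline kyber bfq" or "mq-deadline kyber [bfq] none".
active_scheduler() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

active_scheduler "[none] mq-deadline kyber bfq"   # prints: none
active_scheduler "mq-deadline kyber [bfq] none"   # prints: bfq
```

The same one-liner works directly against sysfs, e.g. `active_scheduler "$(cat /sys/block/nvme0n1/queue/scheduler)"`.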
The PR switches the scheduler on *all* nodes during OS updates, including when switching to kernel-rt.
```
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.7 True False 2m5s Cluster version is 4.6.7
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ci-ln-rq1mq4b-f76d1-xf849-master-0 Ready master 40m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-master-1 Ready master 40m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-master-2 Ready master 40m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-b-tmklt Ready worker 30m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4 Ready worker 30m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-d-tccv4 Ready worker 34m v1.19.0+1348ff8
$ cat machineConfigs/worker-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: "worker"
name: 99-worker-kerneltype
spec:
kernelType: realtime
$ oc apply -f machineConfigs/worker-realtime.yaml
machineconfig.machineconfiguration.openshift.io/99-worker-kerneltype created
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ci-ln-rq1mq4b-f76d1-xf849-master-0 Ready master 67m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-master-1 Ready master 67m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-master-2 Ready master 67m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-b-tmklt Ready worker 57m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4 Ready,SchedulingDisabled worker 57m v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-d-tccv4 Ready worker 61m v1.19.0+1348ff8
$ oc debug nodes/ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4
Starting pod/ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.32.4
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
sh-4.4# watch cat /sys/block/sda/queue/scheduler
sh-4.4# cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none
sh-4.4#
Removing debug pod ...
```
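The per-node checks above can be generalized. A hedged sketch that builds the same `oc debug` invocation for any node and device; `scheduler_cmd` is a hypothetical helper, and the device name is an assumption that varies by platform (`sda` on GCP in the transcript above, `nvme0n1` on AWS):

```shell
# Build the oc debug command used above to read a node's I/O scheduler.
# Node name and device are parameters; the device differs per platform.
scheduler_cmd() {
  node=$1 dev=$2
  printf 'oc debug node/%s -- chroot /host cat /sys/block/%s/queue/scheduler\n' "$node" "$dev"
}

scheduler_cmd ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4 sda
```

Piping the list from `oc get nodes -o name` through such a helper is one way to spot-check the scheduler across a whole cluster after an update.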
Verified on 4.6.7 promoted candidate. Inside etcd pod on 4.6.6:

```
# cat /sys/class/block/nvme0n1/queue/scheduler
mq-deadline kyber [bfq] none
```

4.6.7:

```
# cat /sys/class/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.6.8 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5259

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475