Bug 1900666
| Summary: | Increased etcd fsync latency as of OCP 4.6 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.6 | CC: | amurdaca, jeder, jhopper, jhou, kgarriso, miabbott, mifiedle, nelluri, oarribas, sbatsche, sdodson, wking, wlewis |
| Target Milestone: | --- | Keywords: | Performance, Regression, ServiceDeliveryBlocker |
| Target Release: | 4.6.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-14 13:51:18 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1899600 | | |
| Bug Blocks: | | | |
Comment 3
Sam Batschelet
2020-11-30 17:09:16 UTC
Verified on 4.6.0-0.nightly-2020-12-04-033739. The scheduler is not set to bfq on the master; the active scheduler, shown in brackets, is none.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-12-04-033739   True        False         12m     Cluster version is 4.6.0-0.nightly-2020-12-04-033739

$ oc get nodes | grep master
ip-10-0-157-110.us-west-2.compute.internal   Ready   master   37m   v1.19.0+1348ff8
ip-10-0-161-30.us-west-2.compute.internal    Ready   master   37m   v1.19.0+1348ff8
ip-10-0-216-159.us-west-2.compute.internal   Ready   master   37m   v1.19.0+1348ff8

$ oc debug node/ip-10-0-157-110.us-west-2.compute.internal
Starting pod/ip-10-0-157-110us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq

I also tested this with 4.6.7 on GCP. The PR switches the scheduler on *all* nodes during OS updates, including switching to kernel-rt.

```
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.7     True        False         2m5s    Cluster version is 4.6.7

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-rq1mq4b-f76d1-xf849-master-0         Ready    master   40m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-master-1         Ready    master   40m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-master-2         Ready    master   40m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-b-tmklt   Ready    worker   30m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4   Ready    worker   30m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-d-tccv4   Ready    worker   34m   v1.19.0+1348ff8

$ cat machineConfigs/worker-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kerneltype
spec:
  kernelType: realtime

$ oc apply -f machineConfigs/worker-realtime.yaml
machineconfig.machineconfiguration.openshift.io/99-worker-kerneltype created

$ oc get nodes
NAME                                       STATUS                     ROLES    AGE   VERSION
ci-ln-rq1mq4b-f76d1-xf849-master-0         Ready                      master   67m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-master-1         Ready                      master   67m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-master-2         Ready                      master   67m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-b-tmklt   Ready                      worker   57m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4   Ready,SchedulingDisabled   worker   57m   v1.19.0+1348ff8
ci-ln-rq1mq4b-f76d1-xf849-worker-d-tccv4   Ready                      worker   61m   v1.19.0+1348ff8

$ oc debug nodes/ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4
Starting pod/ci-ln-rq1mq4b-f76d1-xf849-worker-c-hqln4-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.32.4
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
sh-4.4# watch cat /sys/block/sda/queue/scheduler
sh-4.4# cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none
sh-4.4#
Removing debug pod ...
```

Verified on 4.6.7 promoted candidate. Inside the etcd pod:

4.6.6:
# cat /sys/class/block/nvme0n1/queue/scheduler
mq-deadline kyber [bfq] none

4.6.7:
# cat /sys/class/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.8 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5259

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
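
The verification steps above read `/sys/block/<dev>/queue/scheduler` by hand and look for the bracketed entry, which is how the kernel marks the active scheduler. A minimal sketch of automating that check on a node follows. The function names `active_scheduler` and `report_schedulers` and the `SYSFS_BLOCK` override variable are hypothetical helpers introduced here for illustration; they are not part of the fix or of any OpenShift tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the active I/O scheduler given the contents of a
# /sys/block/<dev>/queue/scheduler file. The kernel wraps the active scheduler
# in square brackets, e.g. "mq-deadline kyber [bfq] none".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: report the active scheduler for every block device
# that exposes a scheduler file. SYSFS_BLOCK is an assumed override for the
# sysfs root, useful only for trying the function against a test directory.
report_schedulers() {
    for f in "${SYSFS_BLOCK:-/sys/block}"/*/queue/scheduler; do
        [ -r "$f" ] || continue          # skip unmatched globs and unreadable files
        dev=${f%/queue/scheduler}        # strip the trailing path components
        dev=${dev##*/}                   # keep only the device name
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}
```

Calling `report_schedulers` from a shell on the host (for example after `chroot /host` in an `oc debug` session) would print one `device: scheduler` line per block device, which makes it easier to compare masters and workers before and after an update.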