Bug 1546097 - Master controllers are using high amount of CPU after upgrade to 3.7
Summary: Master controllers are using high amount of CPU after upgrade to 3.7
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.7.0
Hardware: All
OS: Linux
high
high
Target Milestone: ---
: 3.9.0
Assignee: Maciej Szulik
QA Contact: Wang Haoran
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-02-16 10:09 UTC by Sunil Choudhary
Modified: 2018-06-27 18:02 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
undefined
Clone Of:
Environment:
Last Closed: 2018-06-27 18:01:32 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1515058 None CLOSED Scheduled image import performed every 15 sec with scheduledImageImportMinimumIntervalSeconds: 30 2019-09-16 14:01:11 UTC
Red Hat Bugzilla 1515060 None CLOSED Scheduled image import performed on non scheduled tags 2019-09-16 14:01:11 UTC
Red Hat Bugzilla 1543446 None CLOSED Scheduled images from external registry randomly polled and triggered deployment by overriding default minimum interval ... 2019-09-16 14:01:11 UTC
Red Hat Product Errata RHSA-2018:2013 None None None 2018-06-27 18:02:05 UTC

Description Sunil Choudhary 2018-02-16 10:09:01 UTC
Description of problem:

- After updating from OCP 3.6 to OCP 3.7, OpenShift Master Controllers 
  processes is constantly consuming high CPU.
- At a time only one of the master have high CPU utilization.
- The master with active controller role always have high CPU usage. 
  And when any other master take active controller role (after restart of 
  masters-controllers service), the CPU usage on that master node spikes 
  and CPU usage drops on former master which had active controller role.
- I see "mvcc: required revision has been compacted" messages in logs. 
  However it does not appear to be related to the high CPU usage?

[...]
atomic-openshift-master-controllers[121072]: W0202 13:10:33.934019  121072 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/p
    kg/client/informers/informers_generated/internalversion/factory.go:72: watch of *rbac.Role ended with: etcdserver: mvcc: required revision has been compacted
atomic-openshift-master-controllers[121072]: W0202 13:11:15.941157  121072 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/p
    kg/client/informers/informers_generated/externalversions/factory.go:72: watch of *v1.StorageClass ended with: etcdserver: mvcc: required revision has been compacted
[...]

- Captured perf, pprof data and see openshift master controller using 
  around 52% CPU and around 45% CPU is idle.
- I see most of the time is being spent in 'wait.JitterUntil' and futex related calls.

Kernel version 3.10.0-693.11.6.el7
OpenShift version 3.7.23-1


From perf data:
--------------
Samples: 238K of event 'cpu-clock', Event count (approx.): 59744250000
  Children      Self  Command
+   51.73%    51.73%  openshift
+   45.06%    45.06%  swapper
+    1.39%     1.39%  etcd
     0.49%     0.49%  fluentd

Expanding openshift process:

Samples: 238K of event 'cpu-clock', Event count (approx.): 59744250000
  Children      Self  Command                                                                                   
-   51.73%    51.73%  openshift                                                                                 
   - 46.20% runtime.goexit                                         
      - 24.61% github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Until                        
         - 24.48% github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil               
            - 10.96% runtime.selectgo                                                                            
               - 10.32% runtime.selectgoImpl                                                                     
                  - 8.67% runtime.mcall                                                                           
                     - 8.61% runtime.park_m                                                                       
                        - 8.40% runtime.schedule                                                             
                           - 7.85% runtime.findrunnable                                                       
                              - 5.95% runtime.stopm                                                            
                                 - 5.49% runtime.notesleep                                                     
                                    - 5.21% runtime.futex                                                     
                                       - 4.51% system_call_fastpath                                           
                                          - 4.48% sys_futex                                                   
                                             - 4.42% do_futex                                                 
                                                - 4.36% futex_wait                                           
                                                   - 3.19% futex_wait_queue_me                                
                                                      - 3.04% schedule                                      
                                                         - 3.03% __schedule                                   
                                                              2.71% finish_task_switch                       
                                                   + 0.84% futex_wait_setup                                  
                              + 0.89% runtime.runqsteal                                                      
            - 8.25% github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.resetOrReuseTimer   
               - 7.95% time.(*Timer).Reset                                                                   
                  - 7.67% time.startTimer                                                                    
                     + 7.63% runtime.addtimer                                                                
            + 4.61% github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1   
      - 18.49% runtime.timerproc                                                           
         - 11.85% runtime.notetsleepg                                                                        
            - 5.28% runtime.notetsleep_internal                                                              
               - 4.79% runtime.futex                                                                          
                  - 4.14% system_call_fastpath                                                                
                     - 4.09% sys_futex                                                                       
                        - 3.91% do_futex                                                                       
                           - 3.82% futex_wait                                                                   
                              + 2.56% futex_wait_queue_me                                                        
                              + 0.68% futex_wait_setup                                                            
            + 4.98% runtime.systemstack                                                                           
            + 1.03% runtime.exitsyscall                                                                           
         + 5.41% time.sendTime                                                                                    
   + 2.24% 0x7f8f7b193eb0                                                                                         
   + 1.68% runtime.futex                                                                                           
     0.67% 0x5acacf0

From pprof data:
---------------
(pprof) top
15.08s of 26.54s total (56.82%)
Dropped 232 nodes (cum <= 0.13s)
Showing top 10 nodes out of 80 (cum >= 5.30s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     21.08s 79.43%  runtime.goexit
    14.53s 54.75% 54.75%     14.53s 54.75%  runtime.futex
     0.10s  0.38% 55.12%     11.08s 41.75%  github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
         0     0% 55.12%     11.08s 41.75%  github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Until
     0.18s  0.68% 55.80%      9.83s 37.04%  runtime.timerproc
     0.07s  0.26% 56.07%      9.48s 35.72%  runtime.notewakeup
     0.02s 0.075% 56.14%      9.45s 35.61%  runtime.futexwakeup
     0.11s  0.41% 56.56%      8.19s 30.86%  runtime.systemstack
     0.02s 0.075% 56.63%      5.33s 20.08%  runtime.notetsleepg
     0.05s  0.19% 56.82%      5.30s 19.97%  runtime.startm

(pprof) peek
23.55s of 26.54s total (88.73%)
Dropped 232 nodes (cum <= 0.13s)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
         0     0%     0%     21.08s 79.43%                | runtime.goexit
                                            11.05s 52.92% |   github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Until
                                             9.83s 47.08% |   runtime.timerproc
----------------------------------------------------------+-------------
                                             9.43s 64.90% |   runtime.futexwakeup
                                             5.10s 35.10% |   runtime.futexsleep
    14.53s 54.75% 54.75%     14.53s 54.75%                | runtime.futex
----------------------------------------------------------+-------------
                                            11.08s   100% |   github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Until
     0.10s  0.38% 55.12%     11.08s 41.75%                | github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
                                             4.71s 43.21% |   github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.resetOrReuseTimer
                                             3.97s 36.42% |   runtime.selectgo
                                             2.22s 20.37% |   github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
----------------------------------------------------------+-------------
                                            11.05s   100% |   runtime.goexit
         0     0% 55.12%     11.08s 41.75%                | github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Until
                                            11.08s   100% |   github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
----------------------------------------------------------+-------------
                                             9.83s   100% |   runtime.goexit
     0.18s  0.68% 55.80%      9.83s 37.04%                | runtime.timerproc
                                             5.33s 55.29% |   runtime.notetsleepg
                                             3.90s 40.46% |   time.sendTime
                                             0.22s  2.28% |   runtime.unlock
                                             0.14s  1.45% |   runtime.siftdownTimer
                                             0.05s  0.52% |   runtime.lock
----------------------------------------------------------+-------------
                                             5.17s 54.54% |   runtime.startm
                                             4.26s 44.94% |   runtime.addtimerLocked
                                             0.05s  0.53% |   runtime.exitsyscallfast_pidle
     0.07s  0.26% 56.07%      9.48s 35.72%                | runtime.notewakeup
                                             9.41s   100% |   runtime.futexwakeup
----------------------------------------------------------+-------------
                                             9.41s 99.58% |   runtime.notewakeup
                                             0.04s  0.42% |   runtime.unlock
     0.02s 0.075% 56.14%      9.45s 35.61%                | runtime.futexwakeup
                                             9.43s   100% |   runtime.futex

Comment 19 Maciej Szulik 2018-04-05 11:22:59 UTC
From checking the logs it looks like the problem has been addressed in the latest version of openshift (3.9). See the attached BZs for the exact details what has been fixed. Moving this bug to on-qa.

Comment 20 Wang Haoran 2018-04-08 10:43:02 UTC
Move to verify acorrding to the attached fixed bugs.

Comment 22 errata-xmlrpc 2018-06-27 18:01:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2013


Note You need to log in before you can comment on or make changes to this bug.