Bug 1442676
| Summary: | Flapping high load on the Openshift masters | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jaspreet Kaur <jkaur> |
| Component: | Networking | Assignee: | Dan Williams <dcbw> |
| Status: | CLOSED DUPLICATE | QA Contact: | Meng Bo <bmeng> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.4.0 | CC: | aos-bugs, bbennett, dcbw, jeder, ptalbert |
| Target Milestone: | --- | Flags: | mleitner: needinfo- |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-24 15:18:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
*** This bug has been marked as a duplicate of bug 1387149 ***
Description of problem:

High load detected on the OpenShift masters. The load is so high that even 'yum install sos' hangs. Something is causing iptables to hit 100% CPU usage, which in turn drives the load so high that the other NRPE-based Nagios checks fail with socket timeouts.

The problem appears to be a stale file lock used by the iptables command. OpenShift invokes iptables with the -w option. When -w is used, iptables will wait indefinitely for the "xtables" lock unless an optional timeout is given (spoiler alert: no timeout is given):

```
-w, --wait [seconds]
    Wait for the xtables lock. To prevent multiple instances of the program from
    running concurrently, an attempt will be made to obtain an exclusive lock at
    launch. By default, the program will exit if the lock cannot be obtained. This
    option will make the program wait (indefinitely or for optional seconds) until
    the exclusive lock can be obtained.
```

Unfortunately, whatever process last held the lock appears to no longer be running, so the outstanding iptables commands will truly be waiting forever:

```
openmaster-67-136-2017-Apr-4-09:33:22$ grep ipt ps
root  98366  0.0  0.0  18248   728 ?  S  09:40  0:00 iptables -w -C POSTROUTING -t nat -s 10.1.0.0/16 -j MASQUERADE
root  98412  0.0  0.0  16056   500 ?  S  09:40  0:00 iptables -w -N KUBE-MARK-DROP -t nat
root  98477  0.0  0.1  37584 13748 ?  R  09:40  0:00 iptables -w -C KUBE-PORTALS-CONTAINER -t nat -m comment --comment app-dev-on-cloud-suite/rhcs-brms-install-demo:9990-tcp -p tcp -m tcp --dport 9990 -d 172.30.33.252/32 -j REDIRECT --to-ports 46761
```

The xtables lock is a plain file lock (flock) on /run/xtables.lock. We can confirm that the two sleeping processes are indeed waiting for the lock by checking the lsof output in the report:

```
openmaster-67-136-2017-Apr-4-09:33:22$ grep /run/xtables.lock sos_commands/process/lsof_-b_M_-n_-l
iptables  98366  0  3r  REG  0,19  0  28474  /run/xtables.lock
iptables  98412  0  3r  REG  0,19  0  28474  /run/xtables.lock
```

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
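As a rough sketch of the failure mode described above, assuming a host where iptables implements -w by taking an exclusive flock(2) on /run/xtables.lock (which the lsof output confirms for this system), any unrelated process holding that lock will stall every `iptables -w` call that has no timeout:

```sh
## Terminal 1: simulate a stale xtables lock holder (run as root on a test box,
## not in production). flock(1) takes an exclusive lock on /run/xtables.lock,
## the same lock that "iptables -w" waits for, and holds it for an hour.
flock /run/xtables.lock sleep 3600

## Terminal 2: an iptables call like the ones in the ps output above
## (-w with no timeout) now blocks until terminal 1 releases the lock.
iptables -w -C POSTROUTING -t nat -s 10.1.0.0/16 -j MASQUERADE

## Terminal 3: the waiter shows up against the lock file, matching the
## lsof output captured in the sosreport.
lsof /run/xtables.lock
```

Releasing the lock in terminal 1 (Ctrl-C or killing the sleep) lets the blocked iptables commands proceed.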