Bug 1608571
| Field | Value |
|---|---|
| Summary | OCP 3.10: unable to pull images on compute node due to dnsmasq failures after running scale tests |
| Product | OpenShift Container Platform |
| Component | Networking |
| Networking sub component | router |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Version | 3.10.0 |
| Target Release | 3.11.0 |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Walid A. <wabouham> |
| Assignee | Ivan Chavero <ichavero> |
| QA Contact | zhaozhanqi <zzhao> |
| CC | aos-bugs, dmace, hongli, mifiedle, shzhou, tibrahim, tmanor, wabouham, weliang, wmeng |
| Keywords | NeedsTestCase |
| Whiteboard | aos-scalability-310 |
| Type | Bug |
| Last Closed | 2018-10-11 07:22:15 UTC |

Description
Walid A.
2018-07-25 20:16:35 UTC
Created attachment 1470566
dnsmasq configuration on compute node
I'm trying to replicate the issue. Did you use the Ansible installer for this cluster?

@Ivan, yes, I used the openshift-ansible deploy_cluster.yml playbook to build this cluster.

@Walid, can you still reproduce this bug after you install the PR fix and run your testing scripts?

@Weibin, the PR fix in Comment 11 appears to resolve this issue. I ran the same automated scripts (SVT Conformance followed by the Node Vertical test with 500 pods per node) on the AWS clusters installed with the openshift-ansible PR fix. So far, I am no longer hitting the dnsmasq failures. Also, after the test cases that used to leave 1052+ files held open by dnsmasq, I now see only 40-50 files open while executing the next test case, so dnsmasq appears to be closing the files as expected:

    # lsof 2>/dev/null | grep dnsmasq | wc -l
    50

@Walid, thanks for your confirmation.

Verified on OCP v3.11.0-0.19.0:

    cd /etc/systemd/system/dnsmasq.service.d
    cat override.conf
    [Service]
    LimitNOFILE=65535

The Node Vertical test with 500 pods per node was executed successfully.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
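
For anyone re-checking this on their own nodes, the lsof check and the LimitNOFILE override quoted above can be compared directly through /proc. This is a minimal sketch, not something taken from the PR. The function names `open_fd_count`, `nofile_soft_limit` and `report_fd_usage` are hypothetical helpers, and it assumes a Linux host with /proc mounted and `pgrep` available. Note that `lsof | grep dnsmasq | wc -l` counts every lsof line that mentions dnsmasq (including memory-mapped files and working directories), so its number will usually not match the descriptor count from /proc exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the bug report): compare a process's open
# file descriptors against its soft "Max open files" limit using /proc.

# Print the number of open file descriptors for the PID given as $1.
open_fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit for the PID given as $1.
# In /proc/<pid>/limits the soft limit is the 4th whitespace-separated field.
nofile_soft_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Report descriptor usage for the oldest process named $1 (default: dnsmasq).
report_fd_usage() {
    local name="${1:-dnsmasq}" pid
    pid="$(pgrep -o -x "$name")" || { echo "$name is not running" >&2; return 1; }
    echo "$name pid=$pid open_fds=$(open_fd_count "$pid") soft_limit=$(nofile_soft_limit "$pid")"
}
```

Running `report_fd_usage dnsmasq` as root on a compute node prints the current descriptor count next to the limit. With the override.conf shown above in effect and dnsmasq restarted, the reported soft limit would be expected to read 65535.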