Bug 1608571
| Summary: | OCP 3.10: unable to pull images on compute node due to dnsmasq failures after running scale tests | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Walid A. <wabouham> |
| Component: | Networking | Assignee: | Ivan Chavero <ichavero> |
| Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | aos-bugs, dmace, hongli, mifiedle, shzhou, tibrahim, tmanor, wabouham, weliang, wmeng |
| Version: | 3.10.0 | Keywords: | NeedsTestCase |
| Target Milestone: | --- | | |
| Target Release: | 3.11.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | aos-scalability-310 | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-11 07:22:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Walid A.
2018-07-25 20:16:35 UTC
Created attachment 1470566
dnsmasq configuration on compute node
I'm trying to replicate the issue. Did you use the Ansible installer for this cluster?

@Ivan, yes, I used the openshift-ansible deploy_cluster.yml playbook to build this cluster.

@Walid, can you still reproduce this bug after installing the PR fix and running your test scripts?

@Weibin, the PR fix in Comment 11 appears to resolve this issue. I ran the same automated scripts (SVT Conformance followed by the Node Vertical test with 500 pods per node) on AWS clusters installed with the openshift-ansible PR fix. So far, I am no longer hitting the dnsmasq failures. Also, after the test cases that used to leave 1052+ files held open by dnsmasq, I now see only 40-50 open files while the next test case executes, so dnsmasq appears to be closing files as expected:

    # lsof 2>/dev/null | grep dnsmasq | wc -l
    50

@Walid, thanks for your confirmation.

Verified on OCP v3.11.0-0.19.0:

    cd /etc/systemd/system/dnsmasq.service.d
    cat override.conf
    [Service]
    LimitNOFILE=65535

Node Vertical test with 500 pods per node was successfully executed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
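For anyone repeating the verification in this thread, the two checks quoted in the comments (the dnsmasq open-file count and the `LimitNOFILE` value in `override.conf`) could be combined roughly as follows. This is only a sketch under stated assumptions, not part of the fix: the helper names `parse_nofile_limit`, `count_open_fds`, and `usage_ratio` are hypothetical, the limit is assumed to be a plain integer as in the override shown in this report, and the descriptor count is read from `/proc/<pid>/fd` on Linux rather than from `lsof`. The `lsof | grep | wc -l` figure also counts non-descriptor entries such as the working directory and mapped libraries, so it tends to read higher than the actual descriptor count.

```python
import configparser
import os


def parse_nofile_limit(unit_text):
    """Return the LimitNOFILE value from systemd-style unit text, or None.

    Assumption: the value is a plain integer, as in the override.conf
    quoted in this report. Other systemd forms are not handled here.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(unit_text)
    if not parser.has_option("Service", "LimitNOFILE"):
        return None
    return int(parser.get("Service", "LimitNOFILE"))


def count_open_fds(pid):
    """Count open file descriptors for a process (Linux only).

    Reading another user's /proc/<pid>/fd usually requires root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def usage_ratio(open_count, limit):
    """Return the fraction of the descriptor limit currently in use."""
    return open_count / limit


if __name__ == "__main__":
    # Contents of override.conf as quoted in the verification comment.
    override = "[Service]\nLimitNOFILE=65535\n"
    limit = parse_nofile_limit(override)
    # 50 is the lsof-based count reported after the fix.
    print(f"limit={limit} usage={usage_ratio(50, limit):.4%}")
```

On a node, `count_open_fds` would be called with the dnsmasq PID (for example, the one reported by `pidof dnsmasq`) and the result passed to `usage_ratio` together with the parsed limit.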