| Summary: | oo-diagnostics fails on unexpected DNS error | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | thunt |
| Component: | Node | Assignee: | Luke Meyer <lmeyer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 1.2.1 | CC: | bleanhar, libra-onpremise-devel, xiama |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | rubygem-openshift-origin-common-1.8.16-1 | Doc Type: | Bug Fix |
| Doc Text: |
Cause:
The default oo-diagnostics DNS healthcheck was too strict.
Consequence:
If admins configured their DNS server to disable recursion, the default health check would report a false positive.
Fix:
Relax the check so that it only verifies that the nameserver returns a response.
Result:
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-02-04 16:47:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 1033701 | ||
| Bug Blocks: | |||
The point of this check is to see whether the first nameserver in /etc/resolv.conf is listening or not (since that is typically the one configured by install scripts, and a really likely point of failure/misconfiguration). Given how little we can assume from oo-diagnostics, this was just trying to get an expected failure response to see if the server is listening. Since it didn't respond, I'd mark that as a successful diagnostic. Refining the request isn't the point; the nameserver should respond, no matter what the request is. A timeout indicates failure.

In 2.0 this check now just checks for any response at all, not a specific one. I would like to backport that change to 1.2, so I'll track that in this BZ.

Backported fix to bug 1033701.

Check it on rubygem-openshift-origin-common-1.8.16-1:
# oo-diagnostics --abortok -v
INFO: loading list of installed packages
INFO: OpenShift broker installed.
INFO: running: prereq_dns_server_available
INFO: checking that the first server in /etc/resolv.conf responds
FAIL: prereq_dns_server_available
192.168.59.150 doesn't appear to respond to DNS requests.
This command:
host -W 1 example.com. 192.168.59.150
should have connected to your primary nameserver.
Instead, it returned:
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
Please check the following to resolve this issue:
* Does /etc/resolv.conf have your correct nameserver?
* Is your nameserver running?
* Is the firewall on your nameserver open (udp:53)?
* Can you connect to your nameserver?
Many OpenShift functions fail without working DNS resolution.
INFO: running: test_enterprise_rpms
INFO: Checking that all OpenShift RPMs are actually from OpenShift Enterprise
INFO: running: test_selinux_policy_rpm
INFO: rpm selinux-policy installed with at least version 3.7.19-195.el6_4.4
INFO: running: test_selinux_enabled
INFO: running: test_broker_cache_permissions
INFO: broker application cache permissions appear fine
INFO: running: test_node_profiles_districts_from_broker
INFO: checking node profiles via MCollective
I waited for several minutes; there was no response.
(In reply to Ma xiaoqiang from comment #4)
> check it on rubygem-openshift-origin-common-1.8.16-1
> # oo-diagnostics --abortok -v
> INFO: loading list of installed packages
> INFO: OpenShift broker installed.
> INFO: running: prereq_dns_server_available
> INFO: checking that the first server in /etc/resolv.conf responds
> FAIL: prereq_dns_server_available
> 192.168.59.150 doesn't appear to respond to DNS requests.
> This command:
> host -W 1 example.com. 192.168.59.150
> should have connected to your primary nameserver.
> Instead, it returned:
> ;; connection timed out; trying next origin
> ;; connection timed out; no servers could be reached
>
> Please check the following to resolve this issue:
> * Does /etc/resolv.conf have your correct nameserver?
> * Is your nameserver running?
> * Is the firewall on your nameserver open (udp:53)?
> * Can you connect to your nameserver?
> Many OpenShift functions fail without working DNS resolution.
>
> INFO: running: test_enterprise_rpms
> INFO: Checking that all OpenShift RPMs are actually from OpenShift Enterprise
> INFO: running: test_selinux_policy_rpm
> INFO: rpm selinux-policy installed with at least version 3.7.19-195.el6_4.4
> INFO: running: test_selinux_enabled
> INFO: running: test_broker_cache_permissions
> INFO: broker application cache permissions appear fine
> INFO: running: test_node_profiles_districts_from_broker
> INFO: checking node profiles via MCollective
>
> I waited for several minutes; there was no response.

I will file a new bug for the problem above. I checked this bug again as follows: after configuring /etc/named.conf with "recursion no", I ran "oo-diagnostics". No error was reported.
Description of problem:
oo-diagnostics fails with the following error message:

FAIL: prereq_dns_server_available
10.75.16.10 doesn't appear to respond to DNS requests.
This command:
host -W 1 do-not-expect-to-resolve. 10.75.16.10
should have returned NXDOMAIN to the request.
Instead, it returned:
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached

Version-Release number of selected component (if applicable):

How reproducible:
Very

Steps to Reproduce:
1. Run oo-diagnostics

Actual results:
oo-diagnostics fails quickly - see above

Expected results:
oo-diagnostics should continue

Additional info:
Workaround - change line 399 of oo-diagnostics from
command = "host -W 1 do-not-expect-to-resolve. #{server}"
to
command = "host -W 1 do-not-expect-to-resolve.$(dnsdomainname) #{server}"
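
For illustration, the relaxed behaviour described in the comments (probe the first nameserver in /etc/resolv.conf, accept any reply, and treat only a timeout as failure) might look roughly like the Ruby sketch below. This is not the code shipped in rubygem-openshift-origin-common; the helper names `first_nameserver`, `dns_server_responded?` and `check_dns_server` are hypothetical, and matching on the timeout text printed by `host` is an assumption based on the output quoted in this report.

```ruby
# Hypothetical helpers illustrating the relaxed check; the names and the
# output-matching approach are assumptions, not the oo-diagnostics source.

# Return the first "nameserver" address found in resolv.conf text, or nil.
def first_nameserver(resolv_conf_text)
  resolv_conf_text.each_line do |line|
    next if line =~ /\A\s*[#;]/          # skip comment lines
    keyword, address = line.split
    return address if keyword == 'nameserver' && address
  end
  nil
end

# Any reply (an answer, NXDOMAIN, REFUSED, ...) shows the server is listening.
# Only the timeout messages printed by `host` count as "no response".
def dns_server_responded?(host_output)
  host_output !~ /connection timed out|no servers could be reached/
end

# Probe the first nameserver with the same command shown in the report.
# Defined for completeness; nothing calls it when this file is loaded.
def check_dns_server(resolv_conf_path = '/etc/resolv.conf')
  server = first_nameserver(File.read(resolv_conf_path))
  return false if server.nil?
  # Argument array avoids passing the address through a shell.
  output = IO.popen(['host', '-W', '1', 'example.com.', server],
                    err: [:child, :out], &:read)
  dns_server_responded?(output)
end
```

With this shape, a nameserver configured with "recursion no" that answers with a refusal would still pass, while the timeouts shown above would still fail. That is the distinction the comments draw between a listening server and an unreachable one.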