Bug 1014882 - oo-diagnostics fails on unexpected DNS error
Status: CLOSED CURRENTRELEASE
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 1.2.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Luke Meyer
QA Contact: libra bugs
Depends On: 1033701
Blocks:
Reported: 2013-10-02 22:24 EDT by thunt
Modified: 2017-03-08 12 EST

See Also:
Fixed In Version: rubygem-openshift-origin-common-1.8.16-1
Doc Type: Bug Fix
Doc Text:
Cause: The default oo-diagnostics DNS health check was too strict. Consequence: If admins configured their DNS server to disable recursion, the default health check would report a false positive. Fix: Relax the check so that it only verifies that the nameserver returns a response. Result:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-04 11:47:36 EST
Type: Bug


Attachments: None
Description thunt 2013-10-02 22:24:27 EDT
Description of problem:

oo-diagnostics fails with the following error message:

FAIL: prereq_dns_server_available
        10.75.16.10 doesn't appear to respond to DNS requests.
        This command:
          host -W 1 do-not-expect-to-resolve. 10.75.16.10
        should have returned NXDOMAIN to the request.
        Instead, it returned:
          ;; connection timed out; trying next origin
          ;; connection timed out; trying next origin
          ;; connection timed out; no servers could be reached


Version-Release number of selected component (if applicable):


How reproducible:
Very

Steps to Reproduce:
1. Run oo-diagnostics

Actual results:
oo-diagnostics fails quickly - see above

Expected results:
oo-diagnostics should continue

Additional info:

Workaround: change line 399 of oo-diagnostics from
 command = "host -W 1 do-not-expect-to-resolve. #{server}"

to
 command = "host -W 1 do-not-expect-to-resolve.$(dnsdomainname) #{server}"
Comment 2 Luke Meyer 2013-12-23 10:53:44 EST
The point of this check is to see whether the first nameserver in /etc/resolv.conf is listening or not (since that is typically the one configured by install scripts, and a really likely point of failure/misconfiguration). Given how little we can assume from oo-diagnostics, this was just trying to get an expected failure response to see if the server is listening. Since it didn't respond, I'd mark that as a successful diagnostic. Refining the request isn't the point; the nameserver should respond, no matter what the request is. A timeout indicates failure.

In 2.0 this check now just checks for any response at all, not a specific one. I would like to backport that change to 1.2, so I'll track that in this BZ.
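
For illustration, the relaxed check described above can be sketched as follows. This is only a minimal sketch, not the actual oo-diagnostics code: the helper names (first_nameserver, dns_responded?, check_first_nameserver) are hypothetical, and it assumes that a timeout can be recognized from the text that the host command prints, as in the output quoted in this bug.

```ruby
# Hypothetical helpers; the names are illustrative and are not taken
# from the oo-diagnostics source.

# Return the first "nameserver" address found in resolv.conf-style text,
# or nil when there is no such entry.
def first_nameserver(resolv_conf_text)
  resolv_conf_text.each_line do |line|
    match = line.match(/^\s*nameserver\s+(\S+)/)
    return match[1] if match
  end
  nil
end

# Relaxed rule: any non-empty output that is not a timeout counts as a
# response. NXDOMAIN and REFUSED both show that the server is listening.
def dns_responded?(host_output)
  return false if host_output.strip.empty?
  host_output !~ /connection timed out|no servers could be reached/
end

# Probe the first configured nameserver with the same style of command
# that appears in the diagnostic output (assumes `host` is installed).
def check_first_nameserver(resolv_conf_path = "/etc/resolv.conf")
  server = first_nameserver(File.read(resolv_conf_path))
  return [:fail, "no nameserver entry found"] if server.nil?

  output = IO.popen(["host", "-W", "1", "example.com.", server],
                    err: [:child, :out], &:read)
  dns_responded?(output) ? [:ok, server] : [:fail, output]
end
```

With this rule, a server configured with recursion disabled still passes, because a refusal is a response; only a timeout is treated as a failure.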
Comment 3 Luke Meyer 2013-12-31 16:17:44 EST
Backported fix to bug 1033701
Comment 4 Ma xiaoqiang 2014-01-01 22:07:07 EST
Checked on rubygem-openshift-origin-common-1.8.16-1:
# oo-diagnostics --abortok -v
INFO: loading list of installed packages
INFO: OpenShift broker installed.
INFO: running: prereq_dns_server_available
INFO: checking that the first server in /etc/resolv.conf responds
FAIL: prereq_dns_server_available
        192.168.59.150 doesn't appear to respond to DNS requests.
        This command:
          host -W 1 example.com. 192.168.59.150
        should have connected to your primary nameserver.
        Instead, it returned:
          ;; connection timed out; trying next origin
          ;; connection timed out; no servers could be reached

        Please check the following to resolve this issue:
        * Does /etc/resolv.conf have your correct nameserver?
        * Is your nameserver running?
        * Is the firewall on your nameserver open (udp:53)?
        * Can you connect to your nameserver?
        Many OpenShift functions fail without working DNS resolution.

INFO: running: test_enterprise_rpms
INFO: Checking that all OpenShift RPMs are actually from OpenShift Enterprise
INFO: running: test_selinux_policy_rpm
INFO: rpm selinux-policy installed with at least version 3.7.19-195.el6_4.4
INFO: running: test_selinux_enabled
INFO: running: test_broker_cache_permissions
INFO: broker application cache permissions appear fine
INFO: running: test_node_profiles_districts_from_broker
INFO: checking node profiles via MCollective

I waited for several minutes, but there was no response.
Comment 5 Ma xiaoqiang 2014-01-02 00:17:05 EST
(In reply to Ma xiaoqiang from comment #4)

I will file a new bug for the problem above. I rechecked this bug as follows:

After configuring /etc/named.conf with "recursion no", I ran "oo-diagnostics".
No error was reported.
