Bug 1609138 - Node unreachable when deploy logging
Summary: Node unreachable when deploy logging
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Duplicates: 1609131
Depends On:
Blocks:
Reported: 2018-07-27 06:07 UTC by Qiaoling Tang
Modified: 2018-10-11 07:22 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
As part of installing the ES5 stack, we need to create a sysctl file on the nodes that ES runs on. This fix corrects how we evaluate which nodes/Ansible hosts to run those tasks against.
Clone Of:
Environment:
Last Closed: 2018-10-11 07:22:24 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2652 0 None None None 2018-10-11 07:22:45 UTC

Comment 1 Qiaoling Tang 2018-07-27 06:21:24 UTC
Ansible version:
 ansible-2.6.1-1.el7ae.noarch

Comment 3 Anping Li 2018-07-27 10:11:42 UTC
*** Bug 1609131 has been marked as a duplicate of this bug. ***

Comment 5 Rich Megginson 2018-07-27 16:02:28 UTC
Please provide the entire ansible inventory file, and any -e parameters you pass on the ansible-playbook command line, and any vars.yaml files you pass in with -e

Comment 7 Rich Megginson 2018-07-30 23:08:40 UTC
Does this work?

oc project default
oc get pods
# look for the router pod e.g.
# router-1-njpz6              1/1       Running   0          12m
oc exec router-1-xxxx -- ls

What happens?

Comment 8 Rich Megginson 2018-07-30 23:12:44 UTC
I get this error:

$ oc exec router-1-njpz6 -- ls
Error from server: error dialing backend: dial tcp: lookup infra-node-0.ocp311.rmeggins.test on 192.168.99.15:53: no such host

This is with an openshift on openstack deployment, without logging.

It seems that the oc exec command is attempting to ssh to the _external_ fqdn of the node rather than the _internal_ cluster IP address:

$ oc get nodes
NAME                                STATUS    ROLES     AGE       VERSION
app-node-0.ocp311.rmeggins.test     Ready     compute   18m       v1.11.0+d4cacc0
app-node-1.ocp311.rmeggins.test     Ready     compute   18m       v1.11.0+d4cacc0
infra-node-0.ocp311.rmeggins.test   Ready     infra     18m       v1.11.0+d4cacc0
master-0.ocp311.rmeggins.test       Ready     master    21m       v1.11.0+d4cacc0

$ oc get node master-0.ocp311.rmeggins.test -o yaml
status:
  addresses:
  - address: 192.168.99.15
    type: InternalIP
  - address: master-0.ocp311.rmeggins.test
    type: Hostname

Did something change in ocp 3.11 that makes it use the external FQDN for the node names instead of the internal names/IP addresses?  Also, in 3.9, the node addresses were both internal IP addresses e.g.

status:
  addresses:
  - address: 192.168.99.15
    type: InternalIP
  - address: 192.168.99.15
    type: Hostname

So I don't know if this is a logging problem.
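
A minimal sketch of how the difference between the two outputs above could be checked programmatically. It assumes the node object has already been parsed into a dictionary (for example from JSON output). The helper name `hostname_address_kind` is hypothetical and not part of any OpenShift tooling.

```python
import ipaddress


def hostname_address_kind(node):
    """Return 'ip' or 'name' for the node's Hostname address, or None if absent.

    `node` is assumed to be a dict shaped like the node objects quoted above.
    """
    for entry in node.get("status", {}).get("addresses", []):
        if entry.get("type") != "Hostname":
            continue
        try:
            ipaddress.ip_address(entry["address"])
        except ValueError:
            # Not a valid IP literal, so it is treated as a DNS name.
            return "name"
        return "ip"
    return None


# Node objects rebuilt from the 3.11 and 3.9 excerpts in this comment.
node_311 = {"status": {"addresses": [
    {"address": "192.168.99.15", "type": "InternalIP"},
    {"address": "master-0.ocp311.rmeggins.test", "type": "Hostname"},
]}}
node_39 = {"status": {"addresses": [
    {"address": "192.168.99.15", "type": "InternalIP"},
    {"address": "192.168.99.15", "type": "Hostname"},
]}}

print(hostname_address_kind(node_311))
print(hostname_address_kind(node_39))
```

A "name" result means that anything connecting to the node through its Hostname address has to resolve that name via DNS first, which would be consistent with the lookup error shown above.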

Comment 11 ewolinet 2018-07-31 14:22:00 UTC
Can you try this again with the latest ansible changes for the logging playbook?
We had done just that -- mapped node names back to the inventory names. This was originally done as a fix for oc cluster up --logging.

It merged less than 24 hours ago.

https://github.com/openshift/openshift-ansible/pull/9267
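
The idea of mapping node names back to inventory names can be illustrated with a small sketch. This is not the code from the pull request above. The function name, the inventory structure, the example inventory hostnames, and the 192.168.99.20 address are all assumptions made for illustration.

```python
def map_nodes_to_inventory(node_names, inventory):
    """Map each cluster node name to the inventory host that knows it.

    `inventory` maps an inventory hostname to the set of names and IP
    addresses that host is known by (a hypothetical structure for this sketch).
    Nodes with no matching inventory host are left out of the result.
    """
    lookup = {}
    for inv_host, aliases in inventory.items():
        lookup[inv_host] = inv_host
        for alias in aliases:
            lookup[alias] = inv_host
    return {name: lookup[name] for name in node_names if name in lookup}


# Hypothetical inventory: the keys are made-up inventory hostnames.
inventory = {
    "infra.example.test": {"infra-node-0.ocp311.rmeggins.test", "192.168.99.20"},
    "master.example.test": {"master-0.ocp311.rmeggins.test", "192.168.99.15"},
}
es_nodes = ["infra-node-0.ocp311.rmeggins.test"]

print(map_nodes_to_inventory(es_nodes, inventory))
```

With a mapping like this, tasks such as writing the sysctl file can be delegated to the inventory host for each node ES runs on, rather than to a node name that the inventory does not contain.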

Comment 12 Qiaoling Tang 2018-08-01 06:06:34 UTC
I deployed logging using the latest playbooks/openshift-logging/private/config.yml mentioned by ewolinet. The playbook ran successfully without any errors, and all pods are running and ready.

Comment 13 Qiaoling Tang 2018-08-01 06:07:39 UTC
Based on comment 12, I removed the keyword "TestBlocker".

Comment 14 Jeff Cantrill 2018-08-01 15:00:13 UTC
Please close if this is no longer an issue.

Comment 15 Qiaoling Tang 2018-08-02 00:27:06 UTC
Waiting for a new official 3.11 puddle.

Comment 17 Qiaoling Tang 2018-08-23 06:34:47 UTC
Verified on:

openshift-ansible-docs-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-roles-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-playbooks-3.11.0-0.19.0.git.0.ebd1bf9None.noarch

Comment 19 errata-xmlrpc 2018-10-11 07:22:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

