Bug 1609138

Summary: Node unreachable when deploy logging
Product: OpenShift Container Platform Reporter: Qiaoling Tang <qitang>
Component: Logging Assignee: ewolinet
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: anli, aos-bugs, ewolinet, jcantril, jeder, qitang, rmeggins
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
As part of installing the ES5 stack, we need to create a sysctl file on the nodes that ES runs on. This fix corrects the way we evaluate which nodes/Ansible hosts to run those tasks against.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-11 07:22:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 1 Qiaoling Tang 2018-07-27 06:21:24 UTC
Ansible version:
 ansible-2.6.1-1.el7ae.noarch

Comment 3 Anping Li 2018-07-27 10:11:42 UTC
*** Bug 1609131 has been marked as a duplicate of this bug. ***

Comment 5 Rich Megginson 2018-07-27 16:02:28 UTC
Please provide the entire Ansible inventory file, any -e parameters you pass on the ansible-playbook command line, and any vars.yaml files you pass in with -e.

Comment 7 Rich Megginson 2018-07-30 23:08:40 UTC
Does this work?

oc project default
oc get pods
# look for the router pod e.g.
# router-1-njpz6              1/1       Running   0          12m
oc exec router-1-xxxx -- ls

What happens?

Comment 8 Rich Megginson 2018-07-30 23:12:44 UTC
I get this error:

$ oc exec router-1-njpz6 -- ls
Error from server: error dialing backend: dial tcp: lookup infra-node-0.ocp311.rmeggins.test on 192.168.99.15:53: no such host

This is with an openshift on openstack deployment, without logging.

It seems that the oc exec command is attempting to connect to the _external_ FQDN of the node rather than the _internal_ cluster IP address:

$ oc get nodes
NAME                                STATUS    ROLES     AGE       VERSION
app-node-0.ocp311.rmeggins.test     Ready     compute   18m       v1.11.0+d4cacc0
app-node-1.ocp311.rmeggins.test     Ready     compute   18m       v1.11.0+d4cacc0
infra-node-0.ocp311.rmeggins.test   Ready     infra     18m       v1.11.0+d4cacc0
master-0.ocp311.rmeggins.test       Ready     master    21m       v1.11.0+d4cacc0

$ oc get node master-0.ocp311.rmeggins.test
status:
  addresses:
  - address: 192.168.99.15
    type: InternalIP
  - address: master-0.ocp311.rmeggins.test
    type: Hostname

Did something change in OCP 3.11 that makes it use the external FQDN for the node names instead of the internal names/IP addresses? Also, in 3.9, the node addresses were both internal IP addresses, e.g.

status:
  addresses:
  - address: 192.168.99.15
    type: InternalIP
  - address: 192.168.99.15
    type: Hostname

So I don't know if this is a logging problem.
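
The difference between the two address lists above can be checked with a short Python sketch. This is only an illustration: it assumes the status.addresses entries have already been loaded into a list of dicts (for example from the node's JSON output), and the function name hostnames_needing_dns is hypothetical. It returns the Hostname entries that are not IP literals and therefore depend on a DNS lookup succeeding.

```python
import ipaddress


def hostnames_needing_dns(addresses):
    """Return the Hostname entries that are not IP literals.

    `addresses` is a list of {"address": ..., "type": ...} dicts shaped
    like the node's status.addresses list (assumed already parsed).
    """
    needing = []
    for entry in addresses:
        if entry.get("type") != "Hostname":
            continue
        try:
            # An IP literal can be dialed directly, with no DNS lookup.
            ipaddress.ip_address(entry["address"])
        except ValueError:
            # Not an IP literal, so the caller has to resolve it via DNS.
            needing.append(entry["address"])
    return needing


# Address lists copied from the 3.11 and 3.9 output quoted in this comment.
ocp311 = [
    {"address": "192.168.99.15", "type": "InternalIP"},
    {"address": "master-0.ocp311.rmeggins.test", "type": "Hostname"},
]
ocp39 = [
    {"address": "192.168.99.15", "type": "InternalIP"},
    {"address": "192.168.99.15", "type": "Hostname"},
]

print(hostnames_needing_dns(ocp311))  # ['master-0.ocp311.rmeggins.test']
print(hostnames_needing_dns(ocp39))   # []
```

With the 3.9-style addresses nothing needs resolving, which would be consistent with the lookup error appearing only on 3.11.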

Comment 11 ewolinet 2018-07-31 14:22:00 UTC
Can you try this again with the latest ansible changes for the logging playbook?
We did just that -- mapped node names back to the inventory names. This was originally done as a fix for oc cluster up --logging.

It merged less than 24 hours ago.

https://github.com/openshift/openshift-ansible/pull/9267
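
For illustration only, the idea of mapping node names back to inventory names might look like the Python sketch below. This is not the code from the pull request: the function map_nodes_to_inventory, the shape of the inventory dict, and the inventory host names are all hypothetical, and the sketch assumes each inventory host carries a set of names it is also known by.

```python
def map_nodes_to_inventory(node_names, inventory):
    """Map each cluster node name to the inventory host it belongs to.

    `inventory` is a hypothetical structure: inventory host name -> set of
    other names or addresses that host is known by. Nodes with no match
    are left out of the result.
    """
    mapping = {}
    for node in node_names:
        for host, aliases in inventory.items():
            if node == host or node in aliases:
                mapping[node] = host
                break
    return mapping


# Node names are taken from the `oc get nodes` output earlier in this bug;
# the inventory host names on the left are made up for the example.
inventory = {
    "inv-master-0": {"master-0.ocp311.rmeggins.test"},
    "inv-infra-0": {"infra-node-0.ocp311.rmeggins.test"},
}
nodes = [
    "master-0.ocp311.rmeggins.test",
    "infra-node-0.ocp311.rmeggins.test",
    "app-node-0.ocp311.rmeggins.test",
]

for node, host in map_nodes_to_inventory(nodes, inventory).items():
    print(node, "->", host)
```

A node that has no inventory entry (app-node-0 here) is simply skipped, so tasks would only be targeted at hosts Ansible can actually reach by their inventory name.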

Comment 12 Qiaoling Tang 2018-08-01 06:06:34 UTC
I deployed logging using the latest playbooks/openshift-logging/private/config.yml mentioned by ewolinet. The playbook ran successfully without any errors, and all pods are running and ready.

Comment 13 Qiaoling Tang 2018-08-01 06:07:39 UTC
Based on comment 12, I removed the keyword "TestBlocker".

Comment 14 Jeff Cantrill 2018-08-01 15:00:13 UTC
Please close if this is no longer an issue.

Comment 15 Qiaoling Tang 2018-08-02 00:27:06 UTC
Waiting for a new official 3.11 puddle.

Comment 17 Qiaoling Tang 2018-08-23 06:34:47 UTC
Verified on:

openshift-ansible-docs-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-roles-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-playbooks-3.11.0-0.19.0.git.0.ebd1bf9None.noarch

Comment 19 errata-xmlrpc 2018-10-11 07:22:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652