Bug 1609138

Summary:	Node unreachable when deploying logging
Product:	OpenShift Container Platform	Reporter:	Qiaoling Tang <qitang>
Component:	Logging	Assignee:	ewolinet
Status:	CLOSED ERRATA	QA Contact:	Anping Li <anli>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	3.11.0	CC:	anli, aos-bugs, ewolinet, jcantril, jeder, qitang, rmeggins
Target Milestone:	---
Target Release:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:	As part of installing the ES5 stack, we need to create a sysctl file on the nodes that ES runs on. This change fixes the way we evaluate which nodes/Ansible hosts to run those tasks against.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-10-11 07:22:24 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Comment 1 Qiaoling Tang 2018-07-27 06:21:24 UTC

Ansible version:
 ansible-2.6.1-1.el7ae.noarch

Comment 3 Anping Li 2018-07-27 10:11:42 UTC

*** Bug 1609131 has been marked as a duplicate of this bug. ***

Comment 5 Rich Megginson 2018-07-27 16:02:28 UTC

Please provide the entire Ansible inventory file, any -e parameters you pass on the ansible-playbook command line, and any vars.yaml files you pass in with -e.

Comment 7 Rich Megginson 2018-07-30 23:08:40 UTC

Does this work?

oc project default
oc get pods
# look for the router pod e.g.
# router-1-njpz6              1/1       Running   0          12m
oc exec router-1-xxxx -- ls

What happens?

Comment 8 Rich Megginson 2018-07-30 23:12:44 UTC

I get this error:

$ oc exec router-1-njpz6 -- ls
Error from server: error dialing backend: dial tcp: lookup infra-node-0.ocp311.rmeggins.test on 192.168.99.15:53: no such host

This is with an OpenShift on OpenStack deployment, without logging.

It seems that the oc exec command is attempting to connect to the _external_ FQDN of the node rather than the _internal_ cluster IP address:

$ oc get nodes
NAME                                STATUS    ROLES     AGE       VERSION
app-node-0.ocp311.rmeggins.test     Ready     compute   18m       v1.11.0+d4cacc0
app-node-1.ocp311.rmeggins.test     Ready     compute   18m       v1.11.0+d4cacc0
infra-node-0.ocp311.rmeggins.test   Ready     infra     18m       v1.11.0+d4cacc0
master-0.ocp311.rmeggins.test       Ready     master    21m       v1.11.0+d4cacc0

$ oc get node master-0.ocp311.rmeggins.test -o yaml
status:
  addresses:
  - address: 192.168.99.15
    type: InternalIP
  - address: master-0.ocp311.rmeggins.test
    type: Hostname

Did something change in OCP 3.11 that makes it use the external FQDN for the node names instead of the internal names/IP addresses?  Also, in 3.9, both node addresses were the internal IP address, e.g.:

status:
  addresses:
  - address: 192.168.99.15
    type: InternalIP
  - address: 192.168.99.15
    type: Hostname

So I don't know if this is a logging problem.
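
A minimal sketch of the check described above, assuming a POSIX shell with awk and getent available. Both function names are hypothetical helpers, not part of oc. hostname_address reads "oc get node NAME -o yaml" style output on stdin and prints the address whose type is Hostname (it assumes "address" is listed before "type" in each entry, as in the output above). resolves reports whether a name can be looked up from the host it runs on.

```shell
# Hypothetical helpers, not part of oc or openshift-ansible.

# Print the address of type "Hostname" from node YAML read on stdin.
# Assumes each entry lists "- address:" before "type:", as in the output above.
hostname_address() {
    awk '$1 == "-" && $2 == "address:" { addr = $3 }
         $1 == "type:" && $2 == "Hostname" { print addr }'
}

# Succeed if the given name resolves through the system resolver.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}
```

For example, piping the YAML for infra-node-0.ocp311.rmeggins.test through hostname_address yields the name to test, and resolves "$name" || echo "cannot resolve $name" flags a node whose Hostname address has no DNS entry. Running the check on the master is probably the most relevant, since the lookup failure above is reported by the server.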

Comment 11 ewolinet 2018-07-31 14:22:00 UTC

Can you try this again with the latest Ansible changes for the logging playbook?
We did just that: mapped node names back to the inventory names. This was originally done as a fix for oc cluster up --logging.

It merged less than 24 hours ago.

https://github.com/openshift/openshift-ansible/pull/9267

Comment 12 Qiaoling Tang 2018-08-01 06:06:34 UTC

I deployed logging with the latest playbooks/openshift-logging/private/config.yml mentioned by ewolinet. The playbook ran without errors, and all pods are running and ready.

Comment 13 Qiaoling Tang 2018-08-01 06:07:39 UTC

Per comment 12, I removed the "TestBlocker" keyword.

Comment 14 Jeff Cantrill 2018-08-01 15:00:13 UTC

Please close if this is no longer an issue.

Comment 15 Qiaoling Tang 2018-08-02 00:27:06 UTC

Waiting for a new official 3.11 puddle.

Comment 17 Qiaoling Tang 2018-08-23 06:34:47 UTC

Verified on:

openshift-ansible-docs-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-roles-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-playbooks-3.11.0-0.19.0.git.0.ebd1bf9None.noarch

Comment 19 errata-xmlrpc 2018-10-11 07:22:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652