Summary: | Node unreachable when deploying logging | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Qiaoling Tang <qitang>
Component: | Logging | Assignee: | ewolinet
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 3.11.0 | CC: | anli, aos-bugs, ewolinet, jcantril, jeder, qitang, rmeggins
Target Milestone: | --- | |
Target Release: | 3.11.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | As part of installing the ES5 stack, we need to create a sysctl file for the nodes that ES runs on. This fix changed the way we evaluate which nodes/Ansible hosts to run those tasks against (see the sketch below this table). | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-10-11 07:22:24 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
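The Doc Text above says the installer creates a sysctl file on the Elasticsearch nodes but does not spell out the setting. The lines below are a hand-run sketch, assuming the standard Elasticsearch 5 requirement of vm.max_map_count >= 262144; the file path is illustrative, not necessarily what openshift-ansible writes.

$ cat /etc/sysctl.d/99-elasticsearch.conf   # illustrative path, not taken from this bug
vm.max_map_count = 262144
$ sudo sysctl --system                      # reload all sysctl.d files
$ sysctl vm.max_map_count                   # confirm the running value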
Comment 1
Qiaoling Tang
2018-07-27 06:21:24 UTC
*** Bug 1609131 has been marked as a duplicate of this bug. ***

Please provide the entire ansible inventory file, any -e parameters you pass on the ansible-playbook command line, and any vars.yaml files you pass in with -e.

Does this work?

$ oc project default
$ oc get pods
# look for the router pod, e.g.
# router-1-njpz6   1/1   Running   0   12m
$ oc exec router-1-xxxx -- ls

What happens?

I get this error:

$ oc exec router-1-njpz6 -- ls
Error from server: error dialing backend: dial tcp: lookup infra-node-0.ocp311.rmeggins.test on 192.168.99.15:53: no such host

This is with an OpenShift on OpenStack deployment, without logging. It seems that the oc exec command is attempting to connect to the _external_ FQDN of the node rather than the _internal_ cluster IP address:

$ oc get nodes
NAME                                STATUS   ROLES     AGE   VERSION
app-node-0.ocp311.rmeggins.test     Ready    compute   18m   v1.11.0+d4cacc0
app-node-1.ocp311.rmeggins.test     Ready    compute   18m   v1.11.0+d4cacc0
infra-node-0.ocp311.rmeggins.test   Ready    infra     18m   v1.11.0+d4cacc0
master-0.ocp311.rmeggins.test       Ready    master    21m   v1.11.0+d4cacc0

$ oc get node master-0.ocp311.rmeggins.test
status:
  addresses:
  - address: 192.168.99.15
    type: InternalIP
  - address: master-0.ocp311.rmeggins.test
    type: Hostname

Did something change in OCP 3.11 that makes it use the external FQDN for the node names instead of the internal names/IP addresses? Also, in 3.9, both node addresses were internal IP addresses, e.g.

status:
  addresses:
  - address: 192.168.99.15
    type: InternalIP
  - address: 192.168.99.15
    type: Hostname

So I don't know if this is a logging problem.

Can you try this again with the latest ansible changes for the logging playbook? We had done just that: map node names back to the inventory names. This was originally done as a fix for oc cluster up --logging and had merged less than 24 hours ago: https://github.com/openshift/openshift-ansible/pull/9267

Tried to use the latest playbooks/openshift-logging/private/config.yml mentioned by ewolinet to deploy logging. The playbook ran successfully without any error, and all pods are running and ready.

According to comment 12, I removed the keyword "TestBlocker".

Please close if this is no longer an issue.

Wait for a new official 3.11 puddle.

Verified on:
openshift-ansible-docs-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-roles-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-playbooks-3.11.0-0.19.0.git.0.ebd1bf9None.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
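For anyone reproducing the "no such host" diagnosis above, a quick check is to compare each node's InternalIP and Hostname addresses and test whether the Hostname entries resolve from the master. This is a sketch using the standard node .status.addresses fields, not a command taken from this bug:

# sketch: prints node name, InternalIP, and Hostname address for each node
$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\t"}{.status.addresses[?(@.type=="Hostname")].address}{"\n"}{end}'
# sketch: flag Hostname addresses that do not resolve from this host
$ for h in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="Hostname")].address}'); do
>   getent hosts "$h" > /dev/null || echo "unresolvable: $h"
> done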