Bug 1609138
| Summary: | Node unreachable when deploying logging | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qiaoling Tang <qitang> |
| Component: | Logging | Assignee: | ewolinet |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | anli, aos-bugs, ewolinet, jcantril, jeder, qitang, rmeggins |
| Target Milestone: | --- | | |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | As part of installing the ES5 stack, we need to create a sysctl file on the nodes that ES runs on. This change fixes the way we evaluate which nodes/ansible hosts to run those tasks against. (See the sketch after this table.) | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-11 07:22:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
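The Doc Text refers to creating a sysctl file on the Elasticsearch nodes. A minimal sketch of what such a drop-in looks like, assuming the conventional /etc/sysctl.d location and an illustrative filename (both assumptions), with the vm.max_map_count floor that Elasticsearch 5's bootstrap checks require:

# assumed path/filename; 262144 is the minimum Elasticsearch 5 expects
$ cat /etc/sysctl.d/99-elasticsearch.conf
vm.max_map_count = 262144
$ sysctl --system    # reload sysctl.d drop-ins so the value applies without a reboot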
Comment 1
Qiaoling Tang
2018-07-27 06:21:24 UTC

*** Bug 1609131 has been marked as a duplicate of this bug. ***

Please provide the entire ansible inventory file, any -e parameters you pass on the ansible-playbook command line, and any vars.yaml files you pass in with -e.

Does this work?

$ oc project default
$ oc get pods
# look for the router pod, e.g.
# router-1-njpz6   1/1   Running   0   12m
$ oc exec router-1-xxxx -- ls

What happens? I get this error:
$ oc exec router-1-njpz6 -- ls
Error from server: error dialing backend: dial tcp: lookup infra-node-0.ocp311.rmeggins.test on 192.168.99.15:53: no such host
This is with an openshift on openstack deployment, without logging.
It seems that the oc exec command is attempting to connect to the _external_ FQDN of the node rather than the _internal_ cluster IP address:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
app-node-0.ocp311.rmeggins.test Ready compute 18m v1.11.0+d4cacc0
app-node-1.ocp311.rmeggins.test Ready compute 18m v1.11.0+d4cacc0
infra-node-0.ocp311.rmeggins.test Ready infra 18m v1.11.0+d4cacc0
master-0.ocp311.rmeggins.test Ready master 21m v1.11.0+d4cacc0
$ oc get node master-0.ocp311.rmeggins.test -o yaml
...
status:
addresses:
- address: 192.168.99.15
type: InternalIP
- address: master-0.ocp311.rmeggins.test
type: Hostname
Did something change in ocp 3.11 that makes it use the external FQDN for the node names instead of the internal names/IP addresses? Also, in 3.9, the node addresses were both internal IP addresses e.g.
status:
addresses:
- address: 192.168.99.15
type: InternalIP
- address: 192.168.99.15
type: Hostname
So I don't know if this is a logging problem.
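For anyone comparing these address records across every node in one shot, here is a minimal sketch using plain oc and jsonpath (nothing in it is specific to this bug or release):

$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses}{"\n"}{end}'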
Can you try this again with the latest ansible changes for the logging playbook? We had done just that -- map node names back to the inventory names. This was originally done as a fix for oc cluster up --logging. It merged less than 24 hours ago. https://github.com/openshift/openshift-ansible/pull/9267

Tried to use the latest playbooks/openshift-logging/private/config.yml mentioned by ewolinet to deploy logging. The playbook ran successfully without any errors, and all pods are running and ready.

According to comment 12, I removed the keyword "TestBlocker".

Please close if this is no longer an issue.

Wait for a new official 3.11 puddle.

Verified on:
openshift-ansible-docs-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-roles-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-3.11.0-0.19.0.git.0.ebd1bf9None.noarch
openshift-ansible-playbooks-3.11.0-0.19.0.git.0.ebd1bf9None.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
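After rerunning the playbook with the fix, one hedged way to confirm the sysctl setting actually landed on the intended hosts (the inventory path is a placeholder, "nodes" is only assumed to be the group containing your ES hosts, and 262144 is the assumed expected value):

$ ansible -i /path/to/inventory nodes -m command -a 'sysctl vm.max_map_count'
# each Elasticsearch host should report: vm.max_map_count = 262144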