| Summary: | externalID changes and kubelet attempts to delete/recreate Node API object | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Howe <rhowe> |
| Component: | Node | Assignee: | Solly Ross <sross> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jianwei Hou <jhou> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 3.1.0 | CC: | agoldste, anli, aos-bugs, bleanhar, eparis, erich, erjones, jdetiber, jialiu, jliggitt, jokerman, mmccomas, pep, rhowe, rpenta, sross, tdawson |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | atomic-openshift-3.1.1.6-1.git.0.b57e8bd.el7aos | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-01-29 20:30:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 1267746 | ||
Description
Ryan Howe
2016-01-08 23:13:15 UTC
How the node's ExternalID is set depends on whether a cloud provider has been configured. If one has, Kubernetes uses the cloud provider to get the ExternalID. If there is no cloud provider, the ExternalID is set to the same value as the node's "hostname." I put "hostname" in quotes because there is logic around how the node's hostname is set, and it can be an IP address. If node-config.yaml sets nodeIP, the node's hostname is set to match nodeIP, which means the ExternalID will also be that IP address. Otherwise, the node's hostname comes from the nodeName field in node-config.yaml.

I looked at the customer's logs, and they are not running with a cloud provider, which means the ExternalID came from the node's "hostname." I have not yet found any code that looks responsible for replacing the hostname with an IP address. I can imagine a few ways this could have happened:

1) Someone manually set nodeIP in node-config.yaml. This seems unlikely, as every node's ExternalID appears to be an IP address.
2) Some tooling (openshift-ansible?) set nodeIP, or set nodeName to an IP.
3) Some code, present or past, set either nodeIP or the node's hostname to an IP.

openshift-ansible at one point was setting nodeIP in node-config.yaml; the latest version does not. The customer runs ansible every night to ensure the node configs are all correct. It sounds like they ran ansible while it was setting nodeIP, then ran it again after it had stopped setting nodeIP. According to Andrew Butcher, ansible would then remove nodeIP from the configs. I think this is what happened.
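A minimal Python sketch of the selection logic described in this comment, to make the failure mode concrete. The function names and the `cloud_lookup` callable are hypothetical stand-ins, not Kubernetes or OpenShift APIs, and the sketch models the behavior exactly as described here; the verification results further down suggest the fixed build no longer takes the ExternalID from nodeIP.

```python
from typing import Callable, Optional


def node_hostname(node_name: str, node_ip: Optional[str]) -> str:
    """Return the node's "hostname": nodeIP wins whenever it is set."""
    return node_ip if node_ip else node_name


def external_id(node_name: str,
                node_ip: Optional[str] = None,
                cloud_lookup: Optional[Callable[[str], str]] = None) -> str:
    """Choose an ExternalID the way the comment describes it.

    cloud_lookup is a hypothetical stand-in for a configured cloud
    provider; it is not a real Kubernetes or OpenShift API.
    """
    hostname = node_hostname(node_name, node_ip)
    if cloud_lookup is not None:
        # Cloud provider configured: it supplies the ID (assumed here to be
        # looked up by the node's hostname).
        return cloud_lookup(hostname)
    # No cloud provider: the ExternalID is simply the "hostname".
    return hostname


# The nightly-ansible scenario: nodeIP present on one run, gone on the next.
name = "node1.example.com"
before = external_id(name, node_ip="10.0.0.5")
after = external_id(name)
print(before, "->", after, "changed:", before != after)
```

Running it prints `10.0.0.5 -> node1.example.com changed: True`: dropping nodeIP between two ansible runs is enough to change the ExternalID with no cloud provider involved.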
PR merged to master: https://github.com/openshift/openshift-ansible/pull/970

Errata release with the changes to the ansible installer: https://access.redhat.com/errata/RHBA-2015:2667

There's a PR in to address several facets of this problem: https://github.com/openshift/origin/pull/6310 (among other things, it tolerates switching between nodeIP and hostname for the externalID without deleting and recreating the node).

https://github.com/openshift/origin/pull/6310 has merged in origin.

The fix is in the latest OSE build; moving to QE.

Verified on:
openshift v3.1.1.3
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2
Test env: OpenStack
Test scenarios:
==Without a cloud provider==
1. Set nodeIP to the public (floating) IP associated with the instance
In node-config.yaml:
nodeName: openshift-125.lab.eng.nay.redhat.com
nodeIP: "10.66.79.125"
Result: The node service failed to start; on the master, the node status is Unknown and the kubelet has stopped posting node status.
Reason: `failed to create kubelet: Node IP: "10.66.79.125" not found in the host's network interfaces`. In the OpenStack environment, 10.66.79.125 is an associated floating IP, so it is not assigned to any of the instance's network interfaces.
2. Set nodeIP: "" or remove nodeIP from node-config.yaml
Result: Node status is Ready; no errors or warnings seen.
3. Set nodeIP: "192.168.0.116" in node-config.yaml, where "192.168.0.116" is the address of the eth0 network interface
nodeName: openshift-125.lab.eng.nay.redhat.com
nodeIP: "192.168.0.116"
Result: Node status is Ready; the externalID is shown as openshift-125.lab.eng.nay.redhat.com, not the nodeIP (192.168.0.116).
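Scenario 1 fails because the kubelet rejects a nodeIP that is not bound to a local interface, and an OpenStack floating IP never is. A rough Python sketch of such a check follows. `validate_node_ip` is a hypothetical name, the error text mirrors the message in this report, and the interface addresses are passed in (for example, as collected from `ip -o addr`) rather than discovered, so this is not the kubelet's actual implementation.

```python
import ipaddress
from typing import Iterable


def validate_node_ip(node_ip: str, interface_addrs: Iterable[str]) -> None:
    """Raise ValueError unless node_ip is bound to one of the host's interfaces.

    interface_addrs lists the addresses configured on the host, with or
    without a prefix length (e.g. "192.168.0.116/24"). It is passed in
    rather than discovered so the check is easy to test.
    """
    if not node_ip:
        return  # nodeIP empty or unset: nothing to check (scenario 2)
    wanted = ipaddress.ip_address(node_ip)
    for addr in interface_addrs:
        # ip_interface accepts both "a.b.c.d" and "a.b.c.d/nn".
        if ipaddress.ip_interface(addr).ip == wanted:
            return
    raise ValueError(
        f'Node IP: "{node_ip}" not found in the host\'s network interfaces')


# Addresses an OpenStack guest sees locally: the floating IP is not among them.
local = ["127.0.0.1/8", "192.168.0.116/24"]
validate_node_ip("192.168.0.116", local)      # scenario 3: accepted
try:
    validate_node_ip("10.66.79.125", local)   # scenario 1: rejected
except ValueError as err:
    print(err)
```

The eth0 address passes, while the floating IP raises the same kind of error seen in scenario 1.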
==With OpenStack as the cloud provider (OpenStack instance names updated to match nodeName)==
Result: The ExternalID now comes from the cloud provider, so it no longer matches the value stored on the existing node; the node has to be deleted before it can be updated successfully.
Jan 18 14:36:15 openshift-125.lab.eng.nay.redhat.com atomic-openshift-node[7486]: I0118 14:36:15.714228 7486 kubelet.go:972] Attempting to register node openshift-125.lab.eng.nay.redhat.com
Jan 18 14:36:15 openshift-125.lab.eng.nay.redhat.com atomic-openshift-node[7486]: E0118 14:36:15.723288 7486 kubelet.go:1011] Previously "openshift-125.lab.eng.nay.redhat.com" had externalID "af53b164-a3a4-48c9-bb6a-b3725c1dcae4"; now it is "openshift-125.lab.eng.nay.redhat.com"; will delete and recreate.
Jan 18 14:36:15 openshift-125.lab.eng.nay.redhat.com atomic-openshift-node[7486]: E0118 14:36:15.724619 7486 kubelet.go:1013] Unable to delete old node: User "system:node:openshift-125.lab.eng.nay.redhat.com" cannot delete nodes at the cluster scope
After the original node was deleted as admin, the node was launched successfully.
oc get node -o yaml
```yaml
spec:
  externalID: af53b164-a3a4-48c9-bb6a-b3725c1dcae4
  providerID: openstack:///af53b164-a3a4-48c9-bb6a-b3725c1dcae4
status:
  addresses:
  - address: 192.168.0.116
    type: InternalIP
  - address: 10.66.79.125
    type: InternalIP
  - address: ""
    type: ExternalIP
```
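The log above shows the pre-fix rule: any ExternalID mismatch means delete and recreate, and the node's own credentials cannot delete nodes. PR 6310 is described as tolerating a switch between nodeIP and hostname. Below is a hypothetical Python sketch of such a tolerance rule, assuming the node treats its name and its own addresses as interchangeable IDs; the names are illustrative, and the actual change in origin may work differently.

```python
from typing import Iterable

KEEP = "keep"          # reuse the existing Node object as-is
RECREATE = "recreate"  # delete the Node object, then register it again


def registration_action(existing_id: str, new_id: str, node_name: str,
                        host_addrs: Iterable[str],
                        tolerate_switch: bool = True) -> str:
    """Decide how to handle an already-registered node whose ExternalID
    may have changed. Hypothetical sketch, not the kubelet's code."""
    if existing_id == new_id:
        return KEEP
    # Values a node can flip between when nodeIP is added to or removed
    # from node-config.yaml: its name and its own addresses.
    aliases = {node_name, *host_addrs}
    if tolerate_switch and existing_id in aliases and new_id in aliases:
        return KEEP
    # Anything else, such as a cloud-provider instance ID replacing the
    # hostname, still means delete-and-recreate, and the delete needs
    # permissions the system:node user does not have.
    return RECREATE


host = "openshift-125.lab.eng.nay.redhat.com"
addrs = ["192.168.0.116"]
print(registration_action("192.168.0.116", host, host, addrs))  # keep
print(registration_action("192.168.0.116", host, host, addrs,
                          tolerate_switch=False))              # recreate
print(registration_action(host, "af53b164-a3a4-48c9-bb6a-b3725c1dcae4",
                          host, addrs))                         # recreate
```

With `tolerate_switch=False` the sketch reproduces the pre-fix behavior; the cloud-provider case still ends in `recreate`, matching the manual delete the verification needed.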
(In reply to Eric Jones from comment #23)
> What version of AEP should this not be a problem in?

Per comment #20, this should be fixed in OSE 3.1.1 (RHSA-2016:0070). Updating the missing "Fixed In Version" field. The AEP preview is based on the same packages, so it should be fixed there too, in atomic-openshift-3.1.1.6-1.git.0.b57e8bd.el7aos.