Description of problem:
Some nodes are losing their IP address information, which prevents heapster from collecting metrics for those nodes. See the node address info in {{.node.status.address}}.

This is the good node:

    "addresses": [
        { "address": "172.29.54.179", "type": "InternalIP" },
        { "address": "54.x.x.x", "type": "ExternalIP" },
        { "address": "ip-172-29-54-179.ec2.internal", "type": "InternalDNS" },
        { "address": "ec2--x-x-x-x.amazonaws.com", "type": "ExternalDNS" },
        { "address": "ip-172-29-54-179.ec2.internal", "type": "Hostname" }
    ],

This is the bad node:

    "addresses": [
        { "address": "ip-172-29-48-191.ec2.internal", "type": "Hostname" }
    ],

Version-Release number of selected component (if applicable):
openshift v3.9.33
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
We tried to bring the information back by restarting atomic-openshift-node.service on the affected node, but the service failed to restart. Relevant log from atomic-openshift-node.service:

atomic-openshift-node[65550]: F0723 04:18:26.328675 65550 network.go:100] Unable to get a bind address: failed to retrieve node IP: host IP unknown; known <snip>

Possibly related: https://bugzilla.redhat.com/show_bug.cgi?id=1589396
Rebooting the affected node (or at least performing an AWS stop/start on it) allowed atomic-openshift-node.service to restart and report all address information to the master API again (oc get nodes). This also cleared the related complaints from heapster.
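For anyone who needs to find affected nodes before rebooting them, here is a rough sketch of a check. It assumes the node list is available as JSON in the shape returned by `oc get nodes -o json`, and it assumes that a missing InternalIP entry is what marks a node as affected, based on the two address lists quoted above. The function name and the node names in the sample are hypothetical.

```python
import json

# Assumption: a node is "affected" when it reports no InternalIP address,
# as with the bad node quoted in this report.
REQUIRED_TYPES = frozenset({"InternalIP"})


def nodes_missing_addresses(node_list, required=REQUIRED_TYPES):
    """Map node name -> sorted list of required address types it lacks."""
    missing = {}
    for node in node_list.get("items", []):
        name = node.get("metadata", {}).get("name", "<unknown>")
        addresses = node.get("status", {}).get("addresses", [])
        present = {entry.get("type") for entry in addresses}
        lacking = set(required) - present
        if lacking:
            missing[name] = sorted(lacking)
    return missing


# Trimmed sample built from the two address lists in this report.
# The node names are placeholders, not taken from the real cluster.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "good-node"},
      "status": {"addresses": [
        {"address": "172.29.54.179", "type": "InternalIP"},
        {"address": "ip-172-29-54-179.ec2.internal", "type": "InternalDNS"},
        {"address": "ip-172-29-54-179.ec2.internal", "type": "Hostname"}
      ]}
    },
    {
      "metadata": {"name": "bad-node"},
      "status": {"addresses": [
        {"address": "ip-172-29-48-191.ec2.internal", "type": "Hostname"}
      ]}
    }
  ]
}
""")

print(nodes_missing_addresses(SAMPLE))
```

Against a live cluster, the same function could be fed `json.load(sys.stdin)` with the output of `oc get nodes -o json` piped in. This only detects the symptom; it does not explain why the addresses were dropped.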
Hi haowang, please help verify the bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2549