Bug 1608092 - [3.9] Nodes losing IP address information in aws
Summary: [3.9] Nodes losing IP address information in aws
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.9.z
Assignee: Seth Jennings
QA Contact: Wang Haoran
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-24 23:36 UTC by Wang Haoran
Modified: 2018-09-25 12:18 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-29 14:42:32 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2549 0 None None None 2018-08-29 14:43:24 UTC

Description Wang Haoran 2018-07-24 23:36:08 UTC
Description of problem:

Some nodes are losing their IP address information, which prevents heapster from collecting metrics for those nodes.

See the node address info in {{.status.addresses}} of the node object.

This is the good node:

"addresses": [
    {
        "address": "172.29.54.179",
        "type": "InternalIP"
    },
    {
        "address": "54.x.x.x",
        "type": "ExternalIP"
    },
    {
        "address": "ip-172-29-54-179.ec2.internal",
        "type": "InternalDNS"
    },
    {
        "address": "ec2--x-x-x-x.amazonaws.com",
        "type": "ExternalDNS"
    },
    {
        "address": "ip-172-29-54-179.ec2.internal",
        "type": "Hostname"
    }
],


This is the bad node:

"addresses": [
    {
        "address": "ip-172-29-48-191.ec2.internal",
        "type": "Hostname"
    }
],
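
To find affected nodes without reading each node object by hand, the comparison above can be sketched as a small Python check. This is only an illustration, not part of the fix: the function name nodes_missing_addresses and the SAMPLE data are hypothetical, the node names are assumed to match the Hostname entries shown above, and the input is assumed to have the shape of `oc get nodes -o json` (a list object with an "items" array).

```python
def nodes_missing_addresses(node_list, required=("InternalIP",)):
    """Map node name -> sorted list of required address types it lacks.

    node_list is assumed to be the parsed output of `oc get nodes -o json`.
    """
    affected = {}
    for node in node_list.get("items", []):
        name = node.get("metadata", {}).get("name", "<unknown>")
        addresses = node.get("status", {}).get("addresses") or []
        present = {entry.get("type") for entry in addresses}
        missing = sorted(set(required) - present)
        if missing:
            affected[name] = missing
    return affected


# Hypothetical sample built from the two nodes in this report (trimmed).
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "ip-172-29-54-179.ec2.internal"},
            "status": {"addresses": [
                {"address": "172.29.54.179", "type": "InternalIP"},
                {"address": "ip-172-29-54-179.ec2.internal", "type": "Hostname"},
            ]},
        },
        {
            "metadata": {"name": "ip-172-29-48-191.ec2.internal"},
            "status": {"addresses": [
                {"address": "ip-172-29-48-191.ec2.internal", "type": "Hostname"},
            ]},
        },
    ]
}

affected = nodes_missing_addresses(SAMPLE)
for node_name, missing_types in sorted(affected.items()):
    print(f"{node_name}: missing {', '.join(missing_types)}")
```

Against a live cluster, the same function could be fed json.load() of the saved `oc get nodes -o json` output; passing a longer tuple for required (for example adding "InternalDNS") widens the check.
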
Version-Release number of selected component (if applicable):

openshift v3.9.33
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16



Comment 1 Drew Anderson 2018-07-24 23:48:30 UTC
We tried to bring the information back by restarting atomic-openshift-node.service on the affected node, but the service failed to restart.

Relevant log from atomic-openshift-node.service:
atomic-openshift-node[65550]: F0723 04:18:26.328675   65550 network.go:100] Unable to get a bind address: failed to retrieve node IP: host IP unknown; known <snip>

Possibly related: https://bugzilla.redhat.com/show_bug.cgi?id=1589396

Comment 2 Drew Anderson 2018-07-24 23:50:39 UTC
Rebooting the affected node (at least via an AWS stop/start) allowed atomic-openshift-node.service to restart and report all address information to the master API again (oc get nodes). This also cleared the related complaints from heapster.

Comment 8 DeShuai Ma 2018-08-21 01:26:05 UTC
Hi haowang, please help verify the bug.

Comment 11 errata-xmlrpc 2018-08-29 14:42:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2549

