Bug 1614713 - [3.11] node IP lost
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 3.11.0
Hardware: Unspecified   OS: Unspecified
Priority: high   Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assigned To: Jan Chaloupka
QA Contact: Weihua Meng
Depends On:
Blocks:
Reported: 2018-08-10 05:31 EDT by Weihua Meng
Modified: 2018-10-11 03:25 EDT
CC: 3 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-11 03:24:38 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers:
Red Hat Product Errata RHBA-2018:2652 (Last Updated: 2018-10-11 03:25 EDT)

Description Weihua Meng 2018-08-10 05:31:37 EDT
Description of problem:
The cluster was OK when it was set up; this issue appeared after it had been running for 3 days.

The cluster is 3 masters + 3 infra + 3 compute nodes, running on OpenStack behind a proxy.

Restarting the node service brings the IP back.
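
For reference, a minimal sketch of the workaround (assuming the default RPM install, where the kubelet runs as the atomic-openshift-node systemd unit):
# systemctl restart atomic-openshift-node
# oc get node <node-name> -o wide
After the restart, the INTERNAL-IP/EXTERNAL-IP columns should be populated again.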

Version-Release number of selected component (if applicable):
openshift v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0

How reproducible:
N/A

Steps to Reproduce:
1. Install HA OCP 3.11 and run it for 3 days with multiple users.

Actual results:
5 out of 9 nodes lost their IPs.
# oc get node -o wide
NAME                              STATUS    ROLES     AGE       VERSION           INTERNAL-IP     EXTERNAL-IP    OS-IMAGE                                      KERNEL-VERSION               CONTAINER-RUNTIME
preserve-sharefr2-master-etcd-1   Ready     master    3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-master-etcd-2   Ready     master    3d        v1.11.0+d4cacc0   172.16.120.8    10.8.247.142   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-master-etcd-3   Ready     master    3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-1          Ready     compute   3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-2          Ready     compute   3d        v1.11.0+d4cacc0   172.16.120.78   10.8.251.177   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-3          Ready     compute   3d        v1.11.0+d4cacc0   172.16.120.79   10.8.247.128   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-infra-1    Ready     infra     3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-infra-2    Ready     infra     3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-infra-3    Ready     infra     3d        v1.11.0+d4cacc0   172.16.120.6    10.8.241.153   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
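
For reference, a quick way to check a single node's reported addresses (a sketch; substitute any node name from the table above):
# oc get node preserve-sharefr2-node-1 -o jsonpath='{.status.addresses}'
On an affected node this should list only the Hostname entry, with no InternalIP/ExternalIP entries.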


Lots of pods are stuck in Terminating status.

Expected results:
All node IPs remain, and pods are not stuck in Terminating status for a long time.

Additional info:
heapster logs
E0810 06:46:05.000408       1 summary.go:376] Node preserve-sharefr2-master-etcd-3 has no valid hostname and/or IP address: preserve-sharefr2-master-etcd-3 
E0810 06:46:05.000420       1 summary.go:376] Node preserve-sharefr2-node-infra-2 has no valid hostname and/or IP address: preserve-sharefr2-node-infra-2 
E0810 06:46:05.000432       1 summary.go:376] Node preserve-sharefr2-node-1 has no valid hostname and/or IP address: preserve-sharefr2-node-1 
E0810 06:46:05.000437       1 summary.go:376] Node preserve-sharefr2-node-infra-1 has no valid hostname and/or IP address: preserve-sharefr2-node-infra-1 
E0810 06:46:05.000442       1 summary.go:376] Node preserve-sharefr2-master-etcd-1 has no valid hostname and/or IP address: preserve-sharefr2-master-etcd-1
Comment 7 Weihua Meng 2018-08-23 09:16:53 EDT
Clusters are already set up. If this issue does not occur while they run for 4 days, I will mark this bug as fixed.
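
A rough sketch of how this check could be automated (assuming affected nodes show <none> in the IP columns, as in the output above), counting such nodes once per hour:
# while true; do date; oc get nodes -o wide | grep -c '<none>'; sleep 3600; done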
Comment 8 Weihua Meng 2018-08-27 18:39:17 EDT
Fixed.
openshift v3.11.0-0.20.0
kubernetes v1.11.0+d4cacc0

After running 2 clusters for 4 days, the node IP lost issue did not occur.

Kernel Version: 3.10.0-862.9.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)

Kernel Version: 3.10.0-933.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 Beta (Maipo)
Comment 10 errata-xmlrpc 2018-10-11 03:24:38 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
