Bug 1589396
Summary: | atomic-openshift-node.service unable to start because "network.go:100] Unable to get a bind address: failed to retrieve node IP" | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Chris Kim <chrkim> |
Component: | Cloud Compute | Assignee: | Jan Chaloupka <jchaloup> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | DeShuai Ma <dma> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.9.0 | CC: | andrew.rolls-drew, aos-bugs, bleanhar, bowe, byount, chrkim, cshereme, hongli, jchaloup, jokerman, jolee, mmccomas, rbost, tatanaka |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 3.9.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-09-27 15:29:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Chris Kim
2018-06-08 23:37:17 UTC
I am reopening this bug because another customer case, #02171117, appears to hit it. This customer is running on Azure and sees similar error messages. We have a sosreport from an affected master node, which I'll attach privately. I'm going to ask the customer to collect the master and service logs with LogLevel=10. If there is any other information you need from the customer, please let me know.

We have already tested the errata; no extra testing is needed. Moving to VERIFIED.

I see this was released as an errata for 3.9.43 and has a target of 3.12. What is the status for 3.10 and 3.11?

FYI, I came across the same issue on one of what should have been three identical nodes (reference architecture installed on AWS) while upgrading some components from 3.9.30 to 3.9.60.

Pre-upgrade:

[root@ip-172-15-23-65 git]# rpm -qa | grep openshift
atomic-openshift-node-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-sdn-ovs-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-clients-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-docker-excluder-3.9.30-1.git.0.dec1ba7.el7.noarch
atomic-openshift-excluder-3.9.30-1.git.0.dec1ba7.el7.noarch
atomic-openshift-3.9.30-1.git.0.dec1ba7.el7.x86_64

Post-upgrade:

[root@ip-172-15-43-148 ec2-user]# rpm -qa | grep openshift
atomic-openshift-node-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-excluder-3.9.60-1.git.0.f8b38ff.el7.noarch
atomic-openshift-sdn-ovs-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-docker-excluder-3.9.60-1.git.0.f8b38ff.el7.noarch
atomic-openshift-clients-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-3.9.30-1.git.0.dec1ba7.el7.x86_64

The workaround mentioned above resolved the issue.

@jan, according to Comment #22, should he have to explicitly set the node's bind address to work around the problem in the 3.9.30 build even after upgrading to 3.9.60 (which has the fix)? If not, perhaps Bowe could provide the exact steps he performed to reproduce the problem, and we could ask QE to double-check.
No additional workaround is needed after upgrading to 3.9.60. The timeout issue was fully fixed. If anything else is malfunctioning, it's a different issue that we need to revisit.
> Azure REST API seems unstable. However, this timeout error didn't show up at the next service restart.
@Bowe, are you referring to this workaround?
The fix is part of the atomic-openshift rpm, which needs to be updated on each node. Until that happens, the only way to temporarily fix the issue is to restart the node daemon.
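In the post-upgrade package list above, only the two excluder packages moved to 3.9.60, while atomic-openshift and atomic-openshift-node stayed at 3.9.30. Because the fix ships in the atomic-openshift rpm, one quick check is to flag any OpenShift package still below the fixed version. The following is a minimal sketch, not an official tool. The helper names (`parse_nvra`, `lagging_packages`) are hypothetical, and the version comparison is a simplified integer comparison rather than RPM's real `rpmvercmp` logic.

```python
import re
from typing import List, NamedTuple, Optional

# Matches one `rpm -qa` entry of the form name-version-release.arch,
# e.g. atomic-openshift-node-3.9.30-1.git.0.dec1ba7.el7.x86_64
NVRA_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>[0-9][0-9.]*)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


class Package(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_nvra(line: str) -> Optional[Package]:
    """Split one `rpm -qa` line into its parts; return None if it does not parse."""
    m = NVRA_RE.match(line.strip())
    if m is None:
        return None
    return Package(m["name"], m["version"], m["release"], m["arch"])


def version_key(version: str) -> tuple:
    """Simplified numeric key ('3.9.30' -> (3, 9, 30)); NOT a full rpmvercmp."""
    return tuple(int(part) for part in version.split(".") if part)


def lagging_packages(rpm_lines: List[str], target: str) -> List[Package]:
    """Return packages whose version is lower than `target`, in input order."""
    wanted = version_key(target)
    result = []
    for line in rpm_lines:
        pkg = parse_nvra(line)
        if pkg is not None and version_key(pkg.version) < wanted:
            result.append(pkg)
    return result


# Post-upgrade `rpm -qa | grep openshift` output copied from this report.
post_upgrade = """\
atomic-openshift-node-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-excluder-3.9.60-1.git.0.f8b38ff.el7.noarch
atomic-openshift-sdn-ovs-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-docker-excluder-3.9.60-1.git.0.f8b38ff.el7.noarch
atomic-openshift-clients-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-3.9.30-1.git.0.dec1ba7.el7.x86_64
"""

for pkg in lagging_packages(post_upgrade.splitlines(), "3.9.60"):
    print(pkg.name, pkg.version)
```

Run against the post-upgrade list, this prints the four packages still at 3.9.30, including atomic-openshift and atomic-openshift-node. That is consistent with the node daemon still running the unfixed build.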