Bug 1916843

Summary: collect logs from openshift-sdn-controller pod
Product: OpenShift Container Platform Reporter: Serhii Zakharov <szakharo>
Component: Insights Operator Assignee: Serhii Zakharov <szakharo>
Status: CLOSED ERRATA QA Contact: Pavel Šimovec <psimovec>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.7 CC: aos-bugs, inecas, mklika, tremes
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
: 1921743 Environment:
Last Closed: 2021-02-24 15:53:53 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1921743    

Description Serhii Zakharov 2021-01-15 16:34:06 UTC
The sdn-controller (openshift-sdn namespace) emits important messages when it finds issues affecting egress IPs. The important messages are the following:

    “Node %s is not Ready”: A node has been set offline for egress IPs because the API reports it as not ready.
    “Node %s may be offline... retrying”: An egress node has failed the egress IP health check once, so it is likely to be marked offline soon or, at the very least, there has been a connectivity glitch.
    “Node %s is offline”: An egress node has failed enough probes to be marked offline for egress IPs. If it has egress CIDRs assigned, its egress IPs have been moved to other nodes. This indicates a problem with either the node or the network between the master and the node.
    “Node %s is back online”: A node has recovered from the condition described in the previous message and is passing the egress IP health checks again. This is useful in case earlier “Node %s is offline” messages were lost, because it still shows that a failure occurred.

Because Insights Operator (IO) data is gathered every 2 hours, we want to gather the latest occurrences of these messages from the logs.
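
The filtering described above can be illustrated with a minimal sketch. This is not the Insights Operator implementation. It only shows one way to keep the most recent log lines that contain any of the four messages. The function name latest_egress_messages, the limit parameter, and the assumption that the node name contains no whitespace are all hypothetical and chosen for illustration.

```python
import re
from collections import deque

# Message fragments taken from the bug description; "%s" is the node name.
# Assumption: the node name contains no whitespace, so \S+ matches it.
MESSAGE_PATTERNS = [
    r"Node \S+ is not Ready",
    r"Node \S+ may be offline\.\.\. retrying",
    r"Node \S+ is offline",
    r"Node \S+ is back online",
]
EGRESS_RE = re.compile("|".join(MESSAGE_PATTERNS))


def latest_egress_messages(log_lines, limit=10):
    """Return the last `limit` lines that contain one of the egress IP messages.

    `log_lines` is any iterable of strings (for example an open log file).
    The function name and the `limit` parameter are hypothetical.
    """
    # A bounded deque drops the oldest match once `limit` is exceeded,
    # so only the latest occurrences are kept.
    recent = deque(maxlen=limit)
    for line in log_lines:
        if EGRESS_RE.search(line):
            recent.append(line.rstrip("\n"))
    return list(recent)
```

Calling latest_egress_messages on the lines of an sdn-controller log returns the matching lines in their original order, oldest first, so a “Node %s is back online” line is still kept even when the earlier “Node %s is offline” line has already been dropped.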

Comment 5 errata-xmlrpc 2021-02-24 15:53:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633