Bug 1680562 - [network-operator] It may take about 5 minutes to update the operator status back to normal after fix the problem in network config
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-25 10:26 UTC by Meng Bo
Modified: 2019-06-04 10:44 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:27 UTC
Target Upstream Version:


Attachments
network_operator_log (16.27 KB, text/plain)
2019-02-25 10:26 UTC, Meng Bo


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:44:33 UTC
Github openshift cluster-network-operator pull 115 None None None 2019-03-05 17:15:04 UTC

Description Meng Bo 2019-02-25 10:26:26 UTC
Created attachment 1538395 [details]
network_operator_log

Description of problem:
The newly added network clusteroperator.config.openshift.io object monitors the network operator and reports its status in real time. However, after a broken config has been fixed, it can take a long time to report that the operator is available again.

Version-Release number of selected component (if applicable):
v4.0.0-0.177.0

How reproducible:
always 

Steps to Reproduce:
1. Set up an OCP cluster

2. Check the network clusteroperator.config.openshift.io
NAME      VERSION   AVAILABLE   PROGRESSING   FAILING   SINCE
network             True        False         False     7s

3. Introduce a problem in the network.config.openshift.io

4. Check that the clusteroperator.config.openshift.io is reporting FAILING
NAME      VERSION   AVAILABLE   PROGRESSING   FAILING   SINCE
network             False       False         True      9s

5. Fix the problem in the network.config.openshift.io

6. Watch the clusteroperator 
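
To put a number on step 6, the recovery time can be measured by polling instead of watching by eye. The following is a minimal sketch, not part of the original report: the helper names (`parse_conditions`, `fetch_conditions`, `wait_until_recovered`) are hypothetical, and it assumes `oc` is logged in to the cluster and that the ClusterOperator JSON exposes `status.conditions` entries with `type` and `status` fields, using the `Available` and `Failing` condition names shown in this report.

```python
import json
import subprocess
import time


def parse_conditions(obj):
    """Map condition type -> bool from a ClusterOperator JSON object."""
    conditions = obj.get("status", {}).get("conditions", [])
    return {c["type"]: c["status"] == "True" for c in conditions}


def fetch_conditions():
    """Read the network ClusterOperator via oc (assumes oc is logged in)."""
    out = subprocess.run(
        ["oc", "get", "clusteroperator", "network", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_conditions(json.loads(out))


def wait_until_recovered(fetch=fetch_conditions, interval=2.0, timeout=600.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll until Available is true and Failing is false.

    Returns the elapsed seconds; raises TimeoutError after `timeout` seconds.
    `fetch`, `clock` and `sleep` are injectable so the loop can be tested
    without a cluster.
    """
    start = clock()
    while True:
        conds = fetch()
        if conds.get("Available") and not conds.get("Failing"):
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError("network operator did not recover in time")
        sleep(interval)
```

Calling `wait_until_recovered()` right after fixing the config in step 5 returns the number of seconds until the operator reports available again, to within one polling interval.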

Actual results:
It may take about 5 minutes for the operator to be reported as available again.

NAME      VERSION   AVAILABLE   PROGRESSING   FAILING   SINCE
network             False       False         False     4m


Expected results:
The operator status should be reported in real time.

Additional info:
The full network operator log is attached.

Comment 1 Dan Winship 2019-03-05 17:15:05 UTC
The fix for this also changes the behavior of the status conditions a bit. In particular, now when you break the configuration, the operator Failing status will become True, but the Available status will not change; the operator will report that it is both Failing and Available. Then when you fix the config again, it should report Failing False, but you won't need to wait for Available to change, because it never changed in the first place.
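
The condition semantics described in this comment can be modeled in a few lines. This is only an illustrative sketch under stated assumptions: the `Conditions` type and both handler functions are hypothetical names, and this is not the code from the cluster-network-operator pull request. It shows just the rule that a config error toggles Failing and leaves Available untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Conditions:
    """Hypothetical snapshot of the three reported status conditions."""
    available: bool
    progressing: bool
    failing: bool


def on_config_error(current: Conditions) -> Conditions:
    # New behavior: a bad config sets Failing=True; Available is not changed.
    return replace(current, failing=True)


def on_config_fixed(current: Conditions) -> Conditions:
    # Fixing the config clears Failing; Available never changed, so there
    # is nothing else to wait for.
    return replace(current, failing=False)


healthy = Conditions(available=True, progressing=False, failing=False)
broken = on_config_error(healthy)   # Available and Failing at the same time
recovered = on_config_fixed(broken)
```

In this model `broken` corresponds to True/False/True and `recovered` equals the original healthy state, which is why no wait on Available is needed after the fix.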

Comment 3 Anurag saxena 2019-03-14 18:26:05 UTC
(In reply to Dan Winship from comment #1)
> The fix for this also changes the behavior of the status conditions a bit.
> In particular, now when you break the configuration, the operator Failing
> status will become True, but the Available status will not change; the
> operator will report that it is both Failing and Available. Then when you
> fix the config again, it should report Failing False, but you won't need to
> wait for Available to change, because it never changed in the first place.

Yep, as observed on 4.0.0-0.nightly-2019-03-13-233958, the status (AVAILABLE,PROGRESSING,FAILING) is now True,False,True after a bad config:

# oc get clusteroperators.config.openshift.io | grep "network\|NAME"
NAME                                  VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
network                               4.0.0-0.nightly-2019-03-13-233958   True        False         True      125m

and ~20 seconds after correcting the config, it shows True,False,False:

# oc get clusteroperators.config.openshift.io | grep "network\|NAME"
NAME                                  VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
network                               4.0.0-0.nightly-2019-03-13-233958   True        False         False     129m
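
For scripted verification, the table output above can be turned into the same True/False triples. A minimal sketch, assuming the column layout shown in this report (AVAILABLE, PROGRESSING, FAILING) and that `oc` aligns each value under its column header; `parse_oc_table` and `status_triple` are hypothetical helper names. Slicing by header offsets rather than splitting on whitespace keeps an empty VERSION column from shifting the other fields.

```python
import re


def parse_oc_table(text):
    """Parse aligned `oc get` table text into a list of row dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Each column starts where its header word starts.
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", lines[0])]
    rows = []
    for ln in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            end = cols[i + 1][1] if i + 1 < len(cols) else None
            row[name] = ln[start:end].strip()
        rows.append(row)
    return rows


def status_triple(row):
    """Return (available, progressing, failing) as booleans."""
    return tuple(row[k] == "True" for k in ("AVAILABLE", "PROGRESSING", "FAILING"))
```

Feeding it the captured output of `oc get clusteroperators.config.openshift.io` and selecting the row whose NAME is `network` gives a tuple that can be compared directly against the expected state.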

Comment 5 Meng Bo 2019-03-19 07:55:17 UTC
Checked on OCP 4.0.0-0.nightly-2019-03-19-004004

The status of the network operator at the beginning:
# oc get clusteroperator network -o wide
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
network   4.0.0-0.nightly-2019-03-19-004004   True        False         False     84m

When I introduce a problem in network.config.openshift.io/cluster:
# oc get clusteroperator network
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
network   4.0.0-0.nightly-2019-03-19-004004   True        False         True      86m

After I fix the problem above, the status refreshes within a few seconds:
# oc get clusteroperator network
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
network   4.0.0-0.nightly-2019-03-19-004004   True        False         False     86m

Marking the bug as verified.

Comment 7 errata-xmlrpc 2019-06-04 10:44:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

