Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1764629

Summary: Missing degraded condition when the static pod installer is unable to create pods due to networking errors
Product: OpenShift Container Platform Reporter: Michal Fojtik <mfojtik>
Component: kube-apiserver Assignee: Michal Fojtik <mfojtik>
Status: CLOSED EOL QA Contact: Xingxing Xia <xxia>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2.0 CC: aos-bugs, eparis, mfojtik, scuppett
Target Milestone: ---   
Target Release: 4.2.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1775252 (view as bug list) Environment:
Last Closed: 2020-06-18 08:39:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1775252, 1782791, 1782793, 1782795    
Bug Blocks:    

Description Michal Fojtik 2019-10-23 13:30:52 UTC
Description of problem:

When the installer pods are stuck in Pending because networking on the node is not working properly, the operator should set its Degraded condition to reflect that state.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 6 Xingxing Xia 2019-11-18 09:52:54 UTC
The above library-go PR has not been bumped into cluster-kube-apiserver-operator in the latest 4.2 payload, 4.2.0-0.nightly-2019-11-17-020725:
cd /data/src/github.com/openshift/cluster-kube-apiserver-operator
git pull
oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-11-17-020725 | grep cluster-kube-apiserver-operator # get commit id
  cluster-kube-apiserver-operator               https://github.com/openshift/cluster-kube-apiserver-operator               2515807cd5baf006944df47595deaba12a3a08e5
git checkout -b 4.2.0-0.nightly-2019-11-17-020725 2515807c
git log --pretty="%h %an %cd - %s" 2515807c -1 # Older than above library PR
2515807c OpenShift Merge Robot Tue Oct 22 03:35:24 2019 +0200 - Merge pull request #590 from openshift-cherrypick-robot/cherry-pick-569-to-release-4.2
cd vendor/github.com/openshift/library-go/
ls pkg/operator/staticpod/controller/installerstate # check for the files added by the above library-go PR; they are not found in the latest 4.2.0-0.nightly-2019-11-17-020725
ls: cannot access 'pkg/operator/staticpod/controller/installerstate': No such file or directory

Comment 7 Xingxing Xia 2019-11-18 10:07:03 UTC
(In reply to Michal Fojtik from comment #0)
> Description of problem:
> In case the installer pods are stucked in Pending because the networking on the node is not working properly
Could you give a hint on how to reproduce "the networking on the node is not working properly" so the bug can be verified? That would save a lot of trial and error. Much appreciated!
I am not sure whether the approach in comment 8 below is correct.

Comment 8 Xingxing Xia 2019-11-18 10:07:26 UTC
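SSH to one master (ip-10-0-155-116.ap-northeast-2.compute.internal) and run the loop below to try to produce the "networking" condition above:
while true # keep stopping the sdn container so node networking stays broken
do
  # IDs of running sdn containers, excluding sdn-controller
  CONT_ID=$(crictl ps | grep -e sdn | grep -v sdn-controller | awk '{print $1}')
  if [ -n "$CONT_ID" ]; then
    # left unquoted on purpose so several IDs are passed as separate arguments
    crictl stop $CONT_ID
  fi
done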

oc get no
NAME                                              STATUS     ROLES    AGE    VERSION
ip-10-0-132-196.ap-northeast-2.compute.internal   Ready      master   142m   v1.14.6+9fb2d5cf9
ip-10-0-155-116.ap-northeast-2.compute.internal   NotReady   master   142m   v1.14.6+9fb2d5cf9
ip-10-0-174-15.ap-northeast-2.compute.internal    Ready      master   142m   v1.14.6+9fb2d5cf9

Force rotate kubeapiserver.
oc describe po installer-7-ip-10-0-155-116.ap-northeast-2.compute.internal -n openshift-kube-apiserver
...
Status:               Pending
...
  Warning  NetworkNotReady  4m26s (x151 over 9m26s)  kubelet, ip-10-0-155-116.ap-northeast-2.compute.internal  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
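To confirm which installer pods are stuck without describing each one, the pod listing can be filtered. This is only a sketch: pending_installers is a hypothetical helper name, and it assumes the default `oc get pods` table layout, where STATUS is the third column.

```shell
# Hypothetical helper: read `oc get pods` table output on stdin and print
# the names of installer pods whose STATUS column is Pending.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
pending_installers() {
  awk 'NR > 1 && $1 ~ /^installer-/ && $3 == "Pending" { print $1 }'
}

# Example (not run here; needs a logged-in oc):
#   oc get pods -n openshift-kube-apiserver | pending_installers
```

Keeping the filter separate from the `oc` call means it can be tried against saved output from the cluster.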
Then check the conditions below?
oc describe co kube-apiserver
oc get kubeapiserver cluster -o yaml
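The two commands above print every condition. A narrower check for the Degraded condition could look like the sketch below. It is not taken from the fix itself: print_conditions and condition_status are hypothetical helper names, and the exact reason and message the operator reports are not assumed here.

```shell
# Print each ClusterOperator condition as TYPE=STATUS, one per line.
# Needs a logged-in oc; the operator name defaults to kube-apiserver.
print_conditions() {
  oc get clusteroperator "${1:-kube-apiserver}" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}'
}

# Hypothetical helper: read TYPE=STATUS lines on stdin and print the status
# of the requested condition type (prints nothing if the type is absent).
condition_status() {
  awk -F= -v want="$1" '$1 == want { print $2 }'
}

# Example (not run here):
#   print_conditions | condition_status Degraded
```

With the fix in place, the expectation from the description is that Degraded is reported while the installer pod stays Pending.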

Comment 9 Stephen Cuppett 2019-11-21 13:45:28 UTC
Can this be marked VERIFIED now? This was merged 10 days ago and likely in 4.2.7. It was a backport of a commit during feature development, PR550 (I don't see need for 4.3.0 bug to track).

Comment 12 Xingxing Xia 2019-11-22 01:45:46 UTC
(In reply to Stephen Cuppett from comment #9)
> Can this be marked VERIFIED now? This was merged 10 days ago and likely in 4.2.7
See comment 6: in 4.2.0-0.nightly-2019-11-17-020725 the library-go PR had not been bumped in, let alone in the earlier 4.2.7 ("Release 4.2.7 was created from registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-11-13-203727" from https://openshift-release.svc.ci.openshift.org/releasestream/4-stable/release/4.2.7 )