Bug 1936857 - e2e-ovirt-ipi-install-install is permafailing on 4.5 nightlies
Summary: e2e-ovirt-ipi-install-install is permafailing on 4.5 nightlies
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.5
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Gal Zaidman
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-09 11:16 UTC by Vadim Rutkovsky
Modified: 2021-07-27 22:52 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
operator.Run multi-stage test e2e-ovirt - e2e-ovirt-ipi-install-install container test
Last Closed: 2021-07-27 22:51:56 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2021:2438  0        None      None    None     2021-07-27 22:52:20 UTC

Description Vadim Rutkovsky 2021-03-09 11:16:10 UTC
test:
operator.Run multi-stage test e2e-ovirt - e2e-ovirt-ipi-install-install container test 

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?search=operator%5C.Run+multi-stage+test+e2e-ovirt+-+e2e-ovirt-ipi-install-install+container+test&maxAge=168h&context=1&type=bug%2Bjunit&name=4.5&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job


Example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-ovirt/1369136087938633728

Breaks at bootstrapping:

 level=info msg="Waiting up to 40m0s for bootstrapping to complete..."
W0309 04:31:55.245000      69 reflector.go:326] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *v1.ConfigMap ended with: very short watch: k8s.io/client-go/tools/watch/informerwatcher.go:146: Unexpected watch close - watch lasted less than a second and no items received
E0309 04:31:56.369901      69 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.ovirt14.gcp.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
E0309 04:31:57.473420      69 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.ovirt14.gcp.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
E0309 04:31:58.625698      69 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.ovirt14.gcp.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
E0309 04:31:59.765295      69 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.ovirt14.gcp.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
E0309 04:32:00.878767      69 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get https://api.ovirt14.gcp.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: EOF
level=info msg="Pulling debug logs from the bootstrap machine"

Comment 3 Douglas Schilling Landgraf 2021-03-18 13:34:04 UTC
Locally, the 4.5 branch seems to work; still investigating.

$ git branch -a
* 4.5
  master
  remotes/origin/HEAD -> origin/master

# OPENSHIFT_RELEASE_VERSION can be 4.4, 4.5, 4.6 etc
export OPENSHIFT_RELEASE_VERSION=4.5
export MIRROR="mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp-dev-preview/latest-${OPENSHIFT_RELEASE_VERSION}"
export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="$(curl -s -l "${MIRROR}/release.txt" | sed -n 's/^Pull From: //p')"
./bin/openshift-install create cluster --dir=CI_4.5_failure_Mar18_2021 --log-level=debug


DEBUG Still waiting for the cluster to initialize: Working towards 4.5.35: 87% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.35: 87% complete, waiting on authentication
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.35: 87% complete, waiting on authentication
DEBUG Cluster is initialized
INFO Waiting up to 10m0s for the openshift-console route to be created...
DEBUG Route found in openshift-console namespace: console
DEBUG Route found in openshift-console namespace: downloads
DEBUG OpenShift console route is created
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/douglas/installer/CI_4.5_failure_Mar18_2021/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.cluster.medogz.ocp4
INFO Login to the console with user: "kubeadmin", and password: "VDNxX-dD......."
DEBUG Time elapsed per stage:
DEBUG     Infrastructure: 8m19s
DEBUG Bootstrap Complete: 14m10s
DEBUG                API: 2m28s
DEBUG  Bootstrap Destroy: 19s
DEBUG  Cluster Operators: 22m50s
INFO Time elapsed: 46m31s
[douglas@localhost installer]$
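
Side note on the release-image override used above: if the download fails, the `curl | sed` pipeline prints nothing and the variable is exported empty without any error. A minimal sketch of a guard, assuming a hypothetical helper named `resolve_release_image` (not part of the installer) and made-up sample content:

```shell
# Hypothetical helper (not part of the installer): read release.txt content
# on stdin and print the pull spec from its "Pull From:" line.
resolve_release_image() {
  # Same sed expression as in the commands above; keep only the first match.
  spec="$(sed -n 's/^Pull From: //p' | head -n 1)"
  if [ -z "$spec" ]; then
    echo "no 'Pull From:' line found in release.txt" >&2
    return 1
  fi
  printf '%s\n' "$spec"
}

# Made-up sample content for illustration; a real release.txt has more fields.
printf 'Name: example\nPull From: registry.example/release:tag\n' | resolve_release_image
```

To make use of the non-zero exit status, assign the result to a plain variable first and export it only if the helper succeeded, since `export VAR="$(...)"` hides the status of the command substitution.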

Comment 4 Gal Zaidman 2021-03-30 15:11:54 UTC
Due to capacity constraints, we will revisit this bug in the upcoming sprint.

Comment 5 Douglas Schilling Landgraf 2021-04-20 14:32:45 UTC
Hi,

I worked with Gal on this one; we need to set IGNITIONVERSION: "2.2.0" in the 4.5 CI job.
Gal is finishing the tests and will soon send a PR to the openshift/release project.
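
For illustration only, a pre-flight check along these lines could catch a missing setting before the install starts. This is a sketch under assumptions: the function name `check_ignition_version` is hypothetical, and how the CI job actually consumes IGNITIONVERSION is not shown in this bug.

```shell
# Hypothetical pre-flight check (not taken from the openshift/release repo):
# fail when the release is 4.5 but the Ignition version is not 2.2.0.
check_ignition_version() {
  release="$1"
  ignition="$2"
  if [ "$release" = "4.5" ] && [ "$ignition" != "2.2.0" ]; then
    echo "release 4.5 expects IGNITIONVERSION=2.2.0, got '${ignition}'" >&2
    return 1
  fi
}

# Example call with the values named in this comment.
check_ignition_version "4.5" "2.2.0" && echo "ignition version ok"
```

Other releases pass through unchanged, because this bug only states the requirement for 4.5.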

Comment 7 Gal Zaidman 2021-04-28 07:00:40 UTC
(In reply to Vadim Rutkovsky from comment #6)
> Not yet fixed, see
> https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-ovirt
> 
> * https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-ovirt/1386893064492027904
> * https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-ovirt/1386787354529763328
> * https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-ovirt/1386530542912016384

This is expected. Look at the test failures: the tests fail due to Docker rate limiting:
"""
Apr 27 05:13:31.987: INFO: At 2021-04-27 05:08:39 +0000 UTC - event for execpodkqdlj: {kubelet ovirt17-5cb78-worker-0-5l9nj} Failed: Failed to pull image "centos:7": rpc error: code = Unknown desc = Error reading manifest 7 in docker.io/library/centos: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
"""

I know that later releases had a number of PRs that addressed and fixed the Docker rate limiting, but they were probably not backported if we are still seeing this on 4.5.
Anyway, this bug is for the bootstrap failure, which we fixed. The cluster now installs, but some of the tests fail due to Docker rate limiting. I think you can move this bug to verified and, if you like, open a separate bug for the Docker rate-limiting test failures on the 4.4 and 4.5 releases. We can talk to Clayton; I think he has done some work in that area.
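
To tell the two failure modes apart when triaging job logs, a rough sketch such as the following may help. The function name `classify_failure` is hypothetical; the two match strings are taken from the log excerpts in this bug, and real logs may need more patterns.

```shell
# Hypothetical triage helper: read a job log on stdin and print a coarse
# failure category based on strings quoted in this bug.
classify_failure() {
  log="$(cat)"
  case "$log" in
    *toomanyrequests*)
      echo "docker-rate-limit" ;;
    *"Pulling debug logs from the bootstrap machine"*)
      echo "bootstrap-failure" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example with a line shortened from the event quoted above.
echo 'Failed to pull image "centos:7": toomanyrequests: You have reached your pull rate limit.' | classify_failure
```

The rate-limit pattern is checked first on the assumption that a log containing it comes from a run that got past installation.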

Comment 8 Vadim Rutkovsky 2021-04-28 07:27:09 UTC
>I think you can move this to verified and if you like can open a bug on docker rate limiting test failures on 4.4 and 4.5 releases

Then this bug tracks it for now, unless we find a better-suited bug for it.

Comment 9 Gal Zaidman 2021-05-24 14:32:37 UTC
Closing this bug in favor of:
https://bugzilla.redhat.com/show_bug.cgi?id=1963999

As I mentioned,
"this bug is for the bootstrap failure, which we fixed. The cluster now installs, but some of the tests fail due to Docker rate limiting. I think you can move this bug to verified and, if you like, open a separate bug for the Docker rate-limiting test failures on the 4.4 and 4.5 releases. We can talk to Clayton; I think he has done some work in that area."

Comment 15 errata-xmlrpc 2021-07-27 22:51:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

