Bug 1805821

Summary: [build-cop] extended test failure: Unexpected error in oauth ldap
Product: OpenShift Container Platform
Reporter: Venkata Siva Teja Areti <vareti>
Component: apiserver-auth
Assignee: Venkata Siva Teja Areti <vareti>
Status: CLOSED ERRATA
QA Contact: pmali
Severity: medium
Priority: medium
Version: 4.3.0
CC: aos-bugs, mfojtik, slaznick, sttts, tflannag, xxia
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2020-07-13 17:16:59 UTC
Type: Bug

Description Venkata Siva Teja Areti 2020-02-21 15:31:39 UTC
Description of problem:

An extended oauth ldap test failed 26 times in the last 14 days:

https://search.svc.ci.openshift.org/?search=errours+encountered+trying+to+run+ldapsearch+pod&maxAge=336h&context=2&type=all

Two kinds of failures are seen:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-ovn-4.3/516

fail [github.com/openshift/origin/test/extended/oauth/oauth_ldap.go:53]: Unexpected error:
    <*errors.errorString | 0xc000f841f0>: {
        s: "errours encountered trying to run ldapsearch pod: [error waiting for the pod 'runonce-ldapsearch-pod' to complete: timed out waiting for the condition]",
    }
    errours encountered trying to run ldapsearch pod: [error waiting for the pod 'runonce-ldapsearch-pod' to complete: timed out waiting for the condition]
occurred

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-serial-4.3/937

fail [github.com/openshift/origin/test/extended/oauth/groupsync.go:32]: Unexpected error:
    <*errors.errorString | 0xc002f39070>: {
        s: "errours encountered trying to run ldapsearch pod: [command pod runonce-ldapsearch-pod did not complete: Get https://147.75.102.37:10250/containerLogs/e2e-test-ldap-group-sync-5cmcx/runonce-ldapsearch-pod/runonce-ldapsearch-pod: remote error: tls: internal error]",
    }
    errours encountered trying to run ldapsearch pod: [command pod runonce-ldapsearch-pod did not complete: Get https://147.75.102.37:10250/containerLogs/e2e-test-ldap-group-sync-5cmcx/runonce-ldapsearch-pod/runonce-ldapsearch-pod: remote error: tls: internal error]
occurred
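
The "timed out waiting for the condition" text is the generic timeout error from the k8s.io/apimachinery wait package, so the first failure mode is simply the run-once pod never finishing within the allotted time. The pod-wait helper in the test polls along these lines (a minimal sketch, not the actual origin helper; the clientset wiring, poll interval, and timeout values are assumptions):

package example

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodCompletion polls until the run-once pod reaches Succeeded or Failed.
// If the pod never finishes, wait.PollImmediate returns wait.ErrWaitTimeout,
// whose message is the "timed out waiting for the condition" seen above.
func waitForPodCompletion(client kubernetes.Interface, namespace, name string) error {
	return wait.PollImmediate(2*time.Second, 5*time.Minute, func() (bool, error) {
		pod, err := client.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		switch pod.Status.Phase {
		case corev1.PodSucceeded, corev1.PodFailed:
			return true, nil
		default:
			return false, nil
		}
	})
}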

Comment 5 Venkata Siva Teja Areti 2020-05-08 14:39:24 UTC
As of today, there are still a number of failures with ldap client pod errors.

Error logs from the client pod:

May  6 17:32:59.580: INFO: runonce-ldapsearch-pod[e2e-test-oauth-ldap-2gzs5].container[runonce-ldapsearch-pod].log
ldap_start_tls: Can't contact LDAP server (-1)
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

The pod listing in the namespace shows that the openldap server pod is Running while the ldapsearch client pod has already Failed:

May  6 17:32:59.479: INFO: POD                               NODE                               PHASE    GRACE  CONDITIONS
May  6 17:32:59.479: INFO: openldap-server-7f479cc6f5-v7lq6  qvp04xw3-3c054-8dfh6-worker-ttm74  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-06 17:32:37 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-05-06 17:32:53 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-05-06 17:32:53 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-06 17:32:37 +0000 UTC  }]
May  6 17:32:59.480: INFO: runonce-ldapsearch-pod            qvp04xw3-3c054-8dfh6-worker-26gxz  Failed          [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-06 17:32:54 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-05-06 17:32:59 +0000 UTC ContainersNotReady containers with unready status: [runonce-ldapsearch-pod]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-05-06 17:32:59 +0000 UTC ContainersNotReady containers with unready status: [runonce-ldapsearch-pod]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-06 17:32:54 +0000 UTC  }]

It is possible that there is a race condition between the client and the server: the server pod reports Ready only a second before the ldapsearch pod starts, so the client may try to connect before the LDAP service is actually accepting connections. Updated the PR to reflect the fix.
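
One way to close a race like this (an illustrative sketch only, not necessarily what the merged PR does; the service host, port, and timeouts are assumptions) is to gate creation of the ldapsearch pod on the LDAP endpoint actually accepting connections:

package example

import (
	"fmt"
	"net"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForLDAPServer dials the LDAP service until a TCP connection succeeds,
// so the ldapsearch client only runs once the server is reachable rather than
// merely scheduled and reporting Ready.
func waitForLDAPServer(host string, port int) error {
	addr := fmt.Sprintf("%s:%d", host, port)
	return wait.PollImmediate(2*time.Second, 2*time.Minute, func() (bool, error) {
		conn, err := net.DialTimeout("tcp", addr, time.Second)
		if err != nil {
			// Server not accepting connections yet; keep polling.
			return false, nil
		}
		conn.Close()
		return true, nil
	})
}

Retrying the ldapsearch command inside the client pod would cover the same window from the other side.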

Comment 6 Venkata Siva Teja Areti 2020-05-08 14:40:56 UTC
The logs in the previous comment were fetched from this CI failure:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.5/670

Comment 10 Venkata Siva Teja Areti 2020-05-11 17:06:57 UTC
*** Bug 1812186 has been marked as a duplicate of this bug. ***

Comment 13 errata-xmlrpc 2020-07-13 17:16:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409