Bug 1830293 - test: [sig-auth][Feature:HTPasswdAuth] HTPasswd IDP should successfully configure htpasswd and be responsive [Suite:openshift/conformance/parallel]
Keywords:
Status: CLOSED DUPLICATE of bug 1794839
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-01 14:15 UTC by Cyril
Modified: 2023-09-14 05:56 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
test: [sig-auth][Feature:HTPasswdAuth] HTPasswd IDP should successfully configure htpasswd and be responsive [Suite:openshift/conformance/parallel]
Last Closed: 2020-08-17 21:34:48 UTC
Target Upstream Version:
Embargoed:



Description Cyril 2020-05-01 14:15:58 UTC
test: [sig-auth][Feature:HTPasswdAuth] HTPasswd IDP should successfully configure htpasswd and be responsive [Suite:openshift/conformance/parallel] failed, see job: <link>

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.5/73

It seems multiple tests are failing due to an OAuth server issue.

Comment 2 Cyril 2020-05-01 21:08:01 UTC
The test is consistently failing.

Comment 4 Ben Parees 2020-05-02 00:21:44 UTC
Looks like a whole block of oauth tests are consistently failing in that test job specifically... whether it's a compact cluster issue or an azure issue.

I also see the etcd leader change test is failing, meaning we probably had etcd issues that need to be investigated, but it's odd they'd only/specifically impact the oauth tests every time.

Comment 6 Sam Batschelet 2020-05-18 14:30:38 UTC
> Job failed consistently: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.5/74

The data exposed from the event is very useful. Unfortunately, even with the current raft tuning, we are still seeing leader elections. I triaged this BZ before, and at the time the load balancers were causing the test to fail on compact clusters. Since this test has never passed to my knowledge, and support for compact clusters on Azure is pending, I am moving this to the Installer team for clarification. Once support exists we can try to resolve it.

Comment 7 Abhinav Dahiya 2020-05-18 16:24:57 UTC
The fix for compact was merged on May 7, and starting May 8 the `HTPasswd IDP should successfully configure htpasswd and be responsive` test is no longer consistently failing.

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-origin-installer-e2e-azure-compact-4.5&sort-by-flakiness=10

There is one failure: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.5/81

```
fail [github.com/openshift/origin/test/extended/util/client.go:693]: May 10 15:20:30.771: the server is currently unable to handle the request (get users.user.openshift.io e2e-test-htpasswd-idp-7pv5k-user)
```

which could be etcd stalling the kube-apiserver. So moving back to etcd to triage whether that's so; otherwise close it as a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1794839

Comment 8 Suresh Kolichala 2020-05-20 13:48:44 UTC
We have made some improvements to reduce the number of leader elections during an upgrade and regular installations. Can the reporter verify if the etcd failures are still found (not the problem reported in BZ 1794839)?

Comment 9 Dan Mace 2020-05-20 19:42:50 UTC
We took a look at the isolated failure[1] and noticed a few things:

1. etcd seemed to be okay, although the operator was misreporting one of the members as unhealthy
2. the oauth server couldn't talk to the openshift apiserver, but it's not entirely clear why

   2020-05-10T15:25:55.982108229Z I0510 15:25:55.982033       1 log.go:172] http: TLS handshake error from 10.128.0.50:51094: read tcp 10.129.0.42:6443->10.128.0.50:51094: read: connection timed out

3. the openshift apiserver logs didn't reveal much of interest about why
4. the kube apiserver logs didn't either
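
For anyone triaging similar logs, the handshake error quoted under item 2 can be split into its client, local, and remote endpoints to check which pods were involved. The following is only a minimal sketch and not part of the original analysis: the regular expression and the `parse_handshake_error` helper are hypothetical and assume the exact log line shape shown above.

```python
import re

# Hypothetical pattern, assuming the line shape quoted in item 2:
# "<timestamp> ... http: TLS handshake error from <client>: read tcp <local>-><remote>: <reason>"
HANDSHAKE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+.*?http: TLS handshake error from "
    r"(?P<client>[\d.]+:\d+): read tcp "
    r"(?P<local>[\d.]+:\d+)->(?P<remote>[\d.]+:\d+): (?P<reason>.+)$"
)

# The log line from item 2, copied verbatim.
SAMPLE = (
    "2020-05-10T15:25:55.982108229Z I0510 15:25:55.982033       1 log.go:172] "
    "http: TLS handshake error from 10.128.0.50:51094: "
    "read tcp 10.129.0.42:6443->10.128.0.50:51094: read: connection timed out"
)


def parse_handshake_error(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = HANDSHAKE_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    record = parse_handshake_error(SAMPLE)
    print(record["client"], "->", record["local"], "|", record["reason"])
```

Here the client address (10.128.0.50) and the local listener (10.129.0.42:6443) are on different pod subnets, which is consistent with networking not being ruled out.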

Given that failure was part of a huge vertical cascade of other test failures, we'd like to continue observing to see if things remain stable. More analysis is required to root cause the last specific failure, but it's not at all clear yet there's an etcd issue there. Can't rule out networking yet.

Moving to 4.6.

[1] https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.5/81

Comment 10 Sam Batschelet 2020-06-20 12:51:47 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 14 Red Hat Bugzilla 2023-09-14 05:56:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

