Bug 1765756

Summary: [sig-apps] Deployment should not disrupt a cloud load-balancer's connectivity during rollout
Product: OpenShift Container Platform
Reporter: Anurag saxena <anusaxen>
Component: openshift-controller-manager
Assignee: Miciah Dashiel Butler Masters <mmasters>
Status: CLOSED ERRATA
QA Contact: wewang <wewang>
Severity: low
Priority: low
Version: 4.3.0
CC: aos-bugs, jchaloup, mfojtik, tnozicka
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: workloads
Last Closed: 2020-01-23 11:09:38 UTC
Type: Bug

Description Anurag saxena 2019-10-25 21:09:08 UTC
Description of problem:

fail [k8s.io/kubernetes/test/e2e/apps/deployment.go:899]: Unexpected error:
    <*errors.errorString | 0xc00437f840>: {
        s: "error waiting for deployment \"test-rolling-update-with-lb\" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:1, Replicas:3, UpdatedReplicas:3, ReadyReplicas:2, AvailableReplicas:2, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:\"Available\", Status:\"False\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, Reason:\"MinimumReplicasUnavailable\", Message:\"Deployment does not have minimum availability.\"}, v1.DeploymentCondition{Type:\"Progressing\", Status:\"True\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589471, loc:(*time.Location)(0xa2cd800)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, Reason:\"ReplicaSetUpdated\", Message:\"ReplicaSet \\\"test-rolling-update-with-lb-8575469454\\\" is progressing.\"}}, CollisionCount:(*int32)(nil)}",
    }
    error waiting for deployment "test-rolling-update-with-lb" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:1, Replicas:3, UpdatedReplicas:3, ReadyReplicas:2, AvailableReplicas:2, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Available", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, Reason:"MinimumReplicasUnavailable", Message:"Deployment does not have minimum availability."}, v1.DeploymentCondition{Type:"Progressing", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589471, loc:(*time.Location)(0xa2cd800)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, Reason:"ReplicaSetUpdated", Message:"ReplicaSet \"test-rolling-update-with-lb-8575469454\" is progressing."}}, CollisionCount:(*int32)(nil)}
occurred

Additional info:
https://testgrid.k8s.io/redhat-openshift-release-4.3-informing-ocp#release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3/184

Comment 1 Jan Chaloupka 2019-10-29 11:41:43 UTC
Oct 25 08:42:41.476: INFO: test-rolling-update-with-lb-8575469454-jkqnh                                Pending         [{PodScheduled False 0001-01-01 00:00:00 +0000 UTC 2019-10-25 08:37:41 +0000 UTC Unschedulable 0/5 nodes are available: 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules, 3 node(s) had taints that the pod didn't tolerate.}]

Either the affinity/anti-affinity rules or the taints/tolerations are wrong; the scheduler itself seems to be working fine.
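The scheduler message is consistent with a simple counting argument. The Go sketch below assumes, hypothetically, that the test's pods carry a required one-pod-per-node anti-affinity rule and that the 3 tainted nodes are control-plane nodes the pods do not tolerate. The `node` type, `placeReplicas` function, and node names are illustrative only, not part of the e2e framework. With 5 nodes, 3 of them tainted, only 2 of 3 replicas can be placed and the third stays Pending, which matches ReadyReplicas:2 and UnavailableReplicas:1 in the failure above.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for a cluster node.
type node struct {
	name    string
	tainted bool // true if the node has a taint the pod does not tolerate
}

// placeReplicas greedily assigns replicas to nodes under a required
// one-replica-per-node anti-affinity rule. It returns how many replicas
// were placed and how many remain Pending.
func placeReplicas(nodes []node, replicas int) (placed, pending int) {
	occupied := make(map[string]bool)
	for i := 0; i < replicas; i++ {
		ok := false
		for _, n := range nodes {
			if n.tainted || occupied[n.name] {
				continue // taint not tolerated, or anti-affinity would be violated
			}
			occupied[n.name] = true
			ok = true
			break
		}
		if ok {
			placed++
		} else {
			pending++
		}
	}
	return placed, pending
}

func main() {
	// Assumed layout: 3 tainted masters plus 2 workers (5 nodes total).
	cluster := []node{
		{"master-0", true}, {"master-1", true}, {"master-2", true},
		{"worker-0", false}, {"worker-1", false},
	}
	placed, pending := placeReplicas(cluster, 3)
	fmt.Printf("placed=%d pending=%d\n", placed, pending)
}
```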

Comment 2 Tomáš Nožička 2019-10-29 12:17:26 UTC
It is not clear to me why CI runs a test that is not present in origin (which is on Kubernetes 1.16) and was only added upstream for 1.17.

Comment 3 Tomáš Nožička 2019-10-29 12:53:06 UTC
(the test was nuked by a rebase just a few days later)


mmasters, assigning to you since you backported the test.

Comment 4 Miciah Dashiel Butler Masters 2019-10-29 22:46:04 UTC
> it is not clear to me why it tries to run a test not present in origin (which has kube 1.16) and added upstream for 1.17

https://github.com/openshift/origin/pull/23806 backported the test and the corresponding feature from Kubernetes master.


It looks like release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3 creates a 2-worker cluster, correct?  The test, as written, requires 3 workers.  However, the test can be trivially amended to need only 2 workers, or better (but slightly less trivially), the test code can determine the number of workers and scale its deployment to that number.  (Using more replicas is desirable because it increases the likelihood of failure absent the feature under test.)
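A minimal Go sketch of the "determine the number of workers and scale the deployment to that number" idea. It assumes, hypothetically, that a worker is any node without a NoSchedule or NoExecute taint, and it caps the result at an assumed upper bound so that large clusters do not get oversized deployments. `nodeInfo`, `schedulable`, and `replicasForWorkers` are illustrative names, not framework APIs; a real test would derive the taints from the node list returned by the API server.

```go
package main

import "fmt"

// nodeInfo is a hypothetical, simplified view of a node: its name and taint effects.
type nodeInfo struct {
	name         string
	taintEffects []string // e.g. "NoSchedule", "NoExecute", "PreferNoSchedule"
}

// schedulable reports whether a pod with no tolerations could land on n.
// PreferNoSchedule is only a soft preference, so it does not exclude the node.
func schedulable(n nodeInfo) bool {
	for _, e := range n.taintEffects {
		if e == "NoSchedule" || e == "NoExecute" {
			return false
		}
	}
	return true
}

// replicasForWorkers returns one replica per schedulable node, capped at maxReplicas.
func replicasForWorkers(nodes []nodeInfo, maxReplicas int) int {
	count := 0
	for _, n := range nodes {
		if schedulable(n) {
			count++
		}
	}
	if count > maxReplicas {
		count = maxReplicas
	}
	return count
}

func main() {
	nodes := []nodeInfo{
		{"master-0", []string{"NoSchedule"}},
		{"master-1", []string{"NoSchedule"}},
		{"master-2", []string{"NoSchedule"}},
		{"worker-0", nil},
		{"worker-1", nil},
	}
	fmt.Println("replicas:", replicasForWorkers(nodes, 3))
}
```

On the assumed 2-worker layout this yields 2 replicas instead of a hard-coded 3, while a cluster with 3 or more workers still gets the larger replica count that makes the test more sensitive.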

Comment 5 Tomáš Nožička 2019-10-30 08:31:49 UTC
> It looks like release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3 creates a 2-worker cluster, correct?  The test, as written, requires 3 workers.  However, the test can be trivially amended to need only 2 workers, or better (but slightly less trivially), the test code can determine the number of workers and scale its deployment to that number.  (Using more replicas is desirable because it increases the likelihood of failure absent the feature under test.)

If a test requires at least 3 workers, it should declare that and skip itself on clusters with fewer than 3 workers. I think other tests already do that.

If it is possible to test with a lower number of workers, that would be preferable, since I think we only require 1 worker; but the skip check is valid if the test really needs more than 1.
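The skip-on-too-few-workers pattern described above might look roughly like this Go sketch. `checkWorkerCount` and its parameters are hypothetical; in a real e2e test the skip would go through the test framework's own skip mechanism rather than a returned string.

```go
package main

import "fmt"

// checkWorkerCount decides whether a test needing minWorkers schedulable
// workers can run. It returns skip=true and a non-empty reason when it cannot.
func checkWorkerCount(workers, minWorkers int) (skip bool, reason string) {
	if workers < minWorkers {
		return true, fmt.Sprintf("requires at least %d schedulable workers, cluster has %d", minWorkers, workers)
	}
	return false, ""
}

func main() {
	// A 2-worker cluster versus a test that insists on 3 workers.
	if skip, reason := checkWorkerCount(2, 3); skip {
		fmt.Println("SKIP:", reason)
	}
}
```

Following the comment, lowering the minimum to what the test actually needs is preferable; the skip is only the fallback when more workers are genuinely required.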

Comment 7 wewang 2019-11-12 05:50:13 UTC
[sig-apps] Deployment should not disrupt a cloud load-balancer's connectivity during rollout [Suite:openshift/conformance/parallel] [Suite:k8s] 
passed in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3/237

Comment 9 errata-xmlrpc 2020-01-23 11:09:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062