Bug 1765756 - [sig-apps] Deployment should not disrupt a cloud load-balancer's connectivity during rollout
Summary: [sig-apps] Deployment should not disrupt a cloud load-balancer's connectivity...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: wewang
URL:
Whiteboard: workloads
Depends On:
Blocks:
Reported: 2019-10-25 21:09 UTC by Anurag saxena
Modified: 2020-01-23 11:09 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:09:38 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub openshift/origin pull 24057 (status: closed): "Bug 1765756: UPSTREAM: 80004: Prefer to delete doubled-up pods of a ReplicaSet", last updated 2019-11-11 06:26:47 UTC
- Red Hat Product Errata RHBA-2020:0062, last updated 2020-01-23 11:09:58 UTC
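
The linked pull request is titled "Prefer to delete doubled-up pods of a ReplicaSet". As a rough illustration of that idea only (not the upstream code), the Go sketch below orders pods so that those sharing a node with more pods of the same set come first, making them the first candidates for deletion on scale-down. The `Pod` type and `rankByColocation` function are hypothetical. The upstream controller reportedly applies this kind of ranking only as a tie-breaker after other criteria (unscheduled, pending, or not-ready pods are deleted first) and also counts related pods from the deployment's other ReplicaSets; the sketch ignores both.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, stripped-down pod record: the pod name and the
// node it is scheduled on. It is not the Kubernetes API type.
type Pod struct {
	Name     string
	NodeName string
}

// rankByColocation returns a copy of pods ordered so that pods sharing a
// node with the most pods in the given set come first, i.e. the preferred
// victims when scaling down. Ties keep their input order.
func rankByColocation(pods []Pod) []Pod {
	perNode := map[string]int{}
	for _, p := range pods {
		perNode[p.NodeName]++
	}
	ranked := append([]Pod(nil), pods...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return perNode[ranked[i].NodeName] > perNode[ranked[j].NodeName]
	})
	return ranked
}

func main() {
	pods := []Pod{
		{"web-a", "worker-1"},
		{"web-b", "worker-2"},
		{"web-c", "worker-2"}, // doubled up with web-b on worker-2
	}
	for _, p := range rankByColocation(pods) {
		fmt.Println(p.Name, p.NodeName)
	}
}
```

With three pods where two share worker-2, a scale-down from 3 to 2 would remove a worker-2 pod first, leaving one pod on each node.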

Description Anurag saxena 2019-10-25 21:09:08 UTC
Description of problem:

fail [k8s.io/kubernetes/test/e2e/apps/deployment.go:899]: Unexpected error:
    <*errors.errorString | 0xc00437f840>: {
        s: "error waiting for deployment \"test-rolling-update-with-lb\" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:1, Replicas:3, UpdatedReplicas:3, ReadyReplicas:2, AvailableReplicas:2, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:\"Available\", Status:\"False\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, Reason:\"MinimumReplicasUnavailable\", Message:\"Deployment does not have minimum availability.\"}, v1.DeploymentCondition{Type:\"Progressing\", Status:\"True\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589471, loc:(*time.Location)(0xa2cd800)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, Reason:\"ReplicaSetUpdated\", Message:\"ReplicaSet \\\"test-rolling-update-with-lb-8575469454\\\" is progressing.\"}}, CollisionCount:(*int32)(nil)}",
    }
    error waiting for deployment "test-rolling-update-with-lb" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:1, Replicas:3, UpdatedReplicas:3, ReadyReplicas:2, AvailableReplicas:2, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Available", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, Reason:"MinimumReplicasUnavailable", Message:"Deployment does not have minimum availability."}, v1.DeploymentCondition{Type:"Progressing", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589471, loc:(*time.Location)(0xa2cd800)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63707589461, loc:(*time.Location)(0xa2cd800)}}, Reason:"ReplicaSetUpdated", Message:"ReplicaSet \"test-rolling-update-with-lb-8575469454\" is progressing."}}, CollisionCount:(*int32)(nil)}
occurred

Additional info:
https://testgrid.k8s.io/redhat-openshift-release-4.3-informing-ocp#release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3/184

Comment 1 Jan Chaloupka 2019-10-29 11:41:43 UTC
Oct 25 08:42:41.476: INFO: test-rolling-update-with-lb-8575469454-jkqnh                                Pending         [{PodScheduled False 0001-01-01 00:00:00 +0000 UTC 2019-10-25 08:37:41 +0000 UTC Unschedulable 0/5 nodes are available: 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules, 3 node(s) had taints that the pod didn't tolerate.}]

Something is wrong with either the affinity/anti-affinity rules or the taints/tolerations; the scheduler itself seems to work fine.
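
The Pending pod in the log line above can be modelled with a minimal Go sketch. It assumes, from the scheduler message and the later comments, a 5-node cluster with 3 tainted masters and 2 untainted workers, and that the test's pods use required anti-affinity so that at most one replica lands on each node. The `Node` type and `place` function are hypothetical and do not reproduce the real scheduler; greedy placement is enough here because every untainted node is equivalent.

```go
package main

import "fmt"

// Node is a hypothetical minimal node record: a name and whether it carries
// a NoSchedule taint that the test pods do not tolerate (e.g. a master).
type Node struct {
	Name    string
	Tainted bool
}

// place assigns replicas to nodes under a "one replica per node" rule
// (modelling required pod anti-affinity on the hostname), skipping tainted
// nodes. It returns the chosen node per replica, or "" if it stays Pending.
func place(nodes []Node, replicas int) []string {
	used := map[string]bool{}
	out := make([]string, replicas)
	for i := 0; i < replicas; i++ {
		for _, n := range nodes {
			if !n.Tainted && !used[n.Name] {
				used[n.Name] = true
				out[i] = n.Name
				break
			}
		}
	}
	return out
}

func main() {
	cluster := []Node{
		{"master-0", true}, {"master-1", true}, {"master-2", true},
		{"worker-0", false}, {"worker-1", false},
	}
	for i, n := range place(cluster, 3) {
		if n == "" {
			fmt.Printf("replica %d: Pending\n", i)
		} else {
			fmt.Printf("replica %d: %s\n", i, n)
		}
	}
}
```

Two replicas are placed and the third stays Pending, which matches the reported status of Replicas:3, ReadyReplicas:2, UnavailableReplicas:1.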

Comment 2 Tomáš Nožička 2019-10-29 12:17:26 UTC
It is not clear to me why it tries to run a test that is not present in origin (which has kube 1.16) and was added upstream for 1.17.

Comment 3 Tomáš Nožička 2019-10-29 12:53:06 UTC
(The test was nuked by a rebase just a few days after.)


mmasters, assigning to you since you backported the test.

Comment 4 Miciah Dashiel Butler Masters 2019-10-29 22:46:04 UTC
> It is not clear to me why it tries to run a test that is not present in origin (which has kube 1.16) and was added upstream for 1.17.

https://github.com/openshift/origin/pull/23806 backported the test and the corresponding feature from Kubernetes master.


It looks like release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3 creates a 2-worker cluster, correct?  The test, as written, requires 3 workers.  However, the test can be trivially amended to need only 2 workers, or better (but slightly less trivially), the test code can determine the number of workers and scale its deployment to that number.  (Using more replicas is desirable because it increases the likelihood of failure absent the feature under test.)
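
The "determine the number of workers" option could look roughly like the Go sketch below. It assumes that a "worker" is any node without a NoSchedule or NoExecute taint, which holds only if the test pods carry no tolerations; the `Taint` and `Node` types and `schedulableWorkers` are hypothetical stand-ins, not the Kubernetes API or e2e framework.

```go
package main

import "fmt"

// Taint keeps only the Effect field of a Kubernetes taint (hypothetical type).
type Taint struct{ Effect string }

// Node is a hypothetical minimal node record.
type Node struct {
	Name   string
	Taints []Taint
}

// schedulableWorkers counts nodes with no NoSchedule or NoExecute taint.
// PreferNoSchedule is only a soft preference, so such nodes still count.
func schedulableWorkers(nodes []Node) int {
	count := 0
	for _, n := range nodes {
		ok := true
		for _, t := range n.Taints {
			if t.Effect == "NoSchedule" || t.Effect == "NoExecute" {
				ok = false
				break
			}
		}
		if ok {
			count++
		}
	}
	return count
}

func main() {
	nodes := []Node{
		{"master-0", []Taint{{"NoSchedule"}}},
		{"master-1", []Taint{{"NoSchedule"}}},
		{"master-2", []Taint{{"NoSchedule"}}},
		{"worker-0", nil},
		{"worker-1", nil},
	}
	// Scale the test deployment to one replica per schedulable worker.
	fmt.Println("replicas:", schedulableWorkers(nodes))
}
```

For the 2-worker scaleup cluster this yields 2 replicas instead of the hard-coded 3.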

Comment 5 Tomáš Nožička 2019-10-30 08:31:49 UTC
> It looks like release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3 creates a 2-worker cluster, correct?  The test, as written, requires 3 workers.  However, the test can be trivially amended to need only 2 workers, or better (but slightly less trivially), the test code can determine the number of workers and scale its deployment to that number.  (Using more replicas is desirable because it increases the likelihood of failure absent the feature under test.)

If a test requires at least 3 workers, it should declare that and skip itself on clusters with fewer than 3 workers. I think other tests already do that.

If it is possible to test with fewer workers, that would be preferable, since I think we require only 1; but the skip check is valid if the test really needs more than 1.
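
Combining the skip check with scaling to the worker count might look like the Go sketch below. The minimum of 2 workers is an assumption (a "prefer to delete doubled-up pods" behaviour is only observable when pods can spread over more than one node), and `replicasOrSkip` is a hypothetical helper, not part of the e2e framework.

```go
package main

import "fmt"

// replicasOrSkip decides how the rollout test could size itself. With fewer
// than minWorkers schedulable workers it returns a skip reason; otherwise it
// uses one replica per worker, since more replicas make a regression in the
// feature more likely to show up.
func replicasOrSkip(workers, minWorkers int) (int, string) {
	if workers < minWorkers {
		return 0, fmt.Sprintf("requires at least %d schedulable workers, found %d", minWorkers, workers)
	}
	return workers, ""
}

func main() {
	for _, w := range []int{1, 2, 5} {
		replicas, skip := replicasOrSkip(w, 2) // minimum of 2 is an assumption
		if skip != "" {
			fmt.Printf("workers=%d: skip (%s)\n", w, skip)
			continue
		}
		fmt.Printf("workers=%d: run with %d replicas\n", w, replicas)
	}
}
```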

Comment 7 wewang 2019-11-12 05:50:13 UTC
[sig-apps] Deployment should not disrupt a cloud load-balancer's connectivity during rollout [Suite:openshift/conformance/parallel] [Suite:k8s] 
passed in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3/237

Comment 9 errata-xmlrpc 2020-01-23 11:09:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

