
Bug 1460388

Summary:

AWS getInstancesByNodeNames is broken for large clusters

Product:

OpenShift Container Platform

Reporter:

Hemant Kumar <hekumar>

Component:

Node

Assignee:

Hemant Kumar <hekumar>

Status:

CLOSED ERRATA

QA Contact:

Mike Fiedler <mifiedle>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

3.5.1

CC:

aos-bugs, bchilds, eparis, jokerman, mmccomas, xtian

Target Milestone:

---

Target Release:

3.5.z

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Cause: When describing multiple instances on AWS, each node name was supplied as a filter value. This fails once the cluster is large enough, because AWS allows at most 200 filter values in a single request. Consequence: DescribeInstances calls fail, resulting in broken load balancer and storage functionality on AWS. Fix: Batch the DescribeInstances calls so that each request stays under the filter limit. Result: DescribeInstances calls work for larger clusters too.

Story Points:

---

Clone Of:

Clones:

1461865

Environment:

Last Closed:

2017-10-25 13:02:19 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1461865

Attachments:

Description	Flags
Master and node syslogs	none

Description Hemant Kumar 2017-06-09 21:58:32 UTC

From CloudTrail logs:

    "errorMessage": "The maximum number of filter values specified on a single call is 200",

This should affect all features (such as load balancers and storage) that rely on this function.


I have opened an upstream ticket as well:
https://github.com/kubernetes/kubernetes/issues/47271

Comment 1 Hemant Kumar 2017-06-09 21:59:37 UTC

Assigning this to myself for now.

Comment 2 Hemant Kumar 2017-06-09 22:01:43 UTC

We will have to backport the fix to both 3.5 and 3.6 once available.

Comment 3 Hemant Kumar 2017-06-15 13:32:11 UTC

Link to PR - https://github.com/openshift/ose/pull/788

Comment 11 Mike Fiedler 2017-07-11 20:18:49 UTC

Tested on v3.5.5.31.3 and there still seems to be an issue.

I followed the same procedure used to verify on 3.6; see https://bugzilla.redhat.com/show_bug.cgi?id=1461865#c15 and https://bugzilla.redhat.com/show_bug.cgi?id=1461865#c17 for details and results.

1. Installed a 202-node cluster at v3.5.5.31.3.
2. Created 250 projects with deployment configurations that included a Volume and VolumeMount for a PVC which was dynamically bound to an EBS PV.
3. In the node syslog (attached), you can see the volume successfully attached to the node and formatted.
4. Verified the PV and PVC were Bound.
5. In the AWS console, force detached the volume used in namespace svt-190 (/dev/xvdbq), vol id vol-0a70d2fb3b570c842.
6. Waited for the volume to be reattached.

The volume stayed in the available state in the AWS console and was never reattached. The PVC remained in the Bound state.

NAME      STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
pvc2      Bound     pvc-abfad388-666e-11e7-be60-02238eeb625a   1Gi        RWO           51m
root@ip-172-31-47-221: ~ # oc get pv pvc-abfad388-666e-11e7-be60-02238eeb625a
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM          REASON    AGE
pvc-abfad388-666e-11e7-be60-02238eeb625a   1Gi        RWO           Delete          Bound     svt-190/pvc2             51m
root@ip-172-31-47-221: ~ # 


There were some odd errors regarding PVCs in the master logs, but they were for different namespaces. The entry below is all one message with repeated text.

Jul 11 15:30:15 ip-172-31-41-8 atomic-openshift-master: E0711 15:30:15.657255   21952 factory.go:583] Error scheduling svt-246 deploymentconfig2-1-kn6xx: [SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to 
PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolum
Jul 11 15:30:15 ip-172-31-41-8 atomic-openshift-master: eClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is 
unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected
Jul 11 15:30:15 ip-172-31-41-8 atomic-openshift-master: ., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to 
PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentV
Jul 11 15:30:15 ip-172-31-41-8 atomic-openshift-master: olumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is 
unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpe

Comment 12 Mike Fiedler 2017-07-11 20:19:24 UTC

Created attachment 1296452 [details]
Master and node syslogs

Comment 16 Hemant Kumar 2017-07-11 21:06:15 UTC

If the node is doing attach/detach, the controller does not perform verification of detached volumes. That verification is the mechanism that causes detached volumes to be automatically reattached.

In OpenShift 3.6, the default is controller attach/detach.

Comment 17 Mike Fiedler 2017-07-12 00:35:24 UTC

Applied the configuration from https://docs.openshift.org/1.5/install_config/persistent_storage/enabling_controller_attach_detach.html to the node and the scenario worked correctly.

Verified on 3.5.5.31.3

Comment 19 errata-xmlrpc 2017-10-25 13:02:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049