Bug 1460388 - AWS getInstancesByNodeNames is broken for large clusters
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.z
Assigned To: Hemant Kumar
QA Contact: Mike Fiedler
Docs Contact:
Depends On:
Blocks: 1461865
Reported: 2017-06-09 17:58 EDT by Hemant Kumar
Modified: 2017-10-25 09:02 EDT
CC: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When describing multiple instances on AWS, each node name is supplied as a separate filter value. AWS allows at most 200 filter values per request, so the call fails once a cluster is large enough. Consequence: DescribeInstances calls fail, breaking load balancer and storage functionality on AWS. Fix: Batch the DescribeInstances calls so that each request stays under the filter-value limit. Result: DescribeInstances calls also work for larger clusters.
Story Points: ---
Clone Of:
Clones: 1461865
Environment:
Last Closed: 2017-10-25 09:02:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Master and node syslogs (7.44 MB, application/x-gzip)
2017-07-11 16:19 EDT, Mike Fiedler
Description Hemant Kumar 2017-06-09 17:58:32 EDT
From CloudTrail logs:

    "errorMessage": "The maximum number of filter values specified on a single call is 200",

This should affect all features that rely on this function, such as load balancers and storage.


I have also opened an upstream issue:
https://github.com/kubernetes/kubernetes/issues/47271
Comment 1 Hemant Kumar 2017-06-09 17:59:37 EDT
Assigning this to myself for now.
Comment 2 Hemant Kumar 2017-06-09 18:01:43 EDT
We will have to backport the fix to both 3.5 and 3.6 once available.
Comment 3 Hemant Kumar 2017-06-15 09:32:11 EDT
Link to PR - https://github.com/openshift/ose/pull/788
Comment 11 Mike Fiedler 2017-07-11 16:18:49 EDT
Tested on v3.5.5.31.3 and there seems to be an issue still.

I followed the same procedure used to verify on 3.6. See https://bugzilla.redhat.com/show_bug.cgi?id=1461865#c15 and https://bugzilla.redhat.com/show_bug.cgi?id=1461865#c17 for details and results.

1. Installed a 202-node cluster at v3.5.5.31.3.
2. Created 250 projects with deployment configurations that included a Volume and VolumeMount for a PVC dynamically bound to an EBS PV.
3. In the node syslog (attached), you can see the volume successfully attached to the node and formatted.
4. Verified that the PV and PVC were Bound.
5. In the AWS console, force-detached the volume used in namespace svt-190 (/dev/xvdbq), volume ID vol-0a70d2fb3b570c842.
6. Waited for the volume to be reattached.

The volume stayed in the available state in the AWS console and was never reattached. The PVC remained Bound:

NAME      STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
pvc2      Bound     pvc-abfad388-666e-11e7-be60-02238eeb625a   1Gi        RWO           51m
root@ip-172-31-47-221: ~ # oc get pv pvc-abfad388-666e-11e7-be60-02238eeb625a
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM          REASON    AGE
pvc-abfad388-666e-11e7-be60-02238eeb625a   1Gi        RWO           Delete          Bound     svt-190/pvc2             51m
root@ip-172-31-47-221: ~ # 


There were some odd PVC-related errors in the master logs, but they were for different namespaces. The entry below is a single message with the same text repeated many times.

Jul 11 15:30:15 ip-172-31-41-8 atomic-openshift-master: E0711 15:30:15.657255   21952 factory.go:583] Error scheduling svt-246 deploymentconfig2-1-kn6xx: [SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., ...
Comment 12 Mike Fiedler 2017-07-11 16:19 EDT
Created attachment 1296452
Master and node syslogs
Comment 16 Hemant Kumar 2017-07-11 17:06:15 EDT
If the node is performing attach/detach, the controller does not verify detached volumes. That verification is the mechanism that causes detached volumes to be reattached automatically.

In OpenShift 3.6, controller-managed attach/detach is the default.
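The verification described in Comment 16 can be pictured as a reconcile pass: for every volume the controller expects on a node, ask the cloud whether it is still attached and re-attach it if not. When the node manages attach/detach, no such pass runs, which matches the force-detached volume in Comment 11 staying detached. The Go sketch below is a simplification under stated assumptions: `volumeNode`, the `cloud` interface and `fakeCloud` are hypothetical and are not the real Kubernetes attach/detach controller or AWS API, which track desired and actual state separately and are considerably more involved.

```go
package main

import "fmt"

// volumeNode is a HYPOTHETICAL key: a volume that should be attached to a node.
type volumeNode struct {
	VolumeID string
	Node     string
}

// cloud is a HYPOTHETICAL interface over the cloud provider.
type cloud interface {
	IsAttached(v volumeNode) bool
	Attach(v volumeNode) error
}

// reconcile verifies each desired attachment against the cloud provider and
// re-attaches any volume detached out of band (e.g. a force detach in the
// AWS console). It returns the IDs of the volumes it re-attached.
func reconcile(desired []volumeNode, c cloud) []string {
	var reattached []string
	for _, v := range desired {
		if c.IsAttached(v) {
			continue // still attached, nothing to do
		}
		if err := c.Attach(v); err != nil {
			fmt.Printf("reattach %s to %s failed: %v\n", v.VolumeID, v.Node, err)
			continue
		}
		reattached = append(reattached, v.VolumeID)
	}
	return reattached
}

// fakeCloud keeps attachment state in memory for the example.
type fakeCloud struct{ attached map[volumeNode]bool }

func (f *fakeCloud) IsAttached(v volumeNode) bool { return f.attached[v] }
func (f *fakeCloud) Attach(v volumeNode) error    { f.attached[v] = true; return nil }

func main() {
	a := volumeNode{"vol-a", "node-1"}
	b := volumeNode{"vol-b", "node-2"}
	// vol-b is missing from the cloud's view, as if it had been force-detached.
	c := &fakeCloud{attached: map[volumeNode]bool{a: true}}
	fmt.Println(reconcile([]volumeNode{a, b}, c)) // first pass re-attaches vol-b
	fmt.Println(reconcile([]volumeNode{a, b}, c)) // second pass has nothing to do
}
```

Running the pass repeatedly is what makes the behavior self-healing: each pass re-attaches whatever has drifted since the last one.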
Comment 17 Mike Fiedler 2017-07-11 20:35:24 EDT
Applied the configuration from https://docs.openshift.org/1.5/install_config/persistent_storage/enabling_controller_attach_detach.html to the node and the scenario worked correctly.

Verified on 3.5.5.31.3.
Comment 19 errata-xmlrpc 2017-10-25 09:02:19 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049
