Bug 1460388
| Summary: | AWS getInstancesByNodeNames is broken for large clusters | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hemant Kumar <hekumar> |
| Component: | Node | Assignee: | Hemant Kumar <hekumar> |
| Status: | CLOSED ERRATA | QA Contact: | Mike Fiedler <mifiedle> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5.1 | CC: | aos-bugs, bchilds, eparis, jokerman, mmccomas, xtian |
| Target Milestone: | --- | | |
| Target Release: | 3.5.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: |
Cause:
When describing multiple instances on AWS, we supply each node as a filter. This fails if the cluster is large enough, because AWS allows only up to 200 filters per request.
Consequence:
DescribeInstances calls fail, resulting in broken load balancer and storage functionality in AWS.
Fix:
Implement batching of DescribeInstances calls to stay within the filter limit.
Result:
DescribeInstances calls work for larger clusters too.
|
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1461865 | Environment: | |
| Last Closed: | 2017-10-25 13:02:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1461865 | | |
| Attachments: | | | |
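The batching fix described in the Doc Text can be sketched roughly as follows. This is only an illustration, not the code from the actual fix: the `describe` callable is a hypothetical stand-in for a DescribeInstances request filtered by node names, and the batch size of 150 is an assumed value chosen only because it sits below the 200-filter limit mentioned above.

```python
from typing import Callable, Dict, Iterator, List

# Assumption: requests with more than 200 filter values are rejected (per the
# bug text); 150 is an arbitrary batch size picked to stay below that limit.
MAX_NAMES_PER_CALL = 150


def chunked(names: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `names`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(names), size):
        yield names[start:start + size]


def get_instances_by_node_names(
    node_names: List[str],
    describe: Callable[[List[str]], List[Dict[str, str]]],
    batch_size: int = MAX_NAMES_PER_CALL,
) -> List[Dict[str, str]]:
    """Call `describe` once per batch and concatenate the results.

    `describe` is a hypothetical placeholder for a DescribeInstances call
    filtered by the given node names; it is not a real SDK function.
    """
    instances: List[Dict[str, str]] = []
    for batch in chunked(node_names, batch_size):
        instances.extend(describe(batch))
    return instances


# Fake backend that enforces the limit, so the sketch runs without AWS access.
calls: List[int] = []


def fake_describe(names: List[str]) -> List[Dict[str, str]]:
    if len(names) > 200:
        raise RuntimeError("too many filter values")
    calls.append(len(names))  # record the size of each request
    return [{"PrivateDnsName": n} for n in names]


# A 202-node cluster, as in the verification run, needs two requests.
nodes = [f"ip-10-0-0-{i}.ec2.internal" for i in range(202)]
result = get_instances_by_node_names(nodes, fake_describe)
```

With 202 node names, a single unbatched request would exceed the limit, while the batched version issues one request for 150 names and a second for the remaining 52.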
Description
Hemant Kumar
2017-06-09 21:58:32 UTC
Assigning this to myself for now. We will have to backport the fix to both 3.5 and 3.6 once available.

Link to PR - https://github.com/openshift/ose/pull/788

Tested on v3.5.5.31.3 and there seems to be an issue still. I followed the same procedure used to verify on 3.6. See https://bugzilla.redhat.com/show_bug.cgi?id=1461865#c15 and https://bugzilla.redhat.com/show_bug.cgi?id=1461865#c17 for details and results.

1. Installed 202 node cluster at v3.5.5.31.3
2. Created 250 projects with deployment configurations that included Volume and VolumeMount for a PVC which was dynamically bound to an EBS PV
3. In the node syslog (attached), you can see the volume successfully attached to the node and formatted
4. Verified the PV and PVC were Bound
5. In the AWS console, force detached the volume used in namespace svt-190 (/dev/xvdbq), vol id vol-0a70d2fb3b570c842
6. Waited for the volume to be reattached. The volume stayed available in the AWS console and never reattached. The PVC remained in Bound state

NAME   STATUS   VOLUME                                     CAPACITY   ACCESSMODES   AGE
pvc2   Bound    pvc-abfad388-666e-11e7-be60-02238eeb625a   1Gi        RWO           51m
root@ip-172-31-47-221: ~ # oc get pv pvc-abfad388-666e-11e7-be60-02238eeb625a
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM          REASON   AGE
pvc-abfad388-666e-11e7-be60-02238eeb625a   1Gi        RWO           Delete          Bound    svt-190/pvc2            51m
root@ip-172-31-47-221: ~ #

There were some odd errors re: PVCs in the master logs, but they were for different namespaces. The entry below is all one message with repeated text.
Jul 11 15:30:15 ip-172-31-41-8 atomic-openshift-master: E0711 15:30:15.657255 21952 factory.go:583] Error scheduling svt-246 deploymentconfig2-1-kn6xx: [SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "pvc2", which is unexpected., [the same "SchedulerPredicates failed due to PersistentVolumeClaim is not bound" entry repeats for the rest of the message, which syslog split across several lines and which is truncated in the original]

Created attachment 1296452
Master and node syslogs
If the node is doing attach/detach, the controller does not perform verification of detached volumes. That verification is the mechanism that causes detached volumes to be automatically attached back. In OpenShift 3.6, the default is controller attach/detach.

Applied the configuration from https://docs.openshift.org/1.5/install_config/persistent_storage/enabling_controller_attach_detach.html to the node and the scenario worked correctly. Verified on 3.5.5.5.31.3

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049