Bug 1486523
| Summary: | New EBS PVs sometimes can't attach and result in errors in events & multiple retries | | |
|---|---|---|---|
| Product: | OpenShift Online | Reporter: | jchevret |
| Component: | Storage | Assignee: | Tomas Smetana <tsmetana> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Jianwei Hou <jhou> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.x | CC: | aos-bugs, aos-storage-staff, jchevret, ldimaggi, xtian |
| Target Milestone: | --- | Keywords: | OnlineStarter |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-10-12 14:14:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | OpenShift.io Che pod log (attachment 1320280) | | |
Description
jchevret
2017-08-30 02:50:47 UTC
It's hard to guess what is going on without logs. The one known problem we are solving right now is described in bug #1481729: if AWS takes a long time to attach a volume to a node and the pod requesting the volume is deleted in the meantime, the volume gets attached but the controller will never (or only after a long timeout) detach it, since it considers it mounted. If this is the same problem, then I have a fix suggested; however, the patch basically adds a synchronization "mechanism" between the kubelet and the controller, so I'm fixing the kubelet and ADC tests for it, and I expect some discussion around it upstream too (too many components are involved).

This issue affects: https://github.com/openshiftio/openshift.io/issues/666

Created attachment 1320280 [details]
OpenShift.io Che pod log

Attached the OpenShift.io Che pod log from the related issue: https://github.com/openshiftio/openshift.io/issues/666

These are event logs... There is not much to discover there: do we have the controller and kubelet logs? What actually was the instance "i-0e724bdbe7dea5968"? It seems like something grabbed the newly created PVC as soon as it was created...

We need the logs from the kubelet on the affected nodes and from the controller on the master. It is not possible to deduce what is going on here just from the pod events. Marking as "UpcomingRelease".

I'm tempted to close this one with "Insufficient data". However, we have discovered that we run into API quota issues on the online cluster; I think that might explain the cause of the problem.

I have not seen this issue again since the last cluster upgrades. Let's close, and I will re-open with the requested logs if the issue comes back.

OK. Thanks for the response.
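The race described above (from bug #1481729) can be sketched as a toy model: an attach operation that completes slowly, a pod deleted while it is in flight, and a controller that then refuses to detach because its bookkeeping still shows the volume as mounted. This is a minimal illustrative sketch; the class and method names are invented for this example and are not the real Kubernetes attach/detach controller API.

```python
import threading
import time


class ToyController:
    """Minimal model of the race: attaches volumes after a delay and
    refuses to detach anything it has recorded as mounted."""

    def __init__(self):
        self.attached = set()
        self.mounted = set()   # controller's view; never cleared for deleted pods
        self.lock = threading.Lock()

    def slow_attach(self, volume, delay):
        time.sleep(delay)                 # simulated slow AWS attach call
        with self.lock:
            self.attached.add(volume)
            self.mounted.add(volume)      # recorded as in use, even though the
                                          # requesting pod may be gone by now

    def try_detach(self, volume):
        with self.lock:
            if volume in self.mounted:    # "considers it mounted": never detached
                return False
            self.attached.discard(volume)
            return True


ctrl = ToyController()
worker = threading.Thread(target=ctrl.slow_attach, args=("vol-0abc", 0.1))
worker.start()
# ... meanwhile, the pod that requested vol-0abc is deleted ...
worker.join()

# The volume ends up attached, but the detach attempt is refused.
leaked = not ctrl.try_detach("vol-0abc")
print("volume leaked:", leaked)
```

The fix sketched in the thread amounts to synchronizing the kubelet's mount state with the controller so that `mounted` is cleared once the pod is gone, letting the detach proceed.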