Bug 1464585
Summary: | Pounding etcd for events?recursive=true | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Eric Paris <eparis> | ||||
Component: | Master | Assignee: | Dan Mace <dmace> | ||||
Status: | CLOSED WONTFIX | QA Contact: | DeShuai Ma <dma> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.6.0 | CC: | aos-bugs, deads, decarr, jokerman, mmccomas, sjenning, trankin | ||||
Target Milestone: | --- | ||||||
Target Release: | 3.6.z | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-08-29 13:38:25 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Eric Paris
2017-06-23 20:14:14 UTC
Created attachment 1291284 [details]
journalctl -u atomic-openshift-master-api | grep '/events' | grep GET | grep -v namespaces
Logs trying to see who is hitting the apiserver looking for all events
Tranferring to Dan while I'm out. Adding Derek in case he has likely culprits in mind. Just a quick update here. There seem to be two problems I'm chasing: 1. Reflector-driven controller watches are being reset every 30s rather than every 5-10m, possibly due to external factors (e.g. ELB) which I'm still investigating. 2. During the watch resets (1), the `events` watch established by the un-idling controller appears to be retried immediately many times before settling. Understanding more about (2) also reveals the un-idling controller is probably fetching and caching far more `event` resources than it needs to operate. That said, (1) still seems like a behavior that needs explained ASAP. Looking forward, it's too soon to say what the best way forward with the un-idling controller is in the short term. Upon further inspection, the massive amount of GETs seems to simply be the result of the unidling controller's reflector watch on the events resource: events are generated constantly, resulting in the reflector issuing new GETs to fetch each new record in turn. There are likely some unidentified cascading effects here or some reflector behavior which might explain the duplicate resourceVersion issues. It's also possible that the etcd history limit interplays with bogged down event processing, requiring re-lists to establish a new watch resourceversion. However, I have no evidence of that yet. In the meantime, it might be more fruitful to refactor the unidling controller to stop using event list/watches entirely. https://github.com/openshift/origin/issues/15079 opened to track refactoring the unidling mechanism. closing bug, we can push the refactor in the issue. |