| Summary: | oc/kubectl get events -w showing old events from 2 hours prior | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle> |
| Component: | Node | Assignee: | Seth Jennings <sjenning> |
| Status: | CLOSED NOTABUG | QA Contact: | Mike Fiedler <mifiedle> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.2.0 | CC: | agoldste, aos-bugs, decarr, jokerman, mifiedle, mmccomas |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-10-26 17:24:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
I set kubernetesMasterConfig/apiServerArguments/event-ttl to 1h and the old events appearing in the watch were 1 hour old instead of 2. It appears that the watch is picking up expiring events as well as new ones.

I have a theory on this: the garbage collection of the event is actually triggering a watch event on the... event. Taking a look.

After thinking about this, I don't think this is a bug. Take watching pods (a resource type just like events) as an example. If you are watching pods and a pod is deleted, the watch will return a watch event with type DELETED: https://github.com/kubernetes/kubernetes/blob/master/pkg/watch/watch.go#L43 kubectl -w just prints a resource whenever a watch event of any type is returned. To see only new events, one would need to do a watch (via the API) and filter for watch events that have type ADDED, functionality that kubectl doesn't currently have. Adding a type filter for watches to kubectl would be a new feature. Mike, is that sufficient explanation to close?

I understand what I am seeing is an event representing the deletion of an event. The problem I still have is that it looks exactly like the original event; there is nothing to indicate that it is an event that is "going away". Plus, a straight oc/kubectl get events doesn't show them.
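For reference, the event-ttl override mentioned above lives in the master configuration. A sketch of the relevant fragment (OpenShift 3.x /etc/origin/master/master-config.yaml; apiServerArguments values are string lists, and the surrounding keys are omitted here):

```yaml
# Sketch of an OpenShift 3.x master-config.yaml fragment; only the
# keys relevant to the event TTL are shown.
kubernetesMasterConfig:
  apiServerArguments:
    event-ttl:
    - "1h"
```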
Any chance of updating the oc/kubectl get events watch to filter them? Maybe have the watch use the same filters the initial get events has? My use case for get events -w is always debugging things that are happening now and having old events interleaved with new ones is quite confusing.
Example from today:
2016-06-16 14:52:47 -0400 EDT 2016-06-16 14:52:47 -0400 EDT 1 cakephp-example-51-build Pod Normal Scheduled {default-scheduler } Successfully assigned cakephp-example-51-build to ip-172-31-8-197.ec2.internal
2016-06-16 12:52:47 -0400 EDT 2016-06-16 12:52:47 -0400 EDT 1 cakephp-example-11-build Pod spec.containers{sti-build} Normal Pulling {kubelet ip-172-31-8-200.ec2.internal} pulling image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.1"
2016-06-16 12:52:47 -0400 EDT 2016-06-16 12:52:47 -0400 EDT 1 django-example-11-build Pod spec.containers{sti-build} Normal Pulling {kubelet ip-172-31-8-199.ec2.internal} pulling image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.1"
2016-06-16 14:52:48 -0400 EDT 2016-06-16 14:52:48 -0400 EDT 1 django-example-51-build Pod spec.containers{sti-build} Normal Pulling {kubelet ip-172-31-8-198.ec2.internal} pulling image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.1"
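Under the hood, each watch notification is a JSON envelope whose type field (ADDED, MODIFIED, DELETED) is what distinguishes new events from expiring ones, so the filtering requested above amounts to keeping only ADDED envelopes. A minimal sketch, with canned envelopes standing in for a live stream from the events API endpoint (/api/v1/events?watch=true) and a crude grep standing in for proper JSON filtering:

```shell
# Two canned watch envelopes: the DELETED one is what event-ttl garbage
# collection emits, and its embedded object looks just like the original
# event -- which is why the two are indistinguishable in `get events -w`.
printf '%s\n' \
  '{"type":"ADDED","object":{"kind":"Event","reason":"Pulling"}}' \
  '{"type":"DELETED","object":{"kind":"Event","reason":"Pulling"}}' |
grep -F '"type":"ADDED"'
```

The envelope contents here are illustrative, not copied from a real stream; the point is only that the type field sits on the envelope, not on the event object itself, so a plain printer of the object cannot show it.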
kubectl uses a per-resource-type printer function to output the object returned in a watch event. In the case of pods, the pod has a status field the printer shows (Creating, Running, Terminating). However, an event resource has no "status" per se; it is just a record of what happened. The initial listing that kubectl get -w does is a separate API call from the watch. If you add --v=9 you'll see one API call for the list and a separate one for the watch. The only thing I can think of is to add a new flag to kubectl get like --watch-type=[ADDED|MODIFIED|DELETED] that would only print watch events of a certain type.

Great digging Seth! Somewhat related issue: https://github.com/kubernetes/kubernetes/issues/17612

This is working as designed. Feel free to open an RFE to request that you can filter by watch-type per Seth's prior comment. Closing this bug, and we can prioritize a future RFE (if opened).
Description of problem: I've seen this several times but not sure what triggers it. oc get events -w starts replaying events from 2 hours prior. In this case, running 50 concurrent builds in 50 different projects with oc get events -w running. Current events (timestamp = now) come in just fine, but interspersed with the current events are events from exactly (minute/seconds) 2 hours prior. The events really are old ones - things that happened 2 hours ago. I've seen cases where they were from DCs that had since been deleted. Definitely not just bad timestamps. See below for an example.

Version-Release number of selected component (if applicable): 3.2.0.15

How reproducible: Unknown

Steps to Reproduce:
1. Create 4 node cluster on AWS: master, infra, 2 builder nodes
2. Create 50 projects with app django-example in each
3. Run 50 concurrent builds (all apps, all projects)
4. Run oc get events --all-namespaces -w
5. If the problem triggers, old events from 2 hours prior will be intermixed with current events.
Actual results:

Expected results:

Additional info:

jkl28 2016-04-15 10:13:31 -0400 EDT 2016-04-15 10:13:31 -0400 EDT 1 django-example-3-deploy Pod Warning FailedSync {kubelet ip-172-31-45-163.us-west-2.compute.internal} Error syncing pod, skipping: failed to "TeardownNetwork" for "django-example-3-deploy_jkl28" with TeardownNetworkError: "Failed to teardown network for pod \"875923b7-0313-11e6-8b99-02038bdb70bb\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1"
jkl23 2016-04-15 10:13:32 -0400 EDT 2016-04-15 10:13:32 -0400 EDT 1 django-example-3-deploy Pod Warning FailedSync {kubelet ip-172-31-45-163.us-west-2.compute.internal} Error syncing pod, skipping: failed to "TeardownNetwork" for "django-example-3-deploy_jkl23" with TeardownNetworkError: "Failed to teardown network for pod \"8d143f6f-0313-11e6-8b99-02038bdb70bb\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1"
jkl26 2016-04-15 08:13:34 -0400 EDT 2016-04-15 08:13:34 -0400 EDT 1 django-example-1-hrge5 Pod spec.containers{django-example} Normal Pulling {kubelet ip-172-31-45-164.us-west-2.compute.internal} pulling image "172.25.10.176:5000/jkl26/django-example@sha256:b5295d79ae45a8e28c797b235ee3785dbf6d4059eac72d61efcbd4b4e054ac14"
jkl34 2016-04-15 08:13:34 -0400 EDT 2016-04-15 08:13:34 -0400 EDT 1 django-example-1-build Pod spec.containers{sti-build} Normal Pulled {kubelet ip-172-31-45-164.us-west-2.compute.internal} Container image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.0.14" already present on machine
jkl14 2016-04-15 10:13:35 -0400 EDT 2016-04-15 10:13:35 -0400 EDT 1 django-example-3 ReplicationController Normal SuccessfulDelete {replication-controller } Deleted pod: django-example-3-1idhf
jkl14 2016-04-15 10:13:35 -0400 EDT 2016-04-15 10:13:35 -0400 EDT 1 django-example-3-1idhf Pod spec.containers{django-example} Normal Killing {kubelet ip-172-31-45-164.us-west-2.compute.internal} Killing container with docker id 3454ed728395: Need to kill pod.
jkl24 2016-04-15 08:13:36 -0400 EDT 2016-04-15 08:13:36 -0400 EDT 1 django-example-1-67j7b Pod spec.containers{django-example} Normal Pulling {kubelet ip-172-31-45-163.us-west-2.compute.internal} pulling image "172.25.10.176:5000/jkl24/django-example@sha256:1aba63aeab85314442eef7bd2b5befcfdd90d7c504f1a8520657dca19f796a62"
jkl42 2016-04-15 10:13:36 -0400 EDT 2016-04-15 10:13:36 -0400 EDT 1 django-example-2 ReplicationController Normal SuccessfulDelete {replication-controller } Deleted pod: django-example-2-760gm
jkl42 2016-04-15 10:13:37 -0400 EDT 2016-04-15 10:13:37 -0400 EDT 1 django-example-2-760gm Pod spec.containers{django-example} Normal Killing {kubelet ip-172-31-45-163.us-west-2.compute.internal} Killing container with docker id 6e892f273928: Need to kill pod.