Bug 1327644 - oc/kubectl get events -w showing old events from 2 hours prior
Summary: oc/kubectl get events -w showing old events from 2 hours prior
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Seth Jennings
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-15 14:24 UTC by Mike Fiedler
Modified: 2016-10-26 17:24 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-26 17:24:38 UTC
Target Upstream Version:



Description Mike Fiedler 2016-04-15 14:24:09 UTC
Description of problem:

I've seen this several times, but I'm not sure what triggers it: oc get events -w starts replaying events from 2 hours prior. In this case, 50 concurrent builds were running in 50 different projects with oc get events -w running. Current events (timestamp = now) come in fine, but interspersed with them are events from exactly 2 hours prior (matching to the minute and second).

The events really are old ones - things that happened 2 hours ago. I've seen cases where they came from DCs that had since been deleted, so these are definitely not just bad timestamps.


See below for an example.


Version-Release number of selected component (if applicable): 3.2.0.15


How reproducible:  Unknown


Steps to Reproduce:
1.  Create 4 node cluster on AWS:  master, infra, 2 builder nodes
2.  Create 50 projects with app django-example in each
3.  Run 50 concurrent builds (all apps, all projects)
4.  Run oc get events --all-namespaces -w 
5.  If the problem triggers, old events from 2 hours prior will be intermixed with current events.

Actual results:


Expected results:


Additional info:

jkl28     2016-04-15 10:13:31 -0400 EDT   2016-04-15 10:13:31 -0400 EDT   1         django-example-3-deploy   Pod                 Warning   FailedSync   {kubelet ip-172-31-45-163.us-west-2.compute.internal}   Error syncing pod, skipping: failed to "TeardownNetwork" for "django-example-3-deploy_jkl28" with TeardownNetworkError: "Failed to teardown network for pod \"875923b7-0313-11e6-8b99-02038bdb70bb\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1"

jkl23     2016-04-15 10:13:32 -0400 EDT   2016-04-15 10:13:32 -0400 EDT   1         django-example-3-deploy   Pod                 Warning   FailedSync   {kubelet ip-172-31-45-163.us-west-2.compute.internal}   Error syncing pod, skipping: failed to "TeardownNetwork" for "django-example-3-deploy_jkl23" with TeardownNetworkError: "Failed to teardown network for pod \"8d143f6f-0313-11e6-8b99-02038bdb70bb\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1"

jkl26     2016-04-15 08:13:34 -0400 EDT   2016-04-15 08:13:34 -0400 EDT   1         django-example-1-hrge5   Pod       spec.containers{django-example}   Normal    Pulling   {kubelet ip-172-31-45-164.us-west-2.compute.internal}   pulling image "172.25.10.176:5000/jkl26/django-example@sha256:b5295d79ae45a8e28c797b235ee3785dbf6d4059eac72d61efcbd4b4e054ac14"
jkl34     2016-04-15 08:13:34 -0400 EDT   2016-04-15 08:13:34 -0400 EDT   1         django-example-1-build   Pod       spec.containers{sti-build}   Normal    Pulled    {kubelet ip-172-31-45-164.us-west-2.compute.internal}   Container image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.0.14" already present on machine
jkl14     2016-04-15 10:13:35 -0400 EDT   2016-04-15 10:13:35 -0400 EDT   1         django-example-3   ReplicationController             Normal    SuccessfulDelete   {replication-controller }   Deleted pod: django-example-3-1idhf
jkl14     2016-04-15 10:13:35 -0400 EDT   2016-04-15 10:13:35 -0400 EDT   1         django-example-3-1idhf   Pod       spec.containers{django-example}   Normal    Killing   {kubelet ip-172-31-45-164.us-west-2.compute.internal}   Killing container with docker id 3454ed728395: Need to kill pod.
jkl24     2016-04-15 08:13:36 -0400 EDT   2016-04-15 08:13:36 -0400 EDT   1         django-example-1-67j7b   Pod       spec.containers{django-example}   Normal    Pulling   {kubelet ip-172-31-45-163.us-west-2.compute.internal}   pulling image "172.25.10.176:5000/jkl24/django-example@sha256:1aba63aeab85314442eef7bd2b5befcfdd90d7c504f1a8520657dca19f796a62"
jkl42     2016-04-15 10:13:36 -0400 EDT   2016-04-15 10:13:36 -0400 EDT   1         django-example-2   ReplicationController             Normal    SuccessfulDelete   {replication-controller }   Deleted pod: django-example-2-760gm
jkl42     2016-04-15 10:13:37 -0400 EDT   2016-04-15 10:13:37 -0400 EDT   1         django-example-2-760gm   Pod       spec.containers{django-example}   Normal    Killing   {kubelet ip-172-31-45-163.us-west-2.compute.internal}   Killing container with docker id 6e892f273928: Need to kill pod.

Comment 1 Mike Fiedler 2016-04-29 18:39:32 UTC
I set kubernetesMasterConfig/apiServerArguments/event-ttl to 1h and the old events appearing in the watch were 1 hour old instead of 2.  It appears that the watch is picking up expiring events as well as new ones.
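For reference, the TTL override described above lives in the master configuration. A sketch of the relevant fragment (field names follow the OpenShift 3.x master-config.yaml format; the exact file path on your installation, typically /etc/origin/master/master-config.yaml, is an assumption):

```yaml
kubernetesMasterConfig:
  apiServerArguments:
    # apiServerArguments values are lists of strings; events are
    # garbage-collected by the API server after this TTL elapses
    event-ttl:
    - "1h"
```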

Comment 2 Seth Jennings 2016-06-16 15:59:43 UTC
I have a theory on this: the garbage collection of the event is actually triggering a watch event on the... event. Taking a look.

Comment 3 Seth Jennings 2016-06-16 16:13:23 UTC
After thinking about this, I don't think this is a bug.

Take watching pods (a resource type just like events) as an example.  If you are watching pods and a pod is deleted, the watch will return a watch event with type DELETED:

https://github.com/kubernetes/kubernetes/blob/master/pkg/watch/watch.go#L43

kubectl get -w just prints a resource whenever the watch returns a watch event of any type.

To see only new events, one would need to do a watch (via the API) and filter for watch events that have type ADDED, functionality that kubectl doesn't currently have.

Adding a type filter for watches with kubectl would be a new feature.
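The ADDED-only filtering described above can be sketched client-side. A minimal illustration (not kubectl's actual code; the event strings are made up):

```python
# Minimal sketch (not the kubectl implementation): a watch stream yields
# (event_type, object) pairs. Keeping only ADDED notifications hides the
# DELETED ones emitted when old events are garbage-collected, which is
# what makes 2-hour-old events reappear in `get events -w` output.

def filter_new_events(watch_stream):
    """Yield only the objects carried by ADDED watch events."""
    for event_type, obj in watch_stream:
        if event_type == "ADDED":
            yield obj

# Simulated stream: current events arrive as ADDED; an expiring 2-hour-old
# event shows up as DELETED, yet kubectl today prints both identically.
stream = [
    ("ADDED", "django-example-51-build Scheduled"),
    ("DELETED", "django-example-11-build Pulling (2h old, expiring)"),
    ("ADDED", "django-example-51-build Pulling"),
]

print(list(filter_new_events(stream)))
# -> ['django-example-51-build Scheduled', 'django-example-51-build Pulling']
```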

Mike, is that sufficient explanation to close?

Comment 4 Mike Fiedler 2016-06-16 18:53:54 UTC
I understand that what I am seeing is an event representing the deletion of an event. The problem I still have is that it looks exactly like the original event; there is nothing to indicate that it is an event that is "going away". Plus, a straight oc/kubectl get events doesn't show them.

Any chance of updating the oc/kubectl get events watch to filter them? Maybe have the watch use the same filters the initial get events has? My use case for get events -w is always debugging things that are happening now, and having old events interleaved with new ones is quite confusing.

Example from today:

2016-06-16 14:52:47 -0400 EDT   2016-06-16 14:52:47 -0400 EDT   1         cakephp-example-51-build   Pod                 Normal    Scheduled   {default-scheduler }   Successfully assigned cakephp-example-51-build to ip-172-31-8-197.ec2.internal
2016-06-16 12:52:47 -0400 EDT   2016-06-16 12:52:47 -0400 EDT   1         cakephp-example-11-build   Pod       spec.containers{sti-build}   Normal    Pulling   {kubelet ip-172-31-8-200.ec2.internal}   pulling image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.1"
2016-06-16 12:52:47 -0400 EDT   2016-06-16 12:52:47 -0400 EDT   1         django-example-11-build   Pod       spec.containers{sti-build}   Normal    Pulling   {kubelet ip-172-31-8-199.ec2.internal}   pulling image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.1"
2016-06-16 14:52:48 -0400 EDT   2016-06-16 14:52:48 -0400 EDT   1         django-example-51-build   Pod       spec.containers{sti-build}   Normal    Pulling   {kubelet ip-172-31-8-198.ec2.internal}   pulling image "registry.qe.openshift.com/openshift3/ose-sti-builder:v3.2.1.1"

Comment 5 Seth Jennings 2016-06-16 19:07:02 UTC
kubectl uses a per-resource-type printer function to output the object returned in a watch event. In the case of pods, the pod has a status field the printer shows (Creating, Running, Terminating). However, an event resource has no "status" per se; it is just a record of what happened.

The initial listing that kubectl get -w does is a separate API call from the watch. If you add --v=9 you'll see one API call for the list and a separate one for the watch.
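That list-then-watch sequence can be sketched as follows (assumed behavior for illustration, not kubectl internals; the function names here are invented):

```python
# Sketch of list-then-watch: `kubectl get events -w` first LISTs current
# objects in one API call, then opens a separate WATCH call and prints
# every notification it returns -- including DELETED ones for events
# that are being garbage-collected.

def get_with_watch(list_fn, watch_fn):
    """Return everything `kubectl get -w` would print, in order."""
    shown = list(list_fn())        # initial listing: one API call
    for _type, obj in watch_fn():  # separate watch API call
        shown.append(obj)          # printed regardless of watch type
    return shown

# A plain `get events` stops after the list and never shows the expiring
# event; `-w` keeps printing, old DELETED notifications included.
listed = ["evt-current-a", "evt-current-b"]
watched = [("ADDED", "evt-new"), ("DELETED", "evt-2h-old-expiring")]
print(get_with_watch(lambda: listed, lambda: iter(watched)))
# -> ['evt-current-a', 'evt-current-b', 'evt-new', 'evt-2h-old-expiring']
```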

The only thing I can think of is to add a new flag to kubectl get like --watch-type=[ADDED|MODIFIED|DELETED] that would only print watch events of a certain type.

Comment 6 Andy Goldstein 2016-06-27 20:17:11 UTC
Great digging Seth!

Somewhat related issue: https://github.com/kubernetes/kubernetes/issues/17612

Comment 7 Derek Carr 2016-10-26 17:24:38 UTC
This is working as designed. Feel free to open an RFE to request the ability to filter by watch type, per Seth's prior comment. Closing this bug; we can prioritize a future RFE (if opened).

