Description of problem:
e.g. `oc get pod <pod name> --watch` can watch changes of the given pod, and the cursor keeps blinking until we press Ctrl+C. But the watch fails for an old pod ('how old' is not fixed; see 'Additional info').
The same issue exists for the '/' format: oc get pod/<pod name> --watch
The same issue exists for '--watch-only'.

Version-Release number of selected component (if applicable):
oc/openshift v1.1-766-g2eb4afc
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
Always

Steps to Reproduce:
1. oc login, create a project
2. Create something:
   1> Create a new app:
   $ oc new-app -f origin/examples/sample-app/application-template-stibuild.json
   2> After some time, create a newer resource:
   $ oc create -f origin/examples/hello-openshift/hello-pod.json
3. Check the resources:
   $ oc get pod
   NAME                        READY     STATUS      RESTARTS   AGE
   database-1-1q3lc            1/1       Running     0          2h
   frontend-1-pohcv            1/1       Running     0          2h
   frontend-1-tdtd6            1/1       Running     0          2h
   hello-openshift             1/1       Running     0          28m
   ruby-sample-build-1-build   0/1       Completed   0          2h
4. Watch a resource:
   1> Watch the old pod:
   [xxia@local ~]$ oc get pod database-1-1q3lc --watch
   NAME               READY     STATUS    RESTARTS   AGE
   database-1-1q3lc   1/1       Running   0          2h
   [xxia@local ~]$   # <--- `oc get` exits immediately
   2> Watch the newer pod:
   [xxia@local ~]$ oc get pod hello-openshift --watch
   NAME              READY     STATUS    RESTARTS   AGE
   hello-openshift   1/1       Running   0          28m
   ^C[xxia@local ~]$   # <--- `oc get` does not exit until Ctrl+C is pressed

Actual results:
4. As noted in the comments after '#' in step 4.

Expected results:
4. Both 1> and 2> should watch the resource and not exit unless Ctrl+C is pressed.

Additional info:
'How old' is not fixed, because here is a test result where `oc get` exits while watching a pod of AGE 15m (note: this result is from OSE testing, while the steps above are from Origin testing):
[tester@local ~]$ oc get pod hello-openshift --watch
NAME              READY     STATUS    RESTARTS   AGE
hello-openshift   1/1       Running   0          15m
[tester@local ~]$   # <--- `oc get` exits immediately
The issue is that the "old" resource was last updated outside of the etcd watch window (which is currently 1000 events). Clayton, is there any way we can have the client code start watching from "now"? Or is there anything else we can do? From an outside observer's standpoint, if you don't know anything about etcd and its watch window, it seems like this should "just work" regardless of the age of the resource.
The API supports "get then watch" via `?watch=1&resourceVersion=0` which I think is what we need here.
And if the API returns an error there, I think that's a bug in the API
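For reference, the "get then watch" request described above can be spelled out as a URL. A minimal sketch, assuming the client narrows the watch to a single pod with a metadata.name field selector (the host, namespace, and pod name below are made-up placeholders):

```go
package main

import (
	"fmt"
	"net/url"
)

// watchURL builds a "get then watch" request against the pods endpoint.
// resourceVersion=0 asks the API server to send the current state first
// and then stream changes, regardless of how old the object is.
func watchURL(host, ns, name string) string {
	u := url.URL{
		Scheme: "https",
		Host:   host,
		Path:   "/api/v1/namespaces/" + ns + "/pods",
	}
	q := url.Values{}
	q.Set("watch", "1")
	q.Set("resourceVersion", "0")
	q.Set("fieldSelector", "metadata.name="+name)
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	// Placeholder cluster and pod name from the reproduction above.
	fmt.Println(watchURL("master.example.com:8443", "myproject", "database-1-1q3lc"))
}
```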
I have a fix here: https://github.com/sjenning/kubernetes/pull/1. I am not saying it is pretty. It parses the etcd error from the API server to get the current index and does a watch with the resourceVersion set to that index. I'm not seeing a more robust way to do this that doesn't involve far more invasive changes. Review/comments/alternative approaches welcome!
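To make the fragility concrete: when a watch starts from a resourceVersion older than the etcd watch window, the API server relays an etcd "event index cleared" error whose trailing bracketed number is etcd's current index. A sketch of the parsing approach, assuming the etcd v2 error string format shown in the comment (the exact wording is version-dependent, which is exactly why this breaks if the format changes):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Example of the relayed etcd error (format assumed from etcd v2):
//   401: The event in requested index is outdated and cleared
//   (the requested history has been cleared [1793/7]) [2792]
// The final bracketed number is etcd's current index.
var currentIndexRE = regexp.MustCompile(`\[(\d+)\]$`)

// currentIndexFromError recovers the current index from the error text,
// so the client can retry the watch with resourceVersion set to it.
func currentIndexFromError(msg string) (uint64, bool) {
	m := currentIndexRE.FindStringSubmatch(msg)
	if m == nil {
		return 0, false
	}
	idx, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return idx, true
}

func main() {
	msg := "401: The event in requested index is outdated and cleared (the requested history has been cleared [1793/7]) [2792]"
	if idx, ok := currentIndexFromError(msg); ok {
		// Retry the watch from here instead of the object's stale resourceVersion.
		fmt.Println(idx)
	}
}
```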
Hi Seth,
Not sure why you closed your PR. Please let me know if you need any help or if you are not actively working on this.
Thanks,
Avesh
Avesh, I never actually filed the PR against upstream kube (the link is to a repo-local PR). The approach there does work and the changes are localized in kubectl, but it is fragile: it involves parsing the error message returned by the API server, and if that error message changes format, this will break. I guess I could file the PR upstream and see what becomes of it. I'm not currently working on this, if you have cycles and are interested.
Weren't we going to try to find a way to include the etcd index in an http response header that the client could use?
Yes, but I couldn't find a way (not saying there isn't one). Since the API server doesn't pass through the response from etcd, the current index doesn't make it to the client.
Hi Seth, I will try to reproduce it and see if I can make any progress.
As per what Clayton suggested above, it seems like always passing rv = "0" to the following call in pkg/kubectl/cmd/get.go should do the trick:

w, err := r.Watch(rv)

However, I have been trying to reproduce the bug with etcd 2.2.1 and the latest kube but cannot. Even after 20 hours, the watch keeps working:

#kubectl get pod hello-pod --watch
get.go rv ("36")
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          20h
hello-pod   1/1       Running   0          20h

And it does not exit.
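The one-line change being proposed can be sketched with a stand-in for the resource helper (the `watcher` interface and `fakeResource` type below are hypothetical scaffolding, not real kubectl types; only the `r.Watch(rv)` call mirrors pkg/kubectl/cmd/get.go):

```go
package main

import "fmt"

// watcher abstracts the r.Watch(rv) call site in pkg/kubectl/cmd/get.go.
type watcher interface {
	Watch(resourceVersion string) error
}

// fakeResource is a hypothetical stand-in that only records which
// resourceVersion the watch was started from.
type fakeResource struct{ startedFrom string }

func (f *fakeResource) Watch(rv string) error {
	f.startedFrom = rv
	return nil
}

// startWatch mirrors the proposed fix: ignore the object's own (possibly
// ancient) resourceVersion and start from "0", which the API server
// interprets as "current state, then changes going forward".
func startWatch(r watcher) error {
	return r.Watch("0")
}

func main() {
	r := &fakeResource{}
	if err := startWatch(r); err != nil {
		panic(err)
	}
	fmt.Println("watch started from resourceVersion", r.startedFrom)
}
```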
The watch window is not a fixed time but a delta between the resourceVersion of the object and the currentVersion in etcd, which increments whenever etcd changes. I recreated it using OpenShift, which makes lots of regular changes to etcd, advancing the currentVersion in a reasonable amount of time.
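The delta idea above can be illustrated with a small check. This is a sketch, not etcd's exact boundary logic; the 1000 figure is the event-history size mentioned earlier in the thread, and the index values come from the pods discussed below:

```go
package main

import "fmt"

// etcd v2 keeps a sliding history of roughly the most recent 1000 events.
// A watch from resourceVersion rv can only replay history if etcd's
// current index has not advanced more than the window past rv.
const watchWindow = 1000

// watchable is an illustrative approximation of the window check;
// etcd's real boundary handling may differ by an off-by-one.
func watchable(resourceVersion, currentIndex uint64) bool {
	return currentIndex < resourceVersion+watchWindow
}

func main() {
	// An object last modified at index 760 while etcd is at index 4300
	// has fallen out of the window, so its watch fails:
	fmt.Println(watchable(760, 4300))
	// An object modified at 4300 with etcd at 4350 is still inside it:
	fmt.Println(watchable(4300, 4350))
}
```

Note the window is consumed by cluster-wide etcd writes, not just changes to the watched object, which is why a busy OpenShift cluster ages objects out of the window quickly.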
First of all, passing rv="0" solves the problem. Also, I am observing that neither the createdIndex nor the delta theory seems to hold. For example, there are 4 pods:

oc get pods
NAME                   READY     STATUS    RESTARTS   AGE
bwide-pause-rc-kvyej   1/1       Running   0          1h
database-1-deploy      1/1       Running   0          1h
database-1-hook-pre    0/1       Running   0          1h
hello-openshift        0/1       Running   0          1h

For each pod:

1. bwide-pause-rc-kvyej
   From etcd: "createdIndex": 628, "modifiedIndex": 4300
   From oc edit: resourceVersion: 4300
   Result: rv passed to Watch in oc get is 4300, and the watch works.

2. database-1-deploy
   From etcd: "createdIndex": 661, "modifiedIndex": 4301
   From oc edit: resourceVersion: 4301
   Result: rv passed to Watch in oc get is 4301, and the watch works.

3. database-1-hook-pre
   From etcd: "createdIndex": 760, "modifiedIndex": 760
   From oc edit: resourceVersion: 760
   Result: rv passed to Watch in oc get is 760, and the watch does NOT work.

4. hello-openshift
   From etcd: "createdIndex": 727, "modifiedIndex": 727
   From oc edit: resourceVersion: 727
   Result: rv passed to Watch in oc get is 727, and the watch does NOT work.

So even if we passed createdIndex, which is in any case what happens in cases 3 and 4, the watch would not work. And cases 1 and 2 seem to contradict the delta theory. Anyway, as I said, based on Clayton's suggestion I tested passing rv="0" and it works in all cases.
I am wondering whether it is worth investigating further, or whether the rv="0" solution is fine. If the latter, please let me know and I will send a PR upstream.
Opened a PR upstream to deal with this: https://github.com/kubernetes/kubernetes/pull/27392
Seth, please cherry-pick the upstream PR to origin, as it's now merged.
Origin PR: https://github.com/openshift/origin/pull/10213
origin PR has merged
VERIFIED in:
openshift v1.3.0-alpha.2+dc66809
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

Now it works for an old resource:
[xxia@pc_vm3 oc]$ oc get pod database-1-m4bc4 --watch
NAME               READY     STATUS    RESTARTS   AGE
database-1-m4bc4   1/1       Running   0          3h
^C[xxia@pc_vm3 oc]$