Bug 1298500
Summary: | '--watch' for `oc get` does not work for old resource | ||
---|---|---|---|
Product: | OKD | Reporter: | Xingxing Xia <xxia> |
Component: | Pod | Assignee: | Seth Jennings <sjenning> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Xingxing Xia <xxia> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.x | CC: | agoldste, aos-bugs, avagarwa, ccoleman, decarr, mmccomas |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-09-19 13:56:13 UTC | Type: | Bug |
Description
Xingxing Xia
2016-01-14 09:42:53 UTC
The issue is that the "old" resource was last updated outside of the etcd watch window (which is currently 1000). Clayton, is there any way we can have the client code start watching from "now"? Or is there anything else we can do? From an outside observer's standpoint, if you don't know anything about etcd and the watch window, it seems like this should "just work" regardless of the age of the resource.

The API supports "get then watch" via `?watch=1&resourceVersion=0`, which I think is what we need here. And if the API returns an error there, I think that's a bug in the API.

I have a fix here: https://github.com/sjenning/kubernetes/pull/1
I am not saying it is pretty. It parses the etcd error from the API server to get the current index and does a watch with the resourceVersion set to the current index. I'm not seeing a way to do this more robustly that doesn't involve more invasive changes. Review/comments/alternative approaches welcome!

Hi Seth,
Not sure why you closed your PR. Please let me know if you need any help or are not working on this actively.
Thanks
Avesh

Avesh, I never actually filed the PR against upstream kube (the link is to a repo-local PR). The approach there does work and the changes are localized in kubectl, but it is fragile and involves parsing the error message returned by the API server. If that error message changes format, this will break. I guess I could file the PR upstream and see what becomes of it. I'm not currently working on this, if you have cycles and are interested.

Weren't we going to try to find a way to include the etcd index in an HTTP response header that the client could use?

Yes, but I couldn't find a way (not saying there isn't a way). Since the API server doesn't pass through the response from etcd, the current index doesn't make it to the client.

Hi Seth, I will try to reproduce it and see if I can make any progress.
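To make the fragility of the error-parsing approach concrete, here is a minimal Go sketch of the idea. It is not the code from the linked PR. The error text in `sampleErr` is an assumed shape for the etcd v2 "index cleared" error, and `currentIndexFromError` is a hypothetical helper name; the real message relayed by the API server may differ, which is exactly why this approach is brittle.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// ASSUMPTION: illustrative error text only. The exact wording relayed by the
// API server is not quoted in this bug and may differ.
const sampleErr = "401: The event in requested index is outdated and cleared " +
	"(the requested history has been cleared [1008/8]) [2007]"

// trailingIndex matches a bracketed integer at the very end of the message,
// which is assumed here to be the current etcd index.
var trailingIndex = regexp.MustCompile(`\[(\d+)\]\s*$`)

// currentIndexFromError is a hypothetical helper that recovers the trailing
// index from the error text. It fails if the message format changes.
func currentIndexFromError(msg string) (uint64, error) {
	m := trailingIndex.FindStringSubmatch(msg)
	if m == nil {
		return 0, fmt.Errorf("no trailing index found in %q", msg)
	}
	return strconv.ParseUint(m[1], 10, 64)
}

func main() {
	idx, err := currentIndexFromError(sampleErr)
	if err != nil {
		fmt.Println("cannot recover index:", err)
		return
	}
	// A client following this approach would retry the watch from idx.
	fmt.Println("retry watch with resourceVersion =", idx)
}
```

Any change to the wording or bracket layout of the message makes the helper return an error, so the client would need a fallback.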
As per what Clayton suggested above, it seems that always passing rv = 0 to the following call in pkg/kubectl/cmd/get.go should do the trick:

    w, err := r.Watch(rv)

However, I have been trying to reproduce it with etcd-2.2.1 and the latest kube but cannot. Even after 20 hours, the watch keeps working:

    # kubectl get pod hello-pod --watch
    get.go rv ("36")
    NAME        READY     STATUS    RESTARTS   AGE
    hello-pod   1/1       Running   0          20h
    hello-pod   1/1       Running   0          20h

And it does not exit.

The watch window is not a fixed time but a delta between the resourceVersion of the object and the currentVersion in etcd, which increments when etcd changes. I recreated it using OpenShift, which makes lots of regular changes to etcd, advancing the currentVersion in a reasonable amount of time.

First of all, passing rv="0" solves the problem. Also, I am observing that neither the createdIndex theory nor the delta theory seems to hold. For example, there are 4 pods:

    oc get pods
    NAME                   READY     STATUS    RESTARTS   AGE
    bwide-pause-rc-kvyej   1/1       Running   0          1h
    database-1-deploy      1/1       Running   0          1h
    database-1-hook-pre    0/1       Running   0          1h
    hello-openshift        0/1       Running   0          1h

For each pod:

1. bwide-pause-rc-kvyej
   From etcd: "createdIndex": 628, "modifiedIndex": 4300
   From oc edit: resourceVersion: 4300
   Result: the rv passed to Watch in oc get is 4300 and Watch works.

2. database-1-deploy
   From etcd: "createdIndex": 661, "modifiedIndex": 4301
   From oc edit: resourceVersion: 4301
   Result: the rv passed to Watch in oc get is 4300 and Watch works.

3. database-1-hook-pre
   From etcd: "createdIndex": 760, "modifiedIndex": 760
   From oc edit: resourceVersion: 760
   Result: the rv passed to Watch in oc get is 760 and Watch does NOT work.

4. hello-openshift
   From etcd: "createdIndex": 727, "modifiedIndex": 727
   From oc edit: resourceVersion: 727
   Result: the rv passed to Watch in oc get is 727 and Watch does NOT work.

So even if we passed createdIndex, which is what happens anyway in cases 3 and 4, the watch would not work. And cases 1 and 2 seem to contradict the delta theory.
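For reference, the delta rule described in this comment can be sketched in Go as below. This is only an illustration of the rule as stated, not etcd's actual implementation. The current etcd index was not reported in this bug, so `currentIndex` is an assumption (set to the highest modifiedIndex listed above), and the exact boundary condition is also assumed. `canWatchFrom` is a hypothetical name.

```go
package main

import "fmt"

// watchWindow is the window size mentioned in this bug (1000).
const watchWindow = 1000

// canWatchFrom reports whether a watch started at resourceVersion rv would
// still fall inside the window, given the current etcd index.
// ASSUMPTION: the boundary (<=) is illustrative; etcd's exact rule may differ.
func canWatchFrom(rv, currentIndex uint64) bool {
	if rv >= currentIndex {
		return true
	}
	return currentIndex-rv <= watchWindow
}

func main() {
	// ASSUMPTION: the real current index was not recorded in this bug.
	// 4301 is simply the highest modifiedIndex observed above.
	const currentIndex = 4301

	pods := []struct {
		name string
		rv   uint64
	}{
		{"bwide-pause-rc-kvyej", 4300},
		{"database-1-deploy", 4301},
		{"database-1-hook-pre", 760},
		{"hello-openshift", 727},
	}
	for _, p := range pods {
		fmt.Printf("%-22s rv=%-5d inside window=%v\n",
			p.name, p.rv, canWatchFrom(p.rv, currentIndex))
	}
}
```

The result for each pod depends entirely on the assumed `currentIndex`, so the sketch is a way to test the theory against a measured index rather than a conclusion in itself.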
Anyway, as I said (based on Clayton's suggestion), I tested passing rv="0" and it works in all cases. I am wondering whether it is worth investigating further or whether the solution with rv=0 is fine. If the latter, please let me know and I will send a PR upstream.

Opened a PR upstream to deal with this: https://github.com/kubernetes/kubernetes/pull/27392

Seth, please cherry-pick the upstream PR to origin, as it's now merged.

The origin PR has merged.

VERIFIED in:
openshift v1.3.0-alpha.2+dc66809
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

Now it works for an old resource:

    [xxia@pc_vm3 oc]$ oc get pod database-1-m4bc4 --watch
    NAME               READY     STATUS    RESTARTS   AGE
    database-1-m4bc4   1/1       Running   0          3h
    ^C[xxia@pc_vm3 oc]$