This is in reference to https://github.com/kubernetes/kubernetes/issues/102464 - a WebSocket watch of Service resources stops receiving events after the server-side timeout, but the connection is never closed. Because the socket stays open and still answers ping/pong, the client has no reliable way to detect that the watch has stalled. See the upstream issue for a reproducer.
This needs a 1.21.x rebase (the fix landed upstream 5 days ago).
I'm waiting for 1.21.2 and will update it as soon as it's out.
https://github.com/kubernetes/kubernetes/pull/102541 hasn't merged yet; we'll probably pick it up in 1.21.3.
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
This will be fixed in https://github.com/openshift/kubernetes/pull/888
The previous PR landed; moving to MODIFIED.
The fix PR https://github.com/openshift/kubernetes/pull/888 is on 4.8; moving the Target Release -> 4.8.z.
Wrong PR number: the fix actually landed via https://github.com/openshift/kubernetes/pull/935, which targets 4.7.
Verification steps:

Created a WebSocket test client program, testws-cm.go, under $GOPATH/src/github.com/openshift/kubernetes/test:

$ cat testws-cm.go
package main

import (
	"io"
	"log"
	"net/http"
	"path/filepath"

	"golang.org/x/net/websocket"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Load the kubeconfig from the default location.
	home := homedir.HomeDir()
	kubeconfig := filepath.Join(home, ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err.Error())
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err.Error())
	}

	// Build a WebSocket watch URL with a short server-side timeout.
	url := clientset.RESTClient().Get().URL()
	apiURL := "wss://" + url.Hostname() + ":" + url.Port() + "/api/v1/configmaps?watch=true&timeoutSeconds=5"
	wsc, err := websocket.NewConfig(apiURL, apiURL)
	if err != nil {
		log.Fatal(err)
	}
	wsc.TlsConfig, err = rest.TLSConfigFor(config)
	if err != nil {
		log.Fatal(err)
	}
	wsc.Header, err = headersForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	wsConn, err := websocket.DialConfig(wsc)
	if err != nil {
		log.Fatal(err)
	}
	defer wsConn.Close()

	// Read watch events until the server closes the connection (io.EOF),
	// which is expected once timeoutSeconds elapses.
	for {
		var msg []byte
		if err := websocket.Message.Receive(wsConn, &msg); err != nil {
			if err == io.EOF {
				break
			}
			log.Fatalf("Failed to read completely from websocket %v", err)
		}
		if len(msg) == 0 {
			continue
		}
		log.Printf("Read %v %v", len(msg), string(msg))
	}
}

// extractRT captures the headers that client-go's transport wrappers would
// attach to a request (e.g. the bearer token) without sending anything.
type extractRT struct {
	http.Header
}

func (rt *extractRT) RoundTrip(req *http.Request) (*http.Response, error) {
	rt.Header = req.Header
	return nil, nil
}

func headersForConfig(c *rest.Config) (http.Header, error) {
	extract := &extractRT{}
	rt, err := rest.HTTPWrappersForConfig(c, extract)
	if err != nil {
		return nil, err
	}
	if _, err := rt.RoundTrip(&http.Request{}); err != nil {
		return nil, err
	}
	return extract.Header, nil
}

Change the line

	apiURL := "wss://" + url.Hostname() + ":" + url.Port() + "/api/v1/configmaps?watch=true&timeoutSeconds=5"

to watch the services object instead, and save the result as testws-s.go. Build both programs:

$ go build testws-cm.go
$ go build testws-s.go

Point ~/.kube/config at the kubeconfig of the installed OCP cluster, then run the WebSocket client programs against the cluster.

Tried on a 4.7 nightly payload:

$ oc get clusterversion --kubeconfig=/home/kewang/.kube/config
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-10-18-191324   True        False         31m     Cluster version is 4.7.0-0.nightly-2021-10-18-191324

$ oc get no --kubeconfig=/home/kewang/.kube/config
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-l07w5mt-f76d1-6zzz7-master-0         Ready    master   51m   v1.20.0+bbbc079
ci-ln-l07w5mt-f76d1-6zzz7-master-1         Ready    master   51m   v1.20.0+bbbc079
ci-ln-l07w5mt-f76d1-6zzz7-master-2         Ready    master   51m   v1.20.0+bbbc079
ci-ln-l07w5mt-f76d1-6zzz7-worker-a-z82nq   Ready    worker   41m   v1.20.0+bbbc079
ci-ln-l07w5mt-f76d1-6zzz7-worker-b-2h9sj   Ready    worker   41m   v1.20.0+bbbc079
ci-ln-l07w5mt-f76d1-6zzz7-worker-c-qhn69   Ready    worker   41m   v1.20.0+bbbc079

$ time ./testws-cm
...
real	0m8.704s
user	0m0.375s
sys	0m0.406s

$ time ./testws-s
...
real	0m5.716s
user	0m0.027s
sys	0m0.028s

Both watches, including the one on the services object, terminate shortly after the server-side timeout instead of hanging open, so moving the bug to VERIFIED.
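For reference, the one line that differs in testws-s.go is the watch URL; following the same construction as above, it would read:

	apiURL := "wss://" + url.Hostname() + ":" + url.Port() + "/api/v1/services?watch=true&timeoutSeconds=5"

/api/v1/services is the core-group collection for Service objects, i.e. the resource the original report exercises.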
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.36 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3931