1334501 – Watch cache regression: changes behavior of "resource version too old" error

Summary: Watch cache regression: changes behavior of "resource version too old" error

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.2.1
Assignee:	Jordan Liggitt
QA Contact:	DeShuai Ma
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-05-09 19:05 UTC by Jessica Forrester
Modified:	2016-06-27 15:07 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, when the etcd watch cache was enabled, the API server returned an HTTP 410 response when a watch was attempted with a resourceVersion that was too old. The expected result was an HTTP 200 status with a single watch event of type ERROR. This bug fix updates the API server to produce the same result in this case regardless of whether the watch cache is enabled. The "410 Gone" error is now returned as a watch error event rather than as an HTTP 410 response.
Clone Of:
Environment:
Last Closed:	2016-06-27 15:07:03 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1343	0	normal	SHIPPED_LIVE	Red Hat OpenShift Enterprise 3.2.1.1 bug fix and enhancement update	2016-06-27 19:04:05 UTC

Description Jessica Forrester 2016-05-09 19:05:43 UTC

Tracking this kubernetes issue https://github.com/kubernetes/kubernetes/issues/25151

Comment 1 Jordan Liggitt 2016-05-09 21:28:00 UTC

Fixes in:
https://github.com/openshift/ose/pull/215
https://github.com/openshift/origin/pull/8810

Comment 5 Qixuan Wang 2016-06-07 16:33:19 UTC

https://github.com/kubernetes/kubernetes/pull/25369

Could you give me some suggestions on how to verify this bug? Thanks.

Comment 6 Jordan Liggitt 2016-06-07 18:19:01 UTC

The steps in https://github.com/kubernetes/kubernetes/issues/25151#issue-153080519 should work: request a watch at a resourceVersion that is too old, and confirm that the 410 Gone error is returned as a watch error event rather than as an HTTP 410 response.

To make testing easier, set the watch cache size to something low for a particular resource type, create enough objects of that type to exceed the cache size, and then request resourceVersion=1.
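
The check described in this comment can be sketched as a small classifier. This is only an illustration, not part of the fix: the function name `classify_watch_response` and its return labels are hypothetical, and the caller is assumed to have already captured the HTTP status code and the first line of the response body from a watch request made with a too-old resourceVersion.

```python
import json


def classify_watch_response(http_status, first_body_line):
    """Classify the result of a watch request made with a too-old resourceVersion.

    Hypothetical helper. Returns one of:
      "regression"   - the server answered with a plain HTTP 410 response
      "fixed"        - HTTP 200 whose first watch event is an ERROR carrying code 410
      "inconclusive" - anything else (for example, the version was not too old)
    """
    if http_status == 410:
        # The error is the HTTP response itself, which is the reported bug.
        return "regression"
    if http_status != 200 or not first_body_line:
        return "inconclusive"
    try:
        event = json.loads(first_body_line)
    except ValueError:
        # The body line was not JSON, so it cannot be a watch event.
        return "inconclusive"
    if not isinstance(event, dict):
        return "inconclusive"
    status = event.get("object")
    if not isinstance(status, dict):
        status = {}
    if event.get("type") == "ERROR" and status.get("code") == 410:
        # The Gone error arrived as a watch event inside a 200 response.
        return "fixed"
    return "inconclusive"
```

If the request returns "inconclusive", the likely cause is that the cache still holds the requested version, so more objects of that resource type may need to be created before retrying.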

Comment 7 DeShuai Ma 2016-06-12 06:33:52 UTC

Tested on OpenShift v3.2.1.1-1-g33fa4ea.

Steps to verify:
1. Enable the watch cache:
kubernetesMasterConfig:
  apiServerArguments:
    watch-cache: ["true"]
    watch-cache-sizes: ["builds#50","deploymentconfigs#50"]

2. Watch an old resourceVersion with curl; the response should be HTTP 200 carrying a 410 Gone error event.
    [root@dhcp-128-7 Desktop]# curl -k -vvv -H "Authorization: Bearer 82z8aFmWWHrzBH8-nwPPgRxs2sLepbw0re75hqaJgTs" "https://104.197.173.141:8443/oapi/v1/namespaces/dma/builds?watch=1&resourceVersion=1"
    * About to connect() to 104.197.173.141 port 8443 (#0)
    *   Trying 104.197.173.141... connected
    * Connected to 104.197.173.141 (104.197.173.141) port 8443 (#0)
    * Initializing NSS with certpath: sql:/etc/pki/nssdb
    * warning: ignoring value of ssl.verifyhost
    * skipping SSL peer certificate verification
    * NSS: client certificate not found (nickname not specified)
    * SSL connection using TLS_RSA_WITH_AES_128_CBC_SHA
    * Server certificate:
    *       subject: CN=10.240.0.29
    *       start date: Jun 12 03:16:37 2016 GMT
    *       expire date: Jun 12 03:16:38 2018 GMT
    *       common name: 10.240.0.29
    *       issuer: CN=openshift-signer@1465701391
    > GET /oapi/v1/namespaces/dma/builds?watch=1&resourceVersion=1 HTTP/1.1
    > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.3.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
    > Host: 104.197.173.141:8443
    > Accept: */*
    > Authorization: Bearer 82z8aFmWWHrzBH8-nwPPgRxs2sLepbw0re75hqaJgTs
    >
    < HTTP/1.1 200 OK
    < Cache-Control: no-store
    < Transfer-Encoding: chunked
    < Date: Sun, 12 Jun 2016 05:40:07 GMT
    < Content-Type: text/plain; charset=utf-8
    < Transfer-Encoding: chunked
    <
    {"type":"ERROR","object":{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"too old resource version: 1 (1046)","reason":"Gone","code":410}}
    * Connection #0 to host 104.197.173.141 left intact
    * Closing connection #0
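
For clients, the practical consequence of the output above is that a watch can succeed at the HTTP level and still fail on its first event. A minimal sketch of how a consumer might surface that case is shown here. The names `ResourceVersionTooOld` and `iter_watch_events` are hypothetical, the input is assumed to be an iterable of newline-delimited JSON lines as in the transcript, and what the caller does after the error (for example, relisting and restarting the watch) is an assumption rather than something stated in this report.

```python
import json


class ResourceVersionTooOld(Exception):
    """Hypothetical exception raised when the watch stream reports 410 Gone."""


def iter_watch_events(lines):
    """Yield decoded watch events from an iterable of JSON lines.

    Raises ResourceVersionTooOld when an ERROR event carries a Status
    object with code 410, as in the verified response body.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between chunks
        event = json.loads(line)
        status = event.get("object") or {}
        if event.get("type") == "ERROR" and status.get("code") == 410:
            raise ResourceVersionTooOld(status.get("message", ""))
        yield event


# The event line from the verification output in this comment.
sample = (
    '{"type":"ERROR","object":{"kind":"Status","apiVersion":"v1",'
    '"metadata":{},"status":"Failure",'
    '"message":"too old resource version: 1 (1046)",'
    '"reason":"Gone","code":410}}'
)

try:
    for _ in iter_watch_events([sample]):
        pass
    too_old_message = None
except ResourceVersionTooOld as err:
    too_old_message = str(err)
```

Raising a dedicated exception keeps the "version too old" case separate from ordinary events, so a caller that only checks the HTTP status does not mistake the stream for a healthy watch.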

Comment 9 errata-xmlrpc 2016-06-27 15:07:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1343
