Description of problem:

During a long run of reliability tests, frequent error messages appear in the master logs. The reliability tests use all the quickstart apps and deploy/scale/build/delete them over a one-week duration.

Jan 22 16:51:38 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:51:38.385024 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:51:46 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:51:46.597975 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:51:54 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:51:54.914722 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:05 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:05.532797 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:15 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:15.753759 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:38 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:38.296244 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:42 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:42.803999 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:45 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:45.608847 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:51 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:51.919464 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:54 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:54.423635 11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted

Version-Release number of selected component (if applicable):
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

How reproducible:

Steps to Reproduce:
1. Start creating apps on an OCP cluster.
2. Periodically build, delete, and re-create quickstart apps.

Actual results:
Many error messages in the master logs.

Expected results:
These messages should be logged at info level.

Additional info:
To be fixed upstream by https://github.com/kubernetes/kubernetes/pull/68557. After review we can possibly backport.
Per comment 6 and comment 14, checked in a 4.2.0-0.nightly-2019-06-26-043058 env:

$ oc version
Client Version: version.Info{Major:"4", Minor:"2+", GitVersion:"v4.2.0", GitCommit:"9c6dd05e1", GitTreeState:"clean", BuildDate:"2019-06-27T06:30:43Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+b375ce1", GitCommit:"b375ce1", GitTreeState:"clean", BuildDate:"2019-06-26T03:15:38Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}

$ oc rsh -n openshift-etcd etcd-member-ip-10-0-128-51.us-east-2.compute.internal
sh-4.2# export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=`ls /etc/ssl/etcd/*etcd-peer*crt` ETCDCTL_KEY=`ls /etc/ssl/etcd/*etcd-peer*key`
sh-4.2# etcdctl version
etcdctl version: 3.3.10
API version: 3.3

Though the noise is less frequent than in comment 0, the messages are still logged as errors in the kube-apiserver pod logs (they are not seen in the etcd pod logs):

$ for i in kube-apiserver-ip-10-0-128-51.us-east-2.compute.internal kube-apiserver-ip-10-0-150-69.us-east-2.compute.internal kube-apiserver-ip-10-0-168-163.us-east-2.compute.internal ; do oc logs $i -c kube-apiserver-10 -n openshift-kube-apiserver > $i.log; done
$ grep etcdserver kube-apiserver-ip-10-0-1*.log
kube-apiserver-ip-10-0-150-69.us-east-2.compute.internal.log:E0627 07:14:47.506978 1 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
...
kube-apiserver-ip-10-0-168-163.us-east-2.compute.internal.log:E0627 09:21:47.934025 1 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
...
$ grep etcdserver kube-apiserver-ip-10-0-1*.log | grep "E0627.*watch chan error: etcdserver: mvcc: required revision has been compacted" | wc -l
38
@Sam can we filter for that error in pkg/storage/etcd3/watcher.go:208? Currently everything coming back from the etcd client is printed as an error in the log, although we know that compaction messages are completely normal and should be info, V(4), or something like that.
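The filtering being suggested can be sketched as follows. This is a minimal, purely illustrative Python model of the idea (the real code is Go in pkg/storage/etcd3/watcher.go, and the function name here is hypothetical): special-case the well-known compaction error so it is emitted at a lower verbosity, while every other watch channel error remains an error.

```python
# Illustrative sketch only: the real watcher is Go code in
# pkg/storage/etcd3/watcher.go. The idea is to special-case the
# well-known compaction error instead of logging every watch
# channel error at error level.

COMPACTED_MSG = "etcdserver: mvcc: required revision has been compacted"

def classify_watch_error(err_msg: str) -> str:
    """Return the log level a watch channel error should be emitted at.

    Compaction of the requested revision is an expected condition
    (the client simply restarts its watch from a live revision),
    so it is logged as informational; everything else stays an error.
    """
    if COMPACTED_MSG in err_msg:
        return "info"   # expected: requested revision was compacted
    return "error"      # anything else is a genuine watch failure
```

Applied to the messages in this bug, `classify_watch_error("watch chan error: etcdserver: mvcc: required revision has been compacted")` would return "info" rather than "error".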
Forgot to add: my fix https://github.com/kubernetes/kubernetes/pull/68557 does not address this message; it only reduced the number of compactors. So we mixed up the client-side message caused by compaction with the compactor code itself.
Moving out of the blocker list until we have a reason to think otherwise.
The LifecycleStale keyword was removed because the bug got commented on recently. The bug assignee was notified.
I am sorry that this bug has continued for so long. I would like to clarify what this log means. This is not a bug; it is an accounting of an expected condition between the etcd client and server.

When a watch is created, it can be created against a specific revision of the keyspace [1], [2]. Every mutative operation against the keyspace increments the revision, and each write to a key also increments that key's version. Essentially, the revision counter is an accounting of the keyspace at that point in time. So if we write to key foo with value bar at revision 1000 (key version 3), then write to key foo with value baz at revision 1001 (key version 4), the value bar of key foo is still available at revision 1000, version 3. So if my watch was created against revision 999, I can efficiently range across revisions with the watch, versus a range against the keyspace, which is more expensive. The apiserver and k8s use watch extensively for this reason.

The problem is that if the same key has many versions, the db can grow in size, causing performance issues. To minimize this growth we compact the keyspace at a set revision, such that any versions of keys that existed before that revision no longer exist. Given the above example:

# keyspace before compaction
revision 998  /foo value: bat version: 1
revision 999  /foo value: buz version: 2
revision 1000 /foo value: bar version: 3
revision 1001 /foo value: baz version: 4

# keyspace after compaction at revision 1000
revision 1000 /foo value: bar version: 3
revision 1001 /foo value: baz version: 4

Suppose a watch is created against a soon-to-be-compacted revision, i.e.

etcdctl watch / --rev 999

and next the compactor compacts at revision 1000:

etcdctl compact 1000

The original watch will continue to exist until cancelled by the client, but if the watcher retries it must fail. That is because we cannot create a watch against a revision that does not exist; that revision of the keyspace is no longer available.
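The revision/version accounting and compaction behavior described above can be sketched as a tiny in-memory model. This is purely illustrative Python, not etcd's actual implementation; the class and method names are made up for the sketch:

```python
# Toy model of etcd's MVCC revision/version accounting and compaction.
# Purely illustrative; not etcd's actual implementation.

class ToyMVCC:
    def __init__(self):
        self.revision = 0    # global revision counter
        self.compacted = 0   # watches below this revision must fail
        self.history = {}    # revision -> (key, value, key_version)
        self.versions = {}   # key -> latest version of that key

    def put(self, key, value):
        # Every write bumps the global revision and the key's version.
        self.revision += 1
        self.versions[key] = self.versions.get(key, 0) + 1
        self.history[self.revision] = (key, value, self.versions[key])
        return self.revision

    def compact(self, rev):
        # Drop all revisions strictly below `rev`; `rev` itself survives.
        for r in [r for r in self.history if r < rev]:
            del self.history[r]
        self.compacted = rev

    def watch(self, rev):
        # A watch against a compacted revision must fail, just like
        # "mvcc: required revision has been compacted".
        if rev < self.compacted:
            raise ValueError("mvcc: required revision has been compacted")
        return [self.history[r] for r in sorted(self.history) if r >= rev]
```

Replaying the example: four writes to /foo, a compaction, and then a watch against a pre-compaction revision raises the same error this bug is about, while a watch from the compaction revision onward succeeds.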
If, for example, the client were not very smart and continued to create watches against a compacted revision, this logging would be useful for debugging. It is also useful for pairing server and client timestamps of events.

> watch chan error: etcdserver: mvcc: required revision has been compacted

So this means that the explicit (required) revision against which the watch was created has been compacted. While this is perhaps "annoying" in the logs, I don't see a clear reason to remove it: the logging is correct, and it is technically an error made by the client, even if it is somewhat expected.

[1] https://github.com/etcd-io/etcd/blob/release-3.2/Documentation/learning/data_model.md
[2] https://github.com/etcd-io/etcd/blob/release-3.2/Documentation/learning/api.md#revisions
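What a well-behaved client does on this error can be sketched as follows: on a compaction error it restarts the watch from a revision that still exists, rather than retrying the stale one. This is illustrative Python, not the actual apiserver/reflector code, and the helper names are made up; real Kubernetes clients do the equivalent by re-listing to obtain a current revision and re-watching from there.

```python
# Illustrative sketch of client-side recovery from a compaction error.
# Not the actual reflector code; try_watch/current_rev are stand-ins.

COMPACTED = "mvcc: required revision has been compacted"

def watch_with_recovery(start_rev, try_watch, current_rev):
    """Attempt a watch at start_rev; on a compaction error, fall back
    to re-watching from the server's current revision.

    try_watch(rev)  -> list of events, or raises RuntimeError(COMPACTED)
    current_rev()   -> the server's current (post-compaction) revision
    """
    try:
        return try_watch(start_rev)
    except RuntimeError as err:
        if COMPACTED not in str(err):
            raise                  # a genuine failure; propagate it
        # Expected condition: our revision was compacted away.
        # Restart from a revision that still exists.
        return try_watch(current_rev())
```

In the example from the previous comment, a client watching from revision 999 after a compaction at 1000 would hit the error once and transparently resume from revision 1000.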
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days