Bug 1537270 - logspam: etcdserver: mvcc: required revision has been compacted
Summary: logspam: etcdserver: mvcc: required revision has been compacted
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Sam Batschelet
QA Contact: Ke Wang
URL:
Whiteboard: LifecycleReset
Depends On:
Blocks:
 
Reported: 2018-01-22 19:38 UTC by Vikas Laad
Modified: 2024-02-04 04:25 UTC
CC List: 25 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-29 15:46:57 UTC
Target Upstream Version:
Embargoed:




Links
Github kubernetes/kubernetes pull 68557 (closed): apiserver: start only one compactor per unique storagebackend transport config (last updated 2021-02-18 12:50:11 UTC)
Red Hat Bugzilla 1819103 (high, CLOSED): openshift-apiserver pod has lots of spam logs "E0331...etcdserver: mvcc: required revision has been compacted" (last updated 2021-02-22 00:41:40 UTC)
Red Hat Knowledge Base (Solution) 3390441 (last updated 2019-01-01 09:15:37 UTC)

Description Vikas Laad 2018-01-22 19:38:13 UTC
Description of problem:
During a long run of reliability tests I see frequent error messages in the master logs. The reliability tests basically use all of the quickstart apps and deploy/scale/build/delete them over the course of a week.

Jan 22 16:51:38 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:51:38.385024   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:51:46 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:51:46.597975   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:51:54 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:51:54.914722   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:05 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:05.532797   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:15 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:15.753759   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:38 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:38.296244   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:42 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:42.803999   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:45 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:45.608847   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:51 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:51.919464   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
Jan 22 16:52:54 ip-172-31-1-182.us-west-2.compute.internal atomic-openshift-master-api[11590]: E0122 16:52:54.423635   11590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted

Version-Release number of selected component (if applicable):
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

How reproducible:


Steps to Reproduce:
1. Start creating apps on an OCP cluster.
2. Periodically build, delete, and re-create the quickstart apps.


Actual results:
Lots of error messages in the master logs.

Expected results:
These messages should be logged at info level.

Additional info:

Comment 2 Stefan Schimanski 2018-09-12 10:00:10 UTC
To be fixed upstream by https://github.com/kubernetes/kubernetes/pull/68557. After review we can possibly backport.
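The gist of that PR is to start at most one compactor per unique storage-backend transport configuration, rather than one for every storage backend that points at the same etcd. A minimal Go sketch of that idea follows; the function name and map key are illustrative assumptions, not the actual upstream code.

package main

import "sync"

var (
	compactorsMu sync.Mutex
	// compactors tracks which transport configurations already have a compactor
	// goroutine running, keyed by an illustrative string form of the endpoints.
	compactors = map[string]bool{}
)

// startCompactorOnce starts the given compaction loop exactly once per unique
// transport-config key; later callers with the same key are no-ops.
func startCompactorOnce(key string, run func()) {
	compactorsMu.Lock()
	defer compactorsMu.Unlock()
	if compactors[key] {
		return
	}
	compactors[key] = true
	go run()
}

func main() {
	// Example usage: two callers with the same endpoints share one compactor.
	startCompactorOnce("https://etcd-1:2379,https://etcd-2:2379", func() { /* compaction loop */ })
	startCompactorOnce("https://etcd-1:2379,https://etcd-2:2379", func() { /* not started again */ })
}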

Comment 16 Xingxing Xia 2019-06-27 09:43:14 UTC
Per comment 6 and comment 14, checked in 4.2.0-0.nightly-2019-06-26-043058 env:
$ oc version
Client Version: version.Info{Major:"4", Minor:"2+", GitVersion:"v4.2.0", GitCommit:"9c6dd05e1", GitTreeState:"clean", BuildDate:"2019-06-27T06:30:43Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+b375ce1", GitCommit:"b375ce1", GitTreeState:"clean", BuildDate:"2019-06-26T03:15:38Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}

$ oc rsh -n openshift-etcd etcd-member-ip-10-0-128-51.us-east-2.compute.internal
sh-4.2# export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=`ls /etc/ssl/etcd/*etcd-peer*crt` ETCDCTL_KEY=`ls /etc/ssl/etcd/*etcd-peer*key`      
sh-4.2# etcdctl version                                     
etcdctl version: 3.3.10
API version: 3.3

Though the noise is less frequent than in comment 0, the messages are still logged as errors in the kube-apiserver pod logs (they are not seen in the etcd pod logs):
$ for i in kube-apiserver-ip-10-0-128-51.us-east-2.compute.internal kube-apiserver-ip-10-0-150-69.us-east-2.compute.internal kube-apiserver-ip-10-0-168-163.us-east-2.compute.internal ; do oc logs $i -c kube-apiserver-10 -n openshift-kube-apiserver > $i.log; done

$ grep etcdserver kube-apiserver-ip-10-0-1*.log
kube-apiserver-ip-10-0-150-69.us-east-2.compute.internal.log:E0627 07:14:47.506978       1 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
...
kube-apiserver-ip-10-0-168-163.us-east-2.compute.internal.log:E0627 09:21:47.934025       1 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
...

$ grep etcdserver kube-apiserver-ip-10-0-1*.log | grep "E0627.*watch chan error: etcdserver: mvcc: required revision has been compacted" | wc -l
38

Comment 17 Stefan Schimanski 2019-06-28 09:21:26 UTC
@Sam

Comment 18 Stefan Schimanski 2019-06-28 09:24:34 UTC
@Sam can we filter for that error in pkg/storage/etcd3/watcher.go:208? Right now everything coming back from the etcd client is printed as an error in the log, although we know that compaction messages are completely normal and should be info at V(4) or something like that.
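A rough sketch of what that filtering could look like, assuming the error coming off the watch channel compares equal to rpctypes.ErrCompacted (import paths vary by etcd and Kubernetes release; this is an illustration, not the actual patch):

package watchersketch

import (
	"errors"

	"go.etcd.io/etcd/etcdserver/api/v3rpc/rpctypes"
	"k8s.io/klog"
)

// logWatchChanError keeps unexpected failures at error level but demotes the
// expected post-compaction error to a V(4) info message.
func logWatchChanError(err error) {
	if errors.Is(err, rpctypes.ErrCompacted) {
		klog.V(4).Infof("watch chan error (expected after compaction): %v", err)
		return
	}
	klog.Errorf("watch chan error: %v", err)
}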

Comment 19 Stefan Schimanski 2019-06-28 09:26:01 UTC
Forgot to add: my fix https://github.com/kubernetes/kubernetes/pull/68557 does not address this message; it only reduced the number of compactors. So we mixed up the client-side message caused by compaction with the compactor code itself.

Comment 24 Stefan Schimanski 2019-08-01 10:05:48 UTC
Moving out of the blocker list until we have a reason to think otherwise.

Comment 41 Michal Fojtik 2020-09-01 21:59:31 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 46 Sam Batschelet 2020-09-29 15:46:57 UTC
I am sorry that this bug has continued on for so long. I would like to clarify what this log means. This is not a bug; it is an accounting of an expected condition between the etcd client and server.

When a watch is created, that watch can be against a specific revision of the keyspace [1],[2]. Every mutative operation against the keyspace increments the revision, and each time a value is written to a key it also increments the key's version. Essentially the revision counter is an accounting of the keyspace at that time. So if we write to key foo with value bar at revision 1000 with key version 3, then write to key foo with value baz at revision 1001 with key version 4, the value bar for key foo is still available at revision 1000, version 3.

So if my watch was created against revision 999, I can efficiently range across revisions with the watch, versus a range against the keyspace, which is more expensive. The apiserver and k8s use watch extensively for this reason. The problem is that if the same key has many versions, the db can grow in size, causing performance issues. To minimize this growth we compact the keyspace at a set revision, such that any versions of the keys that existed before that revision no longer exist. So, continuing the example above:


# keyspace before compaction
revision 998 /foo value: bat version: 1
revision 999 /foo value: buz version: 2
revision 1000 /foo value: bar version: 3
revision 1001 /foo value: baz version: 4

# keyspace after compaction at revision 1000
revision 1000 /foo value: bar version: 3
revision 1001 /foo value: baz version: 4

If a watch is created against a revision that is subsequently compacted, e.g.

etcdctl watch / --rev 999

Next the compactor compacts at revision 1000

etcdctl compact 1000

The original watch will continue to exist until it is cancelled by the client, but if the watcher retries, it must fail. That is because we cannot create a watch against a revision that does not exist; that revision of the keyspace is no longer available. If, for example, the client were not very smart and continued to create watches against the compacted revision, this logging would be useful for debugging. It is also useful for pairing server and client timestamps of events.

> watch chan error: etcdserver: mvcc: required revision has been compacted

So this means the explicit (required) revision that the watch was created against has been compacted.
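From the client side, here is a minimal Go sketch (etcd clientv3; the endpoint, key, and revisions are made up to match the example above, and import paths vary by etcd release) of a watch started below the compacted revision receiving exactly this error:

package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
	"go.etcd.io/etcd/etcdserver/api/v3rpc/rpctypes"
)

func main() {
	// Illustrative endpoint; point this at a real etcd member.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Watch starting at revision 999, which we assume has already been
	// compacted away (e.g. by "etcdctl compact 1000" above).
	wch := cli.Watch(context.Background(), "/foo", clientv3.WithRev(999))
	for resp := range wch {
		if err := resp.Err(); err == rpctypes.ErrCompacted {
			// This is the condition the apiserver logs as
			// "watch chan error: etcdserver: mvcc: required revision has been compacted".
			fmt.Printf("watch failed: %v (compacted at revision %d)\n", err, resp.CompactRevision)
			return
		}
	}
}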

While this is perhaps "annoying" in the logs, I don't see a clear reason to remove it: the logging is correct, and it is technically an error made by the client, even if it is somewhat expected.

[1] https://github.com/etcd-io/etcd/blob/release-3.2/Documentation/learning/data_model.md
[2] https://github.com/etcd-io/etcd/blob/release-3.2/Documentation/learning/api.md#revisions

Comment 48 Red Hat Bugzilla 2024-02-04 04:25:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

