Description of problem: After Upgrade from 4.6.44 to 4.8.28 Openshift OVN-Kubernetes ovnkube-master pods getting restarted multiple times with the error " ovsdb-server/memory-trim-on-compaction on'' failed: exit status 1 and " ovndbchecker.go:118] unable to turn on memory trimming for SB DB, stderr " , cluster unavailable Particularly ovn sbdb and also we have some errors on etcd which makes the ingress unstable Version-Release number of selected component (if applicable): Openshift 4.8.28 AZURE IPI How reproducible: Not Sure what has happened so not sure if it can be reproduced Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: After migrated to 4.8.28 this version customer started to observe this cluster instability issues like etcd leader election , ingress degraded and console down , authentication and OVN-kubernetes etc..
Moving back the status to ON_QA, unless there's something that Mohamed's PR is not capturing. I suppose the status change happened by accident.
There are no logs for lifecycle postStart hooks on success, so I can't see the postStart script run. I don't see "|ovsdb_server|INFO|memory trimming after compaction enabled." in the [ns]bdb container logs, so I can't tell if compaction is enabled. I haven't seen any compaction Database compaction took %"PRIu64"ms" messages yet. There is no retry loop in the [[ -f $OVNNBDB_CTL ]], so this is probably failing silently.
It would be nice to have a command to get the value of "ovsdb-server/memory-trim-on-compaction". There shouldn't be a "set" without a corresponding "get" to verify.
(In reply to Ross Brattain from comment #7) > It would be nice to have a command to get the value of > "ovsdb-server/memory-trim-on-compaction". There shouldn't be a "set" > without a corresponding "get" to verify. if turning on memory trimming failed u will see errors in the events since this is done in PostStart hook the assumption here is ovnnbdb_ctl should have already been there hence no retry, did u check the nbdb and sbdb logs on the elected leader ?
I checked all the logs. I can't get a timestamp of when the postStart hook occurred so I can't compare when it occurred with the timestamps in the log. Maybe there is a logging race? Hence the comment about a "get" command, logs are not reliable for checking current state.
testing notes: editing CNO bindata in running container. oc debug node/$(oc -n openshift-network-operator get pod -l name=network-operator -o jsonpath='{.items[0].spec.nodeName}') chroot /host # edit bindata inside running CNO container vi /proc/$(pgrep -f cluster-network-operator)/root/bindata/network/ovn-kubernetes/self-hosted/ovnkube-master.yaml exit # delete Daemonset to have CNO re-render oc delete -n openshift-ovn-kubernetes daemonset.apps/ovnkube-master # delete CNO to revert oc delete -n openshift-network-operator pod -l name=network-operator
Another issue, log spam is going to break things. We need a set-once type of thing. 2022-04-04T17:01:44.917Z|00115|timeval|WARN|disk: 0 reads, 72 writes 2022-04-04T17:01:44.917Z|00116|timeval|WARN|context switches: 17 voluntary, 9 involuntary 2022-04-04T17:01:44.917Z|00117|coverage|INFO|Skipping details of duplicate event coverage for hash=ddd04a25 2022-04-04T17:01:46.378Z|00118|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:01:56.344Z|00119|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:02:06.345Z|00120|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:02:16.349Z|00121|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:02:26.331Z|00010|timeval(log_fsync3)|WARN|Unreasonably long 1139ms poll interval (0ms user, 0ms system) 2022-04-04T17:02:26.331Z|00011|timeval(log_fsync3)|WARN|context switches: 5 voluntary, 0 involuntary 2022-04-04T17:02:26.331Z|00012|coverage(log_fsync3)|INFO|Skipping details of duplicate event coverage for hash=ddd04a25 2022-04-04T17:02:26.368Z|00122|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:02:36.345Z|00123|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:02:46.370Z|00124|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:02:56.339Z|00125|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:03:02.730Z|00013|timeval(log_fsync3)|WARN|Unreasonably long 3904ms poll interval (0ms user, 0ms system) 2022-04-04T17:03:02.730Z|00014|timeval(log_fsync3)|WARN|context switches: 6 voluntary, 0 involuntary 2022-04-04T17:03:02.730Z|00015|coverage(log_fsync3)|INFO|Skipping details of duplicate event coverage for hash=ddd04a25 2022-04-04T17:03:06.346Z|00126|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:03:16.345Z|00127|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:03:26.346Z|00128|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:03:36.338Z|00129|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:03:46.359Z|00130|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:03:56.350Z|00131|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:04:06.339Z|00132|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:04:16.343Z|00133|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:04:26.346Z|00134|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:04:36.343Z|00135|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:04:46.344Z|00136|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:04:56.342Z|00137|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:05:06.341Z|00138|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:05:16.350Z|00139|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:05:26.341Z|00140|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:05:36.350Z|00141|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:05:46.346Z|00142|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:05:56.344Z|00143|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:06:06.345Z|00144|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:06:16.348Z|00145|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:06:26.338Z|00146|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:06:36.340Z|00147|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:06:46.340Z|00148|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:06:56.343Z|00149|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:07:06.393Z|00150|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:07:16.358Z|00151|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:07:26.343Z|00152|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:07:36.344Z|00153|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:07:46.343Z|00154|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:07:56.343Z|00155|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:08:06.358Z|00156|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:08:16.352Z|00157|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:08:26.368Z|00158|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:08:36.399Z|00159|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:08:46.349Z|00160|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:08:56.352Z|00161|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:09:06.363Z|00162|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:09:16.351Z|00163|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:09:26.344Z|00164|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:09:36.346Z|00165|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:09:46.351Z|00166|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:09:56.344Z|00167|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:10:06.341Z|00168|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:10:16.340Z|00169|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:10:26.354Z|00170|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:10:36.344Z|00171|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:10:46.344Z|00172|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:10:56.357Z|00173|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:11:06.349Z|00174|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:11:16.349Z|00175|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:11:26.353Z|00176|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:11:36.353Z|00177|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:11:46.348Z|00178|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:11:56.353Z|00179|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:12:06.355Z|00180|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:12:16.351Z|00181|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:12:26.349Z|00182|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:12:36.343Z|00183|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:12:46.357Z|00184|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:12:56.342Z|00185|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:13:06.341Z|00186|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:13:16.348Z|00187|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:13:26.341Z|00188|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:13:36.352Z|00189|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:13:46.346Z|00190|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:13:56.354Z|00191|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:14:06.350Z|00192|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:14:16.365Z|00193|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:14:26.347Z|00194|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:14:36.350Z|00195|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:14:46.425Z|00196|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:14:56.342Z|00197|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:15:06.343Z|00198|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:15:16.344Z|00199|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:15:26.341Z|00200|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:15:36.350Z|00201|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:15:46.348Z|00202|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:15:56.345Z|00203|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:16:06.350Z|00204|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:16:16.353Z|00205|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:16:26.353Z|00206|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:16:36.351Z|00207|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:16:46.351Z|00208|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:16:56.354Z|00209|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:17:06.343Z|00210|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:17:16.355Z|00211|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:17:26.343Z|00212|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:17:36.348Z|00213|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:17:46.344Z|00214|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:17:56.403Z|00215|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:18:06.343Z|00216|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:18:16.346Z|00217|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:18:26.346Z|00218|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:18:36.359Z|00219|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:18:46.347Z|00220|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:18:56.349Z|00221|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:19:06.350Z|00222|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:19:16.366Z|00223|ovsdb_server|INFO|memory trimming after compaction enabled. 2022-04-04T17:19:26.361Z|00224|ovsdb_server|INFO|memory trimming after compaction enabled.
That was on 4.11.0-0.ci-2022-04-04-131844
Okay, so I don't know if the log spam will have any functional impact. ~90 bytes every 10 seconds is ~32k per hour, ~759k per day, so it's not like it will impact container log storage. At least it will annoying clear that memory trimming is in fact enabled. I assume the memory trim itself is saving use lots of MB. Moving to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069