Description of problem:

Hi, I am trying to verify the log retention policy within a specific time frame (e.g. 60 min) for all 3 log sources (app/infra/audit) as per the doc below, and observed that the application logs only retain approximately 1 hr (sometimes 45 min) of data, whereas the audit and infra logs were retaining 2-3 hr and 18-19 hr of data respectively, even though a 60 min retention policy is configured in the logging instance YAML. Is this intended for audit and infra logs, or is it different from what the document suggests?

https://docs.openshift.com/container-platform/4.6/logging/config/cluster-logging-log-store.html

I determined the last retained data timestamp by sorting by timestamp in ascending order, and the current timestamp by sorting in descending order, in Kibana. Screenshots are attached for all log sources and retention times.

Version-Release number of selected component (if applicable):
OCP 4.6

How reproducible:

Steps to Reproduce:
1. Set up a cluster logging instance.
2. Edit the logging instance to add a retention policy (oc edit clusterlogging instance; see the retentionPolicy sketch under Additional info).
3. Add timestamp as a selected field for each index (app/infra/audit) and sort to find the last timestamp of retained data in Kibana.

Actual results:
Application logs retain approximately 1 hr (sometimes 45 min) of data; audit logs retain 2-3 hr and infra logs retain 18-19 hr, despite the 60 min retention policy for all three.

Expected results:
Data should be retained according to the time frame specified in the logging instance.

Additional info:
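For reference, a minimal sketch of the retentionPolicy block set via "oc edit clusterlogging instance" for the 60 min scenario, following the field names in the 4.6 log store doc linked above; the surrounding elasticsearch settings are illustrative only, and the attached retention policy yaml config is authoritative:

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      logStore:
        type: elasticsearch
        retentionPolicy:            # 60 min for all three log types, as tested above
          application:
            maxAge: 1h
          infra:
            maxAge: 1h
          audit:
            maxAge: 1h
        elasticsearch:              # sizing below is illustrative only
          nodeCount: 3
          redundancyPolicy: SingleRedundancy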
Created attachment 1729008 [details] app log last timestamp
Created attachment 1729009 [details] app log current timestamp
Created attachment 1729010 [details] audit log last timestamp
Created attachment 1729011 [details] audit log current timestamp
Created attachment 1729012 [details] infra log last timestamp
Created attachment 1729013 [details] infra log current timestamp
Created attachment 1729014 [details] retention policy yaml config
Hi, please find the updated observation on log retention within the time frames below. I set the retention times to app - 2 days, infra - 3 hours, audit - 3 hours via "oc edit clusterlogging instance" and observed:

log-type    | configured-time | actual-retention-time (approx.)
============|=================|================================
application | 2 days          | 2 days        // OK
infra       | 3 hours         | 3 hours       // OK
audit       | 3 hours         | 4-5 hours     // NOT OK - expected 3 hrs, but it is deleting logs older than the last 4-5 hours

Screenshots are attached with more details of the configured YAML and the actual result in the Kibana UI.

Thanks,
Sanjay
Created attachment 1732363 [details] app log last time-2d unit
Created attachment 1732364 [details] app log current time-2d unit
Created attachment 1732365 [details] infra log last time-3h unit
Created attachment 1732369 [details] infra log current time-3h unit
Created attachment 1732370 [details] audit log last time-3h unit
Created attachment 1732371 [details] audit log current time-3h unit
Created attachment 1732373 [details] updated-1-logging-yaml
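To summarize the updated-1-logging-yaml attachment, the retentionPolicy values under test in the comment above are sketched below; other fields are omitted and the attachment remains authoritative:

    spec:
      logStore:
        retentionPolicy:
          application:
            maxAge: 2d
          infra:
            maxAge: 3h
          audit:
            maxAge: 3h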
We expect this behavior to also occur on x86. Could someone please verify this on x86? Thank you.
@sabeher2.com

Could you please elaborate on how you managed to get the audit logs into our managed Elasticsearch instance? By default we do not support storing audit logs in the default managed Elasticsearch store.

In addition, can you share the fluentd config map:

oc -n openshift-logging get configmap fluentd -o yaml > fluentd.yaml
(In reply to Periklis Tsirakidis from comment #17)
> @sabeher2.com
>
> Could you please elaborate on how you managed to get the audit logs into
> our managed Elasticsearch instance? By default we do not support storing
> audit logs in the default managed Elasticsearch store.
>
> In addition, can you share the fluentd config map:
>
> oc -n openshift-logging get configmap fluentd -o yaml > fluentd.yaml

@periklis

Hi,

Since audit logs are not stored in the internal Elasticsearch instance by default, we used the Log Forwarding API (kind: ClusterLogForwarder) to forward the audit logs, along with the app and infra logs, to the internal Elasticsearch instance (e.g. outputRefs: default). I have attached a copy of the configured YAML file for reference.

RH referenced doc (Forward audit logs to the log store):
https://docs.openshift.com/container-platform/4.6/logging/config/cluster-logging-log-store.html

NOTE regarding the fluentd config map output: due to a technical issue we are currently unable to access the existing cluster; the team is working on allocating new VMs to set up a new cluster. Once the new cluster is ready, I will share the fluentd config map output.

Thanks.
Created attachment 1740260 [details] audit log forwarding yaml
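For context, a minimal sketch of the kind of ClusterLogForwarder used to send audit (plus app/infra) logs to the internal store, per the doc linked above; the pipeline name is illustrative, and the attached audit log forwarding yaml is authoritative:

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogForwarder
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      pipelines:
      - name: all-to-default          # pipeline name is illustrative
        inputRefs:
        - application
        - infrastructure
        - audit
        outputRefs:
        - default                     # "default" = the internal managed Elasticsearch log store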
The retention policy is a mechanism that exposes part of the ES rollover API to assist in maintaining indices and stability for the cluster. The retention policy is an input to the rollover conditions and does not directly determine which documents will be retained. New indices are created when any of the rollover conditions [1] are satisfied, which means there is not necessarily a uniform balance; some indices may be larger or smaller simply because there are more documents or the documents are larger. Removal of indices is based on the creation date of the index as compared to the retention policy, not the age of any of its documents. This means there are possibly many documents removed that do not explicitly meet the retention policy. The source of the logs is immaterial to how they are curated; the jobs are all the same except for the indices upon which they act.

> audit - 3 hours - 4-5 hours // NOT OK - expected 3 hrs, but it is deleting logs older than the last 4-5 hours.

This observation is not clear to me. It reads as if it is deleting indices older than 3 hours, which is what it is configured to do. IMO, there is not a bug here to be fixed.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-rollover-index.html
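For illustration, a hypothetical request against the ES 6.8 rollover API referenced in [1]; the write alias and condition values here are assumptions and do not necessarily match what the index management jobs actually send:

    POST /app-write/_rollover
    {
      "conditions": {
        "max_age":  "1h",
        "max_docs": 1000000,
        "max_size": "5gb"
      }
    }

A new index is created as soon as any one of the conditions is satisfied, and deletion is then driven by the creation date of each index, which is why the last retained document timestamps do not line up exactly with the configured retention window.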
As per [1], there is not a bug to fix here.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1897482#c20