Bug 856009

Summary: audit_log table is not cleaned properly, causing rhevm database to grow excessively
Product: Red Hat Enterprise Virtualization Manager
Reporter: Marina Kalinin <mkalinin>
Component: ovirt-engine
Assignee: Eli Mesika <emesika>
Status: CLOSED CURRENTRELEASE
QA Contact: Pavel Stehlik <pstehlik>
Severity: high
Docs Contact:
Priority: urgent
Version: 3.0.7
CC: acathrow, chetan, cpelland, dyasny, emesika, hateya, iheim, jwest, lpeer, oramraz, pablo.iranzo, perobins, Rhev-m-bugs, sgordon, tvvcox, yeylon, ykaul, yzaslavs
Target Milestone: ---
Keywords: Regression, ZStream
Target Release: 3.0.8
Hardware: All
OS: Linux
Whiteboard: infra
Fixed In Version: si19
Doc Type: Bug Fix
Doc Text:
Previously, the DeleteAuditLogOlderThanDate command removed only audit logs whose "processed" flag was TRUE, causing the database to grow excessively. With this update, "processed=TRUE" is no longer a condition for removing logs. DeleteAuditLogOlderThanDate now removes logs older than the configured aging threshold, which is 30 days by default.
Story Points: ---
Clone Of:
Clones: 859398
Environment:
Last Closed: 2012-12-04 20:06:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 859398

Description Marina Kalinin 2012-09-10 23:32:39 UTC
The audit_log table should not hold messages older than 30 days, based on the default value of the following config field:
>> AuditLogAgingThreashold: "Audit Log Aging Threshold (in days)" (Value Type: Integer)

In fact, the table is not cleaned. Running the following SQL shows the number of records dated three months ago or earlier:

~~~
rhevm=# select count(*) from audit_log where log_time <='2012-6-10';
 count 
-------
  2708
(1 row)
~~~
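As a rough illustration of the retention rule described above, the following Java sketch computes the cutoff date implied by AuditLogAgingThreashold. The class and method names are hypothetical, and it assumes the cleanup simply subtracts the threshold (30 days by default) from the current date; the engine's actual timer logic may differ.

```java
import java.time.LocalDate;

public class AuditLogCutoff {
    // Hypothetical helper: the retention cutoff implied by AuditLogAgingThreashold.
    // Rows with log_time before this date are expected to be removed by the cleanup job.
    static LocalDate cutoff(LocalDate today, int agingThresholdDays) {
        return today.minusDays(agingThresholdDays);
    }

    public static void main(String[] args) {
        LocalDate reportDate = LocalDate.of(2012, 9, 10); // date this bug was filed
        int defaultThreshold = 30;                         // default AuditLogAgingThreashold
        LocalDate cutoff = cutoff(reportDate, defaultThreshold);
        System.out.println("cutoff = " + cutoff);          // cutoff = 2012-08-11

        // The date used in the query above is well before the cutoff,
        // so none of those 2708 rows should have survived a working cleanup.
        LocalDate queried = LocalDate.of(2012, 6, 10);
        System.out.println(queried.isBefore(cutoff));      // true
    }
}
```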

Comment 2 Eli Mesika 2012-09-12 16:12:27 UTC
(In reply to comment #0)
> Audit_log table should not hold messages older then 30 days.
> Based on default value of the following config field:
> >> AuditLogAgingThreashold: "Audit Log Aging Threshold (in days)" (Value Type: Integer)
> 
> In fact, the table is not cleaned.
> Running the following sql, we can see the amount of records dated 3 month ago:
> 
> ~~~
> rhevm=# select count(*) from audit_log where log_time <='2012-6-10';
>  count 
> -------
>   2708
> (1 row)
> ~~~

Please note that events logged in the event_notification_hist table will not be removed from the audit_log table, even after 30 days, because each event_notification_hist row holds only an audit_log_id that points to the relevant entry in audit_log.
Therefore, please run your query again with the following modification to check this:

~~~
select count(*) from audit_log where log_time <='2012-6-10' and audit_log_id not in (select audit_log_id from event_notification_hist);
~~~
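To make the suggested check concrete, here is a minimal in-memory Java sketch of the modified query: a row counts as removable only if it is old enough and no event_notification_hist entry points at it. AuditLogRow, countRemovable, and the sample data are hypothetical; the real check runs in SQL against the rhevm database.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;

public class RetainedRows {
    // Hypothetical in-memory stand-in for an audit_log row (only the columns the query uses).
    record AuditLogRow(long auditLogId, LocalDate logTime) {}

    // Mirrors: select count(*) from audit_log
    //          where log_time <= :cutoff
    //            and audit_log_id not in (select audit_log_id from event_notification_hist)
    static long countRemovable(List<AuditLogRow> auditLog, Set<Long> notifiedIds, LocalDate cutoff) {
        return auditLog.stream()
                .filter(r -> !r.logTime().isAfter(cutoff))          // log_time <= cutoff
                .filter(r -> !notifiedIds.contains(r.auditLogId())) // no event_notification_hist reference
                .count();
    }

    public static void main(String[] args) {
        List<AuditLogRow> auditLog = List.of(
                new AuditLogRow(1, LocalDate.of(2012, 5, 1)),
                new AuditLogRow(2, LocalDate.of(2012, 5, 2)),
                new AuditLogRow(3, LocalDate.of(2012, 9, 1)));
        Set<Long> notifiedIds = Set.of(2L); // row 2 has a notification history entry
        System.out.println(countRemovable(auditLog, notifiedIds, LocalDate.of(2012, 6, 10))); // 1
    }
}
```

If the modified query still returns a large count, the retained rows cannot be explained by the foreign-key reference alone.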

Comment 3 Marina Kalinin 2012-09-12 16:22:56 UTC
Eli, 
1. Could you please explain the business logic behind this?

2. Does it mean that we are using the implementation of removeAllBeforeDate from AuditLogDAOHibernateImpl.java:

~~~
public void removeAllBeforeDate(Date cutoff) {
    Query query = getSession().createQuery("delete from AuditLog where logTime < :cutoff " +
            "and processed = true " +
            "and id not in (select auditLogId from event_notification_hist)");
~~~

Comment 4 Eli Mesika 2012-09-12 20:24:10 UTC
(In reply to comment #3)
> Eli, 
> 1. Could you please explain the business logic behind this?
Yes
We have the rhevm-notification tool, which sends selected events by email to subscribed users.
The event_notification_hist table saves every notification that was sent and holds (in addition to its own data) only a pointer to the relevant entry in the audit_log table.
The audit_log_id column in event_notification_hist is a foreign key to audit_log, which means you cannot remove an entry from audit_log without first removing the corresponding entry from event_notification_hist.
Maybe we need a cleanup mechanism for event_notification_hist as well, but that would be an RFE.

> 
> 2. Does it mean that we are using the implementation of removeAllBeforeDate from AuditLogDAOHibernateImpl.java:
No, we are calling removeAllBeforeDate from onTimer in AuditLogCleanupManager.

> 
> public void removeAllBeforeDate(Date cutoff) {
> Query query = getSession().createQuery("delete from AuditLog where logTime < :cutoff " +
>  "and processed = true " +
>  "and id not in (select auditLogId from event_notification_hist)");

BTW, does the modified query I suggested return 0 records on your DB?

Comment 5 Eli Mesika 2012-09-12 21:37:02 UTC
After investigation, the problem was that the DeleteAuditLogOlderThenDate stored procedure removed only records with processed=true.
This flag is used by the rhev-notification tool to mark which events were handled (that is, sent to subscribers): its default value is false, and it is set to true only after the rhev-notification tool handles the event. As a result, any event the tool never handles keeps processed=false and is never removed.

Since DeleteAuditLogOlderThenDate removes old data (older than 30 days by default), there is no need to consider this flag, and it should be removed from the SP condition.
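A minimal Java sketch contrasting the cleanup condition before and after the fix, assuming the stored procedure's other conditions match the Hibernate query quoted in comment 3 (log time before the cutoff, and not referenced by event_notification_hist). The record, method names, and sample rows are hypothetical; it models only the predicate, not the actual DeleteAuditLogOlderThenDate code.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class CleanupPredicate {
    // Hypothetical stand-in for an audit_log row, including the processed flag.
    record AuditLogRow(long auditLogId, LocalDate logTime, boolean processed) {}

    // After the fix (assumed): age alone decides, apart from rows
    // still referenced by event_notification_hist.
    static Predicate<AuditLogRow> newCondition(LocalDate cutoff, Set<Long> notifiedIds) {
        return r -> r.logTime().isBefore(cutoff) && !notifiedIds.contains(r.auditLogId());
    }

    // Before the fix: the same condition, plus processed = true.
    static Predicate<AuditLogRow> oldCondition(LocalDate cutoff, Set<Long> notifiedIds) {
        return newCondition(cutoff, notifiedIds).and(AuditLogRow::processed);
    }

    static long count(List<AuditLogRow> rows, Predicate<AuditLogRow> condition) {
        return rows.stream().filter(condition).count();
    }

    public static void main(String[] args) {
        LocalDate cutoff = LocalDate.of(2012, 8, 11);
        Set<Long> notifiedIds = Set.of();
        List<AuditLogRow> rows = List.of(
                new AuditLogRow(1, LocalDate.of(2012, 6, 1), false), // never handled by the notifier
                new AuditLogRow(2, LocalDate.of(2012, 6, 2), true),
                new AuditLogRow(3, LocalDate.of(2012, 9, 1), false)); // newer than the cutoff
        System.out.println(count(rows, oldCondition(cutoff, notifiedIds))); // 1
        System.out.println(count(rows, newCondition(cutoff, notifiedIds))); // 2
    }
}
```

Row 1 shows the bug: under the old condition an old but unprocessed row is never eligible, so it stays in audit_log indefinitely; under the new condition it is removed once it passes the cutoff.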

Comment 6 Eli Mesika 2012-09-16 09:03:17 UTC
http://gerrit.ovirt.org/#/c/8016/

Comment 8 Eli Mesika 2012-09-19 11:34:49 UTC
Fixed in commit: 6478893

Comment 13 Pavel Stehlik 2012-11-16 11:22:50 UTC
ok - si24.2