Description Shishir Prakash
2014-07-17 17:50:10 UTC
Description of problem:
We have a Nagios monitoring setup that logs in to RHEV and collects the health status. Each login creates an entry in the audit_log table in the database.
The size of this table has now reached 400MB, and logging in to the rhev-manager GUI became very slow, almost impossible.
The log shows "java.lang.OutOfMemoryError: GC overhead limit exceeded", which gives no clue that the audit_log table may be causing the problem.
"2014-07-17 05:17:44,200 ERROR [org.ovirt.engine.core.bll.adbroker.GSSAPIDirContextAuthenticationStrategy] (ajp-/127.0.0.1:8702-20) Kerberos error: java.lang.OutOfMemoryError: GC overhead limit exceeded"
There are a few things I would like to improve in audit_log.
1) When login information is written to the audit log, we should also log the source IP, which can give us some clue about where the logins come from.
2) If the audit_log table is huge, a specific alert should be generated to let the admin know that it is time for a cleanup, or we could automate the cleanup based on a maximum size limit for the audit_log table.
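To make the second suggestion concrete, here is a minimal sketch of the size check, not how the engine actually does or should implement it. It assumes the table size is read from PostgreSQL with pg_total_relation_size (a standard PostgreSQL function). The 400MB limit is only taken from the size observed in this report, and the 80% warning level, the function name and the status strings are all hypothetical.

```python
# Hypothetical limit: the report saw problems at roughly 400MB.
DEFAULT_LIMIT_BYTES = 400 * 1024 * 1024

# Query an admin script could run to obtain the size in bytes.
# pg_total_relation_size is standard PostgreSQL; audit_log is the table
# named in this report.
SIZE_QUERY = "SELECT pg_total_relation_size('audit_log')"


def audit_log_status(size_bytes, limit_bytes=DEFAULT_LIMIT_BYTES, warn_percent=80):
    """Classify the audit_log size as 'ok', 'warn' or 'cleanup'.

    'warn'    -> size is at or above warn_percent of the limit (alert the admin)
    'cleanup' -> size is at or above the limit (automated cleanup could start)
    """
    if size_bytes < 0 or limit_bytes <= 0:
        raise ValueError("size must be >= 0 and limit must be > 0")
    if size_bytes >= limit_bytes:
        return "cleanup"
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if size_bytes * 100 >= limit_bytes * warn_percent:
        return "warn"
    return "ok"
```

A monitoring job could run SIZE_QUERY, pass the result to audit_log_status, and raise an alert on "warn" or trigger a cleanup on "cleanup".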
Version-Release number of selected component (if applicable):
3.4
How reproducible:
Perform at least 1000000 logins to reproduce the issue. Once the audit_log table grows too large, the rhev-manager UI becomes extremely slow.
shishir-
Added to the previous test (comment 6) a script that periodically makes API calls (log in, do some GETs on DC/Cluster/Host, and log out).
Did not succeed in reproducing the problem with 1000000 records in the audit_log table and the script running.
I am closing this as CURRENTRELEASE (approved by Oved)