Bug 1013894

Summary: improve logconv.pl performance with large access logs
Product: Red Hat Enterprise Linux 7 Reporter: Nathan Kinder <nkinder>
Component: 389-ds-baseAssignee: Rich Megginson <rmeggins>
Status: CLOSED CURRENTRELEASE QA Contact: Sankar Ramalingam <sramling>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0CC: jgalipea, nkinder, rmeggins, srkrishn
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.1.6-5.el7 Doc Type: Bug Fix
Doc Text:
Cause: logconv.pl used flat files for temporary storage when analyzing large logs. The files would have to be scanned many times per value. Consequence: Poor logconv.pl performance with very large logs. Fix: Instead of flat files, use Berkeley DB files, via the perl DB_File module and tied hashes and arrays. Result: Much better performance with logconv.pl and large access logs.
Story Points: ---
Clone Of: 1013161 Environment:
Last Closed: 2014-06-13 09:42:58 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1013161    
Bug Blocks:    

Description Nathan Kinder 2013-10-01 02:14:45 UTC
+++ This bug was initially created as a clone of Bug #1013161 +++

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47387

Analysis of large access logs needs to be much faster.  Some areas for improvement:
* use db files for temp files
Specifically, use tied hashes, where the hashes are tied to database files, using the perl DB_File interface.
 * for simple arrays, use DB_RECNO
 * for hashes where order is not important, use DB_HASH
 * for hashes where order is important, use DB_BTREE
for example:
{{{
my %h1;
tie %h1, "DB_File", "$dbdir/h1.db", O_CREAT|O_RDWR, 0666, $DB_BTREE;
$h1{'e'} = 5;
$h1{'d'} = 4;
$h1{'c'} = 3;
$h1{'b'} = 2;
$h1{'a'} = 1;
while (my($k,$v) = each %h1) {
    print "$k = $v\n";
}
}}}
this prints
{{{
a = 1
b = 2
c = 3
d = 4
e = 5
}}}

* not sure what else - perhaps optimize regular expressions?

Comment 2 srkrishn@redhat.com 2014-01-31 12:05:09 UTC
hi,
i verified the performance with respect to the updated logconv.pl script.i am unable to mark it as verified,hence Sankar will handle it.

Comment 3 Sankar Ramalingam 2014-02-01 02:07:48 UTC
[root@sriram logfile]# logconv.pl -t /var/log/dirsrv/slapd-sriram1/access

Access Log Analyzer 8.0

Command: logconv.pl /var/log/dirsrv/slapd-sriram1/access

Processing 1 Access Log(s)...

[00] /var/log/dirsrv/slapd-sriram1/access	size (bytes):      3069528
     25000 Lines Processed          2075390 of      3069528 bytes (67.613%)


Total Log Lines Analysed:  36936


----------- Access Log Output ------------

Start of Logs:    30/Jan/2014:08:28:57
End of Logs:      30/Jan/2014:08:34:23

Processed Log Time:  0 Hours, 5 Minutes, 26 Seconds


Restarts:                     1
Total Connections:            5277
 - StartTLS Connections:      0
 - LDAPS Connections:         0
 - LDAPI Connections:         0
Peak Concurrent Connections:  2
Total Operations:             10554
Total Results:                10553
Overall Performance:          100.0%

Searches:                     4             (0.01/sec)  (0.74/min)
Modifications:                0             (0.00/sec)  (0.00/min)
Adds:                         5273          (16.17/sec)  (970.49/min)
Deletes:                      0             (0.00/sec)  (0.00/min)
Mod RDNs:                     0             (0.00/sec)  (0.00/min)
Compares:                     0             (0.00/sec)  (0.00/min)
Binds:                        5277          (16.19/sec)  (971.23/min)



----- Top 20 Most Frequent etimes -----

10421           etime=0     
130             etime=1     
2               etime=3     


----- Top 20 Longest etimes -----

etime=3         2         
etime=1         130       
etime=0         10421     

Cleaning up temp files . . .
Done
[root@sriram logfile]# logconv.pl -t /var/log/dirsrv/slapd-sriram1/access | free -m 
             total       used       free     shared    buffers     cached
Mem:          2907       2362        545          0        169       1141
-/+ buffers/cache:       1050       1856
Swap:         9999          0       9999
[root@sriram logfile]#


Build tested - 389-ds-base-1.3.1.6-15

Marking it as verified, as per Sriram's comment.

Comment 4 Sankar Ramalingam 2014-02-13 11:50:35 UTC
Test cases automated in ./logfile/logconv.sh

Comment 5 Ludek Smid 2014-06-13 09:42:58 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.