Bug 1013894 - improve logconv.pl performance with large access logs
Summary: improve logconv.pl performance with large access logs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Rich Megginson
QA Contact: Sankar Ramalingam
URL:
Whiteboard:
Depends On: 1013161
Blocks:
 
Reported: 2013-10-01 02:14 UTC by Nathan Kinder
Modified: 2020-09-13 20:35 UTC
CC List: 4 users

Fixed In Version: 389-ds-base-1.3.1.6-5.el7
Doc Type: Bug Fix
Doc Text:
Cause: logconv.pl used flat files for temporary storage when analyzing large logs; the files had to be scanned many times per value.
Consequence: Poor logconv.pl performance with very large logs.
Fix: Instead of flat files, use Berkeley DB files via the Perl DB_File module and tied hashes and arrays.
Result: Much better performance with logconv.pl on large access logs.
Clone Of: 1013161
Environment:
Last Closed: 2014-06-13 09:42:58 UTC
Target Upstream Version:




Links:
Github 389ds 389-ds-base issues 724 (Last Updated: 2020-09-13 20:35:50 UTC)

Description Nathan Kinder 2013-10-01 02:14:45 UTC
+++ This bug was initially created as a clone of Bug #1013161 +++

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47387

Analysis of large access logs needs to be much faster.  Some areas for improvement:
* use DB files for temp files
Specifically, use tied hashes, where the hashes are tied to database files via the Perl DB_File interface.
 * for simple arrays, use DB_RECNO
 * for hashes where order is not important, use DB_HASH
 * for hashes where order is important, use DB_BTREE
for example:
{{{
use strict;
use warnings;
use Fcntl;
use DB_File;

my $dbdir = "/tmp";   # any writable directory
my %h1;
tie %h1, "DB_File", "$dbdir/h1.db", O_CREAT|O_RDWR, 0666, $DB_BTREE
    or die "cannot tie $dbdir/h1.db: $!";
$h1{'e'} = 5;
$h1{'d'} = 4;
$h1{'c'} = 3;
$h1{'b'} = 2;
$h1{'a'} = 1;
while (my($k,$v) = each %h1) {
    print "$k = $v\n";
}
}}}
this prints
{{{
a = 1
b = 2
c = 3
d = 4
e = 5
}}}
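For completeness, a similar sketch (not from the ticket) of a simple array tied to a DB_RECNO file, as suggested for the first bullet above; the path /tmp/a1.db and the data are illustrative:

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

# Tie a plain Perl array to an on-disk record-number database, so that
# pushes and lookups go to the DB file instead of in-process memory.
my @a1;
tie @a1, "DB_File", "/tmp/a1.db", O_CREAT|O_RDWR, 0666, $DB_RECNO
    or die "cannot tie /tmp/a1.db: $!";
push @a1, "first", "second", "third";   # appended records land in the db file
print "count=", scalar(@a1), "\n";      # prints count=3
print "$a1[1]\n";                       # prints second
untie @a1;
unlink "/tmp/a1.db";
```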

* not sure what else - perhaps optimize regular expressions?
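The ticket does not elaborate on the regex point; one standard Perl technique is to precompile hot-loop patterns once with qr// rather than re-interpolating and recompiling them on every log line. This is a generic sketch with a made-up pattern and data, not logconv.pl's actual code:

```perl
use strict;
use warnings;

# Compile the pattern once, outside the per-line loop.
my $op = qr/\bconn=(\d+)\s+op=(\d+)\b/;

my $count = 0;
for my $line ("conn=1 op=0 BIND", "conn=1 op=1 SRCH", "no match here") {
    $count++ if $line =~ $op;
}
print "matched $count lines\n";   # prints matched 2 lines
```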

Comment 2 srkrishn@redhat.com 2014-01-31 12:05:09 UTC
Hi,
I verified the performance with the updated logconv.pl script. I am unable to mark it as verified, hence Sankar will handle it.

Comment 3 Sankar Ramalingam 2014-02-01 02:07:48 UTC
[root@sriram logfile]# logconv.pl -t /var/log/dirsrv/slapd-sriram1/access

Access Log Analyzer 8.0

Command: logconv.pl /var/log/dirsrv/slapd-sriram1/access

Processing 1 Access Log(s)...

[00] /var/log/dirsrv/slapd-sriram1/access	size (bytes):      3069528
     25000 Lines Processed          2075390 of      3069528 bytes (67.613%)


Total Log Lines Analysed:  36936


----------- Access Log Output ------------

Start of Logs:    30/Jan/2014:08:28:57
End of Logs:      30/Jan/2014:08:34:23

Processed Log Time:  0 Hours, 5 Minutes, 26 Seconds


Restarts:                     1
Total Connections:            5277
 - StartTLS Connections:      0
 - LDAPS Connections:         0
 - LDAPI Connections:         0
Peak Concurrent Connections:  2
Total Operations:             10554
Total Results:                10553
Overall Performance:          100.0%

Searches:                     4             (0.01/sec)  (0.74/min)
Modifications:                0             (0.00/sec)  (0.00/min)
Adds:                         5273          (16.17/sec)  (970.49/min)
Deletes:                      0             (0.00/sec)  (0.00/min)
Mod RDNs:                     0             (0.00/sec)  (0.00/min)
Compares:                     0             (0.00/sec)  (0.00/min)
Binds:                        5277          (16.19/sec)  (971.23/min)



----- Top 20 Most Frequent etimes -----

10421           etime=0     
130             etime=1     
2               etime=3     


----- Top 20 Longest etimes -----

etime=3         2         
etime=1         130       
etime=0         10421     

Cleaning up temp files . . .
Done
[root@sriram logfile]# logconv.pl -t /var/log/dirsrv/slapd-sriram1/access | free -m 
             total       used       free     shared    buffers     cached
Mem:          2907       2362        545          0        169       1141
-/+ buffers/cache:       1050       1856
Swap:         9999          0       9999
[root@sriram logfile]#


Build tested - 389-ds-base-1.3.1.6-15

Marking it as verified, as per Sriram's comment.

Comment 4 Sankar Ramalingam 2014-02-13 11:50:35 UTC
Test cases automated in ./logfile/logconv.sh

Comment 5 Ludek Smid 2014-06-13 09:42:58 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

