Bug 1013894 - improve logconv.pl performance with large access logs
improve logconv.pl performance with large access logs
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base (Show other bugs)
7.0
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Rich Megginson
Sankar Ramalingam
:
Depends On: 1013161
Blocks:
  Show dependency treegraph
 
Reported: 2013-09-30 22:14 EDT by Nathan Kinder
Modified: 2014-06-17 23:00 EDT (History)
4 users (show)

See Also:
Fixed In Version: 389-ds-base-1.3.1.6-5.el7
Doc Type: Bug Fix
Doc Text:
Cause: logconv.pl used flat files for temporary storage when analyzing large logs. The files would have to be scanned many times per value. Consequence: Poor logconv.pl performance with very large logs. Fix: Instead of flat files, use Berkeley DB files, via the perl DB_File module and tied hashes and arrays. Result: Much better performance with logconv.pl and large access logs.
Story Points: ---
Clone Of: 1013161
Environment:
Last Closed: 2014-06-13 05:42:58 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Nathan Kinder 2013-09-30 22:14:45 EDT
+++ This bug was initially created as a clone of Bug #1013161 +++

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47387

Analysis of large access logs needs to be much faster.  Some areas for improvement:
* use db files for temp files
Specifically, use tied hashes, where the hashes are tied to database files, using the perl DB_File interface.
 * for simple arrays, use DB_RECNO
 * for hashes where order is not important, use DB_HASH
 * for hashes where order is important, use DB_BTREE
for example:
{{{
my %h1;
tie %h1, "DB_File", "$dbdir/h1.db", O_CREAT|O_RDWR, 0666, $DB_BTREE;
$h1{'e'} = 5;
$h1{'d'} = 4;
$h1{'c'} = 3;
$h1{'b'} = 2;
$h1{'a'} = 1;
while (my($k,$v) = each %h1) {
    print "$k = $v\n";
}
}}}
this prints
{{{
a = 1
b = 2
c = 3
d = 4
e = 5
}}}

* not sure what else - perhaps optimize regular expressions?
Comment 2 srkrishn@redhat.com 2014-01-31 07:05:09 EST
hi,
i verified the performance with respect to the updated logconv.pl script.i am unable to mark it as verified,hence Sankar will handle it.
Comment 3 Sankar Ramalingam 2014-01-31 21:07:48 EST
[root@sriram logfile]# logconv.pl -t /var/log/dirsrv/slapd-sriram1/access

Access Log Analyzer 8.0

Command: logconv.pl /var/log/dirsrv/slapd-sriram1/access

Processing 1 Access Log(s)...

[00] /var/log/dirsrv/slapd-sriram1/access	size (bytes):      3069528
     25000 Lines Processed          2075390 of      3069528 bytes (67.613%)


Total Log Lines Analysed:  36936


----------- Access Log Output ------------

Start of Logs:    30/Jan/2014:08:28:57
End of Logs:      30/Jan/2014:08:34:23

Processed Log Time:  0 Hours, 5 Minutes, 26 Seconds


Restarts:                     1
Total Connections:            5277
 - StartTLS Connections:      0
 - LDAPS Connections:         0
 - LDAPI Connections:         0
Peak Concurrent Connections:  2
Total Operations:             10554
Total Results:                10553
Overall Performance:          100.0%

Searches:                     4             (0.01/sec)  (0.74/min)
Modifications:                0             (0.00/sec)  (0.00/min)
Adds:                         5273          (16.17/sec)  (970.49/min)
Deletes:                      0             (0.00/sec)  (0.00/min)
Mod RDNs:                     0             (0.00/sec)  (0.00/min)
Compares:                     0             (0.00/sec)  (0.00/min)
Binds:                        5277          (16.19/sec)  (971.23/min)



----- Top 20 Most Frequent etimes -----

10421           etime=0     
130             etime=1     
2               etime=3     


----- Top 20 Longest etimes -----

etime=3         2         
etime=1         130       
etime=0         10421     

Cleaning up temp files . . .
Done
[root@sriram logfile]# logconv.pl -t /var/log/dirsrv/slapd-sriram1/access | free -m 
             total       used       free     shared    buffers     cached
Mem:          2907       2362        545          0        169       1141
-/+ buffers/cache:       1050       1856
Swap:         9999          0       9999
[root@sriram logfile]#


Build tested - 389-ds-base-1.3.1.6-15

Marking it as verified, as per Sriram's comment.
Comment 4 Sankar Ramalingam 2014-02-13 06:50:35 EST
Test cases automated in ./logfile/logconv.sh
Comment 5 Ludek Smid 2014-06-13 05:42:58 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Note You need to log in before you can comment on or make changes to this bug.