Bug 1013161

Summary: improve logconv.pl performance with large access logs
Product: Red Hat Enterprise Linux 6 Reporter: Rich Megginson <rmeggins>
Component: 389-ds-base Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Sankar Ramalingam <sramling>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4 CC: jgalipea, nhosoi, nkinder, srkrishn
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.2.11.15-34.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1013894 Environment:
Last Closed: 2014-10-14 07:51:22 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1013894, 1061410    

Description Rich Megginson 2013-09-27 23:34:51 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47387

Analysis of large access logs needs to be much faster.  Some areas for improvement:
* use db files for temp files.
Specifically, use tied hashes and arrays that are tied to database files through the Perl DB_File interface, so that the temporary data is stored in files on disk rather than held entirely in process memory:
 * for simple arrays, use DB_RECNO
 * for hashes where order is not important, use DB_HASH
 * for hashes where order is important, use DB_BTREE
For example:
{{{
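use strict;
use warnings;
use DB_File;
use Fcntl;    # O_CREAT, O_RDWR

my $dbdir = "/tmp";    # assumed scratch directory for the db files (illustrative)
my %h1;
tie %h1, "DB_File", "$dbdir/h1.db", O_CREAT|O_RDWR, 0666, $DB_BTREE
    or die "Cannot tie $dbdir/h1.db: $!";
$h1{'e'} = 5;
$h1{'d'} = 4;
$h1{'c'} = 3;
$h1{'b'} = 2;
$h1{'a'} = 1;
while (my ($k, $v) = each %h1) {
    print "$k = $v\n";
}
untie %h1;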
}}}
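This prints the keys in sorted order, because DB_BTREE keeps them sorted regardless of the order in which they were inserted: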
{{{
a = 1
b = 2
c = 3
d = 4
e = 5
}}}
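The description gives an example only for DB_BTREE. A minimal sketch of the other two cases, a DB_RECNO-backed array and a DB_HASH-backed hash, might look like the following. The scratch directory from File::Temp, the file names conns.db and ip.db, and the sample connection and IP values are illustrative assumptions, not taken from logconv.pl.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;                      # O_CREAT, O_RDWR
use File::Temp qw(tempdir);

# Hypothetical scratch directory; logconv.pl would choose its own.
my $dbdir = tempdir(CLEANUP => 1);

# DB_RECNO: a simple array backed by a record-number database file.
my @conns;
tie @conns, "DB_File", "$dbdir/conns.db", O_CREAT|O_RDWR, 0666, $DB_RECNO
    or die "Cannot tie $dbdir/conns.db: $!";
push @conns, "conn=1", "conn=2", "conn=3";   # illustrative sample records
my $nconns = scalar @conns;                  # number of records stored
my $second = $conns[1];                      # records are indexed like a normal array
untie @conns;

# DB_HASH: a hash where iteration order does not matter,
# e.g. counting occurrences of each (hypothetical) client IP.
my %ipcount;
tie %ipcount, "DB_File", "$dbdir/ip.db", O_CREAT|O_RDWR, 0666, $DB_HASH
    or die "Cannot tie $dbdir/ip.db: $!";
$ipcount{$_}++ for qw(10.0.0.1 10.0.0.2 10.0.0.1);
my $ip1  = $ipcount{'10.0.0.1'};             # count for one key
my $nips = scalar keys %ipcount;             # number of distinct keys
untie %ipcount;

print "records=$nconns second=$second\n";
print "ips=$nips 10.0.0.1=$ip1\n";
```

Running it prints `records=3 second=conn=2` and `ips=2 10.0.0.1=2`. DB_HASH suits counters like this because nothing depends on the iteration order; DB_BTREE would additionally keep the keys sorted.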

* Not sure what else; perhaps optimize the regular expressions?

For CBP

Comment 3 srkrishn@redhat.com 2014-08-20 12:57:41 UTC
This bug has been verified as shown below:

Total Log Lines Analysed:  8103117


----------- Access Log Output ------------

Start of Logs:    31/Jan/2012:00:00:16
End of Logs:      01/Feb/2012:00:00:45

real	5m52.450s
user	5m50.612s
sys	0m0.837s


[root@hp-dl360g4-01 ~]# logconv.pl  access.20120131-000045 | free -m
             total       used       free     shared    buffers     cached
Mem:          1876       1797         78          0        108       1373
-/+ buffers/cache:        315       1560
Swap:         4031          0       4031
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 3239 root      20   0  142m  16m 2720 R 99.9  0.4   0:04.34 perl           
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 3239 root      20   0  142m  16m 2720 R 99.9  0.9   0:05.34 perl          

This was tested on build 1.2.11.15.40.

Comment 4 Sankar Ramalingam 2014-08-21 09:07:23 UTC
Based on previous comment from Sriram, marking the bug as Verified.

Comment 5 errata-xmlrpc 2014-10-14 07:51:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1385.html