Description of problem:
grep -f seems to use many gigs of memory for a large input file. The resources used seem far greater than the size of the inputs.

Steps to Reproduce:
$ koji list-tagged f19 | tail -n +3 | awk '{print $1}' > f19
$ koji list-tagged f19-updates | tail -n +3 | awk '{print $1}' > f19-updates
$ wc -l f19 f19-updates
13606 371784 f19.tagged
4176 110596 f19-updates.tagged
$ grep -f f19-updates f19

Actual results:
Uses many gigs of RAM: I killed the process as it reached 10GB...

Expected results:
Memory usage to be more constant in space and time.
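As a side note, a rough way to keep the reproducer from eating the whole machine while testing is to cap the memory in a subshell first (assuming a bash-like shell; the 2 GB limit below is an arbitrary value, not something from this report):

$ ( ulimit -v 2097152; grep -f f19-updates f19 )

With the cap in place, grep should abort with a memory-exhausted error once it hits the limit instead of growing until it has to be killed by hand.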
For your use case, you should instead use:

$ grep -Ff f19-updates f19

Compiling and running a ca. 112 kB regex which includes wildcards can be really expensive.
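For illustration, a rough way to compare the two modes side by side (file names taken from the reproducer above; /usr/bin/time -v is GNU time, packaged separately as 'time' on Fedora, and reports the maximum resident set size):

$ /usr/bin/time -v grep -Ff f19-updates f19 > /dev/null
$ /usr/bin/time -v grep -f f19-updates f19 > /dev/null

With -F the patterns are treated as fixed strings, so no regex needs to be compiled at all; without it, every line of the pattern file becomes part of one large regex.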
Ah yes, good point. Okay - dunno if there is anything more that can be done to improve the efficiency - I suppose I was wondering why grep keeps it all in memory, but that is surely something for upstream. Likely this can be closed.
(In reply to Jens Petersen from comment #2)
> Ah yes, good point. Okay - dunno if there is anything more that can be done
> to improve the efficiency - I suppose I was wondering why grep keeps it all
> in memory, but that is surely something for upstream. Likely this can be
> closed.

I guess it is not worth partially reading the data from disk and/or making multiple passes - there could be a big performance penalty.

Also, if you need the regex match, you can escape the dots in your pattern file to significantly reduce the space of the problem:

$ sed -i 's/\./\\\./g' f19-updates

Then it required less than 2 GB of resident memory on my box.

Closing as notabug according to the previous comments.
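As a rough illustration of what that escaping does (the package NVR below is just a made-up example, not taken from the actual files):

$ echo 'kernel-3.11.1-1.fc19' | sed 's/\./\\\./g'
kernel-3\.11\.1-1\.fc19

With the dots no longer treated as "match any character" wildcards, the compiled pattern has far less work to do.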