| Summary: | grep -f uses lot of memory | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jens Petersen <petersen> |
| Component: | grep | Assignee: | Jaroslav Škarvada <jskarvad> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rawhide | CC: | jskarvad, lkundrak |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-10-09 09:38:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
**Comment 1:**

For your use case, you should rather use:

```
$ grep -Ff f19-updates f19
```

Compiling and running an approx. 112 kB regex that includes wildcards can be really expensive.

**Comment 2 (Jens Petersen):**

Ah yes, good point. Okay, I don't know if there is anything more that can be done to improve the efficiency. I suppose I was wondering why grep keeps it all in memory, but that is surely something for upstream. This can likely be closed.

**Comment 3:**

(In reply to Jens Petersen from comment #2)
> Ah yes, good point. Okay, I don't know if there is anything more that can be
> done to improve the efficiency. I suppose I was wondering why grep keeps it
> all in memory, but that is surely something for upstream. This can likely be
> closed.

I guess it is not worth it to partially read the data from disk and/or make multiple passes; there could be a big performance penalty. Also, if you do need regex matching, you can escape the dots in your pattern file to significantly reduce the size of the problem:

```
$ sed -i 's/\./\\\./g' f19-updates
```

After that it required less than 2 GB of resident memory on my box.

**Comment 4:**

Closing as NOTABUG per the previous comments.
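The effect of escaping dots (or switching to `-F`) can be seen on a toy example. The file names below are hypothetical, chosen only to illustrate the behavior: with plain `-f`, each `.` in a pattern is a single-character wildcard, so grep has to build a full regex matcher, while `-F` or escaped dots force literal matching.

```shell
# Hypothetical pattern and input files, just for illustration.
printf 'foo-1.0-1.fc19\n' > patterns
printf 'foo-1.0-1.fc19\nfoo-1X0-1Yfc19\n' > haystack

# Plain -f: '.' is a wildcard, so BOTH lines match.
grep -f patterns haystack

# -F (fixed strings): only the literal line matches.
grep -Ff patterns haystack

# Escaping the dots gives the same literal behaviour while keeping -f.
sed 's/\./\\./g' patterns > patterns.escaped
grep -f patterns.escaped haystack
```

The second and third invocations each print only `foo-1.0-1.fc19`, whereas the first also prints the wildcard match `foo-1X0-1Yfc19`.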
**Description of problem:**

`grep -f` seems to use many gigabytes of memory for a large input file. The resources used seem far greater than the size of the inputs.

**Steps to Reproduce:**

```
$ koji list-tagged f19 | tail -n +3 | awk '{print $1}' > f19
$ koji list-tagged f19-updates | tail -n +3 | awk '{print $1}' > f19-updates
$ wc -l f19 f19-updates
13606 371784 f19.tagged
4176 110596 f19-updates.tagged
$ grep -f f19-updates f19
```

**Actual results:**

Uses many gigabytes of RAM; I killed the process as it reached 10 GB.

**Expected results:**

Memory usage that stays roughly constant in space and time.
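If the goal is simply "which tagged builds appear in both lists" and exact line matches suffice (an assumption; `grep -f` also finds substring matches), a sorted set intersection with `comm` runs in roughly constant memory regardless of file size. A minimal sketch:

```shell
# Hypothetical stand-ins for the koji output files.
printf 'pkg-b\npkg-a\npkg-c\n' > f19
printf 'pkg-c\npkg-d\npkg-a\n' > f19-updates

# comm requires sorted input; sort itself spills to temp files
# rather than holding everything in RAM.
sort f19 -o f19.sorted
sort f19-updates -o f19-updates.sorted

# -12 suppresses lines unique to either file, printing only the
# exact lines common to both.
comm -12 f19.sorted f19-updates.sorted
```

For the toy data above this prints `pkg-a` and `pkg-c`. Unlike `grep -f`, this never compiles the pattern file into a matcher at all, which is why its memory use does not grow with the number of patterns.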