Bug 1454743

Summary: foreman-proxy memory leak when processing OpenScap report
Product: Red Hat Satellite
Reporter: Pavel Moravec <pmoravec>
Component: SCAP Plugin
Assignee: Ondřej Pražák <oprazak>
Status: CLOSED ERRATA
QA Contact: Sanket Jagtap <sjagtap>
Severity: high
Priority: medium
Version: 6.2.10
CC: adprice, bbuckingham, bkearney, dcaplan, dmitri, egolov, ehelms, jcallaha, lzap, mhulan, mverma, omankame, oprazak, pmoravec, rchauhan, sjagtap, sjr, szadok, xdmoon
Target Milestone: Unspecified
Keywords: PrioBumpPM, Triaged
Target Release: Unused
Hardware: x86_64
OS: Linux
Fixed In Version: smart_proxy_openscap-0.5.3.9
Cloned to: 1530700
Last Closed: 2018-02-05 13:54:34 UTC
Type: Bug
Attachments:
- objects in memory
- diagnostic & cleanup-attempt patch for foreman-proxy
- task-exports from failed hotfix system

Description Pavel Moravec 2017-05-23 12:13:54 UTC
Description of problem:
Even with the hotfix from bz1432263, it is assumed that when a SCAP report contains a nonzero "othered" counter, processing such a report triggers a memory leak in the smart-proxy process (foreman-proxy).

No particular reproducer is known at the moment, but the above statement follows from:
- comparing "ps aux" snapshots with the foreman-ssl_access_ssl.log activity at the times when smart-proxy RSS usage grew
- ruling out proxy-unrelated events
- ruling out events that occurred many times when no memory growth was noticed
- arf_report remaining as the only suspect
- identifying which SCAP report parameter is present in the "suspicious" reports but not in the other reports (where no memory increase was seen)

I can provide the "ps aux" snapshots, together with the whole /var/log content, if required.


Version-Release number of selected component (if applicable):
rubygem-smart_proxy_openscap-0.5.3.6-2.RHBZ1432263.el7sat.noarch.rpm


How reproducible:
???


Steps to Reproduce:
1. I guess it should be enough to generate a report with a nonzero "othered" counter.
2. Meanwhile, monitor the RSS usage of the smart-proxy process (see the sketch below).
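
A minimal monitoring sketch for step 2 (my own illustration, not part of any shipped tooling): it polls the VmRSS of a given PID from /proc once a minute, assuming a Linux host and a known smart-proxy PID.

# monitor_rss.rb - hypothetical helper script
pid = ARGV.fetch(0) { abort "usage: ruby monitor_rss.rb <pid>" }
loop do
  # VmRSS line from /proc/<pid>/status, e.g. "VmRSS:  85652 kB"
  vmrss = File.read("/proc/#{pid}/status").lines.grep(/^VmRSS:/).first
  puts "#{Time.now} #{vmrss&.strip}"
  sleep 60
end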


Actual results:
RSS grows with each and every such report that is processed.


Expected results:
No RSS growth.


Additional info:
RSS growth is very evident - 400 MB to 600 MB per single report (sic!).

Comment 1 Pavel Moravec 2017-05-24 08:38:55 UTC
I can't reproduce it by simply sending a report with an "othered" counter > 0.

Checking what has been fixed in bz1432263, and being a Ruby newbie: shouldn't the same cleanup also be applied to

https://github.com/theforeman/smart_proxy_openscap/blob/master/lib/smart_proxy_openscap/storage_fs.rb#L31

?

If one of the "Proxy::OpenSCAP::StorageFS.new(..)" calls in openscap_api.rb raises an exception, shouldn't we clean up OpenSCAP?
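
A minimal sketch of the cleanup pattern being asked about (my own illustration, not the actual openscap_api.rb code; parse_and_store is a hypothetical helper): wrapping the per-request work in an ensure block so OpenSCAP.oscap_cleanup runs even when StorageFS.new raises.

require 'openscap'

def process_arf_report(raw_report)
  ::OpenSCAP.oscap_init
  parse_and_store(raw_report)   # hypothetical; builds Proxy::OpenSCAP::StorageFS and may raise
ensure
  ::OpenSCAP.oscap_cleanup      # release libopenscap/libxml2 global state even on error
end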

Comment 3 Bryan Kearney 2017-05-30 17:52:18 UTC
Pavel, can you or can you not reproduce it (reading comment 1)?

Comment 6 Pavel Moravec 2017-05-30 20:17:15 UTC
See the HUGE length of a single ARF report (each "Parameters: .." line has 270k - 290k characters (!!)) - that brings up two ideas:

- can OpenSCAP properly handle such huge reports? (well, most often yes, and the "problematic" ones are not the biggest)
- is it necessary for the client to send the descriptions of each and every SCAP log? The server already has those details. This could be hugely improved/simplified, with a positive impact on memory.

Comment 7 Dmitri Dolguikh 2017-05-31 11:00:25 UTC
I think Pavel is correct that calls to OpenSCAP.oscap_init should be followed by calls to OpenSCAP.oscap_cleanup, which deallocates memory and cleans up global state.

Another thing that looks like a problem to me is that OpenSCAP.oscap_init makes a call to xmlInitParser(), which, according to the libxml2 docs, is not reentrant. Smart-proxy is multi-threaded, and concurrent calls handled by the OpenSCAP module can lead to memory leaks or undefined behavior. To correct this, OpenSCAP module initialization should be changed so that oscap_init is called once during module initialization and oscap_cleanup is called during shutdown.
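
A minimal sketch of the init-once scheme described above (module and method names are illustrative, not the actual plugin code; only OpenSCAP.oscap_init/oscap_cleanup come from this comment):

require 'openscap'

module ScapPlugin                    # illustrative name, not the real Proxy::OpenSCAP module
  INIT_LOCK = Mutex.new
  @initialized = false

  # Call oscap_init (and therefore xmlInitParser) exactly once, from a single thread,
  # and register the matching cleanup to run at process shutdown.
  def self.ensure_openscap_initialized
    INIT_LOCK.synchronize do
      next if @initialized
      ::OpenSCAP.oscap_init
      at_exit { ::OpenSCAP.oscap_cleanup }
      @initialized = true
    end
  end
end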

Comment 8 Pavel Moravec 2017-06-01 20:40:03 UTC
Maybe a stupid question, but in

https://github.com/theforeman/smart_proxy_openscap/pull/47/files#diff-fe7a79abfb039a8b67550ad94edcfd93

shouldn't the new cleanup method also call

@items.destroy if @items

analogously to the other variables declared/used there?

Comment 9 Ondřej Pražák 2017-06-02 08:36:09 UTC
Created attachment 1284363 [details]
objects in memory

We definitely need to clean up the @items somehow, as they remain in memory (Rule [1] and Group [2] inherit from Item). The problem is that when I try @items.map(&:destroy), I run into a segfault. I reached out to the guy who works on OpenSCAP and wrote the bindings; I'm sure he'll be able to tell whether there is a problem in the underlying layers or whether I'm just using it wrong.

[1] https://github.com/OpenSCAP/ruby-openscap/blob/master/lib/openscap/xccdf/rule.rb 

[2] https://github.com/OpenSCAP/ruby-openscap/blob/master/lib/openscap/xccdf/group.rb

Comment 10 Pavel Moravec 2017-06-04 15:15:58 UTC
Created attachment 1284748 [details]
diagnostic & cleanup-attempt patch for foreman-proxy

FYI, this diagnostic and attempted-memory-cleanup patch was provided to the customer.

I tried to clean up @items and also @results - only those items that are of a given data type (cleaning up all of them really segfaults, even though there is no type other than the ones I clean up - interesting..).
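
A minimal sketch of that selective cleanup (my own illustration, not the attached patch; it assumes the items are ruby-openscap XCCDF objects and only destroys the Rule and Group instances, since destroying everything reportedly segfaults):

require 'openscap'

def cleanup_items(items)
  items.each do |item|
    # destroy only the types known to be safe to free here
    item.destroy if item.is_a?(::OpenSCAP::Xccdf::Rule) || item.is_a?(::OpenSCAP::Xccdf::Group)
  end
end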

Comment 12 Pavel Moravec 2017-06-18 14:14:18 UTC
Ondra, could you please have a look at the reproducer in #c11?

Comment 13 Ondřej Pražák 2017-06-22 06:42:58 UTC
This is very useful; the memory usage on the proxy jumped from 0.3% to 12.8% on the first report.

Comment 14 Ondřej Pražák 2017-06-29 11:57:36 UTC
Created redmine issue http://projects.theforeman.org/issues/20164 from this bug

Comment 15 Satellite Program 2017-06-29 14:11:47 UTC
Upstream bug assigned to oprazak

Comment 17 Satellite Program 2017-07-20 12:11:27 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20164 has been resolved.

Comment 25 Lukas Zapletal 2017-09-05 09:24:16 UTC
*** Bug 1487257 has been marked as a duplicate of this bug. ***

Comment 26 Lukas Zapletal 2017-09-05 11:43:19 UTC
SATELLITE 6.2.11 HOTFIX INSTRUCTIONS:

On all Satellite Servers and Capsules where rubygem-smart_proxy_openscap is installed, perform an upgrade of the package:

yum upgrade http://people.redhat.com/~lzapleta/hotfix/openscap-memory-1454743/el7/rubygem-smart_proxy_openscap-0.5.3.9-1.HOTFIXRHBZ1454743.el7sat.noarch.rpm

If yum requests an upgrade of the -doc subpackage, or if you are using RHEL 6, the packages can be found at: http://people.redhat.com/~lzapleta/hotfix/openscap-memory-1454743/

Restart the foreman-proxy process. Now, for every OpenSCAP report request, a new process is spawned to do the XML parsing, and the foreman-proxy Ruby process memory consumption is back to normal.

This hotfix should work with any Satellite 6.2 version that has the rubygem-smart_proxy_openscap-0.5.3.8 package, as it upgrades this package to version 0.5.3.9.
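
A hedged sketch of the per-report process isolation described above (names are illustrative, not the actual 0.5.3.9 code): the parent forks a child, the child does the ruby-openscap/libxml2 parsing, and whatever memory the child allocated is returned to the OS when it exits.

def parse_in_child(arf_path)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    result = parse_arf(arf_path)        # hypothetical helper doing the actual parsing
    writer.write(Marshal.dump(result))  # hand the parsed summary back to the parent
    writer.close
  end
  writer.close
  data = reader.read                    # read until the child closes its end of the pipe
  Process.wait(pid)                     # reap the child; its memory goes with it
  Marshal.load(data)
end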

Comment 27 maneesh verma 2017-10-04 10:26:25 UTC
Created attachment 1334173 [details]
task-exports from failed hotfix system

Comment 29 Sanket Jagtap 2017-10-27 07:42:36 UTC
Satellite 6.3.0 snap 21

USER        PID %CPU %MEM    VSZ   RSS TTY	STAT START   TIME COMMAND
foreman+  91962  0.0  0.2 895584 85652 ?        Sl   Oct22   0:18 ruby /usr/share/foreman-proxy/bin/smart-proxy


USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
foreman+  91962  0.0  0.2 895584 85652 ?        Sl   Oct22   0:18 ruby /usr/share/foreman-proxy/bin/smart-proxy
[root@dell-per630-fc-01 ~]# ps -p 91962 -u
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
foreman+  91962  0.0  0.2 895584 85652 ?        Sl   Oct22   0:18 ruby /usr/share/foreman-proxy/bin/smart-proxy
[root@dell-per630-fc-01 ~]# ps -p 91962 -u
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
foreman+  91962  0.0  0.2 895584 85652 ?        Sl   Oct22   0:18 ruby /usr/share/foreman-proxy/bin/smart-proxy
[root@dell-per630-fc-01 ~]# ps -p 91962 -u
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
foreman+  91962  0.0  0.2 895584 85652 ?        Sl   Oct22   0:18 ruby /usr/share/foreman-proxy/bin/smart-proxy
[root@dell-per630-fc-01 ~]# ps -p 91962 -u
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
foreman+  91962  0.0  0.2 895584 85652 ?        Sl   Oct22   0:18 ruby /usr/share/foreman-proxy/bin/smart-proxy



The SCAP report with an "othered" counter > 0 is sent to Satellite every minute.

There is no exponential growth in the RSS usage.

Marking it verified.

Comment 31 Lukas Zapletal 2018-01-11 12:41:13 UTC
I can confirm that the hotfix from comment 26 works for Satellite 6.2.11 up to and including 6.2.13. Just make sure the package named:

rubygem-smart_proxy_openscap-0.5.3.9-1.HOTFIXRHBZ1454743.el7sat.noarch.rpm

is not changed; yum reinstall it if a different version is present.

Comment 36 Sanket Jagtap 2018-01-15 13:55:09 UTC
Build: Satellite 6.2.14 snap1

Running Satellite for 3 days with two clients sending reports every minute,
with approximately 11k+ reports in total:
[root@ibm-x3650m4-02-vm-01 ~]#  ps -q 31504 -ux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
foreman+ 31504  0.1  0.1 679764 64692 ?        Sl   05:16   0:17 ruby /usr/share/foreman-proxy/bin/smart-proxy

I monitored the usage for ~1 hr, and there is no major increase in RSS memory consumption.
The two clients are sending a report to Satellite every minute.

Comment 39 errata-xmlrpc 2018-02-05 13:54:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0273