Bug 1344142

Summary: dynflow_executor FD leak and segfault after stopping qpidd
Product: Red Hat Satellite
Reporter: Pavel Moravec <pmoravec>
Component: Qpid
Assignee: Mike Cressman <mcressma>
Status: CLOSED ERRATA
QA Contact: Jan Hutař <jhutar>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.1.8
CC: aruzicka, egolov, ehelms, mcressma, mlinden, zhunting, zkraus
Target Milestone: Released
Keywords: Triaged
Target Release: Unused
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qpid-cpp-1.36.0-24.el7amq, qpid-dispatch-1.5.0-4.el7, qpid-proton-0.26.0-3.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-05-14 19:58:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
strace when connecting to stopped service without SSL (flags: none)
strace when connecting to stopped service with SSL (flags: none)
strace when connecting to running service with SSL (flags: none)

Description Pavel Moravec 2016-06-08 21:04:30 UTC
Description of problem:
With the qpidd service stopped (dynflow_executor connects to it on port 5671 because of the ListenOnCandlepinEvents task), dynflow_executor leaks file descriptors and, after approximately 90 minutes, segfaults (quite probably because the ulimit prevents allocating a new FD).

The FD leak is caused by TCP sockets opened towards port 5671. I do not know whether the bug is in dynflow_executor, in Ruby, or (quite probably) in the Ruby qpid library (with some advice I can help with a reproducer here).

The segfault backtrace suggests the process was trying to open an SSL connection to qpid (see the backtrace below). A full abrt report will be provided.

Fortunately, something (dynflow_executor_monitor?) automatically restarts the segfaulted process, until it segfaults again after the next 90 minutes.


Version-Release number of selected component (if applicable):
Sat6.1.9:
ruby193-rubygem-dynflow-0.7.7.9-1.el7sat.noarch
ruby193-rubygem-qpid_messaging-0.30.0-1.el7sat.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Have Satellite fully running.

2. Get the lsof output of the dynflow_executor process:
pid=$(ps aux | grep 'dynflow_executor$' | awk '{ print $2 }')
lsof -p "$pid" | grep TCP | nl

3. Stop qpidd service:
service qpidd stop

4. Get the lsof output again after a while.

5. Wait 90 minutes for the coredump (optional; the problem is already evident from the FD leak).
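
The lsof checks in steps 2 and 4 can also be approximated by sampling /proc directly. The following is only a sketch under stated assumptions: it is Linux-only, the helper names `socket_fd_count` and `sample_socket_fds` are hypothetical (not part of Satellite, dynflow or qpid), and it counts every socket descriptor (TCP, UNIX and others), so the absolute number will be higher than the `grep TCP` count; only the growth between samples is of interest. Reading another user's process requires sufficient privileges (for example root).

```ruby
# Hypothetical helper: count the socket descriptors held by a process by
# resolving the symlinks in /proc/<pid>/fd (Linux only).
def socket_fd_count(pid)
  dir = "/proc/#{pid}/fd"
  (Dir.entries(dir) - %w[. ..]).count do |name|
    begin
      # Socket descriptors resolve to a target of the form "socket:[inode]".
      File.readlink(File.join(dir, name)).start_with?('socket:')
    rescue SystemCallError
      false # descriptor was closed between listing and readlink
    end
  end
end

# Take `samples` readings, `interval` seconds apart, and return them as an array.
def sample_socket_fds(pid, samples, interval)
  Array.new(samples) do |i|
    sleep(interval) if i > 0
    socket_fd_count(pid)
  end
end

# Example (pid obtained as in step 2), one reading per minute for ten minutes:
#   sample_socket_fds(pid, 10, 60).each { |n| puts n }
```

With qpidd stopped, a steadily increasing series would correspond to the leak described in this report; a flat series would not.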


Actual results:
2. Shows 2 TCP file descriptors (one for the TCP connection, and one "protocol: TCP" entry that I suspect is already the leak).

4. Shows an increasing number of "protocol: TCP" FDs.


Expected results:
4. Should show at most one TCP file descriptor.


Additional info:
After restarting qpidd later, the already leaked FDs do not disappear; the leak just stops growing.

segfault backtrace:
#0  0x00007f0e76d145f7 in raise () from /lib64/libc.so.6
#1  0x00007f0e76d15ce8 in abort () from /lib64/libc.so.6
#2  0x00007f0e77c5bed5 in rb_bug () from /opt/rh/ruby193/root/usr/lib64/libruby.so.1.9
#3  0x00007f0e77cfbf26 in sigsegv () from /opt/rh/ruby193/root/usr/lib64/libruby.so.1.9
#4  <signal handler called>
#5  0x00007f0e670617d0 in PR_GetIdentitiesLayer () from /lib64/libnspr4.so
#6  0x00007f0e67061801 in PR_PushIOLayer () from /lib64/libnspr4.so
#7  0x00007f0e67e3c847 in ssl_PushIOLayer.constprop.5 () from /lib64/libssl3.so
#8  0x00007f0e67e3d164 in ssl_ImportFD () from /lib64/libssl3.so
#9  0x00007f0e6928ab2b in qpid::sys::ssl::SslSocket::SslSocket(std::string const&, bool) () from /lib64/libqpidcommon.so.10
#10 0x00007f0e68ea03c6 in qpid::client::SslConnector::SslConnector(boost::shared_ptr<qpid::sys::Poller>, qpid::framing::ProtocolVersion, qpid::client::ConnectionSettings const&, qpid::client::ConnectionImpl*) () from /lib64/libqpidclient.so.10
#11 0x00007f0e68ea0b4f in qpid::client::(anonymous namespace)::create(boost::shared_ptr<qpid::sys::Poller>, qpid::framing::ProtocolVersion, qpid::client::ConnectionSettings const&, qpid::client::ConnectionImpl*) () from /lib64/libqpidclient.so.10
#12 0x00007f0e68ebd1ea in qpid::client::Connector::create(std::string const&, boost::shared_ptr<qpid::sys::Poller>, qpid::framing::ProtocolVersion, qpid::client::ConnectionSettings const&, qpid::client::ConnectionImpl*) () from /lib64/libqpidclient.so.10
#13 0x00007f0e68eb788d in qpid::client::ConnectionImpl::open() () from /lib64/libqpidclient.so.10
#14 0x00007f0e68eae47b in qpid::client::Connection::open(qpid::client::ConnectionSettings const&) () from /lib64/libqpidclient.so.10
#15 0x00007f0e68eaeb19 in qpid::client::Connection::open(qpid::Url const&, qpid::client::ConnectionSettings const&) ()
   from /lib64/libqpidclient.so.10
#16 0x00007f0e6961ff9d in qpid::client::amqp0_10::ConnectionImpl::tryConnect() () from /lib64/libqpidmessaging.so.3
#17 0x00007f0e69621564 in qpid::client::amqp0_10::ConnectionImpl::connect(qpid::sys::AbsTime const&) ()
   from /lib64/libqpidmessaging.so.3
#18 0x00007f0e69622a53 in qpid::client::amqp0_10::ConnectionImpl::open() () from /lib64/libqpidmessaging.so.3
#19 0x00007f0e69af9b8a in _wrap_Connection_open(int, unsigned long*, unsigned long) ()
   from /opt/rh/ruby193/root/usr/share/gems/gems/qpid_messaging-0.30.0/lib/cqpid.so
#20 0x00007f0e77d52033 in vm_call_method () from /opt/rh/ruby193/root/usr/lib64/libruby.so.1.9
..

Comment 3 Bryan Kearney 2016-07-08 20:41:57 UTC
Per 6.3 planning, moving out non acked bugs to the backlog

Comment 5 Pavel Moravec 2018-02-23 12:38:48 UTC
Confirmed the bug is in 6.3 GA as well.

Comment 6 Adam Ruzicka 2018-08-16 09:50:57 UTC
Easy reproducer:

1) Stop the qpidd service
2) In one terminal run irb and enter the following

require 'qpid_messaging'
puts "My PID is #{Process.pid}"

3) In another terminal, keep checking "lsof -p $PID | grep TCP", where PID is the pid obtained in step 2
4) In first terminal run

connection = Qpid::Messaging::Connection.new(:url => 'localhost:5671', :options => { :transport => 'ssl' })

5) In the first terminal run "connection.open"; for each run, a new line should appear in the lsof output in the second terminal

Note: If the :options => part is omitted when creating the connection, the bug won't get triggered.
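
The manual lsof check in the reproducer above could be replaced by measuring descriptor growth from inside the same Ruby process. This is a minimal sketch, not a definitive test: `open_fd_count` and `fd_growth` are hypothetical helper names, the code is Linux-only (it reads /proc/self/fd), and it rescues StandardError on the assumption that a failed open against a stopped broker raises an exception.

```ruby
# Hypothetical helper: number of descriptors currently open in this process
# (Linux only). The directory handle used for listing is counted on every
# call, so it cancels out when two readings are subtracted.
def open_fd_count
  (Dir.entries('/proc/self/fd') - %w[. ..]).size
end

# Run the given block `attempts` times and return how many descriptors
# the process gained in total.
def fd_growth(attempts)
  before = open_fd_count
  attempts.times do
    begin
      yield
    rescue StandardError
      # Assumption: a failed connection attempt surfaces as an exception;
      # it is ignored here because only the descriptor count matters.
    end
  end
  open_fd_count - before
end

# Usage with the reproducer (needs the qpid_messaging gem and a stopped qpidd):
#   require 'qpid_messaging'
#   connection = Qpid::Messaging::Connection.new(:url => 'localhost:5671',
#                                                :options => { :transport => 'ssl' })
#   puts fd_growth(10) { connection.open }
```

Going by step 5, an affected system would be expected to report growth roughly equal to the number of attempts, while a growth of zero would indicate no leak for that code path.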

Comment 7 Adam Ruzicka 2018-08-16 09:51:42 UTC
Created attachment 1476392 [details]
strace when connecting to stopped service without SSL

Comment 8 Adam Ruzicka 2018-08-16 09:52:10 UTC
Created attachment 1476394 [details]
strace when connecting to stopped service with SSL

Comment 9 Adam Ruzicka 2018-08-16 09:58:58 UTC
Created attachment 1476398 [details]
strace when connecting to running service with SSL

Comment 10 Adam Ruzicka 2018-08-16 10:16:07 UTC
Confirmed in 6.4 as well.

From the attached straces we can see that if SSL is enabled, one FD is leaked per connection, even if the connection succeeds. We can also see that the bug isn't triggered if SSL is not used.

I'm quite convinced this is caused by something done inside the qpid C++ bindings.

Comment 11 Adam Ruzicka 2018-08-16 10:53:45 UTC
According to Apache JIRA, this bug was fixed in qpid-cpp-1.37.0, whereas in the Satellite repos we ship 1.36.0. Moving to MODIFIED until 1.37 reaches our repos.

Comment 17 Pavel Moravec 2019-02-20 21:43:35 UTC
I can verify that the updated packages:

qpid-cpp: 1.36.0-21.el7
qpid-proton: 0.16.0-13.el7sat

fix this bug, on both Sat6.3 and Sat6.4.

Comment 25 Bryan Kearney 2019-05-14 19:58:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222