Bug 1344142
| Summary: | dynflow_executor FD leak and segfault after stopping qpidd | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | ||||||||
| Component: | Qpid | Assignee: | Mike Cressman <mcressma> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | Jan Hutař <jhutar> | ||||||||
| Severity: | medium | Docs Contact: | |||||||||
| Priority: | medium | ||||||||||
| Version: | 6.1.8 | CC: | aruzicka, egolov, ehelms, mcressma, mlinden, zhunting, zkraus | ||||||||
| Target Milestone: | Released | Keywords: | Triaged | ||||||||
| Target Release: | Unused | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | qpid-cpp-1.36.0-24.el7amq, qpid-dispatch-1.5.0-4.el7, qpid-proton-0.26.0-3.el7 | Doc Type: | If docs needed, set a value | ||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2019-05-14 19:58:03 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
Per 6.3 planning, moving out non-acked bugs to the backlog.
Confirmed the bug is in 6.3 GA as well. Easy reproducer:
1) Stop the qpidd service
2) In one terminal run irb and enter the following
require 'qpid_messaging'
puts "My PID is #{Process.pid}"
3) In another terminal keep checking "lsof -p $PID | grep TCP", where PID is the pid obtained in step 2
4) In first terminal run
connection = Qpid::Messaging::Connection.new(:url => 'localhost:5671', :options => { :transport => 'ssl' })
5) In the first terminal run "connection.open" repeatedly; each run should add a new line to the lsof output in the second terminal
Note: If the :options => part is omitted when creating the connection, the bug won't get triggered.
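The reproducer above can also be checked from inside the same Ruby process, without a second terminal. The following is a minimal sketch, not part of the original report: the helper names open_fd_count and fd_growth are hypothetical, it assumes a Linux /proc filesystem, and it assumes a failed connection attempt surfaces as a StandardError.

```ruby
# Count the file descriptors currently open in this process (Linux /proc only).
# Listing the directory briefly opens one extra descriptor itself; that offset
# is the same on every call, so differences between two counts stay meaningful.
def open_fd_count
  (Dir.entries('/proc/self/fd') - %w[. ..]).size
end

# Run the given block `attempts` times and return how many descriptors
# the process gained in total. Errors raised by the block are swallowed,
# because a connection attempt against a stopped broker is expected to fail.
def fd_growth(attempts)
  before = open_fd_count
  attempts.times do
    begin
      yield
    rescue StandardError
      # Expected when the broker is down; the leak is what we measure.
    end
  end
  open_fd_count - before
end

# Usage with the reproducer from this report (requires the qpid_messaging gem):
#   require 'qpid_messaging'
#   connection = Qpid::Messaging::Connection.new(
#     :url => 'localhost:5671', :options => { :transport => 'ssl' })
#   puts fd_growth(10) { connection.open }
```

If the leak is present, the printed growth should roughly match the number of attempts; a value of 0 would suggest no descriptors were left behind.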
Created attachment 1476392 [details]
strace when connecting to stopped service without SSL
Created attachment 1476394 [details]
strace when connecting to stopped service with SSL
Created attachment 1476398 [details]
strace when connecting to running service with SSL
Confirmed in 6.4 as well.
From the attached straces we can see that if SSL is enabled, one FD is leaked per connection, even if the connection succeeds. We can also see that the bug isn't triggered if SSL is not used. I'm quite convinced the leak happens inside the qpid C++ bindings.
According to Apache JIRA, this bug was fixed in qpid-cpp-1.37.0, whereas the Satellite repos ship 1.36.0.
Moving to MODIFIED until 1.37 reaches our repos.
I can verify that the updated packages:
qpid-cpp: 1.36.0-21.el7
qpid-proton: 0.16.0-13.el7sat
fix this bug, both on Sat6.3 and Sat6.4.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2019:1222
Description of problem:
With the qpidd service stopped (dynflow_executor connects to port 5671 because of the ListenOnCandlepinEvents task), dynflow_executor leaks FDs until it segfaults after approx. 90 minutes (quite probably because ulimit prevents allocating a new FD). The leaked FDs are TCP sockets opened towards port 5671.
I do not know whether the bug is in dynflow_executor, ruby or (quite probably) the ruby qpid library (I can help with a reproducer here, given some advice). The segfault backtrace suggests the process was trying to open an SSL connection towards qpid (see below for bt). Full abrt report to be provided.
Fortunately, something (dynflow_executor_monitor?) automatically restarts the segfaulted process, until it segfaults again after the next 90 minutes.
Version-Release number of selected component (if applicable):
Sat6.1.9:
ruby193-rubygem-dynflow-0.7.7.9-1.el7sat.noarch
ruby193-rubygem-qpid_messaging-0.30.0-1.el7sat.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Have Satellite fully running.
2. Get lsof of the dynflow_executor process:
pid=$(ps aux | grep dynflow_executor$ | awk '{ print $2 }')
lsof -p $pid | grep TCP | nl
3. Stop the qpidd service:
service qpidd stop
4. Get lsof again after a while
5. Wait 90 minutes for the coredump (optional, evident from the FD leak)
Actual results:
2. shows 2 TCP file descriptors (one for the TCP connection, one "protocol: TCP" that I suspect to be the leak already)
4. shows an increasing number of the "protocol: TCP" FDs
Expected results:
4. to show max. one TCP file descriptor
Additional info:
After restarting qpidd a while later, the already leaked FDs do not disappear; the leak just stops growing.
segfault backtrace:
#0 0x00007f0e76d145f7 in raise () from /lib64/libc.so.6
#1 0x00007f0e76d15ce8 in abort () from /lib64/libc.so.6
#2 0x00007f0e77c5bed5 in rb_bug () from /opt/rh/ruby193/root/usr/lib64/libruby.so.1.9
#3 0x00007f0e77cfbf26 in sigsegv () from /opt/rh/ruby193/root/usr/lib64/libruby.so.1.9
#4 <signal handler called>
#5 0x00007f0e670617d0 in PR_GetIdentitiesLayer () from /lib64/libnspr4.so
#6 0x00007f0e67061801 in PR_PushIOLayer () from /lib64/libnspr4.so
#7 0x00007f0e67e3c847 in ssl_PushIOLayer.constprop.5 () from /lib64/libssl3.so
#8 0x00007f0e67e3d164 in ssl_ImportFD () from /lib64/libssl3.so
#9 0x00007f0e6928ab2b in qpid::sys::ssl::SslSocket::SslSocket(std::string const&, bool) () from /lib64/libqpidcommon.so.10
#10 0x00007f0e68ea03c6 in qpid::client::SslConnector::SslConnector(boost::shared_ptr<qpid::sys::Poller>, qpid::framing::ProtocolVersion, qpid::client::ConnectionSettings const&, qpid::client::ConnectionImpl*) () from /lib64/libqpidclient.so.10
#11 0x00007f0e68ea0b4f in qpid::client::(anonymous namespace)::create(boost::shared_ptr<qpid::sys::Poller>, qpid::framing::ProtocolVersion, qpid::client::ConnectionSettings const&, qpid::client::ConnectionImpl*) () from /lib64/libqpidclient.so.10
#12 0x00007f0e68ebd1ea in qpid::client::Connector::create(std::string const&, boost::shared_ptr<qpid::sys::Poller>, qpid::framing::ProtocolVersion, qpid::client::ConnectionSettings const&, qpid::client::ConnectionImpl*) () from /lib64/libqpidclient.so.10
#13 0x00007f0e68eb788d in qpid::client::ConnectionImpl::open() () from /lib64/libqpidclient.so.10
#14 0x00007f0e68eae47b in qpid::client::Connection::open(qpid::client::ConnectionSettings const&) () from /lib64/libqpidclient.so.10
#15 0x00007f0e68eaeb19 in qpid::client::Connection::open(qpid::Url const&, qpid::client::ConnectionSettings const&) () from /lib64/libqpidclient.so.10
#16 0x00007f0e6961ff9d in qpid::client::amqp0_10::ConnectionImpl::tryConnect() () from /lib64/libqpidmessaging.so.3
#17 0x00007f0e69621564 in qpid::client::amqp0_10::ConnectionImpl::connect(qpid::sys::AbsTime const&) () from /lib64/libqpidmessaging.so.3
#18 0x00007f0e69622a53 in qpid::client::amqp0_10::ConnectionImpl::open() () from /lib64/libqpidmessaging.so.3
#19 0x00007f0e69af9b8a in _wrap_Connection_open(int, unsigned long*, unsigned long) () from /opt/rh/ruby193/root/usr/share/gems/gems/qpid_messaging-0.30.0/lib/cqpid.so
#20 0x00007f0e77d52033 in vm_call_method () from /opt/rh/ruby193/root/usr/lib64/libruby.so.1.9
..
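The repeated lsof check from the steps to reproduce can be automated by sampling the socket count of the dynflow_executor process over time. This is only a sketch under stated assumptions: the helper names socket_fd_count and sample_socket_fds are hypothetical, the script relies on the Linux /proc filesystem, it needs permission to read the target process's fd directory (typically root), and it counts all socket descriptors rather than only the "protocol: TCP" ones that lsof reports.

```ruby
# Count the socket file descriptors held by the process `pid` (Linux /proc only).
# Each entry in /proc/<pid>/fd is a symlink; sockets link to "socket:[inode]".
def socket_fd_count(pid)
  fd_dir = "/proc/#{pid}/fd"
  (Dir.entries(fd_dir) - %w[. ..]).count do |fd|
    begin
      File.readlink(File.join(fd_dir, fd)).start_with?('socket:')
    rescue SystemCallError
      false # the descriptor was closed between listing and readlink
    end
  end
end

# Take `samples` readings of the socket count, `interval` seconds apart,
# and return them as an array so that growth over time is visible.
def sample_socket_fds(pid, samples, interval)
  Array.new(samples) do |i|
    sleep(interval) if i > 0
    socket_fd_count(pid)
  end
end

# Usage, with the pid obtained as in step 2 of the steps to reproduce:
#   p sample_socket_fds(pid, 10, 60)
```

A steadily increasing series after qpidd is stopped would match the actual results described above, while a flat series would match the expected results.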