Bug 1264509
Summary: | qdrouterd utilizes 120-150% cpu time | ||
---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Stuart Auchterlonie <sauchter> |
Component: | Infrastructure | Assignee: | Ted Ross <tross> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Katello QA List <katello-qa-list> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.1.1 | CC: | bbuckingham, bkearney, bugzilla_rhn, chorn, cwelton, ddevra, pmoravec, tross, tscherf |
Target Milestone: | Unspecified | Keywords: | Triaged |
Target Release: | Unused | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qpid-dispatch-0.4-10 | Doc Type: | Bug Fix |
Last Closed: | 2015-10-17 09:01:00 UTC | Type: | Bug |
Description
Stuart Auchterlonie
2015-09-18 16:02:04 UTC
(In reply to Stuart Auchterlonie from comment #0)
> pmoravec has discussed with tross and identified
> https://issues.apache.org/jira/browse/DISPATCH-134
> as a likely fix for this issue

The fix for DISPATCH-134 has already been back-ported into the qpid-dispatch-router-0.4-7 packages. This must be a separate issue.

Very trivial reproducer:
1. Enable heartbeats in goferd.
2. Run a few capsule syncs / package installs / whatever, just to generate some traffic on the qdrouterd<->goferd connection.
3. Due to bz1264461, qdrouterd is left with several tens of AMQP connections (the number depends on the number of syncs done; the more the better for the reproducer, 20 is enough).
4. On every such connection, qdrouterd sends "heartbeats" (empty AMQP frames, I think) every second. goferd responds to none of them, as it has abandoned the connections (but didn't close them).

Nothing more is required (i.e. no iptables trick) :-/ Just have sufficiently many AMQPS connections (SSL might play a role; I can verify whether it does) where the client responds at the TCP level only.

Just a hint: how expensive is the gettimeofday function? Noticed in gdb several times:

#0 0x00007ffffd0d5ddf in gettimeofday ()
#1 0x00007f2577a94a5e in pn_i_now () at /usr/src/debug/qpid-dispatch-0.4/src/posix/driver.c:159
#2 0x00007f2577a9593a in qdpn_connector_process (c=c@entry=0x7f2558004240) at /usr/src/debug/qpid-dispatch-0.4/src/posix/driver.c:735
#3 0x00007f2577a9f3dc in process_connector (cxtr=0x7f2558004240, qd_server=0x1680e50) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:324
#4 thread_run (arg=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:622
#5 0x00007f2577611df5 in start_thread (arg=0x7f25689ed700) at pthread_create.c:308
#6 0x00007f2576b6d1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

This is _not_ fixed by bz1264518 / in qpid-dispatch-router-0.4-9.
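To confirm that step 3 of the reproducer has been reached (enough lingering connections on the router), one option is to count ESTABLISHED sockets on the qdrouterd listener port. The following is only a minimal sketch: the function names are made up for illustration, and port 5647 is an assumption that is not stated anywhere in this bug, so substitute whatever port your qdrouterd actually listens on. It parses the Linux /proc/net/tcp table, where ports are in hex and state 01 means ESTABLISHED.

```python
# Sketch: count ESTABLISHED TCP connections on a given local port by
# parsing text in the Linux /proc/net/tcp format.
# ASSUMPTION: 5647 is used below only as an example port; this bug does
# not state which port qdrouterd listens on.

TCP_ESTABLISHED = "01"  # kernel state code for ESTABLISHED in /proc/net/tcp


def count_established(proc_net_tcp_text, local_port):
    """Return the number of ESTABLISHED sockets whose local port matches."""
    count = 0
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "0:"; skip the header
        # row and any blank or malformed lines.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        # local_address looks like "0A00000A:160F" (hex address, hex port).
        port_hex = fields[1].rsplit(":", 1)[1]
        if int(port_hex, 16) == local_port and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def count_from_proc(local_port, path="/proc/net/tcp"):
    """Convenience wrapper that reads the live table (Linux only)."""
    with open(path) as table:
        return count_established(table.read(), local_port)


# Hand-written sample rows (hypothetical addresses), truncated to the
# first four columns, which are the only ones the parser looks at.
SAMPLE = """\
  sl  local_address rem_address   st
   0: 00000000:160F 00000000:0000 0A
   1: 0A00000A:160F 0B00000A:C350 01
   2: 0A00000A:160F 0B00000A:C351 01
   3: 0A00000A:0016 0B00000A:C352 01
"""
```

On the router host, `count_from_proc(5647)` (again, with the port adjusted) would give the live count; IPv6 listeners appear in /proc/net/tcp6, which has the same column layout. A count around 20 or more should match the condition described in step 3.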
Reproducer: use the script from [1] and wait 10 seconds. Just a very few "abandoned" connections with heartbeats can cause this.

[1] https://issues.apache.org/jira/browse/PROTON-1000?focusedCommentId=14909238&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14909238

Closing this BZ as it should be fixed in Satellite 6.1.3, due to "Fixed In Version: qpid-dispatch-0.4-10". That package version is in the 6.1.3 errata [1].

[1] https://access.redhat.com/errata/RHBA-2015:1911
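When checking whether a given system already carries the fix, note that comparing version-release strings as plain text goes wrong here: "0.4-9" sorts after "0.4-10" lexicographically. A rough sketch of a numeric comparison follows. The helper names are hypothetical, and this is a simplification rather than RPM's real version comparison algorithm (it ignores epochs and alphabetic segments), so treat it only as an illustration.

```python
import re

# Sketch: compare qpid-dispatch version-release strings numerically.
# ASSUMPTION: a simplified stand-in for RPM version comparison; it only
# looks at the runs of digits and ignores epochs and letters.

FIXED_IN = "0.4-10"  # from "Fixed In Version: qpid-dispatch-0.4-10"


def version_key(version_release):
    """Turn '0.4-10' into (0, 4, 10) so that tuples compare numerically."""
    return tuple(int(chunk) for chunk in re.findall(r"\d+", version_release))


def has_fix(installed, fixed_in=FIXED_IN):
    """True if the installed version-release is at or beyond fixed_in."""
    return version_key(installed) >= version_key(fixed_in)
```

For example, `has_fix("0.4-9")` is false and `has_fix("0.4-10")` is true, which matches the statement above that 0.4-9 does not contain the fix.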