Bug 1295957
| Summary: | goferd taking 100% CPU after successful reconnect to qdrouterd after a longer time | |||
|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | |
| Component: | katello-agent | Assignee: | Katello Bug Bin <katello-bugs> | |
| Status: | CLOSED ERRATA | QA Contact: | Sachin Ghai <sghai> | |
| Severity: | high | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 6.1.5 | CC: | ahuchcha, bkearney, cdonnell, chrobert, cwelton, david.pinel, ddevra, fjayalat, gabicr, hasingh, hklein, jentrena, kgiusti, kshravag, mcressma, mmccune, mshimura, nmiao, peter.vreman, pmoravec, pmutha, rmahique, sghai, smeyer, sreber, thamilto, wharris | |
| Target Milestone: | Unspecified | Keywords: | Triaged | |
| Target Release: | Unused | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | qpid-proton-0.9-12 | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: proton reactor didn't clean up internal pipes correctly when a connection was reset
Consequence: on reconnect, the reactor thread could get into a loop, causing high cpu
Fix: the pipes are now cleaned up properly
Result: the CPU no longer spins on reconnect
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1301103 (view as bug list) | Environment: | ||
| Last Closed: | 2016-01-28 07:46:59 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1122832, 1301103 | |||
|
Description
Pavel Moravec
2016-01-05 22:14:48 UTC
(In reply to Pavel Moravec from comment #0)
> Adding a logging event to
> /usr/lib64/python2.7/site-packages/proton/utils.py, line 252, just one such
> event is logged every 10 seconds (hint, hint, why 10 seconds? be aware I
> have in
> /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/connection.py
> "MAX_DELAY = 1" so not _this_ parameter).

This is easy: /usr/lib/python2.7/site-packages/gofer/messaging/consumer.py, line 106:

    message, document = self._reader.next(10)

Another hint (rather stressing what is hidden in the text): once goferd / the proton reactor is in the busy loop, restarting qdrouterd breaks it. After the restart, goferd starts to behave normally.

(It feels to me like the proton reactor did not update some information about the lost connection/link (being restored - _this_ isn't updated), and is busy-waiting for a reconnect/reattach event that has already happened. The qdrouterd restart generates a new such event that clears the busy loop.)

It has been confirmed as a proton reactor bug, thanks kgiusti++. See upstream https://issues.apache.org/jira/browse/PROTON-1090

Verified with Satellite 6.1.6 compose 7 (async errata).

I was able to reproduce the reported issue with:

]# rpm -qa | grep qpid-proton
qpid-proton-c-0.9-11.el7.x86_64
python-qpid-proton-0.9-11.el7.x86_64

---
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28593 root 20 0 1535020 39796 9772 S 100.0 0.1 2:18.76 python

root 28593 1 5 04:14 ? 00:00:40 python /usr/bin/goferd --foreground
---

Later, I upgraded the rhel7 client with the latest compose 7 and the following packages:

[root@apollo ~]# rpm -qa | grep qpid
python-qpid-proton-0.9-12.el7sat.x86_64
qpid-proton-c-0.9-12.el7sat.x86_64

This time I don't see any python/goferd process consuming more than 100% CPU on reconnection to qdrouterd. Here is the result after the fix:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17107 root 20 0 1534464 39516 9796 S 0.3 0.1 0:00.92 python

root 17107 1 0 04:53 ? 00:00:01 python /usr/bin/goferd --foreground

Will test the fix with rhel6 and rhel5 clients and update here.

One more observation: while installing packages, I see a sudden increase in CPU usage by 40-50%. Is this expected? Here is the result of installing a single package on rhel6. CPU usage increases by 70%:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8566 root 20 0 2074m 153m 13m S 70.5 0.6 0:07.85 python

root 8566 1 7 13:01 ? 00:00:10 python /usr/bin/goferd

OK, I verified with rhel5.11, rhel67 and rhel72 clients. I don't see python/goferd consuming more than 100% CPU on reconnection to qdrouterd. However, I do see a sudden rise in CPU usage when we install packages on a client from the sat6 server.

@Mike: could you please confirm whether this is expected?

It is expected that the gofer python process uses CPU while installing packages, as this process is what interacts with the yum API (not the yum CLI) to download and interact with RPM.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0077
|
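For readers who want to see the failure mode in isolation: the sketch below is not proton code. It only illustrates the general mechanism named in the Doc Text, namely that a select-style event loop spins when a pipe with undrained data stays registered. The helper `count_wakeups`, the durations, and the use of `os.pipe()` as a stand-in for the reactor's internal pipe are all assumptions made for this illustration. It needs Python 3 on a POSIX system.

```python
import os
import select
import time


def count_wakeups(read_fds, duration=0.2, timeout=0.05):
    """Count how often select() returns within `duration` seconds.

    Hypothetical helper for this illustration only.
    """
    wakeups = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # select() returns at once if any descriptor is readable,
        # otherwise it sleeps for up to `timeout` seconds.
        select.select(read_fds, [], [], timeout)
        wakeups += 1
    return wakeups


# Assumed stand-in for an event loop's internal wake-up pipe.
r, w = os.pipe()
os.write(w, b"x")  # a wake-up byte that nobody drains

# Stale pipe still registered: every select() call returns immediately,
# so the loop burns CPU instead of sleeping.
spinning = count_wakeups([r])

# Cleanup: drain the pipe so the descriptor is no longer readable.
os.read(r, 1)
idle = count_wakeups([r])

os.close(r)
os.close(w)

print("wakeups with stale pipe:", spinning)
print("wakeups after cleanup:", idle)
```

With the stale byte in place the wakeup count should be orders of magnitude higher than after the drain, which matches the shape of the symptom reported here (one thread pinned at 100% CPU until something resets the state).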