Bug 1276176
| Summary: | goferd on client does not reconnect back to capsule properly when the qdrouterd on the capsule side is restarted | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Neil Miao <nmiao> |
| Component: | Foreman Proxy | Assignee: | Jeff Ortel <jortel> |
| Status: | CLOSED DUPLICATE | QA Contact: | Katello QA List <katello-qa-list> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.1.3 | CC: | bbuckingham, chrobert, davy.stoffel, ealcaniz, jcallaha, nmiao, peter.vreman, pmoravec, rmahique |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-03-08 08:52:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1122832 | | |
Description
Neil Miao
2015-10-29 01:06:06 UTC
Apart from the broken goferd -> qdrouterd connection, there seems to be another side effect: the goferd daemon uses nearly 100% CPU *all the time* after the connection is broken.

```
[root@nil06 ~]# ps aux | grep 25035
root  25035 95.8  4.0 1089836 40748 ?  Ssl  Nov02 3227:26 python /usr/bin/goferd --foreground
[root@nil06 ~]# lsof -i :5647
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
python  25035 root  17u  IPv4 96490511      0t0  TCP nil06.devlab.redhat.com:52851->sat6cap01.util.phx1.redhat.com:5647 (CLOSE_WAIT)
python  25035 root  22u  IPv4 96572826      0t0  TCP nil06.devlab.redhat.com:52936->sat6cap01.util.phx1.redhat.com:5647 (CLOSE_WAIT)
```

The python process is sitting there doing nothing:

```
[root@nil06 ~]# strace -p 25035
Process 25035 attached
futex(0x227bab0, FUTEX_WAIT_PRIVATE, 0, NULL^CProcess 25035 detached
 <detached ...>
```

It happens to me as well. Any ideas?

Exactly the same here. How reproducible: it only happens overnight, so it is a bit hard to reproduce. Running katello-backup seems to trigger it: all our clients go into this state around the same time. Highly critical; we were forced to restart all client agents after the backup completed, but the hypervisors are impacted as well since all our clients run in VMs.

I suspect the root cause is https://issues.apache.org/jira/browse/PROTON-1090.

Could it be a combination of https://bugzilla.redhat.com/show_bug.cgi?id=1272596 and https://bugzilla.redhat.com/show_bug.cgi?id=1292026 / https://bugzilla.redhat.com/show_bug.cgi?id=1281947? Several such issues have been fixed recently; I recommend upgrading to the latest qpid-python and gofer versions, which should help.

In my environment I can confirm that this issue is fixed with Sat 6.1.6 and updating all clients to Sat6-Tools 6.1.6.

IMHO a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1272596; closing this.

*** This bug has been marked as a duplicate of bug 1272596 ***

I confirmed that the issue is fixed with Sat 6.1.6+.
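As an illustrative aside (not part of the original report), the stuck state described above could be detected programmatically by scanning `lsof -i :5647` output for sockets left in CLOSE_WAIT. This is a minimal sketch that assumes the standard lsof column layout shown in the transcript; the function name is hypothetical:

```python
def close_wait_connections(lsof_output: str):
    """Parse `lsof -i :5647`-style output and return (pid, endpoint)
    pairs for every connection reported in the CLOSE_WAIT state."""
    stuck = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # A data row ends with the TCP state in parentheses,
        # e.g. "... TCP client:52851->capsule:5647 (CLOSE_WAIT)";
        # the header row ("COMMAND PID USER ...") is skipped.
        if fields and fields[-1] == "(CLOSE_WAIT)":
            stuck.append((int(fields[1]), fields[-2]))
    return stuck
```

If this returns entries for goferd's PID, restarting goferd was the workaround the reporters fell back on until the fixed packages were installed.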