Bug 496146 - jabberd's router process consumes a ton of the CPU cycles on a RHEL4 530 Satellite
Status: CLOSED CURRENTRELEASE
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: 530
Hardware: All Linux
Priority: high
Severity: high
Assigned To: Devan Goodwin
QA Contact: Steve Salevan
Depends On:
Blocks: 456985 486216
Reported: 2009-04-16 16:38 EDT by Steve Salevan
Modified: 2009-10-28 15:27 EDT
Fixed In Version: sat530
Doc Type: Bug Fix
Last Closed: 2009-10-28 15:27:13 EDT
Attachments: None

Description Steve Salevan 2009-04-16 16:38:14 EDT
Description of problem:
If one installs a 530 Satellite upon a RHEL4 machine, performs some simple tasks, and waits a bit, they will discover that jabberd's 'router' process is consuming over 90% of the available CPU cycles at any given time.

Version-Release number of selected component (if applicable):
530, 4/14 build, RHEL4 platform

How reproducible:
Always (to the best of my knowledge)

Steps to Reproduce:
1. Install 530 upon a RHEL4 system
2. Use this new Satellite for a short while
3. Run 'top'
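
As a scriptable alternative to watching 'top' by hand, the check in step 3 could be sketched roughly as below. This is only an illustration and not part of the original report: it assumes the process shows up under the command name "router" (as the syslog tag jabberd/router suggests), borrows the 70% threshold from the figure quoted in this report, and the helper names are hypothetical. Note that the %CPU column from ps is averaged over the lifetime of the process, so it reacts more slowly than 'top'.

```python
import subprocess


def parse_ps(ps_output):
    """Parse `ps -eo pid,pcpu,comm` output into (pid, pcpu, comm) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pid, pcpu, comm = parts
        rows.append((int(pid), float(pcpu), comm.strip()))
    return rows


def busy(rows, name="router", threshold=70.0):
    """Return the rows for processes called `name` at or above `threshold` %CPU.

    Both defaults are assumptions taken from this report, not jabberd settings.
    """
    return [row for row in rows if row[2] == name and row[1] >= threshold]


def snapshot():
    """Run ps once and return its text output (requires procps ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On an affected box, `busy(parse_ps(snapshot()))` would be expected to return a non-empty list once the router starts spinning.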
  
Actual results:
A sizeable chunk of all available CPU cycles (i.e., over 70%) is continuously being utilized by jabberd's router process

Expected results:
jabberd's router process consumes CPU time less voraciously

Additional info:
Comment 1 Devan Goodwin 2009-04-24 10:33:53 EDT
Getting a RHEL 4 setup going now. Were any special steps necessary to hit this? I was under the impression jabberd was pretty broken in current Satellite builds. Were any clients registered and configured to use osad? Or did this just *happen* with no explicit osad usage?
Comment 2 Devan Goodwin 2009-04-27 09:24:27 EDT
On a stock install, jabberd seems broken and isn't running. Steve, could you confirm whether you had to do anything special to get jabberd working to replicate this?

[root@dhcp77-164 ~]# service jabberd status
jabberd router is stopped
[root@dhcp77-164 ~]# ps aux | grep jabberd
root     24746  0.0  0.0  4292  656 pts/2    R+   15:21   0:00 grep jabberd
[root@dhcp77-164 ~]# ps aux | grep router
root     24808  0.0  0.0  3884  656 pts/2    R+   15:23   0:00 grep router
[root@dhcp77-164 ~]#

[root@dhcp77-164 ~]# service jabberd start
Starting Jabber services                                   [  OK  ]
[root@dhcp77-164 ~]# 

Apr 27 15:22:10 dhcp77-164 jabberd: jabberd startup succeeded
Apr 27 15:22:11 dhcp77-164 jabberd/c2s[24781]: starting up
Apr 27 15:22:11 dhcp77-164 jabberd/c2s[24781]: couldn't open /var/lib/jabberd/pid/c2s.pid for writing: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: starting up
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: starting up
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: starting up
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: process id is 24777, written to /var/run/jabberd/router.pid
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: process id is 24778, written to /var/run/jabberd/resolver.pid
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: attempting connection to router at 127.0.0.1, port=5347
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: [4] [router] write error: Connection refused (111)
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: connection to router closed
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: attempting reconnect (3 left)
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: id: dhcp77-164.rhndev.redhat.com
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: couldn't open /var/lib/jabberd/pid/sm.pid for writing: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/s2s[24780]: starting up (interval=3, queue=60, keepalive=0, idle=86400)
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: loaded user table (1 users)
Apr 27 15:22:11 dhcp77-164 jabberd/c2s[24781]: db: couldn't open environment: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: db: couldn't open environment: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/s2s[24780]: couldn't open /var/lib/jabberd/pid/s2s.pid for writing: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: [0.0.0.0, port=5347] listening for incoming connections
Apr 27 15:22:11 dhcp77-164 jabberd/c2s[24781]: failed to initialise auth module 'db'
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: attempting connection to router at 127.0.0.1, port=5347
Apr 27 15:22:11 dhcp77-164 jabberd/s2s[24780]: attempting connection to router at 127.0.0.1, port=5347
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: initialisation of storage driver 'db' failed
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: shutting down
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: shutting down
Apr 27 15:22:11 dhcp77-164 jabberd/s2s[24780]: shutting down
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: failed to initialise one or more storage drivers, aborting

[root@dhcp77-164 ~]# service jabberd status
jabberd router is stopped
[root@dhcp77-164 ~]#
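
In the syslog excerpt above, the router wrote its pid file under /var/run/jabberd, while c2s, sm, and s2s all failed to write theirs under /var/lib/jabberd/pid. A minimal sketch for pulling the missing directories out of such a log follows. The function name is hypothetical, and the pattern only matches the exact "couldn't open ... for writing" wording seen in this excerpt; it does not cover the "db: couldn't open environment" lines, which name no path.

```python
import posixpath
import re

# Matches the pid-file failures seen in the excerpt above, e.g.
# "couldn't open /var/lib/jabberd/pid/c2s.pid for writing: No such file or directory"
_MISSING = re.compile(
    r"couldn't open (\S+) for writing: No such file or directory"
)


def missing_directories(log_lines):
    """Return the sorted, de-duplicated parent directories of files that
    could not be opened for writing because the path does not exist."""
    dirs = set()
    for line in log_lines:
        match = _MISSING.search(line)
        if match:
            dirs.add(posixpath.dirname(match.group(1)))
    return sorted(dirs)
```

Fed the lines from this comment, it would report a single directory, /var/lib/jabberd/pid, as the place to check first.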
Comment 3 Devan Goodwin 2009-04-27 15:39:23 EDT
OK, I cannot reproduce this. I installed the latest EL4 build, but the jabberd service doesn't even start, so I'm missing some info on how we got to this state for the bug.

However, in trying to replicate, I applied the fixes from bug #493665, got jabberd running, synced channels, set up a client with osad, registered it, and verified that osad was working properly.

I'm going to move this to MODIFIED for now. If this issue resurfaces, please add to this ticket with more detailed information on what state the box was in when it happened. Thanks.
Comment 4 Devan Goodwin 2009-04-28 12:41:47 EDT
Hit this today on a RHEL 5 satellite that did not have the above jabberd fixes. There's a lot wrong with its config, jabberd driver, missing pem, etc. I suspect getting the daemon properly configured as per the above fixes will still solve the issue, so I'm leaving this MODIFIED for now.
Comment 5 Devan Goodwin 2009-04-28 12:42:12 EDT
Cancel needinfo.
Comment 6 Devan Goodwin 2009-04-28 12:49:23 EDT
strace -p <pid> shows nothing, except that I can't Ctrl+C to stop it.

The box had just RHEL 5 channels synced and 1 client (not configured for osad).

Logs all look clean, with no repeated error messages coming out. Strange, but I'm not really experienced in this realm of debugging.
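
One possible reading of an strace that prints nothing while the process stays busy is that the router is spinning in user space without making system calls. That is a guess, not something confirmed in this ticket. A rough way to test it is to sample /proc/<pid>/stat twice and see how the CPU time splits between user mode (utime, field 14) and kernel mode (stime, field 15). The helper names below are hypothetical.

```python
def parse_times(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split after the last ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 is rest[11] and field 15 is rest[12].
    return int(rest[11]), int(rest[12])


def user_share(before, after):
    """Fraction of CPU time spent in user mode between two (utime, stime)
    samples, or None if the process used no CPU time in between."""
    user_delta = after[0] - before[0]
    system_delta = after[1] - before[1]
    total = user_delta + system_delta
    if total <= 0:
        return None
    return user_delta / total


def read_times(pid):
    """Read the current (utime, stime) pair for a live process on Linux."""
    with open("/proc/%d/stat" % pid) as stat_file:
        return parse_times(stat_file.read())
```

Taking `read_times(pid)` a few seconds apart and passing both samples to `user_share` gives a value close to 1.0 for a pure user-space loop, which would be consistent with the empty strace output.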
Comment 7 Steve Salevan 2009-06-01 11:57:17 EDT
Moving to VERIFIED, tested on 5/21 build.
Comment 8 Jan Pazdziora 2009-09-11 06:34:43 EDT
Stage validated on Satellite-5.3.0-RHEL4-re20090820.1. The jabberd service behaves just fine, both upon startup and when doing instant push. Moving to RELEASE_PENDING.
Comment 9 Clifford Perry 2009-10-28 15:27:13 EDT
Somehow this was missed and not moved to CLOSED when 530 GA'd.
