Bug 496146

Summary: jabberd's router process consumes a ton of the CPU cycles on a RHEL4 530 Satellite
Product: Red Hat Satellite 5 Reporter: Steve Salevan <ssalevan>
Component: Server Assignee: Devan Goodwin <dgoodwin>
Status: CLOSED CURRENTRELEASE QA Contact: Steve Salevan <ssalevan>
Severity: high Docs Contact:
Priority: high    
Version: 530 CC: cperry, dgoodwin, jpazdziora
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: sat530 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-10-28 19:27:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Embargoed:
Bug Depends On:    
Bug Blocks: 456985, 486216    

Description Steve Salevan 2009-04-16 20:38:14 UTC
Description of problem:
If you install a 530 Satellite on a RHEL4 machine, perform some simple tasks, and wait a bit, you will find that jabberd's 'router' process is consuming over 90% of the available CPU cycles at any given time.

Version-Release number of selected component (if applicable):
530, 4/14 build, RHEL4 platform

How reproducible:
Always (to the best of my knowledge)

Steps to Reproduce:
1. Install 530 upon a RHEL4 system
2. Use this new Satellite for a short while
3. Run 'top'
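
For a non-interactive check in place of step 3, a minimal sketch follows. It assumes the process shows up under the command name 'router' and that procps 'ps' is available; hot_procs is a hypothetical helper name, and 70 is just the threshold mentioned in this report. Note that the %CPU reported by ps is averaged over the life of the process, so it reacts more slowly than 'top'.

```shell
# Read "PID %CPU COMMAND" lines on stdin and print "PID %CPU" for every
# line whose command name matches $1 and whose %CPU is above $2.
# hot_procs is a hypothetical helper, not part of Satellite or jabberd.
hot_procs() {
    awk -v name="$1" -v limit="$2" \
        '$3 == name && $2 + 0 > limit + 0 { print $1, $2 }'
}

# Live usage on the Satellite host (assumes procps ps):
#   ps -eo pid=,pcpu=,comm= | hot_procs router 70
```

Any line printed is a router process above the threshold; no output means nothing matched.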
  
Actual results:
jabberd's router process continuously uses a sizeable share of all available CPU cycles (i.e., over 70%)

Expected results:
jabberd's router process consumes CPU time less voraciously

Additional info:

Comment 1 Devan Goodwin 2009-04-24 14:33:53 UTC
Getting a RHEL 4 setup going now. Were any special steps necessary to hit this? I was under the impression jabberd was pretty broken in current Satellite builds. Were any clients registered and configured to use osad? Or did this just *happen* with no explicit osad usage?

Comment 2 Devan Goodwin 2009-04-27 13:24:27 UTC
On a stock install, jabberd seems broken and isn't running. Steve, could you confirm whether you had to do anything special to get jabberd working to replicate this?

[root@dhcp77-164 ~]# service jabberd status
jabberd router is stopped
[root@dhcp77-164 ~]# ps aux | grep jabberd
root     24746  0.0  0.0  4292  656 pts/2    R+   15:21   0:00 grep jabberd
[root@dhcp77-164 ~]# ps aux | grep router
root     24808  0.0  0.0  3884  656 pts/2    R+   15:23   0:00 grep router
[root@dhcp77-164 ~]#

[root@dhcp77-164 ~]# service jabberd start
Starting Jabber services                                   [  OK  ]
[root@dhcp77-164 ~]# 

Apr 27 15:22:10 dhcp77-164 jabberd: jabberd startup succeeded
Apr 27 15:22:11 dhcp77-164 jabberd/c2s[24781]: starting up
Apr 27 15:22:11 dhcp77-164 jabberd/c2s[24781]: couldn't open /var/lib/jabberd/pid/c2s.pid for writing: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: starting up
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: starting up
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: starting up
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: process id is 24777, written to /var/run/jabberd/router.pid
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: process id is 24778, written to /var/run/jabberd/resolver.pid
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: attempting connection to router at 127.0.0.1, port=5347
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: [4] [router] write error: Connection refused (111)
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: connection to router closed
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: attempting reconnect (3 left)
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: id: dhcp77-164.rhndev.redhat.com
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: couldn't open /var/lib/jabberd/pid/sm.pid for writing: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/s2s[24780]: starting up (interval=3, queue=60, keepalive=0, idle=86400)
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: loaded user table (1 users)
Apr 27 15:22:11 dhcp77-164 jabberd/c2s[24781]: db: couldn't open environment: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: db: couldn't open environment: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/s2s[24780]: couldn't open /var/lib/jabberd/pid/s2s.pid for writing: No such file or directory
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: [0.0.0.0, port=5347] listening for incoming connections
Apr 27 15:22:11 dhcp77-164 jabberd/c2s[24781]: failed to initialise auth module 'db'
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: attempting connection to router at 127.0.0.1, port=5347
Apr 27 15:22:11 dhcp77-164 jabberd/s2s[24780]: attempting connection to router at 127.0.0.1, port=5347
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: initialisation of storage driver 'db' failed
Apr 27 15:22:11 dhcp77-164 jabberd/router[24777]: shutting down
Apr 27 15:22:11 dhcp77-164 jabberd/resolver[24778]: shutting down
Apr 27 15:22:11 dhcp77-164 jabberd/s2s[24780]: shutting down
Apr 27 15:22:11 dhcp77-164 jabberd/sm[24779]: failed to initialise one or more storage drivers, aborting

[root@dhcp77-164 ~]# service jabberd status
jabberd router is stopped
[root@dhcp77-164 ~]#
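
Several of the failures in the log above are "No such file or directory" errors on paths under /var/lib/jabberd. A minimal sketch for checking such directories before starting the service is below; missing_dirs is a hypothetical helper, and the two paths in the usage comment are the only ones taken from the log. The location of the 'db' environment is not shown in the log, so it is not checked here.

```shell
# Print "missing: <path>" for each argument that is not an existing
# directory; return non-zero if at least one is missing.
# missing_dirs is a hypothetical helper, not part of jabberd.
missing_dirs() {
    rc=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "missing: $d"
            rc=1
        fi
    done
    return $rc
}

# Usage with the directories named in the log:
#   missing_dirs /var/lib/jabberd/pid /var/run/jabberd
```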

Comment 3 Devan Goodwin 2009-04-27 19:39:23 UTC
OK, I cannot reproduce this. I installed the latest EL4 build, but the jabberd service doesn't even start, so I'm missing some info on how we got to this state for the bug.

However, in trying to replicate, I applied the fixes from bug #493665, got jabberd running, synced channels, set up a client with osad, registered it, and verified that osad was working properly.

I'm going to move this to MODIFIED for now. If this issue resurfaces, please add more detailed information to this ticket on what state the box was in when it happened. Thanks.

Comment 4 Devan Goodwin 2009-04-28 16:41:47 UTC
Hit this today on a RHEL 5 satellite that did not have the above jabberd fixes. There's a lot wrong with its config: jabberd driver, missing pem, etc. I suspect getting the daemon properly configured as per the above fixes will still solve the issue, so leaving this MODIFIED for now.

Comment 5 Devan Goodwin 2009-04-28 16:42:12 UTC
Cancel needinfo.

Comment 6 Devan Goodwin 2009-04-28 16:49:23 UTC
strace -p <pid> shows nothing, except that I can't Ctrl+C to stop it.

Box had just RHEL 5 channels synced, 1 client (not configured for osad).

Logs all look clean, no repeated error messages coming out. Strange, but I'm not really experienced in this realm of debugging.
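
An strace that prints nothing while the process keeps burning CPU would be consistent with a loop running entirely in user space, though that is only a guess here. One way to check is to compare the utime and stime counters (fields 14 and 15 of /proc/<pid>/stat, in clock ticks) across two samples. The sketch below is a hypothetical helper, not something that was run on the affected box.

```shell
# Print "utime stime" (in clock ticks) from one /proc/<pid>/stat line.
# parse_cpu_ticks is a hypothetical helper name.
parse_cpu_ticks() {
    # Drop everything through the last ") " so that a process name
    # containing spaces cannot shift the field positions.
    rest=${1##*\) }
    # After the strip, field 1 is the state, so utime and stime
    # (fields 14 and 15 of the full line) are now fields 12 and 13.
    set -- $rest
    echo "${12} ${13}"
}

# Usage: sample twice, a few seconds apart, and compare.
#   parse_cpu_ticks "$(cat /proc/<pid>/stat)"
```

If utime climbs between samples while stime barely moves, the time is being spent in user mode rather than in system calls.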

Comment 7 Steve Salevan 2009-06-01 15:57:17 UTC
Moving to VERIFIED, tested on 5/21 build.

Comment 8 Jan Pazdziora 2009-09-11 10:34:43 UTC
Stage validated on Satellite-5.3.0-RHEL4-re20090820.1. The jabberd service behaves just fine, both upon startup and when doing instant push. Moving to RELEASE_PENDING.

Comment 9 Clifford Perry 2009-10-28 19:27:13 UTC
Somehow this was missed and not moved to CLOSED when 530 GA'd.