Bug 735109 - gatherd: segfault in libmetricUnixProcess.so
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sblim-gather
Version: 6.2
Hardware: All Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Vitezslav Crhonek
QA Contact: qe-baseos-daemons
Keywords: TestOnly
Reported: 2011-09-01 10:09 EDT by Milos Malik
Modified: 2011-12-06 06:57 EST (History)
Fixed In Version: sblim-gather-2.2.3-2.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 06:57:00 EST
Description Milos Malik 2011-09-01 10:09:37 EDT
Description of problem:


Version-Release number of selected component (if applicable):
sblim-gather-2.2.3-1.el6

How reproducible:
always

Steps to Reproduce:
# service gatherer status
gatherd is stopped
reposd is stopped
# service gatherer start
Starting gatherd:                                          [  OK  ]
Starting reposd:                                           [  OK  ]
# service gatherer status
gatherd (pid 16358) is running...
reposd (pid 16363) is running...
# gatherctl
help
	h		print this help message
	s		status
	i		init
	t		terminate
	b		start sampling
	e		stop sampling
	l plugin	load plugin
	u plugin	unload plugin
	v plugin	view/list metrics for plugin
	q		quit
	k		kill daemon
	d		start daemon
	c		local trace
s
Status initialized and sampling, 8 plugins and 20 metrics. 
s
Daemon not reachable.
q
# service gatherer status
gatherd dead but subsys locked
reposd (pid 16363) is running...
# 

Actual results:
gatherd[14802]: segfault at 0 ip 0052d5e2 sp b6e66200 error 6 in libmetricUnixProcess.so[52b000+3000]
gatherd[16059]: segfault at 0 ip 005b55e2 sp b6d97200 error 6 in libmetricUnixProcess.so[5b3000+3000]
gatherd[16370]: segfault at 0 ip 001645e2 sp b6e48200 error 6 in libmetricUnixProcess.so[162000+3000]
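The kernel messages above can be decoded to compare the crashes. The following is a minimal Python sketch, not part of the original report: the names `SEGFAULT_RE` and `decode_segfault` are hypothetical, and it assumes the x86 message layout shown in these logs, where the `error` value is the hexadecimal page-fault error code (bit 0: page was present, bit 1: write access, bit 2: user mode). It computes the faulting instruction's offset from the start of the reported library mapping.

```python
import re

# Matches the x86 kernel segfault message format seen in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing one segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Offset of the faulting instruction from the start of the mapping.
        "offset_in_library": ip - base,
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    lines = [
        "gatherd[14802]: segfault at 0 ip 0052d5e2 sp b6e66200 error 6 in libmetricUnixProcess.so[52b000+3000]",
        "gatherd[16059]: segfault at 0 ip 005b55e2 sp b6d97200 error 6 in libmetricUnixProcess.so[5b3000+3000]",
        "gatherd[16370]: segfault at 0 ip 001645e2 sp b6e48200 error 6 in libmetricUnixProcess.so[162000+3000]",
    ]
    for line in lines:
        info = decode_segfault(line)
        print(info["pid"], hex(info["offset_in_library"]), info["write_access"])
```

For all three lines the offset is 0x25e2, so each crash occurs at the same instruction, and error code 6 together with fault address 0 indicates a user-mode write through a NULL pointer.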

Expected results:
* no segfaults
Comment 4 Karel Volný 2011-09-08 12:14:32 EDT
I need to know the exact condition that triggers the segfault. At first it seemed that I could not reproduce it:

.live.[root@s390x-6s-v1 tps]# service gatherer status
gatherd is stopped
reposd is stopped
.live.[root@s390x-6s-v1 tps]# service gatherer start
Starting gatherd: [  OK  ]
Starting reposd: [  OK  ]
.live.[root@s390x-6s-v1 tps]# service gatherer status
gatherd (pid 51111) is running...
reposd (pid 51121) is running...
.live.[root@s390x-6s-v1 tps]# gatherctl
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
q
.live.[root@s390x-6s-v1 tps]# service gatherer status
gatherd (pid 51111) is running...
reposd (pid 51121) is running...
.live.[root@s390x-6s-v1 tps]# rpm -q sblim-gather
sblim-gather-2.2.3-1.el6.s390x



But then, after a while, the daemon suddenly died:

.live.[root@s390x-6s-v1 tps]# service gatherer status
gatherd dead but subsys locked
reposd (pid 51121) is running...


Now, how long exactly do I have to wait, or what action should I take, to be sure that the new version doesn't crash?

I tried running the checks periodically, and it seems the daemon dies when the system clock reaches a new minute (0 seconds). For example:

Sep  8 12:10:00 x86-64-6s-m1 kernel: gatherd[13777]: segfault at 0 ip 00007f385b0622ba sp 00007f385a453cc0 error 6 in libmetricUnixProcess.so[7f385b060000+3000]

Is that the trigger, and is it then enough to wait 61 seconds (in the worst case)?
Comment 5 Vitezslav Crhonek 2011-09-12 07:06:25 EDT
(In reply to comment #4)
> 
> now how long exactly I have to wait, or what action should I take, to be sure
> that the new version doesn't crash?
> 
> I tried to run the checks periodically and it seems the daemon dies when the
> system clock hits a new minute (0 seconds) - for example:
> 
> Sep  8 12:10:00 x86-64-6s-m1 kernel: gatherd[13777]: segfault at 0 ip
> 00007f385b0622ba sp 00007f385a453cc0 error 6 in
> libmetricUnixProcess.so[7f385b060000+3000]
> 
> - is that the reason, is it enough to wait 61 seconds (in the worst case) then?

The sampling function that caused the segfault (metricRetrCPUTime) is called by the daemon every 60 seconds, so waiting 61 seconds should be enough (or longer, if you want to see it survive more iterations).
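The verification described above could be automated roughly as follows. This is only a sketch under stated assumptions: the helper names (`gatherd_is_running`, `read_service_status`, `survives_sampling`) are hypothetical, the status strings are taken from the transcripts in this report, and the 61-second wait is the value suggested in this comment. The status reader and the sleep function are passed as parameters so the logic can be exercised without a running daemon.

```python
import re
import subprocess
import time

# "gatherd (pid 16358) is running..." is the healthy status line in the transcripts;
# "gatherd dead but subsys locked" and "gatherd is stopped" do not match.
RUNNING_RE = re.compile(r"^gatherd \(pid \d+\) is running", re.MULTILINE)


def gatherd_is_running(status_output):
    """Return True if the status text reports gatherd as running."""
    return RUNNING_RE.search(status_output) is not None


def read_service_status():
    """Run the same status command used in this report and return its output."""
    result = subprocess.run(
        ["service", "gatherer", "status"], capture_output=True, text=True
    )
    return result.stdout


def survives_sampling(read_status=read_service_status, iterations=2,
                      wait_seconds=61, sleep=time.sleep):
    """Check that gatherd stays up across `iterations` sampling intervals."""
    if not gatherd_is_running(read_status()):
        return False
    for _ in range(iterations):
        # Wait slightly longer than the 60-second sampling period.
        sleep(wait_seconds)
        if not gatherd_is_running(read_status()):
            return False
    return True


if __name__ == "__main__":
    print("gatherd survived" if survives_sampling() else "gatherd died or is not running")
```

Raising `iterations` corresponds to watching the daemon survive more sampling cycles before treating the fix as verified.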
Comment 7 errata-xmlrpc 2011-12-06 06:57:00 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1593.html
