Bug 735109

Summary: gatherd: segfault in libmetricUnixProcess.so
Product: Red Hat Enterprise Linux 6
Component: sblim-gather
Version: 6.2
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Target Milestone: rc
Keywords: TestOnly
Reporter: Milos Malik <mmalik>
Assignee: Vitezslav Crhonek <vcrhonek>
QA Contact: qe-baseos-daemons
CC: azelinka, kvolny, rvokal
Fixed In Version: sblim-gather-2.2.3-2.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 11:57:00 UTC

Description Milos Malik 2011-09-01 14:09:37 UTC

Description of problem:


Version-Release number of selected component (if applicable):
sblim-gather-2.2.3-1.el6

How reproducible:
always

Steps to Reproduce:
# service gatherer status
gatherd is stopped
reposd is stopped
# service gatherer start
Starting gatherd:                                          [  OK  ]
Starting reposd:                                           [  OK  ]
# service gatherer status
gatherd (pid 16358) is running...
reposd (pid 16363) is running...
# gatherctl
help
	h		print this help message
	s		status
	i		init
	t		terminate
	b		start sampling
	e		stop sampling
	l plugin	load plugin
	u plugin	unload plugin
	v plugin	view/list metrics for plugin
	q		quit
	k		kill daemon
	d		start daemon
	c		local trace
s
Status initialized and sampling, 8 plugins and 20 metrics. 
s
Daemon not reachable.
q
# service gatherer status
gatherd dead but subsys locked
reposd (pid 16363) is running...
# 

Actual results:
gatherd[14802]: segfault at 0 ip 0052d5e2 sp b6e66200 error 6 in libmetricUnixProcess.so[52b000+3000]
gatherd[16059]: segfault at 0 ip 005b55e2 sp b6d97200 error 6 in libmetricUnixProcess.so[5b3000+3000]
gatherd[16370]: segfault at 0 ip 001645e2 sp b6e48200 error 6 in libmetricUnixProcess.so[162000+3000]

Expected results:
* no segfaults
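
Triage note: the three crashes under "Actual results" have different ip values only because the library was mapped at a different base each time. Subtracting the mapping base (the number after "[") from ip gives the offset inside libmetricUnixProcess.so. The sketch below does that subtraction, assuming the usual kernel message layout "... ip <hex> ... in <lib>[<base>+<size>]". The function name fault_offset is made up for this example.

```shell
# Print the faulting offset inside the mapped library for one kernel
# "segfault at ..." line passed as the first argument.
# fault_offset is a hypothetical helper, not part of sblim-gather.
fault_offset() {
    line=$1
    # Instruction pointer: the hex word that follows " ip ".
    ip=${line#* ip }
    ip=${ip%% *}
    # Mapping base: the hex word between the last "[" and the "+".
    base=${line##*\[}
    base=${base%%+*}
    printf '0x%x\n' "$(( 0x$ip - 0x$base ))"
}

fault_offset 'gatherd[14802]: segfault at 0 ip 0052d5e2 sp b6e66200 error 6 in libmetricUnixProcess.so[52b000+3000]'
fault_offset 'gatherd[16059]: segfault at 0 ip 005b55e2 sp b6d97200 error 6 in libmetricUnixProcess.so[5b3000+3000]'
fault_offset 'gatherd[16370]: segfault at 0 ip 001645e2 sp b6e48200 error 6 in libmetricUnixProcess.so[162000+3000]'
```

All three lines reduce to 0x25e2, which suggests the same instruction faults every time. With matching debuginfo installed, that offset could be handed to a tool such as addr2line to find the source line.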

Comment 4 Karel Volný 2011-09-08 16:14:32 UTC

I need to know the exact condition that triggers the segfault. At first it seemed I could not reproduce it:

.live.[root@s390x-6s-v1 tps]# service gatherer status
gatherd is stopped
reposd is stopped
.live.[root@s390x-6s-v1 tps]# service gatherer start
Starting gatherd: [  OK  ]
Starting reposd: [  OK  ]
.live.[root@s390x-6s-v1 tps]# service gatherer status
gatherd (pid 51111) is running...
reposd (pid 51121) is running...
.live.[root@s390x-6s-v1 tps]# gatherctl
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
s
Status initialized and sampling, 12 plugins and 24 metrics. 
q
.live.[root@s390x-6s-v1 tps]# service gatherer status
gatherd (pid 51111) is running...
reposd (pid 51121) is running...
.live.[root@s390x-6s-v1 tps]# rpm -q sblim-gather
sblim-gather-2.2.3-1.el6.s390x



But then, after a while, the daemon suddenly died:

.live.[root@s390x-6s-v1 tps]# service gatherer status
gatherd dead but subsys locked
reposd (pid 51121) is running...


So how long exactly do I have to wait, or what action should I take, to be sure that the new version doesn't crash?

I tried running the checks periodically, and it seems the daemon dies when the system clock hits a new minute (0 seconds). For example:

Sep  8 12:10:00 x86-64-6s-m1 kernel: gatherd[13777]: segfault at 0 ip 00007f385b0622ba sp 00007f385a453cc0 error 6 in libmetricUnixProcess.so[7f385b060000+3000]

Is that the reason? If so, is it enough to wait 61 seconds (in the worst case)?

Comment 5 Vitezslav Crhonek 2011-09-12 11:06:25 UTC

(In reply to comment #4)
> 
> now how long exactly I have to wait, or what action should I take, to be sure
> that the new version doesn't crash?
> 
> I tried to run the checks periodically and it seems the daemon dies when the
> system clock hits a new minute (0 seconds) - for example:
> 
> Sep  8 12:10:00 x86-64-6s-m1 kernel: gatherd[13777]: segfault at 0 ip
> 00007f385b0622ba sp 00007f385a453cc0 error 6 in
> libmetricUnixProcess.so[7f385b060000+3000]
> 
> - is that the reason, is it enough to wait 61 seconds (in the worst case) then?

The sampling function (metricRetrCPUTime) that caused the segfault is called by the daemon every 60 seconds, so waiting 61 seconds should be enough (or longer if you want to see it survive more iterations).
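
A verification run along those lines could look like the sketch below. It is only an illustration: watch_daemon is a hypothetical helper, and checking for the process with pgrep -x is an assumption of this example rather than the method used for the erratum (the exit code of "service gatherer status" with one of the two daemons dead was not confirmed here).

```shell
# Run a liveness check once per interval and stop at the first failure.
# $1 = number of checks, $2 = seconds between checks, rest = check command.
# watch_daemon is a hypothetical helper written for this example.
watch_daemon() {
    checks=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$checks" ]; do
        sleep "$interval"
        # The check command must exit non-zero when the daemon is gone.
        if ! "$@" >/dev/null 2>&1; then
            echo "daemon check failed after $(( (i + 1) * interval )) seconds"
            return 1
        fi
        i=$(( i + 1 ))
    done
    echo "daemon survived $checks checks"
}

# Example, after "service gatherer start": cover three sampling iterations.
#   watch_daemon 3 61 pgrep -x gatherd
```

Using 61 seconds per check means each check falls after one more 60-second sampling call, so three checks cover three iterations.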

Comment 7 errata-xmlrpc 2011-12-06 11:57:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1593.html