Bug 1136166

Summary: pmfind segfaults when avahi not running
Product: [Fedora] Fedora EPEL
Reporter: Marko Myllynen <myllynen>
Component: pcp
Assignee: Dave Brolley <brolley>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: el6
CC: brolley, fche, mgoodwin, nathans, pcp, scox
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcp-3.10.2-2.el5
Doc Type: Bug Fix
Last Closed: 2014-09-27 10:07:15 UTC
Type: Bug

Description Marko Myllynen 2014-09-02 05:27:09 UTC
Description of problem:
localhost:~> service avahi-daemon status
avahi-daemon is stopped
zsh: exit 3     service avahi-daemon status
localhost:~> pmfind -s pmcd                   
Discovered pmcd servers:
zsh: segmentation fault  pmfind -s pmcd
localhost:~> 

Version-Release number of selected component (if applicable):
3.9.9

Comment 1 Nathan Scott 2014-09-02 05:33:38 UTC
Any stack trace handy there Marko?  That might speed up the diagnosis process.

Could you take a look please, Dave?  (Feel free to toss it back to me if there is
insufficient time before week's end.)

Thanks.

Comment 2 Marko Myllynen 2014-09-02 05:41:16 UTC
(In reply to Nathan Scott from comment #1)
> Any stack trace handy there Marko?  That might speed up the diagnosis
> process.

Ok, I'll attach one from my test system as a private attachment.

Thanks.

Comment 4 Dave Brolley 2014-09-02 19:36:58 UTC
I am, unfortunately, unable to reproduce this on any of my systems.

The segfault occurs while pmfind is trying to print the list of discovered servers, which should be empty given that avahi-daemon is not running. This suggests that __pmDiscoverServicesWithOptions() somehow returned a positive result. In an error situation, the result should be negative.

Since no mechanism was specified, a NULL mechanism would have been passed to __pmDiscoverServicesWithOptions() which would cause it to attempt both the "avahi" and the "probe" mechanisms. When presented with a NULL mechanism, __pmProbeDiscoverServices() returns zero, since no network has been specified. Therefore __pmAvahiDiscoverServices() must have returned the positive result. I see no obvious initialization problems nor do I see an obvious point at which a positive error code is produced.

Is there some way we can have access to a machine on which this failure can be reproduced?

Comment 5 Dave Brolley 2014-09-02 19:38:02 UTC
I should also add that I was unable to examine the supplied core file using either crash or gdb.

Comment 6 Marko Myllynen 2014-09-03 06:10:26 UTC
When running with gdb -args pmfind -s pmcd I get:

...
(gdb) run
Starting program: /usr/bin/pmfind -s pmcd
[Thread debugging using libthread_db enabled]
Discovered pmcd servers:

Program received signal SIGSEGV, Segmentation fault.
discovery (spec=<value optimized out>) at pmfind.c:158
158		    printf("  %s\n", urls[i]);
Missing separate debuginfos
(gdb) bt full
#0  discovery (spec=<value optimized out>) at pmfind.c:158
        i = <value optimized out>
        sts = 26
        urls = 0x0
#1  0x0000000000401040 in main (argc=3, argv=0x7fffffffdf78) at pmfind.c:214
        service = 0x7fffffffe361 "pmcd"
        c = <value optimized out>
        sts = <value optimized out>
        total = <value optimized out>
(gdb)
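
The backtrace shows a positive count (sts = 26) paired with a NULL urls pointer, so indexing urls[i] faults on the first iteration. A minimal defensive sketch of that situation follows; print_urls() is a hypothetical helper written for illustration and is not the code in pmfind.c.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the pmfind.c code: print `sts` URLs, but refuse
 * to index the array when the count and the pointer disagree.
 * Returns the number of URLs printed, 0 when there is nothing to print,
 * or -1 when a positive count arrives without a list.
 */
static int print_urls(int sts, char **urls)
{
    int i;

    if (sts <= 0)
        return 0;               /* error status or nothing discovered */
    if (urls == NULL)
        return -1;              /* positive count but no list: the crash case */
    for (i = 0; i < sts; i++)
        printf("  %s\n", urls[i]);
    return sts;
}
```

With the values from the gdb session, print_urls(26, NULL) returns -1 instead of dereferencing a NULL pointer. A guard like this only hides the symptom; the count itself is still wrong.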

Comment 7 Nathan Scott 2014-09-03 06:25:26 UTC
Looks like fallout from the time an error code was accidentally being returned as a positive instead of a negative value, Dave?  Marko's running 3.9.9 so I'm hoping this may be something you've fixed already.

Comment 8 Dave Brolley 2014-09-03 16:04:27 UTC
I think I've figured this out.

In __pmAvahiDiscoverServices(), the error code is sometimes set to an avahi error code, either by calling avahi_client_errno() or as set by avahi_client_new().

When __pmAvahiDiscoverServices() returns the error code, it assumes that it is positive and negates it. However, it turns out that the avahi error codes are already negative, so we end up returning a positive value which pmfind(1) interprets as the number of urls discovered.

In Marko's gdb session, we see that sts == 26. It turns out that -26 is the avahi error code AVAHI_ERR_NO_DAEMON, as is expected in this scenario.
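
The sign mix-up described above can be sketched as follows. This is a minimal illustration and not the actual patch: the function names and the SKETCH_ERR_NO_DAEMON macro are hypothetical, and only the value -26 is taken from this report.

```c
/* Hypothetical stand-in for AVAHI_ERR_NO_DAEMON; this report gives its value as -26. */
#define SKETCH_ERR_NO_DAEMON (-26)

/*
 * Buggy convention: assumes the library error code is positive and negates it.
 * With an already-negative code the result is positive, which a caller
 * then reads as a count of discovered URLs.
 */
static int discover_buggy(int lib_err)
{
    return -lib_err;
}

/*
 * One possible fix (an assumption, not necessarily what the commit does):
 * normalize the code so that an error is never returned as a positive value,
 * whatever sign the library used.
 */
static int discover_fixed(int lib_err)
{
    return lib_err > 0 ? -lib_err : lib_err;
}
```

Here discover_buggy(SKETCH_ERR_NO_DAEMON) yields 26, matching sts in the gdb session, while discover_fixed(SKETCH_ERR_NO_DAEMON) stays at -26.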

Comment 9 Dave Brolley 2014-09-03 16:36:58 UTC
commit 5c7e21e64c6cd1c40dfcbfc7a58678ee37a3f6b3 on the brolley/dev branch of the pcpfans repository

Comment 10 Fedora Update System 2014-09-05 08:53:50 UTC
pcp-3.9.10-1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/pcp-3.9.10-1.fc21

Comment 11 Fedora Update System 2014-09-05 08:54:39 UTC
pcp-3.9.10-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/pcp-3.9.10-1.fc20

Comment 12 Fedora Update System 2014-09-05 08:55:30 UTC
pcp-3.9.10-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/pcp-3.9.10-1.fc19

Comment 13 Fedora Update System 2014-09-05 08:56:24 UTC
pcp-3.9.10-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/pcp-3.9.10-1.el6

Comment 14 Fedora Update System 2014-09-05 08:57:16 UTC
pcp-3.9.10-1.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/pcp-3.9.10-1.el5

Comment 15 Fedora Update System 2014-09-06 00:59:54 UTC
Package pcp-3.9.10-1.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing pcp-3.9.10-1.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-10197/pcp-3.9.10-1.fc21
then log in and leave karma (feedback).

Comment 16 Marko Myllynen 2014-09-08 05:17:48 UTC
Segfault is gone in 3.9.10 but I think the message is rather misleading. But if you think it's good enough, please feel free to close this one.

localhost:~> pmfind -s pmcd
pmfind: service pmcd discovery failure: Text file busy
zsh: exit 2     pmfind -s pmcd
localhost:~>
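
A plausible explanation for the wording, not confirmed in this report: once the avahi code -26 is passed through unchanged, anything that formats the status as a negated errno looks up errno 26, which on Linux is ETXTBSY ("Text file busy"). The sketch below assumes that interpretation; status_text() is a hypothetical helper, not a PCP function.

```c
#include <string.h>

/*
 * Hypothetical helper: treat a negative status as a negated errno and
 * return the C library's message for it. An avahi code of -26 is not an
 * errno at all, so the text it maps to is unrelated to the real failure.
 */
static const char *status_text(int sts)
{
    return strerror(-sts);
}
```

On a Linux system, status_text(-26) gives the same "Text file busy" text that pmfind printed above.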

Comment 17 Dave Brolley 2014-09-08 14:23:24 UTC
Yes, the message is misleading. A separate bug report for that would probably be best.

Comment 18 Marko Myllynen 2014-09-09 07:14:19 UTC
(In reply to Dave Brolley from comment #17)
> Yes, the message is misleading. A separate bug report for that would
> probably be best.

Ok, I just filed https://bugzilla.redhat.com/show_bug.cgi?id=1139529.

Thanks.

Comment 19 Fedora Update System 2014-09-09 22:11:12 UTC
Package pcp-3.9.10-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing pcp-3.9.10-1.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-10376/pcp-3.9.10-1.fc20
then log in and leave karma (feedback).

Comment 20 Fedora Update System 2014-09-15 16:00:13 UTC
pcp-3.9.10-4.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/pcp-3.9.10-4.fc19

Comment 21 Fedora Update System 2014-09-15 16:00:53 UTC
pcp-3.9.10-4.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/pcp-3.9.10-4.fc20

Comment 22 Fedora Update System 2014-09-15 16:01:33 UTC
pcp-3.9.10-4.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/pcp-3.9.10-4.fc21

Comment 23 Fedora Update System 2014-09-27 10:07:15 UTC
pcp-3.9.10-4.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2014-10-31 08:02:16 UTC
pcp-3.10.0-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/pcp-3.10.0-1.fc20

Comment 25 Fedora Update System 2014-10-31 08:03:28 UTC
pcp-3.10.0-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/pcp-3.10.0-1.fc19

Comment 26 Fedora Update System 2014-11-10 18:27:12 UTC
pcp-3.10.0-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 27 Fedora Update System 2014-11-10 18:27:44 UTC
pcp-3.10.0-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 28 Fedora Update System 2015-01-26 17:10:01 UTC
pcp-3.10.2-1.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/pcp-3.10.2-1.el5

Comment 29 Fedora Update System 2015-02-04 21:31:14 UTC
pcp-3.10.2-2.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/pcp-3.10.2-2.el5

Comment 30 Fedora Update System 2015-02-20 20:41:50 UTC
pcp-3.10.2-2.el5 has been pushed to the Fedora EPEL 5 stable repository.  If problems still persist, please make note of it in this bug report.