586752 – dlm_controld.pcmk segfault on early startup

Summary: dlm_controld.pcmk segfault on early startup

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	pacemaker
Sub Component:
Version:	14
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Andrew Beekhof
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-04-28 10:12 UTC by Oliver Heinz
Modified:	2011-01-31 16:42 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-01-31 16:42:32 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Andrew Beekhof's patch to fix this issue (417 bytes, patch) 2010-04-28 10:12 UTC, Oliver Heinz	no flags

Description Oliver Heinz 2010-04-28 10:12:15 UTC

Created attachment 409748
Andrew Beekhof's patch to fix this issue

Description of problem:
dlm_controld.pcmk segfaults on startup if the network uses VLANs, bonding, or bridging and corosync/pacemaker is started too early, before the network has settled.

Version-Release number of selected component (if applicable):
Bug and patch tested on the 3.0.7 Ubuntu Lucid packages.

How reproducible:
Configure any of the above on top of the raw interface and start corosync before the network settles.

Additional info:
The issue is discussed here: http://oss.clusterlabs.org/pipermail/pacemaker/2010-April/005954.html
 
Andrew Beekhof <andrew> posted the attached patch that fixes this issue.


gdb output is:
Core was generated by `dlm_controld.pcmk -q 0'.
Program terminated with signal 11, Segmentation fault.
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
        in ../sysdeps/x86_64/multiarch/../strlen.S
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
#1  0x00007f499565cd46 in *__GI___strdup (s=0x0) at strdup.c:42
#2  0x0000000000403f0c in dlm_process_node (key=<value optimized out>, value=0x1864a30, user_data=0x62a4f8) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:136
#3  0x00007f4995cdbd73 in IA__g_hash_table_foreach (hash_table=0x1866050, func=0x403e40 <dlm_process_node>, user_data=0x62a4f8) at /build/buildd/glib2.0-2.24.0/glib/ghash.c:1325
#4  0x0000000000403c9e in update_cluster () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:82
#5  0x0000000000415a4a in loop () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:986
#6  0x000000000041659c in main (argc=<value optimized out>, argv=<value optimized out>) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:1295



hth,
Oliver

Comment 1 Andrew Beekhof 2010-04-28 12:08:13 UTC

Patch fa24b46 resolving this issue has been committed in cluster.git
   http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=fa24b460c51aa0c47d0842703feea8bca0ed66b7

Essentially, the dlm was trying to create a configfs entry for a node with no address.
This led to a NULL pointer being dereferenced and the dlm crashing.

The above-mentioned patch now checks for a valid address before continuing.
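
For readers without the diff at hand, here is a rough sketch of the kind of guard described above. It is not the committed patch and not the real dlm_controld code: `struct node_info`, its fields, and `copy_node_addr()` are hypothetical names invented for illustration. It only shows the pattern of refusing to call strdup() on a NULL address, which is the call that crashed in frame #1 of the backtrace (`s=0x0`).

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cluster member record (assumption, not the real type). */
struct node_info {
    unsigned int id;
    const char *uname;
    const char *addr;   /* may still be NULL while the network is settling */
};

/*
 * Copy the node's address into *out.
 * Returns 0 on success, or -1 if the node has no address yet
 * or the allocation fails. On failure *out is set to NULL.
 */
int copy_node_addr(const struct node_info *node, char **out)
{
    assert(out != NULL);
    *out = NULL;

    if (node == NULL || node->addr == NULL) {
        /* No valid address: skip this node instead of calling strdup(NULL). */
        fprintf(stderr, "node %u has no address yet, skipping\n",
                node ? node->id : 0u);
        return -1;
    }

    *out = strdup(node->addr);
    return (*out != NULL) ? 0 : -1;
}
```

A caller would skip the node when the function returns -1 and try again on a later membership update, once the address is known. Whether the real patch skips, retries, or logs is not stated in this report, so treat that part as an assumption.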

Comment 2 Andrew Beekhof 2010-04-29 13:22:44 UTC

Sorry, set the wrong status.

Comment 3 Bug Zapper 2010-07-30 11:29:34 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and the reason for this action are here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
