Bug 586752 - dlm_controld.pcmk segfault on early startup
Summary: dlm_controld.pcmk segfault on early startup
Alias: None
Product: Fedora
Classification: Fedora
Component: pacemaker
Version: 14
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Andrew Beekhof
QA Contact: Fedora Extras Quality Assurance
Depends On:
TreeView+ depends on / blocked
Reported: 2010-04-28 10:12 UTC by Oliver Heinz
Modified: 2011-01-31 16:42 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-01-31 16:42:32 UTC
Type: ---

Attachments (Terms of Use)
Andrew Beekhof's patch to fix this issue (417 bytes, patch)
2010-04-28 10:12 UTC, Oliver Heinz
no flags Details | Diff

Description Oliver Heinz 2010-04-28 10:12:15 UTC
Created attachment 409748 [details]
Andrew Beekhof's patch to fix this issue

Description of problem:
dlm_controld.pcmk segfaults on startup if network uses vlan, bonding or bridging and corosync/pacemaker is invoked too early

Version-Release number of selected component (if applicable):
bug and patch testet on 3.0.7 ubuntu lucid packages

How reproducible:
Configure any of the obove on top of the raw interface and start corosync before the network settles.

Additional info:
The issue is discussed here http://oss.clusterlabs.org/pipermail/pacemaker/2010-April/005954.html
Andrew Beekhof <andrew@beekhof.net> posted the attached patch that fixes this issue.

gdb output is:
Core was generated by `dlm_controld.pcmk -q 0'.
Program terminated with signal 11, Segmentation fault.
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
        in ../sysdeps/x86_64/multiarch/../strlen.S
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
#1  0x00007f499565cd46 in *__GI___strdup (s=0x0) at strdup.c:42
#2  0x0000000000403f0c in dlm_process_node (key=<value optimized out>, value=0x1864a30, user_data=0x62a4f8) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:136
#3  0x00007f4995cdbd73 in IA__g_hash_table_foreach (hash_table=0x1866050, func=0x403e40 <dlm_process_node>, user_data=0x62a4f8) at /build/buildd/glib2.0-2.24.0/glib/ghash.c:1325
#4  0x0000000000403c9e in update_cluster () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:82
#5  0x0000000000415a4a in loop () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:986
#6  0x000000000041659c in main (argc=<value optimized out>, argv=<value optimized out>) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:1295


Comment 1 Andrew Beekhof 2010-04-28 12:08:13 UTC
Patch fa24b46 resolving this issue has been committed in cluster.git

Essentially, the dlm was trying to create a configfs entry for a node with no address.
This lead to a NULL pointer being dereferenced and the dlm crashing.

The above mentioned patch now checks for a valid address before continuing.

Comment 2 Andrew Beekhof 2010-04-29 13:22:44 UTC
Sorry, set the wrong status.

Comment 3 Bug Zapper 2010-07-30 11:29:34 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:

Note You need to log in before you can comment on or make changes to this bug.