Bug 586752 - dlm_controld.pcmk segfault on early startup
Product: Fedora
Classification: Fedora
Component: pacemaker
Hardware: All Linux
Priority: low, Severity: medium
Assigned To: Andrew Beekhof
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-04-28 06:12 EDT by Oliver Heinz
Modified: 2011-01-31 11:42 EST
Doc Type: Bug Fix
Last Closed: 2011-01-31 11:42:32 EST

Attachments
Andrew Beekhof's patch to fix this issue (417 bytes, patch)
2010-04-28 06:12 EDT, Oliver Heinz

Description Oliver Heinz 2010-04-28 06:12:15 EDT
Created attachment 409748
Andrew Beekhof's patch to fix this issue

Description of problem:
dlm_controld.pcmk segfaults on startup if the network uses VLAN, bonding, or bridging and corosync/pacemaker is started too early.

Version-Release number of selected component (if applicable):
Bug and patch tested on the 3.0.7 Ubuntu Lucid packages.

How reproducible:
Configure any of the above on top of the raw interface and start corosync before the network settles.

Additional info:
The issue is discussed here: http://oss.clusterlabs.org/pipermail/pacemaker/2010-April/005954.html
Andrew Beekhof <andrew@beekhof.net> posted the attached patch that fixes this issue.

gdb output is:
Core was generated by `dlm_controld.pcmk -q 0'.
Program terminated with signal 11, Segmentation fault.
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
        in ../sysdeps/x86_64/multiarch/../strlen.S
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
#1  0x00007f499565cd46 in *__GI___strdup (s=0x0) at strdup.c:42
#2  0x0000000000403f0c in dlm_process_node (key=<value optimized out>, value=0x1864a30, user_data=0x62a4f8) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:136
#3  0x00007f4995cdbd73 in IA__g_hash_table_foreach (hash_table=0x1866050, func=0x403e40 <dlm_process_node>, user_data=0x62a4f8) at /build/buildd/glib2.0-2.24.0/glib/ghash.c:1325
#4  0x0000000000403c9e in update_cluster () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:82
#5  0x0000000000415a4a in loop () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:986
#6  0x000000000041659c in main (argc=<value optimized out>, argv=<value optimized out>) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:1295

Comment 1 Andrew Beekhof 2010-04-28 08:08:13 EDT
Patch fa24b46 resolving this issue has been committed in cluster.git

Essentially, the dlm was trying to create a configfs entry for a node with no address.
This led to a NULL pointer being dereferenced and the dlm crashing.

The above-mentioned patch now checks for a valid address before continuing.
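
For illustration only, the kind of guard described above might look like the following minimal sketch. This is not the committed patch: struct node, copy_node_addr, and process_nodes are hypothetical names invented for this example, and the real code iterates a GLib hash table rather than an array. It only shows the general idea of checking for a missing address before calling strdup(), since strdup(NULL) is what crashed in frame #1 of the backtrace.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cluster membership entry; this is NOT the
 * real pacemaker/dlm structure. */
struct node {
    unsigned int id;
    const char *addr;   /* may still be NULL during early startup */
};

/* Returns a heap copy of the node address, or NULL when the node has no
 * address yet. Passing NULL to strdup() is undefined behaviour, so the
 * check has to happen before the call. */
char *copy_node_addr(const struct node *n)
{
    if (n == NULL || n->addr == NULL) {
        return NULL;    /* nothing to register for this node yet */
    }
    return strdup(n->addr);
}

/* Walks the node array and returns how many entries had a usable address.
 * Nodes without an address are skipped instead of crashing the daemon. */
int process_nodes(const struct node *nodes, size_t count)
{
    int registered = 0;
    for (size_t i = 0; i < count; i++) {
        char *addr = copy_node_addr(&nodes[i]);
        if (addr == NULL) {
            fprintf(stderr, "node %u has no address yet, skipping\n",
                    nodes[i].id);
            continue;
        }
        /* A real daemon would create the configfs entry at this point. */
        printf("node %u -> %s\n", nodes[i].id, addr);
        free(addr);
        registered++;
    }
    return registered;
}
```

Calling process_nodes() on a list in which one node still has a NULL address skips that node and counts only the others, so a later membership update can pick it up once the network has settled.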
Comment 2 Andrew Beekhof 2010-04-29 09:22:44 EDT
Sorry, set the wrong status.
Comment 3 Bug Zapper 2010-07-30 07:29:34 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:
