Created attachment 409748
Andrew Beekhof's patch to fix this issue

Description of problem:
dlm_controld.pcmk segfaults on startup if the network uses VLAN, bonding or bridging and corosync/pacemaker is started too early.

Version-Release number of selected component (if applicable):
Bug and patch tested on 3.0.7 Ubuntu Lucid packages.

How reproducible:
Configure any of the above on top of the raw interface and start corosync before the network settles.

Additional info:
The issue is discussed here:
http://oss.clusterlabs.org/pipermail/pacemaker/2010-April/005954.html
Andrew Beekhof <andrew> posted the attached patch that fixes this issue.

gdb output is:

Core was generated by `dlm_controld.pcmk -q 0'.
Program terminated with signal 11, Segmentation fault.
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
        in ../sysdeps/x86_64/multiarch/../strlen.S
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
#1  0x00007f499565cd46 in *__GI___strdup (s=0x0) at strdup.c:42
#2  0x0000000000403f0c in dlm_process_node (key=<value optimized out>, value=0x1864a30, user_data=0x62a4f8) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:136
#3  0x00007f4995cdbd73 in IA__g_hash_table_foreach (hash_table=0x1866050, func=0x403e40 <dlm_process_node>, user_data=0x62a4f8) at /build/buildd/glib2.0-2.24.0/glib/ghash.c:1325
#4  0x0000000000403c9e in update_cluster () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/pacemaker.c:82
#5  0x0000000000415a4a in loop () at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:986
#6  0x000000000041659c in main (argc=<value optimized out>, argv=<value optimized out>) at /usr/src/packages/redhat-cluster/3.0.7/redhat-cluster-3.0.7/group/dlm_controld/main.c:1295

hth,
Oliver
Patch fa24b46 resolving this issue has been committed in cluster.git:
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=fa24b460c51aa0c47d0842703feea8bca0ed66b7

Essentially, the dlm was trying to create a configfs entry for a node with no address. This led to a NULL pointer being dereferenced (the strdup(s=0x0) call in frame #1 of the backtrace) and the dlm crashing. The above-mentioned patch now checks for a valid address before continuing.
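
For anyone reading this without the diff to hand, here is a minimal sketch of the kind of guard described above. It is not the committed patch: struct example_node and example_copy_node_addr() are hypothetical names made up for illustration, and the real code in pacemaker.c works on pacemaker's own node structure. The only point it makes is that the address is tested for NULL before it is handed to strdup().

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict -std= modes */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cluster node record (an assumption for this
 * sketch, not the structure used by dlm_controld). */
struct example_node {
    int id;
    const char *uname;
    const char *addr;  /* NULL until the membership layer knows an address */
};

/*
 * Return a heap-allocated copy of the node's address, or NULL if the node
 * has no address yet. Passing NULL to strdup() is undefined behaviour, so
 * the address is checked first and the node is skipped instead.
 * The caller owns the returned string and must free() it.
 */
char *example_copy_node_addr(const struct example_node *node)
{
    assert(node != NULL);

    if (node->addr == NULL) {
        fprintf(stderr, "node %d has no address yet, skipping\n", node->id);
        return NULL;
    }

    return strdup(node->addr);
}
```

With a guard like this, a node that shows up in the membership before the network has settled is simply skipped on that pass rather than crashing the daemon. Whether it is picked up again on a later membership update depends on the caller, which this sketch does not model.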
Sorry, set the wrong status.
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping