Bug 459585 - dlm_recoverd in D state when using IPv6 to communicate between nodes
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: David Teigland
QA Contact: Martin Jenner
Reported: 2008-08-20 08:53 EDT by Fabio Massimo Di Nitto
Modified: 2009-01-20 15:19 EST
Doc Type: Bug Fix
Last Closed: 2009-01-20 15:19:47 EST

Attachments
debug info from rhel5-1 node (2.35 KB, text/plain)
2008-08-21 03:27 EDT, Fabio Massimo Di Nitto
debug info from rhel5-2 node (2.17 KB, text/plain)
2008-08-21 03:28 EDT, Fabio Massimo Di Nitto

Description Fabio Massimo Di Nitto 2008-08-20 08:53:28 EDT
Version-Release number of selected component (if applicable):

kernel 2.6.18-92.1.10.el5

How reproducible:

always

Steps to Reproduce:
1. set up the nodes to communicate over IPv6
2. start cman
3. try to start rgmanager (with a configured service) on both nodes or try to perform a gfs/gfs2 mount operation on both nodes (it does not need to be at the same time).

Actual results:

dlm_recoverd is in D state. The invoking process cannot be killed.

Additional info:

I have not had a chance to verify whether this problem is strictly related to the kernel side or the userland side of dlm.
Comment 1 David Teigland 2008-08-20 10:44:40 EDT
Let's reduce the steps above to just the following:
0. setup ipv6
1. make sure <dlm log_debug="1"/> is in cluster.conf
2. ccsd
3. cman_tool join
4. cman_tool nodes -a
5. fenced
6. fence_tool join
7. dlm_controld -D  (doesn't fork)
8. dlm_tool join test

Initially, I'm guessing that the problem is in dlm_controld, and we'll
want to look at the function add_configfs_node() under "set the address".
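
The reduced steps above can be sketched as a shell script. This is a minimal sketch rather than a tested procedure: the run helper and the DRY_RUN variable are hypothetical names introduced here, and steps 0 and 1 (IPv6 setup and the <dlm log_debug="1"/> entry in cluster.conf) are assumed to have been done by hand. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of reduced steps 2-8. DRY_RUN is a hypothetical knob:
# 1 (default) prints each command, 0 actually runs it on a test node.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_steps() {
    run ccsd               || return 1   # step 2
    run cman_tool join     || return 1   # step 3
    run cman_tool nodes -a || return 1   # step 4
    run fenced             || return 1   # step 5
    run fence_tool join    || return 1   # step 6
    # Step 7: dlm_controld -D does not fork, so it is backgrounded here.
    # A real run may need to wait until the daemon is ready before step 8.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ dlm_controld -D &"
    else
        dlm_controld -D &
    fi
    run dlm_tool join test               # step 8
}

reproduce_steps
```

Running it with DRY_RUN=0 on each node in turn would correspond to executing step 8 first on one node and then on the other.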
Comment 2 Fabio Massimo Di Nitto 2008-08-21 03:27:06 EDT
I was able to reproduce it with this reduced test case.

0. done
1. done
1b. mounted configfs and modprobed kernel modules from a clean boot.
2. done
3. done
4a. cman_tool status to verify ipv6 connectivity.
4b. cman_tool nodes -a done
4c. groupd: done
5. done
6. done
7. done
8. executed first on rhel5-1 and then on rhel5-2

Output from 4a, 4b, fence_tool dump and 7 from both nodes is in the attachments.

When running 8. on the second node, dlm_recoverd is in D state.
Comment 3 Fabio Massimo Di Nitto 2008-08-21 03:27:47 EDT
Created attachment 314688
debug info from rhel5-1 node
Comment 4 Fabio Massimo Di Nitto 2008-08-21 03:28:14 EDT
Created attachment 314689
debug info from rhel5-2 node
Comment 5 David Teigland 2008-08-21 15:57:37 EDT
Those logs all look fine.  Does ps ax -o pid,stat,cmd,wchan show
dlm_controld blocked on anything specific?  Does anything appear in
/var/log/messages, especially dlm messages, esp "connecting to" messages?
I wonder if the call to bind() in Lon's recent patch could have anything
to do with it?
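
To narrow the ps output down to blocked processes, a small filter can help. This is only a sketch: d_state_only is a hypothetical helper name, and it assumes the column order pid,stat,cmd,wchan from the command above, so that the second field is the process state.

```shell
# Keep the header line plus every process whose STAT field starts with "D"
# (uninterruptible sleep). The WCHAN column of the surviving lines shows
# the kernel function the process is sleeping in.
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on a cluster node (commented out so the sketch has no side effects):
# ps ax -o pid,stat,cmd,wchan | d_state_only
```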
Comment 6 Fabio Massimo Di Nitto 2008-08-22 05:04:28 EDT
ahh good catch:

repeating the exact same steps as above, /var/log/messages:

From node1:
Aug 22 10:59:33 rhel5-1 kernel: dlm: Using TCP for communications
Aug 22 10:59:39 rhel5-1 kernel: dlm: connecting to 2
Aug 22 10:59:39 rhel5-1 kernel: dlm: connect from non cluster node

From node2:
Aug 22 11:02:31 rhel5-2 kernel: dlm: Using TCP for communications
Aug 22 11:02:31 rhel5-2 kernel: dlm: connecting to 1
Aug 22 11:02:31 rhel5-2 kernel: dlm: connect from non cluster node
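
In both logs, "connecting to N" is followed by "connect from non cluster node", which suggests each node rejects the incoming connection from its peer. A rough way to pull these lines out of a log is sketched below; the two function names are hypothetical, and the match strings are taken verbatim from the messages quoted above.

```shell
# Print only the kernel dlm lines from a syslog stream on stdin.
dlm_kernel_lines() {
    grep 'kernel: dlm:'
}

# Count rejected peer connections in a syslog stream on stdin.
# grep -c exits non-zero when the count is 0, so the status is masked.
count_rejected_connects() {
    grep -c 'dlm: connect from non cluster node' || true
}

# Usage on a cluster node (commented out so the sketch has no side effects):
# dlm_kernel_lines < /var/log/messages
# count_rejected_connects < /var/log/messages
```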
Comment 7 David Teigland 2008-08-28 13:39:32 EDT
patch for this seems to work, queueing it for upstream 2.6.28
Comment 8 RHEL Product and Program Management 2008-08-28 13:48:29 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 9 David Teigland 2008-09-03 10:41:46 EDT
posted to rhkernel

Date: Wed, 3 Sep 2008 09:02:21 -0500
From: David Teigland <teigland@redhat.com>
To: rhkernel-list@redhat.com
Subject: [RHEL5.3 PATCH] dlm: fix address compare
Message-ID: <20080903140221.GD22775@redhat.com>
Comment 10 Don Zickus 2008-09-11 15:44:09 EDT
in kernel-2.6.18-111.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 14 errata-xmlrpc 2009-01-20 15:19:47 EST
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
