Bug 459585 - dlm_recoverd in D state when using IPv6 to communicate between nodes
Summary: dlm_recoverd in D state when using IPv6 to communicate between nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: David Teigland
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-08-20 12:53 UTC by Fabio Massimo Di Nitto
Modified: 2009-01-20 20:19 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 20:19:47 UTC
Target Upstream Version:
Embargoed:


Attachments
debug info from rhel5-1 node (2.35 KB, text/plain)
2008-08-21 07:27 UTC, Fabio Massimo Di Nitto
debug info from rhel5-2 node (2.17 KB, text/plain)
2008-08-21 07:28 UTC, Fabio Massimo Di Nitto


Links
Red Hat Product Errata RHSA-2009:0225
  Priority: normal
  Status: SHIPPED_LIVE
  Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update
  Last Updated: 2009-01-20 16:06:24 UTC

Description Fabio Massimo Di Nitto 2008-08-20 12:53:28 UTC
Version-Release number of selected component (if applicable):

kernel 2.6.18-92.1.10.el5

How reproducible:

always

Steps to Reproduce:
1. Set up the nodes to communicate over IPv6.
2. Start cman.
3. Try to start rgmanager (with a configured service) on both nodes, or try to perform a gfs/gfs2 mount on both nodes (the two nodes do not need to do this at the same time).

Actual results:

dlm_recoverd is stuck in D state (uninterruptible sleep), and the invoking process cannot be killed.

Additional info:

I was not able to verify whether this problem is strictly related to the kernel side or the userland side of dlm.

Comment 1 David Teigland 2008-08-20 14:44:40 UTC
Let's reduce the steps above to just the following:
0. setup ipv6
1. make sure <dlm log_debug="1"/> is in cluster.conf
2. ccsd
3. cman_tool join
4. cman_tool nodes -a
5. fenced
6. fence_tool join
7. dlm_controld -D  (doesn't fork)
8. dlm_tool join test

Initially, I'm guessing that the problem is in dlm_controld, and we'll
want to look at the function add_configfs_node() under "set the address".
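For reference, a minimal sketch of the "set the address" step in C, assuming the node address is carried as a struct sockaddr_storage (fill_ipv6_addr() is a hypothetical name, not a dlm_controld function). It zeroes the whole structure before filling in the family and the address, so the fields it does not set (sin6_port, sin6_flowinfo, sin6_scope_id, padding) have a known value:

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from dlm_controld): parse an IPv6 address
 * from text into a sockaddr_storage. Returns 0 on success, -1 if the
 * text is not a valid IPv6 address. */
static int fill_ipv6_addr(const char *text, struct sockaddr_storage *ss)
{
	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

	assert(text != NULL && ss != NULL);

	/* Zero everything first: port, flowinfo, scope id and padding
	 * are left as 0 instead of stack garbage. */
	memset(ss, 0, sizeof(*ss));
	sin6->sin6_family = AF_INET6;

	/* inet_pton() returns 1 only for a well-formed address. */
	if (inet_pton(AF_INET6, text, &sin6->sin6_addr) != 1)
		return -1;
	return 0;
}
```

If an address is built without that initial memset(), any later byte-for-byte comparison of the structure also compares whatever was left in the unset fields.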

Comment 2 Fabio Massimo Di Nitto 2008-08-21 07:27:06 UTC
I was able to reproduce it with this reduced test case.

0. done
1. done
1b. mounted configfs and modprobed kernel modules from a clean boot.
2. done
3. done
4a. cman_tool status to verify ipv6 connectivity.
4b. cman_tool nodes -a done
4c. groupd: done
5. done
6. done
7. done
8. executed first on rhel5-1 and then on rhel5-2

Output from 4a, 4b, fence_tool dump and 7 on both nodes is attached.

When running step 8 on the second node, dlm_recoverd goes into D state.

Comment 3 Fabio Massimo Di Nitto 2008-08-21 07:27:47 UTC
Created attachment 314688 [details]
debug info from rhel5-1 node

Comment 4 Fabio Massimo Di Nitto 2008-08-21 07:28:14 UTC
Created attachment 314689 [details]
debug info from rhel5-2 node

Comment 5 David Teigland 2008-08-21 19:57:37 UTC
Those logs all look fine.  Does ps ax -o pid,stat,cmd,wchan show
dlm_controld blocked on anything specific?  Does anything appear in
/var/log/messages, especially dlm messages such as "connecting to"?
I wonder if the call to bind() in Lon's recent patch could have anything
to do with it?
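The STAT column of ps is read from /proc on Linux. As a minimal sketch (the helper names are hypothetical), the state letter of a process such as dlm_recoverd can be taken from /proc/<pid>/stat, where 'D' means uninterruptible sleep:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the one-letter state ('R', 'S', 'D', ...)
 * from one line of /proc/<pid>/stat, or return '?' if it cannot be parsed.
 * The line looks like "4242 (dlm_recoverd) D 2 ...". The command name is
 * in parentheses and may itself contain spaces or ')', so the state is
 * taken from just after the LAST ')' on the line. */
static char proc_stat_state(const char *line)
{
	const char *p;

	assert(line != NULL);
	p = strrchr(line, ')');
	if (p == NULL || p[1] != ' ' || p[2] == '\0')
		return '?';
	return p[2];
}

/* Hypothetical helper: read /proc/<pid>/stat (Linux only) and return the
 * state letter, or '?' if the file cannot be read. */
static char pid_state(long pid)
{
	char path[64];
	char line[512];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/stat", pid);
	f = fopen(path, "r");
	if (f == NULL)
		return '?';
	if (fgets(line, sizeof(line), f) == NULL) {
		fclose(f);
		return '?';
	}
	fclose(f);
	return proc_stat_state(line);
}
```

Searching from the last ')' rather than splitting on spaces keeps the parse correct even for command names that contain spaces or parentheses.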

Comment 6 Fabio Massimo Di Nitto 2008-08-22 09:04:28 UTC
ahh good catch:

repeating the exact same steps as above, /var/log/messages:

From node1:
Aug 22 10:59:33 rhel5-1 kernel: dlm: Using TCP for communications
Aug 22 10:59:39 rhel5-1 kernel: dlm: connecting to 2
Aug 22 10:59:39 rhel5-1 kernel: dlm: connect from non cluster node

From node2:
Aug 22 11:02:31 rhel5-2 kernel: dlm: Using TCP for communications
Aug 22 11:02:31 rhel5-2 kernel: dlm: connecting to 1
Aug 22 11:02:31 rhel5-2 kernel: dlm: connect from non cluster node
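The "connect from non cluster node" message suggests that the address of the incoming connection did not match any configured node address. One way this can happen, offered only as an assumption since the report does not show the kernel code, is a byte-for-byte memcmp() of whole sockaddr_storage structures: two sockaddr_in6 values holding the same IPv6 address still differ if, for example, sin6_scope_id or the unused tail differs. The node table and both helpers below are hypothetical:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical node table: one configured address per cluster nodeid. */
struct node_addr {
	int nodeid;
	struct sockaddr_storage addr;
};

/* Naive compare (ASSUMED for illustration, not the dlm source): every
 * byte of the sockaddr_storage must match, including sin6_flowinfo,
 * sin6_scope_id and the unused tail of the structure. */
static int naive_same_addr(const struct sockaddr_storage *a,
			   const struct sockaddr_storage *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/* Return the nodeid whose configured address matches peer, or -1 when
 * nothing matches (the "connect from non cluster node" case). */
static int lookup_nodeid(const struct node_addr *nodes, size_t n,
			 const struct sockaddr_storage *peer)
{
	size_t i;

	assert(nodes != NULL && peer != NULL);
	for (i = 0; i < n; i++)
		if (naive_same_addr(&nodes[i].addr, peer))
			return nodes[i].nodeid;
	return -1;
}
```

With an identical copy of the configured address the lookup finds the node; changing only sin6_scope_id on the peer copy, while leaving the IPv6 address untouched, makes the same lookup return -1.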

Comment 7 David Teigland 2008-08-28 17:39:32 UTC
A patch for this seems to work; queueing it for upstream 2.6.28.

Comment 8 RHEL Program Management 2008-08-28 17:48:29 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 David Teigland 2008-09-03 14:41:46 UTC
posted to rhkernel

Date: Wed, 3 Sep 2008 09:02:21 -0500
From: David Teigland <teigland>
To: rhkernel-list
Subject: [RHEL5.3 PATCH] dlm: fix address compare
Message-ID: <20080903140221.GD22775>
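The patch itself is not included in this report, only its subject, "dlm: fix address compare". As a minimal sketch of a field-wise address compare (addr_equal() is a hypothetical name, and this is not the posted patch), the function below matches on address family and IP address only, ignoring ports, sin6_flowinfo, sin6_scope_id and padding:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical field-wise comparison: two addresses refer to the same
 * node when family and IP address match. Ports, sin6_flowinfo,
 * sin6_scope_id and structure padding are deliberately ignored (an
 * assumption of this sketch). Returns 1 for a match, 0 otherwise. */
static int addr_equal(const struct sockaddr_storage *x,
		      const struct sockaddr_storage *y)
{
	assert(x != NULL && y != NULL);
	if (x->ss_family != y->ss_family)
		return 0;

	switch (x->ss_family) {
	case AF_INET: {
		const struct sockaddr_in *a = (const struct sockaddr_in *)x;
		const struct sockaddr_in *b = (const struct sockaddr_in *)y;
		return a->sin_addr.s_addr == b->sin_addr.s_addr;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)x;
		const struct sockaddr_in6 *b = (const struct sockaddr_in6 *)y;
		/* Compare only the 16 address bytes. */
		return memcmp(&a->sin6_addr, &b->sin6_addr,
			      sizeof(a->sin6_addr)) == 0;
	}
	default:
		return 0; /* unknown family: never treat as a match */
	}
}
```

Used in place of a whole-structure memcmp(), this treats two sockaddr_in6 values with the same IPv6 address but different sin6_scope_id or padding as the same node. Whether ports should also be compared depends on how the stored and peer addresses are filled in, which this report does not show.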

Comment 10 Don Zickus 2008-09-11 19:44:09 UTC
in kernel-2.6.18-111.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 14 errata-xmlrpc 2009-01-20 20:19:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

