Bug 338511 - Clvmd fails to start after a node reboot
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm-kernel
Version: 4
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Keywords: ZStream
Blocks: 362641

Reported: 2007-10-18 13:51 EDT by Tom Tracy
Modified: 2009-04-16 16:01 EDT
CC: 10 users

Fixed In Version: RHBA-2007-0995
Doc Type: Bug Fix
Last Closed: 2007-11-21 16:56:01 EST


Attachments
Console of the node rebooting (49.32 KB, image/jpeg)
2007-10-18 13:51 EDT, Tom Tracy
Full dmesg (107.94 KB, text/plain)
2007-10-24 09:52 EDT, Lon Hohberger
cluster.conf (1.46 KB, text/plain)
2007-10-24 16:07 EDT, Lon Hohberger
Fix, pass 1 (905 bytes, patch)
2007-10-25 17:44 EDT, Lon Hohberger

Description Tom Tracy 2007-10-18 13:51:24 EDT
Description of problem:  Boot a 4-node cluster. clvmd starts and all volumes are
mounted. Then reboot any one of the nodes; the node exits the cluster normally.
When that node starts back up, clvmd fails to start and the system hangs there.
        The only way for the rebooted system to start clvmd successfully is to
reboot the whole cluster.

Version-Release number of selected component (if applicable):

lvm2-cluster-debuginfo-2.02.21-7.el4
lvm2-cluster-2.02.21-7.el4
lvm2-2.02.21-5.el4

How reproducible: Every Time


Steps to Reproduce:
1. Boot the cluster (4 nodes); everything starts normally, no problems
2. Reboot a node; there is no I/O or testing happening
3. Cluster services (ccs, cman, fence, qdisk) all start
4. clvmd fails to start and hangs; the node never comes up

Actual results: All other nodes report that the node was removed from the
cluster, but they never report the node rejoining:

Oct 18 12:41:05 et-virt08 kernel: CMAN: node et-virt10.lab.boston.redhat.com has
been removed from the cluster : Missed too many heartbeats
Oct 18 12:41:06 et-virt08 fenced[6664]: et-virt10.lab.boston.redhat.com not a
cluster member after 0 sec post_fail_delay
Oct 18 12:41:06 et-virt08 fenced[6664]: fencing node
"et-virt10.lab.boston.redhat.com"
Oct 18 12:41:15 et-virt08 fenced[6664]: fence "et-virt10.lab.boston.redhat.com"
success



Expected results: clvmd should start and activate the volumes so that GFS can
mount them.


Additional info:
Comment 1 Tom Tracy 2007-10-18 13:51:24 EDT
Created attachment 231321 [details]
Console of the node rebooting
Comment 3 Corey Marthaler 2007-10-23 19:03:09 EDT
Lon and I took a look at this cluster, and the first thing we saw was that
clvmd was consuming 80 MB on two nodes. We then rebooted the whole cluster and
noticed that the root drive on et-virt09 appears to be bad. We started clvmd on
all nodes by hand and rebooted et-virt10 afterwards. When it was back up, we
reproduced this bug while attempting to start clvmd on et-virt10.
We then rebooted everything again, turned on all debugging in the lvm.conf file,
and once again attempted to trigger this bug, but unfortunately it didn't happen.
Comment 4 Lon Hohberger 2007-10-24 09:44:58 EDT
lvs is stuck talking to clvmd:

(gdb) bt
#0  0x00000036c980b162 in __read_nocancel () from /lib64/tls/libpthread.so.0
#1  0x000000000044efd3 in _lock_for_cluster (cmd=51 '3', flags=Variable "flags"
is not available.
)
    at locking/cluster_locking.c:115
#2  0x000000000044f2c6 in _lock_resource (cmd=Variable "cmd" is not available.
)
    at locking/cluster_locking.c:410
#3  0x000000000043b646 in _lock_vol (cmd=0x684240, 
    resource=0x7fbfffb450 "oradbnas", flags=33) at locking/locking.c:237
#4  0x000000000043b859 in lock_vol (cmd=0x684240, vol=0x69eeb0 "oradbnas", 
    flags=33) at locking/locking.c:270
#5  0x000000000041e87f in process_each_lv (cmd=0x684240, argc=Variable "argc" is
not available.
)
    at toollib.c:324
#6  0x000000000041cd7a in _report (cmd=0x684240, argc=0, argv=0x7fbffff970, 
    report_type=Variable "report_type" is not available.
) at reporter.c:329
#7  0x0000000000414710 in lvm_run_command (cmd=0x684240, argc=0, 
    argv=0x7fbffff970) at lvmcmdline.c:935
#8  0x0000000000415492 in lvm2_main (argc=1, argv=0x7fbffff968,
is_static=Variable "is_static" is not available.
)
    at lvmcmdline.c:1423
#9  0x00000036c911c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#10 0x000000000040c42a in _start ()
#11 0x0000007fbffff958 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x0000007fbffffb4a in ?? ()
#15 0x0000000000000000 in ?? ()

==================================

clvmd is stuck in kernel-mode in the DLM:

clvmd         D 000001013a47f8d8     0  6276      1                7499 (NOTLB)
000001010c9b1da8 0000000000000002 ffffffffa0290457 000001010000006a 
       0000000000000000 0000000000000001 000001000104da80 000000018013250d
       0000010135e007f0 0000000000001ccb 
Call Trace:<ffffffffa0290457>{:dlm:dlm_recoverd+0}
<ffffffff8030c72d>{wait_for_completion+167} 
       <ffffffff8013416c>{default_wake_function+0}
<ffffffff8013416c>{default_wake_function+0} 
       <ffffffffa026c3f1>{:cman:kcl_join_service+381}
<ffffffffa0287ddb>{:dlm:dlm_new_lockspace+1418}
       <ffffffffa0282878>{:dlm:dlm_write+2060} <ffffffff8017a4b2>{vfs_write+207} 
       <ffffffff8017a59a>{sys_write+69} <ffffffff8011026a>{system_call+126}

... digging deeper.
Comment 5 Lon Hohberger 2007-10-24 09:45:53 EDT
Note that on boot, the dlm complains about "connect from non-cluster node".  I
haven't seen this message before, and I wonder if it's related to a class of
bugs in CMAN on RHEL5 where node-0 (i.e. qdisk) is getting treated as a node.
I didn't think this existed on RHEL4, but I'll double-check.
Comment 6 Lon Hohberger 2007-10-24 09:52:38 EDT
Created attachment 236221 [details]
Full dmesg
Comment 7 Lon Hohberger 2007-10-24 09:52:52 EDT
(including stack traces)
Comment 8 Lon Hohberger 2007-10-24 09:56:47 EDT
It's waiting for the clvmd lockspace to finish joining:

DLM Lock Space:  "clvmd"                             3   3 join      S-6,20,3
[4 1 3]
Comment 9 Lon Hohberger 2007-10-24 11:04:39 EDT
CMAN nodes:

crash> p *(struct cluster_node *)0x1010d799c00
$35 = {
  list = {
    next = 0x1010d799880, 
    prev = 0x1013959e300
  }, 
  name = 0x1013b621280 "et-virt08.lab.boston.redhat.com", 
  addr_list = {
    next = 0x10005f68d00, 
    prev = 0x10005f68d00
  }, 
  us = 0, 
  node_id = 1, 
  state = NODESTATE_MEMBER, 
  last_seq_recv = 1629, 
  last_ackneeded_seq_recv = 603, 
  last_seq_acked = 18, 
  last_seq_sent = 18, 
  votes = 1, 
  expected_votes = 3, 
  leave_reason = 0, 
  incarnation = 0, 
  last_hello = 4299878162, 
  join_time = {
    tv_sec = 1193232725, 
    tv_usec = 798085
  }
}
crash> p *(struct cluster_node *)0x1010d799880
$36 = {
  list = {
    next = 0x1010d4c3b00, 
    prev = 0x1010d799c00
  }, 
  name = 0x1013b621520 "et-virt09.lab.boston.redhat.com", 
  addr_list = {
    next = 0x10005f5ed80, 
    prev = 0x10005f5ed80
  }, 
  us = 0, 
  node_id = 2, 
  state = NODESTATE_DEAD, 
  last_seq_recv = 0, 
  last_ackneeded_seq_recv = 0, 
  last_seq_acked = 0, 
  last_seq_sent = 0, 
  votes = 1, 
  expected_votes = 3, 
  leave_reason = 0, 
  incarnation = 0, 
  last_hello = 4294749092, 
  join_time = {
    tv_sec = 1193232725, 
    tv_usec = 798082
  }
}
crash> p *(struct cluster_node *)0x1010d4c3b00
$37 = {
  list = {
    next = 0xffffffffa027ce40, 
    prev = 0x1010d799880
  }, 
  name = 0x1013b6214c0 "et-virt10.lab.boston.redhat.com", 
  addr_list = {
    next = 0x10005f68c40, 
    prev = 0x10005f68c40
  }, 
  us = 1, 
  node_id = 3, 
  state = NODESTATE_MEMBER, 
  last_seq_recv = 0, 
  last_ackneeded_seq_recv = 0, 
  last_seq_acked = 0, 
  last_seq_sent = 14, 
  votes = 1, 
  expected_votes = 3, 
  leave_reason = 0, 
  incarnation = 0, 
  last_hello = 4294749093, 
  join_time = {
    tv_sec = 1193232723, 
    tv_usec = 110389
  }
}

Quorum device node:

crash> p *(struct cluster_node *)0x000001010cee2a80
$38 = {
  list = {
    next = 0x0, 
    prev = 0x0
  }, 
  name = 0x101341a08e0 "/dev/sdh", 
  addr_list = {
    next = 0x1010cee2a98, 
    prev = 0x1010cee2a98
  }, 
  us = 0, 
  node_id = 0, 
  state = NODESTATE_MEMBER, 
  last_seq_recv = 0, 
  last_ackneeded_seq_recv = 0, 
  last_seq_acked = 0, 
  last_seq_sent = 0, 
  votes = 3, 
  expected_votes = 0, 
  leave_reason = 0, 
  incarnation = 0, 
  last_hello = 4300014856, 
  join_time = {
    tv_sec = 0, 
    tv_usec = 0
  }
}

Comment 11 Lon Hohberger 2007-10-24 13:35:45 EDT
crash> set 6280
    PID: 6280
COMMAND: "dlm_recoverd"
   TASK: 1010c9b67f0  [THREAD_INFO: 1010ca08000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE 
crash> bt
PID: 6280   TASK: 1010c9b67f0       CPU: 0   COMMAND: "dlm_recoverd"
 #0 [1010ca09c28] schedule at ffffffff8030c4e9
 #1 [1010ca09d00] dlm_wait_function at ffffffffa028f2f1
 #2 [1010ca09d60] __wake_up at ffffffff80134213
 #3 [1010ca09dc0] rcom_send_message at ffffffffa028ef3e
 #4 [1010ca09e00] dlm_wait_status_low at ffffffffa028f441
 #5 [1010ca09e50] nodes_reconfig_wait at ffffffffa028ac98
 #6 [1010ca09e70] ls_nodes_init at ffffffffa028b0cc
 #7 [1010ca09eb0] dlm_recoverd at ffffffffa0291179
 #8 [1010ca09f20] kthread at ffffffff8014b907
 #9 [1010ca09f50] kernel_thread at ffffffff80110f47
Comment 12 Lon Hohberger 2007-10-24 16:07:12 EDT
Created attachment 236581 [details]
cluster.conf
Comment 13 Lon Hohberger 2007-10-25 16:31:05 EDT
=== Output from et-virt08 ===
dlm: connection from 344fa8c0
dlm: connect from non cluster node
dlm: connection from 344fa8c0
dlm: connect from non cluster node
dlm: connection from 364fa8c0
dlm: connect from non cluster node
dlm: connection from 354fa8c0
dlm: connect from non cluster node

344fa8c0 = 192.168.79.52
354fa8c0 = 192.168.79.53
364fa8c0 = 192.168.79.54

The other nodes in the cluster are routing packets out from the wrong IP
addresses.  The DLM needs to source-route the packets from the correct IPs, or
we need to ensure all packets get routed out the right IP address.
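
For reference, the hex values in the "dlm: connection from" lines are the
peer's IPv4 address (network byte order) printed as a host-endian integer.
A minimal decoding sketch in C, assuming a little-endian host like the x86_64
nodes in this cluster:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

int main(void)
{
    /* Values as printed in the "dlm: connection from ..." messages above. */
    uint32_t printed[] = { 0x344fa8c0, 0x354fa8c0, 0x364fa8c0 };

    for (int i = 0; i < 3; i++) {
        /* On a little-endian host the printed integer holds the in_addr
         * bytes exactly as they sit in memory, so it can be used directly. */
        struct in_addr a = { .s_addr = printed[i] };
        printf("%08x = %s\n", printed[i], inet_ntoa(a));
    }
    return 0;   /* prints 344fa8c0 = 192.168.79.52, and so on */
}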
Comment 14 Lon Hohberger 2007-10-25 16:32:04 EDT
On the test cluster, everything works with a clean boot - this is because Oracle
isn't running yet.  Once Oracle starts up, it adds several VIPs to the
interfaces the cluster is using to communicate.

This causes routing problems.
Comment 15 Lon Hohberger 2007-10-25 16:33:47 EDT
When we reboot the machine, the nodes are supposed to be sourcing the packets
from their node IP address, but this isn't working correctly.  This in turn
means the packets are being sourced from another IP address on the interface,
which isn't necessarily the node IP address.  This causes the DLM to reject the
connections, causing the hang.

Comment 16 Lon Hohberger 2007-10-25 17:44:15 EDT
Created attachment 238121 [details]
Fix, pass 1

Source-routes the connect from the local_addr that we got from CMAN rather than
letting the kernel decide where to source the packets from.

This eliminates the need to work around the problem using routing or iptables.
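
The underlying technique is the standard one of binding the connecting socket
to a specific local address before calling connect(), so the kernel cannot pick
a different source address.  A rough userspace sketch of that idea (this is not
the attached kernel patch; the addresses are placeholders and 21064 is used as
the DLM's usual TCP port):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Connect to peer_ip:peer_port, forcing the source address to local_ip. */
static int connect_from(const char *local_ip, const char *peer_ip, int peer_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = 0;                       /* any source port */
    inet_pton(AF_INET, local_ip, &local.sin_addr);

    /* Pin the source address; without this the kernel picks one itself,
     * which is how an extra VIP on the interface can become the source. */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind");
        close(fd);
        return -1;
    }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(peer_port);
    inet_pton(AF_INET, peer_ip, &peer.sin_addr);

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    /* Hypothetical example: source traffic from this node's cluster IP. */
    int fd = connect_from("192.168.79.94", "192.168.79.125", 21064);
    if (fd >= 0)
        close(fd);
    return 0;
}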
Comment 17 Lon Hohberger 2007-10-25 17:59:25 EDT
A similar patch could be made for upstream too.
Comment 18 Lon Hohberger 2007-10-26 10:14:43 EDT
Patch for upstream submitted to cluster-devel.  Only works for TCP; sctp may or
may not need a similar patch.

https://www.redhat.com/archives/cluster-devel/2007-October/msg00220.html
Comment 19 Lon Hohberger 2007-10-26 10:47:13 EDT
You can work around this using iptables too.  On *all* nodes, do something like:

iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_1_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>
iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_1_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>
...
iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_N_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>

Ex:

[root@et-virt09 ~]# iptables -t nat -A POSTROUTING -s 192.168.76.0/22 \
                   -m tcp -p tcp -d 192.168.79.125 \
                   -j SNAT --to-source 192.168.79.94
Comment 20 Lon Hohberger 2007-10-26 16:42:24 EDT
Note that this only seems to happen if multiple "non-secondary" addresses are
showing up in 'ip addr list'.
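
A rough way to check for that condition programmatically is to list the IPv4
addresses per interface; any cluster interface that prints more than one line
is carrying extra addresses.  This is only an approximation (getifaddrs()
cannot show the "secondary" flag that 'ip addr list' prints):

#include <stdio.h>
#include <ifaddrs.h>
#include <arpa/inet.h>
#include <netinet/in.h>

int main(void)
{
    struct ifaddrs *list, *ifa;

    if (getifaddrs(&list) < 0) {
        perror("getifaddrs");
        return 1;
    }

    /* Print every IPv4 address, grouped by interface name. */
    for (ifa = list; ifa; ifa = ifa->ifa_next) {
        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        struct sockaddr_in *sin = (struct sockaddr_in *)ifa->ifa_addr;
        printf("%-8s %s\n", ifa->ifa_name, inet_ntoa(sin->sin_addr));
    }

    freeifaddrs(list);
    return 0;
}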
Comment 23 Lon Hohberger 2007-10-30 13:42:35 EDT
Patch in CVS.
Comment 24 Lon Hohberger 2007-10-30 13:44:34 EDT
Patch in RHEL4 + RHEL46 branches.
Comment 30 Lon Hohberger 2007-11-09 16:24:30 EST
Updated workaround:

You can work around this using iptables too.  On *all* nodes, do something like:

iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_1_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>
iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_2_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>
...
iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_N_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>

Ex:

[root@et-virt09 ~]# iptables -t nat -A POSTROUTING -s 192.168.76.0/22 \
                   -m tcp -p tcp -d 192.168.79.125 \
                   -j SNAT --to-source 192.168.79.94
Comment 32 errata-xmlrpc 2007-11-21 16:56:01 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0995.html
