Bug 338511 - Clvmd fails to start after a node reboot
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm-kernel
Version: 4
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Keywords: ZStream
Blocks: 362641

Reported: 2007-10-18 13:51 EDT by Tom Tracy
Modified: 2009-04-16 16:01 EDT
CC: 10 users

Fixed In Version: RHBA-2007-0995
Doc Type: Bug Fix
Last Closed: 2007-11-21 16:56:01 EST


Attachments
Console of the node rebooting (49.32 KB, image/jpeg)
2007-10-18 13:51 EDT, Tom Tracy
Full dmesg (107.94 KB, text/plain)
2007-10-24 09:52 EDT, Lon Hohberger
cluster.conf (1.46 KB, text/plain)
2007-10-24 16:07 EDT, Lon Hohberger
Fix, pass 1 (905 bytes, patch)
2007-10-25 17:44 EDT, Lon Hohberger

Description Tom Tracy 2007-10-18 13:51:24 EDT
Description of problem:  Boot a 4-node cluster. clvmd starts and all volumes are
mounted. Then reboot any one of the nodes; the node exits the cluster normally.
When that node starts back up, clvmd fails to start and the system hangs there.
        The only way for the rebooted system to start clvmd successfully is to
reboot the whole cluster.

Version-Release number of selected component (if applicable):

lvm2-cluster-debuginfo-2.02.21-7.el4
lvm2-cluster-2.02.21-7.el4
lvm2-2.02.21-5.el4

How reproducible: Every Time


Steps to Reproduce:
1. Boot the cluster (4 nodes); everything starts normally, no problems
2. Reboot a node; there is no I/O or testing happening
3. Cluster services (ccs, cman, fence, qdisk) all start
4. clvmd fails to start and hangs; the node never comes up

Actual results: All other nodes report that the node was removed from the
cluster, but they never report the node rejoining:

Oct 18 12:41:05 et-virt08 kernel: CMAN: node et-virt10.lab.boston.redhat.com has
been removed from the cluster : Missed too many heartbeats
Oct 18 12:41:06 et-virt08 fenced[6664]: et-virt10.lab.boston.redhat.com not a
cluster member after 0 sec post_fail_delay
Oct 18 12:41:06 et-virt08 fenced[6664]: fencing node
"et-virt10.lab.boston.redhat.com"
Oct 18 12:41:15 et-virt08 fenced[6664]: fence "et-virt10.lab.boston.redhat.com"
success



Expected results: clvmd should start and activate the volumes so that GFS can
mount them.


Additional info:
Comment 1 Tom Tracy 2007-10-18 13:51:24 EDT
Created attachment 231321 [details]
Console of the node rebooting
Comment 3 Corey Marthaler 2007-10-23 19:03:09 EDT
Lon and I took a look at this cluster, and the first thing we saw was that
clvmd was consuming 80 MB on two nodes. We then rebooted the whole cluster and
noticed that the root drive on et-virt09 appears to be bad. We started clvmd on
all nodes by hand and rebooted et-virt10 afterwards. When it was back up, we
reproduced this bug while attempting to start clvmd on et-virt10.
We then rebooted everything again, turned on all debugging in the lvm.conf file,
and once again attempted to trigger this bug, but unfortunately it didn't happen.
Comment 4 Lon Hohberger 2007-10-24 09:44:58 EDT
lvs is stuck talking to clvmd:

(gdb) bt
#0  0x00000036c980b162 in __read_nocancel () from /lib64/tls/libpthread.so.0
#1  0x000000000044efd3 in _lock_for_cluster (cmd=51 '3', flags=Variable "flags"
is not available.
)
    at locking/cluster_locking.c:115
#2  0x000000000044f2c6 in _lock_resource (cmd=Variable "cmd" is not available.
)
    at locking/cluster_locking.c:410
#3  0x000000000043b646 in _lock_vol (cmd=0x684240, 
    resource=0x7fbfffb450 "oradbnas", flags=33) at locking/locking.c:237
#4  0x000000000043b859 in lock_vol (cmd=0x684240, vol=0x69eeb0 "oradbnas", 
    flags=33) at locking/locking.c:270
#5  0x000000000041e87f in process_each_lv (cmd=0x684240, argc=Variable "argc" is
not available.
)
    at toollib.c:324
#6  0x000000000041cd7a in _report (cmd=0x684240, argc=0, argv=0x7fbffff970, 
    report_type=Variable "report_type" is not available.
) at reporter.c:329
#7  0x0000000000414710 in lvm_run_command (cmd=0x684240, argc=0, 
    argv=0x7fbffff970) at lvmcmdline.c:935
#8  0x0000000000415492 in lvm2_main (argc=1, argv=0x7fbffff968,
is_static=Variable "is_static" is not available.
)
    at lvmcmdline.c:1423
#9  0x00000036c911c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#10 0x000000000040c42a in _start ()
#11 0x0000007fbffff958 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x0000007fbffffb4a in ?? ()
#15 0x0000000000000000 in ?? ()

==================================

clvmd is stuck in kernel-mode in the DLM:

clvmd         D 000001013a47f8d8     0  6276      1                7499 (NOTLB)
000001010c9b1da8 0000000000000002 ffffffffa0290457 000001010000006a 
       0000000000000000 0000000000000001 000001000104da80 000000018013250d
       0000010135e007f0 0000000000001ccb 
Call Trace:<ffffffffa0290457>{:dlm:dlm_recoverd+0}
<ffffffff8030c72d>{wait_for_completion+167} 
       <ffffffff8013416c>{default_wake_function+0}
<ffffffff8013416c>{default_wake_function+0} 
       <ffffffffa026c3f1>{:cman:kcl_join_service+381}
<ffffffffa0287ddb>{:dlm:dlm_new_lockspace+1418}
       <ffffffffa0282878>{:dlm:dlm_write+2060} <ffffffff8017a4b2>{vfs_write+207} 
       <ffffffff8017a59a>{sys_write+69} <ffffffff8011026a>{system_call+126}

... digging deeper.
Comment 5 Lon Hohberger 2007-10-24 09:45:53 EDT
Note that on boot, the dlm complains about "connect from non-cluster node".  I
haven't seen this message before, and I wonder if it's related to a class of
bugs in CMAN on RHEL5 where node-0 (i.e. qdisk) is getting treated as a node.
I didn't think this existed on RHEL4, but I'll double-check.
Comment 6 Lon Hohberger 2007-10-24 09:52:38 EDT
Created attachment 236221 [details]
Full dmesg
Comment 7 Lon Hohberger 2007-10-24 09:52:52 EDT
(including stack traces)
Comment 8 Lon Hohberger 2007-10-24 09:56:47 EDT
It's waiting for the clvmd lockspace to finish joining:

DLM Lock Space:  "clvmd"                             3   3 join      S-6,20,3
[4 1 3]
Comment 9 Lon Hohberger 2007-10-24 11:04:39 EDT
CMAN nodes:

crash> p *(struct cluster_node *)0x1010d799c00
$35 = {
  list = {
    next = 0x1010d799880, 
    prev = 0x1013959e300
  }, 
  name = 0x1013b621280 "et-virt08.lab.boston.redhat.com", 
  addr_list = {
    next = 0x10005f68d00, 
    prev = 0x10005f68d00
  }, 
  us = 0, 
  node_id = 1, 
  state = NODESTATE_MEMBER, 
  last_seq_recv = 1629, 
  last_ackneeded_seq_recv = 603, 
  last_seq_acked = 18, 
  last_seq_sent = 18, 
  votes = 1, 
  expected_votes = 3, 
  leave_reason = 0, 
  incarnation = 0, 
  last_hello = 4299878162, 
  join_time = {
    tv_sec = 1193232725, 
    tv_usec = 798085
  }
}
crash> p *(struct cluster_node *)0x1010d799880
$36 = {
  list = {
    next = 0x1010d4c3b00, 
    prev = 0x1010d799c00
  }, 
  name = 0x1013b621520 "et-virt09.lab.boston.redhat.com", 
  addr_list = {
    next = 0x10005f5ed80, 
    prev = 0x10005f5ed80
  }, 
  us = 0, 
  node_id = 2, 
  state = NODESTATE_DEAD, 
  last_seq_recv = 0, 
  last_ackneeded_seq_recv = 0, 
  last_seq_acked = 0, 
  last_seq_sent = 0, 
  votes = 1, 
  expected_votes = 3, 
  leave_reason = 0, 
  incarnation = 0, 
  last_hello = 4294749092, 
  join_time = {
    tv_sec = 1193232725, 
    tv_usec = 798082
  }
}
crash> p *(struct cluster_node *)0x1010d4c3b00
$37 = {
  list = {
    next = 0xffffffffa027ce40, 
    prev = 0x1010d799880
  }, 
  name = 0x1013b6214c0 "et-virt10.lab.boston.redhat.com", 
  addr_list = {
    next = 0x10005f68c40, 
    prev = 0x10005f68c40
  }, 
  us = 1, 
  node_id = 3, 
  state = NODESTATE_MEMBER, 
  last_seq_recv = 0, 
  last_ackneeded_seq_recv = 0, 
  last_seq_acked = 0, 
  last_seq_sent = 14, 
  votes = 1, 
  expected_votes = 3, 
  leave_reason = 0, 
  incarnation = 0, 
  last_hello = 4294749093, 
  join_time = {
    tv_sec = 1193232723, 
    tv_usec = 110389
  }
}

Quorum device node:

crash> p *(struct cluster_node *)0x000001010cee2a80
$38 = {
  list = {
    next = 0x0, 
    prev = 0x0
  }, 
  name = 0x101341a08e0 "/dev/sdh", 
  addr_list = {
    next = 0x1010cee2a98, 
    prev = 0x1010cee2a98
  }, 
  us = 0, 
  node_id = 0, 
  state = NODESTATE_MEMBER, 
  last_seq_recv = 0, 
  last_ackneeded_seq_recv = 0, 
  last_seq_acked = 0, 
  last_seq_sent = 0, 
  votes = 3, 
  expected_votes = 0, 
  leave_reason = 0, 
  incarnation = 0, 
  last_hello = 4300014856, 
  join_time = {
    tv_sec = 0, 
    tv_usec = 0
  }
}

Comment 11 Lon Hohberger 2007-10-24 13:35:45 EDT
crash> set 6280
    PID: 6280
COMMAND: "dlm_recoverd"
   TASK: 1010c9b67f0  [THREAD_INFO: 1010ca08000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE 
crash> bt
PID: 6280   TASK: 1010c9b67f0       CPU: 0   COMMAND: "dlm_recoverd"
 #0 [1010ca09c28] schedule at ffffffff8030c4e9
 #1 [1010ca09d00] dlm_wait_function at ffffffffa028f2f1
 #2 [1010ca09d60] __wake_up at ffffffff80134213
 #3 [1010ca09dc0] rcom_send_message at ffffffffa028ef3e
 #4 [1010ca09e00] dlm_wait_status_low at ffffffffa028f441
 #5 [1010ca09e50] nodes_reconfig_wait at ffffffffa028ac98
 #6 [1010ca09e70] ls_nodes_init at ffffffffa028b0cc
 #7 [1010ca09eb0] dlm_recoverd at ffffffffa0291179
 #8 [1010ca09f20] kthread at ffffffff8014b907
 #9 [1010ca09f50] kernel_thread at ffffffff80110f47
Comment 12 Lon Hohberger 2007-10-24 16:07:12 EDT
Created attachment 236581 [details]
cluster.conf
Comment 13 Lon Hohberger 2007-10-25 16:31:05 EDT
=== Output from et-virt08 ===
dlm: connection from 344fa8c0
dlm: connect from non cluster node
dlm: connection from 344fa8c0
dlm: connect from non cluster node
dlm: connection from 364fa8c0
dlm: connect from non cluster node
dlm: connection from 354fa8c0
dlm: connect from non cluster node

344fa8c0 = 192.168.79.52
354fa8c0 = 192.168.79.53
364fa8c0 = 192.168.79.54

The other nodes in the cluster are routing packets out from the wrong IP
addresses.  The DLM needs to source-route the packets from the correct IPs, or
we need to ensure all packets get routed out the right IP address.
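
For reference, the hex values in the "dlm: connection from" lines are the
peer's IPv4 address (network byte order) printed as a host-endian integer.
A minimal decoding sketch in C, assuming a little-endian host like the x86_64
nodes in this cluster:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

int main(void)
{
    /* Values as printed in the "dlm: connection from ..." messages above. */
    uint32_t printed[] = { 0x344fa8c0, 0x354fa8c0, 0x364fa8c0 };

    for (int i = 0; i < 3; i++) {
        /* On a little-endian host the printed integer holds the in_addr
         * bytes exactly as they sit in memory, so it can be used directly. */
        struct in_addr a = { .s_addr = printed[i] };
        printf("%08x = %s\n", printed[i], inet_ntoa(a));
    }
    return 0;   /* prints 344fa8c0 = 192.168.79.52, and so on */
}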
Comment 14 Lon Hohberger 2007-10-25 16:32:04 EDT
On the test cluster, everything works with a clean boot - this is because Oracle
isn't running yet.  Once Oracle starts up, it adds several VIPs to the
interfaces the cluster is using to communicate.

This causes routing problems.
Comment 15 Lon Hohberger 2007-10-25 16:33:47 EDT
When we reboot the machine, the nodes are supposed to be sourcing the packets
from their node IP address, but this isn't working correctly.  This in turn
means the packets are being sourced from another IP address on the interface,
which isn't necessarily the node IP address.  This causes the DLM to reject the
connections, causing the hang.

Comment 16 Lon Hohberger 2007-10-25 17:44:15 EDT
Created attachment 238121 [details]
Fix, pass 1

Source-routes the connect from the local_addr that we got from CMAN rather than
letting the kernel decide where to source the packets from.

This eliminates the need to work around the problem using routing or iptables.
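
The underlying technique is the standard one of binding the connecting socket
to a specific local address before calling connect(), so the kernel cannot pick
a different source address.  A rough userspace sketch of that idea (this is not
the attached kernel patch; the addresses are placeholders and 21064 is used as
the DLM's usual TCP port):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Connect to peer_ip:peer_port, forcing the source address to local_ip. */
static int connect_from(const char *local_ip, const char *peer_ip, int peer_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = 0;                       /* any source port */
    inet_pton(AF_INET, local_ip, &local.sin_addr);

    /* Pin the source address; without this the kernel picks one itself,
     * which is how an extra VIP on the interface can become the source. */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind");
        close(fd);
        return -1;
    }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(peer_port);
    inet_pton(AF_INET, peer_ip, &peer.sin_addr);

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    /* Hypothetical example: source traffic from this node's cluster IP. */
    int fd = connect_from("192.168.79.94", "192.168.79.125", 21064);
    if (fd >= 0)
        close(fd);
    return 0;
}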
Comment 17 Lon Hohberger 2007-10-25 17:59:25 EDT
A similar patch could be made for upstream too.
Comment 18 Lon Hohberger 2007-10-26 10:14:43 EDT
Patch for upstream submitted to cluster-devel.  Only works for TCP; sctp may or
may not need a similar patch.

https://www.redhat.com/archives/cluster-devel/2007-October/msg00220.html
Comment 19 Lon Hohberger 2007-10-26 10:47:13 EDT
You can work around this using iptables too.  On *all* nodes, do something like:

iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_1_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>
iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_1_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>
...
iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_N_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>

Ex:

[root@et-virt09 ~]# iptables -t nat -A POSTROUTING -s 192.168.76.0/22 \
                   -m tcp -p tcp -d 192.168.79.125 \
                   -j SNAT --to-source 192.168.79.94
Comment 20 Lon Hohberger 2007-10-26 16:42:24 EDT
Note that this only seems to happen if multiple "non-secondary" addresses are
showing up in 'ip addr list'.
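
A rough way to check for that condition programmatically is to list the IPv4
addresses per interface; any cluster interface that prints more than one line
is carrying extra addresses.  This is only an approximation (getifaddrs()
cannot show the "secondary" flag that 'ip addr list' prints):

#include <stdio.h>
#include <ifaddrs.h>
#include <arpa/inet.h>
#include <netinet/in.h>

int main(void)
{
    struct ifaddrs *list, *ifa;

    if (getifaddrs(&list) < 0) {
        perror("getifaddrs");
        return 1;
    }

    /* Print every IPv4 address, grouped by interface name. */
    for (ifa = list; ifa; ifa = ifa->ifa_next) {
        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        struct sockaddr_in *sin = (struct sockaddr_in *)ifa->ifa_addr;
        printf("%-8s %s\n", ifa->ifa_name, inet_ntoa(sin->sin_addr));
    }

    freeifaddrs(list);
    return 0;
}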
Comment 23 Lon Hohberger 2007-10-30 13:42:35 EDT
Patch in CVS.
Comment 24 Lon Hohberger 2007-10-30 13:44:34 EDT
Patch in RHEL4 + RHEL46 branches.
Comment 30 Lon Hohberger 2007-11-09 16:24:30 EST
Updated workaround:

You can work around this using iptables too.  On *all* nodes, do something like:

iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_1_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>
iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_2_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>
...
iptables -t nat -A POSTROUTING -s <cluster_subnet/mask> \
         -m tcp -p tcp -d <node_N_ip_addr> \
         -j SNAT --to-source <my_cluster_ip>

Ex:

[root@et-virt09 ~]# iptables -t nat -A POSTROUTING -s 192.168.76.0/22 \
                   -m tcp -p tcp -d 192.168.79.125 \
                   -j SNAT --to-source 192.168.79.94
Comment 32 errata-xmlrpc 2007-11-21 16:56:01 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0995.html
