Description of problem:
I was running different cmirror operations with the cmirror_lock_stress test last night and had one of my nodes (taft-03) killed by qdisk (on taft-01).

Apr 15 14:28:38 taft-01 qdiskd[7551]: <notice> Writing eviction notice for node 3
Apr 15 14:28:38 taft-01 kernel: dlm: invalid h_nodeid 0 from 3 lockspace 10002
Apr 15 14:28:39 taft-01 qdiskd[7551]: <notice> Node 3 evicted

Apr 15 14:27:56 taft-03 openais[7443]: [TOTEM] Retransmit List: 285b43
Apr 15 14:28:11 taft-03 openais[7443]: [TOTEM] Retransmit List: 2863f7
Apr 15 14:28:36 taft-03 openais[7443]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
Apr 15 14:28:36 taft-03 dlm_controld[7476]: cluster is down, exiting
Apr 15 14:28:36 taft-03 clogd[7563]: cpg_dispatch failed: SA_AIS_ERR_LIBRARY

There were RX-ERRs on the network, but no more than normal and no more than on my other clusters on different networks, and I don't remember seeing this exact issue before.

[root@taft-02 ~]# netstat -i
Kernel Interface table
Iface       MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0      1500   0 244675073  41362      0      0  84496567      0      0      0 BMmRU
eth0       1500   0   1179872      0      0      0    469495      0      0      0 BMRU
eth2       1500   0  34491855     74      0      0  10562068      0      0      0 BMsRU
eth3       1500   0  30327558  16317      0      0  10562072      0      0      0 BMsRU
eth4       1500   0  32549337   8808      0      0  10562069      0      0      0 BMsRU
eth5       1500   0  37722207  12981      0      0  10562072      0      0      0 BMsRU
eth6       1500   0  26951635     59      0      0  10562074      0      0      0 BMsRU
eth7       1500   0  26646002    200      0      0  10562071      0      0      0 BMsRU
eth8       1500   0  26921660    582      0      0  10562070      0      0      0 BMsRU
eth9       1500   0  29064819   2341      0      0  10562071      0      0      0 BMsRU
lo        16436   0       105      0      0      0       105      0      0      0 LRU

[root@taft-01 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-04-14 11:19:28  /dev/disk/by-id/scsi-3600805f3000a05b0000000008e75000c-part1
   1   M  37144   2009-04-14 11:17:59  taft-01-bond
   2   M  37148   2009-04-14 11:18:38  taft-02-bond
   3   X  37152                        taft-03-bond
   4   M  37148   2009-04-14 11:18:38  taft-04-bond

I'll post the logs from the 4 nodes in this cluster.

Version-Release number of selected component (if applicable):
2.6.18-128.el5
cman-2.0.99-1.el5
openais-0.80.3-22.el5
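If it helps to correlate the RX-ERR counters with the eviction window, one simple way to sample them over time (the 10-second interval and log path here are arbitrary, not part of the original run):

[root@taft-02 ~]# while true; do date; netstat -i | grep -E '^(bond|eth)'; sleep 10; done >> /tmp/netstat-i.log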
Created attachment 339905 [details] log from taft-01
Created attachment 339906 [details] log from taft-02
Created attachment 339907 [details] log from taft-03
Created attachment 339908 [details] log from taft-04
FYI - I hit an issue today that appears very similar to this one while running the revolver test. It may have been a timing issue where one of the three killed nodes came back up faster than the one remaining node had sorted through the others leaving the cluster. I'll post the logs for this issue.

Scenario iteration 0.2 started at Fri Apr 17 10:56:49 CDT 2009
Sleeping 1 minute(s) to let the I/O get its lock count up...
Senario: DLM kill Quorum plus one
Those picked to face the revolver... taft-03-bond taft-02-bond taft-01-bond

Feeling lucky taft-03-bond? Well do ya? Go'head make my day...
Feeling lucky taft-02-bond? Well do ya? Go'head make my day...
Feeling lucky taft-01-bond? Well do ya? Go'head make my day...

Verifying nodes were removed from cluster
Verified taft-01-bond was removed on taft-04-bond
Verified taft-02-bond was removed on taft-04-bond
Verified taft-03-bond was removed on taft-04-bond

Verifying that the dueler(s) are alive
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
All killed nodes are back up (able to be pinged), making sure they're qarshable...
still not all qarshable, sleeping another 10 seconds
All killed nodes are now qarshable

Verifying that recovery properly took place (on the nodes that stayed in the cluster)
checking that all of the cluster nodes are now/still cman members...
checking fence recovery (state of each service)...
checking dlm recovery (state of each service)...
checking gfs recovery (state of each service)...
checking gfs2 recovery (state of each service)...
checking fence recovery (node membership of each service)...
checking dlm recovery (node membership of each service)...
checking gfs recovery (node membership of each service)...
checking gfs2 recovery (node membership of each service)...

Verifying that clvmd was started properly on the dueler(s)
clvmd is not running on taft-01-bond
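For anyone reproducing this, a quick manual check of clvmd on the dueler (a sketch assuming the standard RHEL 5 init script and cman tools; this is not part of the revolver output above):

[root@taft-01 ~]# service clvmd status
[root@taft-01 ~]# group_tool ls

The second command shows which fence/dlm/gfs groups the node has (re)joined, which helps tell a clvmd startup failure apart from a node that never finished rejoining the cluster.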
Created attachment 340037 [details] new log from taft-01
Created attachment 340038 [details] new log from taft-02
Created attachment 340040 [details] new log from taft-03
Created attachment 340041 [details] new log from taft-04
Nate and I were chasing this on RHEL4 too -- using the deadline scheduler helped, but did not entirely resolve the issue. As a start, switch the I/O scheduler to the deadline scheduler; from there we can tune to make the cluster more flexible. I also have another fix for the 'undead' messages if you would like.
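For reference, a sketch of how to make that switch on RHEL 5 (the device name sdb and the exact kernel line are placeholders for whatever actually backs the mirrors on the taft nodes):

Per-device, at runtime:
  [root@taft-01 ~]# echo deadline > /sys/block/sdb/queue/scheduler
  [root@taft-01 ~]# cat /sys/block/sdb/queue/scheduler

System-wide and persistent across reboots, append elevator=deadline to the kernel line in /boot/grub/grub.conf, e.g.:
  kernel /vmlinuz-2.6.18-128.el5 ro <existing options> elevator=deadline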
It's important to note that Nate's cluster also has an MSA1000.
Ok, another person using a very different sort of array (EMC² Symmetrix) reported a similar problem on RHEL 5.3. At the time of eviction, I/O to the same array (though a different LUN) had very strange iostat numbers. For example:

avgqu-sz - 30780484020872.61  (that's not a typo)
await    - 5.03
svctm    - 74906.50

I think we need to cross-reference this data on one or both of the MSAs in use and see if we can reproduce this.
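To gather comparable numbers on the MSAs during a cmirror run, one option (the 5-second interval and output path are arbitrary, and this assumes the sysstat package is installed on the taft nodes):

[root@taft-01 ~]# iostat -xt 5 >> /tmp/iostat-$(hostname).log

The -t timestamps on each report can then be lined up against the qdiskd eviction messages in /var/log/messages.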
Created attachment 342375 [details] iostat numbers
See:
https://bugzilla.redhat.com/show_bug.cgi?id=490147#c9
https://bugzilla.redhat.com/show_bug.cgi?id=490147#c10
https://bugzilla.redhat.com/show_bug.cgi?id=490147#c11
*** Bug 514627 has been marked as a duplicate of this bug. ***