Bug 496130
| Field | Value |
|---|---|
| Summary | node mysteriously evicted by qdisk |
| Product | Red Hat Enterprise Linux 5 |
| Component | cman |
| Version | 5.3 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED WORKSFORME |
| Severity | medium |
| Priority | high |
| Target Milestone | rc |
| Target Release | --- |
| Reporter | Corey Marthaler <cmarthal> |
| Assignee | Lon Hohberger <lhh> |
| QA Contact | Cluster QE <mspqa-list> |
| CC | cluster-maint, edamato, nstraz, samuel.kielek, smayhew, tao |
| Doc Type | Bug Fix |
| Last Closed | 2009-10-27 20:41:50 UTC |
| Bug Depends On | 500450 |
| Bug Blocks | 499522 |
Description (Corey Marthaler, 2009-04-16 19:24:52 UTC)
Created attachment 339905 [details]
log from taft-01
Created attachment 339906 [details]
log from taft-02
Created attachment 339907 [details]
log from taft-03
Created attachment 339908 [details]
log from taft-04
FYI - I hit an issue today that appears very similar to this one while running the revolver test. It may have been a timing issue where one of the three killed nodes came back up faster than the one remaining node had sorted through the others leaving the cluster. I'll post the logs for this issue.

```
Scenario iteration 0.2 started at Fri Apr 17 10:56:49 CDT 2009
Sleeping 1 minute(s) to let the I/O get its lock count up...
Senario: DLM kill Quorum plus one
Those picked to face the revolver... taft-03-bond taft-02-bond taft-01-bond
Feeling lucky taft-03-bond? Well do ya? Go'head make my day...
Feeling lucky taft-02-bond? Well do ya? Go'head make my day...
Feeling lucky taft-01-bond? Well do ya? Go'head make my day...
Verifying nodes were removed from cluster
Verified taft-01-bond was removed on taft-04-bond
Verified taft-02-bond was removed on taft-04-bond
Verified taft-03-bond was removed on taft-04-bond
Verifying that the dueler(s) are alive
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
still not all alive, sleeping another 10 seconds
All killed nodes are back up (able to be pinged), making sure they're qarshable...
still not all qarshable, sleeping another 10 seconds
All killed nodes are now qarshable
Verifying that recovery properly took place (on the nodes that stayed in the cluster)
checking that all of the cluster nodes are now/still cman members...
checking fence recovery (state of each service)...
checking dlm recovery (state of each service)...
checking gfs recovery (state of each service)...
checking gfs2 recovery (state of each service)...
checking fence recovery (node membership of each service)...
checking dlm recovery (node membership of each service)...
checking gfs recovery (node membership of each service)...
checking gfs2 recovery (node membership of each service)...
Verifying that clvmd was started properly on the dueler(s)
clvmd is not running on taft-01-bond
```

Created attachment 340037 [details]
new log from taft-01
Created attachment 340038 [details]
new log from taft-02
Created attachment 340040 [details]
new log from taft-03
Created attachment 340041 [details]
new log from taft-04
Nate and I were chasing this on RHEL4 too -- using the deadline scheduler helped, but did not entirely resolve the issue. As a start, switch the I/O scheduler to the deadline scheduler; then we can tune to make the cluster more flexible. I also have another fix for the 'undead' messages if you would like. It's important to note that Nate's cluster also has an MSA1000.

Ok, another person using a very different sort of array (EMC² Symmetrix) reported a similar problem on RHEL 5.3. At the time of eviction, I/O to the same array (though a different LUN) had very strange iostat numbers. For example:

avgqu-sz - 30780484020872.61 (that's not a typo)
await - 5.03
svctm - 74906.50

I think we need to cross-reference this data on one or both of the MSAs in use and see if we can reproduce this.

Created attachment 342375 [details]
iostat numbers
See:
https://bugzilla.redhat.com/show_bug.cgi?id=490147#c9
https://bugzilla.redhat.com/show_bug.cgi?id=490147#c10
https://bugzilla.redhat.com/show_bug.cgi?id=490147#c11

*** Bug 514627 has been marked as a duplicate of this bug. ***
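The comment above recommends moving to the deadline I/O scheduler as a first step. On RHEL 5 this can be changed per-device at runtime through sysfs, or made the default at boot; a minimal sketch follows (the device name `sdb` is a placeholder for the actual qdisk/shared-storage device, which must be substituted):

```shell
# Show the schedulers available for the device; the active one is
# shown in brackets, e.g. "noop anticipatory deadline [cfq]".
# (sdb is a placeholder for the qdisk / shared storage device.)
cat /sys/block/sdb/queue/scheduler

# Switch that device to the deadline elevator at runtime
# (requires root; takes effect immediately):
echo deadline > /sys/block/sdb/queue/scheduler

# Or make deadline the default for all block devices at boot by
# adding "elevator=deadline" to the kernel line in /boot/grub/grub.conf.
```

After switching, the qdisk device's latencies can be re-checked with `iostat -x` to see whether the await/svctm figures quoted above settle down.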