Bug 438832 - qdisk can't get write out during i/o load with network device
Summary: qdisk can't get write out during i/o load with network device
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Lon Hohberger
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-03-25 14:42 UTC by Corey Marthaler
Modified: 2009-04-16 22:38 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-04-07 13:57:03 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Corey Marthaler 2008-03-25 14:42:54 UTC
Description of problem:
It appears that qdisk doesn't work with AoE devices (and possibly other network
block devices). As soon as any I/O starts to the non-qdisk devices (even a
simple dd or mkfs), the node doing the I/O misses its updates, gets evicted
from the cluster, and ends up fenced, every time.

I have had no issues so far with qdisk on my SCSI storage clusters.

NODE DOING I/O:
Mar 24 16:50:16 hayes-03 openais[3288]: [CMAN ] quorum device registered
Mar 24 16:50:16 hayes-03 qdiskd[3219]: <notice> Score sufficient for master
operation (1/1; required=1); upgrading
Mar 24 16:50:22 hayes-03 qdiskd[3219]: <info> Node 1 is the master
Mar 24 16:51:03 hayes-03 kernel: dlm: Using TCP for communications
Mar 24 16:51:03 hayes-03 openais[3288]: [cpg.c:1018] lib_init_fn:
conn=0x2579290, pi=0x2573010
Mar 24 16:51:03 hayes-03 openais[3288]: [cpg.c:1180] got trackstart request on
0x2579290
Mar 24 16:51:03 hayes-03 openais[3288]: [cpg.c:1031] got join request on
0x2579290, pi=0x2573010, pi->pid=0
Mar 24 16:51:03 hayes-03 openais[3288]: [cpg.c:0816] got procjoin message from
cluster node 3
Mar 24 16:51:03 hayes-03 openais[3288]: [cpg.c:0393] Sending new joinlist (1
elements) to clients
Mar 24 16:51:03 hayes-03 openais[3288]: [cpg.c:1114] got mcast request on 0x2586980
Mar 24 16:51:04 hayes-03 openais[3288]: [cpg.c:0816] got procjoin message from
cluster node 1
Mar 24 16:51:04 hayes-03 openais[3288]: [cpg.c:0393] Sending new joinlist (2
elements) to clients
Mar 24 16:51:04 hayes-03 openais[3288]: [cpg.c:1114] got mcast request on 0x2586980
Mar 24 16:51:04 hayes-03 openais[3288]: [cpg.c:1114] got mcast request on 0x2586980
Mar 24 16:51:04 hayes-03 kernel: dlm: connecting to 1
Mar 24 16:51:04 hayes-03 kernel: dlm: got connection from 1
Mar 24 16:51:04 hayes-03 clvmd: Cluster LVM daemon started - connected to CMAN
Mar 24 16:51:05 hayes-03 openais[3288]: [cpg.c:0816] got procjoin message from
cluster node 2
Mar 24 16:51:05 hayes-03 openais[3288]: [cpg.c:0393] Sending new joinlist (3
elements) to clients
Mar 24 16:51:05 hayes-03 openais[3288]: [cpg.c:1114] got mcast request on 0x2586980
Mar 24 16:51:05 hayes-03 openais[3288]: [cpg.c:1114] got mcast request on 0x2586980
Mar 24 16:51:05 hayes-03 kernel: dlm: connecting to 2
Mar 24 16:51:44 hayes-03 openais[3288]: [CMAN ] lost contact with quorum device
Mar 24 16:51:44 hayes-03 kernel: dlm: closing connection to node 0
Mar 24 16:51:45 hayes-03 openais[3288]: [CMAN ] cman killed by node 1 because we
were killed by cman_tool or other application
Mar 24 16:51:45 hayes-03 dlm_controld[3310]: cluster is down, exiting
Mar 24 16:51:45 hayes-03 gfs_controld[3316]: cluster is down, exiting
Mar 24 16:51:45 hayes-03 fenced[3304]: cluster is down, exiting
Mar 24 16:51:45 hayes-03 kernel: dlm: closing connection to node 3
Mar 24 16:51:45 hayes-03 kernel: dlm: closing connection to node 2
Mar 24 16:51:45 hayes-03 kernel: dlm: closing connection to node 1
Mar 24 16:52:10 hayes-03 ccsd[3278]: Unable to connect to cluster infrastructure
after 30 seconds.
Mar 24 16:52:41 hayes-03 ccsd[3278]: Unable to connect to cluster infrastructure
after 60 seconds.



MASTER:
Mar 24 16:51:07 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (2/10)
Mar 24 16:51:08 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (3/10)
Mar 24 16:51:09 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (4/10)
Mar 24 16:51:10 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (5/10)
Mar 24 16:51:11 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (6/10)
Mar 24 16:51:12 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (7/10)
Mar 24 16:51:13 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (8/10)
Mar 24 16:51:14 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (9/10)
Mar 24 16:51:15 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (10/10)
Mar 24 16:51:16 hayes-01 qdiskd[3867]: <debug> Node 3 missed an update (11/10)
Mar 24 16:51:16 hayes-01 qdiskd[3867]: <notice> Writing eviction notice for node 3
Mar 24 16:51:16 hayes-01 qdiskd[3867]: <debug> Telling CMAN to kill the node
Mar 24 16:51:16 hayes-01 qdiskd[3867]: <debug> Node 3 DOWN
Mar 24 16:51:16 hayes-01 kernel: dlm: invalid h_nodeid 0 from 3 lockspace 10003
Mar 24 16:51:17 hayes-01 qdiskd[3867]: <notice> Node 3 evicted



Version-Release number of selected component (if applicable):
2.6.18-85.el5
openais-0.80.3-14.el5
cman-2.0.80-1.el5

Comment 1 Lon Hohberger 2008-03-25 18:59:17 UTC
Basically, what happens here is that the AoE driver gets saturated and qdiskd
can't get an I/O out in a timely manner.  I didn't see a way to switch the AoE
scheduler to deadline :(
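
For reference, this is a sketch of the usual workaround on drivers that do use
the block-layer request queue (the device name "sdb" is a placeholder, not one
from this report); AoE devices do not expose this sysfs knob:

```shell
# Sketch of the normal scheduler switch for request-queue block devices.
# "sdb" is a hypothetical device name; adjust for your system.
DEV=sdb
if [ -w "/sys/block/$DEV/queue/scheduler" ]; then
    # Show the available schedulers; the active one is bracketed,
    # e.g. "noop anticipatory deadline [cfq]"
    cat "/sys/block/$DEV/queue/scheduler"
    # Switch the device to the deadline elevator
    echo deadline > "/sys/block/$DEV/queue/scheduler"
fi
# AoE devices (/dev/etherd/e*) have no queue/scheduler entry, so this
# knob is not available for them.
```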


Comment 2 Lon Hohberger 2008-04-07 13:57:03 UTC
Timely access to shared storage is required for qdiskd operations.  Most
block-level devices allow you to change the scheduler; normally in this case, we
would have you change the scheduler for the device to the 'deadline' scheduler
(which is sort of a realtime I/O scheduler).

Unfortunately, the AoE driver doesn't seem to offer a way to change the I/O
scheduler, so there's nothing that can be done about this bug report at this
time.  You could increase your quorum disk timeouts to a few minutes (instead
of 10 seconds), but there's still no way to guarantee the necessary I/Os will
get out in time.
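
If you do want to experiment with longer timeouts, they are set on the
quorumd element in /etc/cluster/cluster.conf. The values below are purely
illustrative (not from this report), and the CMAN totem token timeout must be
raised to exceed interval * tko, or CMAN will declare the node dead before
qdiskd does:

```xml
<!-- Illustrative values only: interval (seconds between qdiskd writes)
     times tko (missed updates tolerated) gives roughly a two-minute
     window instead of the default ~10 seconds. -->
<quorumd interval="3" tko="40" votes="1" label="qdisk"/>

<!-- The totem token timeout (milliseconds) should be larger than
     interval * tko * 1000, so CMAN waits at least as long as qdiskd. -->
<totem token="140000"/>
```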

