Bug 223462 - ioctl() for kill_node takes almost 6 seconds in some cases
Summary: ioctl() for kill_node takes almost 6 seconds in some cases
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cman
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-01-19 15:49 UTC by Lon Hohberger
Modified: 2009-04-16 20:01 UTC (History)
1 user

Fixed In Version: RHBA-2007:0135
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-16 15:54:21 UTC
Embargoed:



Description Lon Hohberger 2007-01-19 15:49:18 UTC
Description of problem:

Qdiskd allows node counts to drop below half.  During this time, qdiskd may need
to call the cman_kill_node API, which can take up to 6 seconds to complete. 
This causes problems with qdiskd - which expects somewhat realtime behavior.  A
6 second interruption is a bit much:

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
send(4, "<167>Jan 19 09:35:55 qdiskd[3916"..., 73, MSG_NOSIGNAL) = 73 <0.000009>

vvvvvvvvvvvvv
ioctl(3, 0x4004780c, 0x3)               = 0 <5.997827>
^^^^^^^^^^^^^

gettid()                                = 3916 <0.000007>
time(NULL)                              = 1169170561 <0.000007>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000010>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
send(4, "<167>Jan 19 09:36:01 qdiskd[3916"..., 78, MSG_NOSIGNAL) = 78 <0.000011>
time(NULL)                              = 1169170561 <0.000006>
gettimeofday({1169170561, 438972}, NULL) = 0 <0.000006>
time(NULL)                              = 1169170561 <0.000007>
lseek(5, 16384, SEEK_SET)               = 16384 <0.000007>
write(5, "\36\273\336\0\352\232p b\371W\224P\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 <0.000327>
gettimeofday({1169170561, 439470}, NULL) = 0 <0.000007>
gettid()                                = 3916 <0.000007>
time(NULL)                              = 1169170561 <0.000007>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000010>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
send(4, "<167>Jan 19 09:36:01 qdiskd[3916"..., 73, MSG_NOSIGNAL) = 73 <0.000010>
ioctl(3, 0x4004780c, 0x4)               = 0 <5.997240>
select(4, [3], NULL, NULL, {0, 0})      = 0 (Timeout) <0.000012>


Version-Release number of selected component (if applicable): cman-1.0.11-0,
cman-kernel

How reproducible:  Sometimes.


Steps to Reproduce:
1. Start 9 nodes
2. Start qdisk
3. Kill 5 nodes at the same time
  
Actual results: Remaining nodes go crazy and reboot because the qdisk master
takes 12+ seconds to complete, with a maximum of 10 seconds specified in
cluster.conf.

Expected results: Immediate return or near-immediate return.

Comment 1 Lon Hohberger 2007-01-19 15:51:32 UTC
Note, in the above strace, it's taking 6 seconds per node to complete the
cman_kill_node() function.
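
For reference, a minimal standalone sketch of timing a single kill the same way.
This assumes the libcman userland API (cman_init/cman_kill_node/cman_finish from
the cman package); it is not qdiskd's actual code:

/* kill_node_timing.c - hedged sketch, not qdiskd source.
 * Times one cman_kill_node() call against the connected cluster.
 * Build (assumption): gcc kill_node_timing.c -lcman -o kill_node_timing
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <libcman.h>

int main(int argc, char **argv)
{
    struct timeval start, end;
    cman_handle_t ch;
    int nodeid;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <nodeid>\n", argv[0]);
        return 1;
    }
    nodeid = atoi(argv[1]);

    ch = cman_init(NULL);                /* connect to the local cman */
    if (!ch) {
        perror("cman_init");
        return 1;
    }

    gettimeofday(&start, NULL);
    if (cman_kill_node(ch, nodeid) < 0)  /* the call that blocks ~6s above */
        perror("cman_kill_node");
    gettimeofday(&end, NULL);

    printf("cman_kill_node(%d) took %.6f seconds\n", nodeid,
           (end.tv_sec - start.tv_sec) +
           (end.tv_usec - start.tv_usec) / 1000000.0);

    cman_finish(ch);
    return 0;
}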

Comment 2 Christine Caulfield 2007-01-19 16:10:26 UTC
What's happening is that cman is busy trying to make sense of a cluster that's
just had 4 nodes (deliberately) killed on it, and so it is sending messages for
state transitions. The way this works is that cman will do an (internal) state
transition for each node that leaves.

Now, the way a dead node gets detected during a state transition is that it
doesn't respond to messages, hence the similar message in syslog, and that
timeout is set to 5 messages, sent one per second.

Because cman does not use a windowed protocol, each message sent has to wait until
the previous messages have been acknowledged or ignored. As mentioned above, it
takes 5 seconds before a message to a dead node is given up on, so the sending
process (qdiskd) waits 5 seconds for each dead node to time out before it can send
its own message.
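
To put numbers on that (a back-of-the-envelope model using the figures from this
report, not cman code):

/* Back-of-the-envelope model of the serialized waits, not cman code.
 * Numbers come from the report above: ~6s observed per kill ioctl in
 * the strace (the ~5s ignore timeout plus overhead), and a 10s maximum
 * specified for qdiskd in cluster.conf.
 */
#include <stdio.h>

int main(void)
{
    const double observed_per_kill = 6.0;  /* ~5.9978s per ioctl in the strace */
    const int serialized_kills     = 2;    /* two kill ioctls appear back to back */
    const double qdiskd_limit      = 10.0; /* maximum specified in cluster.conf */

    double total = observed_per_kill * serialized_kills;
    printf("total blocked time: ~%.0f s (limit %.0f s) -> %s\n",
           total, qdiskd_limit,
           total > qdiskd_limit ? "qdiskd misses its deadline" : "ok");
    return 0;
}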

Comment 3 Christine Caulfield 2007-01-19 16:21:53 UTC
Fixed by forcing all kill messages that need ACKs (basically those that come from
userland) to be queued rather than to wait.

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/membership.c,v  <--  membership.c
new revision: 1.44.2.26; previous revision: 1.44.2.25
done
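
Schematically, the change is the difference between the two helpers below. This
is only an illustration of the queue-rather-than-wait idea, not the actual
membership.c code:

/* Illustration only - not the cman-kernel membership.c change.
 * Old path: a kill message from userland blocked until earlier messages
 * were ACKed or timed out (~5s per unresponsive node).
 * New path: the message is appended to a send queue and the ioctl
 * returns to the caller immediately.
 */
#include <stdio.h>

#define MAX_QUEUED 32

static int kill_queue[MAX_QUEUED];   /* node IDs waiting to be sent */
static int kill_queue_len;

/* Old behaviour (simplified): caller sleeps while the wire is busy. */
static int send_kill_blocking(int nodeid)
{
    /* wait_for_previous_acks();  <- this is where the ~5s per dead node went */
    printf("node %d: kill sent, caller was blocked meanwhile\n", nodeid);
    return 0;
}

/* New behaviour (simplified): enqueue and return at once; the queue is
 * drained later as ACKs and timeouts come in. */
static int send_kill_queued(int nodeid)
{
    if (kill_queue_len >= MAX_QUEUED)
        return -1;
    kill_queue[kill_queue_len++] = nodeid;
    printf("node %d: kill queued, ioctl returns immediately\n", nodeid);
    return 0;
}

int main(void)
{
    send_kill_blocking(3);
    send_kill_queued(4);
    return 0;
}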


