Bug 223462 - ioctl() for kill_node takes almost 6 seconds in some cases
ioctl() for kill_node takes almost 6 seconds in some cases
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Platform: All Linux
Priority: medium
Severity: medium
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2007-01-19 10:49 EST by Lon Hohberger
Modified: 2009-04-16 16:01 EDT
CC: 1 user

See Also:
Fixed In Version: RHBA-2007:0135
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-05-16 11:54:21 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments

  None
Description Lon Hohberger 2007-01-19 10:49:18 EST
Description of problem:

Qdiskd allows node counts to drop below half.  During this time, qdiskd may need
to call the cman_kill_node API, which can take up to 6 seconds to complete.
This causes problems with qdiskd, which expects somewhat realtime behavior.  A
6 second interruption is a bit much:

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
send(4, "<167>Jan 19 09:35:55 qdiskd[3916"..., 73, MSG_NOSIGNAL) = 73 <0.000009>

vvvvvvvvvvvvv
ioctl(3, 0x4004780c, 0x3)               = 0 <5.997827>
^^^^^^^^^^^^^

gettid()                                = 3916 <0.000007>
time(NULL)                              = 1169170561 <0.000007>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000010>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
send(4, "<167>Jan 19 09:36:01 qdiskd[3916"..., 78, MSG_NOSIGNAL) = 78 <0.000011>
time(NULL)                              = 1169170561 <0.000006>
gettimeofday({1169170561, 438972}, NULL) = 0 <0.000006>
time(NULL)                              = 1169170561 <0.000007>
lseek(5, 16384, SEEK_SET)               = 16384 <0.000007>
write(5, "\36\273\336\0\352\232p b\371W\224P\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 <0.000327>
gettimeofday({1169170561, 439470}, NULL) = 0 <0.000007>
gettid()                                = 3916 <0.000007>
time(NULL)                              = 1169170561 <0.000007>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000010>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0 <0.000009>
send(4, "<167>Jan 19 09:36:01 qdiskd[3916"..., 73, MSG_NOSIGNAL) = 73 <0.000010>
ioctl(3, 0x4004780c, 0x4)               = 0 <5.997240>
select(4, [3], NULL, NULL, {0, 0})      = 0 (Timeout) <0.000012>


Version-Release number of selected component (if applicable): cman-1.0.11-0,
cman-kernel

How reproducible:  Sometimes.


Steps to Reproduce:
1. Start 9 nodes
2. Start qdisk
3. Kill 5 nodes at the same time
  
Actual results: Remaining nodes go crazy and reboot because the qdisk master
takes 12+ seconds to complete, with a maximum of 10 seconds specified in
cluster.conf.

Expected results: Immediate return or near-immediate return.
Comment 1 Lon Hohberger 2007-01-19 10:51:32 EST
Note, in the above strace, it's taking 6 seconds per node to complete the
cman_kill_node() function.
Comment 2 Christine Caulfield 2007-01-19 11:10:26 EST
What's happening is that cman is busy trying to make sense of a cluster that's
just had 4 nodes (deliberately) killed on it, and so it is sending messages for
state transitions. The way this works is that cman will do an (internal) state
transition for each node that leaves.

Now, the way a dead node gets detected during a state transition is that it
doesn't respond to messages (hence the similar message in syslog), and that
timeout is set to 5 messages, one per second.

Because cman's protocol is not windowed, each message sent has to wait until the
previous messages have been acknowledged or ignored. As mentioned above, it takes
5 seconds for a dead node to "ignore" a message, so the sending process (qdiskd)
is waiting 5 seconds for each dead node to time out before it can send its own
message.
Comment 3 Christine Caulfield 2007-01-19 11:21:53 EST
Fixed by forcing all kill messages that need ACKs (basically, those that come
from userland) to be queued rather than waited on.

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/membership.c,v  <--  membership.c
new revision: 1.44.2.26; previous revision: 1.44.2.25
done
