Bug 442102 - openais full library queue causes node to be fenced
openais full library queue causes node to be fenced
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais (Show other bugs)
5.2
All Linux
urgent Severity low
: rc
: ---
Assigned To: Steven Dake
: ZStream
Depends On:
Blocks: 509888
  Show dependency treegraph
 
Reported: 2008-04-11 14:39 EDT by Corey Marthaler
Modified: 2016-04-26 10:25 EDT (History)
5 users (show)

See Also:
Fixed In Version: 0.80.3-19.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-20 15:46:50 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)

  None (edit)
Description Corey Marthaler 2008-04-11 14:39:09 EDT
Description of problem:
During cmirror device failure testing on the latest rhel5 builds, I saw this
issue where after a 'Library queue is full' message on taft-02, it got fenced
due to dm-log-clustered timing out.

Here's what the test was doing:

Enabling device sdg on taft-01
Enabling device sdg on taft-02
Enabling device sdg on taft-03
Enabling device sdg on taft-04

Recreating PV /dev/sdg1
  WARNING: Volume group helter_skelter is not consistent
  WARNING: Volume Group helter_skelter is not consistent
  WARNING: Volume group helter_skelter is not consistent
Extending the recreated PV back into VG helter_skelter
Since we can't yet up convert existing mirrors, down converting to linear(s)
on taft-02 before re-converting back to original mirror(s)
Up converting linear(s) back to mirror(s) on taft-02...
taft-02: lvconvert -m 2 -b helter_skelter/nonsyncd_secondary_3legs_1
/dev/sde1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sdf1:0-150
<fail name="taft-02_1" pid="27866" time="Fri Apr 11 11:07:08 2008" type="cmd"
duration="1207930028" ec="127" />
ALL STOP!
Didn't receive heartbeat for 120 seconds
couldn't up convert mirror nonsyncd_secondary_3legs_1 on taft-02

Here are the failure messages on taft-02:
Apr 11 11:01:46 taft-02 qarshd[10829]: Running cmdline: lvconvert -m 2 -b
helter_skelter/nonsyncd_secondary_3legs_1 /dev/sde1:0-1000 /dev/sdg1:0-1000
/dev/sdh1:0-1000 /dev/sdf1:0-150
Apr 11 11:01:57 taft-02 openais[6685]: [MAIN ] Library queue is full,
disconnecting library connection.
Apr 11 11:02:03 taft-02 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_RESUME:1209961 - retrying
Apr 11 11:02:18 taft-02 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_RESUME:1209962 - retrying
Apr 11 11:02:33 taft-02 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_RESUME:1209963 - retrying

Here are the messages from the node doing the fencing:
Apr 11 11:02:48 taft-01 clogd[6681]: 95:31) SEQ#=882307, UUID=AHec9A4F,
TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 clogd[6681]: 96:32) SEQ#=882309, UUID=AHec9A4F,
TYPE=(null), ORIG=1, RESP=NO
Apr 11 11:02:48 taft-01 clogd[6681]: 97:33) SEQ#=882309, UUID=AHec9A4F,
TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 clogd[6681]: 98:34) SEQ#=882311, UUID=AHec9A4F,
TYPE=(null), ORIG=1, RESP=NO
Apr 11 11:02:48 taft-01 clogd[6681]: 99:35) SEQ#=882311, UUID=AHec9A4F,
TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 kernel: device-mapper: dm-log-clustered: Excessive delay
in request processing, 14 sec for DM_CLOG_SET_REGION_SYNC
Apr 11 11:03:18 taft-01 fenced[6643]: taft-02 not a cluster member after 30 sec
post_fail_delay
Apr 11 11:03:18 taft-01 fenced[6643]: fencing node "taft-02"

Version-Release number of selected component (if applicable):
2.6.18-88.el5
openais-0.80.3-15.el5
lvm2-2.02.32-3.el5
kmod-cmirror-0.1.8-1.el5
Comment 1 Kiersten (Kerri) Anderson 2008-09-09 11:33:15 EDT
Possibly blocking cluster mirror progress at this point,  Setting flags to get consideration/discussion for 5.3.
Comment 3 Steven Dake 2008-09-22 22:20:27 EDT
The ipc system is designed such that there is one thread that uses the poll system call in a loop.  When the outbound IPC kernel buffers are filled up, the poll thread is interrupted via the pthread_kill library function (which also sets POLLOUT) so that poll will allow fallthrough when outbound messages must be sent but the kernel queue is full and then becomes available.

It is possible with this design that the number of messages being sent to the ipc connection by openais_response_send occurs at a faster rate then the poll thread can be processed.  This causes poll to enter a loop always being interrupted by pthread_exit as new messages are queued for the ipc connection.  The result is that new outgoing messages are not sent in this state to the ipc connection, even though that ipc connection has buffer space available in the kernel queues for outgoing messages.

This problem is easily fixed by only sending a signal via pthread_exit once when the queue has filled to interrupt the blocking poll call instead of every time a new message is queued.
Comment 4 Steven Dake 2008-09-23 13:55:26 EDT
The cmirror clogd daemon has a priority inversion with openais.  The fix requires a change to clogd to run at sched_rr 99 and also for the ipc dispatch thread to run at sched_rr 99 to avoid starvation of the outbound processing queues.
Comment 5 Steven Dake 2008-09-23 13:56:08 EDT
openais problems fixed in version openais-0.80.3-19.el5
Comment 8 errata-xmlrpc 2009-01-20 15:46:50 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0074.html

Note You need to log in before you can comment on or make changes to this bug.