Bug 435341 - clvmd hangs during startup because vgscan hangs
Summary: clvmd hangs during startup because vgscan hangs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: lvm2-cluster
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-02-28 19:37 UTC by Shane Bradley
Modified: 2018-10-19 20:07 UTC
CC: 6 users

Fixed In Version: RHBA-2008-0806
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-25 19:26:29 UTC
Embargoed:


Attachments
Information requested from eng. (697.05 KB, application/x-gzip)
2008-03-03 15:33 UTC, Shane Bradley
sysrq output (137.83 KB, application/octet-stream)
2008-03-04 15:50 UTC, Robert Munilla
sysrq output 2 (68.84 KB, text/plain)
2008-03-04 15:53 UTC, Robert Munilla
Patch for testing (1.03 KB, patch)
2008-03-13 11:22 UTC, Christine Caulfield


Links:
Red Hat Product Errata RHBA-2008:0806 (SHIPPED_LIVE): lvm2-cluster bug fix and enhancement update, last updated 2008-07-25 19:26:14 UTC

Description Shane Bradley 2008-02-28 19:37:56 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc8 Firefox/2.0.0.12

Description of problem:
clvmd fails to start after a node has been fenced off from the cluster.
The node started back up, but clvmd failed to start.

The service status says it is running, but none of the LVM commands work.
Running pvs produces segfaults, and none of the clustered volumes are activated.

Feb 27 15:55:34 blade112 kernel: pvs[12756]: segfault at
0000000000000000 rip 00000000004443e4 rsp 0000007fbfffd778 error 4

If the "vgscan" line is commented out from /etc/init.d/clvmd, then
everything is activated and mounted correctly for gfs.

--sbradley


Version-Release number of selected component (if applicable):
cman-1.0.17-0-x86_64

How reproducible:
Always


Steps to Reproduce:
1. start ccsd, cman, fenced
2. start clvmd



Actual Results:
clvmd hangs on starting, and LVM commands fail.
Clustered volume groups are not activated and cannot be mounted.


Expected Results:
The node should join the cluster.
clvmd should start, run vgscan, and then activate all the clustered volumes found.


Additional info:
Removing the vgscan from the /etc/init.d/clvmd script allows the node to run clvmd correctly. However, I believe subsequent vgscan runs will fail.

Comment 2 Christine Caulfield 2008-02-29 11:33:31 UTC
Can we have debugging information from clvmd (running clvmd -d and capturing
stderr) when the vgscan and failing commands are run, please? Ideally annotated
so we can see which parts of the logs relate to which commands, and also the result
of running the commands with -vvvvv (e.g. pvs -vvvvvvv).

If the commands are segfaulting, then it would also be useful to install the
debug packages and get a gdb backtrace of the failure.

Comment 3 Shane Bradley 2008-03-03 15:33:02 UTC
Created attachment 296619 [details]
Information requested from eng.

Comment 5 Christine Caulfield 2008-03-04 10:11:13 UTC
It's waiting for the DLM. 

Looking at the clvmd log, I would guess that it's stuck in a dlm_write: the
DLM hasn't responded to the locking request at all, and the kill hasn't
caused the write to return. That, in turn, causes the main loop to hang waiting
for the child thread to complete, which prevents further activity in clvmd.

There are no obvious anomalies in the DLM log; it seems to be running and to have
finished recovery. There are some odd "clvmd lockspace already in use" messages
that I can't account for. I doubt they are a cause, but they shouldn't be there;
they might be an indication of mixed-up lockspace device nodes.

The next thing to get, if possible, is a sysrq-T - that should tell us where the
dlm is waiting and which kernel locks are being held.

Comment 6 Robert Munilla 2008-03-04 15:50:13 UTC
Created attachment 296752 [details]
sysrq output

Comment 7 Robert Munilla 2008-03-04 15:53:23 UTC
Created attachment 296753 [details]
sysrq output 2

Comment 9 Christine Caulfield 2008-03-04 16:43:57 UTC
There are a lot of processes in find_extend_vma (including clvmd) which might
point to an out-of-memory problem.

Looking in /proc/slabinfo, nothing leaps out as particularly bad though.

Comment 13 Nate Straz 2008-03-05 22:23:22 UTC
I hit something like this on our ppc cluster, but while running GULM.  All four
nodes were coming up at the same time.  The three GULM servers hung for a while
in clvmd startup.  Two of the three nodes eventually became unstuck.  I don't
think the last node will become unstuck because it doesn't have any outstanding
locks in gulm, nor is it waiting for any locks.  It does have one vgscan process
still running.

Comment 18 Christine Caulfield 2008-03-13 11:22:14 UTC
Created attachment 297921 [details]
Patch for testing

This bug looks very similar to bz#435491 (the clvmd bits, not the networking
issues). Looking at the reproduction we got (comment 47), I think I spotted a
race in clvmd. If you're up to making another test RPM, this might be worth a
go.

It includes the non-blocking patch too as I still think that's a good idea.
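
For illustration, a non-blocking write in the general spirit of such a patch might look like the sketch below. This is not the attached patch; the helper names are hypothetical and only demonstrate the O_NONBLOCK technique of letting a stalled write return instead of wedging the calling thread indefinitely.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch only: switch an already-open descriptor to non-blocking mode so a
   write that cannot complete returns EAGAIN instead of blocking the caller
   (and, in a daemon like clvmd, the thread waiting on it) indefinitely. */
static int set_nonblock(int fd)
{
        int flags = fcntl(fd, F_GETFL, 0);

        if (flags < 0)
                return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* The caller can then decide to retry, poll, or give up rather than hang. */
static ssize_t try_write(int fd, const void *buf, size_t len)
{
        ssize_t n = write(fd, buf, len);

        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                return 0;       /* nothing written; caller may retry later */
        return n;
}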

Comment 19 Christine Caulfield 2008-03-14 08:33:17 UTC
I ran the test on corey's cluster with this patch in place and it still hung.

However, looking at clvmd under gdb shows different symptoms. All of the clvmd
processes are in a normal quiescent state and all LVM commands can connect. The
hang is purely due to the other processes being held behind the VG lock.

I'm going to restart the tests with clvmd logging enabled to gather more
information. I'm actually away today but I might be able to check on the status
later on.

Comment 21 Jonathan Earl Brassow 2008-03-14 19:14:59 UTC
One of the mirrors (on the QA cluster) is stuck because it is trying to write to
the log device while it is suspended.  There are two ways that this can happen:

1) [Most likely]  The mirror log server can move around as machines come and go.
 If a machine incorrectly assumes control as the server and issues a disk
operation while the log is suspended, it will freeze the whole cluster.  This
is because the log server is stuck waiting on disk I/O - no-one can contact it, and a
new server cannot be elected without the stuck server's vote (or fencing).

2) The top-level mirror is being resumed before the lower-level devices are
resumed.  This would cause a result similar to #1, in that the server would be
stuck; but it would not be the fault of the mirror log server.

We have seen this issue before, but it is not the basis for this bug.  You may
have gotten past the original issue and stumbled back upon an old issue.


Comment 22 Christine Caulfield 2008-03-17 09:38:05 UTC
> We have seen this issue before, but it is not the basis for this bug.  You may
> have gotten past the original issue and stumbled back upon an old issue.

I agree. I suspect this means that my clvmd fix has done its job. I'll check it
in to CVS.

Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.44; previous revision: 1.43
done


Comment 24 RHEL Program Management 2008-03-18 13:28:03 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 25 Christine Caulfield 2008-03-18 13:29:14 UTC
*** Bug 435491 has been marked as a duplicate of this bug. ***

Comment 27 Jonathan Earl Brassow 2008-03-20 19:26:51 UTC
The issues mentioned in comment #21 should be cleared now.  You can always tell
if the issue in comment #21 has arisen, because 'dmsetup info' will show a log
device as SUSPENDED and cluster_log_servd will be in the 'D' state.


Comment 29 Christine Caulfield 2008-03-28 13:00:38 UTC
corey was still experiencing some hangs even after using clvmd with the above patch.

The commit below fixes an uninitialised variable that caused this.

Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.45; previous revision: 1.44
done

I've set the state back to POST because we almost certainly need both of these
patches.

Comment 30 Christine Caulfield 2008-03-28 13:27:02 UTC
Put back to MODIFIED; I'll use bz#435491 for this checkin.

Comment 33 Mark Hlawatschek 2008-05-09 07:57:31 UTC
I observed a similar problem:

When clvmd is started on only a subset of the joined cluster nodes (as is
always the case at some point during a cluster startup), a vgscan causes
clvmd to deadlock.

Please note that I have already applied Christine's patches for clvmd.c 1.43-1.45.

This behavior can be reproduced with the following process:

1. on all cluster nodes do
 start cluster services:
# ccsd
# cman_tool join -w
# fence_tool join -c -w 

2. on only one cluster node do
 start clvmd
# clvmd
# vgscan

 

Comment 34 Mark Hlawatschek 2008-05-09 08:09:05 UTC
While doing some debugging, I noticed that the deadlock comes from here:

clvmd wants to reply that not all cluster nodes are active (clvmd.c:
add_reply_to_list), but the mutex is not initialized yet.

clvmd.c:
1284 static void add_reply_to_list(struct local_client *client, int status,
1285                               const char *csid, const char *buf, int len)
1286 {
1287         struct node_reply *reply;
1288
1289         pthread_mutex_lock(&client->bits.localsock.reply_mutex);

The mutex initialization is done in clvmd.c:read_from_local_sock, after the
check whether all clvmds are running ;-)

check for all clvmds:

clvmd.c:
 988                 /* Only run the command if all the cluster nodes are running CLVMD */
 989                 if (((inheader->flags & CLVMD_FLAG_LOCAL) == 0) &&
 990                     (check_all_clvmds_running(thisfd) == -1)) {

initialization of the mutex:

clvmd.c:
1063                 pthread_mutex_init(&thisfd->bits.localsock.reply_mutex, NULL);

In my opinion, the mutex initialization should be done before the clvmd
verification.
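
As a minimal self-contained sketch of that ordering (the structure and names below are simplified, hypothetical stand-ins, not the actual clvmd.c code), the initialization would simply move ahead of the check whose failure path queues a reply:

#include <pthread.h>

/* Simplified stand-in for the per-client state in clvmd.c. */
struct client_sketch {
        pthread_mutex_t reply_mutex;
};

/* Stand-in for add_reply_to_list(): it takes the reply mutex, so the mutex
   must already be initialized by the time any reply can be queued. */
static void queue_reply(struct client_sketch *c)
{
        pthread_mutex_lock(&c->reply_mutex);
        /* ... add the reply to the client's list ... */
        pthread_mutex_unlock(&c->reply_mutex);
}

/* Stand-in for the command path in read_from_local_sock(): initialize the
   mutex first, then run the "are all clvmds running" check, whose failure
   path replies (and therefore locks the mutex). */
static void handle_command(struct client_sketch *c, int all_clvmds_running)
{
        pthread_mutex_init(&c->reply_mutex, NULL);

        if (!all_clvmds_running) {
                queue_reply(c);         /* safe: mutex already initialized */
                return;
        }

        /* ... run the command across the cluster ... */
}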





Comment 35 Mark Hlawatschek 2008-05-09 08:23:08 UTC
Please note that removing the deadlock leads to another problem:

If the check_all_clvmds_running verification in clvmd.c fails as expected, the
vgscan and vgchange commands also fail.
This leads to undefined situations during a multi-node cluster startup.
That is, during a cluster startup, while the nodes required to fulfil the
quorum requirement are coming up, the clvmd cluster is in an inconsistent
state for some time: some nodes will have already started clvmd and some
won't. This causes several check_all_clvmds_running calls to fail, and
therefore the vgscan and vgchange commands also fail.

Comment 36 Christine Caulfield 2008-05-09 10:04:18 UTC
Comment #34: you're right - thanks for spotting this. I've checked the fix in as
revision 1.46 of clvmd.c.

Comment #35 needs a little more thought, and some communication with the init
scripts, I suspect. I'm not sure what they do.


Comment 37 Mark Hlawatschek 2008-05-09 11:26:15 UTC
The clvmd initscript basically does the following steps:

clvmd -T20 -t 90
vgscan 
vgchange -ayl

There are no clvmd cluster consistency checks included. How could this be
done?

Comment 38 Christine Caulfield 2008-06-04 08:25:54 UTC
The commands in comment #37 should work in theory, because vgchange -aly only
activates the volumes on the local node.

In practice, however, they don't, because vgscan issues a command around the
whole cluster to back up metadata and re-populate the cache.

Comment 41 errata-xmlrpc 2008-07-25 19:26:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0806.html


