Bug 203916 - groupd daemon segfault and mount hang
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Robert Peterson
QA Contact: Cluster QE
Keywords: Reopened
Depends On:
Blocks:
Reported: 2006-08-24 10:03 EDT by Robert Peterson
Modified: 2009-04-16 18:45 EDT
CC List: 2 users

See Also:
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-10-11 10:04:24 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---


Attachments
Proposed patch to fix the problem (1.83 KB, text/plain)
2006-08-24 10:03 EDT, Robert Peterson
Better patch for mount hangs (3.11 KB, patch)
2006-08-25 10:38 EDT, Robert Peterson

Description Robert Peterson 2006-08-24 10:03:15 EDT
Description of problem:
If I mount enough gfs file systems on any given node in a cluster
(the fifth mount in my testing), the mount hangs and the groupd daemon segfaults.

How reproducible:
Always

Steps to Reproduce:
From a fresh boot of a 5-node (smoke) cluster:
1. service cman start on all 5 nodes
2. service clvmd start on all 5 nodes
3. Mount a gfs file system on 4 out of 5 nodes.
4. Mount a gfs2 file system on the same 4 nodes.
5. On the fifth node, do five mounts of different file systems.

mount -tgfs /dev/Smoke_Cluster/Smoke_Cluster2 /mnt/SmokeCluster2/
mount -tgfs /dev/Smoke_Cluster/Smoke_Cluster3 /mnt/SmokeCluster3/
mount -tgfs /dev/Smoke_Cluster/Smoke_Cluster4 /mnt/SmokeCluster4/
mount -tgfs /dev/Smoke_Cluster/Smoke_Cluster5 /mnt/SmokeCluster5/
mount -tgfs /dev/Smoke_Cluster/Smoke_Cluster6 /mnt/SmokeCluster6/
  
Actual results:
The first four mounts work correctly.  The fifth mount hangs 
(but you can interrupt it) and the groupd daemon segfaults, 
causing the other daemons to stop as well.

Expected results:
You should be able to mount more than 4 gfs mount points.

Additional info:
The groupd daemon was allocating two chunks of memory for node
information relating to the mounts.  Both arrays were initially
allocated with 16 entries.  When the 17th entry was needed, only one
of the arrays was increased in size.  When the second array, still
only 16 entries long, was used, the segfault occurred.
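
To illustrate the failure mode, here is a minimal sketch (hypothetical
names, not the actual groupd source): two parallel per-node arrays start
at 16 entries, and on overflow only one of them is reallocated, so the
next write to the other array lands past the end of its 16-entry
allocation.

    #include <stdlib.h>

    #define INITIAL_NODES 16

    struct node_info { int nodeid; };

    static struct node_info *nodes;   /* grown when the 17th entry is needed */
    static int *node_flags;           /* BUG: never grown, stays at 16 entries */
    static int max_nodes;

    static int init_nodes(void)
    {
        max_nodes = INITIAL_NODES;
        nodes = calloc(max_nodes, sizeof(*nodes));
        node_flags = calloc(max_nodes, sizeof(*node_flags));
        return (nodes && node_flags) ? 0 : -1;
    }

    static int set_node(int i, int nodeid, int flags)
    {
        if (i >= max_nodes) {
            max_nodes *= 2;
            nodes = realloc(nodes, max_nodes * sizeof(*nodes));
            if (!nodes)
                return -1;
            /* node_flags would need the same realloc here */
        }
        nodes[i].nodeid = nodeid;
        node_flags[i] = flags;        /* entry 17 writes past the 16-entry array */
        return 0;
    }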
Comment 1 Robert Peterson 2006-08-24 10:03:15 EDT
Created attachment 134812 [details]
Proposed patch to fix the problem
Comment 2 Robert Peterson 2006-08-25 10:38:21 EDT
Created attachment 134920 [details]
Better patch for mount hangs

The previous patch was still hanging because the newly allocated
pollfd entries did not have their 'revents' initialized.  That caused
the daemon to act on garbage 'revents' values for entries that had
never been set, and somehow that caused the hang.  One side effect
was socket write errors in the daemon.
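
A minimal sketch of the kind of fix described here (hypothetical names,
not the actual patch): when the pollfd array is grown, the newly added
slots are zeroed so the poll loop never acts on garbage 'events' or
'revents' bits.

    #include <poll.h>
    #include <stdlib.h>
    #include <string.h>

    /* hypothetical daemon state */
    static struct pollfd *pofd;
    static int pofd_size;

    static int grow_pollfd(int new_size)
    {
        struct pollfd *p = realloc(pofd, new_size * sizeof(struct pollfd));
        if (!p)
            return -1;
        /* clear only the newly added entries so 'events' and 'revents'
           start out zero instead of whatever realloc left behind */
        memset(p + pofd_size, 0, (new_size - pofd_size) * sizeof(struct pollfd));
        pofd = p;
        pofd_size = new_size;
        return 0;
    }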

Also, in the process of debugging this, I learned that the
gfs_controld daemon was not dynamically growing its pollfd array
either.  That had ramifications, but I don't know their full extent.
I do know that group_tool -v would not show the proper list of
groups when gfs_controld did not dynamically grow its list; this
occurred when a node tried to mount 5 or more gfs file systems.
When gfs_controld is allowed to dynamically grow its pollfd array,
group_tool displays the proper group list.
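
For context, a sketch of how a daemon's client-add path can grow the
array instead of failing once the initial size is exceeded (hypothetical
names, building on the grow_pollfd sketch above; the real gfs_controld
code may differ):

    static int client_add(int fd)
    {
        int i, old_size = pofd_size;

        for (i = 0; i < pofd_size; i++) {
            if (pofd[i].fd == -1) {          /* reuse a free slot */
                pofd[i].fd = fd;
                pofd[i].events = POLLIN;
                pofd[i].revents = 0;
                return i;
            }
        }

        /* no free slot left: grow the array rather than dropping the client */
        if (grow_pollfd(old_size + 16) < 0)
            return -1;
        for (i = old_size; i < pofd_size; i++)
            pofd[i].fd = -1;                 /* mark the new slots as free */

        pofd[old_size].fd = fd;
        pofd[old_size].events = POLLIN;
        pofd[old_size].revents = 0;
        return old_size;
    }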

This improved patch fixes both problems and I've tested it by
mounting ten GFS file systems without a problem.

I strongly suspect these problems were causing some or most of the
problems the QE team has encountered with the tank and such.
Comment 3 Nate Straz 2006-10-11 09:23:15 EDT
Our original mount hangs seem to be fixed.   We'll file new bugs when we run
into new hangs.
Comment 4 Nate Straz 2006-10-11 10:03:22 EDT
Passing this through VERIFIED to get the metrics correct.
Comment 5 Nate Straz 2007-12-13 12:22:10 EST
Moving all RHCS v5 bugs to RHEL 5 so we can remove the RHCS v5 product, which never existed.
