Bug 168853

Summary: GFS mount hangs if two nodes try to mount same filesystem simultaneously
Product: [Retired] Red Hat Cluster Suite
Component: dlm
Version: 4
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Henry Harris <henry.harris>
Assignee: David Teigland <teigland>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, kanderso, lhh
Doc Type: Bug Fix
Last Closed: 2005-11-21 21:32:42 UTC
Attachments:
  - GFS mount info from /var/log/messages on node1
  - GFS mount info from /var/log/messages on node1
  - GFS mount info from /var/log/messages on node2
  - Mount command hanging on node1
  - Mount command hanging on node2

Description Henry Harris 2005-09-20 17:50:52 UTC
Description of problem: Two nodes trying to mount the same GFS filesystem at 
the same time causes the mount command on both nodes to hang.


Version-Release number of selected component (if applicable):


How reproducible: Every time, if the mounts happen at exactly the same time


Steps to Reproduce:
1. Bring up a two-node cluster
2. Mount the same GFS file system from both nodes at exactly the same time
   (see the sketch below)
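
A minimal sketch of step 2, assuming passwordless ssh and the node names
and mount point used later in this report:

    # Background both mounts so they start as close to simultaneously
    # as possible, then wait for both to return.
    ssh -x node1 mount /gfs &
    ssh -x node2 mount /gfs &
    wait    # with the bug present, this never returns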
  
Actual results: Neither node completes the mount command


Expected results: Both nodes should mount the filesystem


Additional info: Check the last entries in the two attached files,
node1_mounts.txt and node2_mounts.txt.  You will see that node 2 started
mounting the filesystem (fsid=sqa:snap5src) first and replayed all 16
journals.  Before that mount command completed, node 1 started mounting the
same filesystem and hung before the message "Trying to acquire journal lock"
could be printed.

Comment 1 Henry Harris 2005-09-20 17:55:20 UTC
Created attachment 119034 [details]
GFS mount info from /var/log/messages on node1

Comment 2 Henry Harris 2005-09-20 17:55:32 UTC
Created attachment 119035 [details]
GFS mount info from /var/log/messages on node1

Comment 3 Henry Harris 2005-09-20 17:56:51 UTC
Created attachment 119036 [details]
GFS mount info from /var/log/messages on node2

Comment 4 Henry Harris 2005-09-20 17:57:47 UTC
Created attachment 119037 [details]
Mount command hanging on node1

Comment 5 Henry Harris 2005-09-20 17:58:09 UTC
Created attachment 119038 [details]
Mount command hanging on node2

Comment 6 David Teigland 2005-09-20 20:48:53 UTC
Is this gfs code for RHEL4U2 or RHEL4U3?
The following information would be useful from both
nodes after they hang:
- output of cman_tool services
- output of /proc/cluster/lock_dlm/debug
- output of /proc/cluster/dlm_debug
- output of /proc/cluster/sm_debug
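
A minimal collection sketch, assuming passwordless ssh and the node names
used in this report:

    # Gather the four requested outputs from both nodes after the hang.
    for n in node1 node2; do
        echo "=== $n ==="
        ssh -x $n 'cman_tool services;
                   cat /proc/cluster/lock_dlm/debug;
                   cat /proc/cluster/dlm_debug;
                   cat /proc/cluster/sm_debug'
    done > hang_debug.txt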


Comment 7 David Teigland 2005-09-26 16:32:46 UTC
I'm also curious about how "simultaneous" the two mounts are.
I've been running tests for days that repeatedly do the
following without any problem:
   ssh -x node1 mount /gfs & ssh -x node2 mount /gfs

What is it that's initiating the mount commands in the case
above?  Is it any more simultaneous than my example?


Comment 8 Henry Harris 2005-10-14 22:50:21 UTC
This bug was seen with GFS-kernel-smp-2.6.9-41.2.x86_64.  We have been running
version 42.1 lately and have not seen the bug for a while.

Since I haven't seen the bug recently, I can't provide the output you
requested.  The mounts were done at boot.  The problem seemed to occur when
the second node started replaying the journals before the first node had
finished replaying all 16 journals.
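
Since the mounts happen at boot, one crude way to keep the journal-replay
windows from overlapping is to stagger the mounts by node; a workaround
sketch (hypothetical, not a fix from this report):

    # Delay the GFS mount on the second node so node1's journal
    # replay can finish first.
    case $(hostname -s) in
        node2) sleep 30 ;;
    esac
    mount /gfs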