Bug 168853 - GFS mount hangs if two nodes try to mount same filesystem simultaneously
Status: CLOSED NOTABUG
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: David Teigland
QA Contact: Cluster QE
Reported: 2005-09-20 13:50 EDT by Henry Harris
Modified: 2009-04-16 16:30 EDT
CC: 3 users

Doc Type: Bug Fix
Last Closed: 2005-11-21 16:32:42 EST

Attachments
GFS mount info from /var/log/messages on node1 (393.79 KB, text/plain), 2005-09-20 13:55 EDT, Henry Harris
GFS mount info from /var/log/messages on node1 (393.79 KB, text/plain), 2005-09-20 13:55 EDT, Henry Harris
GFS mount info from /var/log/messages on node2 (331.46 KB, text/plain), 2005-09-20 13:56 EDT, Henry Harris
Mount command hanging on node1 (311 bytes, text/plain), 2005-09-20 13:57 EDT, Henry Harris
Mount command hanging on node2 (311 bytes, text/plain), 2005-09-20 13:58 EDT, Henry Harris

Description Henry Harris 2005-09-20 13:50:52 EDT
Description of problem: Two nodes trying to mount the same GFS filesystem at 
the same time causes the mount command on both nodes to hang.


Version-Release number of selected component (if applicable):


How reproducible: Every time, if the mounts happen at exactly the same time


Steps to Reproduce:
1. Bring up a two-node cluster
2. Mount the same GFS file system from both nodes at exactly the same time (see the sketch below)
  
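A minimal sketch of driving step 2 from a third machine; node1, node2, and /gfs
are placeholders here, not the reporter's actual host names or mount point:

   # Sketch only: start both mounts as close to simultaneously as possible.
   # node1, node2 and /gfs are placeholders for the real hosts and mount point.
   ssh -x node1 mount /gfs &
   ssh -x node2 mount /gfs &
   wait   # returns once both background mounts complete (or blocks if they hang)
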
Actual results: Neither node completes the mount command


Expected results: Both nodes should mount the filesystem


Additional info: Check the last entries in the two files, node1_mounts.txt and 
node2_mounts.txt.  You will see that node 2 started mounting the filesystem 
(fsid=sqa:snap5src) first and replayed all 16 journals.  Before the mount 
command completed, node 1 started mounting the same filesystem and hung before 
the message "Trying to acquire journal lock" could be printed.
Comment 1 Henry Harris 2005-09-20 13:55:20 EDT
Created attachment 119034 [details]
GFS mount info from /var/log/messages on node1
Comment 2 Henry Harris 2005-09-20 13:55:32 EDT
Created attachment 119035 [details]
GFS mount info from /var/log/messages on node1
Comment 3 Henry Harris 2005-09-20 13:56:51 EDT
Created attachment 119036 [details]
GFS mount info from /var/log/messages on node2
Comment 4 Henry Harris 2005-09-20 13:57:47 EDT
Created attachment 119037 [details]
Mount command hanging on node1
Comment 5 Henry Harris 2005-09-20 13:58:09 EDT
Created attachment 119038 [details]
Mount command hanging on node2
Comment 6 David Teigland 2005-09-20 16:48:53 EDT
Is this gfs code for RHEL4U2 or RHEL4U3?
The following information would be useful from both
nodes after they hang:
- output of cman_tool services
- output of /proc/cluster/lock_dlm/debug
- output of /proc/cluster/dlm_debug
- output of /proc/cluster/sm_debug
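A small helper for gathering all of the above on one node might look like this
(a sketch; the command and /proc paths are the ones listed in this comment):

   # Sketch: collect the cluster/DLM state requested above on a single node.
   cman_tool services
   for f in /proc/cluster/lock_dlm/debug /proc/cluster/dlm_debug /proc/cluster/sm_debug
   do
       echo "==== $f ===="
       cat "$f"
   done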
Comment 7 David Teigland 2005-09-26 12:32:46 EDT
I'm also curious about how "simultaneous" the two mounts are.
I've been running tests for days that repeatedly do the following
without any problem:
   ssh -x node1 mount /gfs &; ssh -x node2 mount /gfs

What is it that's initiating the mount commands in the case
above?  Is it any more simultaneous than my example?
Comment 8 Henry Harris 2005-10-14 18:50:21 EDT
This bug was seen with GFS-kernel-smp-2.6.9-41.2.x86_64.  We have been running 
version 42.1 lately and have not seen the bug for a while.  

Since I haven't seen the bug recently, I can't provide the output you 
requested.  The mounts were done at boot time.  The problem seemed to occur if 
the second node started replaying the journals before the first node had 
completed replaying all 16 journals.
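A quick check for anyone trying to tell whether a node still carries the
affected build (a sketch; the package name is the one quoted in this comment):

   # Sketch: report the installed GFS kernel module package build
   # (41.2 showed the hang here; 42.1 reportedly has not).
   rpm -q GFS-kernel-smp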
