Bug 1000535 - glusterd is non-operational after creating a large number of files from a fuse mount while also running 'gluster volume status <vol-name> {fd,inode}' repeatedly
Status: CLOSED EOL
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Bug Updates Notification Mailing List
QA Contact: SATHEESARAN
Whiteboard: glusterd
Reported: 2013-08-23 11:18 EDT by SATHEESARAN
Modified: 2015-12-03 12:22 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2015-12-03 12:22:25 EST
Type: Bug


Attachments
tar-ed sosreports (11.84 MB, application/gzip)
2013-08-23 11:18 EDT, SATHEESARAN

Description SATHEESARAN 2013-08-23 11:18:19 EDT
Created attachment 789621
tar-ed sosreports

Description of problem:
=======================
I hit the error "Connection failed. Please check if gluster daemon is operational." after creating a large number of files on the fuse mount while also running 'gluster volume status <vol-name> {fd,inode}' repeatedly.

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.4.0.21rhs-1

How reproducible:
=================
Haven't tried to reproduce

Steps to Reproduce:
==================
These are the steps I followed to hit this issue:
1. Created a distribute volume with 2 bricks
(i.e) gluster volume create <vol-name> <brick1> <brick2>
NOTE: open-behind is off by default

2. Started the volume
(i.e) gluster volume start <vol-name>

3. Fuse mounted the volume in 2 clients [RHEL6.4]
(i.e) mount.glusterfs <server>:<vol-name> <mount-point>

4. Created 1000 files and wrote C code that keeps an fd open on each of them <C Code is attached>; a sketch of such a program is shown after these steps.

5. Created 5 directories and started creating files in all 5 directories concurrently
(i.e) for i in {1..10000};do dd if=/dev/urandom of=file$i bs=128k count=2;done

6. While step 5 was in progress, executed 'gluster volume status <vol-name> {fd,inode}' repeatedly on the RHS node
(i.e) while true; do gluster volume status distvol fd; gluster volume status distvol inode;done
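
Step 4 above refers to an attached C program that is not reproduced here. Below is a minimal sketch, under stated assumptions, of what such a program could look like; the directory /mnt/distvol/fd_test, the file-name pattern, and the exit behaviour are illustrative guesses, not details taken from the attachment.

/* Illustrative sketch only (not the attached program): create NUM_FILES
 * files on the fuse mount and hold an fd open on each of them, so that
 * 'gluster volume status <vol-name> fd' has open descriptors to report. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_FILES 1000                      /* step 4 mentions 1000 files */
#define MOUNT_DIR "/mnt/distvol/fd_test"    /* hypothetical, pre-created directory on the fuse mount */

int main(void)
{
    static int fds[NUM_FILES];
    char path[256];
    int i;

    for (i = 0; i < NUM_FILES; i++) {
        snprintf(path, sizeof(path), MOUNT_DIR "/openfile%d", i);
        fds[i] = open(path, O_CREAT | O_RDWR, 0644);   /* create the file and keep its fd open */
        if (fds[i] < 0) {
            perror("open");
            exit(EXIT_FAILURE);
        }
    }

    printf("%d files created and held open; press Ctrl-C to release them\n", NUM_FILES);
    pause();    /* block until interrupted so the descriptors stay open */
    return 0;
}

While a program like this keeps the descriptors open, the repeated 'gluster volume status <vol-name> fd' queries from step 6 have a large set of open fds to report on.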

Actual results:
==============
After the files were created, I could see the error on the RHS node: "Connection failed. Please check if gluster daemon is operational."

Expected results:
=================
glusterd should be operational

Additional info:
===============

1. RHS Nodes
============
10.70.37.44
10.70.37.86
10.70.37.79
10.70.37.205

2. gluster volume status, volume info and pool list
=======================
[Fri Aug 23 15:00:50 UTC 2013 root@10.70.37.86:~ ] # gluster volume status
Status of volume: distvol
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick1/dir1                      49152   Y       1907
NFS Server on localhost                                 2049    Y       1917
NFS Server on 10.70.37.79                               2049    Y       1683
NFS Server on 10.70.37.205                              2049    Y       1701
 
There are no active volume tasks
[Fri Aug 23 15:00:54 UTC 2013 root@10.70.37.86:~ ] # gluster volume info
 
Volume Name: distvol
Type: Distribute
Volume ID: 1695aff9-d0b5-4f0c-a1c4-3e8e39c20682
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.44:/rhs/brick1/dir1
Brick2: 10.70.37.86:/rhs/brick1/dir1
[Fri Aug 23 15:00:58 UTC 2013 root@10.70.37.86:~ ] # gluster pool list
UUID                                    Hostname        State
d8ab6586-8682-4ea1-b6ef-9afdeec1ee2a    10.70.37.79     Connected 
5a1f5689-17ae-4b0e-9e16-d8ecffded2bb    10.70.37.205    Connected 
524c9e95-7119-40a6-8f77-c644cefe8994    10.70.37.44     Disconnected 
f4237514-eb66-4b38-afb4-0dcb1d8c9091    localhost       Connected

3. Client
=========
The volume is fuse mounted on the clients: 10.70.36.32 and 10.70.36.33
Both are RHEL6.4
Mount point: /mnt/distvol (for both clients)
Comment 1 SATHEESARAN 2013-08-23 11:30:43 EDT
I missed mentioning this: all commands were executed from 10.70.37.44

Observations
============

1. There are no issues related to "Non-privileged port" in gluster logs

2. glusterd is non-operational
[Fri Aug 23 15:29:35 UTC 2013 root@10.70.37.44:~ ] # service glusterd status
glusterd dead but pid file exists

3. I could see the following errors in the glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log)
on 10.70.37.44

<snip>
[2013-08-23 13:18:58.963638] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: 524c9e95-7119-40a6-8f77-c644cefe8994, lock held by: 524c9e95-7119-40a6-8f77-c644cefe8994
[2013-08-23 13:18:58.963660] E [glusterd-syncop.c:1153:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-08-23 13:18:58.992886] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:18:58.992930] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:18:58.992953] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:18:59.021538] E [glusterd-utils.c:182:glusterd_unlock] 0-management: Cluster lock not held!
[2013-08-23 13:18:59.149164] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:20:59.304545] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:20:59.304625] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: 524c9e95-7119-40a6-8f77-c644cefe8994, lock held by: 524c9e95-7119-40a6-8f77-c644cefe8994
[2013-08-23 13:20:59.304648] E [glusterd-syncop.c:1153:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-08-23 13:20:59.482698] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:20:59.485110] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:20:59.485308] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on d8ab6586-8682-4ea1-b6ef-9afdeec1ee2a. Please check log file for details.
[2013-08-23 13:25:22.844930] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on f4237514-eb66-4b38-afb4-0dcb1d8c9091. Please check log file for details.
[2013-08-23 13:25:22.845132] E [glusterd-syncop.c:823:gd_lock_op_phase] 0-management: Failed to acquire lock
[2013-08-23 13:25:22.885899] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:25:22.885928] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:25:22.885948] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:25:22.915214] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:25:22.915567] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:25:22.915612] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: 524c9e95-7119-40a6-8f77-c644cefe8994, lock held by: 524c9e95-7119-40a6-8f77-c644cefe8994
[2013-08-23 13:25:22.915633] E [glusterd-syncop.c:1153:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-08-23 13:25:22.926613] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:25:22.926638] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:25:22.926654] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:25:22.927461] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:25:22.927477] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:25:22.927494] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:25:22.927508] E [glusterd-utils.c:182:glusterd_unlock] 0-management: Cluster lock not held!
[2013-08-23 13:26:59.940718] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:26:59.942584] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:26:59.942862] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on d8ab6586-8682-4ea1-b6ef-9afdeec1ee2a. Please check log file for details.
[2013-08-23 13:30:51.475167] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on f4237514-eb66-4b38-afb4-0dcb1d8c9091. Please check log file for details.
[2013-08-23 13:30:51.475268] E [glusterd-syncop.c:823:gd_lock_op_phase] 0-management: Failed to acquire lock
[2013-08-23 13:30:51.488286] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:30:51.488348] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: 524c9e95-7119-40a6-8f77-c644cefe8994, lock held by: 524c9e95-7119-40a6-8f77-c644cefe8994
[2013-08-23 13:30:51.488358] E [glusterd-syncop.c:1153:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-08-23 13:30:51.493518] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:30:51.493529] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:30:51.493540] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:30:51.493550] E [glusterd-utils.c:182:glusterd_unlock] 0-management: Cluster lock not held!
[2013-08-23 13:30:51.494706] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.494827] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on f4237514-eb66-4b38-afb4-0dcb1d8c9091. Please check log file for details.
[2013-08-23 13:30:51.498678] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.498734] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.498753] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.498777] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.498802] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.501365] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.504045] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
</snip>
Comment 2 SATHEESARAN 2013-08-23 11:51:42 EDT
Fuse mount Info
===============
[Fri Aug 23 15:51:11 UTC 2013 root@10.70.36.32:/mnt/distvol ] # df -Th
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/vg_rhsclient8-lv_root
              ext4     50G  3.0G   44G   7% /
tmpfs        tmpfs    7.8G     0  7.8G   0% /dev/shm
/dev/sda1     ext4    485M   65M  396M  14% /boot
/dev/mapper/vg_rhsclient8-lv_home
              ext4    1.8T  7.7G  1.7T   1% /home
10.70.37.44:distvol
    fuse.glusterfs    170G   14G  157G   8% /mnt/distvol

[Fri Aug 23 15:50:12 UTC 2013 root@10.70.36.33:/mnt/distvol ] # df -Th
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/vg_rhsclient9-lv_root
              ext4     50G  6.8G   40G  15% /
tmpfs        tmpfs    7.8G     0  7.8G   0% /dev/shm
/dev/sda1     ext4    485M   91M  369M  20% /boot
/dev/mapper/vg_rhsclient9-lv_home
              ext4    1.8T   17G  1.7T   1% /home
10.70.37.44:distvol
    fuse.glusterfs    170G   14G  157G   8% /mnt/distvol
Comment 3 Vivek Agarwal 2015-12-03 12:22:25 EST
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
