Created attachment 789621 [details]
tar-ed sosreports

Description of problem:
=======================
Hit the error "Connection failed. Please check if gluster daemon is operational." after creating a large number of files on a FUSE mount while repeatedly running 'gluster volume status <vol-name> {fd,inode}'.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.4.0.21rhs-1

How reproducible:
=================
Have not tried to reproduce.

Steps to Reproduce:
===================
Steps followed to hit this issue:

1. Created a distribute volume with 2 bricks:
   gluster volume create <vol-name> <brick1> <brick2>
   NOTE: open-behind is off by default.
2. Started the volume:
   gluster volume start <vol-name>
3. FUSE mounted the volume on 2 clients [RHEL 6.4]:
   mount.glusterfs <server>:<vol-name> <mount-point>
4. Created 1000 files and ran a C program to keep an fd open on each of them <C code is attached>.
5. Created 5 directories and started creating files in all 5 directories concurrently:
   for i in {1..10000}; do dd if=/dev/urandom of=file$i bs=128k count=2; done
6. While step 5 was in progress, on an RHS node, executed 'gluster volume status <vol-name> {fd,inode}' repeatedly:
   while true; do gluster volume status distvol fd; gluster volume status distvol inode; done

Actual results:
===============
After the files were created, errors appeared on the RHS node - "Connection failed. Please check if gluster daemon is operational."

Expected results:
=================
glusterd should remain operational.

Additional info:
================
1. RHS Nodes
============
10.70.37.44
10.70.37.86
10.70.37.79
10.70.37.205

2. gluster volume info
======================
[Fri Aug 23 15:00:50 UTC 2013 root.37.86:~ ] # gluster volume status
Status of volume: distvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.86:/rhs/brick1/dir1              49152   Y       1907
NFS Server on localhost                         2049    Y       1917
NFS Server on 10.70.37.79                       2049    Y       1683
NFS Server on 10.70.37.205                      2049    Y       1701

There are no active volume tasks

[Fri Aug 23 15:00:54 UTC 2013 root.37.86:~ ] # gluster volume info

Volume Name: distvol
Type: Distribute
Volume ID: 1695aff9-d0b5-4f0c-a1c4-3e8e39c20682
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.44:/rhs/brick1/dir1
Brick2: 10.70.37.86:/rhs/brick1/dir1

[Fri Aug 23 15:00:58 UTC 2013 root.37.86:~ ] # gluster pool list
UUID                                    Hostname        State
d8ab6586-8682-4ea1-b6ef-9afdeec1ee2a    10.70.37.79     Connected
5a1f5689-17ae-4b0e-9e16-d8ecffded2bb    10.70.37.205    Connected
524c9e95-7119-40a6-8f77-c644cefe8994    10.70.37.44     Disconnected
f4237514-eb66-4b38-afb4-0dcb1d8c9091    localhost       Connected

3. Client
=========
The volume is FUSE mounted on clients 10.70.36.32 and 10.70.36.33; both are RHEL 6.4.
Mount point: /mnt/distvol ----> for both clients <------
Missed to mention this earlier: all commands were executed from 10.70.37.44.

Observations
============
1. There are no issues related to "Non-privileged port" in the gluster logs.

2. glusterd is non-operational:

[Fri Aug 23 15:29:35 UTC 2013 root.37.44:~ ] # service glusterd status
glusterd dead but pid file exists

3. The following errors appear in the glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) on 10.70.37.44:

<snip>
[2013-08-23 13:18:58.963638] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: 524c9e95-7119-40a6-8f77-c644cefe8994, lock held by: 524c9e95-7119-40a6-8f77-c644cefe8994
[2013-08-23 13:18:58.963660] E [glusterd-syncop.c:1153:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-08-23 13:18:58.992886] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:18:58.992930] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:18:58.992953] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:18:59.021538] E [glusterd-utils.c:182:glusterd_unlock] 0-management: Cluster lock not held!
[2013-08-23 13:18:59.149164] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:20:59.304545] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:20:59.304625] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: 524c9e95-7119-40a6-8f77-c644cefe8994, lock held by: 524c9e95-7119-40a6-8f77-c644cefe8994
[2013-08-23 13:20:59.304648] E [glusterd-syncop.c:1153:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-08-23 13:20:59.482698] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:20:59.485110] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:20:59.485308] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on d8ab6586-8682-4ea1-b6ef-9afdeec1ee2a. Please check log file for details.
[2013-08-23 13:25:22.844930] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on f4237514-eb66-4b38-afb4-0dcb1d8c9091. Please check log file for details.
[2013-08-23 13:25:22.845132] E [glusterd-syncop.c:823:gd_lock_op_phase] 0-management: Failed to acquire lock
[2013-08-23 13:25:22.885899] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:25:22.885928] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:25:22.885948] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:25:22.915214] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:25:22.915567] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:25:22.915612] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: 524c9e95-7119-40a6-8f77-c644cefe8994, lock held by: 524c9e95-7119-40a6-8f77-c644cefe8994
[2013-08-23 13:25:22.915633] E [glusterd-syncop.c:1153:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-08-23 13:25:22.926613] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:25:22.926638] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:25:22.926654] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:25:22.927461] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:25:22.927477] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:25:22.927494] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:25:22.927508] E [glusterd-utils.c:182:glusterd_unlock] 0-management: Cluster lock not held!
[2013-08-23 13:26:59.940718] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:26:59.942584] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:26:59.942862] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on d8ab6586-8682-4ea1-b6ef-9afdeec1ee2a. Please check log file for details.
[2013-08-23 13:30:51.475167] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on f4237514-eb66-4b38-afb4-0dcb1d8c9091. Please check log file for details.
[2013-08-23 13:30:51.475268] E [glusterd-syncop.c:823:gd_lock_op_phase] 0-management: Failed to acquire lock
[2013-08-23 13:30:51.488286] I [glusterd-handler.c:3495:__glusterd_handle_status_volume] 0-management: Received status volume req for volume distvol
[2013-08-23 13:30:51.488348] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: 524c9e95-7119-40a6-8f77-c644cefe8994, lock held by: 524c9e95-7119-40a6-8f77-c644cefe8994
[2013-08-23 13:30:51.488358] E [glusterd-syncop.c:1153:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-08-23 13:30:51.493518] I [socket.c:3108:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2013-08-23 13:30:51.493529] E [rpcsvc.c:1111:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1x, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2013-08-23 13:30:51.493540] E [glusterd-utils.c:380:glusterd_submit_reply] 0-: Reply submission failed
[2013-08-23 13:30:51.493550] E [glusterd-utils.c:182:glusterd_unlock] 0-management: Cluster lock not held!
[2013-08-23 13:30:51.494706] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.494827] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on f4237514-eb66-4b38-afb4-0dcb1d8c9091. Please check log file for details.
[2013-08-23 13:30:51.498678] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.498734] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.498753] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.498777] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.498802] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.501365] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
[2013-08-23 13:30:51.504045] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Unlocking failed on 5a1f5689-17ae-4b0e-9e16-d8ecffded2bb. Please check log file for details.
</snip>
Fuse mount Info
===============
[Fri Aug 23 15:51:11 UTC 2013 root.36.32:/mnt/distvol ] # df -Th
Filesystem                          Type            Size  Used Avail Use% Mounted on
/dev/mapper/vg_rhsclient8-lv_root   ext4             50G  3.0G   44G   7% /
tmpfs                               tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda1                           ext4            485M   65M  396M  14% /boot
/dev/mapper/vg_rhsclient8-lv_home   ext4            1.8T  7.7G  1.7T   1% /home
10.70.37.44:distvol                 fuse.glusterfs  170G   14G  157G   8% /mnt/distvol

[Fri Aug 23 15:50:12 UTC 2013 root.36.33:/mnt/distvol ] # df -Th
Filesystem                          Type            Size  Used Avail Use% Mounted on
/dev/mapper/vg_rhsclient9-lv_root   ext4             50G  6.8G   40G  15% /
tmpfs                               tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda1                           ext4            485M   91M  369M  20% /boot
/dev/mapper/vg_rhsclient9-lv_home   ext4            1.8T   17G  1.7T   1% /home
10.70.37.44:distvol                 fuse.glusterfs  170G   14G  157G   8% /mnt/distvol
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/ If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.