Bug 998493 - glusterfs process hangs if the bricks are started with valgrind
Status: CLOSED EOL
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: fuse
Version: 2.1
Hardware: x86_64 Linux
Priority: unspecified   Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Bug Updates Notification Mailing List
QA Contact: amainkar
Depends On:
Blocks: 1286040 905023 1286041 1286043 1286176
Reported: 2013-08-19 08:54 EDT by Vijaykumar Koppad
Modified: 2015-12-03 12:16 EST (History)
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-03 12:16:19 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vijaykumar Koppad 2013-08-19 08:54:11 EDT
Description of problem: If all the bricks of the volume are started with valgrind, files are created on the mount point, and a second mount of the same volume is attempted, all glusterfs processes hang.

client logs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-08-19 12:16:06.619536] E [afr-lk-common.c:819:afr_unlock_entrylk_cbk] 0-doa-replicate-0: /level08/level18/level28/level38/level48/level58/level68/level78/52120c52~~3GQ54M43BH: unlock failed on 0, reason: Transport endpoint is not connected
[2013-08-19 12:16:06.619565] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x165) [0x7fe48fd03f55] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fe48fd03a93] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fe48fd039ae]))) 0-doa-client-0: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2013-08-19 12:15:14.956437 (xid=0x75136x)
[2013-08-19 12:16:06.619572] W [client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-doa-client-0: remote operation failed: Transport endpoint is not connected
[2013-08-19 12:16:06.619599] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x165) [0x7fe48fd03f55] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fe48fd03a93] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fe48fd039ae]))) 0-doa-client-0: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2013-08-19 12:15:24.544462 (xid=0x75137x)
[2013-08-19 12:16:06.619606] W [client-handshake.c:276:client_ping_cbk] 0-doa-client-0: timer must have expired
[2013-08-19 12:16:06.619616] I [client.c:2103:client_rpc_notify] 0-doa-client-0: disconnected from 10.70.43.13:49155. Client process will keep trying to connect to glusterd until brick's port is available.
[2013-08-19 12:16:06.619625] E [afr-common.c:3832:afr_notify] 0-doa-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2013-08-19 12:16:06.619791] I [rpc-clnt.c:1680:rpc_clnt_reconfig] 0-doa-client-0: changing port to 49155 (from 0)
[2013-08-19 12:18:22.589169] I [afr-common.c:3953:afr_local_init] 0-doa-replicate-0: no subvolumes up
[2013-08-19 12:18:22.611181] I [afr-common.c:3953:afr_local_init] 0-doa-replicate-0: no subvolumes up
[2013-08-19 12:18:56.136182] I [afr-common.c:3953:afr_local_init] 0-doa-replicate-0: no subvolumes up
[2013-08-19 12:18:56.158172] I [afr-common.c:3953:afr_local_init] 0-doa-replicate-0: no subvolumes up
[2013-08-19 12:20:24.951166] I [afr-common.c:3953:afr_local_init] 0-doa-replicate-0: no subvolumes up
[2013-08-19 12:20:24.973159] I [afr-common.c:3953:afr_local_init] 0-doa-replicate-0: no subvolumes up

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Sometimes the creation of files fails with "Transport endpoint is not connected"; sometimes the command just doesn't return to the prompt.

All the bricks are up:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Volume Name: doa
Type: Distributed-Replicate
Volume ID: dcb8cd4d-4837-45a7-9217-e5272566d671
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.43.13:/bricks/m1
Brick2: 10.70.43.18:/bricks/m2
Brick3: 10.70.43.22:/bricks/m3
Brick4: 10.70.43.24:/bricks/m4
Options Reconfigured:
changelog.fsync-interval: 3
changelog.rollover-time: 20
geo-replication.ignore-pid-check: on
geo-replication.indexing: on


Status of volume: doa
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.13:/bricks/m1                            49155   Y       7886
Brick 10.70.43.18:/bricks/m2                            49155   Y       20866
Brick 10.70.43.22:/bricks/m3                            49153   Y       21530
Brick 10.70.43.24:/bricks/m4                            49153   Y       20871
NFS Server on localhost                                 2049    Y       7908
Self-heal Daemon on localhost                           N/A     Y       7919
NFS Server on 10.70.43.24                               2049    Y       20887
Self-heal Daemon on 10.70.43.24                         N/A     Y       20904
NFS Server on 10.70.43.18                               2049    Y       20878
Self-heal Daemon on 10.70.43.18                         N/A     Y       20899
NFS Server on 10.70.43.22                               2049    Y       21544
Self-heal Daemon on 10.70.43.22                         N/A     Y       21563
 
There are no active volume tasks

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>



Version-Release number of selected component (if applicable): glusterfs-3.4.0.20rhs-2.el6rhs.x86_64


How reproducible: Happens most of the time


Steps to Reproduce:
1. Kill glusterd and restart it with "glusterd --xlator-option *.run-with-valgrind=yes" on all the nodes of the cluster.
2. Create and start a dist-rep volume.
3. Start creating files using the command "./crefi.py -n 1000 --multi -b 10 -d 10 --random --max=500K --min=10 /mnt/master/" and also try to mount the volume on another host (see the command sketch after these steps).
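
For reference, a minimal shell sketch of the reproduction. The host names (host1-host4), brick paths, mount points, and the pkill step are placeholders/assumptions, not the exact setup from this report (the actual cluster used the 10.70.43.x nodes and /bricks/m1-m4 shown in the volume info above):

# On every node: stop glusterd and restart it under valgrind
# (how glusterd was stopped is an assumption; adjust to the node's service setup).
pkill glusterd
glusterd --xlator-option '*.run-with-valgrind=yes'

# On one node: create and start a 2x2 distributed-replicate volume
# (host names and brick paths are placeholders).
gluster volume create doa replica 2 \
    host1:/bricks/m1 host2:/bricks/m2 host3:/bricks/m3 host4:/bricks/m4
gluster volume start doa

# On the first client: FUSE-mount the volume and start creating files.
mount -t glusterfs host1:/doa /mnt/master
./crefi.py -n 1000 --multi -b 10 -d 10 --random --max=500K --min=10 /mnt/master/

# While file creation is running, mount the same volume from a second host;
# this is the point at which the glusterfs client processes hang.
mount -t glusterfs host1:/doa /mnt/master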

Actual results: glusterfs process hangs


Expected results: glusterfs process shouldn't hang. 


Additional info:
Comment 5 Vivek Agarwal 2015-12-03 12:16:19 EST
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
