Bug 1471737 - file being created by dd doesn't automatically release control when the volume is down
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Ravishankar N
QA Contact: nchilaka
Keywords: ZStream
Reported: 2017-07-17 07:21 EDT by nchilaka
Modified: 2018-11-09 13:18 EST
CC List: 3 users

Type: Bug
Attachments: None
Description nchilaka 2017-07-17 07:21:47 EDT
Description of problem:
======================
While dd is writing to a file on the fuse mount, stopping the volume leaves dd blocked: it never returns control to the shell until the user intervenes.
Since a transport endpoint error has already occurred, why are we still holding the lock?
Control is released only when the user terminates the command (Ctrl+C).

Any file created afterwards, whether with dd, touch, or any other means, fails with "Transport endpoint not connected", which is correct.


Version-Release number of selected component (if applicable):
=======
3.8.4-33

How reproducible:
========
always

Steps to Reproduce:
1. Create a 1x3 volume and fuse mount it.
2. Create a file using dd, or run dd in a loop to create several files:
for i in {1..10000};do dd if=/dev/urandom of=file.$i bs=1024 count=100000;done

3. Stop the volume using the volume stop command.
4. Notice that the shell prompt is never returned until the user intervenes.
I waited for about 5 minutes.
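
The steps above can be collected into a single script. This is only a sketch under assumptions: the host names, brick paths, the FILES count and the 10 second delay are hypothetical and not taken from this report, and it assumes the gluster CLI and the fuse mount are available on the same machine. By default it only prints the commands (DRY_RUN=1); nothing here has been verified against a live cluster.

```shell
#!/bin/sh
# Hypothetical names: adjust all of these to the cluster under test.
VOL=${VOL:-vol_3}
MNT=${MNT:-/mnt/vol_3}
SERVER=${SERVER:-server1}
BRICKS=${BRICKS:-"server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1"}
FILES=${FILES:-10000}

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# The dd loop from step 2, writing into the directory given as $1.
write_files() {
    for i in $(seq 1 "$FILES"); do
        run dd if=/dev/urandom of="$1/file.$i" bs=1024 count=100000 || return
    done
}

repro() {
    # Step 1: create a 1x3 volume, start it, and fuse mount it.
    # $BRICKS is deliberately unquoted so that it splits into three arguments.
    run gluster volume create "$VOL" replica 3 $BRICKS
    run gluster volume start "$VOL"
    run mount -t glusterfs "$SERVER:/$VOL" "$MNT"

    # Step 2: start the writer in the background.
    write_files "$MNT" &
    writer=$!

    # Step 3: stop the volume while writes are in flight.
    # --mode=script suppresses the interactive confirmation prompt.
    run sleep 10
    run gluster --mode=script volume stop "$VOL"

    # Step 4: per this report, the in-flight dd does not return on its own,
    # so in a live run this wait is expected to block until dd is interrupted.
    wait "$writer"
}
```

Calling repro prints the planned commands; DRY_RUN=0 repro would actually execute them.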


The open file handle shows a transport endpoint error:

[root@dhcp35-103 glusterfs]#  lsof|grep file.5
lsof: WARNING: can't stat() fuse.glusterfs file system /mnt/vol_3
      Output information may be incomplete.
dd        12428          root    1w  unknown                                         /mnt/vol_3/dd2/file.5 (stat: Transport endpoint is not connected)
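
Besides lsof, it may help to record which state the stuck dd process is in. The helper below is a minimal sketch, assuming a Linux client with /proc mounted; proc_state is a hypothetical name, and the PID 12428 in the usage note is simply the one from the lsof output above.

```shell
#!/bin/sh
# proc_state PID: print the one-letter process state from /proc/PID/status
# (R = running, S = interruptible sleep, D = uninterruptible sleep, ...).
# Linux-specific; returns non-zero if the state cannot be read.
proc_state() {
    while read -r key value _; do
        if [ "$key" = "State:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}
```

For example, proc_state 12428 would show whether dd is sleeping interruptibly (S) or uninterruptibly (D). Since the report says Ctrl+C releases it, S seems more likely, but that is a guess.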




Actual results:
===========


Expected results:
======
Control must be returned to the shell once there is a transport endpoint error.

Additional info:
==============
This can be a problem especially if I/O is being done through another application, where the user may not be able to identify the hang.
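
Until the underlying behaviour is fixed, a calling application could bound the wait itself. The following is a sketch of a possible workaround, not a fix: run_with_deadline is a hypothetical helper, it assumes the timeout utility from GNU coreutils, and it relies on the blocked process still responding to SIGINT, which this report suggests because Ctrl+C releases it.

```shell
#!/bin/sh
# run_with_deadline SECONDS COMMAND [ARG...]
# Run COMMAND, sending it SIGINT (the same signal as Ctrl+C) if it has not
# finished after SECONDS. With GNU timeout the exit status is 124 when the
# deadline was hit; otherwise it is the exit status of COMMAND.
run_with_deadline() {
    secs=$1
    shift
    timeout -s INT "$secs" "$@"
}
```

For example: run_with_deadline 300 dd if=/dev/urandom of=/mnt/vol_3/dd2/file.1 bs=1024 count=100000. The 300 second limit is arbitrary.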




Also, the fuse mount logs point to the right errors:

[2017-07-17 11:09:19.826321] W [socket.c:595:__socket_rwv] 0-vol_3-client-0: readv on 10.70.35.45:49154 failed (Connection reset by peer)
[2017-07-17 11:09:19.826399] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol_3-client-0: disconnected from vol_3-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2017-07-17 11:09:19.827417] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f0ef8dae1e2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f0ef8b738ae] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f0ef8b739be] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f0ef8b75130] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f0ef8b75be0] ))))) 0-vol_3-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2017-07-17 11:09:19.824810 (xid=0x1b0f9)
[2017-07-17 11:09:19.827457] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-vol_3-client-0: remote operation failed [Transport endpoint is not connected]
[2017-07-17 11:09:19.827506] W [MSGID: 114061] [client-common.c:460:client_pre_fsync] 0-vol_3-client-0:  (5bd904f5-384b-4cbf-b0ae-1849f7e0ad36) remote_fd is -1. EBADFD [File descriptor in bad state]
[2017-07-17 11:09:19.827561] W [MSGID: 108035] [afr-transaction.c:2231:afr_changelog_fsync_cbk] 0-vol_3-replicate-0: fsync(5bd904f5-384b-4cbf-b0ae-1849f7e0ad36) failed on subvolume vol_3-client-0. Transaction was WRITE [File descriptor in bad state]
[2017-07-17 11:09:20.010047] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-vol_3-client-0: remote operation failed [Transport endpoint is not connected]
[2017-07-17 11:09:21.846196] W [socket.c:595:__socket_rwv] 0-vol_3-client-2: readv on 10.70.35.122:49154 failed (Connection reset by peer)
[2017-07-17 11:09:21.846287] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol_3-client-2: disconnected from vol_3-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2017-07-17 11:09:21.846292] W [socket.c:595:__socket_rwv] 0-vol_3-client-1: readv on 10.70.35.130:49154 failed (Connection reset by peer)
[2017-07-17 11:09:21.846328] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol_3-client-1: disconnected from vol_3-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2017-07-17 11:09:21.846417] W [MSGID: 108001] [afr-common.c:4820:afr_notify] 0-vol_3-replicate-0: Client-quorum is not met
[2017-07-17 11:09:21.846560] E [MSGID: 108006] [afr-common.c:4731:afr_notify] 0-vol_3-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-07-17 11:09:21.847039] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f0ef8dae1e2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f0ef8b738ae] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f0ef8b739be] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f0ef8b75130] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f0ef8b75be0] ))))) 0-vol_3-client-2: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2017-07-17 11:09:21.844728 (xid=0x1c6c1)
[2017-07-17 11:09:21.847047] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f0ef8dae1e2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f0ef8b738ae] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f0ef8b739be] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f0ef8b75130] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f0ef8b75be0] ))))) 0-vol_3-client-1: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2017-07-17 11:09:21.844687 (xid=0x1c6c3)
[2017-07-17 11:09:21.847064] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-vol_3-client-2: remote operation failed [Transport endpoint is not connected]
[2017-07-17 11:09:21.847072] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-vol_3-client-1: remote operation failed [Transport endpoint is not connected]
[2017-07-17 11:09:21.847099] W [MSGID: 114061] [client-common.c:460:client_pre_fsync] 0-vol_3-client-1:  (de308b78-ace3-40b5-bbb1-d658cdf1fb9c) remote_fd is -1. EBADFD [File descriptor in bad state]
[2017-07-17 11:09:21.847120] W [MSGID: 108035] [afr-transaction.c:2231:afr_changelog_fsync_cbk] 0-vol_3-replicate-0: fsync(de308b78-ace3-40b5-bbb1-d658cdf1fb9c) failed on subvolume vol_3-client-1. Transaction was WRITE [File descriptor in bad state]
[2017-07-17 11:09:21.847136] W [MSGID: 114061] [client-common.c:460:client_pre_fsync] 0-vol_3-client-2:  (de308b78-ace3-40b5-bbb1-d658cdf1fb9c) remote_fd is -1. EBADFD [File descriptor in bad state]
[2017-07-17 11:09:21.847151] W [MSGID: 108035] [afr-transaction.c:2231:afr_changelog_fsync_cbk] 0-vol_3-replicate-0: fsync(de308b78-ace3-40b5-bbb1-d658cdf1fb9c) failed on subvolume vol_3-client-2. Transaction was WRITE [File descriptor in bad state]
[2017-07-17 11:09:21.847166] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-vol_3-client-1: remote operation failed [Transport endpoint is not connected]
[2017-07-17 11:09:21.847185] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-vol_3-client-2: remote operation failed [Transport endpoint is not connected]
[2017-07-17 11:09:21.847246] I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up
The message "I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up" repeated 38282 times between [2017-07-17 11:09:21.847246] and [2017-07-17 11:09:30.576305]
[2017-07-17 11:09:30.576570] E [MSGID: 114058] [client-handshake.c:1537:client_query_portmap_cbk] 0-vol_3-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-07-17 11:09:30.576701] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol_3-client-0: disconnected from vol_3-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2017-07-17 11:09:30.577094] I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up
The message "I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up" repeated 3457 times between [2017-07-17 11:09:30.577094] and [2017-07-17 11:09:32.579105]
[2017-07-17 11:09:32.579465] E [MSGID: 114058] [client-handshake.c:1537:client_query_portmap_cbk] 0-vol_3-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-07-17 11:09:32.579573] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol_3-client-2: disconnected from vol_3-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2017-07-17 11:09:32.580279] I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up
[2017-07-17 11:09:32.581463] I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up
[2017-07-17 11:09:32.581700] E [MSGID: 114058] [client-handshake.c:1537:client_query_portmap_cbk] 0-vol_3-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-07-17 11:09:32.581782] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol_3-client-1: disconnected from vol_3-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2017-07-17 11:09:32.582687] I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up
The message "I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up" repeated 61047 times between [2017-07-17 11:09:32.582687] and [2017-07-17 11:10:30.728966]
[2017-07-17 11:10:30.729836] I [MSGID: 108006] [afr-common.c:4874:afr_local_init] 0-vol_3-replicate-0: no subvolumes up
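
Because most of the log volume is the repeated "no subvolumes up" message, a per-MSGID tally makes the log easier to read. This is a rough sketch that assumes the line format shown above, including the 'The message "..." repeated N times' summary lines; count_msgids is a hypothetical helper, not part of GlusterFS.

```shell
#!/bin/sh
# count_msgids: read a GlusterFS client log on stdin and print
# "MSGID count" pairs, most frequent first.
count_msgids() {
    awk '
    match($0, /MSGID: [0-9]+/) {
        # Skip the 7 characters of "MSGID: " to keep only the digits.
        id = substr($0, RSTART + 7, RLENGTH - 7)
        n = 1
        # Suppressed duplicates are logged as: The message "..." repeated N times
        if (match($0, /repeated [0-9]+ times/)) {
            n = substr($0, RSTART + 9, RLENGTH - 15) + 0
        }
        count[id] += n
    }
    END { for (id in count) print id, count[id] }
    ' | sort -k2,2nr -k1,1
}
```

Usage: count_msgids < mount.log, where mount.log is a placeholder for the fuse mount log file.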



