Bug 1395054 - I/O errors seen when one of the nodes hosting one of the bricks is down
Summary: I/O errors seen when one of the nodes hosting one of the bricks is down
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-15 03:26 UTC by SATHEESARAN
Modified: 2016-11-23 10:05 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-23 10:05:27 UTC
Target Upstream Version:


Attachments
part of the fuse mount log with DEBUG messages (41.48 KB, text/plain), 2016-11-15 04:28 UTC, SATHEESARAN
fuse mount logs (40.93 KB, text/plain), 2016-11-17 11:04 UTC, SATHEESARAN

Description SATHEESARAN 2016-11-15 03:26:14 UTC
Description of problem:
-----------------------
In a cluster of 3 nodes serving a replica 3 volume, when glusterd is down on one of the nodes, some file creations on fresh fuse mounts (mounted using either of the other 2 servers as the volfile server) fail with I/O errors.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.2.0 interim build ( glusterfs-3.8.4-5.el7rhgs )
RHEL 7.3

How reproducible:
-----------------
Always, but only for a few files (not for all files)

Steps to Reproduce:
-------------------
1. Create a replica 3 volume and start it
2. Optimize the volume for the VM store use case
3. Stop glusterd on server1
4. Fuse mount the volume using server3 as the volfile server
5. Create files on the fuse mount (see the command sketch below)
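
For reference, a rough shell sketch of the steps above; the volume name rep3vol is taken from the attached logs, while the hostnames, brick paths and mount point are placeholders for illustration:

# Create and start the replica 3 volume (hostnames and brick paths are placeholders)
gluster volume create rep3vol replica 3 \
    server1:/bricks/rep3vol/brick1 \
    server2:/bricks/rep3vol/brick2 \
    server3:/bricks/rep3vol/brick3
gluster volume set rep3vol group virt      # apply the VM-store tuning profile
gluster volume start rep3vol

# Stop glusterd on server1 (the brick process itself keeps running)
ssh server1 systemctl stop glusterd

# Fresh fuse mount on a client, using server3 as the volfile server
mkdir -p /mnt/rep3vol
mount -t glusterfs server3:/rep3vol /mnt/rep3vol

# Create files on the mount; per this report, a few of them fail with an I/O error
for i in $(seq 1 20); do
    dd if=/dev/zero of=/mnt/rep3vol/file$i bs=1M count=10
done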

Actual results:
---------------
A few files always encounter an I/O error

Expected results:
-----------------
There should not be any I/O errors

Additional info:
----------------
1. I initially hit this with libgfapi, but I could also reproduce it with fuse, which means the issue does not come from libgfapi.
2. I have tested it with a plain distribute volume with 3 bricks and the issue is **not** reproducible there.

Hence this should be a replica 3 issue, or an issue with the replica 3 volume tunables.

Comment 2 SATHEESARAN 2016-11-15 04:28:58 UTC
Created attachment 1220673 [details]
part of the fuse mount log with DEBUG messages

Comment 3 SATHEESARAN 2016-11-15 04:31:03 UTC
Part of the error messages in the fuse mount log:

<snip>
[2016-11-15 02:58:50.069035] D [MSGID: 0] [dht-common.c:725:dht_lookup_dir_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-dht returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.069048] D [MSGID: 0] [shard.c:916:shard_lookup_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-shard returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.069065] D [MSGID: 0] [defaults.c:1266:default_lookup_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-io-threads returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.069087] D [MSGID: 0] [io-stats.c:2172:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.069307] D [MSGID: 0] [io-threads.c:353:iot_schedule] 0-rep3vol-io-threads: LOOKUP scheduled as fast fop
[2016-11-15 02:58:50.069391] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-rep3vol-dht: no subvolume for hash (value) = 3721687232
[2016-11-15 02:58:50.069415] D [MSGID: 0] [dht-helper.c:787:dht_subvol_get_hashed] 0-rep3vol-dht: No hashed subvolume for path=/vme1.img
[2016-11-15 02:58:50.069440] D [MSGID: 0] [dht-common.c:2471:dht_lookup] 0-rep3vol-dht: no subvolume in layout for path=/vme1.img, checking on all the subvols to see if it is a directory
[2016-11-15 02:58:50.069451] D [MSGID: 0] [dht-common.c:2485:dht_lookup] 0-rep3vol-dht: Found null hashed subvol. Calling lookup on all nodes.
[2016-11-15 02:58:50.069473] D [MSGID: 0] [client.c:546:client_lookup] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-client-0 returned -1 error: Transport endpoint is not connected [Transport endpoint is not connected]
[2016-11-15 02:58:50.069507] D [MSGID: 0] [dht-common.c:655:dht_lookup_dir_cbk] 0-rep3vol-dht: lookup of /vme1.img on rep3vol-client-0 returned error [Transport endpoint is not connected]
[2016-11-15 02:58:50.070512] D [MSGID: 0] [client-rpc-fops.c:2946:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-client-1 returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.070539] D [MSGID: 0] [dht-common.c:655:dht_lookup_dir_cbk] 0-rep3vol-dht: lookup of /vme1.img on rep3vol-client-1 returned error [No such file or directory]
[2016-11-15 02:58:50.070682] D [MSGID: 0] [client-rpc-fops.c:2946:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-client-2 returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.070701] D [MSGID: 0] [dht-common.c:655:dht_lookup_dir_cbk] 0-rep3vol-dht: lookup of /vme1.img on rep3vol-client-2 returned error [No such file or directory]
[2016-11-15 02:58:50.070723] E [dht-helper.c:1666:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x215dd) [0x7f5d522d25dd] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x36d6e) [0x7f5d52061d6e] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0xa835) [0x7f5d52035835] ) 0-rep3vol-dht: invalid argument: inode [Invalid argument]
[2016-11-15 02:58:50.070736] D [MSGID: 0] [dht-common.c:725:dht_lookup_dir_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-dht returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.070746] D [MSGID: 0] [shard.c:916:shard_lookup_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-shard returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.070765] D [MSGID: 0] [defaults.c:1266:default_lookup_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-io-threads returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.070786] D [MSGID: 0] [io-stats.c:2172:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol returned -1 error: No such file or directory [No such file or directory]
[2016-11-15 02:58:50.070803] D [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/vme1.img: failed to resolve (No such file or directory)
[2016-11-15 02:58:50.070863] D [MSGID: 0] [io-threads.c:353:iot_schedule] 0-rep3vol-io-threads: CREATE scheduled as normal fop
[2016-11-15 02:58:50.070998] D [MSGID: 0] [client.c:1205:client_statfs] 0-stack-trace: stack-address: 0x7f5d5d83006c, rep3vol-client-0 returned -1 error: Transport endpoint is not connected [Transport endpoint is not connected]
[2016-11-15 02:58:50.071044] W [MSGID: 109075] [dht-diskusage.c:44:dht_du_info_cbk] 0-rep3vol-dht: failed to get disk info from rep3vol-client-0 [Transport endpoint is not connected]
[2016-11-15 02:58:50.071150] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-rep3vol-dht: no subvolume for hash (value) = 3721687232
[2016-11-15 02:58:50.071166] D [MSGID: 0] [dht-helper.c:787:dht_subvol_get_hashed] 0-rep3vol-dht: No hashed subvolume for path=/vme1.img
[2016-11-15 02:58:50.071174] E [MSGID: 109011] [dht-common.c:6876:dht_create] 0-rep3vol-dht: no subvolume in layout for path=/vme1.img
[2016-11-15 02:58:50.071187] D [MSGID: 0] [dht-common.c:6959:dht_create] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-dht returned -1 error: Input/output error [Input/output error]
[2016-11-15 02:58:50.071209] D [MSGID: 0] [shard.c:2858:shard_create_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-shard returned -1 error: Input/output error [Input/output error]
[2016-11-15 02:58:50.071222] D [MSGID: 0] [defaults.c:1194:default_create_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol-io-threads returned -1 error: Input/output error [Input/output error]
[2016-11-15 02:58:50.071239] D [MSGID: 0] [io-stats.c:1927:io_stats_create_cbk] 0-stack-trace: stack-address: 0x7f5d5d831620, rep3vol returned -1 error: Input/output error [Input/output error]
[2016-11-15 02:58:50.071257] W [fuse-bridge.c:2003:fuse_create_cbk] 0-glusterfs-fuse: 4653: /vme1.img => -1 (Input/output error)
[2016-11-15 02:58:50.072005] D [MSGID: 0] [dht-diskusage.c:92:dht_du_info_cbk] 0-rep3vol-dht: subvolume 'rep3vol-client-2': avail_percent is: 99.00 and avail_space is: 106741989376 and avail_inodes is: 99.00
[2016-11-15 02:58:50.072041] D [MSGID: 0] [dht-diskusage.c:92:dht_du_info_cbk] 0-rep3vol-dht: subvolume 'rep3vol-client-1': avail_percent is: 99.00 and avail_space is: 106741985280 and avail_inodes is: 99.00
[2016-11-15 02:58:50.920269] D [MSGID: 0] [common-utils.c:335:gf_resolve_ip6] 0-resolver: returning ip-10.70.37.54 (port-24007) for hostname: dhcp37-54.lab.eng.blr.redhat.com and port: 24007
[2016-11-15 02:58:50.920296] D [socket.c:2899:socket_fix_ssl_opts] 0-rep3vol-client-0: disabling SSL for portmapper connection
[2016-11-15 02:58:50.920944] D [socket.c:683:__socket_shutdown] 0-rep3vol-client-0: shutdown() returned -1. Transport endpoint is not connected
[2016-11-15 02:58:50.920960] D [socket.c:728:__socket_disconnect] 0-rep3vol-client-0: __socket_teardown_connection () failed: Transport endpoint is not connected
[2016-11-15 02:58:50.920985] D [socket.c:2403:socket_event_handler] 0-transport: disconnecting now
</snip>

Comment 4 SATHEESARAN 2016-11-17 11:03:52 UTC
I initially thought that I had created a replica 3 volume, but it is actually a plain distribute volume with 3 bricks.

I have retested with the following steps:

1. Created a plain distribute volume with 3 bricks and started the volume
2. Fuse mounted the volume using server1 as the volfile server and wrote 50 files using the 'dd' command
3. Stopped the glusterd running on server1
4. Unmounted the volume
5. Mounted the volume back using server3 as the volfile server
6. Wrote another 50 files and observed I/O errors for a few of them (a command sketch follows below)
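
A rough shell sketch of this retest; the volume name rep3vol comes from the attached logs, while the hostnames, brick paths, mount point and file sizes are placeholders for illustration:

# 1-2. Plain distribute volume, mounted via server1, 50 files written with dd
#      (hostnames and brick paths are placeholders)
gluster volume create rep3vol \
    server1:/bricks/rep3vol/brick1 \
    server2:/bricks/rep3vol/brick2 \
    server3:/bricks/rep3vol/brick3
gluster volume start rep3vol
mount -t glusterfs server1:/rep3vol /mnt/rep3vol
for i in $(seq 1 50); do dd if=/dev/zero of=/mnt/rep3vol/pre$i bs=1M count=10; done

# 3-5. Stop glusterd on server1, then remount through server3
ssh server1 systemctl stop glusterd
umount /mnt/rep3vol
mount -t glusterfs server3:/rep3vol /mnt/rep3vol

# 6. Another 50 files; creates whose names hash to server1's brick fail with EIO
for i in $(seq 1 50); do dd if=/dev/zero of=/mnt/rep3vol/post$i bs=1M count=10; done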

Comment 5 SATHEESARAN 2016-11-17 11:04:27 UTC
Created attachment 1221524 [details]
fuse mount logs

Comment 7 Atin Mukherjee 2016-11-22 14:50:17 UTC
Sas,

This is expected as per the design. If we attempt to mount a gluster volume when one of the glusterd instances (say, on Node 1) is down, the client will not be able to talk to the glusterd on Node 1 to fetch the volfile, which it needs in order to learn the port through which it has to connect to the brick on that host. Hence you end up seeing "transport endpoint is not connected" messages in the client log for files that hash to the subvolume belonging to N1. So, going by the design, this is expected behaviour. We can either close this bug or, at best, treat it as a future feature, but nothing in the 3.2.0 timelines.
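
For illustration, a quick way to confirm the state described above from the shell (hostnames are placeholders; the point is that glusterd on server1 is down while its brick process is still alive):

# On server1 (placeholder hostname): glusterd is stopped, but the brick
# process (glusterfsd) is still running
systemctl is-active glusterd          # reports 'inactive'
pgrep -a glusterfsd                   # the brick process is still listed

# From the client: glusterd's port 24007 on server1 is unreachable, so a freshly
# connecting client cannot query that node for its brick port
timeout 2 bash -c ': </dev/tcp/server1/24007' \
    && echo "24007 reachable" || echo "24007 unreachable"

In this state the data path to the brick itself is intact; what is missing is only the port lookup that a newly connecting client performs against that node's glusterd.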

What do you think?

Comment 8 Mohit Agrawal 2016-11-23 04:21:53 UTC
Hi,

  I also think the same; this is a limitation of the current design.

Regards
Mohit Agrawal

Comment 9 Nithya Balachandran 2016-11-23 05:12:26 UTC
DHT will return EIO for create ops when the hashed subvol for the file is unavailable. This is the expected behaviour.
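
For illustration, a minimal loop on the fuse mount (the mount path is a placeholder) shows this effect: only creates whose file names hash to the unavailable subvolume fail, and they fail with EIO.

# /mnt/rep3vol is a placeholder mount point; names that hash to the down
# subvolume fail with 'Input/output error', the remaining files are created normally
for i in $(seq 1 20); do
    touch /mnt/rep3vol/test$i || echo "create failed for test$i"
done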

Comment 10 SATHEESARAN 2016-11-23 10:05:27 UTC
(In reply to Atin Mukherjee from comment #7)
> Sas,
> 
> This is expected as per the design. If we attempt to mount a gluster volume
> when one of the glusterd instances (say, on Node 1) is down, the client will
> not be able to talk to the glusterd on Node 1 to fetch the volfile, which it
> needs in order to learn the port through which it has to connect to the brick
> on that host. Hence you end up seeing "transport endpoint is not connected"
> messages in the client log for files that hash to the subvolume belonging to
> N1. So, going by the design, this is expected behaviour. We can either close
> this bug or, at best, treat it as a future feature, but nothing in the 3.2.0
> timelines.
> 
> What do you think?

Initially I went by the thought: why would a write operation hit an I/O error even though the brick is up? With the explanation above, I can see that this is a limitation of the design.

I am closing this bug, as it will be addressed by the glusterd 2.0 design.

