Bug 1127778 - DHT :- file/Directory creation fails with 'Input/output error' for all files/Directories hashing to one sub-volume
Summary: DHT :- file/Directory creation fails with 'Input/output error' for all file...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: Matt Zywusko
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-08-07 14:26 UTC by Rachana Patel
Modified: 2015-12-31 16:30 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-31 15:42:35 UTC



Description Rachana Patel 2014-08-07 14:26:29 UTC
Description of problem:
=======================
File/directory creation fails with 'Input/output error' for all files/directories hashing to one sub-volume.
The brick log has errors:
[2014-08-06 05:28:45.042158] C [inode.c:1151:__inode_path] 0-/brick1/inode: possible infinite loop detected, forcing break. name=((null))
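
For context, the sketch below (illustrative only, not GlusterFS's actual DHT code) shows how a DHT-style translator picks a sub-volume: each directory carries a layout that splits the 32-bit hash space across sub-volumes, and a create goes to whichever sub-volume's range contains the hash of the new name. That is why one misbehaving brick makes every name hashing into its range fail, as seen above. The hash function, sub-volume names, and ranges here are stand-ins.

/* Illustrative sketch only: a DHT-style name-to-sub-volume mapping.
 * The hash and layout are stand-ins, not GlusterFS's dht_hash_compute()
 * or the on-disk trusted.glusterfs.dht layout. */
#include <stdint.h>
#include <stdio.h>

struct subvol_range {
    const char *subvol;   /* sub-volume name (hypothetical) */
    uint32_t    start;    /* inclusive start of hash range */
    uint32_t    end;      /* inclusive end of hash range */
};

/* Simple FNV-1a, used here only as a stand-in 32-bit name hash. */
static uint32_t name_hash(const char *name)
{
    uint32_t h = 2166136261u;
    for (; *name; name++) {
        h ^= (uint8_t)*name;
        h *= 16777619u;
    }
    return h;
}

/* Return the sub-volume whose range covers the hash of 'name'. */
static const char *pick_subvol(const char *name,
                               const struct subvol_range *layout, int n)
{
    uint32_t h = name_hash(name);
    for (int i = 0; i < n; i++)
        if (h >= layout[i].start && h <= layout[i].end)
            return layout[i].subvol;
    return NULL; /* hole in the layout: creates landing here would fail */
}

int main(void)
{
    /* Two sub-volumes splitting the 32-bit hash space in half. */
    struct subvol_range layout[] = {
        { "brick0", 0x00000000u, 0x7fffffffu },
        { "brick1", 0x80000000u, 0xffffffffu },
    };
    const char *names[] = { "abc1", "abc2", "abc3" };
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        printf("%s -> %s\n", names[i], pick_subvol(names[i], layout, 2));
    return 0;
}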


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-1.el6rhs.x86_64


How reproducible:
=================
only once


Steps to Reproduce:
===================
1. Had a distributed volume. Removed all data from the mount point, stopped the volume and glusterd, and upgraded the gluster RPMs.
2. Started the volume and mounted it again.
3. Created files and directories from the mount point.
Any create whose name hashes to one particular sub-volume fails with an error:


[root@OVM1 brick1]# touch abc3
touch: cannot touch `abc3': Input/output error

--> Verified all bricks were up and the logs do not have disconnect errors.
--> Verified on the bricks that the affected sub-volume has no data. Even checked the .glusterfs directory; it has no extra entries.


Actual results:
===============
Any operation hashing to one particular sub-volume fails with an I/O error.


Expected results:
=================
Creation and access of directories and files should not fail if all sub-volumes are up.

Additional info:
================

Workaround:
Restarting the brick process for that sub-volume resolved the problem.


Brick log snippet:
[2014-08-06 05:28:45.042158] C [inode.c:1151:__inode_path] 0-/brick1/inode: possible infinite loop detected, forcing break. name=((null))
[2014-08-06 05:28:45.043287] C [inode.c:1151:__inode_path] 0-/brick1/inode: possible infinite loop detected, forcing break. name=(abc3)
[2014-08-06 05:28:45.043505] E [posix.c:134:posix_lookup] 0-brick1-posix: null gfid for path (null)
[2014-08-06 05:28:45.043543] E [posix.c:151:posix_lookup] 0-brick1-posix: lstat on (null) failed: Success
[2014-08-06 05:28:45.043571] E [server-rpc-fops.c:183:server_lookup_cbk] 0-brick1-server: 3415: LOOKUP (null) (00000000-0000-0000-0000-000000000001/abc3) ==> (Success)
[2014-08-06 05:28:45.044380] C [inode.c:1151:__inode_path] 0-/brick1/inode: possible infinite loop detected, forcing break. name=(abc3)
[2014-08-06 05:28:45.044623] E [posix.c:134:posix_lookup] 0-brick1-posix: null gfid for path (null)
[2014-08-06 05:28:45.044658] E [posix.c:151:posix_lookup] 0-brick1-posix: lstat on (null) failed: Success
[2014-08-06 05:28:45.044686] E [server-rpc-fops.c:183:server_lookup_cbk] 0-brick1-server: 3416: LOOKUP (null) (00000000-0000-0000-0000-000000000001/abc3) ==> (Success)
[2014-08-06 05:28:54.934111] C [inode.c:1151:__inode_path] 0-/brick1/inode: possible infinite loop detected, forcing break. name=((null))
[2014-08-06 05:28:54.935288] C [inode.c:1151:__inode_path] 0-/brick1/inode: possible infinite loop detected, forcing break. name=(abc3)
[2014-08-06 05:28:54.935498] E [posix.c:134:posix_lookup] 0-brick1-posix: null gfid for path (null)
[2014-08-06 05:28:54.935551] E [posix.c:151:posix_lookup] 0-brick1-posix: lstat on (null) failed: Success
[2014-08-06 05:28:54.935580] E [server-rpc-fops.c:183:server_lookup_cbk] 0-brick1-server: 3418: LOOKUP (null) (00000000-0000-0000-0000-000000000001/abc3) ==> (Success)
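
The "possible infinite loop detected, forcing break" lines come from the brick's path-resolution code refusing to follow an inode's parent chain indefinitely. Below is a minimal sketch of that kind of guard; the types, iteration limit, and return convention are stand-ins, not the actual libglusterfs inode.c implementation.

/* Illustrative sketch only: cap the walk from an inode up through its
 * parents while reconstructing a path, instead of looping forever when
 * the parent chain is corrupted. */
#include <stdio.h>

struct inode {
    const char   *name;    /* dentry name, may be NULL (the "(null)" in the log) */
    struct inode *parent;  /* parent directory inode */
};

#define PATH_WALK_LIMIT 1024  /* arbitrary cap used for this sketch */

/* Walk parent pointers toward the root; if a cycle corrupts the chain,
 * give up after PATH_WALK_LIMIT hops. */
static int inode_path_depth(const struct inode *in)
{
    int hops = 0;
    while (in && hops < PATH_WALK_LIMIT) {
        in = in->parent;
        hops++;
    }
    if (in != NULL) {
        fprintf(stderr, "possible infinite loop detected, forcing break\n");
        return -1;  /* caller then sees an invalid path, as in the brick log */
    }
    return hops;
}

int main(void)
{
    struct inode root = { "/", NULL };
    struct inode dir  = { "dir", &root };
    struct inode bad  = { "abc3", &dir };
    dir.parent = &bad;  /* simulate a corrupted parent chain (a cycle) */
    printf("depth = %d\n", inode_path_depth(&bad));
    return 0;
}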

Comment 3 Nithya Balachandran 2014-08-12 10:37:28 UTC
Rachana, is this reproducible on a fresh setup? If it is, please let us know so we can gdb into the process and see what is going wrong.

Comment 4 Pranith Kumar K 2014-08-13 08:45:10 UTC
posix_health_check_thread_proc keeps performing a stat on the brick root every 30 seconds. I see that stat on the brick on that xfs partition is returning Input/output error, so something must have happened to the xfs partition.
I see the following logs in the bricks. What happened to the xfs partition is something we need to debug if we have dmesg or /var/log/messages from the machines; that information is not attached to the sosreports.

14:09:40 :) ⚡ grep-bricks -i "posix_health_check_thread_proc" | grep "2014-08"
172/var/log/glusterfs/bricks/brick0.log:[2014-08-06 08:13:27.557717] W [posix-helpers.c:1427:posix_health_check_thread_proc] 0-test1-posix: stat() on /brick0 returned: Input/output error
172/var/log/glusterfs/bricks/brick0.log:[2014-08-06 08:13:27.557800] M [posix-helpers.c:1447:posix_health_check_thread_proc] 0-test1-posix: health-check failed, going down
172/var/log/glusterfs/bricks/brick0.log:[2014-08-06 08:13:57.558133] M [posix-helpers.c:1452:posix_health_check_thread_proc] 0-test1-posix: still alive! -> SIGTERM
198/var/log/glusterfs/bricks/brick3-n1.log:[2014-08-06 05:54:28.153734] W [posix-helpers.c:1427:posix_health_check_thread_proc] 0-new-posix: stat() on /brick3/n1 returned: No such file or directory
198/var/log/glusterfs/bricks/brick3-n1.log:[2014-08-06 05:54:28.153771] M [posix-helpers.c:1447:posix_health_check_thread_proc] 0-new-posix: health-check failed, going down
198/var/log/glusterfs/bricks/brick3-n1.log:[2014-08-06 05:54:58.153968] M [posix-helpers.c:1452:posix_health_check_thread_proc] 0-new-posix: still alive! -> SIGTERM
198/var/log/glusterfs/bricks/brick3-n2.log:[2014-08-06 05:54:28.153576] W [posix-helpers.c:1427:posix_health_check_thread_proc] 0-new-posix: stat() on /brick3/n2 returned: No such file or directory
198/var/log/glusterfs/bricks/brick3-n2.log:[2014-08-06 05:54:28.153614] M [posix-helpers.c:1447:posix_health_check_thread_proc] 0-new-posix: health-check failed, going down
198/var/log/glusterfs/bricks/brick3-n2.log:[2014-08-06 05:54:58.153873] M [posix-helpers.c:1452:posix_health_check_thread_proc] 0-new-posix: still alive! -> SIGTERM
198/var/log/glusterfs/bricks/brick3-n3.log:[2014-08-06 05:54:28.153504] W [posix-helpers.c:1427:posix_health_check_thread_proc] 0-new-posix: stat() on /brick3/n3 returned: No such file or directory
198/var/log/glusterfs/bricks/brick3-n3.log:[2014-08-06 05:54:28.153605] M [posix-helpers.c:1447:posix_health_check_thread_proc] 0-new-posix: health-check failed, going down
198/var/log/glusterfs/bricks/brick3-n3.log:[2014-08-06 05:54:58.153875] M [posix-helpers.c:1452:posix_health_check_thread_proc] 0-new-posix: still alive! -> SIGTERM
198/var/log/glusterfs/bricks/brick3-n4.log:[2014-08-06 05:54:28.159848] W [posix-helpers.c:1427:posix_health_check_thread_proc] 0-new-posix: stat() on /brick3/n4 returned: No such file or directory
198/var/log/glusterfs/bricks/brick3-n4.log:[2014-08-06 05:54:28.159894] M [posix-helpers.c:1447:posix_health_check_thread_proc] 0-new-posix: health-check failed, going down
198/var/log/glusterfs/bricks/brick3-n4.log:[2014-08-06 05:54:58.160139] M [posix-helpers.c:1452:posix_health_check_thread_proc] 0-new-posix: still alive! -> SIGTERM
240/var/log/glusterfs/bricks/brick0.log:[2014-08-06 09:45:45.661795] W [posix-helpers.c:1427:posix_health_check_thread_proc] 0-test1-posix: stat() on /brick0 returned: Input/output error
240/var/log/glusterfs/bricks/brick0.log:[2014-08-06 09:45:45.661843] M [posix-helpers.c:1447:posix_health_check_thread_proc] 0-test1-posix: health-check failed, going down
240/var/log/glusterfs/bricks/brick0.log:[2014-08-06 09:46:15.662140] M [posix-helpers.c:1452:posix_health_check_thread_proc] 0-test1-posix: still alive! -> SIGTERM
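
For reference, a minimal sketch of the behavior described above: a health-check thread that stats the brick root every 30 seconds and takes the brick down when the stat fails. The interval, grace period, and logging below are stand-ins, not the actual posix-helpers.c code.

/* Illustrative sketch only: periodic brick health check. */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define HEALTH_CHECK_INTERVAL 30  /* seconds between checks */

static void health_check_loop(const char *brick_root)
{
    struct stat sb;

    for (;;) {
        sleep(HEALTH_CHECK_INTERVAL);

        if (stat(brick_root, &sb) == 0)
            continue;  /* backend filesystem still answers, keep serving */

        /* Mirrors the "stat() on <brick> returned: ..." warning. */
        fprintf(stderr, "stat() on %s returned: %s\n",
                brick_root, strerror(errno));
        fprintf(stderr, "health-check failed, going down\n");

        /* In the real brick process a clean shutdown is attempted first;
         * this sketch just waits one more interval and then sends SIGTERM,
         * matching the "still alive! -> SIGTERM" line in the logs. */
        sleep(HEALTH_CHECK_INTERVAL);
        fprintf(stderr, "still alive! -> SIGTERM\n");
        kill(getpid(), SIGTERM);
        return;
    }
}

int main(void)
{
    health_check_loop("/brick1");  /* brick root path from this report */
    return 0;
}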

Comment 7 Nithya Balachandran 2015-12-31 15:42:35 UTC
This issue has not been seen again, so I am moving this to WORKSFORME. Please reopen if it is seen again.

