Bug 762046 (GLUSTER-314)

Summary: Hang in quick-read
Product: [Community] GlusterFS
Component: io-cache
Version: 2.0.7
Reporter: Raghavendra G <raghavendra>
Assignee: Raghavendra G <raghavendra>
CC: afmachado1963, anush, gluster-bugs
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Hardware: All
OS: Linux
Doc Type: Bug Fix
Attachments:
  client volume spec file
  server volume spec file

Description Raghavendra G 2009-10-14 03:57:44 UTC
Created attachment 78 [details]
Patch to fix problem

Comment 1 Raghavendra G 2009-10-14 03:58:47 UTC
Created attachment 79 [details]
Test case and further discussion from reporter

Comment 2 Raghavendra G 2009-10-14 06:56:54 UTC
Pasting the mail content verbatim, as reported by the user on gluster-users:
Hello,
I am trying to use the newly documented quick-read translator, but the cluster
now locks up.
I replaced the read-ahead translator with this new one.
Previously I was using the Debian 2.0.4 package; today I packaged 2.0.7 here
(unofficial packages).
When using quick-read for bursts, glusterfs locks up. I need to kill the
glusterfs server, then umount, then restart the nodes.
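
For reference, the substitution described above would look roughly like the
sketch below in the client volume file (hypothetical volume and subvolume
names and option values; not the reporter's actual spec file, which is
attached to this bug):

volume quickread
  type performance/quick-read
  option cache-timeout 1     # hypothetical tuning; defaults may differ
  subvolumes iocache         # stacked where read-ahead used to sit
end-volume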
The log file shows, for a simple "touch filename":

# tail -50 /var/log/glusterfs/-etc-glusterfs-glusterfsd.vol.log


 72: # option transport.ib-verbs.work-request-recv-size  131072
 73: # option transport.ib-verbs.work-request-recv-count 64
 74:
 75: # option client-volume-filename /etc/glusterfs/glusterfs-client.vol
 76:   subvolumes brick
 77: # NOTE: Access to any volume through protocol/server is denied by
 78: # default. You need to explicitly grant access through # "auth"
 79: # option.
 80:   option auth.addr.brick.allow * # Allow access to "brick" volume
 81: end-volume

+------------------------------------------------------------------------------+
[2009-10-13 15:00:47] N [glusterfsd.c:1315:main] glusterfs: Successfully started
[2009-10-13 15:01:02] N [server-protocol.c:7065:mop_setvolume] server: accepted
client from 10.200.113.170:1023
[2009-10-13 15:01:02] N [server-protocol.c:7065:mop_setvolume] server: accepted
client from 10.200.113.170:1022
pending frames:
frame : type(1) op(LOOKUP)

patchset: v2.0.7
signal received: 11
time of crash: 2009-10-13 15:01:39
configuration details:
argp 1
backtrace 1
bdb->cursor->get 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.7
/lib/libc.so.6[0x7f47e8317f60]
/lib/libc.so.6(memcpy+0x2b0)[0x7f47e8363160]
/usr/lib/glusterfs/2.0.7/xlator/performance/io-cache.so(ioc_lookup_cbk+0x2a3)[0x7f47e70ab573]
/usr/lib/libglusterfs.so.0[0x7f47e8a6de54]
/usr/lib/glusterfs/2.0.7/xlator/performance/io-threads.so(iot_lookup_cbk+0x34)[0x7f47e74bed14]
/usr/lib/libglusterfs.so.0[0x7f47e8a6de54]
/usr/lib/glusterfs/2.0.7/xlator/storage/posix.so(posix_lookup+0x2fe)[0x7f47e78e185e]
/usr/lib/libglusterfs.so.0(default_lookup+0xb1)[0x7f47e8a71371]
/usr/lib/glusterfs/2.0.7/xlator/performance/io-threads.so(iot_lookup_wrapper+0xb1)[0x7f47e74c2291]
/usr/lib/libglusterfs.so.0(call_resume+0x1cb)[0x7f47e8a7792b]
/usr/lib/glusterfs/2.0.7/xlator/performance/io-threads.so(iot_worker_unordered+0x18)[0x7f47e74bfe48]
/lib/libpthread.so.0[0x7f47e863ffc7]
/lib/libc.so.6(clone+0x6d)[0x7f47e83b55ad]
---------
debian459140:~#



# tail -25 /var/log/glusterfs/php_sessions-.log


105: ### Add writeback feature
106: volume writeback
107:   type performance/write-behind
108: #  option aggregate-size 2MB       # deprecated option
109:   option cache-size 500MB  # default is equal to aggregate-size
110:   option flush-behind off          # default is 'off'
111:                            # too aggressive and slow background flush!
112:                            # do not enable for php sessions behaviour
113:   subvolumes iocache
114: end-volume

+------------------------------------------------------------------------------+
[2009-10-13 15:01:02] N [glusterfsd.c:1315:main] glusterfs: Successfully started
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote1:
Connected to 10.200.113.170:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote1:
Connected to 10.200.113.170:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote2:
Connected to 10.200.113.171:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote2:
Connected to 10.200.113.171:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote3:
Connected to 10.200.113.172:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote3:
Connected to 10.200.113.172:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote4:
Connected to 10.200.113.173:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote4:
Connected to 10.200.113.173:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:39] E [saved-frames.c:165:saved_frames_unwind] remote1: forced
unwinding frame type(1) op(LOOKUP)
[2009-10-13 15:01:39] N [client-protocol.c:6435:notify] remote1: disconnected
[2009-10-13 15:01:42] E [socket.c:745:socket_connect_finish] remote1: connection
to 10.200.113.170:6996 failed (Connection refused)
[2009-10-13 15:01:42] E [socket.c:745:socket_connect_finish] remote1: connection
to 10.200.113.170:6996 failed (Connection refused)

Comment 3 Raghavendra G 2009-10-14 07:52:03 UTC
During lookup_cbk, io-cache tries to cache the file data obtained through the glusterfs.content xattr. For directories, however, no data is present, so io-cache retries by sending the lookup once again, resulting in an infinite loop.

As a fix, the caching that io-cache does during lookup has to be removed, since quick-read does exactly the same work.
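
For illustration, here is a minimal C sketch of the loop described above (a
simplified model with hypothetical names and types, not the actual io-cache
source):

#include <stdio.h>

/* Simplified model of a lookup reply: in this model the
 * "glusterfs.content" xattr carries file data only for regular
 * files, never for directories. */
struct lookup_reply {
    const char *path;
    const char *content;   /* NULL for directories */
};

static void send_lookup(const char *path);

/* Models the buggy ioc_lookup_cbk: when content is missing, it
 * retries the lookup instead of giving up. */
static void ioc_lookup_cbk(struct lookup_reply *reply)
{
    if (reply->content == NULL) {
        /* A directory never carries content, so this retry
         * becomes lookup -> cbk -> lookup -> ... forever. */
        send_lookup(reply->path);
        return;
    }
    printf("cache %s\n", reply->path);   /* stand-in for io-cache caching */
}

static void send_lookup(const char *path)
{
    /* Model: a directory lookup always comes back without content. */
    struct lookup_reply reply = { path, NULL };
    ioc_lookup_cbk(&reply);
}

int main(void)
{
    send_lookup("/some/directory");      /* never returns */
    return 0;
}

The patches in comments 4 and 5 remove this lookup-time caching from io-cache
entirely, leaving that job to quick-read.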

Comment 4 Anand Avati 2009-10-15 13:26:37 UTC
PATCH: http://patches.gluster.com/patch/1906 in master (performance/io-cache: remove caching in lookup.)

Comment 5 Anand Avati 2009-10-15 13:26:40 UTC
PATCH: http://patches.gluster.com/patch/1907 in release-2.0 (performance/io-cache: remove caching in lookup.)

Comment 6 Vijay Bellur 2009-11-30 10:02:22 UTC
*** Bug 302 has been marked as a duplicate of this bug. ***