| Summary: | Hang in quick-read |
|---|---|
| Product: | [Community] GlusterFS |
| Component: | io-cache |
| Version: | 2.0.7 |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | medium |
| Priority: | low |
| Hardware: | All |
| OS: | Linux |
| Reporter: | Raghavendra G <raghavendra> |
| Assignee: | Raghavendra G <raghavendra> |
| CC: | afmachado1963, anush, gluster-bugs |
| Doc Type: | Bug Fix |
Created attachment 79 [details]
Test case and further discussion from reporter
Pasting the mail content as reported by the user on gluster-users:

> Hello,
>
> I am trying to use the newly documented quick-read translator, but the cluster locks up now. I substituted this new translator for read-ahead. Previously I was using the Debian 2.0.4 package, and today I packaged 2.0.7 here (unofficial packages).
>
> When using quick-read for bursts, glusterfs locks up. I need to kill the glusterfs server, then umount, then restart the nodes. The log file shows, for a simple "touch filename":

```
# tail -50 /var/log/glusterfs/-etc-glusterfs-glusterfsd.vol.log
72: # option transport.ib-verbs.work-request-recv-size 131072
73: # option transport.ib-verbs.work-request-recv-count 64
74:
75: # option client-volume-filename /etc/glusterfs/glusterfs-client.vol
76: subvolumes brick
77: # NOTE: Access to any volume through protocol/server is denied by
78: # default. You need to explicitly grant access through # "auth"
79: # option.
80: option auth.addr.brick.allow * # Allow access to "brick" volume
81: end-volume
+------------------------------------------------------------------------------+
[2009-10-13 15:00:47] N [glusterfsd.c:1315:main] glusterfs: Successfully started
[2009-10-13 15:01:02] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.200.113.170:1023
[2009-10-13 15:01:02] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.200.113.170:1022
pending frames:
frame : type(1) op(LOOKUP)

patchset: v2.0.7
signal received: 11
time of crash: 2009-10-13 15:01:39
configuration details:
argp 1
backtrace 1
bdb->cursor->get 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.7
/lib/libc.so.6[0x7f47e8317f60]
/lib/libc.so.6(memcpy+0x2b0)[0x7f47e8363160]
/usr/lib/glusterfs/2.0.7/xlator/performance/io-cache.so(ioc_lookup_cbk+0x2a3)[0x7f47e70ab573]
/usr/lib/libglusterfs.so.0[0x7f47e8a6de54]
/usr/lib/glusterfs/2.0.7/xlator/performance/io-threads.so(iot_lookup_cbk+0x34)[0x7f47e74bed14]
/usr/lib/libglusterfs.so.0[0x7f47e8a6de54]
/usr/lib/glusterfs/2.0.7/xlator/storage/posix.so(posix_lookup+0x2fe)[0x7f47e78e185e]
/usr/lib/libglusterfs.so.0(default_lookup+0xb1)[0x7f47e8a71371]
/usr/lib/glusterfs/2.0.7/xlator/performance/io-threads.so(iot_lookup_wrapper+0xb1)[0x7f47e74c2291]
/usr/lib/libglusterfs.so.0(call_resume+0x1cb)[0x7f47e8a7792b]
/usr/lib/glusterfs/2.0.7/xlator/performance/io-threads.so(iot_worker_unordered+0x18)[0x7f47e74bfe48]
/lib/libpthread.so.0[0x7f47e863ffc7]
/lib/libc.so.6(clone+0x6d)[0x7f47e83b55ad]
---------
```

```
debian459140:~# tail -25 /var/log/glusterfs/php_sessions-.log
105: ### Add writeback feature
106: volume writeback
107: type performance/write-behind
108: # option aggregate-size 2MB # deprecated option
109: option cache-size 500MB # default is equal to aggregate-size
110: option flush-behind off # default is 'off'
111: # too aggressive and slow background flush!
112: # do not enable for php sessions behaviour
113: subvolumes iocache
114: end-volume
+------------------------------------------------------------------------------+
[2009-10-13 15:01:02] N [glusterfsd.c:1315:main] glusterfs: Successfully started
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote1: Connected to 10.200.113.170:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote1: Connected to 10.200.113.170:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote2: Connected to 10.200.113.171:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote2: Connected to 10.200.113.171:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote3: Connected to 10.200.113.172:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote3: Connected to 10.200.113.172:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote4: Connected to 10.200.113.173:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:02] N [client-protocol.c:5730:client_setvolume_cbk] remote4: Connected to 10.200.113.173:6996, attached to remote volume 'brick'.
[2009-10-13 15:01:39] E [saved-frames.c:165:saved_frames_unwind] remote1: forced unwinding frame type(1) op(LOOKUP)
[2009-10-13 15:01:39] N [client-protocol.c:6435:notify] remote1: disconnected
[2009-10-13 15:01:42] E [socket.c:745:socket_connect_finish] remote1: connection to 10.200.113.170:6996 failed (Connection refused)
[2009-10-13 15:01:42] E [socket.c:745:socket_connect_finish] remote1: connection to 10.200.113.170:6996 failed (Connection refused)
```

io-cache tries to cache the data obtained through the glusterfs.content xattr during lookup_cbk. For directories, however, that data is never present, so io-cache retries by sending the lookup once again, resulting in an infinite loop. The fix is to remove the caching that io-cache does during lookup, since quick-read does exactly the same work.

PATCH: http://patches.gluster.com/patch/1906 in master (performance/io-cache: remove caching in lookup.)

PATCH: http://patches.gluster.com/patch/1907 in release-2.0 (performance/io-cache: remove caching in lookup.)
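To make the failure mode concrete, here is a minimal self-contained model of the retry described above. It is a sketch under assumed names (lookup_content_xattr, ioc_lookup_cbk_model), not the actual io-cache source: a lookup on a directory never returns the glusterfs.content xattr, yet the callback keeps re-issuing the lookup waiting for it.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Minimal model of the loop described above. Names and structure are
 * illustrative assumptions, not the GlusterFS source: the real code
 * winds a fresh lookup back down the translator stack, modeled here
 * as a plain function call.
 */

static bool is_directory;

/* Stands in for a lookup fop: a regular file may return its content
 * in the "glusterfs.content" xattr, but a directory never does. */
static const char *lookup_content_xattr(void)
{
        return is_directory ? NULL : "file data";
}

static void ioc_lookup_cbk_model(void)
{
        const char *content = lookup_content_xattr();

        while (content == NULL) {
                /* BUG: re-send the lookup hoping the content arrives.
                 * For a directory it never will, so this spins
                 * forever -- the hang the reporter observed. */
                content = lookup_content_xattr();
        }

        printf("caching: %s\n", content);
}

int main(void)
{
        is_directory = false;
        ioc_lookup_cbk_model();   /* completes for a regular file */

        is_directory = true;
        ioc_lookup_cbk_model();   /* never returns for a directory */
        return 0;
}
```

The patches above resolve this by deleting the lookup-time caching path outright, leaving lookup-time content fetching to quick-read.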
Created attachment 78 [details]
Patch to fix problem
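For context, the substitution the reporter describes, loading quick-read where read-ahead used to be, would look roughly like the stanza below in a client volfile. This is an illustrative sketch only: the volume and subvolume names are assumptions, and the log excerpt above does not show where quick-read actually sat in the reporter's stack.

```
# Illustrative only -- names here are hypothetical, not the reporter's
# volfile. quick-read is declared like any other performance
# translator, layered over a lower subvolume.
volume quickread
  type performance/quick-read
  subvolumes writeback
end-volume
```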