Bug 893778
Summary: Gluster 3.3.1 NFS service died after writing bunch of data

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Community] GlusterFS | Reporter | Rob <robinr> |
| Component | nfs | Assignee | Vivek Agarwal <vagarwal> |
| Status | CLOSED DUPLICATE | QA Contact | |
| Severity | unspecified | Docs Contact | |
| Priority | unspecified | | |
| Version | 3.3.0 | CC | gluster-bugs, jlu, nock, rhs-bugs, sankarshan, shaines, spradhan, vbellur, wica128 |
| Target Milestone | --- | | |
| Target Release | --- | | |
| Hardware | x86_64 | | |
| OS | Linux | | |
| Whiteboard | | | |
| Fixed In Version | | Doc Type | Bug Fix |
| Doc Text | | Story Points | --- |
| Clone Of | | | |
| | 902857 (view as bug list) | Environment | |
| Last Closed | 2013-08-29 18:46:52 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 902857, 998649 | | |
| Attachments | | | |
Description
Rob
2013-01-09 21:51:07 UTC
### The following info is set

Also, I'm getting a lot of "quota context not set in inode" messages.

    # gluster volume info RedhawkShared

    Volume Name: RedhawkShared
    Type: Replicate
    Volume ID: f9b943f8-dcb9-448f-a8b2-795d3c19ef3d
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: mualglup01:/mnt/gluster/RedhawkShared
    Brick2: mualglup02:/mnt/gluster/RedhawkShared
    Options Reconfigured:
    nfs.register-with-portmap: 1
    nfs.disable: off
    auth.allow: 10.0.72.135,10.0.93.*,192.168.251.*
    features.quota: on
    features.limit-usage: /robintest:1MB

I have 2 nodes; the other replica node is mistakenly still running 3.3.0, and it did not have these NFS problems.

The node with the following RPMs has the issue:

    glusterfs-server-3.3.1-1.el6.x86_64
    glusterfs-fuse-3.3.1-1.el6.x86_64
    glusterfs-3.3.1-1.el6.x86_64

The node with the following RPMs did _NOT_ have the issue:

    glusterfs-3.3.0-1.el6.x86_64
    glusterfs-server-3.3.0-1.el6.x86_64
    glusterfs-fuse-3.3.0-1.el6.x86_64

I have the core dumps. I can attach it if you want it. Platform is RHEL 6.2.

(In reply to comment #4)
> I have the core dumps. I can attach it if you want it. Platform is RHEL 6.2.

Yes, please provide the core file(s). Thanks.

Created attachment 685250 [details]
core dump file (as requested)
Created attachment 685251 [details]
another core dump file (as requested)
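For reference, one way to pull a backtrace out of cores like the ones attached above is to open them in gdb against the glusterfs binary. This is only a rough sketch: it assumes the glusterfs-debuginfo package matching 3.3.1-1 is installed, and the core file path shown is illustrative, not the actual attachment name.

    # install matching debug symbols on the RHEL 6 node (needs yum-utils)
    debuginfo-install glusterfs
    # open the core against the binary that produced it (core path is illustrative)
    gdb /usr/sbin/glusterfs /var/core/core.nfs.12345
    (gdb) bt full                # backtrace of the crashing thread
    (gdb) thread apply all bt    # backtraces of all threads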
I am seeing this behaviour on one of my nodes as well. 2 upgraded successfully, one fails to start the NFS service with a very similar backtrace.

(In reply to comment #8)
> I am seeing this behaviour on one of my nodes as well. 2 upgraded
> successfully, one fails to start the NFS service with a very similar
> backtrace.

Reverting to 3.3.0-1 fixes the problem (the NFS server stays online).

All 3.3.1-1 NFS servers eventually crashed. I've had to revert them all to 3.3.0-1.
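For anyone reproducing that workaround, a rough sketch of the downgrade on RHEL 6 follows. It assumes the 3.3.0-1 packages are still available in the configured yum repositories (package names taken from the RPM lists above); exact service handling may differ per site.

    # stop the management daemon (running glusterfs/glusterfsd processes
    # may also need to be stopped before the package swap)
    service glusterd stop
    yum downgrade glusterfs-3.3.0-1.el6 glusterfs-server-3.3.0-1.el6 glusterfs-fuse-3.3.0-1.el6
    service glusterd start
    # glusterd respawns the NFS server process; confirm it stays up
    gluster volume status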
Given volfile:

    +------------------------------------------------------------------------------+
    1: volume mirror-client-0
    2: type protocol/client
    3: option remote-host lauterbur
    4: option remote-subvolume /raid/mirror
    5: option transport-type tcp
    6: end-volume
    7:
    8: volume mirror-client-1
    9: type protocol/client
    10: option remote-host mansfield
    11: option remote-subvolume /raid/mirror
    12: option transport-type tcp
    13: end-volume
    14:
    15: volume mirror-client-2
    16: type protocol/client
    17: option remote-host ogawa
    18: option remote-subvolume /raid/mirror
    19: option transport-type tcp
    20: end-volume
    21:
    22: volume mirror-client-3
    23: type protocol/client
    24: option remote-host rabi
    25: option remote-subvolume /raid/mirror
    26: option transport-type tcp
    27: end-volume
    28:
    29: volume mirror-client-4
    30: type protocol/client
    31: option remote-host rabi
    32: option remote-subvolume /raid/mirror2
    33: option transport-type tcp
    34: end-volume
    35:
    36: volume mirror-client-5
    37: type protocol/client
    38: option remote-host ogawa
    39: option remote-subvolume /raid/mirror2
    40: option transport-type tcp
    41: end-volume
    42:
    43: volume mirror-replicate-0
    44: type cluster/replicate
    45: subvolumes mirror-client-0 mirror-client-1
    46: end-volume
    47:
    48: volume mirror-replicate-1
    49: type cluster/replicate
    50: subvolumes mirror-client-2 mirror-client-3
    51: end-volume
    52:
    53: volume mirror-replicate-2
    54: type cluster/replicate
    55: subvolumes mirror-client-4 mirror-client-5
    56: end-volume
    57:
    58: volume mirror-dht
    59: type cluster/distribute
    60: subvolumes mirror-replicate-0 mirror-replicate-1 mirror-replicate-2
    61: end-volume
    62:
    63: volume mirror
    64: type debug/io-stats
    65: option latency-measurement off
    66: option count-fop-hits off
    67: subvolumes mirror-dht
    68: end-volume
    69:
    70: volume stripe-client-0
    71: type protocol/client
    72: option remote-host lauterbur
    73: option remote-subvolume /raid/stripe
    74: option transport-type tcp
    75: end-volume
    76:
    77: volume stripe-client-1
    78: type protocol/client
    79: option remote-host mansfield
    80: option remote-subvolume /raid/stripe
    81: option transport-type tcp
    82: end-volume
    83:
    84: volume stripe-dht
    85: type cluster/distribute
    86: subvolumes stripe-client-0 stripe-client-1
    87: end-volume
    88:
    89: volume stripe
    90: type debug/io-stats
    91: option latency-measurement off
    92: option count-fop-hits off
    93: subvolumes stripe-dht
    94: end-volume
    95:
    96: volume nfs-server
    97: type nfs/server
    98: option nfs.dynamic-volumes on
    99: option nfs.nlm on
    100: option rpc-auth.addr.stripe.allow 127.0.0.1,10.1.3.*,10.1.2.*
    101: option nfs3.stripe.volume-id 7bc6050a-6846-4aae-bfd1-af5930efe95f
    102: option rpc-auth.addr.mirror.allow 127.0.0.1,10.1.3.*,10.1.2.*
    103: option nfs3.mirror.volume-id 2361e511-42a2-4d95-a99b-a1461236f78c
    104: option nfs.enable-ino32 yes
    105: option rpc-auth.addr.namelookup off
    106: option rpc-auth.ports.stripe.insecure on
    107: option rpc-auth.ports.mirror.insecure on
    108: option nfs3.mirror.trusted-sync on
    109: subvolumes stripe mirror
    110: end-volume
    +------------------------------------------------------------------------------+
    [2013-01-24 16:48:57.775744] I [rpc-clnt.c:1657:rpc_clnt_reconfig] 0-stripe-client-0: changing port to 24010 (from 0)
    [2013-01-24 16:48:57.775813] I [rpc-clnt.c:1657:rpc_clnt_reconfig] 0-mirror-client-0: changing port to 24009 (from 0)
    [2013-01-24 16:48:57.775878] I [rpc-clnt.c:1657:rpc_clnt_reconfig] 0-stripe-client-1: changing port to 24010 (from 0)
    [2013-01-24 16:48:57.775948] I [rpc-clnt.c:1657:rpc_clnt_reconfig] 0-mirror-client-1: changing port to 24009 (from 0)
    [2013-01-24 16:48:57.775989] I [rpc-clnt.c:1657:rpc_clnt_reconfig] 0-mirror-client-3: changing port to 24010 (from 0)
    [2013-01-24 16:48:57.776067] I [rpc-clnt.c:1657:rpc_clnt_reconfig] 0-mirror-client-2: changing port to 24010 (from 0)
    [2013-01-24 16:48:57.776107] I [rpc-clnt.c:1657:rpc_clnt_reconfig] 0-mirror-client-5: changing port to 24011 (from 0)
    [2013-01-24 16:48:57.776170] I [rpc-clnt.c:1657:rpc_clnt_reconfig] 0-mirror-client-4: changing port to 24011 (from 0)
    [2013-01-24 16:48:58.416280] E [nfs3.c:1549:nfs3_access] 0-nfs-nfsv3: Volume is disabled: stripe
    pending frames:

    patchset: git://git.gluster.com/glusterfs.git
    signal received: 6
    time of crash: 2013-01-24 16:48:58
    configuration details:
    argp 1
    backtrace 1
    dlfcn 1
    fdatasync 1
    libpthread 1
    llistxattr 1
    setfsid 1
    spinlock 1
    epoll.h 1
    xattr.h 1
    st_atim.tv_nsec 1
    package-string: glusterfs 3.3.1
    /lib64/libc.so.6(+0x36300)[0x7fa8c5232300]
    /lib64/libc.so.6(gsignal+0x35)[0x7fa8c5232285]
    /lib64/libc.so.6(abort+0x17b)[0x7fa8c5233b9b]
    /lib64/libc.so.6(+0x77a7e)[0x7fa8c5273a7e]
    /lib64/libc.so.6(__fortify_fail+0x37)[0x7fa8c5304af7]
    /lib64/libc.so.6(__fortify_fail+0x0)[0x7fa8c5304ac0]
    /usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(+0x29799)[0x7fa8c0f55799]
    /usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x258)[0x7fa8c5f8b1c8]
    /usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x9b)[0x7fa8c5f8b7fb]
    /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x27)[0x7fa8c5f8f367]
    /usr/lib64/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7fa8c24b5c64]
    /usr/lib64/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7fa8c24b5fb7]
    /usr/lib64/libglusterfs.so.0(+0x3e8f7)[0x7fa8c61d98f7]
    /usr/sbin/glusterfs(main+0x34d)[0x40478d]
    /lib64/libc.so.6(__libc_start_main+0xed)[0x7fa8c521d69d]
    /usr/sbin/glusterfs[0x404a55]
    ---------
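The libc frames (__fortify_fail followed by abort, signal 6) suggest the process aborted itself after one of glibc's fortify/stack-protector checks fired, such as a detected buffer overflow, and the only frame inside GlusterFS itself is an unresolved offset in the NFS xlator. As a rough sketch, that offset could be mapped to a function and source line with addr2line, assuming glusterfs-debuginfo for 3.3.1-1 is installed on the crashing node (without it, addr2line just prints "??"):

    # resolve the module-relative offset from the frame
    #   /usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(+0x29799)
    addr2line -f -C -e /usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so 0x29799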
*** Bug 893779 has been marked as a duplicate of this bug. ***

Any updates or a link to a patch on the code-review site?

Hi, I have seen this bug on the mailing list and wanted to report that I don't have this issue with glusterfs 3.3.1 and NFS. I have been running 3.3.1 since 26 Oct 2012, using it for running VMs in a vSphere cluster and for data storage. Maybe one thing: I use round robin in my DNS for the NFS hostname, so every peer can be an NFS server at any time.

OS: Ubuntu 12.04.1
Packages from http://ppa.launchpad.net/semiosis/ubuntu-glusterfs-3.3/ubuntu

Volume info:

    gluster volume info glusterfsvol01

    Volume Name: glusterfsvol01
    Type: Distributed-Replicate
    Volume ID: 1013b94c-7299-46b5-907a-fe7f2ae51f0b
    Status: Started
    Number of Bricks: 18 x 2 = 36
    Transport-type: tcp
    Bricks:
    Brick1: gluster-brick-01n1:/export/vol1
    Brick2: gluster-brick-02n1:/export/vol1
    Brick3: gluster-brick-03n1:/export/vol1
    Brick4: gluster-brick-01n2:/export/vol1
    Brick5: gluster-brick-02n2:/export/vol1
    Brick6: gluster-brick-03n2:/export/vol1
    Brick7: gluster-brick-01n1:/export/vol2
    Brick8: gluster-brick-02n1:/export/vol2
    Brick9: gluster-brick-03n1:/export/vol2
    Brick10: gluster-brick-01n2:/export/vol2
    Brick11: gluster-brick-02n2:/export/vol2
    Brick12: gluster-brick-03n2:/export/vol2
    Brick13: gluster-brick-01n1:/export/vol3
    Brick14: gluster-brick-02n1:/export/vol3
    Brick15: gluster-brick-03n1:/export/vol3
    Brick16: gluster-brick-01n2:/export/vol3
    Brick17: gluster-brick-02n2:/export/vol3
    Brick18: gluster-brick-03n2:/export/vol3
    Brick19: gluster-brick-01n1:/export/vol4
    Brick20: gluster-brick-02n1:/export/vol4
    Brick21: gluster-brick-03n1:/export/vol4
    Brick22: gluster-brick-01n2:/export/vol4
    Brick23: gluster-brick-02n2:/export/vol4
    Brick24: gluster-brick-03n2:/export/vol4
    Brick25: gluster-brick-01n1:/export/vol5
    Brick26: gluster-brick-02n1:/export/vol5
    Brick27: gluster-brick-03n1:/export/vol5
    Brick28: gluster-brick-01n2:/export/vol5
    Brick29: gluster-brick-02n2:/export/vol5
    Brick30: gluster-brick-03n2:/export/vol5
    Brick31: gluster-brick-01n1:/export/vol6
    Brick32: gluster-brick-02n1:/export/vol6
    Brick33: gluster-brick-03n1:/export/vol6
    Brick34: gluster-brick-01n2:/export/vol6
    Brick35: gluster-brick-02n2:/export/vol6
    Brick36: gluster-brick-03n2:/export/vol6
    Options Reconfigured:
    diagnostics.client-sys-log-level: WARNING
    diagnostics.brick-sys-log-level: WARNING
    diagnostics.count-fop-hits: on
    diagnostics.latency-measurement: on
    performance.cache-size: 134217728
    performance.io-thread-count: 64
    performance.write-behind-window-size: 256MB
    performance.io-cache: on
    performance.read-ahead: on
    auth.allow: 172.16.*
    nfs.disable: off

I am using the Fedora packages from gluster.org. I too use DNS round robin; however, the NFS process on all servers crashes (eventually, usually under an hour). I am willing to provide more information, but I don't know how to proceed. Was anything discovered in the cores posted above or from the backtrace?

*** This bug has been marked as a duplicate of bug 1002385 ***