After a few weeks of error-free operation, GlusterFS crashed.

System: Ubuntu Lucid (2.6.32-28-generic)
Mount type: NFS (default mount parameters)
Configuration type: distributed-replicate (4x2)

Content of nfs.log file:

[2011-02-16 14:37:10.491382] I [afr-common.c:819:afr_fresh_lookup_cbk] test_volume-replicate-1: added root inode
[2011-02-16 14:37:10.491852] I [afr-common.c:819:afr_fresh_lookup_cbk] test_volume-replicate-2: added root inode
[2011-02-16 14:37:10.492352] I [afr-common.c:819:afr_fresh_lookup_cbk] test_volume-replicate-3: added root inode
[2011-02-16 16:14:09.924988] I [afr-common.c:613:afr_lookup_self_heal_check] test_volume-replicate-2: size differs for /onerepo/50cab8255b4c3a6ee377baf45fc61e7c7cab6b32
[2011-02-16 16:14:09.925093] I [afr-common.c:716:afr_lookup_done] test_volume-replicate-2: background meta-data data self-heal triggered. path: /onerepo/50cab8255b4c3a6ee377baf45fc61e7c7cab6b32
[2011-02-16 16:14:10.35264] I [afr-common.c:613:afr_lookup_self_heal_check] test_volume-replicate-2: size differs for /onerepo/75103af384304fdbffadfa377690f0344fb532a3
[2011-02-16 16:14:10.35350] I [afr-common.c:716:afr_lookup_done] test_volume-replicate-2: background meta-data data self-heal triggered. path: /onerepo/75103af384304fdbffadfa377690f0344fb532a3
[2011-02-16 16:14:10.413155] I [afr-common.c:613:afr_lookup_self_heal_check] test_volume-replicate-2: size differs for /onerepo/e70ee7f945ad888426f55e6f0cdf39c091241603
[2011-02-16 16:14:10.413271] I [afr-common.c:716:afr_lookup_done] test_volume-replicate-2: background meta-data data self-heal triggered. path: /onerepo/e70ee7f945ad888426f55e6f0cdf39c091241603
[2011-02-16 16:14:10.761177] I [afr-common.c:613:afr_lookup_self_heal_check] test_volume-replicate-2: size differs for /onerepo/0eaa253ac3e9dd9b5656828a4f7e7486d6759850
[2011-02-16 16:14:10.761288] I [afr-common.c:716:afr_lookup_done] test_volume-replicate-2: background meta-data data self-heal triggered. path: /onerepo/0eaa253ac3e9dd9b5656828a4f7e7486d6759850
[2011-02-16 16:14:12.354160] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] test_volume-replicate-2: background meta-data data self-heal completed on /onerepo/75103af384304fdbffadfa377690f0344fb532a3
[2011-02-16 16:14:12.797746] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] test_volume-replicate-2: background meta-data data self-heal completed on /onerepo/e70ee7f945ad888426f55e6f0cdf39c091241603
[2011-02-16 16:23:31.949740] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=0)
[2011-02-16 16:24:56.959805] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2011-02-16 16:24:56.959940] E [nfs3.c:522:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2011-02-16 16:24:56.961438] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2011-02-16 16:24:56.961553] E [nfs3.c:522:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2011-02-16 16:24:56.974049] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2011-02-16 16:24:56.974116] E [nfs3.c:522:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2011-02-16 16:24:57.46931] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=0)
[2011-02-16 16:24:57.47068] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=-1)
[2011-02-16 16:24:57.47133] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=-1)
[2011-02-16 16:24:57.56965] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=-1)
[2011-02-16 16:24:57.57052] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=-1)
[2011-02-16 16:24:57.60621] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=-1)
[2011-02-16 16:24:57.60708] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=-1)
[2011-02-16 16:24:57.66769] E [event.c:722:event_select_on_epoll] epoll: index not found for fd=-1 (idx_hint=-1)
pending frames:

patchset: v3.1.1-64-gf2a067c
signal received: 11
time of crash: 2011-02-16 16:24:57
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.2
/lib/libc.so.6(+0x33af0)[0x7f002031daf0]
/usr/lib/glusterfs/3.1.2/xlator/nfs/server.so(nfs_rpcsvc_submit_vectors+0x160)[0x7f001d89ef70]
/usr/lib/glusterfs/3.1.2/xlator/nfs/server.so(nfs3svc_submit_vector_reply+0xa4)[0x7f001d885b14]
/usr/lib/glusterfs/3.1.2/xlator/nfs/server.so(nfs3_read_reply+0xf5)[0x7f001d886915]
/usr/lib/glusterfs/3.1.2/xlator/nfs/server.so(nfs3svc_read_cbk+0x8a)[0x7f001d8898aa]
/usr/lib/glusterfs/3.1.2/xlator/nfs/server.so(nfs_fop_readv_cbk+0x53)[0x7f001d87baa3]
/usr/lib/glusterfs/3.1.2/xlator/debug/io-stats.so(io_stats_readv_cbk+0x16f)[0x7f001dac4aef]
/usr/lib/glusterfs/3.1.2/xlator/performance/quick-read.so(qr_readv_cbk+0xa6)[0x7f001dccd0a6]
/usr/lib/glusterfs/3.1.2/xlator/performance/io-cache.so(ioc_frame_return+0x366)[0x7f001dedf356]
/usr/lib/glusterfs/3.1.2/xlator/performance/io-cache.so(ioc_waitq_return+0x1c)[0x7f001dedf5ac]
/usr/lib/glusterfs/3.1.2/xlator/performance/io-cache.so(ioc_fault_cbk+0x259)[0x7f001dee0a59]
/usr/lib/glusterfs/3.1.2/xlator/performance/read-ahead.so(ra_readv_disabled_cbk+0x9b)[0x7f001e0e871b]
/usr/lib/glusterfs/3.1.2/xlator/performance/write-behind.so(wb_readv_cbk+0xab)[0x7f001e2f633b]
/usr/lib/glusterfs/3.1.2/xlator/cluster/distribute.so(dht_readv_cbk+0xd3)[0x7f001e50d793]
/usr/lib/glusterfs/3.1.2/xlator/cluster/replicate.so(afr_readv_cbk+0x372)[0x7f001e73bcf2]
/usr/lib/glusterfs/3.1.2/xlator/protocol/client.so(client3_1_readv_cbk+0x367)[0x7f001e995567]
/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f0020cb1c15]
/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9)[0x7f0020cb1e69]
/usr/lib/libgfrpc.so.0(rpc_transport_notify+0x2d)[0x7f0020cad02d]
/usr/lib/glusterfs/3.1.2/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f0017b3f334]
/usr/lib/glusterfs/3.1.2/rpc-transport/socket.so(socket_event_handler+0xb3)[0x7f0017b3f403]
/usr/lib/libglusterfs.so.0(+0x38592)[0x7f0020ef1592]
/usr/sbin/glusterfs(main+0x247)[0x405597]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f0020308c4d]
/usr/sbin/glusterfs[0x4032a9]
---------
[2011-02-16 22:31:55.556079] I [nfs.c:685:init] nfs: NFS service started
[2011-02-16 22:31:55.556168] W [dict.c:1205:data_to_str] dict: @data=(nil)
[2011-02-16 22:31:55.556182] W [dict.c:1205:data_to_str] dict: @data=(nil)

In the kernel log (dmesg) I noticed a call trace (it probably occurred while I was trying to copy some files from the volume). Details below:

[1241404.151335] Call Trace:
[1241404.151359] [<ffffffff810f4290>] ? sync_page+0x0/0x50
[1241404.151377] [<ffffffff81542b73>] io_schedule+0x73/0xc0
[1241404.151390] [<ffffffff810f42cd>] sync_page+0x3d/0x50
[1241404.151404] [<ffffffff815432aa>] __wait_on_bit_lock+0x5a/0xc0
[1241404.151417] [<ffffffff810f4267>] __lock_page+0x67/0x70
[1241404.151432] [<ffffffff81084760>] ? wake_bit_function+0x0/0x40
[1241404.151445] [<ffffffff810fe682>] ? pagevec_lookup+0x22/0x30
[1241404.151458] [<ffffffff811000c6>] invalidate_inode_pages2_range+0x296/0x2b0
[1241404.151473] [<ffffffff811000f7>] invalidate_inode_pages2+0x17/0x20
[1241404.151511] [<ffffffffa022dcfb>] nfs_invalidate_mapping_nolock+0x2b/0xf0 [nfs]
[1241404.151544] [<ffffffffa022ef27>] nfs_revalidate_mapping+0xc7/0xd0 [nfs]
[1241404.151560] [<ffffffff81154dd9>] ? set_fd_set+0x49/0x60
[1241404.151588] [<ffffffffa022bb87>] nfs_file_read+0x77/0x130 [nfs]
[1241404.151604] [<ffffffff811437fa>] do_sync_read+0xfa/0x140
[1241404.151617] [<ffffffff81084720>] ? autoremove_wake_function+0x0/0x40
[1241404.151633] [<ffffffff8133382b>] ? put_ldisc+0x5b/0xc0
[1241404.151646] [<ffffffff8132ded3>] ? tty_write+0x233/0x2a0
[1241404.151662] [<ffffffff81252db6>] ? security_file_permission+0x16/0x20
[1241404.151676] [<ffffffff81144115>] vfs_read+0xb5/0x1a0
[1241404.151688] [<ffffffff811442d1>] sys_read+0x51/0x80
[1241404.151704] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
[1241524.150152] INFO: task mc:1974 blocked for more than 120 seconds.
[1241524.168217] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1241524.204990] mc D 0000000000000000 0 1974 1729 0x00000004
[1241524.205002] ffff880414f39b18 0000000000000082 0000000000015bc0 0000000000015bc0
[1241524.205012] ffff880414ebdf78 ffff880414f39fd8 0000000000015bc0 ffff880414ebdbc0
[1241524.205023] 0000000000015bc0 ffff880414f39fd8 0000000000015bc0 ffff880414ebdf78

GDB output (gdb /usr/sbin/glusterfs /core):

Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.
[New Thread 16475]
[New Thread 16477]
[New Thread 16476]
[New Thread 16478]
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /usr/lib/libglusterfs.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libglusterfs.so.0
Reading symbols from /usr/lib/libgfrpc.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgfrpc.so.0
Reading symbols from /usr/lib/libgfxdr.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgfxdr.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/protocol/client.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/protocol/client.so
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/cluster/replicate.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/cluster/replicate.so
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/cluster/distribute.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/cluster/distribute.so
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/performance/write-behind.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/performance/write-behind.so
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/performance/read-ahead.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/performance/read-ahead.so
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/performance/io-cache.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/performance/io-cache.so
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/performance/quick-read.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/performance/quick-read.so
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/debug/io-stats.so...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/debug/io-stats.so
Reading symbols from /usr/lib/glusterfs/3.1.2/xlator/nfs/server.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/xlator/nfs/server.so
Reading symbols from /usr/lib/glusterfs/3.1.2/rpc-transport/socket.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/glusterfs/3.1.2/rpc-transport/socket.so
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Core was generated by `/usr/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/ru'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f001d89ef70 in nfs_rpcsvc_submit_vectors () from /usr/lib/glusterfs/3.1.2/xlator/nfs/server.so

Patches that fix this bug are available as part of bugs 2481 and 2504. They'll be part of release 3.1.3. Thanks for reporting.

*** This bug has been marked as a duplicate of bug 2481 ***