Hide Forgot
(gdb) backtrace #0 server_mknod_cbk (frame=0x2aaab37eab20, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=<value optimized out>, inode=0x2aaab380bfe0, stbuf=0x7fff7e40d170) at server-protocol.c:877 #1 0x00002aaaaaab507a in posix_mknod (frame=0x2aaab37eb660, this=0x60b380, loc=0x7fff7e40d2f0, mode=32896, dev=0) at posix.c:499 #2 0x00002aaaaaecbd34 in server_mknod (frame=<value optimized out>, bound_xl=0x60b380, params=<value optimized out>) at server-protocol.c:4367 #3 0x00002aaaaaec3f67 in server_protocol_interpret (trans=0x2aaab00bfb40, blk=0x2aaab37d9d50) at server-protocol.c:6074 #4 0x00002aaaaaecf5dc in notify (this=<value optimized out>, event=<value optimized out>, data=0x2aaab00bfb40) at server-protocol.c:6444 #5 0x00002ad72c8c8bfb in sys_epoll_iteration (ctx=<value optimized out>) at epoll.c:64 #6 0x00002ad72c8c7fb9 in poll_iteration (ctx=0x2aaab37ebec8) at transport.c:312 #7 0x000000000040291c in main (argc=7, argv=0x7fff7e40d628) at glusterfs.c:565
Code version: glusterfs-1.3.12, fuse-2.7.4glfs11 Latest code update: 2009-06-09 Code location: feed6:/data1/nfs_share/, with local copies of libraries Usage: As shared, pooled-resource drive. Machines feed1~feed5 act as servers. Machines feed0~feed13 act as clients (including feed1~feed5). Processes on clients read and write simultaneously to the same directory, although not same files. One process monitors a specific file via tail -f. Crashing machine: feed1 Recent crashing dates: 2009-05-02, 2009-06-03, 2009-06-08, 2009-06-25 09:42 glusterfs-client.vol ------------------------------- volume feed1 type protocol/client option transport-type tcp/client option remote-host 10.1.90.201 option remote-subvolume brick end-volume volume feed2 type protocol/client option transport-type tcp/client option remote-host 10.1.90.202 option remote-subvolume brick end-volume volume feed3 type protocol/client option transport-type tcp/client option remote-host 10.1.90.203 option remote-subvolume brick end-volume volume feed4 type protocol/client option transport-type tcp/client option remote-host 10.1.90.204 option remote-subvolume brick end-volume volume feed5 type protocol/client option transport-type tcp/client option remote-host 10.1.90.205 option remote-subvolume brick end-volume volume client-ns type protocol/client option transport-type tcp/client option remote-host 10.1.90.201 option remote-subvolume brick-ns end-volume volume bricks type cluster/unify option namespace client-ns subvolumes feed1 feed2 feed3 feed4 feed5 option scheduler alu option alu.limits.min-free-disk 5% option alu.limits.max-open-files 100000 option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage option alu.disk-usage.entry-threshold 2GB option alu.disk-usage.exit-threshold 60MB option alu.open-files-usage.entry-threshold 1024 option alu.open-files-usage.exit-threshold 32 option alu.stat-refresh.interval 10sec end-volume ---------------------------------------------------------------------------- glusterfs-server.vol ----------------------------- volume brick type storage/posix option directory /data1/glusterfs end-volume volume brick-ns type storage/posix option directory /data1/glusterfs-ns end-volume volume locks type features/posix-locks # posix-locks subvolumes brick end-volume volume server type protocol/server option transport-type tcp/server subvolumes brick brick-ns option auth.ip.brick.allow 10.1.90.* option auth.ip.brick-ns.allow 10.1.90.* end-volume ---------------------------------------------------------------------------- Sample gluster.client.log error messages on feed1 -------------------------------- 2009-06-10 15:47:22 E [unify.c:882:unify_open] bricks: /persistent_data/piddata-23480.bin: entry_count is 3 2009-06-10 15:47:22 E [unify.c:885:unify_open] bricks: /persistent_data/piddata-23480.bin: found on feed2 2009-06-10 15:47:22 E [unify.c:885:unify_open] bricks: /persistent_data/piddata-23480.bin: found on feed4 2009-06-10 15:47:22 E [unify.c:885:unify_open] bricks: /persistent_data/piddata-23480.bin: found on client-ns ... 2009-06-10 23:06:21 E [unify.c:182:unify_lookup_cbk] bricks: feed1 returned 2 2009-06-10 23:06:21 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /persistent_data/piddata-313.bin 2009-06-10 23:06:21 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 95612: (34) /persistent_data/piddata-313.bin => -1 (2) ... 2009-06-12 01:49:06 W [client-protocol.c:205:call_bail] client-ns: activating bail-out. pending frames = 1. last sent = 2009-06-12 01:48:16. last received = 2009-06-12 01:48:16 transport-timeout = 42 2009-06-12 01:49:06 C [client-protocol.c:212:call_bail] client-ns: bailing transport 2009-06-12 01:49:06 W [client-protocol.c:4784:client_protocol_cleanup] client-ns: cleaning up state in transport object 0x61bf00 2009-06-12 01:49:06 E [client-protocol.c:4834:client_protocol_cleanup] client-ns: forced unwinding frame type(1) op(40) reply=@0x61e6c0 2009-06-12 01:49:06 E [client-protocol.c:4613:client_checksum_cbk] client-ns: no proper reply from server, returning ENOTCONN 2009-06-12 01:49:06 E [client-protocol.c:325:client_protocol_xfer] client-ns: transport_submit failed ... 2009-06-12 01:49:57 W [client-protocol.c:205:call_bail] client-ns: activating bail-out. pending frames = 1. last sent = 2009-06-12 01:49:06. last received = 1969-12-31 16:00:00 transport-timeout = 42 2009-06-12 01:49:57 C [client-protocol.c:212:call_bail] client-ns: bailing transport 2009-06-12 01:49:57 W [client-protocol.c:4784:client_protocol_cleanup] client-ns: cleaning up state in transport object 0x61bf00 2009-06-12 01:49:57 E [client-protocol.c:4834:client_protocol_cleanup] client-ns: forced unwinding frame type(1) op(22) reply=@0x61e6c0 2009-06-12 01:49:57 E [client-protocol.c:3767:client_opendir_cbk] client-ns: no proper reply from server, returning ENOTCONN 2009-06-12 01:49:57 W [client-protocol.c:1711:client_closedir] client-ns: no proper fd found, returning 2009-06-12 01:49:57 W [client-protocol.c:280:client_protocol_xfer] client-ns: attempting to pipeline request type(1) op(34) with handshake ... 2009-06-14 05:14:17 W [client-protocol.c:205:call_bail] feed4: activating bail-out. pending frames = 1. last sent = 2009-06-14 05:13:31. last received = 2009-06-14 05:05:37 transport-timeout = 42 2009-06-14 05:14:17 C [client-protocol.c:212:call_bail] feed4: bailing transport 2009-06-14 05:14:17 W [client-protocol.c:4784:client_protocol_cleanup] feed4: cleaning up state in transport object 0x616760 2009-06-14 05:14:17 E [client-protocol.c:4834:client_protocol_cleanup] feed4: forced unwinding frame type(1) op(15) reply=@0x2aaaac005e70 2009-06-14 05:14:17 E [client-protocol.c:3891:client_statfs_cbk] feed4: no proper reply from server, returning ENOTCONN 2009-06-14 05:14:17 E [unify.c:1402:unify_statfs_cbk] bricks: operation failed on feed4 (107) 2009-06-14 05:14:17 W [client-protocol.c:280:client_protocol_xfer] feed4: attempting to pipeline request type(1) op(15) with handshake ... 2009-06-14 05:17:29 W [client-protocol.c:205:call_bail] feed1: activating bail-out. pending frames = 1. last sent = 2009-06-14 05:16:37. last received = 2009-06-14 05:14:17 transport-timeout = 42 2009-06-14 05:17:29 C [client-protocol.c:212:call_bail] feed1: bailing transport 2009-06-14 05:17:29 W [client-protocol.c:4784:client_protocol_cleanup] feed1: cleaning up state in transport object 0x60f4e0 2009-06-14 05:17:29 E [client-protocol.c:4834:client_protocol_cleanup] feed1: forced unwinding frame type(1) op(15) reply=@0x2aaaac005b60 2009-06-14 05:17:29 E [client-protocol.c:3891:client_statfs_cbk] feed1: no proper reply from server, returning ENOTCONN 2009-06-14 05:17:29 E [unify.c:1402:unify_statfs_cbk] bricks: operation failed on feed1 (107) 2009-06-14 05:17:29 W [client-protocol.c:205:call_bail] feed3: activating bail-out. pending frames = 1. last sent = 2009-06-14 05:16:37. last received = 2009-06-14 05:14:20 transport-timeout = 42 2009-06-14 05:17:29 C [client-protocol.c:212:call_bail] feed3: bailing transport 2009-06-14 05:17:29 W [client-protocol.c:4784:client_protocol_cleanup] feed3: cleaning up state in transport object 0x614310 2009-06-14 05:17:29 W [client-protocol.c:205:call_bail] feed2: activating bail-out. pending frames = 1. last sent = 2009-06-14 05:16:37. last received = 2009-06-14 05:14:17 transport-timeout = 42 2009-06-14 05:17:29 C [client-protocol.c:212:call_bail] feed2: bailing transport 2009-06-14 05:17:29 E [client-protocol.c:4834:client_protocol_cleanup] feed3: forced unwinding frame type(1) op(15) reply=@0x2aaaac005eb0 2009-06-14 05:17:29 E [client-protocol.c:3891:client_statfs_cbk] feed3: no proper reply from server, returning ENOTCONN 2009-06-14 05:17:29 E [unify.c:1402:unify_statfs_cbk] bricks: operation failed on feed3 (107) 2009-06-14 05:17:29 W [client-protocol.c:4784:client_protocol_cleanup] feed2: cleaning up state in transport object 0x611f00 2009-06-14 05:17:29 E [client-protocol.c:4834:client_protocol_cleanup] feed2: forced unwinding frame type(1) op(15) reply=@0x2aaaac005cd0 2009-06-14 05:17:29 E [client-protocol.c:3891:client_statfs_cbk] feed2: no proper reply from server, returning ENOTCONN 2009-06-14 05:17:29 E [unify.c:1402:unify_statfs_cbk] bricks: operation failed on feed2 (107) 2009-06-14 05:17:29 W [client-protocol.c:332:client_protocol_xfer] feed1: not connected at the moment to submit frame type(1) op(15) 2009-06-14 05:17:29 E [client-protocol.c:3891:client_statfs_cbk] feed1: no proper reply from server, returning ENOTCONN 2009-06-14 05:17:29 E [unify.c:1402:unify_statfs_cbk] bricks: operation failed on feed1 (107) 2009-06-14 05:17:29 W [client-protocol.c:280:client_protocol_xfer] feed2: attempting to pipeline request type(1) op(15) with handshake 2009-06-14 05:17:29 W [client-protocol.c:280:client_protocol_xfer] feed3: attempting to pipeline request type(1) op(15) with handshake 2009-06-14 05:17:29 W [client-protocol.c:280:client_protocol_xfer] feed4: attempting to pipeline request type(1) op(15) with handshake ... 2009-06-16 16:17:34 E [unify.c:325:unify_lookup] bricks: returning ESTALE for / [translator generation (10) inode generation (6)] 2009-06-16 16:17:34 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 2016545: (34) / => -1 (116) ... 2009-06-17 12:01:25 W [client-protocol.c:205:call_bail] client-ns: activating bail-out. pending frames = 2. last sent = 2009-06-17 12:00:40. last received = 2009-06-17 12:00:40 transport-timeout = 42 2009-06-17 12:01:25 C [client-protocol.c:212:call_bail] client-ns: bailing transport 2009-06-17 12:01:25 W [client-protocol.c:4784:client_protocol_cleanup] client-ns: cleaning up state in transport object 0x61bf00 2009-06-17 12:01:25 E [client-protocol.c:4834:client_protocol_cleanup] client-ns: forced unwinding frame type(1) op(12) reply=@0x2aaaac000c30 2009-06-17 12:01:25 E [client-protocol.c:2627:client_open_cbk] client-ns: no proper reply from server, returning ENOTCONN 2009-06-17 12:01:25 E [unify.c:820:unify_open_cbk] bricks: Open success on child node, failed on namespace 2009-06-17 12:01:25 E [client-protocol.c:4834:client_protocol_cleanup] client-ns: forced unwinding frame type(1) op(40) reply=@0x2aaaac000c30 2009-06-17 12:01:25 E [client-protocol.c:4613:client_checksum_cbk] client-ns: no proper reply from server, returning ENOTCONN 2009-06-17 12:01:25 E [client-protocol.c:325:client_protocol_xfer] client-ns: transport_submit failed 2009-06-17 12:01:25 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3345478: (12) /feed_processing/11460/11460.218.feed => -1 (107) ... 2009-06-21 14:37:40 W [client-protocol.c:205:call_bail] feed1: activating bail-out. pending frames = 2. last sent = 2009-06-21 14:36:53. last received = 2009-06-21 14:36:49 transport-timeout = 42 2009-06-21 14:37:40 C [client-protocol.c:212:call_bail] feed1: bailing transport 2009-06-21 14:37:40 W [client-protocol.c:4784:client_protocol_cleanup] feed1: cleaning up state in transport object 0x60f4e0 2009-06-21 14:37:40 E [client-protocol.c:4834:client_protocol_cleanup] feed1: forced unwinding frame type(1) op(34) reply=@0x2aaab005eea0 2009-06-21 14:37:40 E [client-protocol.c:4430:client_lookup_cbk] feed1: no proper reply from server, returning ENOTCONN 2009-06-21 14:37:40 E [unify.c:182:unify_lookup_cbk] bricks: feed1 returned 107 2009-06-21 14:37:40 E [client-protocol.c:4834:client_protocol_cleanup] feed1: forced unwinding frame type(1) op(34) reply=@0x2aaab005eea0 2009-06-21 14:37:40 E [client-protocol.c:4430:client_lookup_cbk] feed1: no proper reply from server, returning ENOTCONN 2009-06-21 14:37:40 E [unify.c:182:unify_lookup_cbk] bricks: feed1 returned 107 ... 2009-06-25 04:44:14 W [fuse-bridge.c:411:fuse_entry_cbk] glusterfs-fuse: 4270680: (34) /feed_processing/12879/12879.206.feed => 1074574335 Rehashing because st_nlink less than dentry maps ... 2009-06-25 09:42:00 E [protocol.c:271:gf_block_unserialize_transport] feed1: EOF from peer (10.1.90.201:6996) 2009-06-25 09:42:00 W [client-protocol.c:4784:client_protocol_cleanup] feed1: cleaning up state in transport object 0x60f4e0 2009-06-25 09:42:00 E [protocol.c:271:gf_block_unserialize_transport] client-ns: EOF from peer (10.1.90.201:6996) 2009-06-25 09:42:00 W [client-protocol.c:4784:client_protocol_cleanup] client-ns: cleaning up state in transport object 0x61bf00 2009-06-25 09:42:00 E [tcp-client.c:190:tcp_connect] client-ns: non-blocking connect() returned: 111 (Connection refused) ----------------------------------------------------------------------------- Sample gluster.server.log error messages on feed1 -------------------------------- 2009-06-11 13:40:06 W [posix.c:486:posix_mknod] brick-ns: mknod on /condor/var/PS/query/query_data.lock: File exists ... 2009-06-11 20:37:49 W [posix.c:486:posix_mknod] brick-ns: mknod on /condor/var/PS/query/query_par.lock: File exists ... 2009-06-12 10:17:33 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.1.90.200:1016) ... 2009-06-13 04:15:00 W [posix.c:486:posix_mknod] brick-ns: mknod on /condor/var/PS/item/item_list_27.lock: File exists ... 2009-06-14 01:51:25 E [server-protocol.c:186:generic_reply] server: transport_writev failed ... 2009-06-14 01:51:25 E [tcp.c:124:tcp_except] server: shutdown () - error: Transport endpoint is not connected ... 2009-06-14 05:28:00 W [posix.c:486:posix_mknod] brick-ns: mknod on /condor/var/PS/token/token_data.lock: File exists ... TLA Repo Revision: glusterfs--mainline--2.5--patch-797 Time : 2009-06-25 09:41:58 Signal Number : 11 /usr/local/sbin/glusterfsd -f /etc/glusterfs/glusterfs-server.vol -l /var/log/glusterfs/gluster.server.log -L WARNING --pidfile /var/log/glusterfs/pid volume server type protocol/server option auth.ip.brick-ns.allow 10.1.90.* option auth.ip.brick.allow 10.1.90.* option transport-type tcp/server subvolumes brick brick-ns end-volume volume locks type features/posix-locks subvolumes brick end-volume volume brick-ns type storage/posix option directory /data1/glusterfs-ns end-volume volume brick type storage/posix option directory /data1/glusterfs end-volume frame : type(1) op(2) /lib64/libc.so.6[0x2ad72cf535b0] /mnt/nfs_share/feed6/apps/install/glusterfs-1.3.12/lib/glusterfs/1.3.12/xlator/protocol/server.so(server_mknod_cbk+0xf0)[0x2aaaaaecbb70] /mnt/nfs_share/feed6/apps/install/glusterfs-1.3.12/lib/glusterfs/1.3.12/xlator/storage/posix.so(posix_mknod+0x12a)[0x2aaaaaab507a] /mnt/nfs_share/feed6/apps/install/glusterfs-1.3.12/lib/glusterfs/1.3.12/xlator/protocol/server.so(server_mknod+0x164)[0x2aaaaaecbd34] /mnt/nfs_share/feed6/apps/install/glusterfs-1.3.12/lib/glusterfs/1.3.12/xlator/protocol/server.so(server_protocol_interpret+0x2e7)[0x2aaaaaec3f67] /mnt/nfs_share/feed6/apps/install/glusterfs-1.3.12/lib/glusterfs/1.3.12/xlator/protocol/server.so(notify+0x9c)[0x2aaaaaecf5dc] /mnt/nfs_share/feed6/apps/install/glusterfs-1.3.12/lib/libglusterfs.so.0(sys_epoll_iteration+0xbb)[0x2ad72c8c8bfb] /mnt/nfs_share/feed6/apps/install/glusterfs-1.3.12/lib/libglusterfs.so.0(poll_iteration+0x79)[0x2ad72c8c7fb9] [glusterfs](main+0x67c)[0x40291c] /lib64/libc.so.6(__libc_start_main+0xf4)[0x2ad72cf40ae4] [glusterfs][0x401de9] ----------------------------------------------------------------------------- Sample core data: -------------------------------- feed1:/ # gdb /usr/local/sbin/glusterfsd core.12030 GNU gdb 6.5 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/libthread_db.so.1". ... Core was generated by `[glusterfs] '. Program terminated with signal 11, Segmentation fault. #0 server_mknod_cbk (frame=0x2aaab37eab20, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=<value optimized out>, inode=0x2aaab380bfe0, stbuf=0x7fff7e40d170) at server-protocol.c:877 877 dict_set (reply, "INODE", data_from_uint64 (inode->ino)); (gdb) p *inode Cannot access memory at address 0x2aaab380bfe0 (gdb) p *reply $3 = {is_static = 0 '\0', is_locked = 1 '\001', hash_size = 1, count = 3, refcount = 0, members = 0x2aaab2b71ab0, members_list = 0x2aaab2e9b750, extra_free = 0x0, lock = 1} -----------------------------------------------------------------------------
Yang, We are no more supporting 1.3.12 release. Please try 2.0.8+ or 3.0.0 release when it is done. Also we no longer support 'cluster/unify' translator too. Please use 'cluster/distribute' instead. That should fix all these issues. Regards, Amar