Bug 764920 (GLUSTER-3188) - Still getting gfid mismatches on "sed -i" renames
Summary: Still getting gfid mismatches on "sed -i" renames
Keywords:
Status: CLOSED DUPLICATE of bug 764196
Alias: GLUSTER-3188
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.1.5
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-18 16:12 UTC by Joe Julian
Modified: 2011-07-21 23:56 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:


Attachments
log with debug/trace (207.33 KB, application/x-gzip), 2011-07-20 11:47 UTC, Joe Julian
files and script that create mismatched gfids (3.94 KB, application/x-gzip), 2011-07-20 12:42 UTC, Joe Julian
One-time pass that created mismatched gfids (35.26 KB, application/x-gzip), 2011-07-20 12:43 UTC, Joe Julian

Description Joe Julian 2011-07-18 16:12:27 UTC
After upgrading to 3.1.5, "sed -i" scripts still create gfid mismatch errors.

The directory bridge/ is rm -rf'd and mkdir'd before create_tables.sql is created. The file is created with a stdout redirect, then processed with 'sed -i'.
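The pattern above can be sketched as follows (paths illustrative; run from the GlusterFS mount). GNU `sed -i` edits by writing a temporary file and renaming it over the original, so the file gets a new inode on every edit, and it is that rename that triggers the gfid mismatch:

```shell
#!/bin/sh
# Recreate the workflow from the description: bridge/ is removed and
# re-created, create_tables.sql is written via a stdout redirect, then
# edited in place. `sed -i` writes a temp file and renames it over the
# original, so the inode changes.
set -e
rm -rf bridge
mkdir bridge
echo 'CREATE TABLE t (id INT);' > bridge/create_tables.sql
before=$(stat -c %i bridge/create_tables.sql)
sed -i 's/INT/INTEGER/' bridge/create_tables.sql
after=$(stat -c %i bridge/create_tables.sql)
echo "inode before: $before, after: $after"
```

Running this in a loop against the mount reproduces the rename-heavy workload described in this report.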

Volume Name: share1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: ewcs2:/var/spool/glusterfs/a_share1
Brick2: ewcs4:/var/spool/glusterfs/a_share1
Brick3: ewcs7:/var/spool/glusterfs/a_share1
Brick4: ewcs2:/var/spool/glusterfs/b_share1
Brick5: ewcs4:/var/spool/glusterfs/b_share1
Brick6: ewcs7:/var/spool/glusterfs/b_share1
Brick7: ewcs2:/var/spool/glusterfs/c_share1
Brick8: ewcs4:/var/spool/glusterfs/c_share1
Brick9: ewcs7:/var/spool/glusterfs/c_share1
Brick10: ewcs2:/var/spool/glusterfs/d_share1
Brick11: ewcs4:/var/spool/glusterfs/d_share1
Brick12: ewcs7:/var/spool/glusterfs/d_share1

[2011-07-18 03:35:01.455389] W [dht-common.c:657:dht_lookup_linkfile_cbk] 0-share1-dht: /bridge/create_tables.sql: gfid different on data file on share1-replicate-1
[2011-07-18 03:35:01.489529] E [afr-lk-common.c:569:afr_unlock_inodelk_cbk] 0-share1-replicate-3: /bridge/create_tables.sql: unlock failed No such file or directory
[2011-07-18 03:35:01.489666] E [afr-lk-common.c:569:afr_unlock_inodelk_cbk] 0-share1-replicate-3: /bridge/create_tables.sql: unlock failed No such file or directory
[2011-07-18 03:35:01.489713] E [afr-lk-common.c:569:afr_unlock_inodelk_cbk] 0-share1-replicate-3: /bridge/create_tables.sql: unlock failed No such file or directory
[2011-07-18 03:35:01.688496] W [fuse-bridge.c:582:fuse_fd_cbk] 0-glusterfs-fuse: 10781: OPEN() /bridge/create_tables.sql => -1 (Invalid argument)
[2011-07-18 03:35:01.689867] W [fuse-bridge.c:582:fuse_fd_cbk] 0-glusterfs-fuse: 10782: OPEN() /bridge/create_tables.sql => -1 (Invalid argument)
[2011-07-18 03:35:13.652498] W [dht-common.c:657:dht_lookup_linkfile_cbk] 0-share1-dht: /bridge/create_tables.sql: gfid different on data file on share1-replicate-1
[2011-07-18 04:05:22.893785] E [rpc-clnt.c:199:call_bail] 0-share1-client-9: bailing out frame type(GlusterFS 3.1) op(INODELK(29)) xid = 0x6297x sent = 2011-07-18 03:35:13.679729. timeout = 1800
[2011-07-18 04:35:23.619224] E [rpc-clnt.c:199:call_bail] 0-share1-client-10: bailing out frame type(GlusterFS 3.1) op(INODELK(29)) xid = 0x6188x sent = 2011-07-18 04:05:22.893927. timeout = 1800
[2011-07-18 05:05:24.459560] E [rpc-clnt.c:199:call_bail] 0-share1-client-11: bailing out frame type(GlusterFS 3.1) op(INODELK(29)) xid = 0x5987x sent = 2011-07-18 04:35:23.632024. timeout = 1800
[2011-07-18 05:05:24.463618] W [fuse-bridge.c:582:fuse_fd_cbk] 0-glusterfs-fuse: 32281: OPEN() /bridge/create_tables.sql => -1 (Invalid argument)
[2011-07-18 07:26:51.743494] W [fuse-bridge.c:905:fuse_unlink_cbk] 0-glusterfs-fuse: 87650: UNLINK() /bridge/create_tables.sql => -1 (Invalid argument)
[2011-07-18 07:26:54.714613] W [fuse-bridge.c:905:fuse_unlink_cbk] 0-glusterfs-fuse: 87748: UNLINK() /bridge/create_tables.sql => -1 (Invalid argument)
[2011-07-18 08:02:51.611195] W [fuse-bridge.c:905:fuse_unlink_cbk] 0-glusterfs-fuse: 103497: UNLINK() /bridge/create_tables.sql => -1 (Invalid argument)
[2011-07-18 08:07:30.172676] I [fuse-bridge.c:3218:fuse_thread_proc] 0-fuse: unmounting /mnt/gluster/share1
[2011-07-18 08:07:30.184897] I [glusterfsd.c:712:cleanup_and_exit] 0-glusterfsd: shutting down

Comment 1 Joe Julian 2011-07-18 22:38:51 UTC
After this happens the first time, if it happens a second time the client will hang in readv() at fuse-bridge.c:3124.

Comment 2 Joe Julian 2011-07-18 22:40:44 UTC
#0  0x00007f96bbb7b265 in sigwait () from /lib64/libpthread.so.0
#1  0x0000000000403c6b in glusterfs_sigwaiter ()
#2  0x00007f96bbb737e1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f96bb8ce52d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f96b934a710 (LWP 27076)):
#0  0x00007f96bb893afd in nanosleep () from /lib64/libc.so.6
#1  0x00007f96bb8c78f4 in usleep () from /lib64/libc.so.6
#2  0x00007f96bc3de31d in gf_timer_proc (ctx=0xd73010) at timer.c:182
#3  0x00007f96bbb737e1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f96bb8ce52d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f96b4fcb710 (LWP 27077)):
#0  0x00007f96bb8c6667 in readv () from /lib64/libc.so.6
#1  0x00007f96ba7db765 in fuse_thread_proc (data=<value optimized out>) at fuse-bridge.c:3124
#2  0x00007f96bbb737e1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f96bb8ce52d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f96bc834700 (LWP 27072)):
#0  0x00007f96bb8ceb23 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f96bc3eedf7 in event_dispatch_epoll (event_pool=0xd73350) at event.c:859
#2  0x00000000004049fb in main ()

Comment 3 Joe Julian 2011-07-20 11:47:55 UTC
Created attachment 563 (log with debug/trace)

Comment 4 Joe Julian 2011-07-20 12:42:20 UTC
Created attachment 564 (files and script that create mismatched gfids)

Comment 5 Joe Julian 2011-07-20 12:43:28 UTC
Created attachment 565 (One-time pass that created mismatched gfids)


I ran the script once and looked at the backend. The files had mismatched gfids.
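On the bricks, a mismatch shows up as differing trusted.gfid extended attributes on the replicas' copies of the file. A hypothetical helper for checking the local bricks of this volume (paths taken from the volume info above; trusted.* xattrs are readable only as root, and on a multi-node setup each host would check its own bricks):

```shell
#!/bin/sh
# Print trusted.gfid for each local brick copy of the file so a
# mismatch is visible at a glance. Bricks that do not exist on this
# host are skipped.
f=bridge/create_tables.sql
checked=0
for sub in a b c d; do
    p=/var/spool/glusterfs/${sub}_share1/$f
    [ -e "$p" ] || continue
    checked=$((checked + 1))
    printf '%s: ' "$p"
    getfattr --absolute-names --only-values -n trusted.gfid -e hex "$p" \
        || printf '(no trusted.gfid readable)'
    echo
done
echo "bricks checked: $checked"
```

If the hex values printed for two replicas of the same file differ, that file has the mismatch described in this report.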

Comment 6 Joe Julian 2011-07-21 20:56:02 UTC

*** This bug has been marked as a duplicate of bug 2464 ***

