Bug 786419 - [c3aa99d907591f72b6302287b9b8899514fb52f1]: Missing gfids for some files & hence IO
Keywords:
Status: CLOSED DUPLICATE of bug 797167
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-01 11:52 UTC by Rahul C S
Modified: 2012-03-01 12:30 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-01 12:30:36 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
missing gfid logs (11.86 MB, application/x-bzip)
2012-02-01 11:52 UTC, Rahul C S

Description Rahul C S 2012-02-01 11:52:50 UTC
Created attachment 558801
missing gfid logs

Description of problem:
Created a striped-replicate volume:
Volume Name: vol
Type: Striped-Replicate
Volume ID: ba41b542-bdbd-4691-989b-6103301135fa
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: dagobah:/data/export1
Brick2: dagobah:/data/export2
Brick3: dagobah:/data/export3
Brick4: dagobah:/data/export4
Options Reconfigured:
performance.write-behind: off
cluster.stripe-block-size: 32KB
performance.stat-prefetch: off

Mounted the volume on 2 clients, one with acl and one without.

Untarred openssl from both clients in parallel, then ran rm -rf on the directory from both clients in parallel.

rm failed, reporting that some directories were not empty, which is expected here.

Then killed a brick and untarred openssl again from the client mounted with acl.
I got the errors below:
openssl-1.0.0g/crypto/aes/aes.h
openssl-1.0.0g/crypto/aes/aes_ige.c
openssl-1.0.0g/crypto/aes/aes_locl.h
openssl-1.0.0g/crypto/aes/aes_misc.c
openssl-1.0.0g/crypto/aes/aes_ofb.c
tar: openssl-1.0.0g/crypto/aes/aes_ofb.c: Cannot open: No data available
openssl-1.0.0g/crypto/aes/aes_wrap.c
tar: openssl-1.0.0g/crypto/aes/aes_wrap.c: Cannot open: Input/output error
openssl-1.0.0g/crypto/aes/aes_x86core.c
tar: openssl-1.0.0g/crypto/aes/aes_x86core.c: Cannot open: File exists
openssl-1.0.0g/crypto/aes/asm/
openssl-1.0.0g/crypto/aes/asm/aes-586.pl
openssl-1.0.0g/crypto/aes/asm/aes-armv4.pl
openssl-1.0.0g/crypto/aes/asm/aes-ia64.S
openssl-1.0.0g/crypto/aes/asm/aes-ppc.pl
openssl-1.0.0g/crypto/aes/asm/aes-s390x.pl
tar: openssl-1.0.0g/crypto/aes/asm/aes-s390x.pl: Cannot open: No data available
openssl-1.0.0g/crypto/aes/asm/aes-sparcv9.pl
openssl-1.0.0g/crypto/aes/asm/aes-x86_64.pl
openssl-1.0.0g/crypto/aes/Makefile
openssl-1.0.0g/crypto/aes/README
...
Then ran rm -rf openssl-1.0.0g/ again:
rm: cannot remove `openssl-1.0.0g/apps/cms.c': Input/output error
rm: cannot remove `openssl-1.0.0g/bugs/MS': No data available
rm: cannot remove `openssl-1.0.0g/bugs/stream.c': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/aes/asm/aes-s390x.pl': No data available
rm: cannot remove `openssl-1.0.0g/crypto/aes/aes_ofb.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/tasn_prn.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/t_crl.c': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/asn1/t_req.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_attrib.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_crl.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_info.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_req.c': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_val.c': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/asn1/a_time.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/charmap.h': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/asn1/bio_asn1.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/f_string.c': Input/output error
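
The failures above fall into three message types. As a rough triage sketch (the helper `tally_failures` and the message-to-errno table are illustrative assumptions, not part of GlusterFS), the tar and rm output can be tallied by error to see which failure mode dominates:

```python
import re
from collections import Counter

# Error text as printed by tar/rm, mapped to the usual Linux errno name.
# This table is an assumption for illustration; unknown messages are kept verbatim.
ERRNO_NAMES = {
    "No data available": "ENODATA",
    "Input/output error": "EIO",
    "File exists": "EEXIST",
}

# Matches "rm: cannot remove `PATH': MESSAGE" and "tar: PATH: Cannot open: MESSAGE".
FAILURE_RE = re.compile(
    r"^(?:rm: cannot remove [`'](?P<rm_path>.+)': "
    r"|tar: (?P<tar_path>.+): Cannot open: )(?P<msg>.+)$"
)


def tally_failures(lines):
    """Count failed paths per error, ignoring lines that are not failures."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            msg = match.group("msg")
            counts[ERRNO_NAMES.get(msg, msg)] += 1
    return counts


sample = [
    "rm: cannot remove `openssl-1.0.0g/apps/cms.c': Input/output error",
    "rm: cannot remove `openssl-1.0.0g/bugs/MS': No data available",
    "tar: openssl-1.0.0g/crypto/aes/aes_x86core.c: Cannot open: File exists",
    "openssl-1.0.0g/crypto/aes/README",  # plain tar listing line, not a failure
]
print(tally_failures(sample))
```

Feeding the saved tar or rm output to `tally_failures` line by line gives a per-error count for the whole run.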

Client log:
[2012-02-01 13:05:30.045703] I [afr-common.c:1288:afr_launch_self_heal] 0-vol-replicate-1: background  gfid self-heal triggered. path: /openssl-1.0.0g/crypto/asn1/a_time.c, reason: lookup detected pending operations
[2012-02-01 13:05:30.046053] E [afr-common.c:1713:afr_lookup_done] 0-vol-replicate-0: Failing lookup for /openssl-1.0.0g/crypto/asn1/a_time.c, LOOKUP on a file without gfid is not allowed when some of the children are down
[2012-02-01 13:05:30.046958] E [afr-self-heal-common.c:1295:afr_sh_common_lookup_cbk] 0-vol-replicate-1: Missing Gfids for /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.047194] I [afr-self-heal-common.c:908:afr_sh_missing_entries_done] 0-vol-replicate-1: split brain found, aborting selfheal of /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.047223] E [afr-self-heal-common.c:2019:afr_self_heal_completion_cbk] 0-vol-replicate-1: background  gfid self-heal failed on /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.047296] W [fuse-bridge.c:269:fuse_entry_cbk] 0-glusterfs-fuse: 56238: LOOKUP() /openssl-1.0.0g/crypto/asn1/a_time.c => -1 (No data available)
[2012-02-01 13:05:30.048475] I [afr-common.c:1288:afr_launch_self_heal] 0-vol-replicate-1: background  gfid self-heal triggered. path: /openssl-1.0.0g/crypto/asn1/a_time.c, reason: lookup detected pending operations
[2012-02-01 13:05:30.050326] E [afr-self-heal-common.c:1295:afr_sh_common_lookup_cbk] 0-vol-replicate-1: Missing Gfids for /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.050594] E [afr-common.c:1713:afr_lookup_done] 0-vol-replicate-0: Failing lookup for /openssl-1.0.0g/crypto/asn1/a_time.c, LOOKUP on a file without gfid is not allowed when some of the children are down
[2012-02-01 13:05:30.050747] I [afr-self-heal-common.c:908:afr_sh_missing_entries_done] 0-vol-replicate-1: split brain found, aborting selfheal of /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.050775] E [afr-self-heal-common.c:2019:afr_self_heal_completion_cbk] 0-vol-replicate-1: background  gfid self-heal failed on /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.050807] W [fuse-bridge.c:269:fuse_entry_cbk] 0-glusterfs-fuse: 56240: LOOKUP() /openssl-1.0.0g/crypto/asn1/a_time.c => -1 (No data available)


root@Dagobah:/data/export1# getfattr -d -m . openssl-1.0.0g/crypto/asn1/a_time.c
# file: openssl-1.0.0g/crypto/asn1/a_time.c
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAA==

root@Dagobah:/data/export1# getfattr -d -m . ../export2/openssl-1.0.0g/crypto/asn1/a_time.c
# file: ../export2/openssl-1.0.0g/crypto/asn1/a_time.c
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAA==

root@Dagobah:/data/export1# getfattr -d -m . ../export3/openssl-1.0.0g/crypto/asn1/a_time.c
# file: ../export3/openssl-1.0.0g/crypto/asn1/a_time.c
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAA==

root@Dagobah:/data/export1# getfattr -d -m . ../export4/openssl-1.0.0g/crypto/asn1/a_time.c
# file: ../export4/openssl-1.0.0g/crypto/asn1/a_time.c
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAA==
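
The `0s` prefix in getfattr output marks a base64-encoded value. As a minimal sketch (the helper name `decode_gfid` is hypothetical and not part of GlusterFS), decoding the value above shows that trusted.gfid on every brick is 16 zero bytes, i.e. the xattr exists but holds a null gfid:

```python
import base64
import uuid


def decode_gfid(xattr_value: str) -> uuid.UUID:
    """Decode a base64 xattr value as printed by getfattr (with '0s' prefix)."""
    if not xattr_value.startswith("0s"):
        raise ValueError("expected a base64 value with the '0s' prefix")
    raw = base64.b64decode(xattr_value[2:])
    if len(raw) != 16:
        raise ValueError(f"a gfid is 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)


# Value reported for a_time.c on all four bricks.
gfid = decode_gfid("0sAAAAAAAAAAAAAAAAAAAAAA==")
print(gfid)           # 00000000-0000-0000-0000-000000000000
print(gfid.int == 0)  # True: a null gfid
```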


Attaching log files.

Comment 1 Pranith Kumar K 2012-03-01 12:30:36 UTC
The bug is caused by a race between create and lookup. Shishir is working on it.

*** This bug has been marked as a duplicate of bug 797167 ***

