Created attachment 558801
missing gfid logs

Description of problem:

Created a striped-replicate volume:

Volume Name: vol
Type: Striped-Replicate
Volume ID: ba41b542-bdbd-4691-989b-6103301135fa
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: dagobah:/data/export1
Brick2: dagobah:/data/export2
Brick3: dagobah:/data/export3
Brick4: dagobah:/data/export4
Options Reconfigured:
performance.write-behind: off
cluster.stripe-block-size: 32KB
performance.stat-prefetch: off

Mounted 2 clients, one with ACL and one without. Untarred openssl from both clients in parallel, then ran rm -rf on the directory from both clients in parallel. rm failed, reporting that some directories were not empty, which is expected.

Then killed a brick and untarred openssl from the ACL client, which produced the errors below:

openssl-1.0.0g/crypto/aes/aes.h
openssl-1.0.0g/crypto/aes/aes_ige.c
openssl-1.0.0g/crypto/aes/aes_locl.h
openssl-1.0.0g/crypto/aes/aes_misc.c
openssl-1.0.0g/crypto/aes/aes_ofb.c
tar: openssl-1.0.0g/crypto/aes/aes_ofb.c: Cannot open: No data available
openssl-1.0.0g/crypto/aes/aes_wrap.c
tar: openssl-1.0.0g/crypto/aes/aes_wrap.c: Cannot open: Input/output error
openssl-1.0.0g/crypto/aes/aes_x86core.c
tar: openssl-1.0.0g/crypto/aes/aes_x86core.c: Cannot open: File exists
openssl-1.0.0g/crypto/aes/asm/
openssl-1.0.0g/crypto/aes/asm/aes-586.pl
openssl-1.0.0g/crypto/aes/asm/aes-armv4.pl
openssl-1.0.0g/crypto/aes/asm/aes-ia64.S
openssl-1.0.0g/crypto/aes/asm/aes-ppc.pl
openssl-1.0.0g/crypto/aes/asm/aes-s390x.pl
tar: openssl-1.0.0g/crypto/aes/asm/aes-s390x.pl: Cannot open: No data available
openssl-1.0.0g/crypto/aes/asm/aes-sparcv9.pl
openssl-1.0.0g/crypto/aes/asm/aes-x86_64.pl
openssl-1.0.0g/crypto/aes/Makefile
openssl-1.0.0g/crypto/aes/README
...

Then ran rm -rf openssl-1.0.0g/ again:

rm: cannot remove `openssl-1.0.0g/apps/cms.c': Input/output error
rm: cannot remove `openssl-1.0.0g/bugs/MS': No data available
rm: cannot remove `openssl-1.0.0g/bugs/stream.c': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/aes/asm/aes-s390x.pl': No data available
rm: cannot remove `openssl-1.0.0g/crypto/aes/aes_ofb.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/tasn_prn.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/t_crl.c': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/asn1/t_req.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_attrib.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_crl.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_info.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_req.c': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/asn1/x_val.c': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/asn1/a_time.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/charmap.h': Input/output error
rm: cannot remove `openssl-1.0.0g/crypto/asn1/bio_asn1.c': No data available
rm: cannot remove `openssl-1.0.0g/crypto/asn1/f_string.c': Input/output error

Client log:

[2012-02-01 13:05:30.045703] I [afr-common.c:1288:afr_launch_self_heal] 0-vol-replicate-1: background gfid self-heal triggered. path: /openssl-1.0.0g/crypto/asn1/a_time.c, reason: lookup detected pending operations
[2012-02-01 13:05:30.046053] E [afr-common.c:1713:afr_lookup_done] 0-vol-replicate-0: Failing lookup for /openssl-1.0.0g/crypto/asn1/a_time.c, LOOKUP on a file without gfid is not allowed when some of the children are down
[2012-02-01 13:05:30.046958] E [afr-self-heal-common.c:1295:afr_sh_common_lookup_cbk] 0-vol-replicate-1: Missing Gfids for /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.047194] I [afr-self-heal-common.c:908:afr_sh_missing_entries_done] 0-vol-replicate-1: split brain found, aborting selfheal of /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.047223] E [afr-self-heal-common.c:2019:afr_self_heal_completion_cbk] 0-vol-replicate-1: background gfid self-heal failed on /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.047296] W [fuse-bridge.c:269:fuse_entry_cbk] 0-glusterfs-fuse: 56238: LOOKUP() /openssl-1.0.0g/crypto/asn1/a_time.c => -1 (No data available)
[2012-02-01 13:05:30.048475] I [afr-common.c:1288:afr_launch_self_heal] 0-vol-replicate-1: background gfid self-heal triggered. path: /openssl-1.0.0g/crypto/asn1/a_time.c, reason: lookup detected pending operations
[2012-02-01 13:05:30.050326] E [afr-self-heal-common.c:1295:afr_sh_common_lookup_cbk] 0-vol-replicate-1: Missing Gfids for /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.050594] E [afr-common.c:1713:afr_lookup_done] 0-vol-replicate-0: Failing lookup for /openssl-1.0.0g/crypto/asn1/a_time.c, LOOKUP on a file without gfid is not allowed when some of the children are down
[2012-02-01 13:05:30.050747] I [afr-self-heal-common.c:908:afr_sh_missing_entries_done] 0-vol-replicate-1: split brain found, aborting selfheal of /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.050775] E [afr-self-heal-common.c:2019:afr_self_heal_completion_cbk] 0-vol-replicate-1: background gfid self-heal failed on /openssl-1.0.0g/crypto/asn1/a_time.c
[2012-02-01 13:05:30.050807] W [fuse-bridge.c:269:fuse_entry_cbk] 0-glusterfs-fuse: 56240: LOOKUP() /openssl-1.0.0g/crypto/asn1/a_time.c => -1 (No data available)

root@Dagobah:/data/export1# getfattr -d -m . openssl-1.0.0g/crypto/asn1/a_time.c
# file: openssl-1.0.0g/crypto/asn1/a_time.c
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAA==

root@Dagobah:/data/export1# getfattr -d -m . ../export2/openssl-1.0.0g/crypto/asn1/a_time.c
# file: ../export2/openssl-1.0.0g/crypto/asn1/a_time.c
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAA==

root@Dagobah:/data/export1# getfattr -d -m . ../export3/openssl-1.0.0g/crypto/asn1/a_time.c
# file: ../export3/openssl-1.0.0g/crypto/asn1/a_time.c
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAA==

root@Dagobah:/data/export1# getfattr -d -m . ../export4/openssl-1.0.0g/crypto/asn1/a_time.c
# file: ../export4/openssl-1.0.0g/crypto/asn1/a_time.c
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAA==

Attaching log files.

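The trusted.gfid values reported by getfattr can be decoded to check whether a real GFID was ever assigned. The following is a minimal Python sketch, assuming getfattr's text encodings (a "0s" prefix for base64, "0x" for hex) and a 16-byte GFID; the helper name decode_gfid_xattr is hypothetical and not part of GlusterFS:

```python
import base64
import uuid


def decode_gfid_xattr(value: str) -> uuid.UUID:
    """Decode a trusted.gfid value, as printed by getfattr, into a UUID.

    Assumption: the value uses getfattr's text encodings, i.e. a "0s"
    prefix for base64 or a "0x" prefix for hex.
    """
    if value.startswith("0s"):
        raw = base64.b64decode(value[2:])
    elif value.startswith(("0x", "0X")):
        raw = bytes.fromhex(value[2:])
    else:
        raise ValueError(f"unrecognized xattr encoding: {value!r}")
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)


# Value copied from the getfattr output in this report.
gfid = decode_gfid_xattr("0sAAAAAAAAAAAAAAAAAAAAAA==")
print(gfid)
print("null gfid" if gfid.int == 0 else "gfid is set")
```

For the value shown on all four bricks, the decoded GFID is all zeros: the xattr is present but holds a null GFID, which is consistent with the "Missing Gfids" messages in the client log.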
The bug is caused by a race between create and lookup. Shishir is working on it.

*** This bug has been marked as a duplicate of bug 797167 ***