Created attachment 1576564
Script to continously append to a file

Description of problem:
When 2 bricks in an arbiter volume are brought down repeatedly, the extended attributes (xattrs) of the two data bricks blame each other and the arbiter becomes the source of heal. A similar issue was discovered earlier in BZ#1401969, which was fixed in 3.4.0.

Version-Release number of selected component (if applicable):
Discovered while testing a hotfix build.
# rpm -qa | grep gluster
python2-gluster-3.12.2-40.el7rhgs.1.HOTFIX.sfdc02320997.bz1708121.x86_64
glusterfs-3.12.2-40.el7rhgs.1.HOTFIX.sfdc02320997.bz1708121.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Run a script that collects all the bricks in the volume, kills 2 bricks (b0, b1) within milliseconds of each other, and brings the bricks back with a glusterd restart.
2. Then kill b1 and b2, and repeat the cycle in a loop.
3. At the same time, run the attached Perl script on the FUSE client to generate I/O. The script opens a file and writes to it in an infinite loop.

Actual results:
The file is pending heal and cannot be accessed from the mount point.
The arbiter becomes the source of heal.

# ls 1
ls: cannot access 1: Transport endpoint is not connected
# stat 1
stat: cannot stat ‘1’: Transport endpoint is not connected

# gluster v heal master2vol-2 info
Brick 10.70.36.49:/bricks/brick1/master1vol-2
<gfid:90959a41-63dc-4fe0-b6d9-f1223b1ab40f>
Status: Connected
Number of entries: 1

Brick 10.70.36.62:/bricks/brick3/master1vol-2repl
<gfid:90959a41-63dc-4fe0-b6d9-f1223b1ab40f>
Status: Connected
Number of entries: 1

Brick 10.70.36.56:/bricks/brick1/master1vol-2
<gfid:90959a41-63dc-4fe0-b6d9-f1223b1ab40f>
Status: Connected
Number of entries: 1

Expected results:
The arbiter brick should not become the source of heal, and all files should heal.

Additional info:
================
# gluster v info master2vol-2

Volume Name: master2vol-2
Type: Replicate
Volume ID: 0f62e637-15ae-4c64-828b-f7d83e08baf4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.49:/bricks/brick1/master1vol-2
Brick2: 10.70.36.62:/bricks/brick3/master1vol-2repl
Brick3: 10.70.36.56:/bricks/brick1/master1vol-2 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
cluster.shd-max-threads: 30
cluster.enable-shared-storage: enable

=============================================================================
The extended attributes of the file on the two data bricks blame each other (client 0 and client 1), and the dirty attribute is set.

Data-brick 1
# getfattr -m . -d -e hex /bricks/brick1/master1vol-2/replace-brick/1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/master1vol-2/replace-brick/1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0000002d0000000000000000
trusted.afr.master2vol-2-client-1=0x000000020000000000000000
trusted.gfid=0x90959a4163dc4fe0b6d9f1223b1ab40f
trusted.gfid2path.d6e66232a352f62e=0x31363365353336322d393862312d343836652d393061392d3437313437633165306662302f31
trusted.glusterfs.0f62e637-15ae-4c64-828b-f7d83e08baf4.xtime=0x5cf343a9000264bd

==
Data-brick 2
# getfattr -m . -d -e hex /bricks/brick3/master1vol-2repl/replace-brick/1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/master1vol-2repl/replace-brick/1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0000002c0000000000000000
trusted.afr.master2vol-2-client-0=0x000000010000000000000000
trusted.gfid=0x90959a4163dc4fe0b6d9f1223b1ab40f
trusted.gfid2path.d6e66232a352f62e=0x31363365353336322d393862312d343836652d393061392d3437313437633165306662302f31
trusted.glusterfs.0f62e637-15ae-4c64-828b-f7d83e08baf4.xtime=0x5cf343ae0005e2f7

==
Arbiter brick
# getfattr -m . -d -e hex /bricks/brick1/master1vol-2/replace-brick/1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/master1vol-2/replace-brick/1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0000002c0000000000000000
trusted.afr.master2vol-2-client-0=0x000000010000000000000000
trusted.afr.master2vol-2-client-1=0x000000020000000000000000
trusted.gfid=0x90959a4163dc4fe0b6d9f1223b1ab40f
trusted.gfid2path.d6e66232a352f62e=0x31363365353336322d393862312d343836652d393061392d3437313437633165306662302f31
trusted.glusterfs.0f62e637-15ae-4c64-828b-f7d83e08baf4.xtime=0x5cf343ac000e9301

=============================================================================
System details and sos-report to be provided in the following comment.
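
For readers decoding the xattrs above: each trusted.afr.* value is 12 bytes, read as three big-endian 32-bit counters for pending data, metadata and entry operations. The Python sketch below decodes the values copied from this report and lists which bricks are not blamed by any other brick. The helper names (decode_afr, unblamed_bricks) are hypothetical, the mapping of client-0, client-1 and client-2 to Brick1, Brick2 and the arbiter is an assumption based on the brick order in the volume info, and the "not blamed by anyone" rule is a simplified illustration, not the actual AFR source-selection code.

```python
import struct


def decode_afr(hex_value):
    """Split a 12-byte trusted.afr.* value into (data, metadata, entry) counters."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    return struct.unpack(">III", raw)


# Values copied from the getfattr output in this report.
# Outer key: index of the brick holding the xattr (assumed 0 = Brick1,
# 1 = Brick2, 2 = arbiter). Inner key: index N of the blamed client-N.
XATTRS = {
    0: {1: "0x000000020000000000000000"},
    1: {0: "0x000000010000000000000000"},
    2: {0: "0x000000010000000000000000", 1: "0x000000020000000000000000"},
}


def unblamed_bricks(xattrs, brick_count=3):
    """Return bricks with no pending data operations recorded against them.

    Simplified model: a brick counts as blamed if any other brick holds a
    non-zero data counter for it.
    """
    blamed = set()
    for _accuser, pending in xattrs.items():
        for accused, value in pending.items():
            data, _metadata, _entry = decode_afr(value)
            if data:
                blamed.add(accused)
    return [brick for brick in range(brick_count) if brick not in blamed]


if __name__ == "__main__":
    # trusted.afr.dirty on Data-brick 1
    print(decode_afr("0x0000002d0000000000000000"))  # (45, 0, 0)
    print(unblamed_bricks(XATTRS))  # [2]
```

Under these assumptions only index 2 (the arbiter) is left unblamed, which matches the reported symptom: both data bricks are accused and the arbiter is the only remaining candidate source.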