+++ This bug was initially created as a clone of Bug #1329503 +++
+++ This bug was initially created as a clone of Bug #1326248 +++

Copy-pasting the description and RCA for public use.

Description of problem:
On an NFS mount, when large files are being written and a detach tier operation is started, an input/output error is seen.

[root@dhcp46-9 mnt]# while true; do for i in {1..5};do dd if=/dev/urandom of=file$i bs=1024 count=700000;echo $?;done; echo 'end of cycle'; done
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 73.3324 s, 9.8 MB/s
0
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 71.0725 s, 10.1 MB/s
0
dd: error writing ‘file3’: Input/output error
600027+0 records in
600026+0 records out
614426624 bytes (614 MB) copied, 70.7233 s, 8.7 MB/s
1
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 75.3172 s, 9.5 MB/s
0
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 73.2562 s, 9.8 MB/s
0
end of cycle

[2016-04-12 01:43:39.423991] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]
[2016-04-12 01:43:39.424838] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]
[2016-04-12 01:43:39.425705] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]
[2016-04-12 01:43:39.429049] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]
[2016-04-12 01:43:39.430226] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]

[root@dhcp47-105 ~]# gluster v info

Volume Name: testvol
Type: Tier
Volume ID: 02427025-adcf-48a2-ac58-ae494839e9f8
Status: Started
Number of Bricks: 12
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.94:/bricks/brick3/leg1
Brick2: 10.70.47.9:/bricks/brick3/leg1
Brick3: 10.70.47.105:/bricks/brick3/leg1
Brick4: 10.70.47.90:/bricks/brick3/leg1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick5: 10.70.47.90:/bricks/brick0/ct
Brick6: 10.70.47.105:/bricks/brick0/ct
Brick7: 10.70.47.9:/bricks/brick0/ct
Brick8: 10.70.46.94:/bricks/brick0/ct
Brick9: 10.70.47.90:/bricks/brick1/ct
Brick10: 10.70.47.105:/bricks/brick1/ct
Brick11: 10.70.47.9:/bricks/brick1/ct
Brick12: 10.70.46.94:/bricks/brick1/ct
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-1.el7rhgs.x86_64

How reproducible:
2/3

Steps to Reproduce:
1) Create a dist-rep volume and start it, then enable quota.
2) NFS-mount the volume and use dd to create, say, 5 files of at least 700MB each:
   for i in {1..5};do dd if=/dev/urandom of=file$i bs=1024 count=700000;echo $?;done
3) While dd is in progress, perform an attach tier operation.
4) After attach tier is successful, perform detach tier start --> this is when dd throws the IO error.

Actual results:
IO error is seen.

Expected results:
No IO error should be seen during the detach tier operation.

Additional info:

--- Additional comment from Mohammed Rafi KC on 2016-04-21 10:40:23 EDT ---

RCA: NFS uses an anonymous fd when writing to a file.
If the file has been moved away from the cached subvol, the write or lock from afr fails with ENOENT. When the write fails, the dht layer first runs its migration complete check, which does a lookup on the previous source subvol. Since the file has already moved from there, this lookup fails, and the readable flag is set to 0 for all subvolumes in afr. At this point the tier still has the old source as its cached_subvol, so any subsequent request is again sent to the same subvolume, which causes afr to throw an EIO error. The tier layer updates cached_subvol only after it completes its own "migration complete check". So the race window lies between the migration complete check in the dht layer and the one in the tier layer.
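
To make the ordering in the RCA easier to follow, here is a small toy model of the sequence in Python. It is only a sketch and is not GlusterFS code: the Subvol class, the unreadable set (standing in for afr's readable flag) and the simulate() function are hypothetical names made up for illustration, and the real logic lives in the C translators.

```python
import errno


class Subvol:
    """Hypothetical stand-in for one replicate subvolume (not a GlusterFS API)."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        # Paths whose readable flag has been cleared (models readable = 0).
        self.unreadable = set()

    def lookup(self, path):
        if path in self.files:
            return 0
        # A failed lookup clears the readable flag for this path.
        self.unreadable.add(path)
        return errno.ENOENT

    def write(self, path):
        if path in self.unreadable:
            # No readable copy: reported as split-brain, i.e. EIO.
            return errno.EIO
        if path not in self.files:
            return errno.ENOENT
        return 0


def label(code):
    """Turn an errno value into its symbolic name, or "OK" for 0."""
    return "OK" if code == 0 else errno.errorcode[code]


def simulate(path="file3"):
    src = Subvol("source", [path])
    dst = Subvol("destination")
    cached_subvol = src  # what the tier layer currently believes

    # Migration moves the file off the old source subvolume.
    src.files.discard(path)
    dst.files.add(path)

    results = []
    # 1. Anonymous-fd write is sent to the stale cached subvol.
    results.append(label(cached_subvol.write(path)))
    # 2. Migration complete check looks up the old source and fails,
    #    which clears the readable flag there.
    results.append(label(src.lookup(path)))
    # 3. Race window: cached_subvol is still the old source.
    results.append(label(cached_subvol.write(path)))
    # 4. The tier layer finally updates cached_subvol; writes work again.
    cached_subvol = dst
    results.append(label(cached_subvol.write(path)))
    return results


if __name__ == "__main__":
    print(simulate())
```

In this model, step 3 is the write that surfaces as the dd failure: it returns EIO only because it arrives after the failed lookup and before cached_subvol is updated.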
Cloned this bug to track the fix for afr code.
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life. Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.