Bug 1326248 - [tiering]: during detach tier operation, Input/output error is seen with new file writes on NFS mount
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Mohammed Rafi KC
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1311817 1329503 1329505 1330428
 
Reported: 2016-04-12 09:04 UTC by krishnaram Karthick
Modified: 2016-09-17 15:43 UTC (History)
CC List: 6 users

Fixed In Version: glusterfs-3.7.9-3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned to: 1329503 (view as bug list)
Environment:
Last Closed: 2016-06-23 05:17:05 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2016:1240 (normal, SHIPPED_LIVE): Red Hat Gluster Storage 3.1 Update 3, last updated 2016-06-23 08:51:28 UTC

Description krishnaram Karthick 2016-04-12 09:04:56 UTC
Description of problem:
On an NFS mount, when large files are being written and a detach tier operation is started, an input/output error is seen.

[root@dhcp46-9 mnt]# while true; do for i in {1..5};do dd if=/dev/urandom of=file$i bs=1024 count=700000;echo $?;done; echo 'end of cycle'; done
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 73.3324 s, 9.8 MB/s
0
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 71.0725 s, 10.1 MB/s
0
dd: error writing ‘file3’: Input/output error
600027+0 records in
600026+0 records out
614426624 bytes (614 MB) copied, 70.7233 s, 8.7 MB/s
1
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 75.3172 s, 9.5 MB/s
0
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 73.2562 s, 9.8 MB/s
0
end of cycle

[2016-04-12 01:43:39.423991] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]
[2016-04-12 01:43:39.424838] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]
[2016-04-12 01:43:39.425705] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]
[2016-04-12 01:43:39.429049] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]
[2016-04-12 01:43:39.430226] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed. [Input/output error]

[root@dhcp47-105 ~]# gluster v info
 
Volume Name: testvol
Type: Tier
Volume ID: 02427025-adcf-48a2-ac58-ae494839e9f8
Status: Started
Number of Bricks: 12
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.94:/bricks/brick3/leg1
Brick2: 10.70.47.9:/bricks/brick3/leg1
Brick3: 10.70.47.105:/bricks/brick3/leg1
Brick4: 10.70.47.90:/bricks/brick3/leg1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick5: 10.70.47.90:/bricks/brick0/ct
Brick6: 10.70.47.105:/bricks/brick0/ct
Brick7: 10.70.47.9:/bricks/brick0/ct
Brick8: 10.70.46.94:/bricks/brick0/ct
Brick9: 10.70.47.90:/bricks/brick1/ct
Brick10: 10.70.47.105:/bricks/brick1/ct
Brick11: 10.70.47.9:/bricks/brick1/ct
Brick12: 10.70.46.94:/bricks/brick1/ct
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-1.el7rhgs.x86_64

How reproducible:
2/3 

Steps to Reproduce:
1) Create a distribute-replicate volume, start it, and enable quota on it.
2) NFS-mount the volume and use dd to create, say, 5 files of at least 700 MB each: "for i in {1..5};do dd if=/dev/urandom of=file$i bs=1024 count=700000;echo $?;done"
3) While dd is in progress, perform an attach tier operation.
4) After attach tier succeeds, perform detach tier start --> this is when dd throws the I/O error (see the sketch below).
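
A minimal sketch of this reproduction flow, assuming four servers named server1..server4 and brick paths modelled on the volume info above (both hypothetical); attach/detach-tier command syntax varies slightly across 3.7.x releases, with "gluster volume tier <vol> attach|detach" being the newer form:

#!/bin/bash
# Reproducer sketch -- host names and brick paths are placeholders.
VOL=testvol
MNT=/mnt/nfs

# 1) Create and start a 4x2 distribute-replicate volume, then enable quota.
gluster volume create $VOL replica 2 \
    server{1..4}:/bricks/brick0/ct server{1..4}:/bricks/brick1/ct
gluster volume start $VOL
gluster volume quota $VOL enable

# 2) NFS-mount the volume (gluster NFS serves NFSv3) and start the large writes.
mkdir -p $MNT
mount -t nfs -o vers=3 server1:/$VOL $MNT
(cd $MNT && for i in {1..5}; do
    dd if=/dev/urandom of=file$i bs=1024 count=700000
    echo "dd exit status: $?"
done) &

# 3) While dd is in progress, attach a 2x2 hot tier.
gluster volume attach-tier $VOL replica 2 server{1..4}:/bricks/brick3/leg1

# 4) Once the attach succeeds, start the detach -- this is where dd hit EIO.
gluster volume detach-tier $VOL start
wait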


Actual results:
An I/O error is seen.

Expected results:
No I/O error should be seen during the detach tier operation.

Additional info:
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1319634/

Comment 2 Mohammed Rafi KC 2016-04-19 07:04:01 UTC
I was able to reproduce this issue on my setup with the latest upstream. We are debugging it and will hopefully have an RCA to share soon.

Comment 3 Mohammed Rafi KC 2016-04-21 14:40:23 UTC
RCA:

NFS uses an anonymous fd when writing to a file. If the file has already been migrated off the cached subvolume, the write or lock issued through AFR fails with ENOENT. When the write fails, DHT first runs its migration-complete check, which does a lookup on the previous source subvolume. Since the file has moved away, this lookup fails, and AFR sets the readable flag to 0 for all subvolumes. At this point the tier translator still records the old source as the cached subvolume, so subsequent requests are sent to that same subvolume again, which causes AFR to return EIO.

The tier layer updates cached_subvol only after it completes its own migration-complete check, so the race window lies between the migration-complete check in the DHT layer and the one in the tier layer.
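
In other words, the EIO is reported by AFR as a split-brain even though no genuine split-brain exists; the file has simply finished migrating. A quick way to confirm that on a live setup is sketched below, assuming the cold-tier brick paths from the volume info above and that file3 is the file that hit EIO (run the getfattr commands on each server; the file exists on only one replica pair):

# Should list no genuinely split-brained entries.
gluster volume heal testvol info split-brain

# Inspect the AFR pending-changelog xattrs (trusted.afr.<vol>-client-N) for
# the file on its replica bricks; all-zero pending values mean there is no
# real split-brain.
getfattr -d -m . -e hex /bricks/brick0/ct/file3
getfattr -d -m . -e hex /bricks/brick1/ct/file3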

Comment 4 Mohammed Rafi KC 2016-04-22 06:47:56 UTC
Upstream patch: http://review.gluster.org/14049

Comment 8 krishnaram Karthick 2016-05-17 13:58:20 UTC
Ran the test mentioned in the steps to reproduce for 5 iterations; the issue reported in this bug is no longer seen on build glusterfs-3.7.9-5.

Moving the bug to verified.
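
For reference, a sketch of the kind of monitoring used alongside the reproducer above, assuming gluster NFS logs to the default /var/log/glusterfs/nfs.log path (detach-tier status syntax may differ slightly between 3.7.x builds):

# Watch detach progress while the dd loop keeps writing.
watch -n 10 'gluster volume detach-tier testvol status'

# Every dd exit status printed by the loop should be 0, and no new
# "Input/output error" messages should appear in the NFS server log.
grep -c "Input/output error" /var/log/glusterfs/nfs.log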

Comment 10 errata-xmlrpc 2016-06-23 05:17:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

