Bug 1264831 - Data Tiering: Loss of data writes (IO error) to an existing file when detach-tier start is issued (writes and detach appear to be mutually exclusive)
Status: CLOSED DUPLICATE of bug 1265890
Product: GlusterFS
Classification: Community
Component: tiering
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Assigned To: Nithya Balachandran
Depends On:
Blocks: 1260923
Reported: 2015-09-21 06:25 EDT by nchilaka
Modified: 2015-10-30 13:32 EDT (History)
CC: 2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2015-10-01 22:48:49 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description nchilaka 2015-09-21 06:25:04 EDT
Description of problem:
While writes are in progress to a file, if the user issues a detach-tier start, the in-flight I/Os to that file are lost.
For example, I mounted a tiered volume over FUSE and created files of size 100 MB in a loop, as below:
[root@localhost lanka]# for i in {1..100};do dd if=/dev/urandom of=file.$i bs=1024 count=100000;done

Two files had been created and the third was in progress, with about 70% of its writes (70 MB) done, when I issued a detach-tier start. The writes to that file stopped immediately while the detach-tier proceeded. After detach-tier start completed, the loop moved on and created the 4th file on the cold tier. In other words, the 3rd file was left incomplete and its remaining writes were lost.
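One way to correlate the write failure with detach progress (a hypothetical monitoring loop using the volume name from this report; the detach-tier syntax is as in glusterfs 3.7) is to poll the detach status on a server node while the dd loop is still running on the client:

```sh
# Hypothetical check, assuming the "lanka" volume from this report.
# Start the detach while the dd loop is mid-file, then poll its status:
gluster volume detach-tier lanka start
watch -n 2 gluster volume detach-tier lanka status
# If writes and detach were truly independent, dd on the client should
# keep running without "Input/output error" while the status advances.
```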

See brick logs:
[root@zod ~]# ############### detach-tier start issued while file.3 was being created ##########
[root@zod ~]# ll /rhs/brick*/lank*
total 335244
-rw-r--r--. 2 root root 102400000 Sep 21 15:26 file.1
-rw-r--r--. 2 root root 102400000 Sep 21 15:27 file.2
-rw-r--r--. 2 root root  71508992 Sep 21 15:27 file.3  (writes to this file stopped as soon as detach-tier start was issued; the files were also migrated to the cold brick, as seen here)
-rw-r--r--. 2 root root  57340928 Sep 21 15:28 file.4  (new file created after detach-tier start completed)

========== Mount-point (FUSE) error ==========
[root@localhost lanka]# for i in {1..100};do dd if=/dev/urandom of=file.$i bs=1024 count=100000;done
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 53.8811 s, 1.9 MB/s
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 55.6703 s, 1.8 MB/s
dd: error writing ‘file.3’: Input/output error
69858+0 records in
69857+0 records out
71533568 bytes (72 MB) copied, 41.3222 s, 1.7 MB/s
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 59.3975 s, 1.7 MB/s
100000+0 records in
100000+0 records out

Version-Release number of selected component (if applicable):

[root@zod ~]# gluster --version
glusterfs 3.7.4 built on Sep 19 2015 01:30:43
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@zod ~]# rpm -qa|grep gluster
[root@zod ~]# 

[root@zod ~]# gluster v info
Volume Name: lanka
Type: Tier
Volume ID: 258a9a07-43e8-417e-8152-880ca5186f53
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: yarrow:/rhs/brick7/lanka_hot
Brick2: zod:/rhs/brick7/lanka_hot
Brick3: yarrow:/rhs/brick6/lanka_hot
Brick4: zod:/rhs/brick6/lanka_hot
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick5: zod:/rhs/brick1/lanka
Brick6: yarrow:/rhs/brick1/lanka
Brick7: zod:/rhs/brick2/lanka
Brick8: yarrow:/rhs/brick2/lanka
Brick9: zod:/rhs/brick3/lanka
Brick10: yarrow:/rhs/brick3/lanka
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
[root@zod ~]# gluster v status
Status of volume: lanka
Gluster process                             TCP Port  RDMA Port  Online  Pid
Hot Bricks:
Brick yarrow:/rhs/brick7/lanka_hot          49261     0          Y       23253
Brick zod:/rhs/brick7/lanka_hot             49278     0          Y       8213 
Brick yarrow:/rhs/brick6/lanka_hot          49260     0          Y       23234
Brick zod:/rhs/brick6/lanka_hot             49277     0          Y       8195 
Cold Bricks:
Brick zod:/rhs/brick1/lanka                 49274     0          Y       8015 
Brick yarrow:/rhs/brick1/lanka              49257     0          Y       22961
Brick zod:/rhs/brick2/lanka                 49275     0          Y       8033 
Brick yarrow:/rhs/brick2/lanka              49258     0          Y       22981
Brick zod:/rhs/brick3/lanka                 49276     0          Y       8051 
Brick yarrow:/rhs/brick3/lanka              49259     0          Y       22999
NFS Server on localhost                     2049      0          Y       8232 
Quota Daemon on localhost                   N/A       N/A        Y       8246 
NFS Server on yarrow                        2049      0          Y       23294
Quota Daemon on yarrow                      N/A       N/A        Y       23324
Task Status of Volume lanka
Task                 : Rebalance           
ID                   : 4306a687-1a83-4df8-8890-bdec702820c0
Status               : in progress         

Steps to Reproduce:
1. Attach a tier to a volume that has quota enabled.
2. Enable CTR.
3. Fuse-mount the volume.
4. Start creating about 15 files of ~100 MB each in a loop using dd.
5. After two files have been created completely, and while the 3rd file create is in progress, issue a detach-tier start.
6. After detach-tier start completes, the 3rd file create ends abruptly with an IO error on the mount, and the 4th file is created on the cold tier.
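The steps above can be sketched as a single script. This is a hedged sketch only: the volume name, brick paths, host names, and mount point are placeholders modeled on this report, the sleep is a rough stand-in for "wait until the 3rd file is in flight", and the attach-tier/detach-tier syntax is as in glusterfs 3.7.

```sh
#!/bin/sh
VOL=lanka                      # placeholder volume name
MNT=/mnt/lanka                 # placeholder FUSE mount point on the client

# 1-2. enable quota on the volume, attach a hot tier, enable CTR
gluster volume quota $VOL enable
gluster volume attach-tier $VOL replica 2 \
    yarrow:/rhs/brick7/${VOL}_hot zod:/rhs/brick7/${VOL}_hot \
    yarrow:/rhs/brick6/${VOL}_hot zod:/rhs/brick6/${VOL}_hot
gluster volume set $VOL features.ctr-enabled on

# 3. fuse-mount the volume (run on the client)
mount -t glusterfs zod:/$VOL $MNT

# 4. create ~100 MB files in a loop, in the background
cd $MNT
for i in $(seq 1 15); do dd if=/dev/urandom of=file.$i bs=1024 count=100000; done &

# 5. once the 3rd file is in progress, start the detach from a server node
sleep 120                      # rough wait until the 3rd file is being written
gluster volume detach-tier $VOL start

# 6. observe: dd reports "Input/output error" on the in-flight file
wait
```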

Expected results:
Detach-tier should be seamless: writes in flight when detach-tier start is issued must continue without IO errors.
Comment 1 Dan Lambright 2015-10-01 11:59:33 EDT
I have so far been unable to recreate this on my machines with a similar setup. May need to see logs, etc.
Comment 2 Dan Lambright 2015-10-01 22:48:49 EDT
I loaded the older build from the week this problem was found, and the bug reproduced right away. I then loaded the new code but removed fix 12223. That fix corrected problems created during graph switches (such as detach), which could lead to aborted I/Os.

After further experiments to confirm this, I believe fix 12223 corrected the problem.

*** This bug has been marked as a duplicate of bug 1265890 ***
