Bug 1264831 - Data Tiering: Loss of data writes (IO error) to an existing file when detach-tier start is issued (writes and detach appear to be mutually exclusive)
Summary: Data Tiering:Loss of data writes(IO error) to an existing file when detach-ti...
Status: CLOSED DUPLICATE of bug 1265890
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact: bugs@gluster.org
Depends On:
Blocks: 1260923
Reported: 2015-09-21 10:25 UTC by Nag Pavan Chilakam
Modified: 2015-10-30 17:32 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2015-10-02 02:48:49 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:


Description Nag Pavan Chilakam 2015-09-21 10:25:04 UTC
Description of problem:
While writes are in progress to a file, if the user issues a detach-tier start, the in-flight I/Os to that file are lost.
For example, I fuse-mounted a tiered volume and created 100 MB files in a loop as below:
[root@localhost lanka]# for i in {1..100};do dd if=/dev/urandom of=file.$i bs=1024 count=100000;done

After two files had been created and the third was about 70% (~70 MB) written, I issued a detach-tier start. Writes to that file stopped there while the detach-tier continued. After the detach-tier start completed, it started creating the 4th file on the cold tier. That means the 3rd file was left incomplete and its writes were lost.

See brick logs:
[root@zod ~]# ###############no detach start when file3 is creating##########
[root@zod ~]# ll /rhs/brick*/lank*
total 335244
-rw-r--r--. 2 root root 102400000 Sep 21 15:26 file.1
-rw-r--r--. 2 root root 102400000 Sep 21 15:27 file.2
-rw-r--r--. 2 root root  71508992 Sep 21 15:27 file.3 (writes to this file stopped as soon as detach-tier start was issued; files were also migrated to the cold brick shown here)
-rw-r--r--. 2 root root  57340928 Sep 21 15:28 file.4 (new file created after detach-tier start completed)

==========Mount point fuse error========
[root@localhost lanka]# for i in {1..100};do dd if=/dev/urandom of=file.$i bs=1024 count=100000;done
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 53.8811 s, 1.9 MB/s
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 55.6703 s, 1.8 MB/s
dd: error writing ‘file.3’: Input/output error
69858+0 records in
69857+0 records out
71533568 bytes (72 MB) copied, 41.3222 s, 1.7 MB/s
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 59.3975 s, 1.7 MB/s
100000+0 records in
100000+0 records out

Version-Release number of selected component (if applicable):

[root@zod ~]# gluster --version
glusterfs 3.7.4 built on Sep 19 2015 01:30:43
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@zod ~]# rpm -qa|grep gluster
[root@zod ~]# 

[root@zod ~]# gluster v info
Volume Name: lanka
Type: Tier
Volume ID: 258a9a07-43e8-417e-8152-880ca5186f53
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: yarrow:/rhs/brick7/lanka_hot
Brick2: zod:/rhs/brick7/lanka_hot
Brick3: yarrow:/rhs/brick6/lanka_hot
Brick4: zod:/rhs/brick6/lanka_hot
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick5: zod:/rhs/brick1/lanka
Brick6: yarrow:/rhs/brick1/lanka
Brick7: zod:/rhs/brick2/lanka
Brick8: yarrow:/rhs/brick2/lanka
Brick9: zod:/rhs/brick3/lanka
Brick10: yarrow:/rhs/brick3/lanka
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
[root@zod ~]# gluster v status
Status of volume: lanka
Gluster process                             TCP Port  RDMA Port  Online  Pid
Hot Bricks:
Brick yarrow:/rhs/brick7/lanka_hot          49261     0          Y       23253
Brick zod:/rhs/brick7/lanka_hot             49278     0          Y       8213 
Brick yarrow:/rhs/brick6/lanka_hot          49260     0          Y       23234
Brick zod:/rhs/brick6/lanka_hot             49277     0          Y       8195 
Cold Bricks:
Brick zod:/rhs/brick1/lanka                 49274     0          Y       8015 
Brick yarrow:/rhs/brick1/lanka              49257     0          Y       22961
Brick zod:/rhs/brick2/lanka                 49275     0          Y       8033 
Brick yarrow:/rhs/brick2/lanka              49258     0          Y       22981
Brick zod:/rhs/brick3/lanka                 49276     0          Y       8051 
Brick yarrow:/rhs/brick3/lanka              49259     0          Y       22999
NFS Server on localhost                     2049      0          Y       8232 
Quota Daemon on localhost                   N/A       N/A        Y       8246 
NFS Server on yarrow                        2049      0          Y       23294
Quota Daemon on yarrow                      N/A       N/A        Y       23324
Task Status of Volume lanka
Task                 : Rebalance           
ID                   : 4306a687-1a83-4df8-8890-bdec702820c0
Status               : in progress         

Steps to Reproduce:
1. Attach a tier to a volume with quota enabled.
2. Enable CTR.
3. Fuse-mount the volume.
4. Start creating about 15 files of ~100 MB each in a loop using dd.
5. After two files are created completely, and while the 3rd file create is in progress, issue a detach-tier start.
6. After the detach-tier start completes, the 3rd file create ends abruptly with an IO error on the mount, and the 4th file is created on the cold tier.
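The steps above can be sketched as the following shell sequence. This is a hypothetical repro script, not taken from the report: the volume name, hostnames, brick paths, and mount point are placeholders, and the sleep interval is a guess at how long two ~100 MB files take to complete.

```shell
# Hypothetical repro sketch for this bug. Assumes an existing started
# volume named "lanka"; hostnames and brick paths are placeholders.
VOL=lanka
MNT=/mnt/$VOL

# Step 1: enable quota and attach a hot tier to the volume.
gluster volume quota $VOL enable
gluster volume attach-tier $VOL replica 2 \
    host1:/rhs/brick6/${VOL}_hot host2:/rhs/brick6/${VOL}_hot \
    host1:/rhs/brick7/${VOL}_hot host2:/rhs/brick7/${VOL}_hot

# Step 2: enable the change-time recorder (CTR).
gluster volume set $VOL features.ctr-enabled on

# Step 3: fuse-mount the volume.
mkdir -p $MNT
mount -t glusterfs host1:/$VOL $MNT

# Step 4: start the file-create loop in the background.
cd $MNT
for i in {1..15}; do
    dd if=/dev/urandom of=file.$i bs=1024 count=100000
done &

# Step 5: wait until roughly two files have completed and a third
# is in flight, then issue the detach-tier start.
sleep 120
gluster volume detach-tier $VOL start

# Step 6: watch the dd loop; the in-flight file fails with an IO error.
```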

Expected results:
Detach-tier should be seamless; in-flight writes must not fail.

Comment 1 Dan Lambright 2015-10-01 15:59:33 UTC
So far I have been unable to recreate this on my machines with a similar setup. I may need to see logs, etc.

Comment 2 Dan Lambright 2015-10-02 02:48:49 UTC
I loaded the older build from the week this problem was found and reproduced it right away. Then I loaded the new code but removed fix 12223. That fix corrected problems created during graph switches (such as detach), which could lead to aborted I/Os.

After further experiments confirming this, I believe fix 12223 corrected the problem.

*** This bug has been marked as a duplicate of bug 1265890 ***
