Bug 1342830 - [Tiering]: when hot tier's subvol is brought down and up later, files from those subvols aren't listed in mountpoint
Summary: [Tiering]: when hot tier's subvol is brought down and up later, files from those subvols aren't listed in mountpoint
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1343002
 
Reported: 2016-06-05 16:51 UTC by krishnaram Karthick
Modified: 2018-02-06 17:52 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned As: 1343002
Environment:
Last Closed: 2018-02-06 17:52:32 UTC



Description krishnaram Karthick 2016-06-05 16:51:49 UTC
Description of problem:
When a sub-volume is brought down and later restored, files that existed on that sub-volume before the failure aren't listed on the mountpoint.

gluster v info
 
Volume Name: brick-down-case
Type: Tier
Volume ID: 6adac192-0aeb-4398-b928-ffe416c5edc2
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: 10.70.47.187:/bricks/brick1/bd1
Brick2: 10.70.46.103:/bricks/brick1/bd1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick3: 10.70.46.103:/bricks/brick0/bd1
Brick4: 10.70.47.187:/bricks/brick0/bd1
Brick5: 10.70.47.171:/bricks/brick0/bd1
Brick6: 10.70.47.128:/bricks/brick0/bd1
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
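
For reference, a minimal sketch of commands that could produce a comparable configuration (hostnames and brick paths are taken from the output above; the exact attach-tier syntax and which options attach-tier sets automatically may differ by release, so treat this as an illustration rather than the exact commands used):

# Cold tier: 2 x 2 distributed-replicate
gluster volume create brick-down-case replica 2 \
    10.70.46.103:/bricks/brick0/bd1 10.70.47.187:/bricks/brick0/bd1 \
    10.70.47.171:/bricks/brick0/bd1 10.70.47.128:/bricks/brick0/bd1
gluster volume start brick-down-case

# Hot tier: 2-brick pure distribute attached on top of the cold tier
gluster volume attach-tier brick-down-case \
    10.70.47.187:/bricks/brick1/bd1 10.70.46.103:/bricks/brick1/bd1

# Tiering options as shown in 'gluster v info' (attach-tier may set these itself)
gluster volume set brick-down-case cluster.tier-mode cache
gluster volume set brick-down-case features.ctr-enabled on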

==== gluster v status, after the failed subvol was restored ====

[root@dhcp46-103 ~]# gluster v status
Status of volume: brick-down-case
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.47.187:/bricks/brick1/bd1       49368     0          Y       30638
Brick 10.70.46.103:/bricks/brick1/bd1       49358     0          Y       32729
Cold Bricks:
Brick 10.70.46.103:/bricks/brick0/bd1       49357     0          Y       32648
Brick 10.70.47.187:/bricks/brick0/bd1       49367     0          Y       30310
Brick 10.70.47.171:/bricks/brick0/bd1       49341     0          Y       3807 
Brick 10.70.47.128:/bricks/brick0/bd1       49196     0          Y       23661
NFS Server on localhost                     2049      0          Y       474  
Self-heal Daemon on localhost               N/A       N/A        Y       482  
NFS Server on 10.70.47.128                  2049      0          Y       23931
Self-heal Daemon on 10.70.47.128            N/A       N/A        Y       23940
NFS Server on 10.70.47.187                  2049      0          Y       30658
Self-heal Daemon on 10.70.47.187            N/A       N/A        Y       30666
NFS Server on 10.70.47.171                  2049      0          Y       4064 
Self-heal Daemon on 10.70.47.171            N/A       N/A        Y       4075 
 
Task Status of Volume brick-down-case
------------------------------------------------------------------------------
Task                 : Tier migration      
ID                   : bbaafd57-02c0-4316-ad84-4a220c4c98c8
Status               : in progress         


From the mountpoint, after brick '10.70.47.187:/bricks/brick1/bd1' was taken down and then brought back up:

[root@dhcp46-9 brick-down-case]# ll all-bbricks-up/
total 30720
-rw-r--r--. 1 root root 10485760 Jun  5  2016 file-1
-rw-r--r--. 1 root root 10485760 Jun  5  2016 file-3
-rw-r--r--. 1 root root 10485760 Jun  5  2016 file-6

List of files from the backend bricks:

[root@dhcp46-103 ~]# ll /bricks/brick1/bd1/all-bbricks-up/
total 30720
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-1
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-3
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-6


[root@dhcp47-187 ~]# ll /bricks/brick1/bd1/all-bbricks-up/
total 71680
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-10
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-2
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-4
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-5
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-7
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-8
-rw-r--r--. 2 root root 10485760 Jun  5 16:23 file-9


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a tiered volume configured as in the 'gluster v info' output above.
2. From a fuse mountpoint, create a directory and create 10 files within it.
3. Kill one of the hot tier brick processes: kill -15 <pid of brick>
4. Try creating a new directory and files within it - file creation fails.
5. Restore the failed brick: gluster v start <volname> force
6. From the fuse mountpoint, list the files and check whether all files that existed before the brick failure still exist.
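
A minimal reproduction sketch of steps 2-6, assuming the tiered volume above already exists and is started. The mount path, directory name, and brick choice below are placeholders to substitute with real values; the kill and 'volume start force' commands run on the node hosting the chosen hot tier brick, the rest on the client:

# --- on the client ---
MOUNT=/mnt/brick-down-case          # placeholder fuse mount of the tiered volume
mkdir "$MOUNT/testdir"
for i in $(seq 1 10); do
    dd if=/dev/urandom of="$MOUNT/testdir/file-$i" bs=1M count=10
done

# --- on the node hosting one hot tier brick ---
VOLNAME=brick-down-case
BRICK=10.70.47.187:/bricks/brick1/bd1
# the brick PID is the last column of the matching line in 'gluster v status'
BRICK_PID=$(gluster volume status "$VOLNAME" | awk -v b="$BRICK" '$0 ~ b {print $NF; exit}')
kill -15 "$BRICK_PID"

# --- on the client: new creates fail while the hot tier subvol is down ---
mkdir "$MOUNT/newdir" || echo "create failed as expected"

# --- on the server: restore the brick ---
gluster volume start "$VOLNAME" force

# --- on the client: all 10 files should still be listed ---
ls -l "$MOUNT/testdir"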

Actual results:
Files that existed on the failed brick aren't listed

Expected results:
All files should be listed

Additional info:
sosreports will be attached

Comment 4 krishnaram Karthick 2016-06-06 05:28:25 UTC
The linkto files on the corresponding cold tier seem to be missing.


======= from mountpoint =========


Before the subvol was brought down:

-rw-r--r--. 1 root root 10485760 Jun  6 10:39 file-1
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-10
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-2
-rw-r--r--. 1 root root 10485760 Jun  6 10:39 file-3
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-4
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-5
-rw-r--r--. 1 root root 10485760 Jun  6 10:39 file-6
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-7
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-8
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-9


After the subvol was brought down and restored:

ll
total 30720
-rw-r--r--. 1 root root 10485760 Jun  6 10:39 file-1
-rw-r--r--. 1 root root 10485760 Jun  6 10:39 file-3
-rw-r--r--. 1 root root 10485760 Jun  6 10:39 file-6



====== backend bricks =====

[root@dhcp47-28 files]# ll /bricks/brick{0,1}/abcd/files/
/bricks/brick0/abcd/files/:
total 0
---------T. 2 root root 0 Jun  6 10:39 file-1
---------T. 2 root root 0 Jun  6 10:39 file-3
---------T. 2 root root 0 Jun  6 10:39 file-6

/bricks/brick1/abcd/files/:
total 30720
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-1
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-3
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-6

[root@dhcp46-142 ~]# ll /bricks/brick{0,1}/abcd/files/
/bricks/brick0/abcd/files/:
total 0
---------T. 2 root root 0 Jun  6 10:39 file-1
---------T. 2 root root 0 Jun  6 10:39 file-3
---------T. 2 root root 0 Jun  6 10:39 file-6

/bricks/brick1/abcd/files/:
total 71680
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-10
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-2
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-4
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-5
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-7
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-8
-rw-r--r--. 2 root root 10485760 Jun  6 10:39 file-9


[root@dhcp46-44 ~]# ll /bricks/brick0/abcd/files/
total 0
[root@dhcp46-44 ~]# 


[root@dhcp46-58 ~]# ll /bricks/brick0/abcd/files/
total 0


====== volume configuration =======

gluster v status
Status of volume: sd-down
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.46.142:/bricks/brick1/abcd      49410     0          Y       22308
Brick 10.70.47.28:/bricks/brick1/abcd       49410     0          Y       2768 
Cold Bricks:
Brick 10.70.47.28:/bricks/brick0/abcd       49409     0          Y       2689 
Brick 10.70.46.142:/bricks/brick0/abcd      49409     0          Y       22070
Brick 10.70.46.44:/bricks/brick0/abcd       49400     0          Y       2390 
Brick 10.70.46.58:/bricks/brick0/abcd       49401     0          Y       3809 
NFS Server on localhost                     2049      0          Y       2996 
Self-heal Daemon on localhost               N/A       N/A        Y       3004 
NFS Server on 10.70.46.44                   2049      0          Y       2571 
Self-heal Daemon on 10.70.46.44             N/A       N/A        Y       2579 
NFS Server on 10.70.46.142                  2049      0          Y       22328
Self-heal Daemon on 10.70.46.142            N/A       N/A        Y       22336
NFS Server on 10.70.46.58                   2049      0          Y       3991 
Self-heal Daemon on 10.70.46.58             N/A       N/A        Y       3999 
 
Task Status of Volume sd-down
------------------------------------------------------------------------------
Task                 : Tier migration      
ID                   : 120068cd-a8fb-4dc7-a9f0-957c52d2015d
Status               : in progress         
 
[root@dhcp47-28 ~]# gluster v info
 
Volume Name: sd-down
Type: Tier
Volume ID: ae7562dc-c199-4f1d-8866-a2b5a183a7be
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: 10.70.46.142:/bricks/brick1/abcd
Brick2: 10.70.47.28:/bricks/brick1/abcd
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick3: 10.70.47.28:/bricks/brick0/abcd
Brick4: 10.70.46.142:/bricks/brick0/abcd
Brick5: 10.70.46.44:/bricks/brick0/abcd
Brick6: 10.70.46.58:/bricks/brick0/abcd
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

Comment 5 Nithya Balachandran 2016-06-06 09:13:01 UTC
RCA:

The hot tier is a pure distribute volume, so an entire DHT subvolume becomes unavailable when a single brick is brought down.

Tier_readdirp lists files by reading only the cold tier and then doing lookups on any linkto files found. The lookup on the linkto file for a data file on the downed brick fails with ENOENT (the ENOTCONN op_errno is overwritten by the ENOENT from the brick that is up), so the linkto file is considered stale and is deleted. Even after the brick is brought back up, the linkto files are no longer present, so those files are no longer listed from the mount point.
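
For reference, the zero-byte '---------T' entries in the backend listings of comment 4 are the linkto files in question. A sketch of inspecting one on a cold tier brick (path taken from comment 4; the linkto xattr among the dumped attributes names the hot tier subvolume that holds the actual data):

# On a cold tier brick node, as root: dump all xattrs on the zero-byte entry
getfattr -d -m . -e hex /bricks/brick0/abcd/files/file-1
ls -l /bricks/brick0/abcd/files/file-1    # mode ---------T, size 0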

Workaround:
Once all bricks are up, running

ls <filename>

on the mount point will recreate the linkto file.
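
A sketch of applying this per file from the fuse mount, naming each file explicitly since the missing ones no longer show up in a plain directory listing (the mount path below is a placeholder; the file names are from comment 4):

# From the fuse mount, once all bricks are back up: a named lookup on each file
# recreates its missing linkto file on the cold tier.
MOUNT=/mnt/sd-down                  # placeholder fuse mount of the volume from comment 4
for i in $(seq 1 10); do
    ls -l "$MOUNT/files/file-$i"
done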

This should not happen with a dist-rep hot tier.

Comment 9 Shyamsundar 2018-02-06 17:52:32 UTC
Thank you for your bug report.

We are not going to root-cause this bug further; as a result, it is being closed as WONTFIX. Please reopen it if the problem continues to be observed after upgrading to the latest version.

