1564071 – directories are invisible on client side

Bug 1564071 - directories are invisible on client side

Summary: directories are invisible on client side

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	unclassified
Sub Component:
Version:	mainline
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Nithya Balachandran
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1603093 1603099 1603103
TreeView+	depends on / blocked

Reported:	2018-04-05 09:39 UTC by g.amedick
Modified:	2018-10-23 15:06 UTC (History)
CC List:	9 users (show)
Fixed In Version:	glusterfs-5.0
Clone Of:
Clones:	1603093 1603099 1603103 (view as bug list)
Environment:
Last Closed:	2018-10-23 15:06:59 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
requested output list of getfattr for all directories (25.11 KB, text/plain) 2018-04-06 07:56 UTC, g.amedick	no flags	Details
The requested tcpdump (644.89 KB, text/plain) 2018-04-27 13:35 UTC, g.amedick	no flags	Details
mount log of the volume (75.95 KB, text/plain) 2018-04-27 13:35 UTC, g.amedick	no flags	Details
View All

Description g.amedick 2018-04-05 09:39:09 UTC

Description of problem:

Some directories and files can't be seen by the clients. They are there, they can be entered  but they don't show up in ls. If moved, they appear. If moved back to the old name, they disappear again.

On the server:

> The directory exists on all bricks and contains some files.
$ cd /$path_to_brick/data/dir1/
$ ls
dir2
$ find /$path_to_brick/data/dir1/
/$path_to_brick/data/dir1/dir2
/$path_to_brick/data/dir1/dir2/file1
/$path_to_brick/data/dir1/dir2/file2

on all clients:

> The directory seems empty.
$ cd /$path_to_mount/dir1/
$ ls
$
> We can list the files in the invisible directory.
$ ls dir2
file1 file2
> We can move the directory to make it reappear.
$ mv dir2 dir2_new
$ ls
dir2_new
> But after moving back it disappears again.
$ mv dir2_new dir2
$ ls
$
> We can't remove the affected directory.
$ cd; rm -rf /$path_to_mount/dir1/
rm: cannot remove ‘/$path_to_mount/dir1/dir2’: Directory not empty
rm: cannot remove ‘/$path_to_mount/dir1/dir3’: Directory not empty

Some files in the affected dir show the same behaviour.

What we tried so far:
Setting "performance.readdir-ahead" to off didn't change the behaviour.

Version-Release number of selected component (if applicable):
$ gluster --version
glusterfs 3.12.7
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

$ dpkg -l | grep glusterfs-server
ii  glusterfs-server               3.12.7-1                       amd64        clustered file-system (server package)

$ cat /etc/apt/sources.list.d/gluster.list 
deb [arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/3.12/LATEST/Debian/stretch/amd64/apt stretch main

How reproducible:
Seems to be random. We weren't able to break a directory on purpose. An affected directory didn't return back to normal.

Additional info:
Gluster is installed on Debian Stretch using https://download.gluster.org and updated from Gluster 3.8 a while ago and 1 day before the first seen appearance updated from 3.12.6 to 3.12.7.
A clean reboot of all GlusterFS servers happened during this update. 

According to the user who owns the affected directory, it was created after the update.

The volume is mounted via FUSE.

$ gluster volume info $volume

Volume Name: $volume
Type: Distribute
Volume ID: 0d210c70-e44f-46f1-862c-ef260514c9f1
Status: Started
Snapshot Count: 0
Number of Bricks: 23
Transport-type: tcp
Bricks:
Brick1: gluster02:/srv/glusterfs/bricks/DATA201/data
Brick2: gluster02:/srv/glusterfs/bricks/DATA202/data
Brick3: gluster02:/srv/glusterfs/bricks/DATA203/data
Brick4: gluster02:/srv/glusterfs/bricks/DATA204/data
Brick5: gluster02:/srv/glusterfs/bricks/DATA205/data
Brick6: gluster02:/srv/glusterfs/bricks/DATA206/data
Brick7: gluster02:/srv/glusterfs/bricks/DATA207/data
Brick8: gluster02:/srv/glusterfs/bricks/DATA208/data
Brick9: gluster01:/srv/glusterfs/bricks/DATA110/data
Brick10: gluster01:/srv/glusterfs/bricks/DATA111/data
Brick11: gluster01:/srv/glusterfs/bricks/DATA112/data
Brick12: gluster01:/srv/glusterfs/bricks/DATA113/data
Brick13: gluster01:/srv/glusterfs/bricks/DATA114/data
Brick14: gluster02:/srv/glusterfs/bricks/DATA209/data
Brick15: gluster01:/srv/glusterfs/bricks/DATA101/data
Brick16: gluster01:/srv/glusterfs/bricks/DATA102/data
Brick17: gluster01:/srv/glusterfs/bricks/DATA103/data
Brick18: gluster01:/srv/glusterfs/bricks/DATA104/data
Brick19: gluster01:/srv/glusterfs/bricks/DATA105/data
Brick20: gluster01:/srv/glusterfs/bricks/DATA106/data
Brick21: gluster01:/srv/glusterfs/bricks/DATA107/data
Brick22: gluster01:/srv/glusterfs/bricks/DATA108/data
Brick23: gluster01:/srv/glusterfs/bricks/DATA109/data
Options Reconfigured:
nfs.addr-namelookup: off
transport.address-family: inet
nfs.disable: on
diagnostics.brick-log-level: ERROR
performance.readdir-ahead: on
auth.allow: $OUR_IP_SPACE
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on

Nothing interesting ran into the brick logs. Just normal boot stuff as always.

Comment 1 Raghavendra G 2018-04-06 06:42:48 UTC

(In reply to g.amedick from comment #0)
> Description of problem:
> 
> Some directories and files can't be seen by the clients. They are there,
> they can be entered  but they don't show up in ls. If moved, they appear. If
> moved back to the old name, they disappear again.
> 
> On the server:
> 
> > The directory exists on all bricks and contains some files.
> $ cd /$path_to_brick/data/dir1/
> $ ls
> dir2
> $ find /$path_to_brick/data/dir1/
> /$path_to_brick/data/dir1/dir2
> /$path_to_brick/data/dir1/dir2/file1
> /$path_to_brick/data/dir1/dir2/file2

I see that there are 23 bricks. But listed here are only 3 bricks. Is it possible to figure out:
1. dir2 exists on all 23 bricks
2. getfattr -e hex -m. -d dir1 on all bricks

It looks to be an issue of dir2 missing on some bricks.

Comment 2 g.amedick 2018-04-06 07:56:14 UTC

Created attachment 1417986 [details]
requested output list of getfattr for all directories

Comment 3 g.amedick 2018-04-06 07:59:42 UTC

Hi,

1. I can see all three affected folders on every brick.

2. We have so far found three affected dirs (dir2, dir3, dir4). I've attached the info.

I found another curious thing while verifying that all directories exist on all bricks: 
as said, we have "hidden" files, too. We can rename them without getting an error, but they don't reappear. They do get renamed on the brick. Now I saw that a copy of the file with the same name was created on another brick. Neither of them show up on client side.

Comment 4 Frank Rühlemann 2018-04-06 09:30:26 UTC

Hi,

I'm not sure if this is relevant to this case but as a colleague I found an other strange thing on the same volume.

(on gluster02:)
# gluster volume rebalance $volumename status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            11421     16384.0PB       2629168             0        113066            completed       17:16:58
                               gluster01            18936       770.3MB       2710359             0        203851            completed       17:16:58
volume rebalance: $volumename: success

This rebalance ran weeks ago and we didn't notice any error. 16384.0PB is way too much and looks like an error.

We didn't start any rebalance since the reboot (and first notification of the bug) due to concerns about the data safety.

Do you think running a new rebalance might help (and hopefully fix this issue) or would you advice against it?

Comment 5 g.amedick 2018-04-12 11:33:21 UTC

Hi,

a fourth affected directory appeared, and there is something interesting about it. In the 3 first directories, there were ~10 files, about 2/3 of them "hidden". They were pretty much scattered over the bricks.

Now, the new affected directory has about 2.000 "hidden" files, and they are all located on 4 bricks on the same server (gluster02). On all other bricks, the folder is empty. We don't know whether that's coincidence or not.

Is there any additional information we could provide and that might be helpful?

Any opinions on doing a rebalance?

Comment 6 g.amedick 2018-04-18 14:57:22 UTC

Hi,

the affected dirs seem to heal by themselves after 2-4 days. There are new ones popping up regularly now though.

we increased the log level from error to warning and entries like this are the result on pretty much every brick (paths of user directories replaced with $dir_… & we haven't seen files affected by the bug, but we don't know whether all hidden-directory-appearances are reported to us or whether we increased the loglevel before or after the last reported directory broke):

[2018-04-16 12:34:33.643937] W [MSGID: 120020] [quota.c:2755:quota_rename_continue] $volume-quota: quota context not set in inode (gfid:f44e77bc-a54c-4cb5-9f70-c581ed270f2d), considering file size as zero while enforcing quota on new ancestry
[2018-04-16 12:44:33.979176] W [MSGID: 113103] [posix.c:282:posix_lookup] $volume-posix: Found stale gfid handle /srv/glusterfs/bricks/DATA111/data/.glusterfs/93/fa/93fa35bb-22fe-40dc-9415-f08186ab1c93, removing it. [Stale file handle]
[2018-04-17 09:05:44.438907] A [MSGID: 120004] [quota.c:4998:quota_log_usage] $volume-quota: Usage is above soft limit: 187.4TB used by /$dir_1
[2018-04-17 18:34:52.084247] W [MSGID: 113096] [posix-handle.c:770:posix_handle_hard] $volume-posix: link /srv/glusterfs/bricks/DATA111/data/$file_1 -> /srv/glusterfs/bricks/DATA111/data/.glusterfs/10/21/1021cecf-08dc-48ab-a44e-6542cc8e75acfailed  [File exists]
[2018-04-17 18:34:52.084325] E [MSGID: 113020] [posix.c:3162:posix_create] $volume-posix: setting gfid on /srv/glusterfs/bricks/DATA111/data/$file_1 failed
[2018-04-17 20:57:10.613860] W [MSGID: 113001] [posix.c:4421:posix_get_ancestry_non_directory] $volume-posix: listxattr failed on/srv/glusterfs/bricks/DATA111/data/.glusterfs/8d/7d/8d7dc368-b229-4d41-921c-546627a03248 [No such file or directory]
[2018-04-17 20:57:10.614719] W [marker-quota.c:33:mq_loc_copy] 0-marker: src loc is not valid
[2018-04-17 20:57:10.614818] E [marker-quota.c:1488:mq_initiate_quota_task] $volume-marker: loc copy failed
The message "W [MSGID: 113001] [posix.c:4421:posix_get_ancestry_non_directory] $volume-posix: listxattr failed on/srv/glusterfs/bricks/DATA111/data/.glusterfs/8d/7d/8d7dc368-b229-4d41-921c-546627a03248 [No such file or directory]" repeated 1300 times between [2018-04-17 20:57:10.613860] and [2018-04-17 20:57:11.536419]
[2018-04-17 21:34:42.809053] W [MSGID: 113096] [posix-handle.c:770:posix_handle_hard] $volume-posix: link /srv/glusterfs/bricks/DATA111/data/$file_2 -> /srv/glusterfs/bricks/DATA111/data/.glusterfs/21/db/21db009b-aa53-4d26-afb1-8f1667574530failed  [File exists]
[2018-04-17 21:34:42.809115] E [MSGID: 113020] [posix.c:3162:posix_create] $volume-posix: setting gfid on /srv/glusterfs/bricks/DATA111/data/$file_2 failed
[2018-04-17 21:34:42.809944] E [MSGID: 113018] [posix.c:552:posix_setattr] $volume-posix: setattr (lstat) on /srv/glusterfs/bricks/DATA111/data/.glusterfs/21/db/21db009b-aa53-4d26-afb1-8f1667574530 failed [No such file or directory]
[2018-04-17 21:34:42.811179] E [MSGID: 113001] [posix.c:4874:posix_getxattr] $volume-posix: getxattr failed on /srv/glusterfs/bricks/DATA111/data/.glusterfs/21/db/21db009b-aa53-4d26-afb1-8f1667574530: trusted.glusterfs.dht.linkto  [No such file or directory]
[2018-04-18 09:08:02.273714] A [MSGID: 120004] [quota.c:4998:quota_log_usage] $volume-quota: Usage is above soft limit: 188.8TB used by /dir_1
[2018-04-18 10:50:54.072890] A [MSGID: 120004] [quota.c:4998:quota_log_usage] $volume-quota: Usage is above soft limit: 4.0TB used by /dir_2
[2018-04-18 10:50:54.073972] A [MSGID: 120004] [quota.c:4998:quota_log_usage] $volume-quota: Usage is above soft limit: 4.0TB used by /dir_2
[2018-04-18 11:20:12.880347] W [MSGID: 120020] [quota.c:2755:quota_rename_continue] $volume-quota: quota context not set in inode (gfid:367de5fb-c7c3-4bde-a8fa-a3a2cafc6abc), considering file size as zero while enforcing quota on new ancestry
[2018-04-18 11:20:16.865349] W [MSGID: 120020] [quota.c:2755:quota_rename_continue] $volume-quota: quota context not set in inode (gfid:cd8877f1-c5d5-47bd-8a60-c3224c13e724), considering file size as zero while enforcing quota on new ancestry
[2018-04-18 11:20:17.510650] W [MSGID: 120020] [quota.c:2755:quota_rename_continue] $volume-quota: quota context not set in inode (gfid:4eb88aac-8e95-4eac-8d48-1615a135efcd), considering file size as zero while enforcing quota on new ancestry

We are unsure whether this is related or relevant.

Comment 7 Nithya Balachandran 2018-04-23 16:33:46 UTC

Please provide the following:

1. The FUSE mount options being used
2. tcmpdump: From a fresh tmp FUSE mount, perform ls on a directory with invisible files/dirs using 
tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22
3. The output of gluster volume status
4. The FUSE mount logs
5. The actual location of the invisible files on the bricks (ls -l on the brick) and the xattrs and stat info for the files and parent directory.

Comment 8 g.amedick 2018-04-27 13:33:45 UTC

Hi,

as I said, the directories seem to heal over time. We currently don't know of a "hidden" folder. There are "hidden" files though. I'll proceed with those, hope it helps. I created a new mount at /mnt on a test compute node.

1. 
$ mount | grep /mnt
gluster01.FQDN:/$vol1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

$ systemctl status mnt.mount
● mnt.mount
   Loaded: loaded (/etc/systemd/system/mnt.mount; disabled)
   Active: active (mounted) since Tue 2018-04-24 16:52:06 CEST; 2 days ago
    Where: /mnt
     What: gluster01.FQDN:/$vol1
  Process: 3104 ExecMount=/bin/mount -n gluster01.FQDN:/$vol1 /mnt -t glusterfs -o defaults,_netdev,backupvolfile-server=gluster02.FQDN (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/mnt.mount
           └─3173 /usr/sbin/glusterfs --volfile-server=gluster01.FQDN --volfile-server=gluster02.FQDN --volfile-id=/$vol1 /mnt


2. see attachment.

3.
$ gluster volume status
Status of volume: $vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster02:/srv/glusterfs/bricks/DATA2
01/data                                     49152     0          Y       4064 
Brick gluster02:/srv/glusterfs/bricks/DATA2
02/data                                     49153     0          Y       4072 
Brick gluster02:/srv/glusterfs/bricks/DATA2
03/data                                     49154     0          Y       4080 
Brick gluster02:/srv/glusterfs/bricks/DATA2
04/data                                     49155     0          Y       4090 
Brick gluster02:/srv/glusterfs/bricks/DATA2
05/data                                     49156     0          Y       4098 
Brick gluster02:/srv/glusterfs/bricks/DATA2
06/data                                     49157     0          Y       4107 
Brick gluster02:/srv/glusterfs/bricks/DATA2
07/data                                     49158     0          Y       4116 
Brick gluster02:/srv/glusterfs/bricks/DATA2
08/data                                     49159     0          Y       4125 
Brick gluster01:/srv/glusterfs/bricks/DATA1
10/data                                     49152     0          Y       4418 
Brick gluster01:/srv/glusterfs/bricks/DATA1
11/data                                     49153     0          Y       4426 
Brick gluster01:/srv/glusterfs/bricks/DATA1
12/data                                     49154     0          Y       4434 
Brick gluster01:/srv/glusterfs/bricks/DATA1
13/data                                     49155     0          Y       4444 
Brick gluster01:/srv/glusterfs/bricks/DATA1
14/data                                     49156     0          Y       4452 
Brick gluster02:/srv/glusterfs/bricks/DATA2
09/data                                     49160     0          Y       4134 
Brick gluster01:/srv/glusterfs/bricks/DATA1
01/data                                     49157     0          Y       4461 
Brick gluster01:/srv/glusterfs/bricks/DATA1
02/data                                     49158     0          Y       4470 
Brick gluster01:/srv/glusterfs/bricks/DATA1
03/data                                     49159     0          Y       4479 
Brick gluster01:/srv/glusterfs/bricks/DATA1
04/data                                     49160     0          Y       4488 
Brick gluster01:/srv/glusterfs/bricks/DATA1
05/data                                     49161     0          Y       4498 
Brick gluster01:/srv/glusterfs/bricks/DATA1
06/data                                     49162     0          Y       4507 
Brick gluster01:/srv/glusterfs/bricks/DATA1
07/data                                     49163     0          Y       4516 
Brick gluster01:/srv/glusterfs/bricks/DATA1
08/data                                     49164     0          Y       4525 
Brick gluster01:/srv/glusterfs/bricks/DATA1
09/data                                     49165     0          Y       4533 
Quota Daemon on localhost                   N/A       N/A        Y       4041 
Quota Daemon on gluster03.FQDN              N/A       N/A        Y       701  
Quota Daemon on gluster04.FQDN              N/A       N/A        Y       810  
Quota Daemon on gluster05.FQDN              N/A       N/A        Y       3011 
Quota Daemon on gluster01                   N/A       N/A        Y       4393 
 
Task Status of Volume $vol1
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 326d0a79-98e7-4e7a-9ae1-6fc5e33663ae
Status               : failed              
 
Status of volume: $vol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster02:/srv/glusterfs/bricks/SRV_C
LOUD_201/data                               49161     0          Y       4143 
 
Task Status of Volume $vol2
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: $vol3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster02:/srv/glusterfs/bricks/SRV_H
OME_201/data                                49162     0          Y       4152 
Quota Daemon on localhost                   N/A       N/A        Y       4041 
Quota Daemon on gluster04.FQDN              N/A       N/A        Y       810  
Quota Daemon on gluster03.FQDN              N/A       N/A        Y       701  
Quota Daemon on gluster01                   N/A       N/A        Y       4393 
Quota Daemon on gluster05.FQDN              N/A       N/A        Y       3011 
 
Task Status of Volume $vol3
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: $vol4
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster02:/srv/glusterfs/bricks/SRV_S
LURM_201/data                               49163     0          Y       4161 
 
Task Status of Volume $vol4
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: $vol5
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster05.FQDN:/srv/glusterfs/bricks/TEST001/data                N/A       N/A        N       N/A  
 
Task Status of Volume $vol5
------------------------------------------------------------------------------
There are no active volume tasks
 
Volume TEST_DISPERSED is not started


Volumes 1,2,3 and 4 are productive. We haven't recieved any reports about errors on 2-4. They are small though and don't have a high load. We don't know whether they are affected or not.

4. see attachments

5. The volume is part of a cluster that does genome analysis. I'm afraid I can't publish the complete path, it contains sensitive information. But it only contains alphanumeric symbols, "." and "_".

1st file:
root@gluster02:~# getfattr -d -m - /srv/glusterfs/bricks/DATA202/data/$PATH/sd.bin
# file: srv/glusterfs/bricks/DATA202/data/$PATH/sd.bin
trusted.gfid=0s85TbqbmpQoG/3BV5LbJwxg==
trusted.gfid2path.6ccfa9a95c18c513="3847d58a-0225-4be2-8ba6-a7fcaf16dcf2/sd.bin"
trusted.glusterfs.quota.3847d58a-0225-4be2-8ba6-a7fcaf16dcf2.contri.1=0sAAAAAAAAfgAAAAAAAAAAAQ==
trusted.pgfid.3847d58a-0225-4be2-8ba6-a7fcaf16dcf2=0sAAAAAQ==

root@gluster02:~# stat /srv/glusterfs/bricks/DATA202/data/$PATH/sd.bin 
  File: /srv/glusterfs/bricks/DATA202/data/$PATH/sd.bin
  Size: 32058     	Blocks: 72         IO Block: 4096   regular file
Device: fe11h/65041d	Inode: 34550135635  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1029/ $user)   Gid: ( 1039/$group)
Access: 2018-04-24 16:53:47.688932475 +0200
Modify: 2018-03-27 09:11:01.000000000 +0200
Change: 2018-04-24 13:32:26.357256496 +0200
 Birth: -

2nd file:
root@gluster02:~# getfattr -d -m - /srv/glusterfs/bricks/DATA202/data/$PATH/pairtable.bin 
# file: srv/glusterfs/bricks/DATA202/data/$PATH/pairtable.bin
trusted.gfid=0sGGS421fzQpquDiz3KTaO1g==
trusted.gfid2path.5b44f1b5ab80e888="3847d58a-0225-4be2-8ba6-a7fcaf16dcf2/pairtable.bin"
trusted.glusterfs.quota.3847d58a-0225-4be2-8ba6-a7fcaf16dcf2.contri.1=0sAAAAAAAABgAAAAAAAAAAAQ==
trusted.pgfid.3847d58a-0225-4be2-8ba6-a7fcaf16dcf2=0sAAAAAQ==

root@gluster02:~# stat /srv/glusterfs/bricks/DATA202/data/$PATH/pairtable.bin 
  File: /srv/glusterfs/bricks/DATA202/data/$PATH/pairtable.bin
  Size: 1054      	Blocks: 16         IO Block: 4096   regular file
Device: fe11h/65041d	Inode: 34550135634  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1029/ $user)   Gid: ( 1039/$group)
Access: 2018-04-24 13:29:51.615393077 +0200
Modify: 2018-03-27 09:11:04.000000000 +0200
Change: 2018-04-24 13:32:26.357256496 +0200
 Birth: -

3rd file
root@gluster02:~# getfattr -d -m - /srv/glusterfs/bricks/DATA201/data/$PATH/seqdata.bin 
# file: srv/glusterfs/bricks/DATA201/data/$PATH/seqdata.bin
trusted.gfid=0soL+uP9hOTWyo3Z3+cLOa6w==
trusted.gfid2path.91ad63dbe24d5d40="3847d58a-0225-4be2-8ba6-a7fcaf16dcf2/seqdata.bin"
trusted.glusterfs.quota.3847d58a-0225-4be2-8ba6-a7fcaf16dcf2.contri.1=0sAAAAAAAJ+AAAAAAAAAAAAQ==
trusted.pgfid.3847d58a-0225-4be2-8ba6-a7fcaf16dcf2=0sAAAAAQ==

root@gluster02:~# stat /srv/glusterfs/bricks/DATA201/data/$PATH/seqdata.bin
  File: /srv/glusterfs/bricks/DATA201/data/$PATH/seqdata.bin
  Size: 653142    	Blocks: 1288       IO Block: 4096   regular file
Device: fe10h/65040d	Inode: 34385264557  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1029/ $user)   Gid: ( 1039/$group)
Access: 2018-04-24 16:53:29.588711695 +0200
Modify: 2018-03-27 09:11:03.000000000 +0200
Change: 2018-04-24 13:32:26.357256496 +0200
 Birth: -



There's another thing that happend. We started the rebalance and, as you can see, it failed on gluster02. This is the part of the log where it failed:

[2018-04-24 18:55:08.253990] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-$vol1-client-7: server $IP_gluster02:49159 has not responded in the last 42 seconds, disconnecting.
[2018-04-24 18:55:08.254210] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-$vol1-client-7: disconnected from $vol1-client-7. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-24 18:55:08.254260] W [MSGID: 109073] [dht-common.c:9315:dht_notify] 0-$vol1-dht: Received CHILD_DOWN. Exiting
[2018-04-24 18:55:08.254283] I [MSGID: 109029] [dht-rebalance.c:5283:gf_defrag_stop] 0-: Received stop command on rebalance
[2018-04-24 18:55:08.254620] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f5ca82e3b6e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f5ca80aa111] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5ca80aa23e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f5ca80ab8d1] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x288)[0x7f5ca80ac3f8] ))))) 0-$vol1-client-7: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2018-04-24 18:54:25.704755 (xid=0x4150a5)
[2018-04-24 18:55:08.254651] W [MSGID: 114031] [client-rpc-fops.c:2922:client3_3_readv_cbk] 0-$vol1-client-7: remote operation failed [Transport endpoint is not connected]
[2018-04-24 18:55:08.254740] E [MSGID: 109023] [dht-rebalance.c:1820:dht_migrate_file] 0-$vol1-dht: Migrate file failed: /$PATH1/file1: failed to migrate data
[2018-04-24 18:55:08.254807] W [MSGID: 114061] [client-common.c:704:client_pre_fstat] 0-$vol1-client-7:  (7d4a7dd7-db43-428f-9618-add08088d7bb) remote_fd is -1. EBADFD [File descriptor in bad state]
[2018-04-24 18:55:08.254836] E [MSGID: 109023] [dht-rebalance.c:1459:__dht_migration_cleanup_src_file] 0-$vol1-dht: Migrate file cleanup failed: failed to fstat file /$PATH1/file1 on $vol1-client-7  [File descriptor in bad state]
[2018-04-24 18:55:08.254853] W [MSGID: 109023] [dht-rebalance.c:2275:dht_migrate_file] 0-$vol1-dht: /$PATH1/file1: failed to cleanup source file on $vol1-client-7
[2018-04-24 18:55:08.254870] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7f5ca82e3b6e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f5ca80aa111] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5ca80aa23e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f5ca80ab8d1] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x288)[0x7f5ca80ac3f8] ))))) 0-$vol1-client-7: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2018-04-24 18:54:26.249541 (xid=0x4150a6)
[2018-04-24 18:55:08.254898] W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-$vol1-client-7: socket disconnected
[2018-04-24 18:55:14.862395] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-$vol1-client-7: remote operation failed [Transport endpoint is not connected]
[2018-04-24 18:55:14.862493] W [MSGID: 109023] [dht-rebalance.c:2300:dht_migrate_file] 0-$vol1-dht: /$PATH1/file1: failed to unlock file on $vol1-client-7 [Transport endpoint is not connected]
[2018-04-24 18:55:14.862585] E [MSGID: 109023] [dht-rebalance.c:2790:gf_defrag_migrate_single_file] 0-$vol1-dht: migrate-data failed for /$PATH1/file1 [Transport endpoint is not connected]
[2018-04-24 18:55:14.862626] W [dht-rebalance.c:3397:gf_defrag_process_dir] 0-$vol1-dht: Found error from gf_defrag_get_entry
[2018-04-24 18:55:14.863078] E [MSGID: 109111] [dht-rebalance.c:3914:gf_defrag_fix_layout] 0-$vol1-dht: gf_defrag_process_dir failed for directory: /$PATH2
[2018-04-24 18:55:16.492243] W [MSGID: 114061] [client-common.c:1197:client_pre_readdirp] 0-$vol1-client-7:  (12c968a9-4d43-4746-9c16-2e3671b87dd7) remote_fd is -1. EBADFD [File descriptor in bad state]
[2018-04-24 18:55:18.256351] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-$vol1-client-7: changing port to 49159 (from 0)
[2018-04-24 18:55:18.256828] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-$vol1-client-7: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-04-24 18:55:18.257718] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-$vol1-client-7: Connected to $vol1-client-7, attached to remote volume '/srv/glusterfs/bricks/DATA208/data'.
[2018-04-24 18:55:18.257753] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-$vol1-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2018-04-24 18:55:18.257771] I [MSGID: 114042] [client-handshake.c:1047:client_post_handshake] 0-$vol1-client-7: 9 fds open - Delaying child_up until they are re-opened
[2018-04-24 18:55:18.258105] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-$vol1-client-7: reopendir on <gfid:00000000-0000-0000-0000-000000000001> succeeded (fd = 0)
[2018-04-24 18:55:18.258152] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-$vol1-client-7: reopendir on <gfid:30abdc78-e85b-43fb-aac1-df9be4facf8e> succeeded (fd = 1)
[2018-04-24 18:55:18.258193] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-$vol1-client-7: reopendir on <gfid:87541f65-770a-4cc3-89ab-19f6d0e98aa5> succeeded (fd = 4)
[2018-04-24 18:55:18.258222] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-$vol1-client-7: reopendir on <gfid:9faf9889-0419-4e1e-ade1-2929a8575ce2> succeeded (fd = 5)
[2018-04-24 18:55:18.258248] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-$vol1-client-7: reopendir on <gfid:14d6f9bc-5756-444d-86b7-a55d64753ca7> succeeded (fd = 6)
[2018-04-24 18:55:18.258272] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-$vol1-client-7: reopendir on <gfid:ba3dbaf6-5774-416d-956c-483ddb514f42> succeeded (fd = 2)
[2018-04-24 18:55:18.258328] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-$vol1-client-7: reopendir on <gfid:12c968a9-4d43-4746-9c16-2e3671b87dd7> succeeded (fd = 3)
[2018-04-24 18:55:18.258442] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-$vol1-client-7: reopendir on <gfid:14d6f9bc-5756-444d-86b7-a55d64753ca7> succeeded (fd = 6)
[2018-04-24 18:55:18.258541] I [MSGID: 114041] [client-handshake.c:678:client_child_up_reopen_done] 0-$vol1-client-7: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2018-04-24 18:55:18.258659] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-$vol1-client-7: Server lk version = 1
[2018-04-24 18:55:20.089490] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr] 0-$vol1-dht: fixing the layout of /dir1/dir2/dir3/dir4/dir5
[2018-04-24 18:56:02.823510] I [dht-rebalance.c:3223:gf_defrag_process_dir] 0-$vol1-dht: migrate data called on /dir1/dir2/dir3/dir4/dir5
[2018-04-24 18:56:02.854207] W [dht-rebalance.c:3397:gf_defrag_process_dir] 0-$vol1-dht: Found error from gf_defrag_get_entry
[2018-04-24 18:56:02.854759] E [MSGID: 109111] [dht-rebalance.c:3914:gf_defrag_fix_layout] 0-$vol1-dht: gf_defrag_process_dir failed for directory: /dir1/dir2/dir3/dir4/dir5
[2018-04-24 18:56:02.855041] E [MSGID: 109016] [dht-rebalance.c:3851:gf_defrag_fix_layout] 0-$vol1-dht: Fix layout failed for /dir1/dir2/dir3/dir4
[2018-04-24 18:56:02.855225] E [MSGID: 109016] [dht-rebalance.c:3851:gf_defrag_fix_layout] 0-$vol1-dht: Fix layout failed for /dir1/dir2/dir3
[2018-04-24 18:56:02.855565] E [MSGID: 109016] [dht-rebalance.c:3851:gf_defrag_fix_layout] 0-$vol1-dht: Fix layout failed for /dir1/dir2
[2018-04-24 18:56:02.855760] E [MSGID: 109016] [dht-rebalance.c:3851:gf_defrag_fix_layout] 0-$vol1-dht: Fix layout failed for /dir1
[2018-04-24 18:59:19.254438] I [MSGID: 109022] [dht-rebalance.c:2218:dht_migrate_file] 0-$vol1-dht: completed migration of /$PATH3/file3 from subvolume $vol1-client-0 to $vol1-client-1
[2018-04-24 18:59:19.256074] I [MSGID: 109028] [dht-rebalance.c:5097:gf_defrag_status_get] 0-$vol1-dht: Rebalance is failed. Time taken is 120378.00 secs
[2018-04-24 18:59:19.256119] I [MSGID: 109028] [dht-rebalance.c:5101:gf_defrag_status_get] 0-$vol1-dht: Files migrated: 434664, size: 21939317004280, lookups: 935269, failures: 8, skipped: 166223
[2018-04-24 18:59:19.256371] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7f5ca755b494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x5644710ead45] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x5644710eaba4] ) 0-: received signum (15), shutting down

client-7 seems to be responsable for the brick DATA208 on gluster02. His log contains this line at that time:

[2018-04-24 17:19:33.929281] E [MSGID: 113001] [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$vol1-posix: setxattr failed on /srv/glusterfs/bricks/DATA208/data/.glusterfs/ac/24/ac246c06-bd39-4799-bdbe-7fba9beb4fb7 while doing xattrop: key=trusted.glusterfs.quota.33157353-a842-48ac-8f84-e0cc55a59eae.contri.1 [No such file or directory]
[2018-04-24 18:50:33.091715] E [MSGID: 113001] [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$vol1-posix: setxattr failed on /srv/glusterfs/bricks/DATA208/data/.glusterfs/33/c4/33c4ca2e-63cd-4ab6-b56d-b95bb085b9b3 while doing xattrop: key=trusted.glusterfs.quota.8bc4b5a3-0792-429b-878a-7bcfba5d8360.contri.1 [No such file or directory]
[2018-04-24 18:56:02.744587] W [socket.c:593:__socket_rwv] 0-tcp.$vol1-server: writev on $IP_gluster02:49057 failed (Broken pipe)
[2018-04-24 18:56:02.744742] W [inodelk.c:499:pl_inodelk_log_cleanup] 0-$vol1-server: releasing lock on 7d4a7dd7-db43-428f-9618-add08088d7bb held by {client=0x7f38600ba190, pid=-3 lk-owner=fdffffff}
[2018-04-25 04:38:35.718259] A [MSGID: 120004] [quota.c:4998:quota_log_usage] 0-$vol1-quota: Usage is above soft limit: 199.7TB used by /$some_dir


I think the brick somehow lost connection so something. Not sure what port 49057 was used for though. We can't find anything in the logs and it's currently not used according to "netstat".

We're also starting to see errors in the brick logs (all bricks) that look like this:

[2018-04-26 11:43:42.244821] W [marker-quota.c:33:mq_loc_copy] 0-marker: src loc is not valid
[2018-04-26 11:43:42.244854] E [marker-quota.c:1488:mq_initiate_quota_task] 0-$vol1-marker: loc copy failed
[2018-04-26 11:43:34.752298] W [MSGID: 113001] [posix.c:4430:posix_get_ancestry_non_directory] 0-$vol1-posix: listxattr failed on/srv/glusterfs/bricks/DATA208/data/.glusterfs/75/79/757961cd-4348-41fa-93cb-2a681f87af96 [No such file or directory]
[2018-04-26 11:43:42.245003] W [MSGID: 113001] [posix.c:4430:posix_get_ancestry_non_directory] 0-$vol1-posix: listxattr failed on/srv/glusterfs/bricks/DATA208/data/.glusterfs/e2/8a/e28a69cc-e23a-43ab-998c-f41ef77212b5 [No such file or directory]

Comment 9 g.amedick 2018-04-27 13:35:10 UTC

Created attachment 1427665 [details]
The requested tcpdump

Comment 10 g.amedick 2018-04-27 13:35:45 UTC

Created attachment 1427666 [details]
mount log of the volume

Comment 11 Nithya Balachandran 2018-04-30 03:23:49 UTC

Hi,
Please capture the tcpdump using 
tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22



I need to be able to examine the packets.

Comment 12 Nithya Balachandran 2018-04-30 03:26:38 UTC

There is nothing in the mount logs to indicate a problem.

The rebalance process failed because it could not contact $IP_gluster02:49159

[2018-04-24 18:55:08.253990] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-$vol1-client-7: server $IP_gluster02:49159 has not responded in the last 42 seconds, disconnecting.
[2018-04-24 18:55:08.254210] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-$vol1-client-7: disconnected from $vol1-client-7. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-24 18:55:08.254260] W [MSGID: 109073] [dht-common.c:9315:dht_notify] 0-$vol1-dht: Received CHILD_DOWN. Exiting


This is the expected rebalance behaviour. You need to check if there is any reason why this brick is not reachable. Does it happen if you try the rebalance again?

Comment 13 g.amedick 2018-04-30 15:37:26 UTC

We restarted the rebalance. It'll take a while though (est time 50 hrs). We'll report the outcome

The bricks actually are virtual discs provided by a big storage system. The storage reports no errors (including no loss of connectivity or harddrive failure).
We didn't touch the brick process at all (actually, we weren't even present, it was late in the evening). It recovered on its own.
Port 49159 on gluster02 belongs to brick DATA208. The port was open when we came to work the next day. The brick was up and running. The glusterd-log showed nothing about having lost a brick, just the failed rebalance:

[2018-04-24 18:59:19.256333] I [MSGID: 106172] [glusterd-handshake.c:1014:__server_event_notify] 0-glusterd: received defrag status updated
[2018-04-24 18:59:19.263291] W [socket.c:593:__socket_rwv] 0-management: readv on /var/run/gluster/gluster-rebalance-0d210c70-e44f-46f1-862c-ef260514c9f1.sock failed (No data available)
[2018-04-24 18:59:19.266258] I [MSGID: 106007] [glusterd-rebalance.c:158:__glusterd_defrag_notify] 0-management: Rebalance process for volume $vol1 has disconnected.

That's the complete log of that day.


For some reason, DATA208 tried to connect to port 49057:
[2018-04-24 18:56:02.744587] W [socket.c:593:__socket_rwv] 0-tcp.$vol1-server: writev on $IP_gluster02:49057 failed (Broken pipe)

We are unsure why. There's nothing listening:

$ netstat -tulpen | grep 49057

$ netstat -tulpen | grep gluster
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      0          24130      4064/glusterfsd     
tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      0          18881      4072/glusterfsd     
tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      0          19775      4080/glusterfsd     
tcp        0      0 0.0.0.0:49155           0.0.0.0:*               LISTEN      0          26969      4090/glusterfsd     
tcp        0      0 0.0.0.0:49156           0.0.0.0:*               LISTEN      0          45238      4098/glusterfsd     
tcp        0      0 0.0.0.0:49157           0.0.0.0:*               LISTEN      0          46649      4107/glusterfsd     
tcp        0      0 0.0.0.0:49158           0.0.0.0:*               LISTEN      0          1440       4116/glusterfsd     
tcp        0      0 0.0.0.0:49159           0.0.0.0:*               LISTEN      0          18417      4125/glusterfsd     
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      0          15592      3873/glusterd       
tcp        0      0 0.0.0.0:49160           0.0.0.0:*               LISTEN      0          19785      4134/glusterfsd     
tcp        0      0 0.0.0.0:49161           0.0.0.0:*               LISTEN      0          36104      4143/glusterfsd     
tcp        0      0 0.0.0.0:49162           0.0.0.0:*               LISTEN      0          72783      4152/glusterfsd     
tcp        0      0 0.0.0.0:49163           0.0.0.0:*               LISTEN      0          38236      4161/glusterfsd    

We don't know why the rebalance failed. It's the first time that something like this happened. And we don't understand the brick log.


We need to discuss uploading the pcap-file with our supervisor, since it contains our IP's. Is there a way to give it to you without making it public? 

There's something else that happened today:

A user reported she wanted to create a smylink with an absolute path to some file. There was no error message (in fact, the mount log reported Success), but the symlink lead to nowhere. The volume usually is mounted as /data, on all compute nodes with the /data-mount, creating a symling to this file didn't work. The new mount I created at /mnt however could do the symlink. The Systemd-mount-unit literally is copied except for the mount point. A server with both mount points (/data and /mnt) could do the smylink on the /mnt- mount point but not at /data. Relative paths however work fine. It looks like this:

$ ls -lah
lrwxrwxrwx  1 root    itsc_test_proj2  120 Apr 30 15:25 test1.gz -> /mnt/$PATH/$file.gz
lrwxrwxrwx  1 root    itsc_test_proj2  121 Apr 30 15:47 test2.gz -> 
lrwxrwxrwx  1 root    itsc_test_proj2  120 Apr 30 15:48 test3.gz -> /mnt/$PATH/$file.gz
lrwxrwxrwx  1 root    itsc_test_proj2  118 Apr 30 16:05 test4.gz -> ../$PATH/$file.gz
lrwxrwxrwx  1 root    itsc_test_proj2  119 Apr 30 16:06 test5.gz ->
lrwxrwxrwx  1 root    itsc_test_proj2  121 Apr 30 16:08 test6.gz -> 
lrwxrwxrwx  1 root    itsc_test_proj2  121 Apr 30 16:08 test7.gz -> 
lrwxrwxrwx  1 root    itsc_test_proj2  120 Apr 30 15:48 test8.gz -> /mnt/$PATH/$file.gz

Creation of the symlinks:
test1.gz & test3.gz via "cd /mnt; ln -s /mnt/$PATH/$file.gz test$x.gz"
test2.gz, test5.gz & test6.gz via "cd /data; ln -s /data/$PATH/$file.gz test$x.gz"
test4.gz via "cd /data; ln -s ../$PATH/$file.gz test$x.gz"
test7.gz via "cd /mnt; ln -s /data/$PATH/$file.gz test$x.gz"
test8.gz via "cd /data; ln -s /mnt/$PATH/$file.gz test$x.gz"

This was reproducible.

We know that the /mnt-mount point is not completely fine either, since the hidden files we used to create the logs were hidden there, too. Still, the mounts behave different. Symlinks with an absolute path pointing on /data aren't created correctly. Following the strange symlinks with zcat produces an error:

$ zcat test7.gz | head
gzip: test7.gz is a directory -- ignored

All links, including the one with a relative link pointing to /data, can be used as usual.

Comment 14 Nithya Balachandran 2018-05-02 11:19:21 UTC

> We need to discuss uploading the pcap-file with our supervisor, since it
> contains our IP's. Is there a way to give it to you without making it
> public? 


Would it be possible to share it using Google drive?

Comment 15 Frank Rühlemann 2018-05-02 12:42:10 UTC

(In reply to Nithya Balachandran from comment #14)
> > We need to discuss uploading the pcap-file with our supervisor, since it
> > contains our IP's. Is there a way to give it to you without making it
> > public? 
> 
> 
> Would it be possible to share it using Google drive?

I sent you an e-mail with the file.

Comment 16 Frank Rühlemann 2018-05-04 13:20:02 UTC

(In reply to Nithya Balachandran from comment #12)

> This is the expected rebalance behaviour. You need to check if there is any
> reason why this brick is not reachable. Does it happen if you try the
> rebalance again?

The second try finished without problems.

> [2018-05-02 15:50:21.572882] I [MSGID: 109028] [dht-rebalance.c:5097:gf_defrag_status_get] 0-$VOLUME-dht: Rebalance is completed. Time taken is 203329.00 secs
> [2018-05-02 15:50:21.572915] I [MSGID: 109028] [dht-rebalance.c:5101:gf_defrag_status_get] 0-$VOLUME-dht: Files migrated: 653637, size: 53966747687482, lookups: 3303019, failures: 0, skipped: 262197

Comment 17 g.amedick 2018-05-08 11:31:50 UTC

Hi,

the rebalance finished on the second server. No brick loss this time, but we got two failures. It looks like he tried to move a directory:

[2018-05-03 16:04:21.886030] I [dht-rebalance.c:1570:dht_migrate_file] 0-$vol1-dht: /$path/$dir: attempting to move from §vol1-client-2 to §vol1-client-1
[2018-05-03 16:04:21.886133] I [dht-rebalance.c:1570:dht_migrate_file] 0-§vol1-dht: /$path/$dir: attempting to move from §vol1-client-2 to §vol1-client-1
[2018-05-03 16:04:21.904481] W [MSGID: 109023] [dht-rebalance.c:626:__is_file_migratable] 0-§vol1-dht: Migrate file failed:/$path/$dir: migrate-file called on directory
[2018-05-03 16:04:21.904558] W [MSGID: 109023] [dht-rebalance.c:626:__is_file_migratable] 0-§vol1-dht: Migrate file failed:/$path/$dir: migrate-file called on directory
[2018-05-03 16:04:21.904711] E [MSGID: 109023] [dht-rebalance.c:2790:gf_defrag_migrate_single_file] 0-$vol1-dht: migrate-data failed for /$path/$dir [Is a directory]
[2018-05-03 16:04:21.904715] E [MSGID: 109023] [dht-rebalance.c:2790:gf_defrag_migrate_single_file] 0-$vol1-dht: migrate-data failed for /$path/$dir [Is a directory]


the strange symlinks fixed themselves within a few hours. Newly created ones still look broken. When I checked on the bricks, the links were broken there too. 
readlink also shows an empty link on both brick and mounted volume.

Comment 18 Frank Rühlemann 2018-05-23 07:39:08 UTC

Hi,

we did a kernel downgrade back to 4.9.0-5 on all Gluster servers a few days ago.

This is the kernel that was running before the errors were reported for the first time.

This is what we noticed so far:

The symlink anomaly is still reproducable.

Error messages are still present during the rebalance (started after all servers booted with 4.9.0-5), but much less and only every few seconds or minutes (instead of multiple per second, as before the downgrade).

For example (Volume name replaced with $VOLUME):
May 23 05:34:00 gluster02 srv-glusterfs-bricks-DATA209-data[4145]: [2018-05-23 03:34:00.055879] E [MSGID: 113001] [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$VOLUME_DATA-posix: setxattr failed on /srv/glusterfs/bricks/DATA209/data/.glusterfs/1d/5e/1d5e9895-97c5-4099-869e-3d86cdc7486e while doing xattrop: key=trusted.glusterfs.quota.ff8583fd-1452-4939-85c9-7b402ea85968.contri.1 [No such file or directory]

Comment 19 Worker Ant 2018-06-26 05:13:34 UTC

REVIEW: https://review.gluster.org/20387 (storage/posix: Add warning logs on failure) posted (#1) for review on master by N Balachandran

Comment 20 Nithya Balachandran 2018-06-26 05:30:05 UTC

On analysing the tcpdumps sent, I noticed the following:

Readdirp returns entries without iatt information from 3 bricks of the volume (the ones that contained the "hidden" files). This is the case for all entries on that brick and for both files and directories.

dht_readdirp_cbk finds an invalid iatt type and strips out the entries which is why they are not listed even though they exist on the bricks.

From the code, a possible cause is the call to posix_handle_path() failing in posix_readdirp_fill(). posix_readdirp_fill will not fill the iatt information in this case but the entries are still returned to the client.

Comment 27 Worker Ant 2018-07-02 19:33:22 UTC

COMMIT: https://review.gluster.org/20387 committed in master by "Raghavendra Bhat" <raghavendra> with a commit message- storage/posix: Add warning logs on failure

posix_readdirp_fill will fail to update the iatt
information if posix_handle_path fails.
There is currently no log message to indicate this making
debugging difficult.

Change-Id: I6bce360ea7d1696501637433f80e02794fe1368f
updates: bz#1564071
Signed-off-by: N Balachandran <nbalacha>

Comment 30 Nithya Balachandran 2018-07-10 05:35:15 UTC

There are several messages of the type seen in the bricks:

Found stale gfid handle
/dir1/pool1/brick1/myvol/.glusterfs/86/13/8613f6e0-3171-4191-8b42-d8c8063ffbce,
removi ]


seen in the brick logs . This was introduced in https://review.gluster.org/#/c/19070/. This needs to be modified in order to skip directories.

Comment 31 Worker Ant 2018-07-16 12:20:11 UTC

REVIEW: https://review.gluster.org/20509 (posix: check before removing stale symlink) posted (#2) for review on master by Ravishankar N

Comment 32 Worker Ant 2018-07-19 05:06:26 UTC

COMMIT: https://review.gluster.org/20509 committed in master by "jiffin tony Thottan" <jthottan> with a commit message- posix: check before removing stale symlink

BZ 1564071 complains of directories with missing gfid symlinks and
corresponding "Found stale gfid handle" messages in the logs. Hence
add a check to see if the symlink points to an actual directory before
removing it.

Note: Removing stale symlinks was added via commit
3e9a9c029fac359477fb26d9cc7803749ba038b2

Change-Id: I5d91fab8e5f3a621a9ecad4a1f9c898a3c2d346a
Updates: bz#1564071
Signed-off-by: Ravishankar N <ravishankar>

Comment 33 Ravishankar N 2018-07-19 06:56:03 UTC

I'm changing the version to mainline  as the patches sent on master used this BZ ID (by mistake). Will clone it to other release versions to send back ports.

Also, Nithya tells me there might be further patches, so moving the state to POST.

Comment 35 Shyamsundar 2018-10-23 15:06:59 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

Note You need to log in before you can comment on or make changes to this bug.