Bug 884280

Summary: distributed volume - rebalance doesn't finish - getdents stuck in loop
Product: [Community] GlusterFS
Reporter: S.Knoth <jarl1337>
Component: unclassified
Assignee: Vijay Bellur <vbellur>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: gluster-bugs, sgowda
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-07-25 05:37:38 UTC

Description S.Knoth 2012-12-05 19:44:19 UTC
Description of problem:

distributed volume of 2x 100GB bricks over two nodes
added 2x 30GB bricks, distributed over the same two nodes
volume is ~1% filled

rebalance fix-layout doesn't finish (I left it running for a week)

read accesses to directories in the volume hang (more precisely, they are stuck in a loop)
for example: "ls -l" on the mounted volume loops over getdents and lstat, with memory usage rising constantly until it runs out of memory
See the "strace" output below: "ls -l" loops over the same 8 entries over and over again.
Although directory listings hang, read/write access directly to a file works.


Version-Release number of selected component (if applicable):
glusterfs 3.3.1 on CentOS 6.3


How reproducible:
hard to say. I've done a lot of other testing with the same volume before, so I'm not sure what could have caused the observed behavior

Steps to Reproduce:
1. build the same volume configuration: 2x 100GB distributed over two nodes
2. as content, use something like a ~900MB file, the kernel sources both zipped and extracted (-> a lot of small files), and a few other files of a few MB each
3. add two 30GB bricks to the volume, again one on each node, and start a rebalance fix-layout (if it finishes successfully, something is different from my case)
4. mount the volume and run "ls -l" on the mountpoint (maybe even while the rebalance runs; that should work, shouldn't it?) (if it works normally, something is different from my case)

  
Actual results:
as stated above: rebalance doesn't finish, directory listings stuck in a loop over same entries from getdents with rising memory consumption

Expected results:
rebalance fix-layout expected to finish quickly (24h at very most, normally at most a few minutes), access to directories and files on the volume should be possible at all times without problems


Additional info:

part of "strace" output of "ls -l" on the mounted volume (/mnt/test is where the glusterfs volume is mounted):

[...]
getdents64(3, /* 9 entries */, 32768)   = 432
lstat64("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=553408, ...}) = 0
lgetxattr("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819d900, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/big_buck_bunny_1080p_surround.avi", {st_mode=S_IFREG|0644, st_size=928670754, ...}) = 0
lgetxattr("/mnt/test/big_buck_bunny_1080p_surround.avi", "security.selinux", 0x819d930, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/big_buck_bunny_1080p_surround.avi", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/neu4", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
lgetxattr("/mnt/test/neu4", "security.selinux", 0x819d958, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/neu4", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", {st_mode=S_IFREG|0644, st_size=11075050, ...}) = 0
lgetxattr("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", "security.selinux", 0x819d968, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/lost+found", {st_mode=S_IFDIR|0700, st_size=65536, ...}) = 0
lgetxattr("/mnt/test/lost+found", "security.selinux", 0x819d9a0, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/lost+found", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=106684, ...}) = 0
lgetxattr("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819d9b0, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=65808, ...}) = 0
lgetxattr("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819d9e8, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/neu6", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
lgetxattr("/mnt/test/neu6", "security.selinux", 0x819da18, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/neu6", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
getdents64(3, /* 9 entries */, 32768)   = 432
lstat64("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=553408, ...}) = 0
lgetxattr("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819da28, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/big_buck_bunny_1080p_surround.avi", {st_mode=S_IFREG|0644, st_size=928670754, ...}) = 0
lgetxattr("/mnt/test/big_buck_bunny_1080p_surround.avi", "security.selinux", 0x819da58, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/big_buck_bunny_1080p_surround.avi", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/neu4", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
lgetxattr("/mnt/test/neu4", "security.selinux", 0x819da80, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/neu4", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", {st_mode=S_IFREG|0644, st_size=11075050, ...}) = 0
lgetxattr("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", "security.selinux", 0x819da90, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/lost+found", {st_mode=S_IFDIR|0700, st_size=65536, ...}) = 0
lgetxattr("/mnt/test/lost+found", "security.selinux", 0x819dac8, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/lost+found", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=106684, ...}) = 0
lgetxattr("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819dad8, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=65808, ...}) = 0
lgetxattr("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819db10, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/neu6", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
lgetxattr("/mnt/test/neu6", "security.selinux", 0x819db40, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/neu6", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
getdents64(3, /* 9 entries */, 32768)   = 432
lstat64("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=553408, ...}) = 0
lgetxattr("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819db50, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-server-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/big_buck_bunny_1080p_surround.avi", {st_mode=S_IFREG|0644, st_size=928670754, ...}) = 0
lgetxattr("/mnt/test/big_buck_bunny_1080p_surround.avi", "security.selinux", 0x819db80, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/big_buck_bunny_1080p_surround.avi", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/neu4", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
lgetxattr("/mnt/test/neu4", "security.selinux", 0x819dba8, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/neu4", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", {st_mode=S_IFREG|0644, st_size=11075050, ...}) = 0
lgetxattr("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", "security.selinux", 0x819dbb8, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-common_3.3.1-ubuntu1~precise1_i386.deb", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/lost+found", {st_mode=S_IFDIR|0700, st_size=65536, ...}) = 0
lgetxattr("/mnt/test/lost+found", "security.selinux", 0x819dbf0, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/lost+found", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=106684, ...}) = 0
lgetxattr("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819dc00, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-geo-replication-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=65808, ...}) = 0
lgetxattr("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", "security.selinux", 0x819dc38, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/glusterfs-fuse-3.3.1-1.el6.x86_64.rpm", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/mnt/test/neu6", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
lgetxattr("/mnt/test/neu6", "security.selinux", 0x819dc68, 255) = -1 ENODATA (No data available)
lgetxattr("/mnt/test/neu6", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[...]
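
The loop in the trace above can also be confirmed mechanically. The following is a minimal sketch in Python, assuming the strace output has been saved as text lines in the format shown above. The function name find_repeated_batches and the shortened sample lines are hypothetical and only serve the illustration. It groups the lstat64 paths by the getdents64 call that precedes them and reports every batch that is identical to the one before it.

```python
import re

def find_repeated_batches(strace_lines):
    """Return indexes of getdents64 batches whose lstat64 paths are
    identical to those of the previous batch (hypothetical helper)."""
    batches = []
    current = None
    for line in strace_lines:
        if line.startswith("getdents64("):
            # A new getdents64 call starts a new batch of entries.
            if current is not None:
                batches.append(current)
            current = []
        elif current is not None:
            match = re.match(r'lstat64\("([^"]+)"', line)
            if match:
                current.append(match.group(1))
    if current is not None:
        batches.append(current)
    return [i for i in range(1, len(batches))
            if batches[i] and batches[i] == batches[i - 1]]

# Shortened sample in the same format as the trace above.
sample = [
    'getdents64(3, /* 9 entries */, 32768)   = 432',
    'lstat64("/mnt/test/neu4", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0',
    'lstat64("/mnt/test/neu6", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0',
    'getdents64(3, /* 9 entries */, 32768)   = 432',
    'lstat64("/mnt/test/neu4", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0',
    'lstat64("/mnt/test/neu6", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0',
]
print(find_repeated_batches(sample))
```

A non-empty result means that consecutive getdents64 calls returned the same entries, which is the behavior described in this report. A healthy listing would return different entries per call and end with a getdents64 call that returns 0.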


not much to find in the logs that I can associate with this issue:
- mnt-test.log: "stale nfs handle of /lost+found" (-> but I'm not using nfs) and "gfid different /lost+found"
- testvolume-rebalance.log: just a log of "Rebalance is in progress" and "Files migrated: 0, size: 0, lookups: 0, failures: 0" while doing rebalance fix-layout



When I try to remove the 2x 30GB bricks from the volume, the process started by "gluster volume remove-brick testvolume node1:/brick2 node2:/brick2 start" also doesn't finish.
Then I stop the process and remove those bricks without "start", risking data loss. (But that is no problem, since it is just a test volume and files were never redistributed onto the new bricks, because rebalance fix-layout doesn't even work.)
Then I do a rebalance fix-layout on the volume with just the 2x 100GB and it works and finishes quickly. Directory access also works correctly.
Then I add the two 30GB bricks again and the same things happen as described above.

When I add just one of those 30GB bricks it works. When I then add the second it doesn't. When I then remove one of them, it works again. When I add a fifth brick it also works. What's the problem with the number four (2x100GB 2x30GB)?

The bricks on each node are just partitions on one hard disk. Could that cause problems?

Comment 1 shishir gowda 2012-12-06 02:55:14 UTC
What is the backend fs used?

Comment 2 S.Knoth 2012-12-06 15:13:52 UTC
The backend FS is ext4.

Comment 3 shishir gowda 2012-12-07 03:57:54 UTC
The recommended backend fs is XFS. Your issue is related to bug 838784. Can you please confirm it? I will then mark this as a duplicate of that bug.

Comment 4 S.Knoth 2012-12-19 20:24:01 UTC
When I run the test program for bug 838784, the output contains 64 bit values, which means my system is affected.
When I use XFS it seems to work.

I'm not sure if that is sufficient to confirm that it's a duplicate of bug 838784. But it is definitely related. I don't understand why it only failed when there were four bricks (2x 100GB 2x 30GB) and worked otherwise.

It's a pity that there are problems with glusterfs with ext4 as backend FS. I don't quite understand what the problem is. Isn't it normal that filesystems like ext4 return 64 bit values nowadays, especially on x86_64? Although I don't understand why ext4 returns those crazy numbers as d_off values (but I would guess there is a reason for that). I guess 64 bit values (which can be at least potentially such big numbers) will always bear the risk of overflow, if the dht_itransform function doesn't avoid that in its calculations. And off_t is defined as 64 bit unsigned integer, isn't it?
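
To illustrate the overflow risk raised above, here is a minimal sketch in Python. It assumes, purely for illustration, that the brick index is folded into the directory offset as value = d_off * brick_count + brick_index in unsigned 64 bit arithmetic. That formula and the names encode and decode are assumptions for this example; they are not taken from the GlusterFS source or from bug 838784.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64 bit wraparound

def encode(d_off, brick_index, brick_count):
    """Hypothetical transform: fold the brick index into the offset."""
    return (d_off * brick_count + brick_index) & MASK64

def decode(value, brick_count):
    """Inverse of encode(); only correct if encode() did not wrap around."""
    return value // brick_count, value % brick_count

small = 432                  # a small offset, fits easily
large = 0x7FFFFFFFFFFFFFFF   # an assumed 63 bit offset, for illustration

for d_off in (small, large):
    restored, index = decode(encode(d_off, 1, 4), 4)
    status = "round-trips" if (restored, index) == (d_off, 1) else "is corrupted"
    print(hex(d_off), status)
```

Under this assumption, a small offset survives the round trip, while an offset that already uses nearly all 64 bits loses its top bits in the multiplication and decodes to a different position. A directory read that resumes from such a corrupted position could return entries that were already listed.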

Comment 5 Vijay Bellur 2013-07-25 05:37:38 UTC
Fixed in 3.3.2 and 3.4.0.