Description of problem: The test is about "Peer probing while rebalance is in progress". Though peer probe works fine when rebalance is going on , but there is failure seen in rebalance status. BVT ran twice on the same build and this test case passed once and failed once. However I couldn't find any failure message in the rebalance logs. Version-Release number of selected component (if applicable): glusterfs-server-3.6.0.17-1.el6rhs.x86_64 How reproducible: Intermittently Steps to Reproduce: 1. Create a dht gluster volume with 2 bricks. 2. Mount it on a client and create data on the mount point. qeCreateData --small-files $MOUNT_POINT 10000 10K qeCreateData --medium-files $MOUNT_POINT 1000 10M mkdir -p ${MOUNT_POINT}/large for i in `seq 1 5` do rlRun "dd if=/dev/zero of=${MOUNT_POINT}/large/${i}.large bs=1M count=10000" done 3. Add a new brick using "gluster volume add-brick" 4. Start the reblance process . "gluster volume rebalance $VOL_NAME start" 5. Perform a peer probe for a new node. "gluster peer probe $NEW_HOST" 6. Check rebalance status "gluster volume rebalance $VOL_NAME status " Actual results: :: [ 04:32:26 ] :: The rebalance is complete, logging final status Node Rebalanced-files size scanned failures skipped status run time in secs -- ----------- ---- ------- -------- ------- ------ ---------------- localhost 19 190.0KB 11457 1 1827 completed 32.00 rhsauto057 1799 1.7GB 12823 0 2 completed 145.00 Expected results: Rebalance should not fail. Additional info:
Created attachment 909580 [details] Rebalance logs from the node where commands were executed
Created attachment 909582 [details] Rebalance logs from all nodes
Can you please attach glusterd log?
The statistics of the log files and cli output doesn't match. Lala, can you please upload the correct set of logs for this case?
Looking in to the automation to see why it has wrong logs. The good news is I am not seeing the issue in the recent runs.
Found similar issue in yesterday's BVT run: https://beaker.engineering.redhat.com/jobs/676309 with build: glusterfs-3.6.0.20-1.el6rhs.x86_64.rpm I can see failures in rebalance log i.e. rebalvol-rebalance.log [2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle) [2014-06-22 22:50:26.336185] E [MSGID: 109023] [dht-rebalance.c:1416:gf_defrag_migrate_data] 0-rebalvol-dht: Migrate file failed:Failed to get node-uuid for /symlinks/23 [2014-06-22 22:50:26.606546] I [dht-rebalance.c:823:dht_migrate_file] 0-rebalvol-dht: /symlinks/16: attempting to move from rebalvol-client-1 to rebalvol-client-2 [2014-06-22 22:50:26.640725] E [dht-rebalance.c:1470:gf_defrag_migrate_data] 0-rebalvol-dht: /symlinks/22: failed to get trusted.distribute.linkinfo key - Stale file handle [2014-06-22 22:50:27.802194] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 23, size: 485, lookups: 211, failures: 1, skipped: 0 [2014-06-22 22:50:28.134909] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 28, size: 630, lookups: 226, failures: 1, skipped: 0 [2014-06-22 22:50:28.818404] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 43, size: 1177, lookups: 385, failures: 1, skipped: 0
Created attachment 911310 [details] Rebalance logs from node with failure
Initial analysis as follows: /symlinks/23 file migration failed with the following log: [2014-06-22 22:50:26.330680] W [MSGID: 109007] [dht-common.c:1099:dht_lookup_everywhere_cbk] 0-rebalvol-dht: multiple subvolumes (rebalvol-client-0 and rebalvol-client-2) have file /symlinks/23 (preferably rename the file in the backend, and do a fresh lookup) [2014-06-22 22:50:26.332788] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-rebalvol-client-2: remote operation failed: File exists. Path: /symlinks/23 [2014-06-22 22:50:26.334622] W [dht-linkfile.c:49:dht_linkfile_lookup_cbk] 0-rebalvol-dht: got non-linkfile rebalvol-client-2:/symlinks/23, gfid = 5dc75005-5a88-4097-a88a-3b4dc82ef831 [2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle) And this is why the failures count was incremented by 1. As far as glusterd is concerned, it has shown the correct stats. Assigning it to the DHT team.
There is another old bug on rebalance failure bz 1053537 i.e. https://bugzilla.redhat.com/show_bug.cgi?id=1053537. Not sure if the root cause of this bug is related with it.
Hi Atin, The problem which you mentioned is because of race in rebalance. State 1: Brick-1 File-1 State 2: Add-Brick Brick-1 Brick-2 File-1 (Cached) State 3: Lets assume rebalance on brick-2 ran first, it will do the lookup on hashed which is brick-2 and becuase file is not present on brick-2, dht_lookup_everywhere should get called. If just before sending the lookup fops over the wire, rebalance on brick-1 will get called and it will migrate the file, and sets the permission from source to sink, then at this point in time in the system there are two files and 0 link file. Now if lookup of rebalance on brick-2 proceeds, it will see two files and following log will be seen dht-common.c:1099:dht_lookup_everywhere_cbk] 0-rebalvol-dht: multiple subvolumes (rebalvol-client-0 and rebalvol-client-2) have file /symlinks/23 (preferably rename the file in the backend, and do a fresh lookup) <<MKNOD WILL Fail as file is already migrated by brick-1 rebalance>> [2014-06-22 22:50:26.332788] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-rebalvol-client-2: remote operation failed: File exists. Path: /symlinks/23 <<As I mentioned, a non-linkfile will be present >> [2014-06-22 22:50:26.334622] W [dht-linkfile.c:49:dht_linkfile_lookup_cbk] 0-rebalvol-dht: got non-linkfile rebalvol-client-2:/symlinks/23, gfid = 5dc75005-5a88-4097-a88a-3b4dc82ef831 <<Now just before getxattr, rebalance might have got finished, and hashed file might got have removed, and this will fail>> [2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)
EDIT: Instead of hased, cached file initally present at brick-1, which got migrated to brick-2 will be removed <<Now just before getxattr, rebalance might have got finished, and "CACHED" file might got have removed from brick-1, and getxattr for node-uuid form brick-1 will fail>> [2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)
Reproduced the issue again on build glusterfs-server-3.6.0.22-1. Attaching logs.
Created attachment 913381 [details] Rebalance logs from node with failure
Venkatesh, Couple of questions: 1. Why is server returning ESTALE instead of ENOENT? If its an ENOENT, can't we send another lookup to hashed subvol, to account for the fact that the file might be migrated already and treat this as a non-failure case? 2. How are we handling races b/w rebalance vs unlink from I/O Path? Are we considering file not existing because of an unlink on I/O path also as a failure? regards, Raghavendra
Since this issue is getting reproduced in almost every BVT, couldn't we fix this race asap?
This again got reproduced in today's BVT run https://beaker.engineering.redhat.com/jobs/682433
We should mark it as blocker as we are able to reproduce it almost every time on build glusterfs-server-3.6.0.22-1, which was not the case in rhs2.1
Race gets fixed with patch: http://review.gluster.org/#/c/8231/ Race Explanation: https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4 Similar Race : In comment number 10. Patch: http://review.gluster.org/#/c/8231/
Not seeing this issue on glusterfs-3.6.0.25-1 and glusterfs-3.6.0.27-1. Marking it verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html