Description of problem:
Rebalance status command output shows failures. It should not show any failures: if a file is not rebalanced because of a space issue, it should be counted in the skipped list instead.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.56rhs-1389602694.el6.x86_64.rpm

How reproducible:
Intermittent

Steps to Reproduce:
1. Create a distribute volume
2. Create the data, i.e. symlinks, on the mount point. Refer to "Additional info:" for the code that creates the symlinks
3. Add brick
4. Start rebalance
5. Check rebalance status

Actual results:
Node                               Rebalanced-files  size     scanned  failures  skipped  status     run time in secs
---------                          -----------       -------  -------  --------  -------  ---------  ----------------
localhost                          46                1.2KB    366      0         1        completed  3.00
rhsauto057.lab.eng.blr.redhat.com  43                1.1KB    382      1         0        completed  3.00
rhsauto022.lab.eng.blr.redhat.com  0                 0Bytes   330      0         0        completed  1.00

Expected results:

Additional info:
Code to create symbolic links:

mkdir -p $MOUNT_POINT/symlinks
mkdir -p /symlinks
mkdir -p $MOUNT_POINT/symlinks-gluster
mkdir -p $MOUNT_POINT/symlinks-gluster-dest
for i in `seq 1 100`; do
    echo "#!/bin/bash" > /symlinks/$i.sh
    echo "echo Hello World" >> /symlinks/$i.sh
    ln -s /symlinks/$i.sh $MOUNT_POINT/symlinks/$i
    echo "#!/bin/bash" >> $MOUNT_POINT/symlinks-gluster/$i.sh
    echo "echo Hello World" >> $MOUNT_POINT/symlinks-gluster/$i.sh
    ln -s $MOUNT_POINT/symlinks-gluster/$i.sh $MOUNT_POINT/symlinks-gluster-dest/$i
done
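For step 5, a small filter can make the check scriptable in a BVT run. This is only a sketch: the function name failed_nodes is made up for illustration, and it assumes the status output keeps the 8-column layout pasted under "Actual results", with failures as the fifth column.

```shell
#!/bin/bash
# Print "<node> <failures>" for every node whose failures column is non-zero.
# Assumption: data rows have exactly 8 whitespace-separated fields and the
# failures count is field 5, as in the output pasted in this report.
failed_nodes() {
    # The header row has more than 8 fields and the dashed row is not
    # numeric in field 5, so both are skipped.
    awk 'NF == 8 && $5 ~ /^[0-9]+$/ && $5 + 0 > 0 { print $1, $5 }'
}

# Sample input: the rows from "Actual results" above.
failed_nodes <<'EOF'
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 46 1.2KB 366 0 1 completed 3.00
rhsauto057.lab.eng.blr.redhat.com 43 1.1KB 382 1 0 completed 3.00
rhsauto022.lab.eng.blr.redhat.com 0 0Bytes 330 0 0 completed 1.00
EOF
# prints: rhsauto057.lab.eng.blr.redhat.com 1
```

On a live setup the same filter could presumably be fed from "gluster volume rebalance <VOLNAME> status | failed_nodes", where <VOLNAME> is whatever volume was created in step 1.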
Marked this bug as intermittent because it showed up twice in the last few days of BVT runs.
Today's Run: https://beaker.engineering.redhat.com/jobs/575364
Previous Run: https://beaker.engineering.redhat.com/jobs/572378
Created attachment 850449
Rebalance logs
The following error message is seen in the logs:

hosdu-rebalance.log.1:[2014-01-15 18:14:31.263392] E [dht-linkfile.c:287:dht_linkfile_setattr_cbk] 0-hosdu-dht: setattr of uid/gid on /93 :<gfid:00000000-0000-0000-0000-000000000000> failed (No such file or directory)
I haven't seen this issue in the last couple of weeks of BVT runs. Hence lowering the severity.
adding 3.0 flag and removing 2.1.z
From the logs it seems that one of the rebalance processes is not able to perform setattr because the file was not present at the backend. If it is a replicated volume, ideally a lock should be taken, so I am not sure how it got into that situation. But I am able to reproduce it on a non-replicated volume:

1. Initially there are two bricks.
2. Created a file, org_file.
3. Created a symbolic link to org_file (sym_file).
4. Added a new brick.
5. Ran rebalance.

NOTE: All bricks are on different nodes.

Now a rebalance process will run on each of the three nodes (Rebalance-1, Rebalance-2, Rebalance-3). Let's assume the file needs to be migrated from node-2 to node-3.

t1: Lookup(sym_file)
t2: Create dht-link (T) at node-3
t3: Lookup(sym_file)
t4: After some more operations, delete the dht-link at node-3
t5: Do setattr on the dht-link created at t2 above.
    NOTE: This will fail because rebalance-2 deleted the dht-link file at t4.
t6: Create symbolic link at node-3
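The t2/t4/t5 ordering can be mimicked on a local filesystem to see why the setattr ends in "No such file or directory". This is an analogy only, not GlusterFS code: the function name simulate_race and the file name sym_file.linkto are hypothetical, a plain temp file stands in for the dht-link, and chmod stands in for the uid/gid setattr.

```shell
#!/bin/bash
# Local-filesystem analogy of the interleaving above; not GlusterFS code.
simulate_race() {
    local dir link
    dir=$(mktemp -d)
    link="$dir/sym_file.linkto"   # hypothetical stand-in for the dht-link (T) file

    touch "$link"                 # t2: one rebalance process creates the link file
    rm -f "$link"                 # t4: another rebalance process deletes it

    # t5: the first process sets attributes on the file it created at t2,
    # which no longer exists, so the call fails.
    if chmod 600 "$link" 2>/dev/null; then
        echo "t5 setattr: ok"
    else
        echo "t5 setattr: failed (link file already removed)"
    fi
    rm -rf "$dir"
}

simulate_race
```

If the delete at t4 is moved after the chmod, the same sketch reports "t5 setattr: ok", which matches the issue being intermittent and dependent on how the processes interleave.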
Cloning this to 3.1. To be fixed in future.