Description of problem:
=======================
Some of the files are getting truncated to 0 size after doing renames from multiple mounts and running rebalance simultaneously.

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-6.el6rhs.x86_64

How reproducible:
=================
intermittent

Steps to Reproduce:
==================
1. created 100 files on the mount point of size 1MB
2. started renaming from multiple mount points
3. did add-brick and rebalance
4. ran rebalance many times
5. created a directory test
6. moved all the files inside test/ directory
7. then created another directory test1/
8. mv test/* test1/

Result: some of the files got truncated to 0 size

-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f83-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f84-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f85-18
-rw-r--r-- 1 root root 0 Aug 28 05:43 f8-57
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f86-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f87-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f88-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f89-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f90-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f91-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f9-19
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f92-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f93-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f94-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f95-18
-rw-r--r-- 1 root root 0 Aug 28 05:43 f9-57
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f96-18

files which are truncated

[root@localhost test1]# find . -size 0
./f5-19
./f8-57
./f4-57
./f9-57
./f6-57

files
====
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f100-16
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f10-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f11-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f1-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f12-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f13-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f14-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f15-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f16-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f17-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f18-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f19-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f20-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f21-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f2-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f22-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f23-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f24-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f25-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f26-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f27-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f28-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f29-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f30-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f31-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f3-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f32-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f33-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f34-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f35-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f36-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f37-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f38-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f39-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f40-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f41-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f4-19
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f42-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f43-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f44-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f45-17
-rw-r--r-- 1 root root 0 Aug 28 05:43 f4-57
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f46-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f47-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f48-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f49-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f50-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f51-17
-rw-r--r-- 1 root root 0 Aug 28 05:43 f5-19
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f52-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f53-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f54-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f55-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f56-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f57-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f58-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f59-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f60-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f61-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f6-19
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f62-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f63-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f64-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f65-17
-rw-r--r-- 1 root root 0 Aug 28 05:43 f6-57
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f66-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f67-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f68-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f69-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f70-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f71-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f7-19
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f72-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f73-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f74-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f75-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f76-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f77-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f78-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f79-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f80-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f81-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f8-19
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f82-17
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f83-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f84-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f85-18
-rw-r--r-- 1 root root 0 Aug 28 05:43 f8-57
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f86-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f87-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f88-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f89-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f90-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f91-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f9-19
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f92-18
-rw-r--r-- 1 root root 1048576 Aug 28 05:43 f93-18

Actual results:
===============
few files are truncated to size 0
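The zero-size check in the description can be scripted so it is easy to repeat after each rename pass. The following is a minimal sketch, assuming the volume is reachable as an ordinary local path; `find_truncated` and `demo` are hypothetical names, the temporary directory only stands in for the mount point, and the sketch roughly mirrors `find . -size 0` without exercising rename or rebalance itself.

```python
import os
import tempfile


def find_truncated(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes."""
    truncated = []
    for entry in os.scandir(directory):
        # Only regular files are considered; directories and symlinks are skipped.
        if entry.is_file(follow_symlinks=False):
            if entry.stat(follow_symlinks=False).st_size == 0:
                truncated.append(entry.name)
    return sorted(truncated)


def demo():
    # Hypothetical stand-in for the mount point: a local temporary directory.
    with tempfile.TemporaryDirectory() as mount:
        # One 1 MB file, as created in step 1 of the reproduction.
        with open(os.path.join(mount, "f1-17"), "wb") as fh:
            fh.write(b"\0" * 1048576)
        # One empty file, standing in for a truncated entry.
        open(os.path.join(mount, "f8-57"), "wb").close()
        return find_truncated(mount)


if __name__ == "__main__":
    print(demo())
```

Pointing `find_truncated` at the real `test1/` directory on a mount would list the same kind of entries that the `find` command reported.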
Analysis for the issue that was logged as part of the #Description:
-------------------------------------------------------------------
This is not a data loss scenario; as a matter of fact, everything is in working order. Here is why:

1) There was already a 0 byte file named f[4,5,6,8,9]-57 on the system.

Evidence: NFS log has the CREATE for these files.

2) When the rename happens, there is a rename of f5-57 -> f5-1 while the new f5-1 already exists (from a rename of f5-340), so this is a case of renaming over an existing file. At this point f5-* is a 0 byte file and only 1 instance survives.

Evidence: NFS log

[2014-08-28 10:18:15.259705] D [MSGID: 0] [dht-rename.c:1281:dht_rename] 0-spacebar-dht: renaming /f5-340 (hash=spacebar-replicate-17/cache=spacebar-replicate-17) => /f5-1 (hash=spacebar-replicate-21/cache=<nul>)

[2014-08-28 10:19:18.896736] D [MSGID: 0] [dht-rename.c:1281:dht_rename] 0-spacebar-dht: renaming /f5-57 (hash=spacebar-replicate-2/cache=spacebar-replicate-2) => /f5-1 (hash=spacebar-replicate-21/cache=spacebar-replicate-17)

(NOTE: dstcached location found as R17 and will be unlinked; this unlink will not appear in the logs as it is a trace level message.)

[2014-08-28 10:24:05.684857] D [MSGID: 0] [dht-rename.c:1281:dht_rename] 0-spacebar-dht: renaming /f5-1 (hash=spacebar-replicate-21/cache=spacebar-replicate-2) => /f5-2 (hash=spacebar-replicate-4/cache=<nul>)

The rename of f5-57 to f5-1 also deletes f5-1 at replicate-17, as the dstcached existed.

3) From the rebalance logs, we can see that f5-* was rebalanced a few times, with no clashes etc. BUT between 2 rebalances of f5-* the cached location changed, and that led to the above investigation to determine what changed the cached location (which is the stale/old f5-57 created by a prior test case).
Evidence: rebalance logs

./192.168.12.67/spacebar-rebalance.log:[2014-08-28 10:08:39.564778] I [dht-rebalance.c:865:dht_migrate_file] 0-spacebar-dht: /f5-340: attempting to move from spacebar-replicate-22 to spacebar-replicate-17

./192.168.12.67/spacebar-rebalance.log:[2014-08-28 10:08:40.459044] I [MSGID: 109022] [dht-rebalance.c:1143:dht_migrate_file] 0-spacebar-dht: completed migration of /f5-340 from subvolume spacebar-replicate-22 to spacebar-replicate-17

As per the time stamp, this session of rebalance moved the file from R22 (R = replicate) to R17.

The very next rebalance for the file f5-* is below, and moves it from R2 to R15:

./192.168.12.17/spacebar-rebalance.log:[2014-08-28 10:24:25.015226] I [dht-rebalance.c:865:dht_migrate_file] 0-spacebar-dht: /f5-3: attempting to move from spacebar-replicate-2 to spacebar-replicate-15

./192.168.12.17/spacebar-rebalance.log:[2014-08-28 10:24:25.262409] I [MSGID: 109022] [dht-rebalance.c:1143:dht_migrate_file] 0-spacebar-dht: completed migration of /f5-3 from subvolume spacebar-replicate-2 to spacebar-replicate-15

Rebalance was expected to be the only thing that changes the cached location of the file, yet the location changed from R17 to R2 between these two runs, which led to the hunt explained in (1) and (2) above.

Overall, the test is working as intended, and we are good here.

Attached are the log snippets from the various logs with some text on the analysis.
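The log tracing done by hand above could be sketched as a small script. This is only an illustration: the regular expressions are assumptions derived from the log lines quoted in this comment, not a documented GlusterFS log format, and `overwriting_renames` and `migrations` are hypothetical helper names. A rename whose destination has a cache other than `<nul>` is treated as a rename over an existing file.

```python
import re

# Assumed shape of a dht_rename debug message, based on the lines quoted above.
RENAME_RE = re.compile(
    r"renaming (?P<src>\S+) \(hash=(?P<src_hash>[^/]+)/cache=(?P<src_cache>[^)]+)\)"
    r" => (?P<dst>\S+) \(hash=(?P<dst_hash>[^/]+)/cache=(?P<dst_cache>[^)]+)\)"
)
# Assumed shape of a completed-migration message from the rebalance log.
MIGRATE_RE = re.compile(
    r"completed migration of (?P<path>\S+) from subvolume (?P<src>\S+) to (?P<dst>\S+)"
)

RENAME_LINES = [
    "[2014-08-28 10:18:15.259705] D [MSGID: 0] [dht-rename.c:1281:dht_rename] "
    "0-spacebar-dht: renaming /f5-340 (hash=spacebar-replicate-17/"
    "cache=spacebar-replicate-17) => /f5-1 (hash=spacebar-replicate-21/cache=<nul>)",
    "[2014-08-28 10:19:18.896736] D [MSGID: 0] [dht-rename.c:1281:dht_rename] "
    "0-spacebar-dht: renaming /f5-57 (hash=spacebar-replicate-2/"
    "cache=spacebar-replicate-2) => /f5-1 (hash=spacebar-replicate-21/"
    "cache=spacebar-replicate-17)",
    "[2014-08-28 10:24:05.684857] D [MSGID: 0] [dht-rename.c:1281:dht_rename] "
    "0-spacebar-dht: renaming /f5-1 (hash=spacebar-replicate-21/"
    "cache=spacebar-replicate-2) => /f5-2 (hash=spacebar-replicate-4/cache=<nul>)",
]

MIGRATE_LINES = [
    "[2014-08-28 10:08:39.564778] I [dht-rebalance.c:865:dht_migrate_file] "
    "0-spacebar-dht: /f5-340: attempting to move from spacebar-replicate-22 "
    "to spacebar-replicate-17",
    "[2014-08-28 10:08:40.459044] I [MSGID: 109022] "
    "[dht-rebalance.c:1143:dht_migrate_file] 0-spacebar-dht: completed migration "
    "of /f5-340 from subvolume spacebar-replicate-22 to spacebar-replicate-17",
    "[2014-08-28 10:24:25.262409] I [MSGID: 109022] "
    "[dht-rebalance.c:1143:dht_migrate_file] 0-spacebar-dht: completed migration "
    "of /f5-3 from subvolume spacebar-replicate-2 to spacebar-replicate-15",
]


def overwriting_renames(lines):
    """Yield (src, dst, dst_cache) for renames whose destination already exists."""
    for line in lines:
        match = RENAME_RE.search(line)
        # A destination cache other than "<nul>" means the target file was present.
        if match and match.group("dst_cache") != "<nul>":
            yield match.group("src"), match.group("dst"), match.group("dst_cache")


def migrations(lines):
    """Return (path, from_subvolume, to_subvolume) for each completed migration."""
    found = []
    for line in lines:
        match = MIGRATE_RE.search(line)
        if match:
            found.append((match.group("path"), match.group("src"), match.group("dst")))
    return found


if __name__ == "__main__":
    for src, dst, cache in overwriting_renames(RENAME_LINES):
        print(f"{src} renamed over existing {dst} (cached on {cache})")
    for path, src, dst in migrations(MIGRATE_LINES):
        print(f"{path}: {src} -> {dst}")
```

Run over the quoted lines, the rename filter singles out the f5-57 -> f5-1 rename, and the migration list shows that the second migration starts from replicate-2 rather than from replicate-17, where the first one ended.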
Created attachment 937204
Log snippets as discussed in Comment #3

Log snippets to show the issue as described in comment #3
Moving this bug back to Nithya as we decided not to document the warning in the admin guide.