Hide Forgot
REVIEW: http://review.gluster.org/6481 (cluster/dht: Make sure gf_defrag_migrate_data is not optimized) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/6481 (cluster/dht: Make sure gf_defrag_migrate_data is not optimized) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/6481 (cluster/dht: Make sure gf_defrag_migrate_data is not optimized) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/6481 committed in master by Anand Avati (avati) ------ commit 493008a299cd1197df0caee72eacd12c1a54606b Author: Pranith Kumar K <pkarampu> Date: Wed Dec 11 15:19:25 2013 +0530 cluster/dht: Make sure gf_defrag_migrate_data is not optimized Problem: Whenever there syncop_xxx() is used inside a synctask and gcc optimizes it when compiled with -O2 there is a problem where 'errno' would not work as expected. Fix: Until http://review.gluster.com/6475 is reviewed and merged we are making sure the function is not going to be optimized. Change-Id: I504c18c8a7789f0c776a56f0aa60db3618b21601 BUG: 1040356 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/6481 Tested-by: Gluster Build System <jenkins.com> Reviewed-by: Shyamsundar Ranganathan <srangana> Reviewed-by: Anand Avati <avati>
REVIEW: http://review.gluster.org/6475 (syncop: Change return value of syncop) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/6475 (syncop: Change return value of syncop) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/6475 (syncop: Change return value of syncop) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/6475 committed in master by Anand Avati (avati) ------ commit 8d55c25f158921b508bff0e7f25158991913f922 Author: Pranith Kumar K <pkarampu> Date: Wed Dec 11 09:33:53 2013 +0530 syncop: Change return value of syncop Problem: We found a day-1 bug when syncop_xxx() infra is used inside a synctask with compilation optimization (CFLAGS -O2). Detailed explanation of the Root cause: We found the bug in 'gf_defrag_migrate_data' in rebalance operation: Lets look at interesting parts of the function: int gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc, dict_t *migrate_data) { ..... code section - [ Loop ] while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL, &entries)) != 0) { ..... code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a thread) /* Need to keep track of ENOENT errno, that means, there is no need to send more readdirp() */ readdir_operrno = errno; ..... code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread) ret = syncop_getxattr (this, &entry_loc, &dict, GF_XATTR_LINKINFO_KEY); code section - [ ERRNO-2] (checking for failures of syncop_getxattr(). This may not always be executed in same thread which executed [SYNCOP-1]) if (ret < 0) { if (errno != ENODATA) { loglevel = GF_LOG_ERROR; defrag->total_failures += 1; ..... } the function above could be executed by thread(t1) till [SYNCOP-1] and code from [ERRNO-2] can be executed by a different thread(t2) because of the way syncop-infra schedules the tasks. when the code is compiled with -O2 optimization this is the assembly code that is generated: [ERRNO-1] 1165 readdir_operrno = errno; <<---- errno gets expanded as *(__errno_location()) 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0 <address@hidden> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------ Address returned by __errno_location() is stored in a special location in stack for later use. 0x00007fd149d48b77 <+519>: mov (%rax),%eax 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp) .... [ERRNO-2] 1281 if (errno != ENODATA) { 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<----- Because it already stored the address returned by __errno_location(), it just dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE EXECUTED BY SAME THREAD!!! 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp 0x00007fd149d492b8 <+2376>: mov (%rax),%edi 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi The problem is that __errno_location() value of t1 and t2 are different. So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is executing [ERRNO-2] code section. When code is compiled without any optimization for [ERRNO-2]: 1281 if (errno != ENODATA) { 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300 <address@hidden><<--- As it is calling __errno_location() again it gets the location from t2 so it works as intended. 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1 <gf_defrag_migrate_data+2287> Fix: Make syncop_xxx() return (-errno) value as the return value in case of errors and all the functions which make syncop_xxx() will need to use (-ret) to figure out the reason for failure in case of syncop_xxx() failures. Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57 BUG: 1040356 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/6475 Tested-by: Gluster Build System <jenkins.com> Reviewed-by: Anand Avati <avati>
REVIEW: http://review.gluster.org/6745 (cluster/dht: Make sure gf_defrag_migrate_data is not optimized) posted (#1) for review on release-3.5 by Lalatendu Mohanty (lmohanty)
COMMIT: http://review.gluster.org/6745 committed in release-3.5 by Vijay Bellur (vbellur) ------ commit 9b13d17f27d65c3bff68b6c6a69cf0f02c972e73 Author: Pranith Kumar K <pkarampu> Date: Wed Dec 11 15:19:25 2013 +0530 cluster/dht: Make sure gf_defrag_migrate_data is not optimized Problem: Whenever there syncop_xxx() is used inside a synctask and gcc optimizes it when compiled with -O2 there is a problem where 'errno' would not work as expected. Fix: Until http://review.gluster.com/6475 is reviewed and merged we are making sure the function is not going to be optimized. Change-Id: I504c18c8a7789f0c776a56f0aa60db3618b21601 BUG: 1040356 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/6481 Tested-by: Gluster Build System <jenkins.com> Reviewed-by: Shyamsundar Ranganathan <srangana> Reviewed-by: Anand Avati <avati> Reviewed-on: http://review.gluster.org/6745 Reviewed-by: Vijay Bellur <vbellur>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report. glusterfs-3.5.0 has been announced on the Gluster Developers mailinglist [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137 [2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user