We might be using an old release of the tool, but currently it has issues with the call frequency. As an example, from https://build.gluster.org/job/line-coverage/Line_20Coverage_20Report/libglusterfs/src/fd-lk.c.gcov.html:

        1051783 :  fd_lk_ctx_unref(fd_lk_ctx_t *lk_ctx)
    64          :  {
    65  1051783 :      int ref = -1;
    66          :
    67  1051783 :      GF_VALIDATE_OR_GOTO("fd-lk", lk_ctx, err);
    68          :
    69  1051783 :      ref = GF_ATOMIC_DEC(lk_ctx->ref);
    70  1051783 :      if (ref < 0)
    71        0 :          GF_ASSERT(!ref);
    72  1051783 :      if (ref == 0)
    73   738446 :          _fd_lk_destroy_lock_list(lk_ctx);
    74          :
    75  1051798 :      if (ref == 0) {
    76   738443 :          LOCK_DESTROY(&lk_ctx->lock);
    77   738442 :          GF_FREE(lk_ctx);

How exactly does line 75 get executed more often than the other lines, and more often than the whole function?!
I would have blamed gcc optimisation, but that's unlikely: despite -O0 doing some minor optimisation, I didn't find anything that could result in this. There is also other curious output, such as line 76 being counted 1 time more than line 77, and fewer times than line 73, despite all of them supposedly being in the same clause. Line 52 of the same file is similar: list_for_each_entry_safe is counted 585195 times, while the function is called 583093 times. The same goes for other files, e.g. https://build.gluster.org/job/line-coverage/Line_20Coverage_20Report/libglusterfs/src/common-utils.c.gcov.html, line 92 ("hash = XXH64(data, len, seed);") vs line 93 ("XXH64_canonicalFromHash(&c_hash, hash);"). So it could be normal, or it could be a bug. I am going to take a quick look at the generated assembly out of curiosity, and I did ask a few compiler folks on IRC, but if this doesn't yield anything, I will just close the bug, unless I am missing something. The goal of coverage is to see what is tested and what is not, so even with such errors, it should be good enough for that.
Amar was under the impression that we are running an old version of whatever is measuring the coverage. Could that be it?
We are using the RHEL 7 version (gcov), so yes, that's indeed old. We could switch to Fedora builders, but that would imply running the test suite on Fedora and compiling on it, with a newer GCC (and that did create issues in the past). We could also move this to CentOS 8 once we are ready for it; I haven't had time to look yet.

After (painfully) reading the assembly, I think the problem is that LOCK_DESTROY can take 2 paths, one using pthread_spin_destroy, the other using pthread_mutex_destroy, and this could mess with the counter somehow. As for why it happens, I suspect there is a bug somewhere. My build (on RHEL 7) does have HAVE_SPINLOCK defined (per config.log), and yet the compiled code for fd-lk.c seems to act as if LOCKING_IMPL was undefined (that's a variable defined in ./libglusterfs/src/locking.c). I am not sure whether that's by design; it could be, but having different behaviour for the same macro strikes me as unusual. I guess someone who has an idea of the locking code in gluster should take a look, because we are at the limits of my skills (still no answer from the compiler folks).
So maybe the problem is that the code sometimes (but not always) uses a spinlock, and a spinlock is by nature a busy-wait loop, which might explain, for that specific case, why that line runs more often than the rest of the function. Since that's a macro expanded to multi-line code, I guess lcov/gcov might get confused by the fact that some parts of the expansion run more often than others, or something like that. There is an option in gcov for that (-a), but not in lcov. But again, that would only matter if we were looking at doing something with that precise information, no?
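For the record, the per-block view can be obtained from gcov directly, bypassing lcov. A rough sketch of the workflow (paths are illustrative and assume a tree built with -O0 and --coverage, after the test suite has run):

```shell
cd libglusterfs/src
# -a / --all-blocks: print a count for every basic block on a line,
# instead of a single count per source line.
gcov -a fd-lk.c
# Lines that expand to several basic blocks (like the LOCK_DESTROY
# macro) then show one count per block in fd-lk.c.gcov.
less fd-lk.c.gcov
```

That would at least tell us which block on line 75/76 is carrying the extra executions.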
This bug has been moved to https://github.com/gluster/project-infrastructure/issues/7 and will be tracked there from now on. Visit the GitHub issue for further details.