We might be using an old release of the tool, but currently it has issues with the call frequency. As an example, from https://build.gluster.org/job/line-coverage/Line_20Coverage_20Report/libglusterfs/src/fd-lk.c.gcov.html:

        1051783 :  fd_lk_ctx_unref(fd_lk_ctx_t *lk_ctx)
    64          :  {
    65  1051783 :      int ref = -1;
    66          :
    67  1051783 :      GF_VALIDATE_OR_GOTO("fd-lk", lk_ctx, err);
    68          :
    69  1051783 :      ref = GF_ATOMIC_DEC(lk_ctx->ref);
    70  1051783 :      if (ref < 0)
    71        0 :          GF_ASSERT(!ref);
    72  1051783 :      if (ref == 0)
    73   738446 :          _fd_lk_destroy_lock_list(lk_ctx);
    74          :
    75  1051798 :      if (ref == 0) {
    76   738443 :          LOCK_DESTROY(&lk_ctx->lock);
    77   738442 :          GF_FREE(lk_ctx);

How exactly does line 75 get executed more often than the other lines, and more often than the whole function?!
I would have blamed gcc optimisation, but that's unlikely: despite -O0 doing some minor optimisation, I didn't find anything that could result in this. There is also other curious output, such as line 76 being counted 1 time more than line 77, and fewer times than line 73, despite all of them supposedly being in the same clause. Line 52 of the same file is similar: list_for_each_entry_safe is counted 585195 times, while the function is called 583093 times. The same goes for other files, e.g. https://build.gluster.org/job/line-coverage/Line_20Coverage_20Report/libglusterfs/src/common-utils.c.gcov.html, line 92 ("hash = XXH64(data, len, seed);") vs line 93 ("XXH64_canonicalFromHash(&c_hash, hash);"). So it could be normal, or it could be a bug. I am going to take a quick look at the generated assembly out of curiosity, and I did ask a few compiler folks on IRC, but if this doesn't yield anything, I will just close the bug, unless I am missing something. The goal of coverage is to see what is tested and what is not, so even with such errors, it should be good enough for that.
Amar was under the impression that we are running an old version of whatever is measuring the coverage. Could that be it?
We are using the RHEL 7 version (gcov), so yes, that's indeed old. We could switch to Fedora builders, but that would imply running the test suite on Fedora and compiling on it, with a newer GCC (and that did create issues in the past). We could also move this to CentOS 8 once we are ready for it; I haven't had time to look yet.

After (painfully) reading the assembly, I think the problem is that LOCK_DESTROY can take 2 paths, one using pthread_spin_destroy, the other using pthread_mutex_destroy, and this could mess with the counter somehow. As for why it happens, I suspect there is a bug somewhere. My build (on RHEL 7) does have HAVE_SPINLOCK defined (per config.log), and yet the compiled code for fd-lk.c seems to act as if LOCKING_IMPL was undefined (that's a variable defined in ./libglusterfs/src/locking.c). I am not sure whether that's by design; it could be, but having different behaviour for the same macro strikes me as unusual. I guess someone who has an idea of the locking code in gluster should take a look, because we are at the limits of my skills (still no answer from the compiler folks).
So maybe the problem is that the code sometimes (but not always) uses a spinlock, and a spinlock is by nature a busy-wait loop, which might explain, for that specific case, why that line runs more often than the rest of the function. Since that's a macro expanded to multi-line code, I guess lcov/gcov might get confused by the fact that some parts of the expansion run more often than others, or something like that. There is an option in gcov for that (-a), but not in lcov. But again, that would only matter if we were looking at doing something with that precise information, no?
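For the record, the per-block view can be obtained from gcov directly, bypassing lcov. A rough sketch of the workflow (paths are illustrative and assume a tree built with -O0 and --coverage, after the test suite has run):

```shell
cd libglusterfs/src
# -a / --all-blocks: print a count for every basic block on a line,
# instead of a single count per source line.
gcov -a fd-lk.c
# Lines that expand to several basic blocks (like the LOCK_DESTROY
# macro) then show one count per block in fd-lk.c.gcov.
less fd-lk.c.gcov
```

That would at least tell us which block on line 75/76 is carrying the extra executions.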
This bug has been moved to https://github.com/gluster/project-infrastructure/issues/7 and will be tracked there from now on. Visit the GitHub issue for further details.