Bug 1508025 - symbol-check.sh is not failing for legitimate reasons
Summary: symbol-check.sh is not failing for legitimate reasons
Keywords:
Status: NEW
Alias: None
Product: GlusterFS
Classification: Community
Component: tests
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-31 17:30 UTC by Shyamsundar
Modified: 2019-08-31 17:57 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Shyamsundar 2017-10-31 17:30:08 UTC
Description of problem:

While investigating this gluster-devel mail thread (http://lists.gluster.org/pipermail/gluster-devel/2017-October/053861.html), I found that on the 3.10 branch symbol-check.sh should always report a failure, but does not.

On master, the cause of the failure is addressed by this commit: https://review.gluster.org/#/c/16820/

However, the commit message also states that the problem was identified on a local run, not through failures in the regression setup.

Investigating further why symbol-check is not reporting failures, I found that on my local Fedora 26 machine the test does fail on the 3.10 branch.

I asked Nigel to look into the regression machines; he noted that the test passes there because the script does not catch the symbol.

I will let Nigel post his observations.

The issue seems to stem from the fact that, on the regression machines, the 'nm' symbol listing shows lstat64 rather than the __lxstat64 that the script is looking for. This needs to be corrected.
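To make the failure mode concrete, here is a minimal Python sketch of this kind of check. It assumes the script scans 'nm' output for undefined symbols and matches them against a fixed list of names. The BANNED set and the helpers undefined_symbols() and leaked_calls() are hypothetical and are not taken from symbol-check.sh. With only the glibc-internal name __lxstat64 in the list, the "U lstat64" line from a regression build slips through.

```python
from collections.abc import Iterable

# Hypothetical, illustrative subset of banned libc calls; the real list in
# tests/basic/symbol-check.sh may differ.
BANNED = frozenset({"access", "__lxstat64", "__xstat64", "__fxstat64"})


def undefined_symbols(nm_output: str) -> set[str]:
    """Return the undefined ('U') symbol names from `nm` output.

    Any @VERSION / @@VERSION suffix (as printed for dynamic symbols) is
    stripped so that only the bare symbol name is compared.
    """
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols have no address column: just "U name".
        if len(fields) == 2 and fields[0] == "U":
            names.add(fields[1].split("@", 1)[0])
    return names


def leaked_calls(nm_output: str, banned: Iterable[str] = BANNED) -> set[str]:
    """Return the banned calls that appear as undefined symbols."""
    return undefined_symbols(nm_output) & set(banned)
```

Fed an optimized listing containing "U __lxstat64", leaked_calls() reports the leak. Fed the -O0 listing containing "U lstat64", it reports nothing, which matches the behaviour described above.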

Comment 1 M. Scherer 2017-10-31 20:25:38 UTC
So, Nigel and I discussed it while waiting for my cheesecake, and he suggested that this could be caused by --enable-debug. As I had my laptop with me, we tested it, and indeed --enable-debug is what matters.

After a few tries, we narrowed it down to -O0 vs -O2 (--enable-debug switches the flags to -O0).

My understanding is that -O0 does not inline the lstat64 function, but -O2 does replace it with __lxstat64.

A quick search on Google shows this is a recent optimisation in glibc and gcc 5:
https://sourceware.org/ml/libc-alpha/2015-08/msg00560.html

Comment 2 Nigel Babu 2017-11-01 13:31:59 UTC
The problem seems to be that the symbols change between debug and non-debug builds. If we want our regression machines to catch it, we should look for the debug-build symbols. And if we want our developer machines to catch it, we should use the optimized symbols. Can we do both?
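One way to do both, sketched here under stated assumptions, follows from the -O0/-O2 explanation in comment 1. Before matching, map the glibc-internal names back to their public names, so the check gives the same answer at any optimization level. The mapping is assumed to reflect the extern-inline wrappers in <sys/stat.h> of glibc releases before 2.33. Newer glibc is assumed to export stat() and friends directly, so no mapping is needed there. GLIBC_INLINE_WRAPPERS, BANNED, canonical() and leaked_calls() are hypothetical names, and the banned list is illustrative rather than the one in symbol-check.sh.

```python
from collections.abc import Iterable

# Assumed mapping from glibc-internal entry points (emitted by the inline
# wrappers when optimizing) back to the public function names.
GLIBC_INLINE_WRAPPERS = {
    "__xstat": "stat",
    "__xstat64": "stat64",
    "__lxstat": "lstat",
    "__lxstat64": "lstat64",
    "__fxstat": "fstat",
    "__fxstat64": "fstat64",
    "__fxstatat": "fstatat",
    "__fxstatat64": "fstatat64",
    "__xmknod": "mknod",
    "__xmknodat": "mknodat",
}

# Hypothetical banned list, written only with public names.
BANNED = frozenset({"access", "stat", "stat64", "lstat", "lstat64",
                    "fstat", "fstat64"})


def canonical(symbol: str) -> str:
    """Strip any @VERSION suffix and map wrapper names to public names."""
    name = symbol.split("@", 1)[0]
    return GLIBC_INLINE_WRAPPERS.get(name, name)


def leaked_calls(undefined: Iterable[str],
                 banned: Iterable[str] = BANNED) -> set[str]:
    """Return banned calls among the undefined symbols, after normalizing."""
    return {canonical(s) for s in undefined} & set(banned)
```

With this normalization, both lstat64 (from an -O0 build) and __lxstat64 (from an -O2 build) are reported as lstat64. Functions that have no glibc wrapper, such as access(), pass through unchanged.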

Comment 3 Shyamsundar 2019-04-17 13:54:45 UTC
The problem needs to be solved, as otherwise a future symbol leak is not preventable.

If required we may need an additional job that does not enable-debug (or add a task to an existing job) and checks for symbols.

Is there further information required to resolve the problem?

Comment 4 Yaniv Kaul 2019-04-17 14:39:06 UTC
(In reply to Shyamsundar from comment #3)
> The problem needs to be solved, as otherwise a future symbol leak is not
> preventable.
> 
> If required we may need an additional job that does not enable-debug (or add
> a task to an existing job) and checks for symbols.
> 
> Is there further information required to resolve the problem?

Commitment to solve it - it was entered 1.5 years ago, and no one worked on it.

Comment 5 Shyamsundar 2019-04-17 14:53:50 UTC
(In reply to Yaniv Kaul from comment #4)
> (In reply to Shyamsundar from comment #3)
> > The problem needs to be solved, as otherwise a future symbol leak is not
> > preventable.
> > 
> > If required we may need an additional job that does not enable-debug (or add
> > a task to an existing job) and checks for symbols.
> > 
> > Is there further information required to resolve the problem?
> 
> Commitment to solve it - it was entered 1.5 years ago, and no one worked on
> it.

Are you looking for a commitment from me to resolve this? I'm asking because it was marked NEEDINFO against me.

If so, do let me know; I can do what is required and work out how best to give the infra team the details needed to add this to the smoke jobs (although I have to add that, with the information provided, I would assume we already know what needs to be done).

Comment 6 M. Scherer 2019-04-17 15:02:00 UTC
Yeah, it seems to have been forgotten amid all the more urgent fires, sorry. However, I'm missing a lot of context on this and can't find the symbol-check.sh script anywhere. I guess our best bet would be to split the symbol check into a separate job, so we can do the debug build and run the test there, rather than bundling it with the regular regression test. This would also give faster feedback on the matter.

Comment 7 Niels de Vos 2019-04-17 15:44:33 UTC
The script is part of the glusterfs.git repository: https://github.com/gluster/glusterfs/blob/master/tests/basic/symbol-check.sh

Comment 8 Amar Tumballi 2019-06-14 09:12:13 UTC
An update:

While testing https://review.gluster.org/22364, I noticed that symbol-check failed when I used access() instead of sys_access(). But it didn't fail for stat().

So I suspect only the stat() family of functions is being missed.

