Description of problem:
While investigating this mail thread on gluster-devel (http://lists.gluster.org/pipermail/gluster-devel/2017-October/053861.html), it became clear that on the 3.10 branch symbol-check.sh should always report a failure, but it does not.
On master the reason for failure is addressed via the commit: https://review.gluster.org/#/c/16820/
But, the commit also reads that the problem was identified on a local run and not due to failures in the regression setup.
So, on further investigation of why symbol check is not reporting failures, I found that on my local Fedora26 machine, the test actually fails on the 3.10 branch.
I asked Nigel to look into the regression machines, and he noted that the test passes there because the script is not catching the symbol.
I will let Nigel post his observations.
The issue seems to stem from the fact that the 'nm' symbol list on the regression machines shows lstat64 and not the __lxstat64 that the script is looking for. This needs correction.
So, Nigel and I discussed it while waiting for my cheesecake, and he suggested that this could be caused by --enable-debug. As I had my laptop with me, we tested and indeed, --enable-debug is what matters.
After a few tries, we narrowed it down to using -O0 vs -O2 (--enable-debug switches the flags to -O0).
My understanding is that -O0 does not inline the lstat64 function, but -O2 does, replacing it with __lxstat64.
A quick search on Google shows this is a recent optimisation of glibc and gcc 5:
The problem seems to be that the symbols change between debug and non-debug builds. If we want our regression machines to catch it, we should look for the symbols of a debug build. And if we want our developer machines to catch it, we should use the optimized symbols. Can we do both?
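One way to "do both" could be to accept either spelling when scanning the 'nm' output. The following is only a minimal sketch, not the actual logic of symbol-check.sh: the function name leaked_stat_symbols is hypothetical, and the list of symbols is an assumption based on the lstat64/__lxstat64 pair seen in this bug.

```shell
#!/bin/sh
# Hypothetical helper: reads `nm` output on stdin and prints every undefined
# ("U") symbol that is a stat-family call in either its plain form (seen with
# -O0) or its glibc-internal form (seen with -O2).
leaked_stat_symbols() {
    awk '$1 == "U" {
        sym = $2
        sub(/@.*/, "", sym)    # drop a symbol-version suffix such as @GLIBC_2.2.5
        if (sym ~ /^(stat|lstat|fstat|fstatat|__xstat|__lxstat|__fxstat|__fxstatat)(64)?$/)
            print sym
    }' | LC_ALL=C sort -u
}
```

It would be fed the symbol list of each object, for example `nm some-object.o | leaked_stat_symbols`; any output would then count as a failure regardless of the optimization level used for the build.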
The problem needs to be solved, as otherwise a future symbol leak is not preventable.
If required we may need an additional job that does not enable-debug (or add a task to an existing job) and checks for symbols.
Is there further information required to resolve the problem?
(In reply to Shyamsundar from comment #3)
> The problem needs to be solved, as otherwise a future symbol leak is not
> preventable.
> If required we may need an additional job that does not enable-debug (or add
> a task to an existing job) and checks for symbols.
> Is there further information required to resolve the problem?
Commitment to solve it - it was entered 1.5 years ago, and no one worked on it.
(In reply to Yaniv Kaul from comment #4)
> (In reply to Shyamsundar from comment #3)
> > The problem needs to be solved, as otherwise a future symbol leak is not
> > preventable.
> > If required we may need an additional job that does not enable-debug (or add
> > a task to an existing job) and checks for symbols.
> > Is there further information required to resolve the problem?
> Commitment to solve it - it was entered 1.5 years ago, and no one worked on
> it.
Are you looking for a commitment from me to resolve this? I am asking in order to understand, as it was marked NEEDINFO against me.
If so, do let me know. I can do what is required and see how best to provide the details to the infra team to add to the smoke jobs (although I have to add that, with the information provided, I would assume we know what needs to be done).
Yeah, it seems to have been forgotten among all the more urgent fires, sorry. However, I am missing a lot of context on this and can't find the symbol-check.sh script anywhere. I guess our best bet would be to split the symbol check into a separate job, so we can do the debug build and run the test, rather than bundle that with the regular regression test. This would permit faster feedback on that matter.
The script is part of the glusterfs.git repository: https://github.com/gluster/glusterfs/blob/master/tests/basic/symbol-check.sh
While testing https://review.gluster.org/22364 I noticed that 0symbol-check failed when I used access() and not sys_access(). But it didn't fail for stat().
So I suspect only the stat() family of functions is being missed.
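That would be consistent with the lstat64 vs __lxstat64 observation earlier in this bug: to my knowledge, the glibc headers of that era implement the stat() family as inline wrappers around internal functions, while access() is an ordinary symbol with a single spelling. A rough sketch of the mapping a checker would need follows. The helper name stat_symbol_variants is hypothetical, and the names other than lstat64/__lxstat64 are my assumption about glibc rather than something verified in this bug.

```shell
#!/bin/sh
# Hypothetical helper: prints the spellings under which a call may show up in
# `nm` output. Stat-family calls get both the plain name (not inlined, -O0)
# and the glibc-internal name (inlined wrapper, -O2); anything else has one.
stat_symbol_variants() {
    case "$1" in
        stat|stat64)   printf '%s %s\n' "$1" "__x$1" ;;
        lstat|lstat64) printf '%s %s\n' "$1" "__lx${1#l}" ;;
        fstat|fstat64) printf '%s %s\n' "$1" "__fx${1#f}" ;;
        *)             printf '%s\n' "$1" ;;
    esac
}
```

Calling `stat_symbol_variants lstat64` yields both names to search for, while `stat_symbol_variants access` yields only `access`, which would explain why the access() case was caught and the stat() case was not.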