Bug 1615582

Summary: test: ./tests/basic/stats-dump.t fails spuriously not finding queue_size in stats output for some brick
Product: [Community] GlusterFS
Reporter: Shyamsundar <srangana>
Component: tests
Assignee: Shyamsundar <srangana>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-5.0
Type: Bug
Last Closed: 2018-10-23 15:17:10 UTC

Description Shyamsundar 2018-08-13 18:50:11 UTC
Description of problem:
./tests/basic/stats-dump.t test fails as follows:

  01:07:31 not ok 20 , LINENUM:42
  01:07:31 FAILED COMMAND: grep .queue_size /var/lib/glusterd/stats/glusterfsd__d_backends_patchy1.dump

  18:35:43 not ok 21 , LINENUM:43
  18:35:43 FAILED COMMAND: grep .queue_size /var/lib/glusterd/stats/glusterfsd__d_backends_patchy2.dump

When the test greps each brick's stats dump, the second pattern,
"queue_size", is sometimes not found for one brick or the other.

This should not happen. If the test found "aggr.fop.write.count", a
stats dump exists. In addition, the test case sleeps for 2 seconds
while the dump interval is 1 second.

How reproducible: Sporadic

Additional info:
This has failed in both mux and non-mux environments.
Runs with failure:
https://build.gluster.org/job/regression-on-demand-multiplex/175/consoleFull (no logs)
https://build.gluster.org/job/regression-on-demand-full-run/59/consoleFull (has logs)

The most likely explanation is that the io-stats dumper thread had
just (re)opened the file to overwrite its content. The fopen uses
mode "w+", which truncates the file, and the grep CLI opened the file
at the same moment and therefore found no content.
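
The truncation window described above can be illustrated with a small Python sketch. It is not the io-stats code: the file name and dump contents are hypothetical stand-ins, and the interleaving is forced in a single process so the outcome is repeatable. Because each grep in the test is a separate invocation that opens the file afresh, one grep can succeed while the next lands in this window.

```python
import os
import tempfile

# Hypothetical stand-in for a brick's stats dump file.
dump_path = os.path.join(tempfile.mkdtemp(), "brick.dump")

# A previous dump cycle left complete content on disk.
with open(dump_path, "w") as f:
    f.write("aggr.fop.write.count: 1\nqueue_size: 0\n")

# The next dump cycle reopens the file with "w+", which truncates it
# to zero length as soon as the open succeeds.
writer = open(dump_path, "w+")

# A reader arriving at this moment (the grep in the test) sees nothing.
with open(dump_path) as reader:
    seen_during_reopen = reader.read()

# The writer then emits the new dump and closes the file.
writer.write("aggr.fop.write.count: 2\nqueue_size: 0\n")
writer.close()

with open(dump_path) as reader:
    seen_after_write = reader.read()

print(repr(seen_during_reopen))            # empty string
print("queue_size" in seen_after_write)    # True
```

In the real test the window is narrow, which would be consistent with the failure being sporadic.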

Comment 1 Worker Ant 2018-08-13 19:20:52 UTC
REVIEW: https://review.gluster.org/20726 (tests: Fix spurious failures in stats-dump.t test) posted (#1) for review on master by Shyamsundar Ranganathan

Comment 2 Worker Ant 2018-08-16 06:09:48 UTC
COMMIT: https://review.gluster.org/20726 committed in master by "Amar Tumballi" <amarts> with a commit message- tests: Fix spurious failures in stats-dump.t test

The test fails to grep and find queue_size in a brick stats dump,
having successfully found aggr.* values in the same dump.

The diagnosis is that the writer thread in io-stats, which dumps
the stats at a set interval, truncates the file just before
the grep attempts to read the contents, hence the failure.

The fix is to stop the dumper thread, wait a couple of seconds,
and then check the output, so that the writer thread does not
interfere with the test.

Fixes: bz#1615582
Change-Id: I29f95488a2ad693abe1dd525b1d87a9d1eee29a2
Signed-off-by: ShyamsundarR <srangana>
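
The pattern behind the fix (stop the periodic writer first, read afterwards) can be sketched in Python. This is an illustration under stated assumptions, not the change made to stats-dump.t: the dumper function, file name, and interval are hypothetical, and the sketch joins the thread where the test waits a couple of seconds, since a test script cannot join a thread inside the brick process.

```python
import os
import tempfile
import threading
import time

# Hypothetical stand-in for a brick's stats dump file.
dump_path = os.path.join(tempfile.mkdtemp(), "brick.dump")
stop = threading.Event()

def dumper(interval=0.01):
    """Rewrite the dump file every `interval` seconds until stopped."""
    count = 0
    while True:
        count += 1
        # "w+" truncates the file before the new content is written.
        with open(dump_path, "w+") as f:
            f.write(f"aggr.fop.write.count: {count}\nqueue_size: 0\n")
        # Event.wait returns True once stop is set, ending the loop.
        if stop.wait(interval):
            break

thread = threading.Thread(target=dumper)
thread.start()
time.sleep(0.05)  # let a few dump cycles run

# Stop the writer and wait for it to finish before reading, so no
# truncation can happen while the file is being checked.
stop.set()
thread.join()

with open(dump_path) as f:
    contents = f.read()

print("aggr.fop.write.count" in contents and "queue_size" in contents)
```

Once the writer has stopped, the last complete dump stays on disk and every later read sees the same content, regardless of how many separate greps are run.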

Comment 3 Shyamsundar 2018-10-23 15:17:10 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/