Bug 1380247

Summary: Get diagnostic information for aborted test runs
Product: [Community] GlusterFS
Reporter: Nigel Babu <nigelb>
Component: project-infrastructure
Assignee: Nigel Babu <nigelb>
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs, gluster-infra, nbalacha, rgowdapp
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-07-20 04:49:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Nigel Babu 2016-09-29 05:41:40 UTC
> Copied from RaghavendraG's email

Do we collect any diagnostic information before aborting tests as in [1]? If yes, where can I find it?

If not, I think the following information would be useful:

1. ps output for all relevant gluster processes and the tests running on them (to see process states, e.g. 'D' for uninterruptible sleep)
2. Statedumps of client and brick processes (preferably dumping all information, such as inodes, call stacks, etc.)
3. Core dumps of client and brick processes

If you think any other information is helpful, please add to the list.
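A minimal sketch for item 1, assuming a host whose ps accepts the -o field list used below; the function names are hypothetical, not existing tools. It records the state of every gluster-related process so that anything stuck in 'D' state is easy to spot:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for item 1 (names are illustrative only).

# Print PID, state, kernel wait channel and full command line of every
# process whose command line mentions "gluster". The [g] bracket keeps
# awk from matching its own command line.
snapshot_gluster_procs() {
  ps -e -o pid= -o stat= -o wchan= -o args= | awk '/[g]luster/'
}

# Read "PID STAT ..." lines on stdin and keep only processes whose state
# starts with D (uninterruptible sleep, usually blocked in the kernel,
# e.g. waiting on I/O).
filter_d_state() {
  awk '$2 ~ /^D/'
}
```

Usage: snapshot_gluster_procs | tee ps-gluster.txt | filter_d_state keeps the full listing for the archive and prints only the blocked processes.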

Some points to note:

* To enable dumping of all objects, we need to run

echo "all=yes" >> $statedumpdir/glusterdump.options

before we issue commands to collect statedumps.

* We need to run

# kill -SIGUSR1 <pid-of-mount-process>

to collect a statedump of the client process, as there is no CLI command for it.

* For bricks, use the CLI:

[root@unused rhs-glusterfs]# gluster volume statedump
Usage: volume statedump <VOLNAME> [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history]
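The three points above can be wrapped in small helpers. This is a sketch only: the function names and the GLUSTER_CLI override variable are hypothetical, and the caller must pass the statedump directory actually in use on the machine.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the three points above; names and the
# GLUSTER_CLI override are illustrative, not part of GlusterFS.

# Point 1: make later statedumps include all objects by adding "all=yes"
# to glusterdump.options in the given statedump directory. The line is
# appended only once, so repeated job runs do not grow the file.
enable_full_statedump() {
  local dir=$1
  local opts="$dir/glusterdump.options"
  mkdir -p "$dir" || return 1
  grep -qx 'all=yes' "$opts" 2>/dev/null || echo 'all=yes' >> "$opts"
}

# Point 2: clients have no CLI command, so send SIGUSR1 to each PID given.
# Returns 1 if any PID could not be signalled.
signal_statedump() {
  local pid rc=0
  for pid in "$@"; do
    if ! kill -USR1 "$pid"; then
      echo "could not signal PID $pid" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Point 3: request a full ("all") brick statedump for each named volume.
# GLUSTER_CLI defaults to gluster; it is left unquoted on purpose so a
# value such as "echo gluster" can be used for a dry run.
dump_bricks() {
  local cli=${GLUSTER_CLI:-gluster} vol rc=0
  for vol in "$@"; do
    $cli volume statedump "$vol" all || rc=1
  done
  return "$rc"
}
```

Usage: enable_full_statedump "$statedumpdir", then dump_bricks $(gluster volume list) and signal_statedump $(pgrep -x glusterfs). Targeting every process named glusterfs is an assumption: besides FUSE mounts it may also match other gluster daemons that run the same binary.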

Comment 1 Nigel Babu 2016-10-18 12:40:10 UTC
Statedumps are written to /var/run, and there is no easy way to predict the file names.

The best way to go about it:

0. At the start of the job, remove any files older than 15 days from /var/log/glusterdump/.
1. Set the statedump path to /var/log/glusterdump/<testname>-<number>/ at the start of the job.
2. If the job aborts, create a publisher that takes the statedumps, similar to the logic used to return CentOS CI nodes (this needs verification that it works for aborted jobs).
3. Archive the dumps and publish a link to the archive.
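Steps 0, 1 and 3 can be sketched as shell helpers. Assumptions: the helper names and the DUMP_BASE and GLUSTER_CLI variables are hypothetical; the server.statedump-path volume option only redirects dumps of server-side (brick) processes; the test name passed to dump_dir_for must already be free of slashes. Step 2 depends on how the CI publisher is configured and is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the job steps; names and paths are placeholders.
DUMP_BASE=${DUMP_BASE:-/var/log/glusterdump}

# Step 0: delete dump files older than 15 days, then any directories the
# deletion left empty. Run this before creating the new run directory.
prune_old_dumps() {
  find "$DUMP_BASE" -type f -mtime +15 -delete
  find "$DUMP_BASE" -mindepth 1 -type d -empty -delete
}

# Step 1: create and print the per-run directory <testname>-<number>.
dump_dir_for() {
  local dir="$DUMP_BASE/$1-$2"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Step 1 (bricks): point a volume's statedumps at that directory.
# GLUSTER_CLI defaults to gluster and can be overridden for a dry run.
set_statedump_path() {
  ${GLUSTER_CLI:-gluster} volume set "$1" server.statedump-path "$2"
}

# Step 3: pack the run directory into a gzipped tarball for publishing.
archive_dumps() {
  local dir=$1 out=$2
  tar -czf "$out" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```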

Comment 2 Nigel Babu 2017-07-31 05:54:14 UTC
*** Bug 1475350 has been marked as a duplicate of this bug. ***

Comment 3 Nigel Babu 2017-07-31 05:55:11 UTC
We now have logs of aborted runs. See: https://build.gluster.org/job/centos6-regression/5779/console

Comment 4 Nigel Babu 2018-07-20 04:49:44 UTC
We no longer abort jobs. Moving this to WONTFIX.