Bug 1380247 - Get diagnostic information for aborted test runs
Summary: Get diagnostic information for aborted test runs
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nigel Babu
QA Contact:
URL:
Whiteboard:
Duplicates: 1475350
Depends On:
Blocks:
Reported: 2016-09-29 05:41 UTC by Nigel Babu
Modified: 2018-07-20 04:49 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-20 04:49:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nigel Babu 2016-09-29 05:41:40 UTC
> Dump from RaghavendraG's email

Do we collect any diagnostic information before aborting tests, as in [1]? If yes, where can I find it?

If not, I think the following information would be useful:

1. ps output of all relevant gluster processes and the tests running on them (to see process states such as 'D', i.e. uninterruptible sleep)
2. Statedumps of client and brick processes (preferably with everything dumped: inodes, call stacks, etc.)
3. Coredumps of client and brick processes

If you think any other information would be helpful, please add it to the list.
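
For item 1, a minimal sketch of the ps snapshot could look like the following. The function names are hypothetical, the column list assumes the procps ps found on Linux, and filtering on the command name only catches the gluster daemons themselves, not the test scripts driving them.

```shell
# Hypothetical filter: keep the ps header plus rows whose command name
# (4th column) starts with "gluster" (glusterd, glusterfs, glusterfsd).
filter_gluster_procs() {
    awk 'NR == 1 || $4 ~ /^gluster/'
}

# Snapshot of PID, state, kernel wait channel, command name and arguments.
# A "D" in the STAT column marks a process in uninterruptible sleep.
snapshot_gluster_procs() {
    ps -eo pid,stat,wchan:32,comm,args | filter_gluster_procs
}
```

Running `snapshot_gluster_procs > ps-gluster.txt` just before the abort would leave a small file that can be archived with the other logs.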

Some points to note:

* To enable dumping of all objects, we need to run

echo "all=yes" >> $statedumpdir/glusterdump.options

before issuing the commands that collect statedumps.

* To collect a statedump of a client process, we need to run

# kill -SIGUSR1 <pid-of-mount-process>

as there is no CLI command for it.

* For bricks, use the statedump CLI command:

[root@unused rhs-glusterfs]# gluster volume statedump
Usage: volume statedump <VOLNAME> [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history]
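
The three points above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, the statedump directory is passed in by the caller, and `gluster volume list` is used to enumerate the volumes.

```shell
# Hypothetical helper: make sure "all=yes" is present in glusterdump.options
# inside the given statedump directory, without appending it twice.
enable_full_statedump() {
    local dir=$1
    local opts="$dir/glusterdump.options"
    mkdir -p "$dir" || return 1
    # -x matches whole lines, -q is quiet, -s hides "no such file" errors
    if ! grep -qsx 'all=yes' "$opts"; then
        echo 'all=yes' >> "$opts"
    fi
}

# Hypothetical helper: send SIGUSR1 to each given client (mount) process
# so that it writes a statedump. Returns non-zero if any signal failed.
request_client_statedumps() {
    local pid rc=0
    for pid in "$@"; do
        if kill -USR1 "$pid" 2>/dev/null; then
            echo "sent SIGUSR1 to $pid"
        else
            echo "could not signal $pid" >&2
            rc=1
        fi
    done
    return $rc
}

# Hypothetical helper: request a full statedump of the bricks of every volume,
# using the "all" keyword from the usage text above.
dump_all_bricks() {
    local vol rc=0
    for vol in $(gluster volume list); do
        gluster volume statedump "$vol" all || rc=1
    done
    return $rc
}
```

In a job this might be called as `enable_full_statedump "$statedumpdir"`, then `request_client_statedumps $(pgrep -x glusterfs)`, then `dump_all_bricks`. Matching on the process name glusterfs is an assumption and may also pick up other daemons that run the same binary.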

Comment 1 Nigel Babu 2016-10-18 12:40:10 UTC
So, the statedumps are dumped to /var/run and there's no easy way to predict the names of the files.

The best way to go about it:

0. At the start of the job, remove any files older than 15 days from /var/log/glusterdump/.
1. Set the statedump path to /var/log/glusterdump/<testname>-<number>/ at the start of the job.
2. If the job aborts, have a publisher take the statedumps, similar to the logic used to return CentOS CI nodes. (Needs verification that this works for aborted jobs.)
3. Archive the statedumps and publish a link.
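
A rough sketch of steps 0, 1 and 3 as shell functions, assuming GNU find and tar on the build node. The function names and the DUMP_ROOT variable are hypothetical, and the Jenkins publisher from step 2 is not covered here.

```shell
# Root directory from the plan above; overridable for local experiments.
DUMP_ROOT=${DUMP_ROOT:-/var/log/glusterdump}

# Step 0: delete dump files last modified more than 15 days ago,
# then prune any directories left empty.
prune_old_dumps() {
    [ -d "$DUMP_ROOT" ] || return 0
    find "$DUMP_ROOT" -mindepth 1 -type f -mtime +15 -delete
    find "$DUMP_ROOT" -mindepth 1 -type d -empty -delete
}

# Step 1: create the per-run directory <testname>-<number> and print its path.
# The caller would still need to point gluster at it, for example through
# the server.statedump-path volume option (an assumption, not from this bug).
make_run_dir() {
    local dir="$DUMP_ROOT/$1-$2"
    mkdir -p "$dir" && echo "$dir"
}

# Step 3: pack one run directory into a tarball next to it and print its path.
archive_run_dir() {
    local dir=$1
    local tarball="$dir.tar.gz"
    tar -czf "$tarball" -C "$(dirname "$dir")" "$(basename "$dir")" && echo "$tarball"
}
```

Calling prune_old_dumps before make_run_dir matters, because the prune step also removes empty directories and would otherwise delete a freshly created run directory.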

Comment 2 Nigel Babu 2017-07-31 05:54:14 UTC
*** Bug 1475350 has been marked as a duplicate of this bug. ***

Comment 3 Nigel Babu 2017-07-31 05:55:11 UTC
We now have logs of aborted runs. See: https://build.gluster.org/job/centos6-regression/5779/console

Comment 4 Nigel Babu 2018-07-20 04:49:44 UTC
We no longer abort jobs. Moving this to WONTFIX.

