Bug 1128972

Summary:	[RFE] Include size and md5sum of original archive in task manager summary display - at least for failed tasks
Product:	[Fedora] Fedora EPEL	Reporter:	Brad Hubbard <bhubbard>
Component:	retrace-server	Assignee:	Matej Marušák <mmarusak>
Status:	CLOSED ERRATA	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	epel7	CC:	dkwon, dwysocha, harshula, jberan, mmarusak, rvokal
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	retrace-server-1.17.0-1.fc26 retrace-server-1.17.0-1.el7	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-04-03 16:09:38 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1428040

Description Brad Hubbard 2014-08-12 02:19:50 UTC

Description of problem:

It would often be useful to be able to quickly reference the md5sum and size of the original archive that retrace has processed. I believe it would be nice to see this on the summary screen for the task.

Comment 2 Dave Wysochanski 2016-02-29 15:36:15 UTC

I think this is useful though I'm not sure about running it by default.  Also usually we get md5sums when we have things like split vmcores.  Is it worth the overhead to run it every time?  Possibly.

If / when we do this, probably what we need here is:

1. a 'md5sum' file inside the RetraceServer class

2. a way to see the contents easily from the 'manager' page
  a) refactor the 'manager' page so we can display it directly
  b) add a clickable link on the 'manager' page to cat the file

3. ? Possibly an /etc/retrace-server.conf option to run md5sum by default

4. ? Checkbox for run md5sum in task submission

I wonder if now is a good time to think about refactoring the 'manager' page a bit.  Maybe we can address the problem with the manager page taking a long time if the # of ftp files go bonkers (https://bugzilla.redhat.com/show_bug.cgi?id=1124462) as well as think about the UI for userspace core support (https://bugzilla.redhat.com/show_bug.cgi?id=1292556)

Comment 3 Dave Wysochanski 2016-04-01 11:41:17 UTC

I thought about this some more, and I don't think we need to do md5sum on the manager page, nor do we need to do it for all tasks.  We mainly just need the md5sum when the task fails in some way, and we can probably just put this in retrace_log.

Unfortunately today the task may *succeed* in that retrace-server identifies the kernel version, but then crash *fails*.  In this case we need to change the 'pass / fail' status based on crash results.

Comment 4 Matej Marušák 2016-11-29 09:45:15 UTC

I looked at this and have some comments.
1) I think the best way is to make this feature configurable from the config file (enabled/disabled by default) as well as from the submission page (checkbox), rather than running it after a failed task. There are two reasons for that:
a) A task can fail for many reasons, and I don't think that in most cases the user wants to check the md5sum.
b) After crash, we don't have the original archive. In fact, it gets deleted right after it is unpacked. Of course we could keep it for the whole time of retracing, but it takes space.

2) It is possible to print the md5sum into retrace_log, but it would be better to have it in a file. My question is whether you want a separate checksum for each downloaded resource or one for all of them. If one for each, every line of the output file must contain a pair (filename md5sum). Alternatively, it could be useful to change the 'downloaded' file to contain this pair, so that it would be displayed on the manager page under 'Downloaded resources:'.
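
As a rough illustration of the one-pair-per-line idea, here is a minimal sketch; it is not the code from any patch. The names `file_md5sum` and `write_checksums`, the output file, and the extra size column (taken from the bug summary) are assumptions made up for this example. Only `hashlib` and `os` from the standard library are used, and the file is read in chunks so memory use stays flat even for very large vmcores.

```python
import hashlib
import os


def file_md5sum(path, chunk_size=1024 * 1024):
    """Return (hex md5 digest, size in bytes) of a file, reading it in chunks."""
    md5 = hashlib.md5()
    size = 0
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            size += len(chunk)
    return md5.hexdigest(), size


def write_checksums(paths, output_path):
    """Write one 'filename md5sum size' line per resource (hypothetical format)."""
    with open(output_path, "w") as out:
        for path in paths:
            digest, size = file_md5sum(path)
            out.write("%s %s %d\n" % (os.path.basename(path), digest, size))
```

With one line per resource, the manager page could show the file as-is, and a split vmcore would get one checksum per part.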

Comment 5 Matej Marušák 2016-12-08 13:56:50 UTC

Hi,

I did what I thought would be best.
https://github.com/marusak/retrace-server/commit/58269c1867dd19b3a517d938ea3c67595dbda16c
Could you look at it and say whether it implements the desired feature?

Comment 6 Dave Wysochanski 2017-01-24 22:21:33 UTC

(In reply to Matej Marušák from comment #5)
> Hi,
> 
> I done what I thought would be best.
> https://github.com/marusak/retrace-server/commit/58269c1867dd19b3a517d938ea3c67595dbda16c
> Could you look at it and say, if it implements desired feature?

It looks ok but the problem is a user won't know ahead of time if the vmcore is damaged.

Here are a couple of options:

1. Unconditionally calculate the md5sum on all vmcores.  This would mean every task encounters a delay, though, and that delay could be pretty large for huge vmcores, so I don't think this is a good idea.

2. Make this bug dependent on whether retrace is successful or not, and ideally that crash is able to be run successfully.  I think we need a dependency on https://bugzilla.redhat.com/show_bug.cgi?id=1232019 to do it right but we could just do the current definition of 'failed' for now.

3. Allow a user to easily resubmit a failed vmcore and get the md5sum

I'd vote for some form of #2 but not sure how hard it is.

Comment 7 Matej Marušák 2017-01-30 15:13:33 UTC

The patch I sent implements no. 1. Or, better to say, it lets users choose to calculate the md5sum unconditionally. So it can also implement no. 3, but there is no button for resubmitting (a user can run the task and, if it fails, run it again with md5sum calculation selected).

As I wrote in the previous comment, no. 2 is the hardest. Retrace server deletes the original archives immediately after their extraction (and there is a good reason for that). If we want to implement no. 2, we have to keep these archives until the task ends, successfully or not, and then decide what to do with them.

Comment 8 Dave Wysochanski 2017-02-15 16:13:31 UTC

(In reply to Matej Marušák from comment #7)
> The patch I sent implements no. 1. Or better to say, it enables for users to
> unconditionally calculate the md5sum. So it can implements also no.3 but
> there is no button for resubmiting. (user can run the task, if it fails can
> run it again but select running it with calculating md5).
> 
> As I wrote in previous comment, no.2 is the hardest. Retrace server deletes
> original archives immediately after its extraction (and there is a good
> reason for that). If we want to implement no.2 we have to keep this archives
> until (un)successful end and then decide what to do with them.

Can you make sure you add a separate 'STATUS' in src/retrace/retrace.py for calculating an md5sum, and make sure a timestamped entry is written to retrace_log before and after the md5sum runs? That way we can determine how much time it takes, and whether or not we want to enable md5sum by default in production.

Now that I re-read comment #0 and think about the design, I realize you're right - option 2 is not feasible today.  I think for now what you have in comment #5 is ok (with a timestamp added), and we can try it as long as we can enable or disable md5sum by default on a global basis, which seems to be the case.  DISCLAIMER: I didn't test your patch but can test this when we get a build.
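
To make the "global default plus per-task choice" idea concrete, here is a minimal sketch under stated assumptions. The option name `CalculateMd5`, the `[retrace]` section, and both helper functions are hypothetical and are not taken from the patch; the sketch only assumes that the config file is INI-style, as the `UseFTPTasks = 1` setting suggests.

```python
import configparser

# Hypothetical config contents; only 'UseFTPTasks' appears in this bug report.
SAMPLE_CONF = """
[retrace]
UseFTPTasks = 1
CalculateMd5 = 0
"""


def md5_enabled_by_default(conf_text):
    """Read the (hypothetical) global default, treating a missing option as off."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean("retrace", "CalculateMd5", fallback=False)


def should_calculate_md5(global_default, task_choice=None):
    """A per-task choice (e.g. a submission checkbox) wins over the global default."""
    if task_choice is None:
        return bool(global_default)
    return bool(task_choice)


default = md5_enabled_by_default(SAMPLE_CONF)
print(should_calculate_md5(default))                    # False
print(should_calculate_md5(default, task_choice=True))  # True
```

Keeping the default off means existing tasks see no extra delay, while a user can still opt in for a single vmcore that looks suspicious.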

Comment 9 Matej Marušák 2017-02-16 10:26:52 UTC

I like the idea with the STATUS, so I added it.
Now you can see in the log something like this:
    2017-02-16 11:16:49 Calculating md5sum
    2017-02-16 11:16:53 Post-processing downloaded file
so you can see it took 4 seconds.
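
For reference, the duration can also be read off the log mechanically. This is a minimal sketch that assumes each log line starts with a 'YYYY-MM-DD HH:MM:SS' timestamp, as in the two lines above; the helper names are made up for the example and are not part of retrace-server.

```python
from datetime import datetime

LOG_LINES = [
    "2017-02-16 11:16:49 Calculating md5sum",
    "2017-02-16 11:16:53 Post-processing downloaded file",
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line.strip()[:19], "%Y-%m-%d %H:%M:%S")


def md5sum_duration(lines, marker="Calculating md5sum"):
    """Return the seconds between the marker line and the line that follows it."""
    for current, following in zip(lines, lines[1:]):
        if marker in current:
            delta = parse_timestamp(following) - parse_timestamp(current)
            return delta.total_seconds()
    return None  # marker not found, or it was the last line


print(md5sum_duration(LOG_LINES))  # 4.0
```
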
I created PR https://github.com/abrt/retrace-server/pull/143

Comment 10 Dave Wysochanski 2017-02-21 15:51:47 UTC

(In reply to Matej Marušák from comment #9)
> I like the idea with the STATUS. So I did so.
> Now you can see in the log something like this:
>     2017-02-16 11:16:49 Calculating md5sum
>     2017-02-16 11:16:53 Post-processing downloaded file
> so you can see it took 4 seconds.
> I created PR https://github.com/abrt/retrace-server/pull/143

Ok I built a test package as follows:
* Tue Feb 21 2017 Dave Wysochanski <dwysocha> 1.16-5
- test build from upstream "e07aa37" plus "d15c513 Include md5sum"

The md5sum patch looks like it has a bug.  If I try to queue any vmcore via the web (http://retrace-server-url/manager/filename), I get an "Internal Server Error" (error 500) on the screen, and here's what is in the ssl_error_log:

# tail /var/log/httpd/ssl_error_log
[Tue Feb 21 10:38:19 2017] [error] [client 10.12.214.15] mod_wsgi (pid=28898): Exception occurred processing WSGI script '/usr/share/retrace-server/manager.wsgi'.
[Tue Feb 21 10:38:19 2017] [error] [client 10.12.214.15] Traceback (most recent call last):
[Tue Feb 21 10:38:19 2017] [error] [client 10.12.214.15]   File "/usr/share/retrace-server/manager.wsgi", line 395, in application
[Tue Feb 21 10:38:19 2017] [error] [client 10.12.214.15]     if task.has_md5sum():
[Tue Feb 21 10:38:19 2017] [error] [client 10.12.214.15] UnboundLocalError: local variable 'task' referenced before assignment
[Tue Feb 21 10:38:45 2017] [error] [client 10.12.214.15] mod_wsgi (pid=28899): Exception occurred processing WSGI script '/usr/share/retrace-server/manager.wsgi'.
[Tue Feb 21 10:38:45 2017] [error] [client 10.12.214.15] Traceback (most recent call last):
[Tue Feb 21 10:38:45 2017] [error] [client 10.12.214.15]   File "/usr/share/retrace-server/manager.wsgi", line 395, in application
[Tue Feb 21 10:38:45 2017] [error] [client 10.12.214.15]     if task.has_md5sum():
[Tue Feb 21 10:38:45 2017] [error] [client 10.12.214.15] UnboundLocalError: local variable 'task' referenced before assignment


I think you need:
        if not ftptask and task.has_md5sum():
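
To show why that guard avoids the traceback, here is a minimal stand-alone sketch. It is not the actual manager.wsgi code: `FakeTask` and both functions are hypothetical stand-ins, and the only assumption is that `task` gets assigned solely on the non-FTP path.

```python
class FakeTask(object):
    """Hypothetical stand-in for a retrace task object."""

    def has_md5sum(self):
        return True


def render_broken(ftptask):
    if not ftptask:
        task = FakeTask()  # 'task' is only bound on the non-FTP path
    if task.has_md5sum():  # UnboundLocalError when ftptask is true
        return "md5sum available"
    return "no md5sum"


def render_guarded(ftptask):
    if not ftptask:
        task = FakeTask()
    # 'and' short-circuits, so 'task' is never evaluated for FTP tasks.
    if not ftptask and task.has_md5sum():
        return "md5sum available"
    return "no md5sum"


print(render_guarded(True))   # no md5sum
print(render_guarded(False))  # md5sum available
```

Any later unguarded use of `task` on the FTP path would fail the same way, so each such use needs the same check.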

Once I fix that up I get another similar error though, which looks unrelated to your patch:
# tail /var/log/httpd/ssl_error_log
[Tue Feb 21 10:42:46 2017] [error] [client 10.12.214.15] mod_wsgi (pid=28901): Exception occurred processing WSGI script '/usr/share/retrace-server/manager.wsgi'.
[Tue Feb 21 10:42:46 2017] [error] [client 10.12.214.15] Traceback (most recent call last):
[Tue Feb 21 10:42:46 2017] [error] [client 10.12.214.15]   File "/usr/share/retrace-server/manager.wsgi", line 398, in application
[Tue Feb 21 10:42:46 2017] [error] [client 10.12.214.15]     starttime = task.get_default_started_time()
[Tue Feb 21 10:42:46 2017] [error] [client 10.12.214.15] UnboundLocalError: local variable 'task' referenced before assignment
[Tue Feb 21 10:45:47 2017] [error] [client 10.12.214.15] mod_wsgi (pid=29039): Exception occurred processing WSGI script '/usr/share/retrace-server/manager.wsgi'.
[Tue Feb 21 10:45:47 2017] [error] [client 10.12.214.15] Traceback (most recent call last):
[Tue Feb 21 10:45:47 2017] [error] [client 10.12.214.15]   File "/usr/share/retrace-server/manager.wsgi", line 398, in application
[Tue Feb 21 10:45:47 2017] [error] [client 10.12.214.15]     starttime = task.get_default_started_time()
[Tue Feb 21 10:45:47 2017] [error] [client 10.12.214.15] UnboundLocalError: local variable 'task' referenced before assignment


Looks like there are multiple problems in the latest code.  Can you take a closer look?

Comment 11 Matej Marušák 2017-02-21 17:12:00 UTC

Thanks for testing the patch!
You have 'UseFTPTasks' enabled in the config, right? Looking into the code, I see that a bug was introduced a while ago, and I duplicated this bug in my patch as well. I will fix both: the old bug and mine.

Comment 12 Dave Wysochanski 2017-02-22 13:45:31 UTC

(In reply to Matej Marušák from comment #11)
> Thanks for testing the patch!
> You have in config allowed 'UseFTPTasks', right? As I look into the code I

Yes you're right.
$ grep UseFTP /etc/retrace-server.conf
UseFTPTasks = 1

> see that a bug was introduced a while ago. I duplicated this bug to my patch
> as well. I will fix both - the old bug and mine as well.

Thanks!

Comment 13 Dave Wysochanski 2017-03-21 19:07:30 UTC

This has been merged in https://github.com/abrt/retrace-server/pull/143

Comment 14 Fedora Update System 2017-03-30 14:10:51 UTC

retrace-server-1.17.0-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-3d55370e77

Comment 15 Fedora Update System 2017-03-30 14:11:37 UTC

retrace-server-1.17.0-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-ffb8a84c9c

Comment 16 Fedora Update System 2017-03-30 14:11:55 UTC

retrace-server-1.17.0-1.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-9390d60e0d

Comment 17 Fedora Update System 2017-03-30 18:54:10 UTC

retrace-server-1.17.0-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-ffb8a84c9c

Comment 18 Fedora Update System 2017-03-31 03:47:26 UTC

retrace-server-1.17.0-1.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-9390d60e0d

Comment 19 Fedora Update System 2017-03-31 03:48:49 UTC

retrace-server-1.17.0-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-3d55370e77

Comment 20 Fedora Update System 2017-04-03 16:09:38 UTC

retrace-server-1.17.0-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 21 Fedora Update System 2017-04-18 21:18:23 UTC

retrace-server-1.17.0-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.