Bug 598249

Summary: [abrt] crash in system-config-printer-1.2.2-1.fc13: jobviewer.py:224:on_job_cancel_prompt_response:IndexError: list index out of range
Product: [Fedora] Fedora Reporter: Iván Jiménez <icj>
Component: system-config-printerAssignee: Tim Waugh <twaugh>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 13CC: jpopelka, twaugh
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: abrt_hash:4240b118
Fixed In Version: system-config-printer-1.2.3-3.fc13 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-07-01 18:38:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
File: backtrace
none
error_log none

Description Iván Jiménez 2010-05-31 21:43:28 UTC
abrt 1.1.0 detected a crash.

architecture: x86_64
cmdline: python /usr/share/system-config-printer/system-config-printer.py
comment: I'm trying to cancel a job in a HP Deskjet D2360, but it doesn't. Job says "canceled, no paper" but printer's led is blinking and it doesn't shutdown. First time I used Cancel, then I used Delete and the crash occured.
component: system-config-printer
executable: /usr/share/system-config-printer/system-config-printer.py
kernel: 2.6.33.4-95.fc13.x86_64
package: system-config-printer-1.2.2-1.fc13
reason: jobviewer.py:224:on_job_cancel_prompt_response:IndexError: list index out of range
release: Fedora release 13 (Goddard)

backtrace
-----
jobviewer.py:224:on_job_cancel_prompt_response:IndexError: list index out of range

Traceback (most recent call last):
  File "/usr/share/system-config-printer/jobviewer.py", line 224, in on_job_cancel_prompt_response
    self.connection.cancelJob (self.jobids[0], self.purge_job,
IndexError: list index out of range

Local variables in innermost frame:
self: <jobviewer.CancelJobsOperation instance at 0x296e908>
operation: 'eliminando trabajo'
c: <asyncconn.Connection object at 0x27c5290>
dialog: <gtk.Dialog object at 0x296f2d0 (GtkDialog at 0x24adb50)>
response: -8

Comment 1 Iván Jiménez 2010-05-31 21:43:31 UTC
Created attachment 418403 [details]
File: backtrace

Comment 2 Iván Jiménez 2010-05-31 21:57:23 UTC
I went to localhost:631 and purged all jobs, but it still appears. I'll attach error.log and messages contains:

May 31 16:35:43 endeavour hp[986]: io/hpmud/musb.c 1403: unable to write data hp:/usb/Deskjet_D2300_series?serial=TH66G160B804KT: 45 second io timeout
May 31 16:35:43 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:36:13 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:36:43 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:37:08 endeavour python: abrt: detected unhandled Python exception in /usr/share/system-config-printer/system-config-printer.py
May 31 16:37:08 endeavour abrtd: Directory 'pyhook-1275341828-1841' creation detected
May 31 16:37:08 endeavour abrt[1883]: saved python crash dump of pid 1841 to /var/cache/abrt/pyhook-1275341828-1841.new
May 31 16:37:09 endeavour abrtd: New crash /var/cache/abrt/pyhook-1275341828-1841, processing
May 31 16:37:09 endeavour abrtd: Registered Action plugin 'RunApp'
May 31 16:37:09 endeavour abrtd: RunApp('/var/cache/abrt/pyhook-1275341828-1841','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log .')
May 31 16:37:13 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:37:15 endeavour abrtd: Getting crash infos...
May 31 16:37:39 endeavour ntpd[1092]: 0.0.0.0 c615 05 clock_sync
May 31 16:37:43 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:37:54 endeavour abrtd: Creating report...
May 31 16:38:13 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:38:43 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:39:13 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:39:43 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:40:58 endeavour hp[986]: io/hpmud/musb.c 1403: unable to write data hp:/usb/Deskjet_D2300_series?serial=TH66G160B804KT: 45 second io timeout
May 31 16:40:58 endeavour hp[986]: prnt/backend/hp.c 611: ERROR: 1009 media-empty-error; will retry in 30 seconds...
May 31 16:43:29 endeavour abrtd: New bug id: 598249
May 31 16:49:31 endeavour abrtd: Getting crash infos...

Comment 3 Iván Jiménez 2010-05-31 21:58:05 UTC
Created attachment 418405 [details]
error_log

Comment 4 Iván Jiménez 2010-05-31 22:15:59 UTC
I reproduced it; it happens when deleting a job second time. To be able to power off the printer, I restarted cups, in messages appeared:

E [31/May/2010:17:07:31 -0500] [Job 369] Aborting job because it has no files.
E [31/May/2010:17:07:31 -0500] [Job 370] Aborting job because it has no files.

and job disappeared. As the led was still blinking, I unpluged and repluged the power cable.

Comment 5 Jiri Popelka 2010-06-03 12:44:14 UTC
I was not able to reproduce it
(deleting/canceling a job second time doesn't help in my case),
but upstream commit (24f07c0) should avoid the crash.

Comment 6 Tim Waugh 2010-06-03 14:15:40 UTC
I wonder if the real problem is some sort of race condition between JobViewer.job_removed() and JobViewer.on_job_cancel_activate().

Here's the scenario I'm thinking of:

user selects job
Job is cancelled/completed separately
cupsd sends D-Bus signal saying something changed
monitor receives signal, schedules a call to get_notifications()
User clicks Cancel
Monitor.get_notifications() runs, calls JobViewer.job_removed()
JobViewer.job_removed() removes it from self.jobids[]
Now JobViewer.on_job_cancel_activate() is called but self.jobids is empty

Does that sound possible?

In other words, I think the len(self.jobids)==0 check should probably go into JobViewer.on_job_cancel_activate2.

Comment 7 Iván Jiménez 2010-06-04 05:12:34 UTC
Some context for it: When I turned on the pc the printer's led was blinking. I tried to power it off but it didn't, so I went and canceled a pending job (error: out paper) from s-c-p but after 30 s nothing happened. So I tried delete and hit the crash. I'm trying to reproduce it but if I print without paper, the printer tries, goes to blinking and doesn't power off, but there aren't any pending jobs in s-c-p or cups; it appears as completed. So I guess it's very hw dependent.

Comment 8 Jiri Popelka 2010-06-04 10:46:36 UTC
Tim,
yes I had thought about some race condition.
My scenario has been this:
- User clicks Cancel
- JobViewer.on_job_cancel_activate() calls on_job_cancel_activate2()
- Dialog with "Do you really want to cancel/delete these job(s)?" is displayed.
- Monitor.get_notifications() runs, calls JobViewer.job_removed()
- JobViewer.job_removed() removes it from self.jobids[]
- User responses "Yes" to that dialog and on_job_cancel_prompt_response() is called but self.jobids[] is empty

In other words, I think that when I place (and I had thought about that) the condition in on_job_cancel_activate2() and the JobViewer.job_removed() is called when the user is looking at the "Really delete/cancel job(s)?" dialog, the crash occurs anyway.

I think it's not much likely that the JobViewer.job_removed() is called
between user's Cancel clicking and JobViewer.on_job_cancel_activate() calling.
But my solution (better would be "if not self.jobids:") can catch this possibility either.

Am I wrong ?

Comment 9 Tim Waugh 2010-06-07 15:17:07 UTC
There are two lists named 'jobids', one for each of the objects:

* JobViewer.jobids, a list of job IDs for the rows that are selected

* CancelJobsOperation.jobids, a list of job IDs to cancel

When JobViewer.on_job_cancel_activate2() is called it creates a new instance of the CancelJobsOperation class, passing in the current JobViewer.jobids list as the list of jobs to cancel:

    def on_job_cancel_activate2(self, purge_job):
        op = CancelJobsOperation (self, self.jobids, purge_job)

...

class CancelJobsOperation:
    def __init__ (self, jobviewer, jobids, purge_job):
...
        self.jobids = jobids

I'm fairly sure that instantiating CancelJobsOperation this way creates a 'deep copy' of the list, i.e. we end up with two separate Python list objects.

It is the CancelJobsOperation class which is responsible for presenting the confirmation dialog, and if the Monitor calls JobViewer.job_removed() at this point, the JobViewer.jobids list can become empty.  But the CancelJobsOperation object has its own jobids list which is not touched by that.

[Hmm, looking at CancelJobsOperation.cancelJob_error() it seems it could handle this situation (job canceled during confirmation dialog) better -- looks like it will just stop at that point, but it would be better if it carried on cancelling the other jobs in the list.]

Comment 10 Jiri Popelka 2010-06-10 12:07:35 UTC
(In reply to comment #9)
> I'm fairly sure that instantiating CancelJobsOperation this way creates a 'deep
> copy' of the list, i.e. we end up with two separate Python list objects.
Hmm, my python is not so sure about that.
This snippet of code:
class CancelJobsOperation:
    def __init__ (self, jobids):
        self.jobids = jobids
    def show(self):
        print self.jobids

class Something:
    def __init__ (self):
        self.jobids = [1,2,3]
    def do_something(self):
        y = CancelJobsOperation(self.jobids)
        del self.jobids[0]
        y.show()

s = Something()
s.do_something()

prints
[2, 3]

Can you look at this at clarify it? thanks

> [Hmm, looking at CancelJobsOperation.cancelJob_error() it seems it could handle
> this situation (job canceled during confirmation dialog) better -- looks like
> it will just stop at that point, but it would be better if it carried on
> cancelling the other jobs in the list.]    
Yes, in case of deep copy it would be good to cancel the remaining jobs.

Comment 11 Tim Waugh 2010-06-10 14:01:57 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > I'm fairly sure that instantiating CancelJobsOperation this way creates a 'deep
> > copy' of the list, i.e. we end up with two separate Python list objects.
> Hmm, my python is not so sure about that.

Aha, good catch!  So I guess the fix here will be to change it to:

class CancelJobsOperation:
    def __init__ (self, jobids):
        self.jobids = list (jobids)

i.e. explicitly create a new list from the one we're passed.

I wonder if there are any other places where I've got that sort of thing wrong...

Comment 12 Jiri Popelka 2010-06-10 17:49:48 UTC
Thanks.
Another try :-) is commited upstream.

Comment 13 Fedora Update System 2010-06-28 16:32:11 UTC
system-config-printer-1.2.3-3.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/system-config-printer-1.2.3-3.fc13

Comment 14 Fedora Update System 2010-06-28 23:37:50 UTC
system-config-printer-1.2.3-3.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update system-config-printer'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/system-config-printer-1.2.3-3.fc13

Comment 15 Fedora Update System 2010-07-01 18:37:54 UTC
system-config-printer-1.2.3-3.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.