1111600 – [IMAPx] Indefinite waiting for message download

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1111600 - [IMAPx] Indefinite waiting for message download

Summary: [IMAPx] Indefinite waiting for message download

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	evolution-data-server
Sub Component:
Version:	7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Matthew Barnes
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2014-06-20 13:47 UTC by Lubomir Rintel
Modified:	2015-11-19 07:56 UTC (History)
CC List:	5 users (show)
Fixed In Version:	evolution-data-server-3.12.11-14.el7
Doc Type:	Bug Fix
Doc Text:	Cause: message download, or other IMAP account operations, could starve in a queue of jobs to be done Consequence: certain operations could cause other jobs to starve Fix: make sure the job queue is properly cleaned up when a job is finished, either properly or with an error Result: jobs do not starve in a job queue
Clone Of:
Clones:	1440734 (view as bug list)
Environment:
Last Closed:	2015-11-19 07:56:10 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
imapx log (202.22 KB, text/plain) 2014-07-01 09:42 UTC, Lubomir Rintel	no flags	Details
proposed eds patch (1.68 KB, patch) 2014-07-14 14:03 UTC, Milan Crha	no flags	Details \| Diff
Screencast (306.83 KB, video/webm) 2014-07-15 09:51 UTC, Lubomir Rintel	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNOME Bugzilla	720197	0	None	None	None	Never
Red Hat Product Errata	RHBA-2015:2226	0	normal	SHIPPED_LIVE	evolution bug fix and enhancement update	2015-11-19 08:37:43 UTC

Description Lubomir Rintel 2014-06-20 13:47:53 UTC

Description of problem:

With the 7.0 GA version of evolution, when clicking to a message there's a very high chance (around 50%) that it won't reload. The status bar indicates "Retrieving message" task being perpetually restarted.

Version-Release number of selected component (if applicable):

evolution-data-server-3.8.5-24.el7.x86_64

How reproducible:

It happens to me very often after I just click the message title to load it.

Additional info:

When I comment out evolution-data-server-3.8.5-imapx-check-cancelled-jobs.patch the issue goes away.

Comment 2 Milan Crha 2014-06-24 12:30:36 UTC

Thanks for a bug report. I tried to reproduce this, but I'm not able to. I can click between messages and they are loaded as expected. It sometimes takes longer, when there is a slower response from the server when re-logging in due to previous operation cancel, but nothing else apart of it.

Could you describe in more detail what you do, what you see and what evolution does, please? I would like to see a backtrace of the evolution in that "perpetually restarted" Retrieving message option. You can grab the backtrace with this command:
   $ gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt
Please check the bt.txt for any private information, like passwords, email address, server addresses,... I usually search for "pass" at least (quotes for clarity only).

It would also worth to catch the IMAPx communication log, to see what it does. it will print everything between the client and the server, thus make sure you'll not expose anything private (including actual message content). The command for this log is:
   $ CAMEL_DEBUG=imapx:io,imapx:conman evolution &>log.txt

Comment 3 Lubomir Rintel 2014-07-01 09:42:33 UTC

Created attachment 913716 [details]
imapx log

Enable logging and the problem goes away! Sorry for the delay :)

I managed to get the attached log today. It seems that the cancel & reload loops is not indefinite, but not only goes away when I switch to different message, but apparently when I switch windows in GNOME too.

That's why I'm not attaching backtrace. I've added a launcher to the favourites bar that gets the backtrace, so I hope I'll catch it red handed next time.

This is the message that gets reloaded:

Desktop♥ awk '/Message-ID:/ {m=$0} /was cancelled/ {print m; print}' lala.txt 
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:A] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:C] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:B] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:D] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:E] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:F] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:G] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:H] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:I] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:J] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:K] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:L] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:M] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:N] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:O] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:P] Data read failed with error 'Error fetching message: Operation was cancelled'
Message-ID: <bug-1110911-265412-qmFPuWJD3Q.com>
[imapx:Q] Data read failed with error 'Error fetching message: Operation was cancelled'
Desktop♥

Comment 4 Milan Crha 2014-07-01 10:22:39 UTC

Aha, the operation is cancelled for some reason. Once you get to the state this loop, please attach gdb to the running evolution and set a breakpoint to imapx_job_cancelled_cb(), then get a backtrace from this place. it might show where it was cancelled (not necessarily why, but it'll give at least some clue). The gdb command might look like this:
   $ gdb evolution `pidof evolution` --ex "b imapx_job_cancelled_cb" --ex c \
     --ex bt --ex c --ex bt --ex c --ex bt --ex "d br 1" --ex c --ex q

Comment 5 Lubomir Rintel 2014-07-07 13:55:27 UTC

When the problem occurs, the imapx_job_cancelled_cb() breakpoint is not being reached.

Comment 6 Lubomir Rintel 2014-07-07 14:10:39 UTC

I've observed that more unexpected cancels happen: when I try to search the messages, the search gets terminated immediately and "Generating message list (cancelled)" flashes in the status bar for a second or so.

Comment 7 Lubomir Rintel 2014-07-10 15:12:13 UTC

I was wrong, or something was wrong with my gdb. However, I managed to break on the reload and get a meaningful backtrace:

#0  imapx_job_cancelled_cb (cancellable=0xea02a0, job=0x7fffb0029210) at camel-imapx-job.c:60
#1  0x00007fffee845af8 in g_closure_invoke (closure=0x7fff982da360, return_value=return_value@entry=0x0, n_param_values=1, param_values=param_values@entry=0x7fffffffd430,
    invocation_hint=invocation_hint@entry=0x7fffffffd3d0) at gclosure.c:777
#2  0x00007fffee8571ad in signal_emit_unlocked_R (node=node@entry=0x643920, detail=detail@entry=0, instance=instance@entry=0xea02a0, emission_return=emission_return@entry=0x0,
    instance_and_params=instance_and_params@entry=0x7fffffffd430) at gsignal.c:3584
#3  0x00007fffee85ef8a in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fffffffd5b0) at gsignal.c:3328
#4  0x00007fffee85f23f in g_signal_emit (instance=instance@entry=0xea02a0, signal_id=<optimized out>, detail=detail@entry=0) at gsignal.c:3384
#5  0x00007fffeeabaa35 in g_cancellable_cancel (cancellable=0xea02a0) at gcancellable.c:507
#6  0x00007fffc350cdcd in mail_reader_message_selected_cb (reader=0xdb79c0, message_uid=<optimized out>) at e-mail-reader.c:3010
#7  0x00007fffee845af8 in g_closure_invoke (closure=0x8333f0, return_value=return_value@entry=0x0, n_param_values=2, param_values=param_values@entry=0x7fffffffd860,
    invocation_hint=invocation_hint@entry=0x7fffffffd800) at gclosure.c:777
#8  0x00007fffee8571ad in signal_emit_unlocked_R (node=node@entry=0x109df00, detail=detail@entry=0, instance=instance@entry=0x10ac290, emission_return=emission_return@entry=0x0,
    instance_and_params=instance_and_params@entry=0x7fffffffd860) at gsignal.c:3584
#9  0x00007fffee85ef8a in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fffffffd9f0) at gsignal.c:3328
#10 0x00007fffee85f23f in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=detail@entry=0) at gsignal.c:3384
#11 0x00007fffc353c46e in message_list_select_uid (message_list=<optimized out>, uid=uid@entry=0x15e0540 "80653", with_fallback=<optimized out>) at message-list.c:756
#12 0x00007fffc353ef4e in regen_list_done (m=0x168c4b0) at message-list.c:4917
#13 0x00007fffd10dc7a1 in mail_msg_idle_cb () at mail-mt.c:399
#14 0x00007fffee555ac6 in g_main_dispatch (context=0x63ef30) at gmain.c:3058
#15 g_main_context_dispatch (context=context@entry=0x63ef30) at gmain.c:3634
#16 0x00007fffee555e48 in g_main_context_iterate (context=0x63ef30, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3705
#17 0x00007fffee55625a in g_main_loop_run (loop=0xd80d00) at gmain.c:3899
#18 0x00007ffff0a2b0ed in gtk_main () at gtkmain.c:1156
#19 0x0000000000403933 in main (argc=1, argv=0x7fffffffde78) at main.c:707

This is essentially repeated.

Comment 8 Lubomir Rintel 2014-07-10 15:19:21 UTC

https://mail.gnome.org/archives/commits-list/2009-September/msg10221.html

I suspect that the condition added there might be wrong?

We want to emit the signal if the node pointed to the cursor is _not_ the same; if we do emit it for the same message, it will just get cancelled and reloaded repeatedly until it's fast enough to finish?

Comment 9 Milan Crha 2014-07-14 13:43:29 UTC

I still cannot reproduce this that easily as you can, but I think I know what's going on here. My way of a reproducer, even not 100% is to select a message which is not loaded yet, and, before it is loaded, double-click it to be opened in a new window, then quickly move to yet another message, and then another, and so on.

The later cancel, the one initiated by the change you found and noted in comment #8 (which is correct), makes the IMAPx think that there was done a failure which deserves disconnect and reconnect. Unfortunately, this disconnect also cancels all pending jobs in imapx on this connection, thus the cancel of the following message cancels the load of the previous message and the view ends with "Retrieving message 'x'" is the preview panel with no actual retrieval in the background, neither in the status bar for me.

I suppose this is what you meant in comment #0.

Comment 10 Milan Crha 2014-07-14 14:03:16 UTC

Created attachment 917876 [details]
proposed eds patch

for evolution-data-server;

Could you try with this patch, please? Apply it on top of the official eds, aka do not forget to include the imapx-check-cancelled-jobs patch too.

Comment 11 Lubomir Rintel 2014-07-14 14:35:54 UTC

Applied the patch. The issue still occurs.

Comment 12 Milan Crha 2014-07-15 05:16:07 UTC

Thanks. Maybe I just do not fully follow the issue. Could you provide some sort of screen cast, thus I see what you are talking about, please? (I'm sorry, but I feel confused.) Also, with the most recent code, could you install debuginfo package for the evolution-data-server and grab a backtrace of running evolution in that state, with a gdb command:
   $ gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt
Please check the bt.txt for any private information, like passwords, email address, server addresses,... I usually search for "pass" at least (quotes for clarity only).

Comment 13 Lubomir Rintel 2014-07-15 09:51:33 UTC

Created attachment 918090 [details]
Screencast

Attaching the screencast

Comment 14 Milan Crha 2014-07-16 08:44:01 UTC

Hmm, nice, I never saw that before.

Comment 15 Milan Crha 2014-07-16 12:06:58 UTC

I tried to reproduce it under GNOME Classic, which I suppose you have, according to the screen cast, but still no luck here. Could you capture a set of backtraces which would map what evolution does in say two seconds of this "restart message retrieval" misbehaviour, please? I hope it'll provide some information leading to a "why does it do that" answer, but I'm not completely sure of it. The command to get the backtrace may look like this:
    $ for i in {10..30}; do gdb --batch --ex "t a a bt" \
      -pid=`pidof evolution` &>bt$i.txt; sleep 0.1; done
which should be run when evolution "gets crazy". As always, please check the bt.txt for any private information, like passwords, email address, server addresses,... I usually search for "pass" at least (quotes for clarity only).

Comment 17 Milan Crha 2015-05-04 16:46:24 UTC

It will be good to retest with evolution 3.12.11-1, once it is available.

Comment 18 Milan Crha 2015-05-26 20:53:22 UTC

The rebase to 3.12.11 made this bug partly obsolete, but there are still cases where the jobs in the IAMPx queue can starve, as indicated upstream at bug:
https://bugzilla.gnome.org/show_bug.cgi?id=720197

I propose to use this bug report to backport changes (two commits) from the upstream bug report, because it didn't make it in the 3.12.11.

Comment 20 Milan Crha 2015-06-01 19:49:51 UTC

I'm adding also upstream changes from bug [1], which could seem similar, but the backtrace showed a deadlock.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=750146

Comment 23 errata-xmlrpc 2015-11-19 07:56:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2226.html

Note You need to log in before you can comment on or make changes to this bug.