Bug 1186815 - Make camel_stream_write() try to write all bytes at once
Summary: Make camel_stream_write() try to write all bytes at once
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: evolution-data-server
Version: 21
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Milan Crha
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-01-28 15:14 UTC by Davide Repetto
Modified: 2015-06-18 13:23 UTC (History)

Fixed In Version: evolution-data-server-3.12.11-4.fc21
Clone Of:
Environment:
Last Closed: 2015-02-09 12:04:31 UTC
Type: Bug
Embargoed:


Links
System          ID      Private  Priority  Status  Summary  Last Updated
GNOME Bugzilla  749292  0        None      None    None     Never

Description Davide Repetto 2015-01-28 15:14:07 UTC
Description of problem:
=======================
Sometimes Evolution silently fails to deliver a message.
I have a few hundred customers using Evolution, and about a year ago some of them started complaining that their correspondents occasionally did not receive their messages. I tried repeatedly to understand why, but I was never able to reproduce the problem until I spent a few days stress-testing Evolution directly on some customer machines. Sure enough, this time I was able to reproduce it, and I finally got a clue.
Apparently Evolution can occasionally give up mid-transmission and still put the message in the Sent folder, leaving the user none the wiser.
A hint of what happens can be found in Evolution's console output, which prints several camel-WARNING messages about a reported failure "without setting its GError".

Version-Release number of selected component (if applicable):
=============================================================
evolution-3.12.10-1.fc21 (both x86_64 and i686)

How reproducible:
=================
I found that I could reproduce the problem most consistently by sending a 3-4 MB message over a network connection with reduced TCP windows.
So this is a twofold problem: first, the failure happens at all; second, it is silent.

Steps to Reproduce:
===================
1. Set net.ipv4.tcp_wmem="4096 6144 8192"
2. Set net.ipv4.tcp_rmem="4096 8192 16384"
3. Send a 3-4 MB message

Note that it begins to be much more difficult to reproduce the problem with tcp_wmem values of over 18432.
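
One plausible reason a smaller send buffer makes the failure easier to hit (an assumption, the report itself does not state it) is that a write call can only accept as many bytes as the kernel buffer has room for. The sketch below illustrates that general effect, assuming a non-blocking descriptor: it makes a single write into a local socket pair whose peer never reads and returns how many bytes were accepted. The function name `one_nonblocking_write` is hypothetical, and the AF_UNIX socket pair is a stand-in for a TCP connection, so the tcp_wmem values above do not apply to it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper, not part of Evolution or Camel.
 * Makes ONE non-blocking write of `len` zero bytes into a local stream
 * socket whose peer never reads, and returns the number of bytes the
 * kernel accepted (or -1 if setup or the write itself failed). */
static ssize_t one_nonblocking_write(size_t len)
{
    int sv[2];
    int flags;
    char *buf;
    ssize_t n = -1;

    assert(len > 0);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    buf = calloc(1, len);
    flags = fcntl(sv[0], F_GETFL, 0);
    if (buf != NULL && flags != -1 &&
        fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != -1)
        n = write(sv[0], buf, len);   /* may accept fewer than len bytes */

    free(buf);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

If the request is larger than the socket's send buffer, the return value is a positive count smaller than `len`. That is a short write, not an error, and the caller is expected to send the remainder itself.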

Actual results:
===============
Delivery silently fails most of the time.

Expected results:
=================
Regular delivery or at least an error prompt.

Additional info:
================
The console output of Evolution contains some clues:


[giorgio@giorgio ~]$ evolution

(evolution:2181): Gtk-WARNING **: Theme parsing error: gtk.css:37:3: 'g' is not a valid property name

(evolution:2181): Gtk-WARNING **: Failed to register client: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files
openjdk version "1.8.0_25"
OpenJDK Runtime Environment (build 1.8.0_25-b18)
OpenJDK Server VM (build 25.25-b02, mixed mode)

(evolution:2181): GLib-GObject-CRITICAL **: g_closure_unref: assertion 'closure->ref_count > 0' failed

(evolution:2181): camel-WARNING **: CamelStreamFilter::write() reported failure without setting its GError

(evolution:2181): camel-WARNING **: CamelDataWrapper::write_to_stream_sync() reported failure without setting its GError

(evolution:2181): camel-WARNING **: CamelMimePart::write_to_stream_sync() reported failure without setting its GError

(evolution:2181): camel-WARNING **: CamelMultipart::write_to_stream_sync() reported failure without setting its GError

(evolution:2181): camel-WARNING **: CamelMimeMessage::write_to_stream_sync() reported failure without setting its GError

(evolution:2181): camel-WARNING **: CamelSmtpTransport::send_to_sync() reported failure without setting its GError

Comment 1 Milan Crha 2015-02-04 15:18:53 UTC
(In reply to Davide Repetto from comment #0)
> Steps to Reproduce:
> ===================
> 1. Set net.ipv4.tcp_wmem="4096 6144 8192"
> 2. Set net.ipv4.tcp_rmem="4096 8192 16384"

Thanks for the bug report and for a way to reproduce it. I'd like to try it, but I'm unsure how to perform the two steps above, and how to revert the change back to "normal" afterwards. Could you help me with that, please?

Comment 2 Davide Repetto 2015-02-05 09:36:27 UTC
Hi Milan,
 thanks for your reply.

you can verify the current values with:
=======================================
[root@dave ~]# sysctl net.ipv4.tcp_wmem
net.ipv4.tcp_wmem = 4096	16384	4194304   # this is an example output
[root@dave ~]# sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 4096	87380	6291456

you can change them with:
=========================
[root@dave ~]# sysctl net.ipv4.tcp_wmem="4096 6144 8192"
net.ipv4.tcp_wmem = 4096 6144 8192
[root@dave ~]# sysctl net.ipv4.tcp_rmem="4096 8192 16384"
net.ipv4.tcp_rmem = 4096 8192 16384

and revert them back with:
==========================
[root@dave ~]# sysctl net.ipv4.tcp_wmem="4096 16384 4194304"
net.ipv4.tcp_wmem = 4096 16384 4194304
[root@dave ~]# sysctl net.ipv4.tcp_rmem="4096 87380 6291456"
net.ipv4.tcp_rmem = 4096 87380 6291456

All changes are ephemeral and won't survive a reboot.
Also, these commands work well within VMs but not inside containers. If you're using containers, bear in mind that they inherit the TCP window settings from the host, and that changing them from within the container is usually forbidden.

Comment 3 Davide Repetto 2015-02-05 09:41:56 UTC
P.S. If needed I can provide a VM or remote access to it.

Comment 4 Milan Crha 2015-02-09 12:04:31 UTC
Thanks for the guide. I managed to reproduce it too and found the cause. CamelStreamFilter::write() expected the underlying stream to write all the bytes it was asked to write, but for me that failed at about 60% of the message: the stream wrote only 3744 bytes when 4149 bytes were requested. Writing fewer bytes than requested is perfectly valid, but the code didn't account for it. I also checked other usages of this API, and there are more places that can fail in a similar way, so I changed the core of the function to behave differently rather than handling this in every place that calls the API.

Created commit bae0c64 in eds master (3.13.90+) [1]

[1] https://git.gnome.org/browse/evolution-data-server/commit/?id=bae0c64
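
The short-write behaviour described in this comment can be sketched in plain C. This is only an illustration under stated assumptions, not the code from commit bae0c64: `write_fn`, `write_all`, and `short_sink` are hypothetical names, and the sink simply caps each call at a fixed number of bytes to mimic an underlying stream that accepts less than it was asked to write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for a stream's write method: it may accept fewer
 * bytes than requested (a short write) or return -1 on error. */
typedef ssize_t (*write_fn)(void *ctx, const char *buf, size_t len);

/* Keep calling write_once until all len bytes have been accepted.
 * Returns len on success, or -1 if the underlying write fails or
 * makes no progress. */
static ssize_t write_all(write_fn write_once, void *ctx,
                         const char *buf, size_t len)
{
    size_t done = 0;

    assert(write_once != NULL);

    while (done < len) {
        ssize_t n = write_once(ctx, buf + done, len - done);

        if (n <= 0)
            return -1;        /* error, or zero bytes accepted: give up */
        done += (size_t)n;    /* short write: advance and send the rest */
    }
    return (ssize_t)len;
}

/* Test double: a memory sink that accepts at most `limit` bytes per call. */
struct short_sink {
    char   data[8192];
    size_t used;
    size_t limit;
    int    calls;
};

static ssize_t short_sink_write(void *ctx, const char *buf, size_t len)
{
    struct short_sink *s = ctx;
    size_t n = len < s->limit ? len : s->limit;

    if (s->used + n > sizeof s->data)
        return -1;            /* sink is full */
    memcpy(s->data + s->used, buf, n);
    s->used += n;
    s->calls++;
    return (ssize_t)n;
}
```

With `limit` set to 3744 and a 4149-byte buffer, the first call to `short_sink_write` accepts 3744 bytes and `write_all` issues a second call for the remaining 405. A caller that treated the first return value as a failure would stop at that point, which is the situation described above. Putting the loop in one shared helper means individual callers do not each have to handle short writes.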

Comment 5 Davide Repetto 2015-02-09 15:36:18 UTC
Any idea when it should trickle down into Fedora?

Comment 6 Milan Crha 2015-02-09 16:44:54 UTC
The change missed the final 3.12.11 release today. Version 3.13.90 will be released next Monday, when it will also reach rawhide. That means the fix is ready for the upcoming Fedora 22.

I built a scratch evolution-data-server with that fix included for Fedora 21 for you at [1]. The build will be deleted automatically within the next few days.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=8874733

Comment 7 Davide Repetto 2015-02-09 18:00:27 UTC
Thank you very much for the ad-hoc build.
It's much appreciated.

Comment 8 Milan Crha 2015-05-29 06:21:02 UTC
I decided to create a proper update of evolution-data-server for Fedora 21, so that all users can benefit from it.

Comment 9 Fedora Update System 2015-05-29 07:13:30 UTC
evolution-data-server-3.12.11-4.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/evolution-data-server-3.12.11-4.fc21

Comment 10 Fedora Update System 2015-06-18 13:23:44 UTC
evolution-data-server-3.12.11-4.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

