| Summary: | Double ampersand in HTML portions of emails | | |
|---|---|---|---|
| Product: | [Fedora] Fedora EPEL | Reporter: | Matěj Cepl <mcepl> |
| Component: | rss2email | Assignee: | Orphan Owner <extras-orphan> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | el5 | CC: | bugs.michael, extras-orphan, lindsey.smith, mcepl, pertusus |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-05-14 08:26:14 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |

The original RSS 2.0 feed (http://crookedtimber.org/feed/) has this in the relevant part of the text:

Then time starts again, and 314 Dapper Men descend from the sky, like a Magritte painting. It is all very charming and surreal and doesn’t make any sense, except in an advantages and disadvantages of vermiculation for life, in a space-time worm sort of sense, sense.

So, I guess the fault is 100% somewhere between rss2email and my email client.

Created attachment 489606 [details] testcase

Actually this is one more proof that the issue is in rss2email and not in html2text (the other one being HTML_MAIL=1 in my configuration). When running html2text on the attached HTML snippet I get the correct text:
bradford:~ $ python /usr/lib/python2.7/site-packages/html2text.py <testcase.html
Then time starts again, and 314 Dapper Men descend from the sky, like a
Magritte painting. It is all very charming and surreal and doesn't make any
sense, except in an advantages and disadvantages of vermiculation for life, in
a space-time worm sort of sense, sense.
bradford:~ $
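
The same question (does a given HTML fragment already contain the stray ampersand?) can also be checked mechanically. The following is only a sketch in Python 3: `find_stray_ampersands` is a hypothetical helper, not part of rss2email or html2text, and the regular expression is a heuristic that would also flag unescaped ampersands in URLs.

```python
import re

# Heuristic: an "&" that does not start a named (&amp;), decimal (&#8217;)
# or hexadecimal (&#x2019;) character reference.
STRAY_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def find_stray_ampersands(html):
    """Return the character offsets of ampersands that start no reference."""
    return [m.start() for m in STRAY_AMP.finditer(html)]


# Fragments modelled on the text quoted in this report.
good = "surreal and doesn\u2019t make any sense"
bad = "surreal and doesn&\u2019t make any sense"

print(find_stray_ampersands(good))
print(find_stray_ampersands(bad))
```

Running it on the raw feed content, on the html2text input, and on the decoded mail body would show at which stage the extra ampersand first appears.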
Created attachment 489707 [details] sample email message

Still reproducible with rss2email-2.70-2.el6.noarch, python-html2text-2.38-1.el6.noarch, and python-feedparser-4.1-10.el6.noarch on the entry for http://crookedtimber.org/2011/04/02/connick-v-thompson/ . feedparser gives (IMHO correct) content as

>>> print f.entries[0]['content']
[{'base': 'http://crookedtimber.org/feed/', 'type': 'text/html', 'value': u'<p>J.K. Galbraith remarked that conservatism was engaged in a long search for a superior moral justification for selfishness. But that quest may sometimes become boring, or perhaps too difficult. Not to worry, because <a href="http://www.slate.com/id/2290036/">occasions to be straightforwardly vicious</a> are more easily found, if you have the taste for it. Its spiteful tone aside, in substance <em>Connick v. Thompson</em> seems to be a <a href="http://www.kieranhealy.org/blog/archives/2003/04/17/state-sponsored-terror/">Lord Denning Moment</a> for the U.S. Supreme Court. The conservative majority preferred to affirm an obvious wrong rather than face the <a href="http://en.wikipedia.org/wiki/Birmingham_Six#Charges_against_police_and_prison_officers">appalling vista</a> of a brutal and corrupt justice system. To be fair to the system, it’s worse than that. Once the initial wrongdoing came to trial a jury, the district court, and the 5th circuit (twice) all decided the other way. It’s only when we get to Thomas, Scalia, Roberts, Alito, and Kennedy that the system chose to <a href="http://prospect.org/cs/articles?article=the_impunity_of_the_roberts_court">further institutionalize prosecutorial immunity</a>. Stitch-ups should be seamless: if someone could pull at a stray thread, the whole thing might unravel, after all.</p>', 'language': None}]
>>>

(see "It’s"), but the message has (in the HTML part, thus I assume untouched by html2text)

(line 104) it&’s
(line 106) It&’s=

which seems to point to a bug in rss2email itself.

Made a noise in the upstream fora http://www.allthingsrss.com/rss2email/2011/03/version-2-71-release-plus-other-major-updates/comment-page-1/#comment-7281

The same goes for this example (from http://notreligious.typepad.com/notreligious/2011/04/why-we-needed-rules-based-faith-and-why-we-need-to-move-past-it-charles-park.html). feedparser shows

Harvey Cox laments how the Roman Empire co-opted Christianity and pushed the \u2018Age of Belief\u2019 onto us.\xa0 He looks

email has

Harvey=
Cox laments how the Roman Empire co-opted Christianity and pushed the =E2=
=80=98Age of Belief=E2=80=99 onto us.&  He looks

Re: comment 3
Cannot reproduce with Fedora 14:

$ rpm -q rss2email python-feedparser python-html2text
rss2email-2.70-1.fc14.noarch
python-feedparser-4.1-12.fc14.noarch
python-html2text-2.38-2.1.noarch

I also don't get two-part mails but just either text/html (for HTML_MAIL=1) or text/plain (for HTML_MAIL=0). They are not encoded quoted-printable. Only seldom, but not for this example feed.

(In reply to comment #6)
> Re: comment 3
> Cannot reproduce with Fedora 14:

With the configuration in comment 0?

Created attachment 489791 [details] not reproduced with F-14

Sure, same config. Output mail attached. What would I need to change to get a two-part MIME mail like you get? Can you rule out that none of your MTAs does that?

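
To compare a quoted-printable mail body with what feedparser returned, as was done by eye for the typepad example in this report, the body can be decoded first. This is a minimal Python 3 sketch using the standard `quopri` module; `decode_qp` is a hypothetical helper name, and the placement of the soft line break (`=` followed by a newline) in the sample is an assumption about the original mail.

```python
import quopri


def decode_qp(fragment):
    """Decode a quoted-printable byte string and interpret it as UTF-8."""
    return quopri.decodestring(fragment).decode("utf-8")


# Fragment modelled on the mail body quoted in this report, including a
# soft line break in the middle of an encoded character.
from_mail = b"pushed the =E2=\n=80=98Age of Belief=E2=80=99 onto us."
# The corresponding text as feedparser reported it.
from_feed = "pushed the \u2018Age of Belief\u2019 onto us."

decoded = decode_qp(from_mail)
print(decoded == from_feed)
```

Any difference that remains after decoding (such as an extra `&`) was introduced somewhere between feedparser and the mailbox, not by the quoted-printable encoding itself.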
Created attachment 489803 [details] ~/.rss2email/config.py in question

(In reply to comment #8)
> Created attachment 489791 [details]
> not reproduced with F-14
>
> Sure, same config. Output mail attached. What would I need to change to get a
> two-part MIME mail like you get? Can you rule out that none of your MTAs does
> that?

I have no idea how it would be done otherwise. I always thought it is generated by rss2email. Well, there is

luther.ceplovi.cz ... that's RHEL 6 with Postfix postfix-2.6.6-2.el6.i686
d1080.master.cz ... that's hosting's CentOS 5 with postfix-2.3.3-2.1.el5_2
smtp-out3.iol.cz ... most likely postfix as well at my ISP's site, but no clue about that (I don't have shell access there)
antivir5.iol.cz ... ditto
port3.iol.cz ... ditto

and back to luther.

d1080.master.cz postconf -n is:

-bash-3.2$ /usr/sbin/postconf -n
alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/libexec/postfix
debug_peer_level = 2
html_directory = no
inet_interfaces = all
local_recipient_maps = unix:passwd.byname $alias_maps
mail_owner = postfix
mailbox_command = /usr/bin/procmail -f- -a "$USER"
mailbox_size_limit = 0
mailq_path = /usr/bin/mailq.postfix
manpage_directory = /usr/share/man
mydestination = $myhostname, localhost.$mydomain, localhost
mydomain = p-lab.cz
myhostname = d1080.master.cz
myorigin = $myhostname
newaliases_path = /usr/bin/newaliases.postfix
queue_directory = /var/spool/postfix
readme_directory = /usr/share/doc/postfix-2.3.3/README_FILES
sample_directory = /usr/share/doc/postfix-2.3.3/samples
sendmail_path = /usr/sbin/sendmail.postfix
setgid_group = postdrop
smtp_generic_maps = hash:/etc/mail/generic
smtpd_recipient_restrictions = permit_sasl_authenticated, permit_mynetworks, reject_unauth_destination, reject_unlisted_recipient, reject_unverified_recipient
smtpd_sasl_auth_enable = yes
smtpd_sender_restrictions = permit_sasl_authenticated
unknown_local_recipient_reject_code = 550
virtual_alias_domains = /etc/mail/local-host-names
virtual_alias_maps = hash:/etc/mail/virtusertable
-bash-3.2$

The same for luther.ceplovi.cz:

bradford:tmp $ /usr/sbin/postconf -n
alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
biff = no
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/libexec/postfix
data_directory = /var/lib/postfix
debug_peer_level = 4
default_destination_concurrency_limit = 200
default_destination_recipient_limit = 1000
html_directory = no
inet_interfaces = loopback-only
inet_protocols = ipv4
mail_owner = postfix
mailq_path = /usr/bin/mailq.postfix
manpage_directory = /usr/share/man
mydestination = $myhostname, localhost.$mydomain, localhost
newaliases_path = /usr/bin/newaliases.postfix
queue_directory = /var/spool/postfix
readme_directory = /usr/share/doc/postfix-2.8.2/README_FILES
recipient_delimiter = +
relayhost = smtp.o2isp.cz
sample_directory = /usr/share/doc/postfix-2.8.2/samples
sender_canonical_maps = hash:/etc/postfix/sender_canonical
sender_dependent_relayhost_maps = hash:/etc/postfix/sender_relayhost
sendmail_path = /usr/sbin/sendmail.postfix
setgid_group = postdrop
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_password
smtp_sasl_security_options =
transport_maps = hash:/etc/postfix/transport
unknown_local_recipient_reject_code = 550
virtual_alias_maps = hash:/etc/postfix/virtual
bradford:tmp $

Project maintainer here. I was unable to reproduce this with the latest build (v2.71) on Ubuntu using the provided config. Note that builds from the project site use feedparser v5.x, which is vastly superior but new enough that distros may not have caught up with it yet.

Created attachment 489804 [details] debug patch against rss2email 2.70

Attached patch for rss2email 2.70's /usr/share/rss2email/rss2email.py will print the message contenttype + content to stdout before passing it on to sendmail.
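
For readers without access to the attachment, the general idea of dumping the message right before it is handed to sendmail can be illustrated as follows. This is NOT the attached patch and does not reproduce rss2email's internals; it is a Python 3 sketch (rss2email 2.70 itself ran on Python 2), and `dump_before_send` is a hypothetical function built only on the standard `email` package.

```python
from email.mime.text import MIMEText


def dump_before_send(body, contenttype="html"):
    """Build a MIME text part, print its type and raw body, return the part."""
    msg = MIMEText(body, contenttype, "utf-8")
    print("Content-Type:", msg["Content-Type"])
    print("Content-Transfer-Encoding:", msg["Content-Transfer-Encoding"])
    # Print the body as given to the mail layer, before any transfer encoding.
    print(body)
    return msg


msg = dump_before_send("<p>It\u2019s only a test</p>")
```

If the printed body is clean while the delivered mail is not, the corruption happens after the generator has handed the message over.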
One of the attached example mails says
X-Mailer: Zarafa 7.0.0-24874
X-Original-To: matej
so clearly there is some forwarding and reprocessing involved.
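
Spotting such headers can be automated when several sample mails have to be checked. The sketch below is an assumption-laden illustration in Python 3: `reprocessing_headers` and the list of header names are hypothetical choices, and the sample message contains only the two headers quoted above plus a made-up subject and body.

```python
from email import message_from_string

# Header names that hint at forwarding or re-processing (illustrative list).
REPROCESSING_HINTS = ("X-Mailer", "X-Original-To", "Received")


def reprocessing_headers(raw_message):
    """Return (name, value) pairs for every hint header found in the message."""
    msg = message_from_string(raw_message)
    return [
        (name, value)
        for name in REPROCESSING_HINTS
        for value in msg.get_all(name, [])
    ]


sample = (
    "X-Mailer: Zarafa 7.0.0-24874\n"
    "X-Original-To: matej\n"
    "Subject: example\n"
    "\n"
    "body\n"
)

for name, value in reprocessing_headers(sample):
    print(name + ": " + value)
```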
I also have access to a RHEL5 box and I was unable to reproduce using v2.71 of the original project release.

(In reply to comment #11)
> One of the attached example mails says
> X-Mailer: Zarafa 7.0.0-24874
> X-Original-To: matej

Oh, right. I will check with Zarafa folks.

I believe this was really a bug in Zarafa 7beta* and it has been fixed already in 7RC1.

Created attachment 478449 [details] example email with double ampresands

Description of problem:
With this configuration

DEFAULT_FROM="rss2email"
FORCE_FROM = 1
HTML_MAIL=1
SMTP_SEND=0
TRUST_GUID = 1
DATE_HEADER=1
USE_PUBLISHER_EMAIL = 0
UNICODE_SNOB=1

I am getting emails with all ampersands in the HTML portion of the message doubled:

escend from the sky, like a Magritte painting. It is all very charming an=
d surreal and doesn&’t make any sense, except in an advantages and =
disadvantages of vermiculation for life, in a space-time worm sort of sen=

which results in a line like this in Thunderbird:

descend from the sky, like a Magritte painting. It is all very charming and surreal and doesn&’t make any sense, except in an advantages and disadvantages of vermiculation for life, in a space-time worm sort of sense,

Version-Release number of selected component (if applicable):
rss2email-2.60-3.el5
python-feedparser-4.1-10.el5
python-html2text-2.26-2.el5

How reproducible:
100%

Steps to Reproduce:
1. let rss2email work with the above configuration
2. read emails in an HTML-aware email client
3.

Actual results:
all & doubled, resulting in corrupted text in the email client

Expected results:
the entity should be just ’ in the above example.

Additional info: