Bug 223429

Summary: sieve can't process the folded Received header
Product: Red Hat Enterprise Linux 4
Component: cyrus-imapd
Version: 4.4
Hardware: i386
OS: Linux
Status: CLOSED NOTABUG
Severity: low
Priority: medium
Reporter: ryo fujita <rfujita>
Assignee: Tomas Janousek <tjanouse>
QA Contact: Brian Brock <bbrock>
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2007-02-02 04:23:38 UTC

Description ryo fujita 2007-01-19 09:16:45 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; ja-jp) AppleWebKit/418.9.1 (KHTML, like Gecko) Safari/419.3

Description of problem:
I installed a sieve script containing a test like the one below.
(header :regex :comparator "i;ascii-casemap" "Received" "\\(([^\\.]+\\.)?[0-9][^\\.]*\\.[^\\.]+\\.[^ ]+\\.[a-z][^ ]* \\[[0-9\\.]+\\]\\) by mx[123]\\.redhat\\.com", true)
But sieve did not file any mail into the folder for this rule.

My guess at why sieve did not work properly is as follows.
First, the Received header is folded: a CRLF is followed by leading spaces or tabs.

Received: from helo_host_name (FQDN [ip_addr])
___by mx1.redhat.com (8.12.11.20060308/8.12.1.......

However, sieve does not seem to take this folding rule from the RFC into account.

From line 656 of cyrus-imapd-2.2.12/sieve/bc_eval.c in cyrus-imapd-2.2.12-3.RHEL4.1.src.rpm:

            for (y=0; val[y]!=NULL && !res; y++)
            {
                if  (match == B_COUNT) {
                    count++;
                } else {
                    /*search through all the data*/
                    currd=datai+2;
                    for (z=0; z<numdata && !res; z++)
                    {
                        const char *data_val;

                        currd = unwrap_string(bc, currd, &data_val, NULL);

                        if (isReg) {
                            reg= bc_compile_regex(data_val, ctag, errbuf,
                                                  sizeof(errbuf));
                            if (!reg)
                            {
                                /* Oops */
                                res=-1;
                                goto alldone;
                            }

                            res |= comp(val[y], (const char *)reg,
                                        comprock);
                            free(reg);
                        } else {
                            res |= comp(val[y], data_val, comprock);
                        }

I also found a function massage_header() (is that a typo for message_header?) at line 4681 of
cyrus-imapd-2.2.12/imap/index.c.
Isn't this the kind of function that sieve needs?

static void massage_header(char *hdr)
{
    int n = 0;
    char *p, c;

    for (p = hdr; *p; p++) {
        if (*p == ' ' || *p == '\t' || *p == '\r') {
            if (!n || *(p+1) == '\n') {
                /* no leading or trailing whitespace */
                continue;
            }
            /* replace with space */
            c = ' ';
        }
        else if (*p == '\n') {
            if (*(p+1) == ' ' || *(p+1) == '\t') {
                /* folded header */
                continue;
            }
            /* end of header */
            break;
        }
        else
            c = *p;

        hdr[n++] = c;
    }
    hdr[n] = '\0';
}
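
For comparison with massage_header() above, here is a minimal sketch of the plain unfolding rule from RFC 2822 section 2.2.3, which removes any CRLF that is immediately followed by a space or tab. The helper name unfold_header is hypothetical and is not part of cyrus-imapd. Accepting a bare LF as a line ending is an extra assumption, made because stored messages may not keep the CR.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from cyrus-imapd): unfold a header value in place.
 * A CRLF, or a bare LF (assumption), is dropped when the next character is a
 * space or a tab. The whitespace that follows the line break is kept.
 */
static void unfold_header(char *hdr)
{
    size_t r = 0; /* read position */
    size_t w = 0; /* write position */

    assert(hdr != NULL);

    while (hdr[r] != '\0') {
        if (hdr[r] == '\r' && hdr[r + 1] == '\n' &&
            (hdr[r + 2] == ' ' || hdr[r + 2] == '\t')) {
            r += 2; /* skip the CRLF of a fold */
            continue;
        }
        if (hdr[r] == '\n' && (hdr[r + 1] == ' ' || hdr[r + 1] == '\t')) {
            r += 1; /* skip the bare LF of a fold */
            continue;
        }
        hdr[w++] = hdr[r++];
    }
    hdr[w] = '\0';
}
```

Unlike massage_header(), this sketch leaves the tab after the line break in place, so a pattern with a literal space before "by" would still not match the unfolded value.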


Version-Release number of selected component (if applicable):
cyrus-imapd-2.2.12-3.RHEL4.1.i386.rpm

How reproducible:
Always


Steps to Reproduce:
1. Install the script with sieveshell

Actual Results:
nothing

Expected Results:
Mail is filed into the appropriate folders according to the sieve rules.

Additional info:

Comment 1 Tomas Janousek 2007-01-30 14:08:31 UTC
Are you sure the problem is in sieve? I just tried the following on our poboxes:

(header :regex :comparator "i;ascii-casemap" "Received" "by mx[123]gaga\\.redhat\\.com", true)

And the mail contained:
Received: from helo_host_name (FQDN [1.2.3.4])
	by mx1gaga.redhat.com (8.12.11.20060308/8.12.1)

And it did match.

I would recommend adding a whitespace-matching pattern, or even .*, in front of
"by mx" and trying again.

Comment 2 ryo fujita 2007-01-31 04:22:46 UTC
(In reply to comment #1)
> Are you sure the problem is in sieve? I just tried the following on our poboxes:

Sure.
I'd like to check BOTH the FQDN and the hostname in the "by" field.

For example,
Received: from dhcp123456.example.com (dhcp123456.example.com [1.2.3.4])
	by mx1.redhat.com (8.12.11.20060308/8.12.1)

This header means that the mail was received by an external MTA from a suspicious "dhcp" host.
Most hosts with strings like dhcp/adsl/dynamic/ppp in their hostname are suspected of not being MTAs.
If a mail has multiple Received headers, I need to check whether the host in the "by" field faces the Internet.
And if the FQDN matches certain rules, the mail may be spam.

Reference: Selective SMTP rejection
http://www.gabacho-net.jp/en/anti-spam/anti-spam-system.html

One more example,
Received: from outmx.example.com (outmx.example.com [1.2.3.4])
	by mx1.redhat.com (8.12.11.20060308/8.12.1)
Received: from dhcp-2-1-168-192.int.example.com (dhcp-2-1-168-192.int.example.com [192.168.1.2])
	by outmx.example.com (8.12.11.20060308/8.12.1)

The host dhcp-2-1-168-192.int.example.com may be an internal relaying MTA.
If I can check only the FQDN, this mail will be dropped into the Junk box because a Received header contains the string "dhcp".

Comment 3 Tomas Janousek 2007-01-31 11:51:23 UTC
Umm, I wrote a bad example. I tried something like
"\) *by mx[123]\\.redhat\\.com" and yes, it did work. Please follow my advice of
replacing the space before "by" with a whitespace-eating pattern and let me know
if _that_ works.

Comment 4 ryo fujita 2007-02-02 04:23:38 UTC
Thank you for the important hint!
The characters between ")" and "by" are 0x0a (LF) and 0x09 (TAB).
I had to use a regex like "\)[:blank:]*by" here.

(In reply to comment #3)
> Umm, I wrote a bad example. I tried something like "\) *by
> mx[123]\\.redhat\\.com" and yes, it did work. Please follow my advice of
> replacing the space before "by" with a whitespace-eating pattern and let me know
> if _that_ works.
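
The finding in comment 4 can be checked outside sieve with a small sketch. It assumes the :regex match is evaluated as a POSIX extended regular expression through regcomp(). The flags and the helper name regex_matches are illustrative assumptions, not code taken from cyrus-imapd. In POSIX syntax a class name is written inside a bracket expression, as in [[:space:]]. A bare [:blank:] is typically read as a set of the literal characters between the brackets. Note also that [[:blank:]] covers only space and tab, so a fold containing LF needs [[:space:]].

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the POSIX extended regex matches anywhere
 * in text, 0 if it does not, and -1 if the pattern fails to compile.
 * REG_ICASE stands in for the "i;ascii-casemap" comparator (assumption).
 */
static int regex_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With a header value such as "from h (h [1.2.3.4])\n\tby mx1.redhat.com", the pattern "\\)[[:space:]]*by mx[123]\\.redhat\\.com" matches, while the same pattern with a literal space or with [[:blank:]]* in place of [[:space:]]* does not. The doubled backslashes are C string escaping, the same convention the sieve script strings use.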