From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1 Description of problem: mails larger than 100kb or so are getting stuck at DATA and are sitting idly in /var/spool/clientmqueue but smaller size emails are being sent without probs. /v ar/log/maillog shows a bunch of timeouts connecting to localhost but the daemon is definitely running on localhost on port 25. iptables is completely down. i'm not exactly sure what has changed with the introduction of the smmsp aside from sendmail running setgid instead of setuid root but i never had any probs with 8. 11 or earlier with the default config. i've compiled 8.13 on this same box and h ave run into the same prob... ugh. in desperation i upgraded to the 2.4.21-27.0. 2 kernel and the clientmqueue flushed upon rebooting but now they're starting to choke again. any help would be sincerely appreciated. Version-Release number of selected component (if applicable): sendmail-8.12.11-4.RHEL3.1 How reproducible: Always Steps to Reproduce: 1. mail -s test < vacuum.log Actual Results: 1. message gets stuck in /var/spool/clientmqueue 2. /var/log/maillog shows timeouts with 127.0.0.1 Expected Results: mail should be sent successfully Additional info: from /var/log/maillog: Apr 14 00:33:58 testmachine sm-msp-queue[11610]: j3AGmhm2026980: to=postgres, de lay=3+14:35:15, xdelay=00:00:00, mailer=relay, pri=8149088, relay=[127.0.0.1], d sn=4.0.0, stat=Deferred: Connection timed out with [127.0.0.1] Apr 14 00:33:58 testmachine sm-msp-queue[11610]: j3ACR729002195: to=sysadmin@dom ain.com,murray, ctladdr=postgres (503/503), delay=3+19:06:51, xdelay= 0:00:00, mailer=relay, pri=8627790, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection timed out with [127.0.0.1] Apr 14 00:33:58 testmachine sm-msp-queue[11610]: j39Cmfm2025894: to=postgres, de lay=4+18:35:17, xdelay=00:00:00, mailer=relay, pri=10558309, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection timed out with [127.0.0.1] Apr 14 00:33:58 testmachine sm-msp-queue[11610]: j398iVPv002644: to=sysadmin@dom ain.com,murray, ctladdr=postgres (503/503), delay=4+22:49:27, xdelay= 00:00:00, mailer=relay, pri=10947013, relay=[127.0.0.1], dsn=4.0.0, stat=Deferre d: Connection timed out with [127.0.0.1] [root@testmachine root]# sendmail -v -Ac -q Running /var/spool/clientmqueue/j3DMmnm2027088 (sequence 1 of 10) j3DMmnm2027088: locked Running /var/spool/clientmqueue/j3DIYtTH032078 (sequence 2 of 10) sysadmin,murray... Connecting to [127.0.0.1] via relay... 220 testmachine.domain.com ESMTP Sendmail 8.12.11/8.12.11; Thu, 14 Apr 2005 01:2 7:18 -0700 >>> EHLO testmachine.domain.com 250-testmachine.domain.com Hello localhost.localdomain [127.0.0.1], pleased to m eet you 250-ENHANCEDSTATUSCODES 250-PIPELINING 250-8BITMIME 250-SIZE 250-DSN 250-ETRN 250-AUTH GSSAPI 250-DELIVERBY 250 HELP >>> MAIL From:<postgres.com> SIZE=186113 AUTH=postgres@testma chine.domain.com 250 2.1.0 <postgres.com>... Sender ok >>> RCPT To:<sysadmin> >>> RCPT To:<murray> >>> DATA 250 2.1.5 <sysadmin>... Recipient ok 250 2.1.5 <murray>... Recipient ok 354 Enter mail, end with "." on a line by itself [root@testmachine root]# ps -ef | grep sendmail | grep -v grep root 11600 1 0 00:23 ? 00:00:00 sendmail: accepting connections smmsp 11609 1 0 00:23 ? 00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue root 11612 11600 0 00:23 ? 00:00:00 sendmail: j3E7Nw29011612 localho st.localdomain [127.0.0.1]: DATA root 15091 11600 0 01:01 ? 00:00:00 sendmail: j3E81oBR015091 localho st.localdomain [127.0.0.1]: DATA root 15095 11600 0 01:02 ? 00:00:00 sendmail: j3E822si015095 localho st.localdomain [127.0.0.1]: DATA smmsp 17360 11609 0 01:23 ? 00:00:00 sendmail: ./j3DMmnm2027088 [127. 0.0.1]: client DATA 354 root 17361 11600 0 01:23 ? 00:00:00 sendmail: j3E8Nw08017361 localho st.localdomain [127.0.0.1]: DATA root 17515 11600 0 01:27 ? 00:00:00 sendmail: j3E8RInh017515 localho st.localdomain [127.0.0.1]: DATA [root@testmachine root]# netstat -an | grep 25 tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN [root@testmachine root]# iptables -vnL Chain INPUT (policy ACCEPT 1791K packets, 395M bytes) pkts bytes target prot opt in out source destination Chain FORWARD (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain OUTPUT (policy ACCEPT 1858K packets, 435M bytes) pkts bytes target prot opt in out source destination [root@testmachine mail]# egrep -v "^#" /etc/mail/access localhost.localdomain RELAY localhost RELAY 127.0.0.1 RELAY [root@testmachine mail]# grep -i "max.*message.*size" /etc/mail/sendmail.cf # maximum message size #O MaxMessageSize=1000000 [root@testmachine mail]# grep -i "max.*message.*size" /etc/mail/submit.cf # maximum message size #O MaxMessageSize=1000000 [root@testmachine etc]# egrep "local|testmachine" /etc/hosts 127.0.0.1 localhost.localdomain localhost 10.0.2.23 testmachine.domain.com testmachine [root@testmachine etc]# egrep hosts /etc/nsswitch.conf hosts: files dns
oops, it's reproducible via: mail -s test me < vacuum.log
What are you trying to achieve with your changes?
Hi Thomas, i don't understand what you mean by "changes." the sendmail config is untouched, the cf files are unaltered since the installation. do you mean why don't i use 8.11 instead? sorry, i think my description was misleading. the compiled installation of 8.13 is still running on 1 of the servers but there are 8 other rhel_ws3 servers that are exhibiting this same problem, all with sendmail-8.12.11-4.RHEL3.1 and no installations of 8.13. are these the changes that you're referring to?
Ok, was there a version which was not affected?
yes, 8.11 had no probs sending larger msgs but my company prefers to use recent rpms to keep on top of security vulnerabilities/misc bugfixes. i'm quite surprised that nobody else is reporting similar problems so i wonder if it's a unique combination of things that these servers are doing. they are dual-cpu postgresql machines, typically with a load somewhere between 1-3. they're booting to 2.4.21-27.0.2.ELhugemem kernels and have 16gb ram. there aren't any other significant daemons running on these machines, just ntpd, sshd, kjournald, syslogd... we've discussed rolling back to 8.11 but are reluctant because there is no 8.11 RPM for rhel3 [was there one for rhel3_u1?] and also due to the following: http://www.securityfocus.com/bid/8641 http://www.securityfocus.com/bid/8649
Hi Thomas, i'm assuming that your answer is to revert to sendmail 8.11 since you stopped responding. Thanks for your time, I'll ask sendmail for help.
I tried to reproduce your problem, but failed. Are you sure that you have no network problems or problems with ipv6? Please use tcpdump and look at the differences between the 8.11 and 8.12 version.
after some more testing it looks like postgres, PGSTAT in particular, is causing some havoc on the loopback interface. PGSTAT creates UDP links from localhost to localhost to pass statistical data. stopping postgres and bouncing sendmail has flushed the clientmqueue on 2 machines. our redhat 7.3 machines that run postgres and sendmail don't exhibit this behavior tho. lemme know if there's any other helpful info i can submit. tcpdump with postgresql running: [root@testmachine root]# tcpdump -i lo tcpdump: listening on lo 17:16:40.783274 localhost.localdomain.32772 > localhost.localdomain.32772: udp 46 (DF) 17:16:40.784061 localhost.localdomain.32772 > localhost.localdomain.32772: udp 262 (DF) 17:16:40.794076 localhost.localdomain.32772 > localhost.localdomain.32772: udp 46 (DF) 17:16:40.794586 localhost.localdomain.32772 > localhost.localdomain.32772: udp 157 (DF) 17:16:40.795419 localhost.localdomain.32772 > localhost.localdomain.32772: udp 46 (DF) 17:16:40.795865 localhost.localdomain.32772 > localhost.localdomain.32772: udp 56 (DF) 17:16:40.796064 localhost.localdomain.32772 > localhost.localdomain.32772: udp 46 (DF) 17:16:40.796519 localhost.localdomain.32772 > localhost.localdomain.32772: udp 131 (DF)
I think this bug has to get assigned agains postgresql, it seems to be a logging problem.
Um ... I see no evidence here that connects postgres to the mail problem. I'm not saying that there couldn't be a connection, just that being on the same machine at the same time isn't enough connection to justify passing the bug my way ...
i switched to postfix over a year ago which got rid of this problem. feel free to close this bug.
I'm switching this back to the sendmail component, since I see no evidence that postgres did anything wrong, and closing as CANTFIX because we are unable to reproduce the problem.