Kernel Version: Linux version 2.4.21-27.0.4.ELsmp
The problem also happened with previous kernel versions.

Problem: Large packets cannot be written through TCP/IP sockets over LO after the machine has been running for a while (usually about a day). The same packets/tasks work over eth0. It seems to be a general LO problem, but I see it show up most in apache and sendmail.

I first noticed it with sendmail (> 40k). Larger emails were not going through. They would get stuck in /var/spool/clientmqueue/. Restarting LO does not make them go through. When I reboot the computer, they send.

I then noticed that one of my PHP bindings was having trouble sending large amounts of data through LO (it works fine when apache is accessed via eth0), so I created a sample PHP binding to test the problem. I discovered that 15784 bytes seemed to be the magic number for how much I could write() to a socket. Once I try to write 15785 bytes or more, the data does not make it; it seems to go to the bit bucket. I monitored the TCP/IP traffic with tcpdump and confirmed the data was never sent. I wrote a simple C HTTP client to test accessing the special web page, to make sure that the data wasn't being eaten by a rogue web client (it wasn't).

Sample PHP Binding:

    #define SPEW_SIZE 15784

    ZEND_FUNCTION(my_spew)
    {
        char crud[SPEW_SIZE];
        int i;
        long result;

        /* Fill the buffer with 'A' and NUL-terminate it. */
        for (i = 0; i < SPEW_SIZE; i++)
            crud[i] = 'A';
        crud[SPEW_SIZE - 1] = 0;

        result = php_write(crud, SPEW_SIZE);
        RETURN_LONG(result);
    }

=========================================================================

To prove the problem is in LO and not in apache or sendmail, I wrote a simple C socket server and client.

Here's the server part. Nothing to write home about:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Read up to 1023 bytes and print them.
     * Returns -1 when the peer has closed the connection, 1 otherwise
     * (including when no data is available yet on the non-blocking socket). */
    static int my_receive(int fd)
    {
        char buf[1024];
        int res;

        res = recv(fd, buf, 1023, 0);
        if (res == 0)
            return -1;
        if (res > 0) {
            buf[res] = 0;
            printf("%s", buf);
        }
        return 1;
    }

    static int create_server(long port)
    {
        int fd;
        struct sockaddr_in server_addr;
        int reuse = 1;

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            fprintf(stderr, "CAN NOT OPEN SOCKET\n");
            return -1;
        }

        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                       (void *) &reuse, sizeof(reuse)) == -1) {
            close(fd);
            return -1;
        }

        /* bind server port */
        memset(&server_addr, 0, sizeof(server_addr));
        server_addr.sin_family = AF_INET;
        server_addr.sin_addr.s_addr = htonl(INADDR_ANY);
        server_addr.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *) &server_addr,
                 sizeof(server_addr)) < 0) {
            fprintf(stderr, "CAN NOT BIND TO PORT!\n");
            close(fd);   /* was leaked in the first version */
            return -1;
        }

        listen(fd, 2);
        fprintf(stderr, "SUCCESSFULLY LISTENING ON PORT %ld FD=%d\n", port, fd);
        return fd;
    }

    int main(int argc, char **argv)
    {
        int fd = -1;
        int client = -1;
        long port;
        struct sockaddr_in client_addr;
        socklen_t tmp;

        if (argc <= 1) {
            fprintf(stderr, "give it a port\n");
            exit(1);
        }
        port = atoi(argv[1]);

        while (1) {
            if (fd < 0) {
                fd = create_server(port);
                if (fd < 0)
                    sleep(5);
            }

            if (fd > 0 && client < 0) {
                memset(&client_addr, 0, sizeof(client_addr));
                tmp = sizeof(client_addr);   /* accept() needs the buffer size on input */
                client = accept(fd, (struct sockaddr *) &client_addr, &tmp);
                if (client > 0) {
                    fprintf(stderr, "WE HAVE A CLIENT\n");
                    fcntl(client, F_SETFL, O_NONBLOCK);   /* same effect as FNDELAY */
                }
            }

            if (client > 0) {
                if (my_receive(client) == -1) {
                    close(client);
                    client = -1;
                }
            }
        }
        return 0;
    }

Then I wrote a simple client to connect to the server and send X bytes. It turns out that when the server is acting up, I can't send any more than 15984 bytes through LO. If I reboot the server, I am able to send more than 15984 bytes (for about a day). I also confirmed that NONE of the 15984 bytes come through at all (blocking is turned off).

I'd be happy to help track this problem down further, but I lack the kernel debugging skills required.
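The client is not included in this report. For anyone trying to reproduce the problem, here is a minimal sketch of what such a client might look like. It is not the reporter's original code: the function names connect_loopback, send_all and spew are made up for illustration, and it uses a blocking socket, which is an assumption. It connects to 127.0.0.1, writes a buffer of 'A' characters, and reports how many bytes write() accepted.

```c
/* Hypothetical test client sketch; not the reporter's original code. */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Connect to 127.0.0.1:port over TCP. Returns a socket fd, or -1 on error. */
int connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write len bytes to fd, retrying after short writes and EINTR.
 * Returns the number of bytes written, or -1 if nothing could be written. */
long send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(buf != NULL);
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return sent > 0 ? (long) sent : -1;
        }
        sent += (size_t) n;
    }
    return (long) sent;
}

/* Send `size` bytes of 'A' to the server on the loopback interface.
 * Returns the number of bytes written, or -1 on error. */
long spew(unsigned short port, size_t size)
{
    char *buf;
    long sent;
    int fd = connect_loopback(port);

    if (fd < 0) {
        fprintf(stderr, "could not connect to 127.0.0.1:%u\n", (unsigned) port);
        return -1;
    }
    buf = malloc(size ? size : 1);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 'A', size);

    sent = send_all(fd, buf, size);
    fprintf(stderr, "asked to send %lu bytes, write() accepted %ld\n",
            (unsigned long) size, sent);

    free(buf);
    close(fd);
    return sent;
}
```

Calling spew(port, 15984) and then spew(port, 15985) from a small main() against the server above, once right after boot and again after the machine has been up for a day, should show whether the cutoff described in this report appears. Running tcpdump on lo at the same time would show whether the data ever reaches the wire.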
RHEL3 is now closed.
Does this mean you will not fix the bug?
Most likely, that's correct. You might want to try the RHEL3 U8 beta kernel when it shows up in the RHN beta channels (probably in a couple of weeks) to see whether the problem still occurs, but it doesn't sound familiar (as something that might have gotten fixed). You might also want to try RHEL4.
I really don't want to have to upgrade all my servers because of a silly LO bug. Who do I need to contact in order to get someone to take a look at this?
I suggest starting with Customer Support, so that they can open an Issue Tracker ticket. Please let them know that you already created a Bugzilla report directly (so they can link the two together).
I'm pretty sure other customers have this problem. It may have been reported as clientmqueue problems with sendmail or "Connection timed out with [127.0.0.1]". Look at Bug 154802, for example. It might be the same thing as this.
Closing re: comment #7. P.