191403 – Local Loopback starts misbehaving after awhile.

Summary: Local Loopback starts misbehaving after awhile.

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Assignee:	Red Hat Kernel Manager
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-05-11 17:36 UTC by Bob Doan
Modified:	2007-11-30 22:07 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-09-13 12:37:48 UTC
Target Upstream Version:
Embargoed:

Description Bob Doan 2006-05-11 17:36:56 UTC

Kernel Version: Linux version 2.4.21-27.0.4.ELsmp
The problem happened with previous Kernel Versions.

Problem: Large packets cannot be written through TCP/IP sockets over the LO
interface after the machine has been running for a while (usually about a day).

The same packets/tasks work over eth0.

It seems to be a general LO problem, but I see it show up most in apache and
sendmail.

I first noticed it with sendmail (messages > 40k).  Larger emails were not going
through.  They would get stuck in /var/spool/clientmqueue/.  Restarting LO does
not make them go through.  When I reboot the computer, they send.

I then noticed that one of my PHP bindings was having trouble sending large
amounts of data through LO.  (It works fine when apache is accessed via eth0.)

So I created a sample PHP binding to test the problem.

I discovered that 15784 bytes seemed to be the magic number for how much I
could write() to a socket.  Once I try to write more than 15785 bytes, the data
does not make it; it seems to go to the bit bucket.  I monitored the TCP/IP
traffic with tcpdump and confirmed the data was never sent.  I wrote a simple C
HTTP client to test accessing the special web page to make sure that the data
wasn't being eaten by a rogue web client (it wasn't).

Sample PHP Binding:

#define SPEW_SIZE 15784

ZEND_FUNCTION(my_spew) {
        char crud[SPEW_SIZE];
        int i;
        long result;

        for(i=0;i<SPEW_SIZE;i++)
                crud[i] = 'A';
        crud[SPEW_SIZE-1] = 0;

        result = php_write(crud, SPEW_SIZE);
        RETURN_LONG(result);
        return;
}

=========================================================================

To prove the problem is in LO and not in apache or sendmail, I wrote a simple C
socket server and client.

Here's the server part.  Nothing to write home about:


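#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read and print whatever is available; return -1 once the peer is gone. */
static int my_receive(int fd) {
	char buf[1024];
	int res;

	res = recv(fd,buf,1023,0);

	if(res == 0)
		return -1;	/* peer closed the connection */

	/* Non-blocking socket: "no data yet" is not an error. */
	if(res < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
		return -1;

	if(res > 0) {
		buf[res] = 0;
		printf("%s", buf);
	}

	return 1;
}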

/* Create a TCP socket listening on the given port; return its fd or -1. */
static int create_server(long port) {
	int fd;
	struct sockaddr_in server_addr;
	int reuse = 1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if(fd < 0) {
		fprintf(stderr, "CAN NOT OPEN SOCKET\n");
		return -1;
	}

	if(setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (void *) &reuse, sizeof(reuse)) == -1) {
		close(fd);
		return -1;
	}

	/* bind server port */
	memset(&server_addr, 0, sizeof(server_addr));
	server_addr.sin_family = AF_INET;
	server_addr.sin_addr.s_addr = htonl(INADDR_ANY);
	server_addr.sin_port = htons(port);

	if(bind(fd, (struct sockaddr *) &server_addr, sizeof(server_addr)) < 0) {
		fprintf(stderr, "CAN NOT BIND TO PORT!\n");
		close(fd);	/* don't leak the descriptor */
		return -1;
	}

	if(listen(fd, 2) < 0) {
		close(fd);
		return -1;
	}
	fprintf(stderr, "SUCCESSFULLY LISTENING ON PORT %ld FD=%d\n", port, fd);
	return fd;
}

int main(int argc, char **argv) {
	int fd = -1;
	int client = -1;
	long port;
	struct sockaddr_in client_addr;
	socklen_t addr_len;

	if(argc <= 1) {
		fprintf(stderr, "give it a port\n");
		exit(1);
	}

	port = atoi(argv[1]);
	while(1) {
		if(fd < 0) {
			fd = create_server(port);
			if(fd < 0)
				sleep(5);
		}
		if(fd >= 0 && client < 0) {
			memset(&client_addr, 0, sizeof(client_addr));
			/* accept() expects the buffer size on input */
			addr_len = sizeof(client_addr);
			client = accept(fd, (struct sockaddr *) &client_addr, &addr_len);
			if(client >= 0) {
				fprintf(stderr, "WE HAVE A CLIENT\n");
				fcntl(client, F_SETFL, O_NONBLOCK);
			}
		}
		if(client >= 0) {
			if(my_receive(client) == -1) {
				close(client);
				client = -1;
			}
		}
	}

	return 0;
}

Then I wrote a simple client to connect to the server and send X bytes.  It
turns out that when the server is acting up I can't send any more than 15984
bytes through LO.  If I reboot the server, I am able to send more than 15984
bytes again (for about a day).  I also confirmed that NONE of the 15984 bytes
come through at all (blocking is turned off).
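
The client itself was not posted with this report.  For anyone trying to
reproduce the problem, here is a minimal sketch of what such a client might look
like.  It is an assumption, not the reporter's code: send_all() and
send_payload() are hypothetical names, the payload is simply a run of 'A'
characters, and the socket is left in blocking mode so a short write is retried
rather than dropped.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling send() until len bytes are accepted.
 * Returns the number of bytes sent, or -1 on error. */
static ssize_t send_all(int fd, const char *buf, size_t len)
{
	size_t sent = 0;

	assert(buf != NULL);
	while (sent < len) {
		ssize_t n = send(fd, buf + sent, len - sent, 0);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;
		}
		sent += (size_t)n;
	}
	return (ssize_t)sent;
}

/* Hypothetical helper: connect to ip:port over TCP and send nbytes of 'A'.
 * Returns the number of bytes handed to the kernel, or -1 on error. */
static ssize_t send_payload(const char *ip, int port, size_t nbytes)
{
	struct sockaddr_in addr;
	char *buf;
	ssize_t result;
	int fd;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons((unsigned short)port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
		return -1;	/* not a dotted-quad IPv4 address */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	buf = malloc(nbytes ? nbytes : 1);
	if (buf == NULL) {
		close(fd);
		return -1;
	}
	memset(buf, 'A', nbytes);
	result = send_all(fd, buf, nbytes);
	free(buf);
	close(fd);
	return result;
}
```

Calling send_payload("127.0.0.1", port, 15984) and then
send_payload("127.0.0.1", port, 15985) against the server above, and comparing
what the server prints, would show whether the size threshold is in effect.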

I'd be happy to help further track this problem down but I lack the kernel
debugging skills required.

Comment 1 Ernie Petrides 2006-05-11 19:00:04 UTC

RHEL3 is now closed.

Comment 2 Bob Doan 2006-05-11 19:07:46 UTC

Does this mean you will not fix the bug?

Comment 3 Ernie Petrides 2006-05-11 19:54:25 UTC

Most likely, that's correct.  You might want to try the RHEL3 U8 beta
kernel when it shows up in the RHN beta channels (probably in a couple
of weeks) to see whether the problem still occurs, but it doesn't sound
familiar (as something that might have gotten fixed).

You might also want to try RHEL4.

Comment 4 Bob Doan 2006-05-11 20:06:23 UTC

I really don't want to have to upgrade all my servers because of a silly LO bug.
Who do I need to contact in order to get someone to take a look at this?

Comment 5 Ernie Petrides 2006-05-11 20:13:54 UTC

I suggest starting with Customer Support, so that they can open an
Issue Tracker ticket.  Please let them know that you already created
a Bugzilla report directly (so they can link the two together).

Comment 6 Bob Doan 2006-05-11 20:19:15 UTC

I'm pretty sure other customers have this problem.  It may have been reported as
clientmqueue problems with sendmail or as "Connection timed out with
[127.0.0.1]".

Look at Bug 154802, for example.  It might be the same thing as this.

Comment 8 Prarit Bhargava 2007-09-13 12:37:48 UTC

Closing re: comment #7.

P.
