111697 – multicast sockets fail with ENODEV if drivers are removed then re-installed

Summary: multicast sockets fail with ENODEV if drivers are removed then re-installed

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	David Miller
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-12-08 22:14 UTC by Albert Chu
Modified:	2007-11-30 22:06 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-12-03 01:13:38 UTC
Target Upstream Version:
Embargoed:

Description Albert Chu 2003-12-08 22:14:53 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2)
Gecko/20030708

Description of problem:
If you select a particular outgoing interface for a socket with the
setsockopt IP_MULTICAST_IF option, bring that interface down with
ifconfig, remove its device driver with rmmod, and re-install a driver
(new or the same one) with insmod, then writes to the socket fail with
ENODEV even though the interface comes back up with the same IP
address.

Version-Release number of selected component (if applicable):
kernel-2.4.21-4.0.1EL

How reproducible:
Always

Steps to Reproduce:
1.  Run my code (Be sure to edit host IP address appropriately for a
multicast interface on your machine)
2.  Wait a little bit, then cat "output.txt" to see that the code is
working fine.
3.  Use ifconfig to bring down every network interface.
4.  Use rmmod to remove all network device drivers.
5.  Wait a little bit, then cat "output.txt" to see that write() fails
with EINVAL.
6.  Use insmod to add back all network device drivers (I assume this
will always bring up all appropriate network interfaces with the same
IP addresses as before).
7.  Wait a little bit, then cat "output.txt" to see that the write()
fails with ENODEV.



Actual Results:  In output.txt:

rv = 1, errno = 0, str = Success
rv = 1, errno = 0, str = Success
rv = 1, errno = 0, str = Success
rv = -1, errno = 22, str = Invalid argument
rv = -1, errno = 22, str = Invalid argument
rv = -1, errno = 22, str = Invalid argument
rv = -1, errno = 19, str = No such device
rv = -1, errno = 19, str = No such device
rv = -1, errno = 19, str = No such device



Expected Results:  After the network device drivers and network
interfaces are brought back up, writes through the socket should
continue to work.  In other words, we should see:

rv = 1, errno = 0, str = Success

in output.txt

Additional info:

Here is my console output (with duplicate output data removed for
clarity):

tdev2|/tmp 3>gcc reproducer.c
tdev2|/tmp 4>./a.out &
[1] 16119
tdev2|/tmp 5>cat output.txt
rv = 1, errno = 0, str = Success
rv = 1, errno = 0, str = Success
tdev2|/tmp 6>ifconfig eth0 down; ifconfig eth1 down
tdev2|/tmp 7>rmmod e1000
tdev2|/tmp 9>cat output.txt
rv = 1, errno = 0, str = Success
rv = 1, errno = 0, str = Success
rv = -1, errno = 22, str = Invalid argument
rv = -1, errno = 22, str = Invalid argument
rv = -1, errno = 22, str = Invalid argument
tdev2|/tmp 10>insmod e1000
Using /lib/modules/2.4.21-ia64-test/kernel/drivers/net/e1000/e1000.o
Intel(R) PRO/1000 Network Driver - version 5.1.11-k1
Copyright (c) 1999-2003 Intel Corporation.
PCI: Found IRQ 51 for device 01:00.0
eth0: Intel(R) PRO/1000 Network Connection
PCI: Found IRQ 53 for device 06:01.0
eth1: Intel(R) PRO/1000 Network Connection
PCI: Found IRQ 54 for device 06:01.1
eth2: Intel(R) PRO/1000 Network Connection
tdev2|/tmp 11>e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
e1000: eth1 NIC Link is Up 100 Mbps Full Duplex
arping(16251): unaligned access to 0x60000fffffffbf15,
ip=0xe000000004754850
arping(16258): unaligned access to 0x60000fffffffbf15,
ip=0xe000000004754850

tdev2|/tmp 11>arping(16312): unaligned access to 0x60000fffffffbf15,
ip=0xe000000004754850
arping(16313): unaligned access to 0x60000fffffffbf15,
ip=0xe000000004754850

tdev2|/tmp 11>cat output.txt
rv = 1, errno = 0, str = Success
rv = 1, errno = 0, str = Success
rv = -1, errno = 22, str = Invalid argument
rv = -1, errno = 22, str = Invalid argument
rv = -1, errno = 22, str = Invalid argument
rv = -1, errno = 19, str = No such device
rv = -1, errno = 19, str = No such device
rv = -1, errno = 19, str = No such device

Comment 1 Albert Chu 2003-12-08 22:18:04 UTC

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <netdb.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <errno.h>

#define PERROR(x)   do { perror(x); exit(1); } while (0)

int main() {
  int output; 
  int sock;
  int len, rv;
  struct sockaddr_in src, dest;

  /* Reproducer brings down all interfaces, so I assume you can only
   * run this in the console.  So I write data to a file rather
   * than stderr/stdout
   */
  output = open("output.txt", O_CREAT | O_WRONLY | O_APPEND,
                S_IRUSR | S_IWUSR);
  if (output < 0)
    PERROR("open");

  sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0)
    PERROR("socket");

  /* Change this to a local multicast interface IP on your machine */
  rv = inet_pton(AF_INET, "192.168.20.3", &src.sin_addr); 
  if (rv <= 0)
    PERROR("inet_pton");

  rv = setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, 
                  (const void *)&src.sin_addr, sizeof(struct in_addr));
  if (rv < 0)
    PERROR("setsockopt");

  /* Make up some multicast destination address */
  memset(&dest, '\0', sizeof(dest));
  dest.sin_family = AF_INET;
  dest.sin_port = htons(9000);  
  rv = inet_pton(AF_INET, "239.2.11.72", &dest.sin_addr); 
  if (rv <= 0)
    PERROR("inet_pton");

  rv = connect(sock, (struct sockaddr *)&dest, sizeof(dest));
  if (rv < 0)
    PERROR("connect");
  
  /* Loop forever, always trying to write to this socket */
  while (1) {
    char data = 0;
    char buffer[1000];

    errno = 0;                  /* just in case */
   
    rv = write(sock, &data, 1);
    len = sprintf(buffer, "rv = %d, errno = %d, str = %s\n",
                  rv, errno, strerror(errno));
    write(output, buffer, len); /* i assume this always works */
    sleep(3);                   /* sleep a bit  */
  }
}

Comment 2 David Miller 2003-12-08 22:20:06 UTC

U1 should have a fix for this bug.

Comment 3 Ben Woodard 2003-12-08 22:39:21 UTC

We just retested this with 2.4.21-5EL and it has the same problem.

Comment 4 Bernd Schmidt 2004-02-26 01:28:55 UTC

I asked davem to comment on whether this is a reasonable request, and
he told me to contact David Stevens at IBM.  Here's what he had to say:

      It is a judgement call, since I don't think there is any established
practice-- you can't remove an interface on 4.3BSD systems. But I
would have to say, "no", the program is not reasonable. Here's why:

      The bug poster seems to have the idea that multicast group
membership is associated with an IP address and should therefore
be there in the later instance of the interface. But, even on BSD
systems, and certainly on Linux systems, if you join a group on an
interface, then delete the IP address and re-add that IP address on
a different interface, none of the group memberships move with the
IP address. Group membership is associated with the logical device,
and that's particularly clear when using ip_mreqn or IPv6, which
specify an interface by index, not address. You don't even have to
have an IP address to join a group on an interface.
      There isn't any practical way to support what they're after,
because you have cases like: two addresses, IP1 & IP2, on eth0, and
you delete them and then add IP1 to eth1 and IP2 to eth2. I'm guessing
he'd want the groups joined via IP1 to go to eth1 and the groups
joined via IP2 to go to eth2, but there is no context like that
saved, and where would groups joined by index go to?
      I think the way to think of it is that removing the module
logically removes the interface, which is what the groups are
associated with. At that point, they are gone, and even if the
same physical interface is re-added with the same addresses, it
still has a different interface index and is logically not the same
interface as the groups were joined on.
      The reason it doesn't work in Linux is because each device
has its list of group memberships and that is destroyed when the
device is unregistered. The new device, when registered, will have
no memberships until new group joins are done.
      I don't see this as a bug, and I don't believe any other OS
supports anything like it. If they're after a high-availability failover
mechanism, I think they want group membership in a "parent" logical
device with child physical devices that can come and go. I thought
that's how ethernet bonding works, though I really know nothing about
it. With something like that, the physical devices can come and go
but as long as the parent device doesn't, the multicast group
memberships aren't affected.
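
The point about ip_mreqn can be made concrete. Below is a minimal sketch, assuming Linux with glibc, of selecting the outgoing multicast interface by index instead of by address. The helper name set_mcast_if_by_name is hypothetical and is not part of the reproducer. Because an index names one registered device, an index obtained before rmmod should not be expected to refer to the device that insmod registers later.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* struct ip_mreqn is a non-POSIX extension */
#endif
#include <errno.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the report): choose the outgoing
 * multicast interface by name, i.e. by interface index, rather than
 * by one of its IP addresses.  Returns 0 on success, -1 on error. */
int set_mcast_if_by_name(int sock, const char *ifname)
{
    struct ip_mreqn req;
    unsigned int idx = if_nametoindex(ifname);

    if (idx == 0) {             /* no such interface right now */
        errno = ENODEV;
        return -1;
    }

    memset(&req, 0, sizeof(req));
    req.imr_ifindex = (int)idx; /* address fields stay zeroed */
    return setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF,
                      &req, sizeof(req));
}
```

In the reproducer, a call such as set_mcast_if_by_name(sock, "eth0") would stand in for the inet_pton()/setsockopt() pair. The lookup has to be repeated after the driver is reloaded, since the new device gets its own index.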

Comment 5 Albert Chu 2004-03-01 21:50:54 UTC

Hello, 

Based on your points, I agree that this may be correct behavior. 
However, are the errno values correct?  When I first saw this
problem, errno == EINVAL suggested to me that bringing the network
device back up would "fix" the invalid argument.  

Based on the statements by Dave Stevens, it would seem that errno
should not equal EINVAL at any point in time, and that ENODEV should
be returned after you rmmod the NIC driver.

Al

Comment 6 Bernd Schmidt 2004-03-22 20:52:29 UTC

Created attachment 98755
A patch to change the returned error

This patch changes the returned EINVAL to ENODEV.  davem, any comments on
whether this is the right thing to do or not?
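
Whichever errno the kernel returns, an application that must survive a driver reload can treat both values the same way. Below is a minimal sketch, assuming that re-applying IP_MULTICAST_IF once the address is back makes the socket refer to the newly registered device. That assumption was not tested in this report, and write_mcast_retry is a hypothetical helper name.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write once; if the failure looks like the
 * interface went away (EINVAL or ENODEV, the two values seen in
 * output.txt), re-apply IP_MULTICAST_IF and retry a single time.
 * Any other error is returned to the caller unchanged. */
ssize_t write_mcast_retry(int sock, const struct in_addr *ifaddr,
                          const void *buf, size_t len)
{
    ssize_t rv = write(sock, buf, len);

    if (rv >= 0 || (errno != EINVAL && errno != ENODEV))
        return rv;

    /* Expected to keep failing while the address is still absent,
     * so the caller can simply try again on its next iteration. */
    if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF,
                   ifaddr, sizeof(*ifaddr)) < 0)
        return -1;

    return write(sock, buf, len);
}
```

In the reproducer's loop this would replace the bare write(sock, &data, 1), with &src.sin_addr passed as ifaddr.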

Comment 7 David Miller 2004-03-23 02:28:37 UTC

This code is checking to see if an ipv4 address is local
to the system.  ENODEV is quite an odd error code to return
for that.

EINVAL is a perfectly fine error return, I see no reason to
change it.
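
For reference, here is a rough user-space analogue of the question "is this IPv4 address local to the system", using getifaddrs(). It only illustrates the kind of check described above and is not the kernel code under discussion. is_local_ipv4 is a hypothetical name.

```c
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical user-space approximation: returns 1 if addr is
 * currently assigned to some interface, 0 if it is not, and -1 if
 * the interface list cannot be read. */
int is_local_ipv4(const struct in_addr *addr)
{
    struct ifaddrs *list, *p;
    int found = 0;

    if (getifaddrs(&list) < 0)
        return -1;

    for (p = list; p != NULL; p = p->ifa_next) {
        const struct sockaddr_in *sin;

        /* Skip interfaces without an address and non-IPv4 entries. */
        if (p->ifa_addr == NULL || p->ifa_addr->sa_family != AF_INET)
            continue;

        sin = (const struct sockaddr_in *)p->ifa_addr;
        if (sin->sin_addr.s_addr == addr->s_addr) {
            found = 1;
            break;
        }
    }

    freeifaddrs(list);
    return found;
}
```

A program could poll this with the address it passed to IP_MULTICAST_IF to decide when it is worth re-applying the option after the drivers are reloaded.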
