Hi Redhat,

I posted the following to the comp.os.linux.development.system newsgroup several days ago but haven't heard anything back, so I thought that reporting it here was appropriate.

================================ POST ==================================

I have been trying (unsuccessfully) to implement user-level (server-side) TCP handling. I am using ipchains to DENY the port that I am using for the server (selected from the unused numbers greater than 50000) to prevent the normal kernel handling from seeing incoming packets, and I am using AF_PACKET sockets to get underneath the kernel firewall -- this seems to work [thanks Andi K]. Although not shown in the snippet below, all errors are checked, etc.

short snippet (on kernel 2.2.14):
----------------------------------
fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP)); // ETH_P_ALL does same
...
struct sockaddr sa;
n_recv = recvfrom(fd, mes, mes_sz, 0, &sa, &sa_sz); // fine, no ll-header

// sa is in the link layer family:
struct sockaddr_ll *sa_ll = (struct sockaddr_ll *) &sa;
sa_ll->sll_pkttype = PACKET_OUTGOING; // flip the sock to outgoing (necessary?)
                                      // same problems even if you don't do this

// I have verified the sa struct, it is in right family, has ARPHRD_ETHER for
// sll_hatype, etc. Do I have to do my own ARP here? I tried putting a perm.
// arp entry for the destination IP, makes no difference. What are the arp
// requirements for using AF_PACKET, SOCK_DGRAM anyway?

// next form a IP/TCP packet in mes2 (this is correct afaik, and verified several ways)
...
// send response back to the interface from which it came (makes sense ?):
n_sent = sendto(fd, mes2, mes2_sz, 0, &sa, sa_sz);

// sendto succeeds, it goes on the wire but the destination ethernet address as
// viewed using ethereal is WRONG (and seems to be stomped, see below).
-----------------------------------

The last two bytes of the ethernet destination address are suspiciously "45 00", which happen to be the first two bytes of the IP packet. The beginning bytes are correct for the destination, however. Is the link-layer header length wrong? Ethereal (and tcpdump) show the IP packet as being correct.

The same code does not work through the 'lo' interface either (where the Ethernet II frame is shown with all 0's via Ethereal).

Am I doing something wrong with this? Do I need to ARP? Is there a kernel upgrade that I am missing? Any suggestions or ideas on what might be wrong, or alternative ways of implementing user-level protocols, are welcome!

thanks,
dave
It is not possible to do anything about your report without a full, reproducible test case. You show only a tiny code snippet, and with that alone I can't reproduce the problem you are reporting, so I can neither verify that it still exists nor fix it.
In the course of preparing an explicit and simplified test case showing the problem, I did the following: I manually set the sll_addr[] fields for the sendto call, and this works ... even though sll_halen is specified as 0? In trying to get at the heart of the matter, this raises the following thoughts:

1) I know that AF_PACKET sockets sit below the IP fragmentation and IP firewall mechanisms. This is from private conversations; it is not at all clear from the man page for packet(7).

2) I _thought_ that a SOCK_DGRAM packet socket sat above the ARP/RARP mechanisms, but perhaps that is a wrong assumption. I assumed that for packets sent to an ethernet interface, for example, the system would issue an ARP request (if needed) and fill in the link-layer fields itself. This seems to be partially true, in that the first 4 bytes of the ethernet address are filled in correctly with the destination address but the last two seem 'stomped' by the beginning of the IP packet. But perhaps this is a fluke (e.g. data left over from previous/other sends/recvs)?

3) The use of sockaddr_ll for sendto's on packet sockets is not clear (packet(7)). While bind on such sockets states that only sll_protocol and sll_ifindex are used, what about a sendto call?

If I have to ARP myself and figure all this out, then there seem to be no semantic differences between a socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP)) call and a socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP)) call?

Thanks a lot,
Dave
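For reference on point 3: the packet(7) page in current man-pages releases says that when sending it is enough to set sll_family, sll_protocol, sll_ifindex, sll_halen and sll_addr, with the other fields left at 0. A minimal sketch of filling the structure that way follows. The helper name fill_ll_dest and any addresses passed to it are hypothetical, and the behaviour of the 2.2.14 kernel discussed here may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>        /* htons */
#include <sys/socket.h>
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <net/ethernet.h>     /* ETH_P_IP, ETH_ALEN */

/*
 * Hypothetical helper (not from the report): fill a sockaddr_ll for use as
 * the destination of sendto() on an AF_PACKET, SOCK_DGRAM socket.
 * Only the fields documented as relevant for sending are set; the rest
 * (sll_hatype, sll_pkttype) stay zero.
 * Returns 0 on success, -1 if the hardware address does not fit.
 */
static int fill_ll_dest(struct sockaddr_ll *sll, int ifindex,
                        uint16_t ethertype,
                        const unsigned char *hwaddr, size_t halen)
{
    assert(sll != NULL && hwaddr != NULL);

    if (halen > sizeof(sll->sll_addr))
        return -1;

    memset(sll, 0, sizeof(*sll));
    sll->sll_family   = AF_PACKET;
    sll->sll_protocol = htons(ethertype);   /* network byte order */
    sll->sll_ifindex  = ifindex;
    sll->sll_halen    = (unsigned char)halen;
    memcpy(sll->sll_addr, hwaddr, halen);
    return 0;
}
```

The filled structure would then be passed as sendto(fd, pkt, pkt_len, 0, (struct sockaddr *)&dst, sizeof(dst)), so that the length argument covers the whole sockaddr_ll, including all of sll_addr.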
OK, more information ... I am now doing my own ARP and transferring the results into the sockaddr_ll structure used for the sendto. This works, and it seems to imply that AF_PACKET sockets sit even below the system's ARP mechanism.

The downside is that transfers through the LO interface still do not work. It seems that the correct sll_addr is all 0's for such transfers, but these packets are dropped when incoming? So, how should the link-layer structure be set for the LO interface?

-dave
This is turning into a "how do I program using AF_PACKET sockets" question rather than a bug report, so I am closing it; bugzilla is not the appropriate place for this.
Well, not to beat a dead horse, but I think I see the problem more clearly now (and I do believe it is a bug). Producing a complete example is a bit involved, but I am certainly willing to do that if you want to re-open this (perhaps I could give you a simple receive version and just let you wait for a stray IP packet). I suspect that there are not too many folks using packet-level sockets. Here is another snippet (without error checks, etc.) showing the problem, though:

=============================================================================
unsigned char buf[1024];
int sock;
struct sockaddr sa;
int sa_sz, n_recv;
struct sockaddr_ll *sa_ll = (struct sockaddr_ll *) &sa;

sock = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
...
// set up sa as a link-layer address
sa_ll->sll_family = AF_PACKET;
sa_ll->sll_ifindex = <SYSTEM DEPENDENT ... might be 2 for ETHERNET>;
sa_ll->sll_hatype = ARPHRD_ETHER; // different for other ifaces
sa_ll->sll_halen = <6 for ethernet, but other values possible>;
// use ARP to figure out sa_ll->sll_addr
...
sa_sz = sizeof(sa); // happens to be 16, before call
                    // man/doc says that a struct sockaddr * should
                    // be used
...
n_recv = recvfrom(sock, buf, 1024, 0, &sa, &sa_sz);
// after call, sa_sz is now 20! (same as the size of
// struct sockaddr_ll which is what it is returning) ... recvfrom
// violated input size spec of 16? causing parts of the memory stomp.
// man page for recvfrom says that argument 6 of recvfrom is
// initialized with the size of arg 5.
=============================================================================

Also, the last two bytes of the hardware address are stomped with the beginning part of the data (which for IP packets will be "45 00"), while the data itself comes back correctly. In other circumstances, the data returned is corrupted as well.

If recvfrom complained that sa_sz was too small (each sockaddr type may have its own size), that would be better behavior. But the data being passed in is valid, and I think that recvfrom is supposed to work for any valid sockaddr type as long as its contents are correct. Similar issues arise for sendto's.

The workaround (in the case of link-layer sockets) is to copy the 'sa' structure into a temporary of struct sockaddr_ll size and use that size for sa_sz as well. The last bytes of the temporary struct (the end of the hardware address) will get stomped, but afaik nothing else bad is happening (more testing is needed).

-thanks again
dave
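The size mismatch described above (16 bytes for struct sockaddr versus 20 for struct sockaddr_ll) can also be sidestepped by declaring the address as a struct sockaddr_ll from the start and passing its size, rather than copying into a temporary afterwards. The following is a minimal sketch of that idea; the wrapper name recv_ll is hypothetical, it uses the socklen_t type from current headers, and it has not been tried on the 2.2.14 kernel from this report.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/if_packet.h>  /* struct sockaddr_ll */

/*
 * Hypothetical wrapper (not from the report): receive one packet from an
 * AF_PACKET socket, giving recvfrom() an address buffer that is really
 * sizeof(struct sockaddr_ll) bytes long, so the whole link-layer address
 * (including all of sll_addr) fits.
 * Returns the number of bytes received, or -1 on error.
 */
static ssize_t recv_ll(int fd, void *buf, size_t len,
                       struct sockaddr_ll *from)
{
    socklen_t from_len = sizeof(*from);   /* in: buffer size, out: real size */
    ssize_t n;

    assert(buf != NULL && from != NULL);

    memset(from, 0, sizeof(*from));
    n = recvfrom(fd, buf, len, 0, (struct sockaddr *)from, &from_len);
    if (n < 0)
        return -1;

    /* A reported length larger than the buffer means the address was cut. */
    if (from_len > sizeof(*from))
        return -1;

    return n;
}
```

The same structure and sizeof(struct sockaddr_ll) can then be handed to sendto for the reply. Where the address family is not known in advance, struct sockaddr_storage is the usual choice for a buffer large enough for any address type.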