From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b3) Gecko/20050818 Fedora/1.1-0.2.7.deerpark.alpha2.1 Firefox/1.0+

Description of problem:
After booting this new kernel, network service fails to start properly. In particular, although an IP address is being obtained using DHCP, no default route is being set, and during initscripts sequence, I'm seeing messages saying something like RTNETLINK invalid parameter. All works fine in 1492_FC5. Attempting to "service network restart" gives similar output.

Version-Release number of selected component (if applicable):
kernel-2.6.12-1.1499_FC5

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Additional info:
Aug 19 15:52:47 pikachu dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Aug 19 15:52:47 pikachu dhclient: DHCPACK from 10.152.63.254
Aug 19 15:52:47 pikachu NET[3256]: /sbin/dhclient-script : updated /etc/resolv.conf
Aug 19 15:52:47 pikachu dhclient: bound to 80.5.156.222 -- renewal in 59429 seconds.
Aug 19 15:52:49 pikachu ntpd[2035]: sendto(66.187.224.4): Network is unreachable
Aug 19 15:53:54 pikachu ntpd[2035]: sendto(66.187.224.4): Network is unreachable
Confirmed!
Please identify the exact program and command line which runs when you encounter these RTNETLINK errors. I'll note in passing that when I use system-config-network, give it a static IP configuration, then later click on the DHCP check-box to switch from static to dynamic IP, it still installs the static route after getting the IP address from DHCP. I wonder if that's related. Please also note if any other packages got updated, besides your kernel, since the last successful boot. This could be a boot scripts bug.
When rebooting back into the previous kernel, everything works fine again. I'll attach /etc/sysconfig/network*.

The messages appear as "/etc/init.d/network" is being run. I already had a "file exists" error due to routes being added twice for lo/lo:0 (I have an alias on the loopback so I have a hostname with address 127.0.0.2 ... long story, but it stops the hostname changing every time I connect to my cable inet). When booting with the new kernel, two new messages appear, both saying something like "invalid argument" (they don't appear in the dmesg output, alas). I'll try booting it again tomorrow and try to write the output down ...
Created attachment 117931 [details] /etc/sysconfig/network
Created attachment 117932 [details] ifcfg-eth0
Created attachment 117933 [details] ifcfg-lo
Created attachment 117934 [details] ifcfg-lo:1
Anyway, basically the problem is that no default route is getting set, nor one of the static routes being sent back by DHCP, thus I don't get any DNS lookups, squid fails to start and a whole snowball of problems :)

It should end up like this:

Destination     Gateway          Genmask         Flags Metric Ref    Use Iface
80.5.156.0      *                255.255.255.0   U     0      0        0 eth0
169.254.0.0     *                255.255.0.0     U     0      0        0 eth0
default         cpc3-rdng2-6-0-  0.0.0.0         UG    0      0        0 eth0

But with the new kernel, I only get the first of those, and the two other routes are rejected. So I'm fairly confident the commands being run (and failing) are the attempts to set those routes.
Working fine for me with kernel 1502. Sammy, can you confirm?
This is caused by gcc4 builds done with -Os. 1502 disables that, and goes back to -O2. Unfortunately, this is such a large body of code that tracking down the exact code that gcc is miscompiling will be a nightmare.

Jakub, the 1499 kernel was the first one to have -Os. 1502 turned it back off.

Davem, any ideas to narrow down which object files could be miscompiled?
You can do a binary search to find which .o file breaks if compiled with -Os but not with -O2. Even inside of one .o file you can do a binary search, by compiling it once with -Os and once with -O2 into assembly and then welding parts of the two assembly outputs together (which is more work, but doable; I have done it several times). Alternatively you can start from the oops backtrace or whatever other info you have about the broken kernel and limit the binary search only to files that have anything to do with it.

Just for completeness, if something works with -O2 and not with -Os, it doesn't mean it must be a GCC bug.
Yes, working fine with 1504!
I think there are 4 files you can use to track this down:

net/core/rtnetlink.c
net/netlink/af_netlink.c
net/ipv4/fib_hash.c
net/ipv4/fib_frontend.c

That should cover all of the code paths for adding a route via netlink.
You're right, kernel compiled with -Os except net/ipv4/fib_frontend.c, which is compiled with -O2, works, while if net/ipv4/fib_frontend.c is also compiled with -Os, the no default route problem appears.
Particularly the inet_check_attr routine. If the whole kernel but this routine is compiled with -Os, it works.
Distilled testcase (suitable for gcc.c-torture/execute/) that fails with -m32 -Os:

struct rtattr
{
  unsigned short rta_len;
  unsigned short rta_type;
};

__attribute__ ((noinline))
int inet_check_attr (void *r, struct rtattr **rta)
{
  int i;

  for (i = 1; i <= 14; i++)
    {
      struct rtattr *attr = rta[i - 1];
      if (attr)
        {
          if (attr->rta_len - sizeof (struct rtattr) < 4)
            return -22;
          if (i != 9 && i != 8)
            rta[i - 1] = attr + 1;
        }
    }
  return 0;
}

extern void abort (void);

int
main (void)
{
  struct rtattr rt[2];
  struct rtattr *rta[14];
  int i;

  rt[0].rta_len = sizeof (struct rtattr) + 8;
  rt[0].rta_type = 0;
  rt[1] = rt[0];
  for (i = 0; i < 14; i++)
    rta[i] = &rt[0];
  if (inet_check_attr (0, rta) != 0)
    abort ();
  for (i = 0; i < 14; i++)
    if (rta[i] != &rt[i != 7 && i != 8])
      abort ();
  for (i = 0; i < 14; i++)
    rta[i] = &rt[0];
  rta[1] = 0;
  rt[1].rta_len -= 8;
  rta[5] = &rt[1];
  if (inet_check_attr (0, rta) != -22)
    abort ();
  for (i = 0; i < 14; i++)
    if (i == 1 && rta[i] != 0)
      abort ();
    else if (i != 1 && i <= 5 && rta[i] != &rt[1])
      abort ();
    else if (i > 5 && rta[i] != &rt[0])
      abort ();
  return 0;
}
(In reply to comment #16)
> Distilled testcase (suitable for gcc.c-torture/execute/) that fails with -m32
> -Os:
> struct rtattr
> {
>   unsigned short rta_len;
>   unsigned short rta_type;
> };
>
> __attribute__ ((noinline))
> int inet_check_attr (void *r, struct rtattr **rta)
> {
>   int i;
>
>   for (i = 1; i <= 14; i++)
>     {
>       struct rtattr *attr = rta[i - 1];
>       if (attr)
>         {
>           if (attr->rta_len - sizeof (struct rtattr) < 4)
>             return -22;
>           if (i != 9 && i != 8)
>             rta[i - 1] = attr + 1;
>         }
>     }
>   return 0;
> }
>
> extern void abort (void);
>
> int
> main (void)
> {
>   struct rtattr rt[2];
>   struct rtattr *rta[14];
>   int i;
>
>   rt[0].rta_len = sizeof (struct rtattr) + 8;
>   rt[0].rta_type = 0;
>   rt[1] = rt[0];
>   for (i = 0; i < 14; i++)
>     rta[i] = &rt[0];
>   if (inet_check_attr (0, rta) != 0)
>     abort ();
>   for (i = 0; i < 14; i++)
>     if (rta[i] != &rt[i != 7 && i != 8])
>       abort ();
>   for (i = 0; i < 14; i++)
>     rta[i] = &rt[0];
>   rta[1] = 0;
>   rt[1].rta_len -= 8;
>   rta[5] = &rt[1];
>   if (inet_check_attr (0, rta) != -22)
>     abort ();
>   for (i = 0; i < 14; i++)
>     if (i == 1 && rta[i] != 0)
>       abort ();
>     else if (i != 1 && i <= 5 && rta[i] != &rt[1])
>       abort ();
>     else if (i > 5 && rta[i] != &rt[0])
>       abort ();
>   return 0;
> }
>

I tested the above code.

Results

(1) Rewriting the comparison in inet_check_attr avoids the abort.

  if (i != 9 && i != 8)
    rta[i - 1] = attr + 1;

  ----->

  if (i != 8) {
    if (i != 9) {
      rta[i - 1] = attr + 1;
    }
  }

(2) Inserting printf("i = %d\n", i) into the for loop also avoids the abort.

  if (i != 8 && i != 9) {
    printf("i = %d\n", i);
    rta[i - 1] = attr + 1;
  }

(3) Test on linux-2.6.12-1.1509_FC5: I made change (1) listed above, and the RTNETLINK errors no longer occurred.
Should be fixed in gcc-4.0.1-11.