Bug 166353

Summary: gcc4 & -Os break netlink.
Product: [Fedora] Fedora
Reporter: Bill Crawford <billcrawford1970>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: rawhide
CC: davej, davem, mtakahashi, wtogami
Hardware: i386
OS: Linux
Fixed In Version: 4.0.1-11
Doc Type: Bug Fix
Last Closed: 2005-08-27 08:12:38 UTC
Attachments:
Description             Flags
/etc/sysconfig/network  none
ifcfg-eth0              none
ifcfg-lo                none
ifcfg-lo:1              none

Description Bill Crawford 2005-08-19 17:26:13 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b3) Gecko/20050818 Fedora/1.1-0.2.7.deerpark.alpha2.1 Firefox/1.0+

Description of problem:
After booting this new kernel, the network service fails to start properly. In particular, although an IP address is obtained via DHCP, no default route is set, and during the initscripts sequence I see messages saying something like "RTNETLINK invalid parameter".
All works fine in 1492_FC5.
Attempting to "service network restart" gives similar output.


Version-Release number of selected component (if applicable):
kernel-2.6.12-1.1499_FC5

How reproducible:
Always


Additional info:

Aug 19 15:52:47 pikachu dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Aug 19 15:52:47 pikachu dhclient: DHCPACK from 10.152.63.254
Aug 19 15:52:47 pikachu NET[3256]: /sbin/dhclient-script : updated /etc/resolv.conf
Aug 19 15:52:47 pikachu dhclient: bound to 80.5.156.222 -- renewal in 59429 seconds.
Aug 19 15:52:49 pikachu ntpd[2035]: sendto(66.187.224.4): Network is unreachable
Aug 19 15:53:54 pikachu ntpd[2035]: sendto(66.187.224.4): Network is unreachable

Comment 1 Sammy 2005-08-19 18:04:59 UTC
Confirmed! 

Comment 2 David Miller 2005-08-19 23:21:29 UTC
Please identify the exact program and command line which runs when
you encounter these RTNETLINK errors.

I'll note in passing that when I use system-config-network, give it
a static IP configuration, then later click on the DHCP check-box
to switch from static to dynamic IP, it still installs the static
route after getting the IP address from DHCP.  I wonder if that's
related.

Please also note if any other packages got updated, besides your kernel,
since the last successful boot.  This could be a boot scripts bug.


Comment 3 Bill Crawford 2005-08-19 23:27:01 UTC
When rebooting back into the previous kernel, everything works fine again.
I'll attach /etc/sysconfig/network*
The messages appear as "/etc/init.d/network" is being run. I already had a "file
exists" error due to routes being added twice for lo/lo:0 (I have an alias on
the loopback so I have a hostname with address 127.0.0.2 ... long story, but it
stops the hostname changing every time I connect to my cable inet).
When booting with the new kernel, two new messages appear, both saying something
like "invalid argument" (they don't appear in the dmesg output, alas).
I'll try booting it again tomorrow and try to write the output down ...


Comment 4 Bill Crawford 2005-08-19 23:31:00 UTC
Created attachment 117931 [details]
/etc/sysconfig/network

Comment 5 Bill Crawford 2005-08-19 23:35:35 UTC
Created attachment 117932 [details]
ifcfg-eth0

Comment 6 Bill Crawford 2005-08-19 23:36:17 UTC
Created attachment 117933 [details]
ifcfg-lo

Comment 7 Bill Crawford 2005-08-19 23:36:52 UTC
Created attachment 117934 [details]
ifcfg-lo:1

Comment 8 Bill Crawford 2005-08-19 23:39:43 UTC
Anyway, basically the problem is that no default route is getting set, nor one
of the static routes being sent back by DHCP, thus I don't get any DNS lookups,
squid fails to start and a whole snowball of problems follows :)

It should end up like this:
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
80.5.156.0      *               255.255.255.0   U     0      0        0 eth0
169.254.0.0     *               255.255.0.0     U     0      0        0 eth0
default         cpc3-rdng2-6-0- 0.0.0.0         UG    0      0        0 eth0

But with the new kernel, I only get the first of those, and the two other routes
are rejected. So I'm fairly confident the commands being run (and failing) are
those attempts to set those routes.


Comment 9 Bill Crawford 2005-08-20 16:26:05 UTC
Working fine for me with kernel 1502.

Sammy, can you confirm?

Comment 10 Dave Jones 2005-08-20 18:13:09 UTC
This is caused by gcc4 builds done with -Os.  1502 disables that, and goes back
to -O2.  Unfortunately, this is such a large body of code that tracking down the
exact code that gcc is miscompiling will be a nightmare.

Jakub, the 1499 kernel was the first one to have -Os. 1502 turned it back off.

Davem, any ideas to narrow down which object files could be miscompiled?


Comment 11 Jakub Jelinek 2005-08-20 18:33:30 UTC
You can do a binary search to find which .o file breaks if compiled with -Os
but not with -O2.
Even inside of one .o file you can do a binary search, by compiling it once with
-Os and once with -O2 into assembly and then weld parts of the assembly together
(which is more work, but doable, have done it several times).
Alternatively you can start from the oops backtrace or whatever other info you
have about the broken kernel and limit the binary search only to files that have
anything to do with it.
Just for completeness, if something works with -O2 and not -Os, it doesn't mean
it must be a GCC bug.

Comment 12 Sammy 2005-08-22 13:21:21 UTC
Yes, working fine with 1504! 

Comment 13 David Miller 2005-08-22 17:13:25 UTC
I think there are 4 files you can use to track this down:

net/core/rtnetlink.c
net/netlink/af_netlink.c
net/ipv4/fib_hash.c
net/ipv4/fib_frontend.c

That should cover all of the code paths for adding a route via
netlink.


Comment 14 Jakub Jelinek 2005-08-24 18:27:38 UTC
You're right: a kernel compiled with -Os except for net/ipv4/fib_frontend.c,
which is compiled with -O2, works, while if net/ipv4/fib_frontend.c is also
compiled with -Os, the no default route problem appears.

Comment 15 Jakub Jelinek 2005-08-24 22:01:23 UTC
Particularly the inet_check_attr routine.  If the whole kernel but this routine
is compiled with -Os, it works.

Comment 16 Jakub Jelinek 2005-08-24 22:54:44 UTC
Distilled testcase (suitable for gcc.c-torture/execute/) that fails with -m32
-Os:
struct rtattr
{
  unsigned short rta_len;
  unsigned short rta_type;
};

__attribute__ ((noinline))
int inet_check_attr (void *r, struct rtattr **rta)
{
  int i;

  for (i = 1; i <= 14; i++)
    {
      struct rtattr *attr = rta[i - 1];
      if (attr)
        {
          if (attr->rta_len - sizeof (struct rtattr) < 4)
            return -22;
          if (i != 9 && i != 8)
            rta[i - 1] = attr + 1;
        }
    }
  return 0;
}

extern void abort (void);

int
main (void)
{
  struct rtattr rt[2];
  struct rtattr *rta[14];
  int i;

  rt[0].rta_len = sizeof (struct rtattr) + 8;
  rt[0].rta_type = 0;
  rt[1] = rt[0];
  for (i = 0; i < 14; i++)
    rta[i] = &rt[0];
  if (inet_check_attr (0, rta) != 0)
    abort ();
  for (i = 0; i < 14; i++)
    if (rta[i] != &rt[i != 7 && i != 8])
      abort ();
  for (i = 0; i < 14; i++)
    rta[i] = &rt[0];
  rta[1] = 0;
  rt[1].rta_len -= 8;
  rta[5] = &rt[1];
  if (inet_check_attr (0, rta) != -22)
    abort ();
  for (i = 0; i < 14; i++)
    if (i == 1 && rta[i] != 0)
      abort ();
    else if (i != 1 && i <= 5 && rta[i] != &rt[1])
      abort ();
    else if (i > 5 && rta[i] != &rt[0])
      abort ();
  return 0;
}


Comment 17 MASAO TAKAHASHI 2005-08-25 06:47:24 UTC
(In reply to comment #16)
I tested the above code.
Results
 (1) Changing the comparison in inet_check_attr avoids the abort.
      if (i != 9 && i != 8)
        rta[i - 1] = attr + 1;
       ----->
      if (i != 8) {
        if (i != 9) {
          rta[i - 1] = attr + 1;
        }
      }
 (2) Inserting printf("i = %d\n", i) into the for loop also avoids the abort.
      if (i != 8 && i != 9) {
        printf("i = %d\n", i);
        rta[i - 1] = attr + 1;
      }

 (3) A test with linux-2.6.12-1.1509_FC5:
     I made the change listed in (1) above.
     With it, the RTNETLINK errors did not occur.


Comment 18 Jakub Jelinek 2005-08-27 08:12:38 UTC
Should be fixed in gcc-4.0.1-11.