Bug 705301

Summary: nash, libdhcp/libnl interaction: "ERROR: Interface setup failed: pumpSetupInterface failed: get link - 19: No such device."
Product: Red Hat Enterprise Linux 5 Reporter: Konstantin Khorenko <khorenko>
Component: libdhcpAssignee: David Cantrell <dcantrell>
Status: CLOSED DUPLICATE QA Contact: Release Test Team <release-test-team>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 5.6CC: atodorov
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-08-19 18:49:09 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 726828    
Attachments:
Description Flags
make libdhcp use rtnl_link_get_by_number() instread of rtnl_link_get()
none
add rtnl_get_link_by_number() function to libnl none

Description Konstantin Khorenko 2011-05-17 10:18:51 UTC
Created attachment 499296 [details]
make libdhcp use rtnl_link_get_by_number() instread of rtnl_link_get()

Problem description:
One of our customers faced a problem that turned out to be a bug in "libdhcp" and "libnl" libraries interaction.

In brief: nic_get_links() function from "libdhcp" library uses rtnl_link_get() function from "libnl" library in order to get interfaces configuration from the kernel via netlink.
Doing this nic_get_links() assumes that network interfaces have strictly consequent indexes (which is incorrect in general).

static int nic_get_links(NLH_t nh, char *if_name, int if_index) {
...
    nitems = nl_cache_nitems(cache);

    while (i <= nitems) {
        if ((rlink = rtnl_link_get(cache, i)) == NULL) {
            eprintf(NIC_FATAL, "%s (%d): %s\n",
            ...
        }
...
        rtnl_link_put(rlink);
        i++;
...
    }


nic_get_links() calls rtnl_link_get(cache, i) - it tries to get i-th interface, while rtnl_link_get() searches not for i-th interface but for network interface with index == i.

struct rtnl_link *rtnl_link_get(struct nl_cache *cache, int ifindex)
{
...
        nl_list_for_each_entry(link, &cache->c_items, ce_list) {
                if (link->l_index == ifindex) {
                        nl_object_get((struct nl_object *) link);
                        return link;
                }
        }

        return NULL;
}

In case network interfaces have indexes with "holes" - this code does not work.

In our particular case the node could not boot from an iSCSI device, because "nash" inside initrd image failed to configure network interfaces due to this issue (Parallels Virtuozzo Containers kernel virtualizes network interfaces => their indexes may be non-consequent).

But this issue can be easily reproducible on a stock RHEL5 kernel - even on an already booted and fully operational node:

# Initially we see the network interface indexes are sequential:
[root@dhcp-10-30-20-94 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:1c:42:42:44:47 brd ff:ff:ff:ff:ff:ff
    inet 10.30.20.94/16 brd 10.30.255.255 scope global eth0
    inet6 2a00:1b48:1003:30:21c:42ff:fe42:4447/64 scope global dynamic 
       valid_lft 2592000sec preferred_lft 604800sec
    inet6 fe80::21c:42ff:fe42:4447/64 scope link 
       valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop 
    link/sit 0.0.0.0 brd 0.0.0.0

# Let's make them non-sequential:
[root@dhcp-10-30-20-94 ~]# vconfig add eth0 10
...
[root@dhcp-10-30-20-94 ~]# vconfig add eth0 11
...
[root@dhcp-10-30-20-94 ~]# vconfig rem eth0.10
...

[root@dhcp-10-30-20-94 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:1c:42:42:44:47 brd ff:ff:ff:ff:ff:ff
    inet 10.30.20.94/16 brd 10.30.255.255 scope global eth0
    inet6 2a00:1b48:1003:30:21c:42ff:fe42:4447/64 scope global dynamic 
       valid_lft 2591992sec preferred_lft 604792sec
    inet6 fe80::21c:42ff:fe42:4447/64 scope link 
       valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop 
    link/sit 0.0.0.0 brd 0.0.0.0
5: eth0.11@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop 
    link/ether 00:1c:42:42:44:47 brd ff:ff:ff:ff:ff:ff

# And here is the command sequence that fails:
[root@dhcp-10-30-20-94 ~]# nash
(running in test mode).
Red Hat nash version 5.1.19.6 starting
netname 00:1c:42:42:44:47 eth0.11
network --device eth0.11 --bootproto static --ip 10.0.0.2 --netmask 255.255.255.0
<<< press Ctrl-d here >>>
ERROR: Interface setup failed: pumpSetupInterface failed: get link - 19: No such device.


1) i did this test on the following node:
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
2.6.18-194.el5 x86_64 kernel
nash-5.1.19.6-61

But originally the issue was found on a RHEL5.6 => nash-5.1.19.6-68.1, libnl-1.0-0.10.pre5.5, libdhcp-1.20-10.el5 are also affected.
("nash" uses "libdhcp" and "libnl" but compiled statically => it should be also considered)

2) the example above shows that "nash" is affected, but in fact any tool that uses "libdhcp"/"libnl" will behave the same way.

3) how to solve this:
it seems the simplest way here is to add another function to "libnl" which will return i-th interface configuration and make nic_get_links() function from "libdhcp" library use it instead of rtnl_link_get().
Attaching patches with the implementation.

Hope that helps.

--
Best regards,

Konstantin Khorenko,
PVCfL/OpenVZ developer,
Parallels

Comment 1 Konstantin Khorenko 2011-05-17 10:19:51 UTC
Created attachment 499297 [details]
add rtnl_get_link_by_number() function to libnl

Comment 2 RHEL Program Management 2011-08-05 12:36:05 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 David Cantrell 2011-08-19 18:49:09 UTC

*** This bug has been marked as a duplicate of bug 599633 ***