Bug 705301 - nash, libdhcp/libnl interaction: "ERROR: Interface setup failed: pumpSetupInterface failed: get link - 19: No such device."
Summary: nash, libdhcp/libnl interaction: "ERROR: Interface setup failed: pumpSetupInt...
Keywords:
Status: CLOSED DUPLICATE of bug 599633
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libdhcp
Version: 5.6
Hardware: All
OS: Linux
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: David Cantrell
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks: 726828
TreeView+ depends on / blocked
 
Reported: 2011-05-17 10:18 UTC by Konstantin Khorenko
Modified: 2011-08-19 18:49 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-19 18:49:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
make libdhcp use rtnl_link_get_by_number() instread of rtnl_link_get() (515 bytes, patch)
2011-05-17 10:18 UTC, Konstantin Khorenko
no flags Details | Diff
add rtnl_get_link_by_number() function to libnl (1.64 KB, patch)
2011-05-17 10:19 UTC, Konstantin Khorenko
no flags Details | Diff

Description Konstantin Khorenko 2011-05-17 10:18:51 UTC
Created attachment 499296 [details]
make libdhcp use rtnl_link_get_by_number() instread of rtnl_link_get()

Problem description:
One of our customers faced a problem that turned out to be a bug in "libdhcp" and "libnl" libraries interaction.

In brief: nic_get_links() function from "libdhcp" library uses rtnl_link_get() function from "libnl" library in order to get interfaces configuration from the kernel via netlink.
Doing this nic_get_links() assumes that network interfaces have strictly consequent indexes (which is incorrect in general).

static int nic_get_links(NLH_t nh, char *if_name, int if_index) {
...
    nitems = nl_cache_nitems(cache);

    while (i <= nitems) {
        if ((rlink = rtnl_link_get(cache, i)) == NULL) {
            eprintf(NIC_FATAL, "%s (%d): %s\n",
            ...
        }
...
        rtnl_link_put(rlink);
        i++;
...
    }


nic_get_links() calls rtnl_link_get(cache, i) - it tries to get i-th interface, while rtnl_link_get() searches not for i-th interface but for network interface with index == i.

struct rtnl_link *rtnl_link_get(struct nl_cache *cache, int ifindex)
{
...
        nl_list_for_each_entry(link, &cache->c_items, ce_list) {
                if (link->l_index == ifindex) {
                        nl_object_get((struct nl_object *) link);
                        return link;
                }
        }

        return NULL;
}

In case network interfaces have indexes with "holes" - this code does not work.

In our particular case the node could not boot from an iSCSI device, because "nash" inside initrd image failed to configure network interfaces due to this issue (Parallels Virtuozzo Containers kernel virtualizes network interfaces => their indexes may be non-consequent).

But this issue can be easily reproducible on a stock RHEL5 kernel - even on an already booted and fully operational node:

# Initially we see the network interface indexes are sequential:
[root@dhcp-10-30-20-94 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:1c:42:42:44:47 brd ff:ff:ff:ff:ff:ff
    inet 10.30.20.94/16 brd 10.30.255.255 scope global eth0
    inet6 2a00:1b48:1003:30:21c:42ff:fe42:4447/64 scope global dynamic 
       valid_lft 2592000sec preferred_lft 604800sec
    inet6 fe80::21c:42ff:fe42:4447/64 scope link 
       valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop 
    link/sit 0.0.0.0 brd 0.0.0.0

# Let's make them non-sequential:
[root@dhcp-10-30-20-94 ~]# vconfig add eth0 10
...
[root@dhcp-10-30-20-94 ~]# vconfig add eth0 11
...
[root@dhcp-10-30-20-94 ~]# vconfig rem eth0.10
...

[root@dhcp-10-30-20-94 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:1c:42:42:44:47 brd ff:ff:ff:ff:ff:ff
    inet 10.30.20.94/16 brd 10.30.255.255 scope global eth0
    inet6 2a00:1b48:1003:30:21c:42ff:fe42:4447/64 scope global dynamic 
       valid_lft 2591992sec preferred_lft 604792sec
    inet6 fe80::21c:42ff:fe42:4447/64 scope link 
       valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop 
    link/sit 0.0.0.0 brd 0.0.0.0
5: eth0.11@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop 
    link/ether 00:1c:42:42:44:47 brd ff:ff:ff:ff:ff:ff

# And here is the command sequence that fails:
[root@dhcp-10-30-20-94 ~]# nash
(running in test mode).
Red Hat nash version 5.1.19.6 starting
netname 00:1c:42:42:44:47 eth0.11
network --device eth0.11 --bootproto static --ip 10.0.0.2 --netmask 255.255.255.0
<<< press Ctrl-d here >>>
ERROR: Interface setup failed: pumpSetupInterface failed: get link - 19: No such device.


1) i did this test on the following node:
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
2.6.18-194.el5 x86_64 kernel
nash-5.1.19.6-61

But originally the issue was found on a RHEL5.6 => nash-5.1.19.6-68.1, libnl-1.0-0.10.pre5.5, libdhcp-1.20-10.el5 are also affected.
("nash" uses "libdhcp" and "libnl" but compiled statically => it should be also considered)

2) the example above shows that "nash" is affected, but in fact any tool that uses "libdhcp"/"libnl" will behave the same way.

3) how to solve this:
it seems the simplest way here is to add another function to "libnl" which will return i-th interface configuration and make nic_get_links() function from "libdhcp" library use it instead of rtnl_link_get().
Attaching patches with the implementation.

Hope that helps.

--
Best regards,

Konstantin Khorenko,
PVCfL/OpenVZ developer,
Parallels

Comment 1 Konstantin Khorenko 2011-05-17 10:19:51 UTC
Created attachment 499297 [details]
add rtnl_get_link_by_number() function to libnl

Comment 2 RHEL Program Management 2011-08-05 12:36:05 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 David Cantrell 2011-08-19 18:49:09 UTC

*** This bug has been marked as a duplicate of bug 599633 ***


Note You need to log in before you can comment on or make changes to this bug.