Bug 1441919 - Network Service is failing with DPDK
Summary: Network Service is failing with DPDK
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 11.0 (Ocata)
Assignee: Saravanan KR
QA Contact: Eyal Dannon
URL:
Whiteboard:
Depends On: 1428013
Blocks:
Reported: 2017-04-13 05:42 UTC by Saravanan KR
Modified: 2017-05-17 20:21 UTC

Fixed In Version: os-net-config-6.0.0-3.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of: 1428013
Environment:
Last Closed: 2017-05-17 20:21:09 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Launchpad 1657661 0 None None None 2017-04-13 05:42:57 UTC
OpenStack gerrit 443237 0 None MERGED Network service is failing with DPDK 2020-05-09 16:35:12 UTC
Red Hat Product Errata RHEA-2017:1245 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory 2017-05-17 23:01:50 UTC

Comment 2 Eyal Dannon 2017-04-30 05:33:48 UTC
I've verified this bug on RHOS11 with puddle: 2017-04-24.2

$ rpm -qa | grep os-net
os-net-config-6.0.0-3.el7ost.noarch

The interface that acts as dpdk0 is not visible to the kernel, and the config file for that interface is no longer present after deployment.

Thanks.

Comment 3 Eyal Dannon 2017-04-30 08:19:38 UTC
After a reboot the interface is still not present, but network.service is down.
Moreover, some of the interfaces that are supposed to be up, such as the vlan interfaces, are down.

● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sun 2017-04-30 08:17:01 UTC; 27s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 1370 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)

Apr 30 08:16:51 compute-0.localdomain network[1370]: [FAILED]
Apr 30 08:16:51 compute-0.localdomain ovs-vsctl[1939]: ovs|00001|vsctl|INFO|Called as ovs-v...l
Apr 30 08:17:01 compute-0.localdomain network[1370]: Bringing up interface vlan397:  2017-...k)
Apr 30 08:17:01 compute-0.localdomain network[1370]: /etc/sysconfig/network-scripts/ifup-o...A}
Apr 30 08:17:01 compute-0.localdomain network[1370]: ERROR    : [/etc/sysconfig/network-sc...n.
Apr 30 08:17:01 compute-0.localdomain network[1370]: [FAILED]
Apr 30 08:17:01 compute-0.localdomain systemd[1]: network.service: control process exited,...=1
Apr 30 08:17:01 compute-0.localdomain systemd[1]: Failed to start LSB: Bring up/down netwo...g.
Apr 30 08:17:01 compute-0.localdomain systemd[1]: Unit network.service entered failed state.
Apr 30 08:17:01 compute-0.localdomain systemd[1]: network.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

--- ens2f1 used as dpdk interface

[root@compute-0 ~]# ll /etc/sysconfig/network-scripts/ifcfg-*
-rw-r--r--. 1 root root 265 Apr 30 07:18 /etc/sysconfig/network-scripts/ifcfg-br-isolated
-rw-r--r--. 1 root root 197 Apr 30 07:18 /etc/sysconfig/network-scripts/ifcfg-br-link
-rw-r--r--. 1 root root 160 Apr 30 07:18 /etc/sysconfig/network-scripts/ifcfg-dpdk0
-rw-r--r--. 1 root root 135 Apr 30 07:18 /etc/sysconfig/network-scripts/ifcfg-eno1
-rw-r--r--. 1 root root 186 Apr 30 07:18 /etc/sysconfig/network-scripts/ifcfg-ens1f0
-rw-r--r--. 1 root root  59 Apr 30 07:13 /etc/sysconfig/network-scripts/ifcfg-ens1f1
-rw-r--r--. 1 root root 176 Apr 30 07:18 /etc/sysconfig/network-scripts/ifcfg-ens2f0
-rw-r--r--. 1 root root 254 Apr 30 07:13 /etc/sysconfig/network-scripts/ifcfg-lo
-rw-r--r--. 1 root root 248 Apr 30 07:18 /etc/sysconfig/network-scripts/ifcfg-vlan396
-rw-r--r--. 1 root root 248 Apr 30 07:18 /etc/sysconfig/network-scripts/ifcfg-vlan397

------------ no vlan interfaces are present
[root@compute-0 ~]# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e0:07:1b:f4:cc:74 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::e207:1bff:fef4:cc74/64 scope link 
       valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e0:07:1b:f4:cc:75 brd ff:ff:ff:ff:ff:ff
4: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 14:02:ec:7c:88:44 brd ff:ff:ff:ff:ff:ff
    inet 192.0.40.6/24 brd 192.0.40.255 scope global ens1f0
       valid_lft forever preferred_lft forever
    inet6 fe80::1602:ecff:fe7c:8844/64 scope link 
       valid_lft forever preferred_lft forever
5: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e0:07:1b:f4:cc:76 brd ff:ff:ff:ff:ff:ff
6: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e0:07:1b:f4:cc:77 brd ff:ff:ff:ff:ff:ff
7: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 14:02:ec:7c:88:45 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1602:ecff:fe7c:8845/64 scope link 
       valid_lft forever preferred_lft forever
8: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 14:02:ec:7c:87:7c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1602:ecff:fe7c:877c/64 scope link 
       valid_lft forever preferred_lft forever
10: ens5f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 14:02:ec:7c:92:d8 brd ff:ff:ff:ff:ff:ff
11: ens5f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 14:02:ec:7c:92:d9 brd ff:ff:ff:ff:ff:ff
12: ens4f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 14:02:ec:7c:93:40 brd ff:ff:ff:ff:ff:ff
13: ens4f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 14:02:ec:7c:93:41 brd ff:ff:ff:ff:ff:ff

------- but they are present here:
[root@compute-0 ~]# ovs-vsctl show
874169ea-c22d-4249-a93b-ed75e1cd9e9e
    Manager "ptcp:6640:127.0.0.1"
    Bridge br-ex
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port br-ex
            Interface br-ex
                type: internal
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
    Bridge br-isolated
        fail_mode: standalone
        Port "vlan396"
            tag: 396
            Interface "vlan396"
                type: internal
        Port "vlan397"
            tag: 397
            Interface "vlan397"
                type: internal
        Port "ens2f0"
            Interface "ens2f0"
        Port br-isolated
            Interface br-isolated
                type: internal
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port br-int
            Interface br-int
                type: internal
        Port int-br-link
            Interface int-br-link
                type: patch
                options: {peer=phy-br-link}
    Bridge br-link
        Controller "tcp:127.0.0.1:6633"
        fail_mode: standalone
        Port br-link
            Interface br-link
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port phy-br-link
            Interface phy-br-link
                type: patch
                options: {peer=int-br-link}
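
The comparison made above (interfaces that have an ifcfg file but do not show up in `ip a`) can be sketched as a small script. This is only an illustration: the helper names are hypothetical, and the sample data is abbreviated from the listings in this comment (only the header line of each `ip a` entry is kept).

```python
import re

# Sample data abbreviated from the listings in this comment.
IFCFG_PATHS = [
    "/etc/sysconfig/network-scripts/ifcfg-br-isolated",
    "/etc/sysconfig/network-scripts/ifcfg-br-link",
    "/etc/sysconfig/network-scripts/ifcfg-dpdk0",
    "/etc/sysconfig/network-scripts/ifcfg-eno1",
    "/etc/sysconfig/network-scripts/ifcfg-ens1f0",
    "/etc/sysconfig/network-scripts/ifcfg-ens1f1",
    "/etc/sysconfig/network-scripts/ifcfg-ens2f0",
    "/etc/sysconfig/network-scripts/ifcfg-lo",
    "/etc/sysconfig/network-scripts/ifcfg-vlan396",
    "/etc/sysconfig/network-scripts/ifcfg-vlan397",
]

IP_A_OUTPUT = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
4: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
5: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
6: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
7: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
8: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
10: ens5f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
11: ens5f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
12: ens4f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
13: ens4f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
"""


def ifcfg_names(paths):
    """Extract interface names from ifcfg-<name> file paths."""
    prefix = "ifcfg-"
    names = set()
    for path in paths:
        basename = path.rsplit("/", 1)[-1]
        if basename.startswith(prefix):
            names.add(basename[len(prefix):])
    return names


def kernel_names(ip_a_output):
    """Extract interface names from `ip a` header lines such as '2: eno1: <...>'."""
    return set(re.findall(r"^\d+: ([^:@\s]+)[:@]", ip_a_output, flags=re.MULTILINE))


def configured_but_missing(paths, ip_a_output):
    """Interfaces that have an ifcfg file but are not listed by `ip a`."""
    return sorted(ifcfg_names(paths) - kernel_names(ip_a_output))


print(configured_but_missing(IFCFG_PATHS, IP_A_OUTPUT))
# ['br-isolated', 'br-link', 'dpdk0', 'vlan396', 'vlan397']
```

With this data, the vlan interfaces and both OVS bridges have config files but are absent from the kernel's interface list, which matches what is reported above.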

Comment 7 Saravanan KR 2017-05-10 10:07:43 UTC
Debugged the panther01 environment (OSP11), where this issue was verified.

After the reboot, the ovs-vswitchd service fails because of an invalid value for socket-mem. The network-environment file sets the parameter NeutronDpdkSocketMemory to "'1024,1024'", which causes Step4 (puppet-vswitch) to set the ovsdb value dpdk-socket-mem to '1024,1024', with the single quotes embedded in the value. When the compute node is restarted, this puppet-applied configuration takes effect and ovs-vswitchd fails to start. Because ovs-vswitchd has failed, network.service fails as well. After changing NeutronDpdkSocketMemory to "1024,1024", network.service starts successfully after a reboot.

Note: Everything works immediately after deployment because a first-boot bash script is used to enable DPDK on boot. The issue appears only after a reboot.
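
To illustrate the difference between the two parameter values, here is a minimal sketch of a pre-deployment check. The helper names are hypothetical and are not part of os-net-config, puppet-vswitch, or OVS. The accepted format (a comma-separated list of integers) is an assumption based on the working value "1024,1024"; the exact validation done by ovs-vswitchd is not shown in this report.

```python
import re

# Assumption: a usable socket-mem value is a comma-separated list of
# non-negative integers, with no quote characters inside the value itself.
_SOCKET_MEM_RE = re.compile(r"\d+(,\d+)*")


def is_valid_socket_mem(value: str) -> bool:
    """Return True if value looks like '1024,1024' with no stray quotes."""
    return _SOCKET_MEM_RE.fullmatch(value) is not None


def strip_embedded_quotes(value: str) -> str:
    """Remove single or double quotes wrapped around the value."""
    return value.strip("'\"")


bad = "'1024,1024'"   # value that ended up in ovsdb, quotes included
good = "1024,1024"    # value after correcting NeutronDpdkSocketMemory

print(is_valid_socket_mem(bad))                         # False
print(is_valid_socket_mem(good))                        # True
print(is_valid_socket_mem(strip_embedded_quotes(bad)))  # True
```

A check like this could be run against the network-environment file before deployment, since the failure described here only becomes visible after the first reboot.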

Comment 8 Eyal Dannon 2017-05-11 08:16:40 UTC
(In reply to Saravanan KR from comment #7)

As Saravanan mentioned above, verified and fixed in the documentation.

Comment 9 errata-xmlrpc 2017-05-17 20:21:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245
