Bug 1126653 - mkdumprd can't get non-default static route in a corner case
Summary: mkdumprd can't get non-default static route in a corner case
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kexec-tools
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Baoquan He
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On: 1125182 1126656
Blocks:
Reported: 2014-08-05 02:02 UTC by Qiao Zhao
Modified: 2015-11-16 08:03 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1125182
Environment:
Last Closed: 2015-06-24 02:01:02 UTC



Description Qiao Zhao 2014-08-05 02:02:15 UTC
+++ This bug was initially created as a clone of Bug #1125182 +++

Description of problem:
Unlike bug 806992, my environment is:
+-----------------------+       +------------------------+
| guest1                |       |  guest2                |
| eth0  10.66.xx.xx     |       | eth0 10.66.xx.xx       |
| eth1  192.168.10.x    |       | eth1 192.168.20.x      |
+-----------------------+       +------------------------+
[guest1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1 
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
HWADDR=02:00:00:00:00:10

[guest2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.20.1
NETMASK=255.255.255.0
HWADDR=02:00:00:00:00:12

Add a route on guest1 and guest2:
[guest1 ~]# ip route add 192.168.20.0/24 dev eth1
# ip route show
192.168.20.0/24 dev eth1  scope link 
192.168.10.0/24 dev eth1  proto kernel  scope link  src 192.168.10.1 
10.66.86.0/23 dev eth0  proto kernel  scope link  src 10.66.87.250 
169.254.0.0/16 dev eth0  scope link  metric 1002 
169.254.0.0/16 dev eth1  scope link  metric 1003 
default via 10.66.87.254 dev eth0 

[guest2 ~]# ip route add 192.168.10.0/24 dev eth1
# ip route show
192.168.20.0/24 dev eth1  proto kernel  scope link  src 192.168.20.1 
192.168.10.0/24 dev eth1  scope link 
10.66.86.0/23 dev eth0  proto kernel  scope link  src 10.66.86.250 
169.254.0.0/16 dev eth0  scope link  metric 1002 
169.254.0.0/16 dev eth1  scope link  metric 1003 
default via 10.66.87.254 dev eth0 

In guest1:
[guest1 ~]# ping 192.168.20.1
PING 192.168.20.1 (192.168.20.1) 56(84) bytes of data.
64 bytes from 192.168.20.1: icmp_seq=1 ttl=64 time=3.53 ms

Start the kdump service in guest1:
# grep -v ^# /etc/kdump.conf

net 192.168.20.1:/export/tmp
path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31

# service kdump restart

# echo c > /proc/sysrq-trigger

[console log]
 vda: vda1 vda2
mapping eth1 to eth1
8021q: adding VLAN 0 to HW filter on device eth1
ip: RTNETLINK answers: No such process
Saving to remote location 192.168.20.1:/export/tmp
mount: RPC: Remote system error - Network is unreachable
Restarting system.
machine restart
[/console log]

Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-278.el6

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
= 1 =  [use `ip route add 192.168.20.0/24 dev eth1`] - dump failed
+ local dev=eth1
++ /sbin/ip route show
++ grep '^[[:digit:]].*via.* eth1 '
+ local routes=
+ '[' -z '' ']'
++ /sbin/ip route show
++ awk '/^default/ {print $3}'
+ GATEWAY=10.66.87.254
+ '[' -n 10.66.87.254 ']'
+ echo '  ' gateway 10.66.87.254
+ '[' -n '' ']'
+ set +x
Starting kdump:                                            [  OK  ]

= 2 =  [use `ip route add 192.168.20.0/24 via 192.168.10.2 dev eth1`] - dump ok (192.168.20.2 is another machine)
+ local dev=eth1
++ /sbin/ip route show
++ grep '^[[:digit:]].*via.* eth1 '
+ local 'routes=192.168.20.0/24 via 192.168.10.2 dev eth1 '
+ '[' -z '' ']'
++ /sbin/ip route show
++ awk '/^default/ {print $3}'
+ GATEWAY=10.66.87.254
+ '[' -n 10.66.87.254 ']'
+ echo '  ' gateway 10.66.87.254
+ '[' -n '192.168.20.0/24 via 192.168.10.2 dev eth1 ' ']'
+ /sbin/ip route show
+ grep '^[[:digit:]].*via.* eth1 '
+ set +x
Starting kdump:                                            [  OK  ]

--- Additional comment from Vivek Goyal on 2014-07-31 09:35:55 EDT ---

Bao, I thought we solved the static route issue in rhel6. Is this some corner-case configuration issue?

--- Additional comment from Baoquan He on 2014-07-31 23:03:18 EDT ---

(In reply to Vivek Goyal from comment #1)
> Bao, I thought we solved static route issue in rhel6. Is it some corner case
> configuration issue.

Yes, it's a very unusual case. When 2 machines are connected directly or through a bridge, they are usually configured in the same subnet,
say 192.168.10.1 <--> 192.168.10.2
In this case nothing extra is needed; they can communicate with each other directly.

However, if they are configured in different subnets,
e.g. 192.168.10.1 <--> 192.168.20.1
then, although they are connected directly, the network admin has to configure a specific route. The nexthop IP address can be omitted, which means both of the routes below work.

192.168.20.0/24 via 192.168.20.1 dev eth0

192.168.20.0/24 dev eth0

The case Qiao is talking about is the 2nd route, the one without "via xxx". In the kdump implementation we grep for "via xxx" to find routes that cross subnets, so this route is skipped.

I can change the kdump script to find all routes that go through a certain NIC, but then the direct connection route will be added too. The direct connection route is created by the kernel network stack when a NIC is configured. For example, if IP address 192.168.10.1 is configured on NIC eth0, a direct connection route "192.168.10.0/24 dev eth0" is added automatically. I can't distinguish the two kinds of route.
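
To illustrate the point above, here is a minimal sketch that runs the grep pattern from the mkdumprd trace against an abridged copy of guest1's routing table from this report. The function names via_routes and dev_routes are made up for this example and are not part of kexec-tools; dev_routes is only one possible broader filter.

```shell
#!/bin/sh
# Abridged `ip route show` output from guest1 in this report.
sample_routes='192.168.20.0/24 dev eth1  scope link
192.168.10.0/24 dev eth1  proto kernel  scope link  src 192.168.10.1
10.66.86.0/23 dev eth0  proto kernel  scope link  src 10.66.87.250
default via 10.66.87.254 dev eth0'

# Filter as seen in the mkdumprd trace: only routes containing "via" match.
via_routes() {
    grep "^[[:digit:]].*via.* $1 "
}

# Hypothetical broader filter (an assumption, not kexec-tools code):
# every non-default route that goes through the given device.
dev_routes() {
    grep "^[[:digit:]].* dev $1 "
}

echo "via filter on eth1:"
printf '%s\n' "$sample_routes" | via_routes eth1 || echo "(no match)"

echo "dev filter on eth1:"
printf '%s\n' "$sample_routes" | dev_routes eth1
```

The "via" filter finds nothing for eth1, so the 192.168.20.0/24 route never reaches the kdump initrd. The broader filter finds it, but also returns the kernel-created 192.168.10.0/24 route.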

--- Additional comment from Baoquan He on 2014-08-01 00:24:26 EDT ---


So for this case there are 3 choices:

1. Keep this bug open, and handle it if any customers complain about it.

2. Add a description to the documentation asking customers to specify the nexthop explicitly if they use this corner-case configuration. This was suggested by Qiao.

3. Grep all routes that go through a certain NIC. The drawback is that the unnecessary direct connection route will be added too.

--- Additional comment from Baoquan He on 2014-08-01 01:23:40 EDT ---

Hi Marc,

What do you think about this issue? From your point of view, or from the customers' side, do you have any suggestions or ideas?

Thanks
Baoquan

--- Additional comment from Vivek Goyal on 2014-08-01 09:22:59 EDT ---

(In reply to Baoquan He from comment #2)
> (In reply to Vivek Goyal from comment #1)
> > Bao, I thought we solved static route issue in rhel6. Is it some corner case
> > configuration issue.
> 
> Yes, it's a very weird case. When 2 machines are connected directly or by a
> bridge, usually they are configured in the same subnet, 
> say 192.168.10.1 <--> 192.168.10.2
> In this case, nothing is needed, they can communicate with each other
> directly.
> 
> However, if they are configured in different subnet, 
> e.g 192.168.10.1 <--> 192.168.20.1
> In this case, though they are connected directly, a specified route has to
> be configured by network admin. And you can skip nexthop ipaddr. Means below
> 2 routes works well.
> 
> 192.168.20.0/24 via 192.168.20.1 dev eth0
> 
> 192.168.20.0/24 dev eth0
> 
> The case Qiao are taking about is the 2nd route which is without "via xxx".
> Because in kdump implementation, we grep "via xxx" to find a crossing subnet
> route. Then this route is skipped too. 
> 
> I can change kdump script to find all routes which go through a certain NIC,
> but then the direct connection route will be added too. the direct
> connection route is created by kernel stack when a NIC is configured. Say ip
> addr is configured on NIC eth0, 192.168.10.1, then a direct network
> connection "192.168.10.0/24 dev eth0" is added automatically. I can't
> distinguish them.

Can't we look at the NIC's IP and netmask and check whether a route is the subnet route for that IP (which is added automatically) or does not belong to that subnet? If it does not belong to the subnet, then it is a static route.
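
A rough sketch of that check, assuming IPv4 only and a dotted-quad netmask as found in ifcfg-eth1. All function names here (ip_to_int, prefix_to_mask, is_connected_route, classify_routes) are hypothetical and not taken from kexec-tools; the routes are an abridged copy of guest1's table.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Convert a prefix length (e.g. 24) to an integer netmask.
prefix_to_mask() {
    echo $(( (4294967295 << (32 - $1)) & 4294967295 ))
}

# is_connected_route ROUTE_PREFIX NIC_IP NIC_NETMASK
# Succeeds if ROUTE_PREFIX (e.g. 192.168.10.0/24) is the NIC's own subnet.
# Destinations without a "/len" part are treated as not connected.
is_connected_route() (
    case "$1" in */*) ;; *) exit 1 ;; esac
    route_net=$(ip_to_int "${1%/*}")
    route_mask=$(prefix_to_mask "${1#*/}")
    nic_ip=$(ip_to_int "$2")
    nic_mask=$(ip_to_int "$3")
    [ "$route_mask" -eq "$nic_mask" ] && [ $((nic_ip & nic_mask)) -eq "$route_net" ]
)

# classify_routes DEV NIC_IP NIC_NETMASK  (reads `ip route show` lines on stdin)
classify_routes() {
    while read -r dest rest; do
        case " $rest " in *" dev $1 "*) ;; *) continue ;; esac
        [ "$dest" = default ] && continue
        if is_connected_route "$dest" "$2" "$3"; then
            echo "connected $dest"
        else
            echo "static $dest"
        fi
    done
}

sample_routes='192.168.20.0/24 dev eth1  scope link
192.168.10.0/24 dev eth1  proto kernel  scope link  src 192.168.10.1
10.66.86.0/23 dev eth0  proto kernel  scope link  src 10.66.87.250
default via 10.66.87.254 dev eth0'

printf '%s\n' "$sample_routes" | classify_routes eth1 192.168.10.1 255.255.255.0
```

With these inputs the 192.168.20.0/24 route is reported as static and 192.168.10.0/24 as connected. Note that the 169.254.0.0/16 routes in the full table would also be reported as static by this sketch.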

--- Additional comment from Marc Milgram on 2014-08-01 10:01:08 EDT ---

Baoquan,

I don't see any customer cases linked to this BZ, and I haven't run into a customer hitting this problem.

On the other hand, some customer will try to do this.  It should probably get the proper fix.

Regards,

Marc

--- Additional comment from Xin Long on 2014-08-04 00:25:37 EDT ---

(In reply to Baoquan He from comment #2)
> (In reply to Vivek Goyal from comment #1)
> > Bao, I thought we solved static route issue in rhel6. Is it some corner case
> > configuration issue.
> 
> Yes, it's a very weird case. When 2 machines are connected directly or by a
> bridge, usually they are configured in the same subnet, 
> say 192.168.10.1 <--> 192.168.10.2
> In this case, nothing is needed, they can communicate with each other
> directly.
> 
> However, if they are configured in different subnet, 
> e.g 192.168.10.1 <--> 192.168.20.1
> In this case, though they are connected directly, a specified route has to
> be configured by network admin. And you can skip nexthop ipaddr. Means below
> 2 routes works well.
> 
> 192.168.20.0/24 via 192.168.20.1 dev eth0
> 
> 192.168.20.0/24 dev eth0
> 

Actually, the route 192.168.20.0/24 via 192.168.20.1 dev eth0 cannot work, because 192.168.20.1 is not a directly connected address. To use this method you need to add another route first, like:
 
ip route add 192.168.20.1 dev eth0

But when you ping 192.168.20.1, the route that is actually used is still the one added by *ip route add 192.168.20.1 dev eth0*, so I think this does not solve the issue; it's not a good workaround.

If a customer has this network topology and runs kdump on it, I suggest we fix this bug.
> The case Qiao are taking about is the 2nd route which is without "via xxx".
> Because in kdump implementation, we grep "via xxx" to find a crossing subnet
> route. Then this route is skipped too. 
> 
> I can change kdump script to find all routes which go through a certain NIC,
> but then the direct connection route will be added too. the direct
> connection route is created by kernel stack when a NIC is configured. Say ip
> addr is configured on NIC eth0, 192.168.10.1, then a direct network
> connection "192.168.10.0/24 dev eth0" is added automatically. I can't
> distinguish them.

--- Additional comment from Baoquan He on 2014-08-04 04:43:03 EDT ---


Hi all,

As Long said, the 2nd choice I wrote doesn't work; it is not a workaround.

Now, after discussion with Long and wpan, who are familiar with networking, 2 approaches have come up:

1. Just find all routes through a certain NIC and add them to the route table in the kdump kernel. Direct connection routes are included too, but that has no harmful effect; it only produces a notice saying that the route already exists. This approach is the simplest and most direct.

2. Use the netmask to find the specific route. Say the NIC is currently configured as
192.168.10.1 and the target is 192.168.20.1.

Then ip route show :

192.168.10.0/24 dev eth0 proto kernel  scope link src 192.168.10.1
192.168.20.0/24 dev eth0
or
192.168.10.0/24 via 192.168.10.2 dev eth0

Here, AND the route's netmask with the target IP address. If the result equals the route's network address, this is the route we want, and only this one is added to the kdump kernel.

This approach is certainly a little more complicated: the IP address needs to be converted to an integer and the netmask has to be derived.


So, of the 2 approaches above, which one do you prefer?  Do you see any drawbacks, or have any suggestions or better ideas?

Thanks
Baoquan
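
The 2nd approach could be sketched roughly as follows. This is only an illustration under stated assumptions: IPv4 only, destinations written as "network/len", and helper names (ip_to_int, prefix_to_mask, route_for_target) that are invented here and are not from the merged patch.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Convert a prefix length (e.g. 24) to an integer netmask.
prefix_to_mask() {
    echo $(( (4294967295 << (32 - $1)) & 4294967295 ))
}

# route_for_target TARGET_IP  (reads `ip route show` lines on stdin)
# Prints every route whose network equals TARGET_IP AND netmask.
# Lines without a "/len" destination (e.g. "default") are skipped.
route_for_target() (
    target=$(ip_to_int "$1")
    while read -r dest rest; do
        case "$dest" in */*) ;; *) continue ;; esac
        net=$(ip_to_int "${dest%/*}")
        mask=$(prefix_to_mask "${dest#*/}")
        if [ $((target & mask)) -eq "$net" ]; then
            echo "$dest $rest"
        fi
    done
)

# Routes from the example in this comment, plus a default route.
sample_routes='192.168.10.0/24 dev eth0 proto kernel scope link src 192.168.10.1
192.168.20.0/24 dev eth0
default via 10.66.87.254 dev eth1'

printf '%s\n' "$sample_routes" | route_for_target 192.168.20.1
```

For target 192.168.20.1 only the 192.168.20.0/24 route is printed, so the kernel-created 192.168.10.0/24 route is left out without needing to look for "via".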

Comment 2 Baoquan He 2014-11-03 06:34:36 UTC
The patch has been merged into Fedora and can be backported to rhel-7.2.

Comment 4 Baoquan He 2015-06-23 02:00:12 UTC
Hi Marc,

The patch for this bug has been merged into Fedora. Do you think it's necessary to backport it to rhel7? This was found by RHEL QA and is a rare corner case. Do you have any suggestion, since the original static route issue was reported by you and the customers you discussed it with?

If you think it's necessary, I will ask people to backport it from Fedora to rhel7.

Thanks
Baoquan

