Bug 870391 - igb: i350 does not report dropped packets count correctly
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Stefan Assmann
Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2012-10-26 07:02 EDT by Stefan Assmann
Modified: 2013-11-19 09:51 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-06 03:36:05 EST
Type: Bug


Attachments
dhcp-ack-eth1.10.pcap (409 bytes, application/octet-stream)
2012-10-26 07:02 EDT, Stefan Assmann
regdump.diff (5.91 KB, text/plain)
2012-10-26 10:52 EDT, Stefan Assmann
i350-regdump1.txt (37.40 KB, text/plain)
2012-11-02 04:50 EDT, Stefan Assmann
i350-regdump2.txt (37.40 KB, text/plain)
2012-11-02 04:51 EDT, Stefan Assmann
Test patch for malicious block feature disable on i350 (854 bytes, application/octet-stream)
2012-11-09 12:29 EST, carolyn.wyborny
i350-regdiff-mal-block-off.diff (4.27 KB, text/plain)
2012-11-14 05:19 EST, Stefan Assmann
eth5-before-mal-off.txt (37.40 KB, text/plain)
2012-11-20 03:41 EST, Stefan Assmann
eth5-after-mal-off.txt (37.40 KB, text/plain)
2012-11-20 03:41 EST, Stefan Assmann
patch to call dump code if problem packet value is encountered (821 bytes, patch)
2012-11-20 14:05 EST, carolyn.wyborny
LVMMC.patch (1.46 KB, patch)
2012-12-10 08:58 EST, Stefan Assmann

Description Stefan Assmann 2012-10-26 07:02:49 EDT
Created attachment 633799 [details]
dhcp-ack-eth1.10.pcap

Description of problem:
When replaying the attached DHCP ACK packet over a VLAN on i350 the packet gets dropped by the hardware, but the dropped packet does not show up in the stats.

Version-Release number of selected component (if applicable):
kernel-2.6.32-279.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. vconfig add eth1 10
2. tcpreplay -i eth1 dhcp-ack-eth1.10.pcap
Comment 2 Stefan Assmann 2012-10-26 10:52:43 EDT
Created attachment 633905 [details]
regdump.diff

Comparing the output of ethtool -d from before and after replaying the packet does not show any dropped packets.
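
For anyone repeating this comparison, here is a minimal sketch in Python. It assumes the two dumps were saved as plain text (for example, the output of ethtool -d taken before and after the replay). The function name changed_lines is hypothetical, and the sketch does a plain line-by-line diff without interpreting any register names.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the lines that differ between two register dumps.

    Lines present only in `before` are prefixed with "- ", lines present
    only in `after` with "+ ". Unchanged lines and ndiff hint lines
    (prefixed "? ") are dropped.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

An empty result means no line of the dump changed between the two captures, which is the situation described above.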
Comment 3 Stefan Assmann 2012-10-26 10:54:45 EDT
Carolyn,
could you please investigate why the packet seems to be silently dropped without incrementing any counters?
Thanks!
Comment 4 carolyn.wyborny 2012-10-29 11:52:32 EDT
The packet drop count comes directly from the hw, but I will check with the hardware team whether there are any errata on this.
Comment 5 carolyn.wyborny 2012-11-01 19:06:19 EDT
Hello Stefan, 

So, some additional investigation shows that the tx_dropped counter is not from the hardware.  I was mistaken on that.  It's a net_stat and not an adapter stat, as I initially thought.  We do increment some net_stat counters in the driver, but not this one.  We grab it from the stack before outputting it in ethtool.  I'm not sure how it's determined.

What stat were you looking at to determine drops?  Total packets transmitted?  If so, we determine that during the tx cleanup process.  What we find in this case, is that the hw marks the descriptor immediately as "done" and we "clean" it up, so the stat would include it in this case, even though it never does exit the device.  The driver does calc this stat, but I'm not sure how the driver could detect this particular case.  Let me know if you are using another stat to determine packets sent.   

For the hw side of the investigation, I need some more info.  Can I get a full register dump from the device the problem has been reproduced on?  Can you confirm whether or not any vf's were configured on it for the repro?

Thanks,
Comment 6 Stefan Assmann 2012-11-02 03:49:24 EDT
(In reply to comment #5)
> Hello Stefan, 
> 
> So, some additional investigation shows that the tx_dropped counter is not
> from the hardware.  I was mistaken on that.  Its a net_stat and not an
> adapter stat, as I initially thought.  We do increment some net_stat
> counters in the driver, but not this one.  We grab it from the stack before
> outputting it in ethtool.  I'm not sure how its determined. 

At first I was looking at the ethtool -S output as well, but as you just mentioned, these stats may come from the stack rather than the hardware, so it didn't seem to be the best source to look at.

> 
> What stat were you looking at to determine drops?  Total packets
> transmitted?  If so, we determine that during the tx cleanup process.  What
> we find in this case, is that the hw marks the descriptor immediately as
> "done" and we "clean" it up, so the stat would include it in this case, even
> though it never does exit the device.  The driver does calc this stat, but
> I'm not sure how the driver could detect this particular case.  Let me know
> if you are using another stat to determine packets sent.

Interesting, it is good to understand what the hardware is really doing. So I switched to looking at the register dump with ethtool -d as that should reflect any change to the hardware registers. I've attached the diff to this bugzilla. But looking at that I cannot see any change in tx related registers although I've replayed the packet in between the dumps. So the packet seems to completely vanish, no?
   
> 
> For the hw side of the investigation, I need some more info.  Can I get a
> full register dump from the device the problem has been reproduced on?  Can
> you confirm whether or not any vf's were configured on it for the repro?

IIRC no VFs were configured during the test. I'll attach a full register dump.
Comment 7 Stefan Assmann 2012-11-02 04:50:28 EDT
Created attachment 636990 [details]
i350-regdump1.txt

Still have the full register dumps from regdump.diff around. Attaching.
Comment 8 Stefan Assmann 2012-11-02 04:51:31 EDT
Created attachment 636992 [details]
i350-regdump2.txt
Comment 9 carolyn.wyborny 2012-11-09 12:29:16 EST
Created attachment 641725 [details]
Test patch for malicious block feature disable on i350
Comment 10 carolyn.wyborny 2012-11-09 12:32:31 EST
Ack, can't add attachment and new comments at the same time..  

Thanks for the full dump.  

i350 has a feature designed for virtualization that tries to identify potentially malicious vf's. This feature is disabled by default, but some malformed packets are still checked and dropped by default.  I've attached a patch to disable that feature.  Can you try a repro with the patch applied?  If it still fails then we are dealing with something else, but if not, then the malformed packets in this case are being detected by this feature.
Comment 11 Stefan Assmann 2012-11-14 05:19:06 EST
Created attachment 644727 [details]
i350-regdiff-mal-block-off.diff

I've tested your patch and diffed the output of ethtool -d again but didn't see any new changes in the diff. Guess this feature isn't the culprit.
Comment 12 carolyn.wyborny 2012-11-19 13:51:04 EST
Hmm..  register offset is 0x3590 DTXCTL under Transmit section.  It's not in the diff, so I assume it didn't get changed?  It should be changed if the patch applied correctly.  Can I get the full dump?  HW team is requesting it for review.  If you have an example of the descriptor data, please attach that; otherwise, I'll locate a copy from our in-house repro.
Comment 13 Stefan Assmann 2012-11-20 03:41:09 EST
Created attachment 648342 [details]
eth5-before-mal-off.txt

Sorry, I had to recreate the full dumps, so these don't match with the diff I posted earlier.
Comment 14 Stefan Assmann 2012-11-20 03:41:52 EST
Created attachment 648343 [details]
eth5-after-mal-off.txt
Comment 15 Stefan Assmann 2012-11-20 03:44:53 EST
Not sure what you mean by "an example of the descriptor data". The packet I replayed is attached to this bz.
Comment 16 carolyn.wyborny 2012-11-20 13:40:08 EST
Thanks Stefan, 

Interesting that the bit is set in both the before and after.  I'm posting another patch to output the descriptor data to the dmesg log when the problem scenario occurs.  I need that output to send to the hw team.
Comment 17 carolyn.wyborny 2012-11-20 14:05:11 EST
Created attachment 648728 [details]
patch to call dump code if problem packet value is encountered
Comment 18 carolyn.wyborny 2012-11-20 14:06:39 EST
The dump that will output in the dmesg log is the descriptor data created by the driver for the hardware to transmit, which doesn't actually get transmitted in this case.
Comment 19 Stefan Assmann 2012-11-21 02:43:16 EST
(In reply to comment #16)
> Thanks Stefan, 
> 
> Interesting that the bit is set in both the before and after.  I'm posting
> another patch to output the descriptor data when the problem scenario occurs
> to the dmesg log.  I need that output to send to the hw team.

Hi Carolyn, thanks for looking into it!
I'm not sure why the bit should not be set. Let me clarify my testing.
- boot whatever test kernel
- setup VLAN
- capture ethtool -d (before)
- tcpreplay -i eth5 dhcp-ack-eth1.10.pcap
- capture ethtool -d (after)
So the bit should be set in both dumps since the code is called during igb_configure.
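
The capture/replay/capture part of the steps above can be sketched in Python as follows. This is only a sketch: it assumes ethtool and tcpreplay are on the PATH and that the test kernel is already booted and the VLAN already set up. The names run_command and capture_around_replay are hypothetical, and the runner is injectable so the sequence can be exercised without hardware.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(
        list(argv), check=True, capture_output=True, text=True
    ).stdout


def capture_around_replay(
    iface: str, pcap: str, run: Runner = run_command
) -> tuple[str, str]:
    """Capture `ethtool -d` before and after replaying one pcap file."""
    before = run(["ethtool", "-d", iface])      # capture ethtool -d (before)
    run(["tcpreplay", "-i", iface, pcap])       # replay the packet
    after = run(["ethtool", "-d", iface])       # capture ethtool -d (after)
    return before, after
```

For the test described here the call would be capture_around_replay("eth5", "dhcp-ack-eth1.10.pcap"), with the two returned strings saved as the before and after dumps.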
Comment 20 Stefan Assmann 2012-11-21 06:50:37 EST
While capturing that descriptor data I realized that with the malicious block feature turned off the injected packet does indeed make its way to the other host!
So that feature seems to be what "eats" the packet.

Anyway here's the dump you requested.
igb 0000:00:04.0: Net device Info
igb: Device Name     state            trans_start      last_rx
igb: eth5            0000000000000003 00000001003119DB 0000000000000000
igb 0000:00:04.0: Register Dump
igb:  Register Name   Value
igb: CTRL            581c0241
igb: STATUS          0028078b
igb: CTRL_EXT        101400c0
igb: ICR             00000001
igb: RCTL            04048022
igb: RDLEN[0-3]      00001000 00001000 00001000 00001000
igb: RDH[0-3]        00000000 00000000 00000000 00000000
igb: RDT[0-3]        000000ff 000000ff 000000ff 000000ff
igb: RXDCTL[0-3]     02040808 02040808 02040808 02040808
igb: RDBAL[0-3]      3b983000 3c1b5000 3bbec000 37bdc000
igb: RDBAH[0-3]      00000000 00000000 00000000 00000000
igb: TCTL            a503f0fa
igb: TDBAL[0-3]      3b983000 3c1b5000 3bbec000 37bdc000
igb: TDBAH[0-3]      00000000 00000000 00000000 00000000
igb: TDLEN[0-3]      00001000 00001000 00001000 00001000
igb: TDH[0-3]        00000000 00000000 00000012 00000000
igb: TDT[0-3]        00000000 00000000 00000012 00000000
igb: TXDCTL[0-3]     02100108 02100108 02100108 02100108
igb: TDFH            0000004d
igb: TDFT            0000004d
igb: TDFHS           0000004d
igb: TDFPC           00000000
igb 0000:00:04.0: TX Rings Summary
igb: Queue [NTU] [NTC] [bi(ntc)->dma  ] leng ntw timestamp
igb:      0     0     0 0000000000000000 0000 (null) 0000000000000000
igb:      1     0     0 0000000000000000 0000 (null) 0000000000000000
igb:      2    12    12 0000000000000000 0000 (null) 0000000000000000
igb:      3     0     0 0000000000000000 0000 (null) 0000000000000000
igb 0000:00:04.0: RX Rings Summary
igb: Queue [NTU] [NTC]
igb:      0    FF     0
igb:      1    FF     0
igb:      2    FF     0
igb:      3    FF     0
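
A dump like the one above can be checked mechanically. The following is a minimal Python sketch that assumes the lines appear exactly as pasted here (an "igb:" prefix, a register name, then one or more 8-digit hex values, with no dmesg timestamp in front). The function names parse_register_dump and drained_tx_queues are hypothetical. Reading TDH equal to TDT as "the hardware has consumed every descriptor the driver posted" is my interpretation of the usual head/tail ring convention, not something stated in this report.

```python
import re

# Matches e.g. "igb: TDH[0-3]        00000000 00000000 00000012 00000000".
# Values must be exactly 8 hex digits, so the 16-digit "Net device Info"
# row and the ring summary rows are skipped.
_REG_LINE = re.compile(r"^igb:\s+(\S+)((?:\s+[0-9a-fA-F]{8})+)\s*$")


def parse_register_dump(text: str) -> dict[str, list[int]]:
    """Map each register name in the dump to its list of integer values."""
    regs: dict[str, list[int]] = {}
    for line in text.splitlines():
        match = _REG_LINE.match(line.strip())
        if match:
            regs[match.group(1)] = [int(v, 16) for v in match.group(2).split()]
    return regs


def drained_tx_queues(regs: dict[str, list[int]]) -> list[int]:
    """Return indices of TX queues whose head (TDH) equals their tail (TDT)."""
    heads = regs.get("TDH[0-3]", [])
    tails = regs.get("TDT[0-3]", [])
    return [q for q, (h, t) in enumerate(zip(heads, tails)) if h == t]
```

With the values pasted above, TDH and TDT match on all four queues (queue 2 at 0x12), so nothing is left pending in the TX rings even though the packet was handed to the hardware.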
Comment 21 carolyn.wyborny 2012-11-21 13:27:18 EST
That is what the hw team suspected, that the malicious blocking feature was grabbing the bad packet.  The feature is on by default in order to prevent malformed packets from getting out.  Does this info suffice to explain the drops?
Comment 22 Stefan Assmann 2012-11-22 09:37:14 EST
I'm torn on this. Having this feature turned on by default seems like a sensible thing as it prevents malformed packets from being transmitted. On the other hand, having packets dropped without any record anywhere may cause all kinds of bug reports which will be hard to track.

Can you confirm with the team that packets dropped by this feature are not accounted for anywhere and just get silently dropped? Is there a way to figure out from the kernel whether a packet is going to be dropped by the hw?

Do any other NICs have this feature?
Comment 23 Stefan Assmann 2012-12-04 04:24:59 EST
Any update on my questions?
Comment 24 carolyn.wyborny 2012-12-04 19:02:04 EST
I hope to have more info tomorrow on this.  Sorry for the delay.
Comment 25 carolyn.wyborny 2012-12-05 18:51:20 EST
There is a register that lists causes of bad packets: the LVMMC, at offset 0x03548. It should be visible with ethregs or other tools, but it does not show in current ethtool output.  Here's the link to the datasheet; the description of it is in there.  http://www.intel.com/content/www/us/en/ethernet-controllers/ethernet-controller-i350-datasheet.html

The register is a cause register, but not a counter.  You could see there was an error of some type in it and know that at least one packet was not transmitted.  I suspect the dhcp packets in question would toggle bit 0 in this register.
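
To make the "cause register, not a counter" point concrete, here is a minimal Python sketch of how a value read from LVMMC might be interpreted. It assumes the 32-bit value has already been read by some other tool. The helper names are hypothetical, and treating any non-zero value as "a cause was recorded" is an assumption based on the description above, not on the datasheet's bit layout.

```python
LVMMC_OFFSET = 0x03548  # register offset quoted in the comment above


def lvmmc_reports_drop(value: int) -> bool:
    """True if the cause register holds any cause at all.

    Assumption: a non-zero value means at least one packet was not
    transmitted. It cannot say how many, because this is not a counter.
    """
    return (value & 0xFFFFFFFF) != 0


def lvmmc_bit0_set(value: int) -> bool:
    """True if bit 0 is set, the bit suspected for the DHCP packet here."""
    return bool(value & 0x1)
```

Because the value only records that something was dropped, two reads showing the same cause cannot distinguish one dropped packet from many.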
Comment 26 Stefan Assmann 2012-12-07 05:48:00 EST
Thanks Carolyn! I'll give it a try.
Comment 27 Stefan Assmann 2012-12-10 08:58:08 EST
Created attachment 660885 [details]
LVMMC.patch

I've investigated the LVMMC register and it only works if DTXCTL.MDP_EN is set. Normally this isn't the case, so I set it via the attached patch and voila, LVMMC reports dropped packets. However, DTXCTL.MDP_EN=1 has the side effect of blocking the queue the "malicious" packet was in, which triggers a driver reset. That's probably not what we want for production.
Comment 28 Stefan Assmann 2012-12-10 09:46:11 EST
On a closer look it seems that if a malicious packet is discovered and MDP_EN is set, an interrupt is generated. There should be a way to avoid the reset by re-enabling the relevant VFTE bit in the ISR.

The interrupt cause being generated is ICR_MDDET (previously known as ICR_DOUTSYNC). The ISR already checks for ICR_DOUTSYNC and increases the tx_dma_out_of_sync stat.

So if we clear the VFTE bit in the ISR could this be a workable solution?
Comment 29 John Ronciak 2012-12-10 18:53:50 EST
Stefan, there are 2 things we need to point out.  First is that these checks should only be used when VF's are enabled.  Our HW however does still do "some" malicious packet checking (which is what's happening here) but it really is not intended for use when VF's are not configured for use.  This is what you are doing here.  You have the stack giving the driver a malformed packet which it should never be doing in the first place.  Second, once the stack is fixed to not give the driver a malicious packet we have never seen the issue.  So unless you can prove that with a fixed stack (i.e. no malicious packets being passed to the driver) you are seeing the issue, this is a non-issue.  With a working stack you don't see the problem, at least it's never been reported to us and we have never seen it.

In addition, our HW team tells us that if the condition is detected there is no way (from the driver's point of view) to fix what is wrong.  So the way for the driver to fix this is to reset the part to return everything to a known state.  So the idea of just clearing the bit doesn't work.  It is not a workable solution.

We think you are making way too much out of this.  With a stack that doesn't hand the driver a malformed packet, you don't see the problem.
Comment 30 Stefan Assmann 2012-12-12 07:36:19 EST
John,
what I'm trying to do here is to evaluate the options on how to deal with hardware that drops packets without any way for the OS to know about this happening.

So what are our options here.
a) Looking for ways to get information that the hardware dropped packets. Which was the intention of this bugzilla.
b) Disable the feature because there's no way for us to know if packets are being dropped. Not doing so is a support problem because you can never be sure the stack is always doing the right thing.
Comment 31 John Ronciak 2012-12-12 12:34:14 EST
As we have said there is no way to get that count.  The HW does not track it in these cases.  Since you can only create this situation when the stack is doing something bad (or artificially making it happen) we don't view this as any sort of real problem.  If the stack does the right thing (not give the driver bad data) there is no problem.  

As far as our HW goes, it also does the right thing in that the packets that are bad are dropped and the system is notified when the driver is configured correctly  (VF's enabled) which is the case that the HW was designed for.

So as I said, until you have a case where this is happening with a non-broken stack or where you artificially cause the driver to get bad data we consider this closed.  We aren't going to work on this because it is not an issue.  Also, since it is not an issue, neither you nor we will be getting customer calls on it.

Please drop this.  It is not an issue now that the stack has been fixed.

PS - The comment in b) above is not useful.  If the driver can't count on the stack to be giving it correct data the _whole_ system is seriously broken and should be re-installed.  This is exactly the same as getting data from the kernel.  If the kernel gives the driver bad data, bad things will happen to the driver and most likely the system.  I would not tell your customers that you don't trust what the stack is giving to the drivers.  You will get calls on making that statement.
Comment 32 Vasily Averin 2013-11-19 09:51:10 EST
(In reply to John Ronciak from comment #31)
> So as I said, until you come have a case where this is happening with a
> non-broken stack or where you artificially cause the driver to get bad data
> we consider this closed.

Dear John,
you can find such an example in https://bugzilla.redhat.com/show_bug.cgi?id=1032100
