Bug 1471221 - vmware VM crashes with: kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1412! [NEEDINFO]
Status: NEW
Product: Fedora
Classification: Fedora
Component: kernel
Version: 26
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Assigned To: Neil Horman
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-07-14 13:46 EDT by colin
Modified: 2017-12-22 10:54 EST
nhorman: needinfo? (colinhd8)


Attachments
kernelbug with vmxnet3 (63.41 KB, image/png), 2017-07-14 13:46 EDT, colin


External Trackers
Tracker                            ID       Priority  Status  Summary  Last Updated
Launchpad                          1654319  None      None    None     2017-07-14 13:49 EDT
Red Hat Knowledge Base (Solution)  3114351  None      None    None     2017-07-14 13:46 EDT

Description colin 2017-07-14 13:46:36 EDT
Created attachment 1298558
kernelbug with vmxnet3

Description of problem:
Sometimes, when logging in or out over ssh, the system freezes (nothing can be done except a restart) and CPU usage goes to 100% (as monitored from ESXi).

Version-Release number of selected component (if applicable):
Fedora 26, fresh install.
Fedora 25 has the same problem.


How reproducible:


Steps to Reproduce:
1. Log in over ssh.
2. Type exit.
3. Log in over ssh.
4. Type exit.
......
Sometimes it freezes at the first login or logout; sometimes it takes several retries.
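
For anyone trying to reproduce this, the login/logout cycle above could be automated with a loop along these lines. This is only a sketch: repro_loop, its arguments, and the SSH_CMD override are hypothetical names, and running exit as a remote command is an assumption that may not exercise exactly the same path as an interactive login.

```shell
#!/bin/sh
# Hypothetical helper: repeat "ssh login, then exit" COUNT times against TARGET.
# TARGET and COUNT are placeholders, not values taken from this report.
# SSH_CMD can be overridden (for example for a dry run); it defaults to ssh.
repro_loop() {
    target=$1
    count=${2:-20}
    i=1
    while [ "$i" -le "$count" ]; do
        # "exit" runs on the remote side and ends the session right away.
        # ConnectTimeout only bounds the connection attempt; a guest that
        # freezes in the middle of a session can still leave ssh hanging.
        if ! ${SSH_CMD:-ssh} -o ConnectTimeout=10 "$target" exit; then
            echo "attempt $i failed: guest may be frozen"
            return 1
        fi
        echo "attempt $i ok"
        i=$((i + 1))
    done
}

# Example: repro_loop user@fedora-vm 50
```

Keeping tail -f /var/log/messages open on the VM console while the loop runs should make it easier to catch the kernel message at the moment of the freeze.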

Actual results:
The system freezes.

Expected results:
The system stays in a normal state.


Additional info:
Fedora 26 is installed on ESXi 6.5d.
Open a console (not ssh), log in, and run tail -f /var/log/messages.
When the system freezes, the console sometimes shows the messages:
Comment 1 colin 2017-07-14 13:53:10 EDT
When the system freezes, the console sometimes shows the messages:
(please refer to attachment 1298558, sorry for this)
Comment 2 Neil Horman 2017-12-21 19:44:41 EST
That BUG halt occurs because the buf_type field in the received descriptor from the hypervisor doesn't have the value VMXNET3_RX_BUF_PAGE. It's possible that the latest hypervisor added a new buffer type that the Fedora driver isn't ready for, but I don't see any update upstream that would suggest that. Can you add a stap script to dump out the value of buf_type at the BUG halt to tell us what the reported type is?
Comment 3 colin 2017-12-22 10:11:19 EST
How do I dump out the value of buf_type?

By the way, the hypervisor is ESXi 6.5.

I also found that if I change the NIC type from vmxnet3 to e1000e, the bug is gone.
Comment 4 Neil Horman 2017-12-22 10:54:56 EST
Write a systemtap script to do it. Probe line 1310 of vmxnet3_drv.c (or the appropriate line for the specific kernel version you are using, if it has changed) and print $rbi->buf_type.

And yes, changing the driver fixes the problem, and that's expected. The problem you are reporting is a BUG halt that triggers when the vmxnet3 driver notes a problem with the virtual hardware descriptor that gets passed from the hypervisor. It would never happen with e1000 because it has different checks specific to its hardware.
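
A minimal sketch of the probe described above, assuming systemtap and the kernel debuginfo matching the running kernel are installed (neither is stated in this report). gen_probe is a hypothetical helper that only prints the script; 1310 is the line number given above and has to be adjusted to the BUG_ON line in the kernel actually in use. The probe prints on every hit of that line, not only on the failing one, and $rbi may not be resolvable if the compiler optimized it away.

```shell
#!/bin/sh
# Sketch only: emit a SystemTap script that prints rbi->buf_type whenever
# the given source line of the vmxnet3 module is reached.
# The default line 1310 is taken from the comment above, not verified
# against any particular kernel version.
gen_probe() {
    line=${1:-1310}
    # Unquoted heredoc: ${line} is expanded, while \$ and \\ keep the
    # SystemTap "$rbi" variable and the "\n" escape literal in the output.
    cat <<EOF
probe module("vmxnet3").statement("*@drivers/net/vmxnet3/vmxnet3_drv.c:${line}")
{
    printf("buf_type=%d\\n", \$rbi->buf_type)
}
EOF
}

# Example (as root): gen_probe 1310 > buf_type.stp && stap -v buf_type.stp
```

Because the guest halts at the BUG, the last buf_type value printed before the freeze is the interesting one; running stap from the VM console instead of an ssh session should keep that output visible.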
