Bug 977557 - Kernel stack trace in dmesg for Chelsio 10GbE card - cxgb3 driver
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 18
Hardware: x86_64 Linux
Priority: unspecified, Severity: low
Assigned To: fedora-kernel-ethernet
QA Contact: Fedora Extras Quality Assurance
Keywords: Triaged
Reported: 2013-06-24 18:37 EDT by Justin Clift
Modified: 2015-07-13 00:35 EDT
Last Closed: 2013-11-27 11:09:11 EST
Type: Bug


Description Justin Clift 2013-06-24 18:37:52 EDT
Description of problem:

  While investigating an IPoIB problem on one of the QA boxes (gqaib-02.sbu.lab.eng.bos.redhat.com), I noticed a kernel stack trace for the cxgb3 driver in dmesg.

  That driver is for a Chelsio 10GbE card that isn't cabled to the switch,
  so nothing critical.  But, it might be of interest to whoever works with
  this part of the kernel.

  [   27.547324] iw_cxgb3: Chelsio T3 RDMA Driver - version 1.1
  [   27.557570] iw_cxgb3: Initialized device 0000:0a:00.0
  [   27.557580] cxgb3: p1p1, iscsi set MaxRxData to 16224 (0x3f603000)
  [   27.557585] ------------[ cut here ]------------
  [   27.557595] WARNING: at mm/page_alloc.c:2387 __alloc_pages_nodemask+0x8f8/0xae0()
  [   27.557597] Hardware name: PowerEdge 1950
  [   27.557599] Modules linked in: bnep bluetooth rfkill be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad mlx4_en iw_cxgb3 ib_core bnx2 mperf coretemp iTCO_wdt iTCO_vendor_support dcdbas microcode lpc_ich serio_raw mfd_core i5000_edac mlx4_core edac_core i5k_amb cxgb3 mdio shpchp vhost_net tun macvtap macvlan kvm_intel kvm uinput radeon i2c_algo_bit drm_kms_helper ttm drm mptsas i2c_core mptscsih mptbase scsi_transport_sas
  [   27.557654] Pid: 655, comm: NetworkManager Not tainted 3.9.6-200.fc18.x86_64 #1
  [   27.557656] Call Trace:
  [   27.557665]  [<ffffffff8105ef85>] warn_slowpath_common+0x75/0xa0
  [   27.557669]  [<ffffffff8105efca>] warn_slowpath_null+0x1a/0x20
  [   27.557672]  [<ffffffff8113d7f8>] __alloc_pages_nodemask+0x8f8/0xae0
  [   27.557677]  [<ffffffff8101b923>] ? native_sched_clock+0x13/0x80
  [   27.557681]  [<ffffffff810880d2>] ? up+0x32/0x50
  [   27.557686]  [<ffffffff8117c0a8>] alloc_pages_current+0xb8/0x190
  [   27.557691]  [<ffffffff8113834a>] __get_free_pages+0x2a/0x80
  [   27.557696]  [<ffffffff81187ba9>] kmalloc_order_trace+0x39/0xb0
  [   27.557699]  [<ffffffff81187de0>] __kmalloc+0x1c0/0x250
  [   27.557708]  [<ffffffffa04227c1>] cxgbi_ddp_init+0x71/0x260 [libcxgbi]
  [   27.557715]  [<ffffffffa04390e7>] cxgb3i_dev_open+0xf7/0x384 [cxgb3i]
  [   27.557719]  [<ffffffff81657bd5>] ? printk+0x61/0x63
  [   27.557736]  [<ffffffffa026349e>] cxgb3_add_clients+0x3e/0x60 [cxgb3]
  [   27.557743]  [<ffffffffa024f22c>] cxgb_open+0x32c/0x370 [cxgb3]
  [   27.557749]  [<ffffffff81555bfe>] __dev_open+0xce/0x150
  [   27.557752]  [<ffffffff81555ee1>] __dev_change_flags+0xa1/0x180
  [   27.557756]  [<ffffffff81556078>] dev_change_flags+0x28/0x70
  [   27.557760]  [<ffffffff81561ab1>] do_setlink+0x351/0x980
  [   27.557766]  [<ffffffff81321751>] ? nla_parse+0x31/0xe0
  [   27.557769]  [<ffffffff815647ae>] rtnl_newlink+0x36e/0x580
  [   27.557774]  [<ffffffff8118a373>] ? __kmalloc_node_track_caller+0x63/0x2a0
  [   27.557777]  [<ffffffff81564253>] rtnetlink_rcv_msg+0x113/0x300
  [   27.557781]  [<ffffffff8154742c>] ? __alloc_skb+0x7c/0x290
  [   27.557784]  [<ffffffff81564140>] ? __rtnl_unlock+0x20/0x20
  [   27.557789]  [<ffffffff8157f571>] netlink_rcv_skb+0xb1/0xc0
  [   27.557792]  [<ffffffff81560975>] rtnetlink_rcv+0x25/0x40
  [   27.557795]  [<ffffffff8157ee91>] netlink_unicast+0x1a1/0x220
  [   27.557798]  [<ffffffff8157f211>] netlink_sendmsg+0x301/0x3c0
  [   27.557804]  [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0
  [   27.557807]  [<ffffffff8153bf31>] ? sock_recvmsg+0xc1/0xf0
  [   27.557811]  [<ffffffff8153be5c>] __sys_sendmsg+0x3ac/0x3c0
  [   27.557815]  [<ffffffff8153de29>] sys_sendmsg+0x49/0x90
  [   27.557820]  [<ffffffff8166a5d9>] system_call_fastpath+0x16/0x1b
  [   27.557822] ---[ end trace 7813defcd04c69c4 ]---


Version-Release number of selected component (if applicable):

  # uname -a
  Linux gqaib-02.sbu.lab.eng.bos.redhat.com 3.9.6-200.fc18.x86_64 #1 SMP Thu Jun 13 18:56:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

  # modinfo cxgb3
  filename:       /lib/modules/3.9.6-200.fc18.x86_64/kernel/drivers/net/ethernet/chelsio/cxgb3/cxgb3.ko
  firmware:       cxgb3/ael2020_twx_edc.bin
  firmware:       cxgb3/ael2005_twx_edc.bin
  firmware:       cxgb3/ael2005_opt_edc.bin
  firmware:       cxgb3/t3c_psram-1.1.0.bin
  firmware:       cxgb3/t3b_psram-1.1.0.bin
  firmware:       cxgb3/t3fw-7.12.0.bin
  version:        1.1.5-ko
  license:        Dual BSD/GPL
  author:         Chelsio Communications
  description:    Chelsio T3 Network Driver
  srcversion:     72CE3DC4D9C62460CBB4FF6
  alias:          pci:v00001425d00000037sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000036sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000035sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000032sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000031sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000030sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000026sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000025sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000024sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000023sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000022sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000021sv*sd*bc*sc*i*
  alias:          pci:v00001425d00000020sv*sd*bc*sc*i*
  depends:        mdio
  intree:         Y
  vermagic:       3.9.6-200.fc18.x86_64 SMP mod_unload 
  parm:           dflt_msg_enable:Chelsio T3 default message enable bitmap (int)
  parm:           msi:whether to use MSI or MSI-X (int)
  parm:           ofld_disable:whether to enable offload at init time or not (int)

  # lspci -Qvvs 0a:00.0
  0a:00.0 Ethernet controller: Chelsio Communications Inc T320 10GbE Dual Port Adapter
	Subsystem: Chelsio Communications Inc Device 0001
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fcf7f000 (64-bit, non-prefetchable) [size=4K]
	Region 2: Memory at fc000000 (64-bit, non-prefetchable) [size=8M]
	Region 4: Memory at fcf7e000 (64-bit, non-prefetchable) [size=4K]
	Expansion ROM at fce00000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [58] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Latency L0 unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [94] Vital Product Data
		Unknown small resource type 00, will not decode more.
	Capabilities: [9c] MSI-X: Enable+ Count=32 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [100 v1] Device Serial Number 00-00-00-01-00-00-00-01
	Capabilities: [300 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
		UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr-
		AERCap:	First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: cxgb3

How reproducible:

  Every time I've rebooted (3 out of 3 so far).


Steps to Reproduce:
1. Reboot the box
2. Run dmesg
   The stack trace is near the end of the output


Additional info:

Box is on the Red Hat internal VPN, so can be accessed remotely to investigate if needed.
Comment 1 Josh Boyer 2013-07-01 13:28:41 EDT
Looks like cxgb3 is hitting this WARN_ON_ONCE:

        /*
         * In the slowpath, we sanity check order to avoid ever trying to
         * reclaim >= MAX_ORDER areas which will never succeed. Callers may
         * be using allocators in order of preference for an area that is
         * too large.
         */
        if (order >= MAX_ORDER) {
                WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
                return NULL;
        }
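
For illustration only, here is a rough userspace sketch of the arithmetic behind that check. It assumes 4 KiB pages and a MAX_ORDER of 11, which are typical defaults for x86_64 kernels of this era but are not stated anywhere in this report. The helper names are hypothetical, and the size that cxgbi_ddp_init actually requested is not visible in the trace.

```c
#include <stddef.h>

/* Assumptions, not taken from this report: */
#define PAGE_SHIFT 12                    /* 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define MAX_ORDER  11                    /* orders 0..10 are valid */

/*
 * Smallest order such that (PAGE_SIZE << order) >= size.
 * Hypothetical helper that mirrors the idea of the kernel's get_order().
 */
unsigned int order_for_size(size_t size)
{
        size_t pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
        unsigned int order = 0;

        while ((1UL << order) < pages)
                order++;
        return order;
}

/*
 * Returns 1 if a request of this size would reach the
 * "order >= MAX_ORDER" branch quoted above, 0 otherwise.
 */
int exceeds_max_order(size_t size)
{
        return order_for_size(size) >= MAX_ORDER;
}
```

Under these assumptions, a request of exactly 4 MiB maps to order 10 and passes the check, while anything larger maps to order 11 or more and is rejected. The warning in the trace then fires only if the caller did not pass __GFP_NOWARN.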
Comment 2 Justin Clift 2013-07-01 13:40:20 EDT
As a data point, that box has since been re-imaged with RHEL 6.4, and the stack trace doesn't show up there.  (I guess the warning is in newer code.) :D
Comment 3 Justin M. Forbes 2013-10-18 17:17:57 EDT
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.
Comment 4 Justin M. Forbes 2013-11-27 11:09:11 EST
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  

It has been over a month since we asked you to test the 3.11 kernel updates and let us know whether your issue has been resolved or is still a problem. At that time, the bug was set to needinfo.  Because the needinfo is still set, we assume either that this is no longer a problem or that you cannot provide additional information to help us resolve the issue.  As a result, we are closing this bug with insufficient data. If this is still a problem, we apologize; feel free to reopen the bug and provide more information so that we can work towards a resolution.

If you experience different issues, please open a new bug report for those.
