Because of bug 709322, I upgraded my iwl4965 to an iwl6300. In general it seems to work better, but there is a new bug I'm experiencing. At some point, it loses the ability to see ARP replies. The request goes out fine, as does the reply from the router. But the reply is never seen on the client. Sometimes toggling promiscuous mode via tcpdump resolves the issue. The card sees other packets just fine. It could be all unicast packets that are lost, but I haven't verified this theory. Nothing in dmesg, traffic just stops.
This popped up in dmesg eventually:

[ 4240.820737] iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
[ 4240.820749] iwlwifi 0000:03:00.0: CSR values:
[ 4240.820755] iwlwifi 0000:03:00.0: (2nd byte of CSR_INT_COALESCING is CSR_INT_PERIODIC_REG)
[ 4240.820766] iwlwifi 0000:03:00.0: CSR_HW_IF_CONFIG_REG: 0X0048d304
[ 4240.820776] iwlwifi 0000:03:00.0: CSR_INT_COALESCING: 0X00000040
[ 4240.820786] iwlwifi 0000:03:00.0: CSR_INT: 0X00000000
[ 4240.820796] iwlwifi 0000:03:00.0: CSR_INT_MASK: 0X00000000
[ 4240.820806] iwlwifi 0000:03:00.0: CSR_FH_INT_STATUS: 0X00000000
[ 4240.820815] iwlwifi 0000:03:00.0: CSR_GPIO_IN: 0X0000000f
[ 4240.820825] iwlwifi 0000:03:00.0: CSR_RESET: 0X00000000
[ 4240.820834] iwlwifi 0000:03:00.0: CSR_GP_CNTRL: 0X080403c5
[ 4240.820844] iwlwifi 0000:03:00.0: CSR_HW_REV: 0X00000074
[ 4240.820854] iwlwifi 0000:03:00.0: CSR_EEPROM_REG: 0X81a50ffd
[ 4240.820863] iwlwifi 0000:03:00.0: CSR_EEPROM_GP: 0X90000001
[ 4240.820873] iwlwifi 0000:03:00.0: CSR_OTP_GP_REG: 0X00030001
[ 4240.820882] iwlwifi 0000:03:00.0: CSR_GIO_REG: 0X00080044
[ 4240.820892] iwlwifi 0000:03:00.0: CSR_GP_UCODE_REG: 0X000007d1
[ 4240.820902] iwlwifi 0000:03:00.0: CSR_GP_DRIVER_REG: 0X00000000
[ 4240.820912] iwlwifi 0000:03:00.0: CSR_UCODE_DRV_GP1: 0X00000000
[ 4240.820921] iwlwifi 0000:03:00.0: CSR_UCODE_DRV_GP2: 0X00000000
[ 4240.820930] iwlwifi 0000:03:00.0: CSR_LED_REG: 0X00000058
[ 4240.820940] iwlwifi 0000:03:00.0: CSR_DRAM_INT_TBL_REG: 0X88136c93
[ 4240.820949] iwlwifi 0000:03:00.0: CSR_GIO_CHICKEN_BITS: 0X27800200
[ 4240.820959] iwlwifi 0000:03:00.0: CSR_ANA_PLL_CFG: 0X00000000
[ 4240.820969] iwlwifi 0000:03:00.0: CSR_HW_REV_WA_REG: 0X0001001a
[ 4240.820978] iwlwifi 0000:03:00.0: CSR_DBG_HPET_MEM_REG: 0Xffff0000
[ 4240.820985] iwlwifi 0000:03:00.0: FH register values:
[ 4240.821004] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X13843800
[ 4240.821023] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X01384480
[ 4240.821043] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_WPTR: 0X00000058
[ 4240.821062] iwlwifi 0000:03:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X80811104
[ 4240.821082] iwlwifi 0000:03:00.0: FH_MEM_RSSR_SHARED_CTRL_REG: 0X000000fc
[ 4240.821101] iwlwifi 0000:03:00.0: FH_MEM_RSSR_RX_STATUS_REG: 0X07030000
[ 4240.821120] iwlwifi 0000:03:00.0: FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000
[ 4240.821140] iwlwifi 0000:03:00.0: FH_TSSR_TX_STATUS_REG: 0X07ff0001
[ 4240.821160] iwlwifi 0000:03:00.0: FH_TSSR_TX_ERROR_REG: 0X00000000
[ 4240.821167] iwlwifi 0000:03:00.0: Loaded firmware version: 9.221.4.1 build 25532
[ 4240.821318] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[ 4240.821325] iwlwifi 0000:03:00.0: Status: 0x000022CC, count: 5
[ 4240.821332] iwlwifi 0000:03:00.0: 0x00000005 | SYSASSERT
[ 4240.821338] iwlwifi 0000:03:00.0: 0x000222EC | uPc
[ 4240.821345] iwlwifi 0000:03:00.0: 0x00022258 | branchlink1
[ 4240.821351] iwlwifi 0000:03:00.0: 0x00022258 | branchlink2
[ 4240.821357] iwlwifi 0000:03:00.0: 0x00001532 | interruptlink1
[ 4240.821363] iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
[ 4240.821369] iwlwifi 0000:03:00.0: 0x00000095 | data1
[ 4240.821374] iwlwifi 0000:03:00.0: 0x000000E1 | data2
[ 4240.821380] iwlwifi 0000:03:00.0: 0x00000237 | line
[ 4240.821386] iwlwifi 0000:03:00.0: 0x0001769A | beacon time
[ 4240.821393] iwlwifi 0000:03:00.0: 0x00001966 | tsf low
[ 4240.821398] iwlwifi 0000:03:00.0: 0x00000000 | tsf hi
[ 4240.821404] iwlwifi 0000:03:00.0: 0x00000000 | time gp1
[ 4240.821410] iwlwifi 0000:03:00.0: 0x00339367 | time gp2
[ 4240.821416] iwlwifi 0000:03:00.0: 0x00000000 | time gp3
[ 4240.821422] iwlwifi 0000:03:00.0: 0x000109DD | uCode version
[ 4240.821428] iwlwifi 0000:03:00.0: 0x00000074 | hw version
[ 4240.821434] iwlwifi 0000:03:00.0: 0x0048D304 | board version
[ 4240.821440] iwlwifi 0000:03:00.0: 0x04450080 | hcmd
[ 4240.821446] iwlwifi 0000:03:00.0: 0x02023080 | isr0
[ 4240.821452] iwlwifi 0000:03:00.0: 0x0103E000 | isr1
[ 4240.821458] iwlwifi 0000:03:00.0: 0x0000001A | isr2
[ 4240.821464] iwlwifi 0000:03:00.0: 0x0140D8C0 | isr3
[ 4240.821470] iwlwifi 0000:03:00.0: 0x00000000 | isr4
[ 4240.821476] iwlwifi 0000:03:00.0: 0x01000112 | isr_pref
[ 4240.821482] iwlwifi 0000:03:00.0: 0x0001536C | wait_event
[ 4240.821488] iwlwifi 0000:03:00.0: 0x00000080 | l2p_control
[ 4240.821494] iwlwifi 0000:03:00.0: 0x00000000 | l2p_duration
[ 4240.821501] iwlwifi 0000:03:00.0: 0x0000003F | l2p_mhvalid
[ 4240.821507] iwlwifi 0000:03:00.0: 0x00200200 | l2p_addr_match
[ 4240.821513] iwlwifi 0000:03:00.0: 0x00000005 | lmpm_pmg_sel
[ 4240.821519] iwlwifi 0000:03:00.0: 0x02061043 | timestamp
[ 4240.821525] iwlwifi 0000:03:00.0: 0x00005860 | flow_handler
[ 4240.821588] iwlwifi 0000:03:00.0: Log capacity 1024 is bogus, limit to 512 entries
[ 4240.821594] iwlwifi 0000:03:00.0: Start IWL Event Log Dump: display last 20 entries
[ 4240.821619] iwlwifi 0000:03:00.0: EVT_LOGT:0003378146:0x0178d064:1334
[ 4240.821636] iwlwifi 0000:03:00.0: EVT_LOGT:0003378152:0x001e0000:1334
[ 4240.821652] iwlwifi 0000:03:00.0: EVT_LOGT:0003378155:0x0000005e:1334
[ 4240.821669] iwlwifi 0000:03:00.0: EVT_LOGT:0003378156:0x0178d064:1334
[ 4240.821685] iwlwifi 0000:03:00.0: EVT_LOGT:0003378157:0x00000018:0484
[ 4240.821702] iwlwifi 0000:03:00.0: EVT_LOGT:0003378170:0x00000019:0484
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003378183:0x00000259:1108
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003378183:0x00000024:1108
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003378184:0x00000001:1108
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003378184:0x00000033:1108
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003378189:0x00000001:0463
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003378191:0x00000001:0462
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003378217:0x00000001:1575
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003379891:0x000001b4:0602
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003379894:0x000000df:0002
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003380018:0x04450080:0401
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003380020:0x04450080:0700
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003380020:0x00000000:0706
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003380048:0x00000018:0452
[ 4240.821717] iwlwifi 0000:03:00.0: EVT_LOGT:0003380077:0x00000000:0125
[ 4242.810158] iwlwifi 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 2000ms.
[ 4242.810170] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 69 write_ptr 70
[ 4242.810528] ieee80211 phy0: Hardware restart was requested
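For comparing this dump against the one in bug 908304, it may help to pull the "value | name" entries of the IWL error log into a table. The following is only a rough sketch in Python, not anything the driver ships: the function name `parse_error_log` and the regular expressions are my own assumptions about the layout seen in the log above, and the sample string is copied from three of its entries.

```python
import re

# Assumption: every dmesg entry starts with a "[ seconds.micros]" timestamp.
TIMESTAMP = re.compile(r"\[\s*\d+\.\d+\]")
# Assumption: error log fields look like "0x000222EC | uPc".
FIELD = re.compile(r"(0x[0-9A-Fa-f]+) \| (.+)")


def parse_error_log(dmesg_text):
    """Return a dict mapping error log field names to integer values."""
    fields = {}
    # Splitting on timestamps works whether or not line breaks survived.
    for entry in TIMESTAMP.split(dmesg_text):
        match = FIELD.search(entry)
        if match:
            fields[match.group(2).strip()] = int(match.group(1), 16)
    return fields


# Three entries copied from the dump above.
sample = (
    "[ 4240.821332] iwlwifi 0000:03:00.0: 0x00000005 | SYSASSERT "
    "[ 4240.821338] iwlwifi 0000:03:00.0: 0x000222EC | uPc "
    "[ 4240.821380] iwlwifi 0000:03:00.0: 0x00000237 | line"
)
fields = parse_error_log(sample)
for name, value in fields.items():
    print(f"{name}: {value:#x}")
```

CSR and FH register lines and the EVT_LOGT entries are skipped on purpose, since they do not use the "value | name" layout.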
The SYSASSERT is a duplicate of bug 908304. The ARP issue I'm not sure about.
I did some more testing, and as far as I can tell, it is all unicast traffic that is dropped. Some ingress MAC address filter that decides to go bonkers?
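This would also fit the ARP symptom: ARP requests go to the broadcast address, while replies are normally sent unicast to the requester. To check the theory against a capture, frames can be sorted by destination MAC address. A minimal sketch in Python, assuming colon-separated addresses as tcpdump prints them; `classify_destination` is a hypothetical helper and the unicast address in the example is made up. It relies only on the group bit, the least significant bit of the first octet.

```python
def classify_destination(mac):
    """Classify a destination MAC as 'broadcast', 'multicast' or 'unicast'."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"
    # The least significant bit of the first octet marks group addresses.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


# Example addresses; the unicast one is invented for illustration.
for mac in ("ff:ff:ff:ff:ff:ff", "01:00:5e:00:00:fb", "00:1b:21:aa:bb:cc"):
    print(mac, classify_destination(mac))
```

If the theory holds, frames received during a stall would all fall into the broadcast or multicast classes.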
Oh, and it seems I get one of those firmware crash reports every time it happens. But there is some delay to it. Might be when I force it out of its wedged state when bringing the device down and back up again.
Apparently not. Seems like the issue resolves itself if left alone (after ~10-20 minutes). The laptop was left running during the night, and it had encountered the problem at least five times. Not a single line in dmesg though.
Created attachment 704453
tcpdump -i wlan0 -n -p -w bad.pcap -s0

Did a packet dump and turned on debugging when it happened this time. Hopefully it might give you some clue.
Created attachment 704454
messages

echo 0xffffffff > /sys/module/iwlwifi/parameters/debug
This log doesn't contain the SYSASSERT - but I will try to debug without it and get back to you.
It did not, no. I was primarily focusing on the packet loss. The sequence of events was:

1. I noticed the network breaking on me. arp -n was also reporting failure to find the gateway.
2. I turned on debugging.
3. I started tcpdump.
4. I stopped tcpdump.
5. I turned off debugging.
6. I restarted the interface (this is where the SYSASSERT shows up).
Oh, and disabling N seems to avoid the issue. Been running for a day now without incident.
(In reply to comment #10)
> Oh, and disabling N seems to avoid the issue. Been running for a day now
> without incident.

We are chasing an 11n issue in which the queues get stalled. You would see "fail to flush queues" or something similar in the logs. So you basically have 2 issues: the scanning issue and the 11n issue. Both are being handled now.
(In reply to Emmanuel Grumbach from comment #11)
> So you basically have 2 issues: the scanning issue and the 11n issue... Both
> are being handled now.

Emmanuel, any progress on this?
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale.

It has been over a month since we asked you to test the 3.11 kernel updates and let us know if your issue has been resolved or is still a problem. When this happened, the bug was set to needinfo. Because the needinfo is still set, we assume either this is no longer a problem, or you cannot provide additional information to help us resolve the issue. As a result we are closing with insufficient data. If this is still a problem, we apologize; feel free to reopen the bug and provide more information so that we can work towards a resolution.

If you experience different issues, please open a new bug report for those.