Bug 966149 - after resume: NETDEV WATCHDOG: eth0 (calxedaxgmac): transmit queue 0 timed out
after resume: NETDEV WATCHDOG: eth0 (calxedaxgmac): transmit queue 0 timed out
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel (Show other bugs)
18
arm Linux
medium Severity medium
: ---
: ---
Assigned To: Kernel Maintainer List
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-05-22 11:31 EDT by John Brier
Modified: 2014-02-05 16:29 EST (History)
10 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-05 16:29:17 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description John Brier 2013-05-22 11:31:53 EDT
Description of problem:

sleep system with rtcwake command and network doesn't come back as it should and a call trace appears

Version-Release number of selected component (if applicable):
3.9.2-200.fc18.armv7hl

How reproducible:
I've seen it twice so far but I've only tried twice.

Steps to Reproduce:
1.# SLEEP_TIME=2
2.# state=mem
3.# rtcwake -m $state -t `date -u -d "+ $SLEEP_TIME minutes" '+%s'`

Actual results:

The system comes back from suspend but the network is down. After some time a call trace appears and then restarting NetworkManager allows the network to come up


[jbrier@jb-t520 ~]$ console calxeda-arm-soc-02.<removed>
[Enter `^Ec?' for help]
[-- MOTD -- https://home.corp.redhat.com/wiki/conserver]
[51463.906739] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[51463.907153] sd 2:0:0:0: [sda] Stopping disk
[51464.511044] PM: suspend of devices complete after 604.397 msecs
[51464.511209] PM: late suspend of devices complete after 0.159 msecs
[51464.511398] PM: noirq suspend of devices complete after 0.185 msecs
[51464.511401] Disabling non-boot CPUs ...
[51464.512147] CPU1: shutdown
[51464.513314] CPU2: shutdown
[51464.514389] CPU3: shutdown
[51464.515115] Enabling non-boot CPUs ...
[51464.515210] CPU1: Booted secondary processor
[51464.515576] CPU1 is up
[51464.515639] CPU2: Booted secondary processor
[51464.516036] CPU2 is up
[51464.516101] CPU3: Booted secondary processor
[51464.516581] CPU3 is up
[51464.516760] PM: noirq resume of devices complete after 0.172 msecs
[51464.516926] PM: early resume of devices complete after 0.120 msecs
[51464.887558] ata5: SATA link down (SStatus 0 SControl 300)
[51464.917558] ata4: SATA link down (SStatus 0 SControl 300)
[51464.947560] ata2: SATA link down (SStatus 0 SControl 300)
[51467.357558] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[51467.372209] ata3.00: configured for UDMA/133
[51467.372306] sd 2:0:0:0: [sda] Starting disk
[51467.374340] PM: resume of devices complete after 2857.407 msecs
[51467.486737] Restarting tasks ... done.
[52485.087554] ------------[ cut here ]------------
[52485.092184] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x158/0x260()
[52485.099429] NETDEV WATCHDOG: eth0 (calxedaxgmac): transmit queue 0 timed out
[52485.106474] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables highbank_mc_edac xgmac highbank_l2_edac edac_core sata_highbank libahci
[52485.140194] [<c0015f8c>] (unwind_backtrace+0x0/0x124) from [<c003db30>] (warn_slowpath_common+0x54/0x74)
[52485.149699] [<c003db30>] (warn_slowpath_common+0x54/0x74) from [<c003dbe8>] (warn_slowpath_fmt+0x34/0x44)
[52485.159306] [<c003dbe8>] (warn_slowpath_fmt+0x34/0x44) from [<c04709fc>] (dev_watchdog+0x158/0x260)
[52485.168378] [<c04709fc>] (dev_watchdog+0x158/0x260) from [<c004b868>] (call_timer_fn+0x54/0x120)
[52485.177173] [<c004b868>] (call_timer_fn+0x54/0x120) from [<c004ce94>] (run_timer_softirq+0x1f8/0x278)
[52485.186420] [<c004ce94>] (run_timer_softirq+0x1f8/0x278) from [<c0045d60>] (__do_softirq+0x10c/0x25c)
[52485.195671] [<c0045d60>] (__do_softirq+0x10c/0x25c) from [<c00461d0>] (irq_exit+0x7c/0xc4)
[52485.203962] [<c00461d0>] (irq_exit+0x7c/0xc4) from [<c000ecf8>] (handle_IRQ+0x70/0x94)
[52485.211914] [<c000ecf8>] (handle_IRQ+0x70/0x94) from [<c00086bc>] (gic_handle_irq+0x44/0x68)
[52485.220379] [<c00086bc>] (gic_handle_irq+0x44/0x68) from [<c055eac0>] (__irq_svc+0x40/0x50)
[52485.228755] Exception stack(0xc082df70 to 0xc082dfb8)
[52485.233802] BUG: scheduling while atomic: swapper/0/0/0x40000100
[52485.239824] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables highbank_mc_edac xgmac highbank_l2_edac edac_core sata_highbank libahci
[52485.273472] [<c0015f8c>] (unwind_backtrace+0x0/0x124) from [<c05571e8>] (__schedule_bug+0x50/0x68)
[52485.282456] [<c05571e8>] (__schedule_bug+0x50/0x68) from [<c055cd04>] (__schedule+0x78/0x764)
[52485.291020] [<c055cd04>] (__schedule+0x78/0x764) from [<c006b9dc>] (__cond_resched+0x2c/0x3c)
[52485.299571] [<c006b9dc>] (__cond_resched+0x2c/0x3c) from [<c055d494>] (_cond_resched+0x44/0x4c)
[52485.308294] [<c055d494>] (_cond_resched+0x44/0x4c) from [<c0011fc4>] (dump_mem+0x98/0x150)
[52485.316562] [<c0011fc4>] (dump_mem+0x98/0x150) from [<c0016094>] (unwind_backtrace+0x108/0x124)
[52485.325288] [<c0016094>] (unwind_backtrace+0x108/0x124) from [<c003db30>] (warn_slowpath_common+0x54/0x74)
[52485.334973] [<c003db30>] (warn_slowpath_common+0x54/0x74) from [<c003dbe8>] (warn_slowpath_fmt+0x34/0x44)
[52485.344564] [<c003dbe8>] (warn_slowpath_fmt+0x34/0x44) from [<c04709fc>] (dev_watchdog+0x158/0x260)
[52485.353642] [<c04709fc>] (dev_watchdog+0x158/0x260) from [<c004b868>] (call_timer_fn+0x54/0x120)
[52485.362456] [<c004b868>] (call_timer_fn+0x54/0x120) from [<c004ce94>] (run_timer_softirq+0x1f8/0x278)
[52485.371708] [<c004ce94>] (run_timer_softirq+0x1f8/0x278) from [<c0045d60>] (__do_softirq+0x10c/0x25c)
[52485.380953] [<c0045d60>] (__do_softirq+0x10c/0x25c) from [<c00461d0>] (irq_exit+0x7c/0xc4)
[52485.389249] [<c00461d0>] (irq_exit+0x7c/0xc4) from [<c000ecf8>] (handle_IRQ+0x70/0x94)
[52485.397163] [<c000ecf8>] (handle_IRQ+0x70/0x94) from [<c00086bc>] (gic_handle_irq+0x44/0x68)
[52485.405626] [<c00086bc>] (gic_handle_irq+0x44/0x68) from [<c055eac0>] (__irq_svc+0x40/0x50)
[52485.414001] Exception stack(0xc082df70 to 0xc082dfb8)
[52485.419073] df60:                                     c08d11b4 a00f0093 00000000 00000000
[52485.427246] df80: c082c000 c05692a8 c08344f8 c3b14940 0000406a 413fc090 00000000 00000000
[52485.435444] dfa0: 00000000 c082dfb8 c000f0d8 c000f0dc 600f0013 ffffffff
[52485.442089] [<c055eac0>] (__irq_svc+0x40/0x50) from [<c000f0dc>] (cpu_idle+0xbc/0x108)
[52485.450036] [<c000f0dc>] (cpu_idle+0xbc/0x108) from [<c07ad808>] (start_kernel+0x2fc/0x35c)
[52485.458437] df60:                                     c08d11b4 a00f0093 00000000 00000000
[52485.466612] df80: c082c000 c05692a8 c08344f8 c3b14940 0000406a 413fc090 00000000 00000000
[52485.474806] BUG: scheduling while atomic: swapper/0/0/0x40000100
[52485.480822] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables highbank_mc_edac xgmac highbank_l2_edac edac_core sata_highbank libahci
[52485.514459] [<c0015f8c>] (unwind_backtrace+0x0/0x124) from [<c05571e8>] (__schedule_bug+0x50/0x68)
[52485.523434] [<c05571e8>] (__schedule_bug+0x50/0x68) from [<c055cd04>] (__schedule+0x78/0x764)
[52485.531983] [<c055cd04>] (__schedule+0x78/0x764) from [<c006b9dc>] (__cond_resched+0x2c/0x3c)
[52485.540527] [<c006b9dc>] (__cond_resched+0x2c/0x3c) from [<c055d494>] (_cond_resched+0x44/0x4c)
[52485.549247] [<c055d494>] (_cond_resched+0x44/0x4c) from [<c0011fc4>] (dump_mem+0x98/0x150)
[52485.557513] [<c0011fc4>] (dump_mem+0x98/0x150) from [<c0016094>] (unwind_backtrace+0x108/0x124)
[52485.566229] [<c0016094>] (unwind_backtrace+0x108/0x124) from [<c003db30>] (warn_slowpath_common+0x54/0x74)
[52485.575900] [<c003db30>] (warn_slowpath_common+0x54/0x74) from [<c003dbe8>] (warn_slowpath_fmt+0x34/0x44)
[52485.585484] [<c003dbe8>] (warn_slowpath_fmt+0x34/0x44) from [<c04709fc>] (dev_watchdog+0x158/0x260)
[52485.594547] [<c04709fc>] (dev_watchdog+0x158/0x260) from [<c004b868>] (call_timer_fn+0x54/0x120)
[52485.603351] [<c004b868>] (call_timer_fn+0x54/0x120) from [<c004ce94>] (run_timer_softirq+0x1f8/0x278)
[52485.612589] [<c004ce94>] (run_timer_softirq+0x1f8/0x278) from [<c0045d60>] (__do_softirq+0x10c/0x25c)
[52485.621826] [<c0045d60>] (__do_softirq+0x10c/0x25c) from [<c00461d0>] (irq_exit+0x7c/0xc4)
[52485.630114] [<c00461d0>] (irq_exit+0x7c/0xc4) from [<c000ecf8>] (handle_IRQ+0x70/0x94)
[52485.638048] [<c000ecf8>] (handle_IRQ+0x70/0x94) from [<c00086bc>] (gic_handle_irq+0x44/0x68)
[52485.646487] [<c00086bc>] (gic_handle_irq+0x44/0x68) from [<c055eac0>] (__irq_svc+0x40/0x50)
[52485.654846] Exception stack(0xc082df70 to 0xc082dfb8)
[52485.659909] df60:                                     c08d11b4 a00f0093 00000000 00000000
[52485.668098] df80: c082c000 c05692a8 c08344f8 c3b14940 0000406a 413fc090 00000000 00000000
[52485.676270] dfa0: 00000000 c082dfb8 c000f0d8 c000f0dc 600f0013 ffffffff
[52485.682901] [<c055eac0>] (__irq_svc+0x40/0x50) from [<c000f0dc>] (cpu_idle+0xbc/0x108)
[52485.690838] [<c000f0dc>] (cpu_idle+0xbc/0x108) from [<c07ad808>] (start_kernel+0x2fc/0x35c)
[52485.699445] dfa0: 00000000 c082dfb8 c000f0d8 c000f0dc 600f0013 ffffffff
[52485.706066] [<c055eac0>] (__irq_svc+0x40/0x50) from [<c000f0dc>] (cpu_idle+0xbc/0x108)
[52485.714006] [<c000f0dc>] (cpu_idle+0xbc/0x108) from [<c07ad808>] (start_kernel+0x2fc/0x35c)
[52485.722372] ---[ end trace da227214a82491b9 ]---



Expected results:

no call trace and network is up after resume without manual restart of NetworkManager

Additional info:

unloading kernel module xgmac and reloading it allows the network to come up immediately

rtcwake won't work at all on 3.6.x kernel as it can't set the alarm.
Comment 2 Justin M. Forbes 2013-10-18 17:00:25 EDT
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.
Comment 4 Jeff Bastian 2013-10-21 14:03:14 EDT
(In reply to Justin M. Forbes from comment #2)
> Fedora 18 has now been rebased to 3.11.4-101.fc18.  


The newest kernel as of today that I see in koji is 3.10.10-200.fc19.armv7hl:
http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=174716

So I tested with 3.10.10-200.fc19.armv7hl


[   87.454978] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[   87.462030] ata3.00: failed command: READ FPDMA QUEUED
[   87.467225] ata3.00: cmd 60/08:00:20:36:50/00:00:1b:00:00/40 tag 0 ncq 4096 in
[   87.467225]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.481845] ata3.00: status: { DRDY }
[   94.583877] ata3.00: revalidation failed (errno=-5)
[  105.162828] ata3.00: revalidation failed (errno=-5)

Fedora release 19 (Schrödinger’s Cat)
Kernel 3.10.10-200.fc19.armv7hl on an armv7l (ttyAMA0)

calxeda-arm-soc-03 login: 


After another 40 seconds, I got hundreds of errors about the disk and filesystem:

[  148.476491] end_request: I/O error, dev sda, sector 458241568
[  148.482272] sd 2:0:0:0: rejecting I/O to offline device
[  148.487564] sd 2:0:0:0: rejecting I/O to offline device
[  148.492797] Buffer I/O error on device dm-1, logical block 13829
[  148.498823] sd 2:0:0:0: rejecting I/O to offline device
[  148.504048] Buffer I/O error on device dm-1, logical block 13835
[  148.510067] sd 2:0:0:0: rejecting I/O to offline device
[  148.515290] Buffer I/O error on device dm-1, logical block 13860
[  148.521309] sd 2:0:0:0: rejecting I/O to offline device
...
[  149.490302] EXT4-fs error (device dm-1): ext4_read_inode_bitmap:175: comm systemd-readahe: Cannot read inode bitmap - block_group = 0, inode_bitmap = 1041
[  149.490307] EXT4-fs (dm-1): previous I/O error to superblock detected
[  149.490338] sd 2:0:0:0: rejecting I/O to offline device
[  149.490347] Buffer I/O error on device dm-1, logical block 0
[  149.490366] sd 2:0:0:0: rejecting I/O to offline device
[  149.490478] Aborting journal on device dm-1-8.
[  149.490500] systemd-readahead[278]: Failed to open pack file: Input/output error
[  149.490513] Buffer I/O error on device dm-1, logical block 6324224
[  149.490537] JBD2: Error -5 detected when updating journal superblock for dm-1-8.
[  149.490562] EXT4-fs (dm-1): previous I/O error to superblock detected
[  149.490605] Buffer I/O error on device dm-1, logical block 0
[  149.490622] EXT4-fs error (device dm-1): __ext4_journal_start_sb:62: Detected aborted journal
[  149.490625] EXT4-fs (dm-1): Remounting filesystem read-only
[  149.490629] EXT4-fs (dm-1): previous I/O error to superblock detected
[  149.490653] Buffer I/O error on device dm-1, logical block 0


And then systemd-journald started failing due to the read-only root filesystem:

[  149.490782] systemd-journald[295]: Failed to rotate /var/log/journal/f9a6b136aa144ef8aa8a2fd8e873e8fb/system.journal: Read-only file system
[  149.491138] systemd-journald[295]: Failed to write entry, ignoring: Bad message
[  149.492447] systemd-journald[295]: Failed to rotate /var/log/journal/f9a6b136aa144ef8aa8a2fd8e873e8fb/system.journal: Read-only file system
[  149.492765] systemd-journald[295]: Failed to write entry, ignoring: Bad message
[  149.492918] systemd-journald[295]: Failed to rotate /var/log/journal/f9a6b136aa144ef8aa8a2fd8e873e8fb/system.journal: Read-only file system
[  149.493212] systemd-journald[295]: Failed to write entry, ignoring: Bad message
[  149.493281] systemd-journald[295]: Failed to write entry, ignoring: Read-only file system
[  149.494878] systemd-journald[295]: Failed to write entry, ignoring: Read-only file system
[  149.494954] systemd-journald[295]: Failed to write entry, ignoring: Read-only file system
...
[  152.710077] systemd-journald[295]: Failed to write entry, ignoring: Read-only file system
[  152.718392] systemd-journald[295]: Failed to write entry, ignoring: Read-only file system
[  152.726650] systemd-journald[295]: Failed to write entry, ignoring: Read-only file system



I do not see any of the original errors regarding "NETDEV WATCHDOG: eth0 (calxedaxgmac): transmit queue 0 timed out", however, the system clearly does not sleep and wake very well.
Comment 5 Jeff Bastian 2013-10-21 14:04:53 EDT
(In reply to Jeff Bastian from comment #4)
> So I tested with 3.10.10-200.fc19.armv7hl


Oops, I forgot to finish this sentence!

I tested with 3.10.10-200.fc19 and when the system went to sleep and woke up again 2 minutes later, I saw a number of errors involving the disk:
Comment 6 Fedora End Of Life 2013-12-21 08:40:12 EST
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 7 Fedora End Of Life 2014-02-05 16:29:17 EST
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.