Description of problem:
I have a Dell Inspiron 15R SE (7520) laptop with an Intel 7260-AC wifi card. When using the newest kernel release (4.8.15-300) the system doesn't boot up completely because NetworkManager hangs. When blacklisting the iwlwifi module via kernel command line, the system boots up completely; when insmod'ing iwlwifi later, all programs accessing network stuff in kernel (specifically NetworkManager, but also e.g. ifconfig) hang again. Trying to strace those processes makes strace hang as well. Booting an older kernel (4.8.14-300, 4.8.12-300) makes the wifi card work just fine.

Version-Release number of selected component (if applicable):
kernel-4.8.15-300.fc25.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Use hardware described above with kernel release mentioned above
2. Boot system

Actual results:
Systemd boot is blocked for several minutes by NetworkManager, logging into GDM isn't possible after the NetworkManager task times out

Expected results:
Login should be possible
Created attachment 1238267 [details] dmesg | grep iwl
Could you install kernel-debug 4.8.15, boot it, and provide the dmesg output?
In the meantime, 4.8.16-300 was released, and it seems to fix the problem for me. Were there any fixes in that area, or is this rather unexpected? If the former, I guess we can close this; if the latter, I guess I need to fetch the debug packages for 4.8.15-300 from koji?
If the new version fixes the problem, we can close the bug.
Reopening, as the issue has reappeared in the 4.9.6 and 4.9.8 kernels. I'll attach dmesg with debug package installed, although there's nothing suspicious wifi related in there ... I wonder whether the KMS backtraces are related?
Created attachment 1255437 [details] dmesg with debug package
What I forgot to mention: 4.9.4 seems to be fine.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 25 kernel bugs. Fedora 25 has now been rebased to 4.10.9-200.fc25. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
The issue still persists, but it's dependent on the kernel version. For the last 4 kernel releases, I get these results:
- kernel-4.10.12-200.fc25.x86_64 -> doesn't work
- kernel-4.10.9-200.fc25.x86_64 -> works
- kernel-4.10.8-200.fc25.x86_64 -> doesn't work
- kernel-4.10.6-200.fc25.x86_64 -> works
For each of the versions, the state (either working or not working) is 100% reproducible. Please let me know if I should provide any further information.
Are the working and non-working kernels different variants, i.e. are some of them -debug kernels and others standard kernels, or does all of this happen on standard kernels? Please attach dmesg from the latest working and the latest non-working kernel.
All tests mentioned in comment #10 were done using the standard kernel. In the meantime I've found an additional piece of relevant information: the hangs don't seem to be caused by iwlwifi, but by the radeon driver. As my laptop has both the Intel graphics integrated into the CPU and an additional Radeon graphics chip, I turn off the latter as I don't really need it. I do that by writing OFF to /sys/kernel/debug/vgaswitcheroo/switch using systemd's tmpfiles.d mechanism. In the 'broken' cases doing so yields a kernel stack trace (see the dmesg in comment #6 at around 6.5 seconds), which seems to trigger the network hangs (maybe a broken failure path that doesn't properly release a mutex or something?). When I stop writing to that node, the network works fine even with the kernels listed as broken in comment #10, but that's of course only a workaround.
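For reference, a tmpfiles.d entry that performs this kind of write might look like the sketch below. The file name is hypothetical, and the exact entry used on the affected machine was not posted in this report. The `w` line type writes its argument to a file that already exists.

```
# /etc/tmpfiles.d/radeon-off.conf  (hypothetical file name, not taken from this report)
# Type  Path                                    Mode User Group Age Argument
w       /sys/kernel/debug/vgaswitcheroo/switch  -    -    -     -   OFF
```

Removing or renaming such a file stops the write at boot, which corresponds to the workaround described above.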
I forgot to look at dmesg from comment 6. Yes indeed this looks like radeon issue. Seems locking is fine:
[ 6.453923] random: crng init done
[ 6.522323] radeon: switched off
[ 6.522337] INFO: trying to register non-static key.
[ 6.522363] the code is fine but needs lockdep annotation.
[ 6.522379] turning off the locking correctness validator.
but later we have oops:
[ 6.523256] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 6.523299] IP: [<ffffffffbb49863b>] __list_add+0x1b/0xb0
[ 6.523334] PGD 0
<snip>
[ 6.523362] Oops: 0000 [#1] SMP
[ 6.524029] CPU: 2 PID: 771 Comm: systemd-tmpfile Not tainted 4.9.8-201.fc25.x86_64+debug #1
[ 6.524074] Hardware name: Dell Inc. Inspiron 7520/0PXH02, BIOS A11 02/20/2014
[ 6.524113] task: ffff970cba2f8000 task.stack: ffffb279c2aec000
[ 6.524147] RIP: 0010:[<ffffffffbb49863b>] [<ffffffffbb49863b>] __list_add+0x1b/0xb0
[ 6.524196] RSP: 0018:ffffb279c2aefbe0 EFLAGS: 00010046
[ 6.524227] RAX: ffff970cc78535b8 RBX: ffffb279c2aefc30 RCX: 0000000000000000
[ 6.524267] RDX: ffff970cc78535b8 RSI: 0000000000000000 RDI: ffffb279c2aefc30
[ 6.524306] RBP: ffffb279c2aefbf8 R08: 0000000000000000 R09: 0000000000000000
[ 6.524345] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 6.524383] R13: ffff970cc78535b8 R14: ffff970cba2f8000 R15: ffffffffbb91431b
[ 6.524423] FS: 00007f86b9f37280(0000) GS:ffff970cce000000(0000) knlGS:0000000000000000
[ 6.524467] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6.524500] CR2: 0000000000000000 CR3: 000000043a5a5000 CR4: 00000000001406e0
[ 6.524539] Stack:
[ 6.524554] ffff970cc7853568 0000000000000246 ffff970cc7853570 ffffb279c2aefc90
[ 6.524597] ffffffffbb9142f1 ffffffffc019f5e0 0000000000000080 ffff970cc78535b8
[ 6.524639] ffffffffc019f5e0 ffff970cc78535d8 ffffb279c2aefc30 ffffb279c2aefc30
[ 6.524681] Call Trace:
[ 6.525973] [<ffffffffbb9142f1>] mutex_lock_nested+0x131/0x3f0
[ 6.527077] [<ffffffffc019f5e0>] ? drm_modeset_lock_all+0x40/0x120 [drm]
[ 6.528180] [<ffffffffc019f5e0>] ? drm_modeset_lock_all+0x40/0x120 [drm]
[ 6.529262] [<ffffffffc019f5c5>] ? drm_modeset_lock_all+0x25/0x120 [drm]
[ 6.530332] [<ffffffffc019f5e0>] drm_modeset_lock_all+0x40/0x120 [drm]
[ 6.531407] [<ffffffffc05ff6fd>] radeon_suspend_kms+0x5d/0x3f0 [radeon]
Looks like we use dev->mode_config before it is initialized. Perhaps we should check radeon specific mode_config_initialized boolean to prevent oops:

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 621af06..723508e 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1246,6 +1246,9 @@ static void radeon_switcheroo_set_state(struct pci_dev *pdev, enum vga_switchero
 	if (radeon_is_px(dev) && state == VGA_SWITCHEROO_OFF)
 		return;
 
+	if (!rdev->mode_info.mode_config_initialized)
+		return;
+
 	if (state == VGA_SWITCHEROO_ON) {
 		unsigned d3_delay = dev->pdev->d3_delay;
(In reply to Stanislaw Gruszka from comment #14)
> Looks like we use dev->mode_config before it is initialized. Perhaps we
> should check radeon specific mode_config_initialized boolean to prevent oops:
>
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
> b/drivers/gpu/drm/radeon/radeon_device.c
> index 621af06..723508e 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1246,6 +1246,9 @@ static void radeon_switcheroo_set_state(struct pci_dev
> *pdev, enum vga_switchero
> 	if (radeon_is_px(dev) && state == VGA_SWITCHEROO_OFF)
> 		return;
>
> +	if (!rdev->mode_info.mode_config_initialized)
> +		return;
> +
> 	if (state == VGA_SWITCHEROO_ON) {
> 		unsigned d3_delay = dev->pdev->d3_delay;

Could you try: https://patchwork.freedesktop.org/patch/155277/
(In reply to Rob Clark from comment #15)
> Could you try: https://patchwork.freedesktop.org/patch/155277/

If by 'you' you mean me: sure ... would it be possible to get an RPM built with this patch, though?
This message is a reminder that Fedora 25 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '25'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 25 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.