Bug 1938323

Summary: Wifi settings panel causes gnome-control-panel to hang
Product: Red Hat Enterprise Linux 8 Reporter: Andrew Mike <amike>
Component: gnome-control-centerAssignee: Benjamin Berg <bberg>
gnome-control-center sub component: Network and Wi-Fi panels QA Contact: Filip Pokryvka <fpokryvk>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: alanm, bberg, brclark, casantos, cgarnach, feborges, fpokryvk, it, jesse.thilo, jkoten, jwright, mkielian, mkolbas, redhat-bugzilla, sbarcomb, srinidhi.s, tpopela
Version: 8.3Keywords: Triaged, ZStream
Target Milestone: rcFlags: pm-rhel: mirror+
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: gnome-control-center-3.28.2-29.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2024646 2024649 2024650 2024706 2024707 (view as bug list) Environment:
Last Closed: 2022-05-10 14:03:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2024649, 2024650, 2024706, 2024707    

Description Andrew Mike 2021-03-12 19:42:15 UTC
Description of problem:
When attempting to open the WiFi settings in RHEL 8.3 with the gnome-control-panel, the entire gnome-control-panel application hard locks and does not respond.  This appears to happen in places with many broadcasting APs.

Version-Release number of selected component (if applicable):
3.28.2-22

How reproducible: 100% by customer, but cannot reproduce locally due to lack of hardware.

Steps to Reproduce:
1. Go to a place with many Wireless APs in the area.
2. Attempt to open the menu of wireless networks in the GNOME control panel.

Actual results: GNOME control panel freezes.

Expected results: GNOME control panel does not freeze.

Additional info:
- The primary mechanism seems to be gnome-control-center screwing up dbus:

dbus-daemon[1546]: [system] The maximum number of pending replies for ":1.33663" (uid=1000 pid=41349 comm="gnome-control-center wifi " label="unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023") has been reached (max_replies_per_connection=1024)

-Customer reports that the issue seems to be hardware-agnostic.

Comment 1 Benjamin Berg 2021-03-16 10:42:14 UTC
This issue has been fixed upstream. Unfortunately, the fix is really involved and backporting it to RHEL 8 would effectively require updating the network configuration code in its entirety.

So, my take is that this is a "wontfix". Unless someone makes a good case for updating the whole network configuration part of gnome-control-center (and it turns out that this is indeed feasible to do).

Comment 2 Benjamin Berg 2021-03-16 10:48:35 UTC
Uhh, wait, I just read the description again. So, my comment was in relation to the fact that updating the list of wifi connections is really slow (CPU time wise).

However:

dbus-daemon[1546]: [system] The maximum number of pending replies for ":1.33663" (uid=1000 pid=41349 comm="gnome-control-center wifi " label="unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023") has been reached (max_replies_per_connection=1024)

means there is something else going on. So the system bus is getting flooded. But, I am not sure what is happening here …

Comment 4 KazuarIT 2021-07-25 12:18:18 UTC
Hey, @bberg, We're encountering the same issue with RHEL 8.3. This issue has become critical for us.
1. Is there any configurational Workaround to overcome it?
2. In your last comment "2021-03-16 10:48:35 UTC", you mentioned you might found something? What is the status of that? When should we expect a fix for RHEL 8.3 for this issue? 

In this current state, RHEL 8.3 has become a blocker for our product, and moving to a different RHEL version (8.4) is not an option due to this specific issue. We need at least a Workaround (better a fix) to continue with RHEL 8.3.

Please advise.

Comment 5 Benjamin Berg 2021-07-26 09:52:14 UTC
The proper fix for this is likely to update gnome-control-center so that it processes requests at a reasonable rate. This will happen in RHEL 9 only, with a low chance of a fix going into RHEL 8.5.

You might be able to work around the issue by modifying /etc/dbus-1/system.conf, and adding (in between the <busconfig></busconfig> tags):

  <limit name="max_replies_per_connection">4096</limit>

(or adding a file to /etc/dbus1-/system.d/ with the appropriate line wrapped by the "busconfig" tag). Reboot after changing the configuration.

However, it is not clear that this will fix the issue, as there might be a priority inversion issue where processing the replies has a lower priority than updating the UI.

Please try to change the configuration and see if the issue still occurs afterwards.

Comment 6 KazuarIT 2021-07-26 10:43:42 UTC
Thank you @bberg for the fast reply. I'll try the workaround and will update you accordingly.

Comment 7 KazuarIT 2021-07-26 14:45:11 UTC
@bberg, I checked the configuration you gave me. Unfortunately, it didn't work. 

What I found, after playing with the Wifi window, is that it hangs in different scenarios, for example,

- Passing between Wifi Networks, where the to destination network "requires authentication" - it takes ~20 seconds to display the password windows.
- Forgetting a WiFi network while connecting to it.

Can you help me understand how can I debug this issue and to collect more valuable logs?

Comment 8 Benjamin Berg 2021-07-26 15:05:02 UTC
Are you in a really busy environment with a lot of APs?

The old g-c-c code was inefficient at handling the AP list, we are talking of quadratic algorithms here doing UI updates.

As such, if you are in a busy environment, then I totally expect the gnome-control-center window to freeze for periods of times and become almost unusable. If this is not a busy environment, then there might be another issue (in particular, things continuing after 25s sounds like a timeout on DBus).

If this is a DBus timeout, then you should see error messages in the output of g-c-c (or the logs). You can also try to run "G_MESSAGES_DEBUG=all gnome-control-center" in the terminal which could give you more information about what is going on.

Comment 9 KazuarIT 2021-07-27 12:17:02 UTC
@bberg, we're detecting ~25 AP. Maybe more. Just to clarify, RHEL 8.2 is working as expected and is able to provide a good user experience without any major hangs. The issue is only in RHEL 8.3 (looks like a degradation).

I'll try to enable the debug you mentioned (thanks for your reply) and to see if I can find something. Any help from your side to assist with this issue will be appreciated. As I mentioned, it is a critical issue as the UX is horrible.

Comment 10 KazuarIT 2021-07-28 11:30:31 UTC
Hey @bberg, I need your help understanding my findings.

- After enabling the debug on gnome_control_center I noticed that it's in-kind of a loop adding the same AP (~33 in my environment) to its list.
- Then I moved to gdb and saw it is signaled to update the list every second or so. The "populate_ap_list" is called over and over again.

#0  0x00007f61d936ae83 in g_hash_table_lookup () from /lib64/libglib-2.0.so.0
#1  0x00007f61d9673ce6 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
#2  0x00007f61d9675043 in g_signal_emit () from /lib64/libgobject-2.0.so.0
#3  0x00007f61d965cdf4 in g_object_dispatch_properties_changed () from /lib64/libgobject-2.0.so.0
#4  0x00007f61d965f3f1 in g_object_notify_by_pspec () from /lib64/libgobject-2.0.so.0
#5  0x00007f61d8d9c810 in gtk_widget_init () from /lib64/libgtk-3.so.0
#6  0x00007f61d967b8ad in g_type_create_instance () from /lib64/libgobject-2.0.so.0
#7  0x00007f61d965dfc8 in g_object_new_internal () from /lib64/libgobject-2.0.so.0
#8  0x00007f61d965ff4e in g_object_new_valist () from /lib64/libgobject-2.0.so.0
#9  0x00007f61d96602ad in g_object_new () from /lib64/libgobject-2.0.so.0
#10 0x0000560e781742d3 in make_row.constprop ()
#11 0x0000560e781753e3 in populate_ap_list ()
#12 0x00007f61d96583bd in g_closure_invoke () from /lib64/libgobject-2.0.so.0
#13 0x00007f61d966b945 in signal_emit_unlocked_R () from /lib64/libgobject-2.0.so.0
#14 0x00007f61d9674a06 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
#15 0x00007f61d9675043 in g_signal_emit () from /lib64/libgobject-2.0.so.0
#16 0x00007f61d965cdf4 in g_object_dispatch_properties_changed () from /lib64/libgobject-2.0.so.0
#17 0x00007f61d965c72e in g_object_notify_queue_thaw () from /lib64/libgobject-2.0.so.0
#18 0x00007f61d965f47b in g_object_thaw_notify () from /lib64/libgobject-2.0.so.0
#19 0x00007f61d456deb3 in _dbus_handle_changes_commit () from /lib64/libnm.so.0
#20 0x00007f61d45708f9 in _dbus_managed_objects_changed_cb () from /lib64/libnm.so.0
#21 0x00007f61d995d8d8 in emit_signal_instance_in_idle_cb () from /lib64/libgio-2.0.so.0
#22 0x00007f61d9378fbb in g_idle_dispatch () from /lib64/libglib-2.0.so.0
#23 0x00007f61d937c67d in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#24 0x00007f61d4602416 in _ctx_integ_source_dispatch () from /lib64/libnm.so.0
#25 0x00007f61d937c67d in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#26 0x00007f61d937ca48 in g_main_context_iterate.isra () from /lib64/libglib-2.0.so.0
#27 0x00007f61d937cae0 in g_main_context_iteration () from /lib64/libglib-2.0.so.0
#28 0x00007f61d99450ed in g_application_run () from /lib64/libgio-2.0.so.0

- Finally, I added a DEBUG log level to the Network manager and found it is explicitly requested to perform an AP Scan

[user@lp0032 ~]$ journalctl -u NetworkManager -b -f | grep -v "wifi-ap" | grep wifi-scan | grep -i start
Jul 28 14:21:10 lp0032 NetworkManager[1650]: <debug> [1627471270.8247] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:11 lp0032 NetworkManager[1650]: <debug> [1627471271.0614] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:12 lp0032 NetworkManager[1650]: <debug> [1627471272.5694] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:14 lp0032 NetworkManager[1650]: <debug> [1627471274.0783] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:15 lp0032 NetworkManager[1650]: <debug> [1627471275.5862] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:17 lp0032 NetworkManager[1650]: <debug> [1627471277.0964] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:18 lp0032 NetworkManager[1650]: <debug> [1627471278.6063] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:20 lp0032 NetworkManager[1650]: <debug> [1627471280.1134] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:21 lp0032 NetworkManager[1650]: <debug> [1627471281.6188] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:23 lp0032 NetworkManager[1650]: <debug> [1627471283.1263] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:24 lp0032 NetworkManager[1650]: <debug> [1627471284.6284] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:26 lp0032 NetworkManager[1650]: <debug> [1627471286.1344] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:27 lp0032 NetworkManager[1650]: <debug> [1627471287.6474] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:29 lp0032 NetworkManager[1650]: <debug> [1627471289.1594] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:30 lp0032 NetworkManager[1650]: <debug> [1627471290.6673] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:32 lp0032 NetworkManager[1650]: <debug> [1627471292.1833] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:33 lp0032 NetworkManager[1650]: <debug> [1627471293.6914] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:35 lp0032 NetworkManager[1650]: <debug> [1627471295.1915] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:36 lp0032 NetworkManager[1650]: <debug> [1627471296.6921] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:38 lp0032 NetworkManager[1650]: <debug> [1627471298.1927] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:39 lp0032 NetworkManager[1650]: <debug> [1627471299.6934] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:41 lp0032 NetworkManager[1650]: <debug> [1627471301.1938] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:42 lp0032 NetworkManager[1650]: <debug> [1627471302.6936] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:44 lp0032 NetworkManager[1650]: <debug> [1627471304.1938] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:45 lp0032 NetworkManager[1650]: <debug> [1627471305.6941] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:47 lp0032 NetworkManager[1650]: <debug> [1627471307.1953] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:48 lp0032 NetworkManager[1650]: <debug> [1627471308.6964] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:50 lp0032 NetworkManager[1650]: <debug> [1627471310.1975] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:51 lp0032 NetworkManager[1650]: <debug> [1627471311.6980] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:53 lp0032 NetworkManager[1650]: <debug> [1627471313.1989] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:54 lp0032 NetworkManager[1650]: <debug> [1627471314.6995] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:56 lp0032 NetworkManager[1650]: <debug> [1627471316.1996] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:57 lp0032 NetworkManager[1650]: <debug> [1627471317.6993] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:21:59 lp0032 NetworkManager[1650]: <debug> [1627471319.1995] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:22:00 lp0032 NetworkManager[1650]: <debug> [1627471320.6995] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:22:02 lp0032 NetworkManager[1650]: <debug> [1627471322.2003] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:22:03 lp0032 NetworkManager[1650]: <debug> [1627471323.7005] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)
Jul 28 14:22:05 lp0032 NetworkManager[1650]: <debug> [1627471325.2011] device[a40af747817df8c8] (wlp0s20f3): wifi-scan: start explicit scan (0 SSIDs to probe scan)

Comment 12 Benjamin Berg 2021-07-28 15:22:21 UTC
> populate_ap_list

Yeah, that is the n^4 inefficiency that I was talking about. And it seems like we request more and more scans even though the earlier ones are not finished yet.

There might be workarounds, but it is going to be ugly to come up with one in this case. And pulling the proper fix into a z-stream is definitely not possible.

Comment 14 Benjamin Berg 2021-07-28 20:29:36 UTC
OK, you can try the following build:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=38477871

This pushes the AP list refresh into an idle handler. It'll still burn CPU, but hopefully it is smooth and able to actually process all information.

Comment 15 KazuarIT 2021-08-02 08:05:34 UTC
@bberg, thank you for the reply.

- It seems the link is broken. I don't have access to this link.
- Please note that I'm using gnome-control-center-3.28.2-22.el8.x86_64, can it be that it missing a small fix that you made in g-c-c - https://gitlab.gnome.org/GNOME/gnome-control-center/uploads/0e7490f6dcd2273e7f7f38c14366f9c2/0001-wifi-Only-cancel-pending-scan-if-wireless-is-disable.patch

Comment 16 Benjamin Berg 2021-08-02 08:50:43 UTC
Right, of course, the URL is only available from within Red Hat, sorry about that.

  https://fedorapeople.org/~benzeagnome-control-center-3.28.2-22.idle.el8_3.x86_64.rpm

Hmm, I don't remember all the details of the bug. At the end, it led to rewriting the whole UI code.

Seems like
  https://gitlab.gnome.org/GNOME/gnome-control-center/-/merge_requests/287
is basically what I now did here (though, re-done independently …).

Comment 17 Benjamin Berg 2021-08-02 08:51:26 UTC
Sorry, the correct URL is https://fedorapeople.org/~benzea/gnome-control-center-3.28.2-22.idle.el8_3.x86_64.rpm (forgot the /).

Comment 18 KazuarIT 2021-08-03 08:26:32 UTC
Thanks a lot @bberg . The fix is working great. The Wifi windows UX is back to normal.

The issue I have is only during the RPM installation. Due to the "idle" version suffix, when I install it, it requires all its dependencies with the "idle" suffix. In my environment I used "rpm --nodepts" to remove the old package and to install the new one.
We wish to install it as part of our current ansible-playbook. Would it be possible to provide an RPM without that prefix to avoid the dependencies prefix mismatch? We wish to avoid patchy workarounds same as I did to install it on my local PC.

Comment 19 KazuarIT 2021-08-04 19:03:30 UTC
Hey @bberg, would you be able to provide the aforementioned RPM build?

Comment 20 Benjamin Berg 2021-08-05 11:20:12 UTC
Sure, https://fedorapeople.org/~benzea/gnome-control-center-3.28.2-22.el8_3.x86_64.rpm

Setting devel_ack, as I think the patch is good to be pushed into z-stream.

Comment 21 KazuarIT 2021-08-09 12:11:26 UTC
Thank a lot bberg

Comment 22 Benjamin Berg 2021-11-17 12:01:18 UTC
This needs to be fixed in RHEL 8.6. Is a backport to z-stream of earlier RHEL versions desired here?

Comment 23 KazuarIT 2021-11-18 14:33:54 UTC
Hey @bberg,

2 days ago I downloaded RHEL 8.5 to see if the fix there and saw that the issues reproduce, and now I see your update. So first of all, thanks for the update and clarification.
As for your question, the answer is that we are eager to get a fix for RHEL 8.3. The private RPM fix that you kindly provided still depends on other g-c-c dependencies RPM from the current private build. 
In order to install our private RPM, we had to perform some "hack" by removing the old g-c-c RPM while keeping its dependencies and installing your private RPM while avoiding its dependencies installation.
In that way, we replace one RPM and another RPM while avoiding their dependencies. It is not something that we wish to perform to our product outside dev/testing environment. 

Would it be possible to backport it to RHEL 8.3 z-stream?

Comment 24 Benjamin Berg 2021-11-18 15:11:45 UTC
I think we'll need a bug for each distro. I can do appropriate clones.

Comment 25 Benjamin Berg 2021-11-18 15:39:25 UTC
Right, my bad. Someone else needs to decide on whether this is backported or not.

Comment 29 KazuarIT 2021-11-21 08:34:53 UTC
Hey @bberg and @jkoten 

Thanks for pushing it. This becomes a blocking issue for us. I noticed a backport requires only RHEL 8.4 & RHEL 8.5. We need it for RHEL 8.3. 
In order for us to upgrade to RHEL 8.[4,5] is a huge effort, hence we will highly appreciate it if you backport it also for RHEL 8.3.

Thanks in advance for all your efforts here.

Comment 32 Yaacov Fenster 2021-12-01 06:47:18 UTC
Good morning,  @bberg and @jkoten -

First of all, thank you very much for your responsivness.

When we try to install the supplied RPM (gnome-control-center-3.28.2-22.el8_3.x86_64.rpm) on a 8.3 system we run into a slight dependncy issue:

Error: 
 Problem: conflicting requests
  - nothing provides gnome-control-center-filesystem = 3.28.2-22.el8_3 needed by gnome-control-center-3.28.2-22.el8_3.x86_64

while the available version for the gnome-control-center-filesystem package is close (but no cigar):

$ rpm --query gnome-control-center-filesystem
gnome-control-center-filesystem-3.28.2-22.el8.noarch

a) Is the difference of el8_3 != el8, a real difference or is it just a build issue that can be rectified with a re-building of the RPM?

b) What are the ramifications of this conflict? If we "force" the dependency are we likely to run into any compatability issues?

c) Are we correct in assuming that the fix is in the scanning algorithm and not in the interface between the components?

   Thank you in advance

       Yaacov

Comment 34 errata-xmlrpc 2022-05-10 14:03:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (gnome-control-center bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1870

Comment 35 mkielian@redhat.com 2022-07-01 17:40:19 UTC
Cu is still seeing slowdown in wi-fi scanning