Bug 2024330

Summary: GSocketClient is very slow to resolve names when web proxy autodiscovery is configured by mistake, consider removing WPAD support?
Product: [Fedora] Fedora Reporter: Trev Richards <trev>
Component: glib-networkingAssignee: Michael Catanzaro <mcatanza>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 35CC: caillon+fedoraproject, gecko-bugs-nobody, gnome-sig, jhorak, mcatanza, mclasen, peter, rhughes, rstrode, sandmann, tiagomatos, tpopela
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-29 22:13:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
An animated gif that illustrates my problem.
none
Example resource timings screenshot, for a random resource
none
An example of what it looks like when ask.fedoraproject.org cannot load at all.
none
Details of a single timed out resource
none
Initial, very delayed DNS resolution.
none
GResolver test script
none
GResolver test script
none
GDB Output
none
Socket test
none
GDB Output
none
Socket Test - Clean? none

Description Trev Richards 2021-11-17 20:24:19 UTC
Created attachment 1842422 [details]
An animated gif that illustrates my problem.

Description of problem:

* Browser: Epiphany (Gnome Web) version 41, RPM version.
* FlatPak from Flathub also tested, bug was reproduced
* WebKitGTK 2.34.1
* Fedora 35 using Gnome 41 - 5.14.17-301.fc35.x86_64

Please see Fedora Project forum thread for reference: https://ask.fedoraproject.org/t/epiphany-gnome-web-webkit-intermittently-breaks

I am getting browser-breaking DNS failures and content timeouts that are rendering Epiphany unusable. These problems are not occurring with Firefox or any Blink (Chromium) based browser. I am unsure of how to troubleshoot this issue and am happy to follow any steps required.

How reproducible:
90% of the time Epiphany is broken, the other 10% of the time it breaks within an hour.

Steps to Reproduce:
1. Install epiphany from anywhere you'd like
2. Observe how broken epiphany is while trying to load a website
3. If it works, let your laptop sit for a while and try again, it will likely be broken.
4. Restart epiphany to see if it breaks.

Additional info:
 OS: Fedora 
 Kernel: x86_64 Linux 5.14.17-301.fc35.x86_64
 Uptime: 1h 53m
 Packages: 6714
 Shell: zsh 5.8
 Resolution: 1920x1080
 DE: GNOME 41.1
 WM: Mutter
 WM Theme: WhiteSur-dark-solid-red
 GTK Theme: WhiteSur-dark-solid-red [GTK2/3]
 Icon Theme: Papirus-Dark
 Font: Cantarell 11
 Disk: 674G / 960G (71%)
 CPU: Intel Core i7-7700HQ @ 8x 3.8GHz [56.0°C]
 GPU: Mesa Intel(R) HD Graphics 630 (KBL GT2)
 RAM: 7591MiB / 15716MiB

Comment 1 Michael Catanzaro 2021-11-17 22:16:26 UTC
Hmmm. This is a strange one. Going to be tough to debug this....

> I am getting browser-breaking DNS failures and content timeouts that are rendering Epiphany unusable.

So to make sure I understand, you are experiencing two different symptoms: DNS requests usually fail, but when they do not fail, then the subsequent HTTP request usually times out. Is that correct? It really kinda sounds like something is going wrong at the IP layer then. But that's ridiculous, because you said that Firefox and Chromium are both fine.

I was going to suggest moving this to WebKit Bugzilla, but maybe we should try tackling this via the DNS timeouts first, since that's just GResolver, which should be easy to isolate from everything else. If we can reproduce outside WebKit, that would help a lot. But first, the animated GIF is hard to follow, so let's take some screenshots instead. You've already found the web inspector, so now try switching to the network tab, which will show the resource timings. Also, if you look in the top left of the inspector, there is the X close button, then a dock to side of window button, then a detach button. Use that to detach the inspector to get a bigger window so it will be easier to see. Then look in the top-right corner of the network tab: next to the little trash icon, you should see a little database icon with the tooltip "Ignore the resource cache when loading resources." Click that so that the icon turns blue, that way we force everything to load via the network. Now, refresh the page in Epiphany to show the network progress as things fail to load. Let's take one screenshot of the whole thing, then a couple more showing the timings for some particular resource loads.  Click on the colorful timings graph to open a popover showing detail of what WebKit is doing. More screenshots that you think are interesting of course won't hurt. (There is an Import/Export function, but Import seems to be broken, which isn't surprising, so screenshots it is.)

I'm going to attach an example screenshot showing some resource timings. Note that not all resources will show all the timings, because Discourse is using ServiceWorkers, so just ignore the ones that seem to be mysteriously skipping DNS and TCP and TLS ("Secure"). ServiceWorkers are the reason for that (at least, I assume so).

Comment 2 Michael Catanzaro 2021-11-17 22:17:39 UTC
Created attachment 1842431 [details]
Example resource timings screenshot, for a random resource

Comment 3 Trev Richards 2021-11-17 22:36:54 UTC
Created attachment 1842432 [details]
An example of what it looks like when ask.fedoraproject.org cannot load at all.

Comment 4 Trev Richards 2021-11-17 22:42:27 UTC
Created attachment 1842433 [details]
Details of a single timed out resource

Comment 5 Trev Richards 2021-11-17 22:43:20 UTC
Created attachment 1842443 [details]
Initial, very delayed DNS resolution.

Comment 6 Trev Richards 2021-11-17 22:49:20 UTC
Hello Michael,

It took a couple of attempts to produce an example that didn't totally break outright this time. I have attached a few more examples.

There have also been some updates in the community forum worth looking at (I think?) https://ask.fedoraproject.org/t/epiphany-gnome-web-webkit-intermittently-breaks/18361/6

It's probably also worth noting that this problem was reproducible both at the location I am physically at today, and the entirely different LAN I was connected with yesterday. Even after all of the tweaks Vladislav suggested, Firefox/Chrome still persist just fine.

I am currently running a MacOS VM using Safari in order to keep working, where resolution/timeout issues are not occurring.

This is just weird enough to make me wonder if this is some sort of unrelated system issue. I did the naughty thing and upgraded to F35 from F34 without a clean format. Not sure if knowing this is helpful or not. I have submitted a similar bug report to WebKit

Comment 7 Michael Catanzaro 2021-11-17 23:05:54 UTC
I'm going to attach a GResolver test script to see if it's still broken when we take WebKit out of the equation. Give it a try, see if it is broken or not.

Comment 8 Michael Catanzaro 2021-11-17 23:06:26 UTC
Created attachment 1842447 [details]
GResolver test script

Comment 9 Michael Catanzaro 2021-11-17 23:08:58 UTC
Created attachment 1842448 [details]
GResolver test script

Attaching this script a second time because I noticed a typo immediately after I uploaded the first one. I messed up the syntax of the first for loop, but JavaScript didn't seem to care: it just worked totally fine with no warnings. Sigh....

Comment 10 Trev Richards 2021-11-17 23:28:47 UTC
JavaScript being too loose? Never heard of that problem ;)

Looks like these DNS results worked out:

$ ./glib-test.js 
Gjs-Message: 15:27:04.221: JS LOG: Resolved baidu.com to 220.181.38.148 in 60.281 ms
Gjs-Message: 15:27:04.221: JS LOG: Resolved baidu.com to 220.181.38.251 in 60.281 ms
Gjs-Message: 15:27:04.222: JS LOG: Resolved qq.com to 61.129.7.47 in 61.38 ms
Gjs-Message: 15:27:04.222: JS LOG: Resolved qq.com to 183.3.226.35 in 61.38 ms
Gjs-Message: 15:27:04.222: JS LOG: Resolved qq.com to 203.205.254.157 in 61.38 ms
Gjs-Message: 15:27:04.222: JS LOG: Resolved qq.com to 123.151.137.18 in 61.38 ms
Gjs-Message: 15:27:04.222: JS LOG: Resolved facebook.com to 2a03:2880:f101:83:face:b00c:0:25de in 62.104 ms
Gjs-Message: 15:27:04.222: JS LOG: Resolved facebook.com to 157.240.3.35 in 62.104 ms
Gjs-Message: 15:27:04.222: JS LOG: Resolved youtube.com to 2607:f8b0:400a:807::200e in 63.219 ms
Gjs-Message: 15:27:04.222: JS LOG: Resolved youtube.com to 142.251.33.110 in 63.219 ms
Gjs-Message: 15:27:04.223: JS LOG: Resolved google.com to 2607:f8b0:400a:806::200e in 64.034 ms
Gjs-Message: 15:27:04.223: JS LOG: Resolved google.com to 142.251.33.78 in 64.034 ms
Gjs-Message: 15:27:04.248: JS LOG: Resolved amazon.com to 205.251.242.103 in 86.904 ms
Gjs-Message: 15:27:04.248: JS LOG: Resolved amazon.com to 176.32.103.205 in 86.904 ms
Gjs-Message: 15:27:04.248: JS LOG: Resolved amazon.com to 54.239.28.85 in 86.904 ms
Gjs-Message: 15:27:04.249: JS LOG: Resolved yahoo.com to 2001:4998:124:1507::f000 in 87.366 ms
Gjs-Message: 15:27:04.249: JS LOG: Resolved yahoo.com to 2001:4998:24:120d::1:0 in 87.366 ms
Gjs-Message: 15:27:04.249: JS LOG: Resolved yahoo.com to 2001:4998:24:120d::1:1 in 87.366 ms
Gjs-Message: 15:27:04.249: JS LOG: Resolved yahoo.com to 2001:4998:44:3507::8000 in 87.366 ms
Gjs-Message: 15:27:04.249: JS LOG: Resolved yahoo.com to 2001:4998:44:3507::8001 in 87.366 ms
Gjs-Message: 15:27:04.249: JS LOG: Resolved yahoo.com to 2001:4998:124:1507::f001 in 87.366 ms
Gjs-Message: 15:27:04.249: JS LOG: Resolved yahoo.com to 74.6.231.20 in 87.366 ms
Gjs-Message: 15:27:04.250: JS LOG: Resolved yahoo.com to 98.137.11.163 in 87.366 ms
Gjs-Message: 15:27:04.250: JS LOG: Resolved yahoo.com to 74.6.231.21 in 87.366 ms
Gjs-Message: 15:27:04.250: JS LOG: Resolved yahoo.com to 74.6.143.25 in 87.366 ms
Gjs-Message: 15:27:04.250: JS LOG: Resolved yahoo.com to 98.137.11.164 in 87.366 ms
Gjs-Message: 15:27:04.250: JS LOG: Resolved yahoo.com to 74.6.143.26 in 87.366 ms
Gjs-Message: 15:27:04.301: JS LOG: Resolved zoom.us to 52.202.62.250 in 138.19 ms
Gjs-Message: 15:27:04.343: JS LOG: Resolved live.com to 204.79.197.212 in 180.173 ms
Gjs-Message: 15:27:04.374: JS LOG: Resolved taobao.com to 140.205.220.96 in 213.35 ms
Gjs-Message: 15:27:04.374: JS LOG: Resolved taobao.com to 140.205.94.189 in 213.35 ms
Gjs-Message: 15:27:04.379: JS LOG: Resolved netflix.com to 2600:1f14:62a:de85:d7:e7a1:8f7d:f6f5 in 216.017 ms
Gjs-Message: 15:27:04.379: JS LOG: Resolved netflix.com to 2600:1f14:62a:de83:c61a:d6e:18b0:65f7 in 216.017 ms
Gjs-Message: 15:27:04.379: JS LOG: Resolved netflix.com to 2600:1f14:62a:de84:880a:88cc:a16:5423 in 216.017 ms
Gjs-Message: 15:27:04.379: JS LOG: Resolved netflix.com to 44.242.13.161 in 216.017 ms
Gjs-Message: 15:27:04.379: JS LOG: Resolved netflix.com to 44.240.158.19 in 216.017 ms
Gjs-Message: 15:27:04.379: JS LOG: Resolved netflix.com to 52.38.7.83 in 216.017 ms
Gjs-Message: 15:27:04.389: JS LOG: Resolved tmall.com to 140.205.94.193 in 229.332 ms
Gjs-Message: 15:27:04.389: JS LOG: Resolved tmall.com to 140.205.130.99 in 229.332 ms
Gjs-Message: 15:27:04.389: JS LOG: Resolved wikipedia.org to 2620:0:863:ed1a::1 in 227.388 ms
Gjs-Message: 15:27:04.389: JS LOG: Resolved wikipedia.org to 198.35.26.96 in 227.388 ms
Gjs-Message: 15:27:04.407: JS LOG: Resolved sohu.com to 211.159.191.77 in 246.042 ms
Gjs-Message: 15:27:04.413: JS LOG: Resolved microsoft.com to 104.215.148.63 in 250.136 ms
Gjs-Message: 15:27:04.413: JS LOG: Resolved microsoft.com to 40.76.4.15 in 250.136 ms
Gjs-Message: 15:27:04.413: JS LOG: Resolved microsoft.com to 40.112.72.205 in 250.136 ms
Gjs-Message: 15:27:04.413: JS LOG: Resolved microsoft.com to 40.113.200.201 in 250.136 ms
Gjs-Message: 15:27:04.413: JS LOG: Resolved microsoft.com to 13.77.161.179 in 250.136 ms
Gjs-Message: 15:27:04.442: JS LOG: Resolved 360.cn to 36.99.170.84 in 281.413 ms
Gjs-Message: 15:27:04.442: JS LOG: Resolved 360.cn to 36.110.213.10 in 281.413 ms
Gjs-Message: 15:27:04.480: JS LOG: Resolved xinhuanet.com to 202.108.119.193 in 317.665 ms
Gjs-Message: 15:27:04.480: JS LOG: Resolved xinhuanet.com to 202.108.119.194 in 317.665 ms
Gjs-Message: 15:27:04.480: JS LOG: Resolved weibo.com to 2400:89c0:1053:3::17 in 318.101 ms
Gjs-Message: 15:27:04.480: JS LOG: Resolved weibo.com to 2400:89c0:1053:3::18 in 318.101 ms
Gjs-Message: 15:27:04.480: JS LOG: Resolved weibo.com to 180.149.139.248 in 318.101 ms
Gjs-Message: 15:27:04.480: JS LOG: Resolved weibo.com to 180.149.153.187 in 318.101 ms
Gjs-Message: 15:27:04.480: JS LOG: Resolved weibo.com to 49.7.40.131 in 318.101 ms
Gjs-Message: 15:27:04.480: JS LOG: Resolved weibo.com to 49.7.40.133 in 318.101 ms
Gjs-Message: 15:27:04.482: JS LOG: Resolved sina.com.cn to 123.125.104.150 in 319.848 ms
Gjs-Message: 15:27:04.482: JS LOG: Resolved sina.com.cn to 123.126.45.205 in 319.848 ms
Gjs-Message: 15:27:04.482: JS LOG: Resolved sina.com.cn to 49.7.37.133 in 319.848 ms
Gjs-Message: 15:27:04.615: JS LOG: Resolved jd.com to 211.144.27.126 in 453.776 ms
Gjs-Message: 15:27:04.615: JS LOG: Resolved jd.com to 118.193.98.63 in 453.776 ms
Gjs-Message: 15:27:04.615: JS LOG: Resolved jd.com to 111.13.149.108 in 453.776 ms
Gjs-Message: 15:27:04.615: JS LOG: Resolved jd.com to 211.144.24.218 in 453.776 ms

Comment 11 Michael Catanzaro 2021-11-18 16:10:42 UTC
Closing as UPSTREAM since clearly GResolver is doing fine. Let's continue in the upstream bug.

Comment 12 Michael Catanzaro 2021-11-22 19:38:09 UTC
(In reply to Michael Catanzaro from comment #11)
> Closing as UPSTREAM since clearly GResolver is doing fine. Let's continue in
> the upstream bug.

Reopening this here because GSocketClient is not doing fine.

Comment 13 Michael Catanzaro 2021-11-22 19:39:44 UTC
(Verdict from upstream bug is that GSocketClient is running very slow:

https://bugs.webkit.org/show_bug.cgi?id=233269#c10

so that's not a WebKit-level problem.)

Comment 14 Michael Catanzaro 2021-11-22 19:54:26 UTC
Next step. Take my test script from https://bugs.webkit.org/show_bug.cgi?id=233269#c8 and look for this line:

await client.connect_to_host_async(hostname, 443, null);

Change it to this:

client.connect_to_host(hostname, 443, null);

to switch from the async codepath to the sync codepath. This means it will connect to the test servers sequentially rather than in parallel, but more importantly it will simplify things a lot. Probably that won't make any difference in your results, because most of the complexity happens during the TCP/IP connection step that is going very quickly for you, but maybe it will matter. Let's see.

Also, let's enable some debug by running with the G_MESSAGES_DEBUG environment variable set like this:

$ G_MESSAGES_DEBUG=GLib-GIO ./test-socket-client

Go ahead and post what you see with both the async and the sync versions.

With the sync path g_socket_client_connect(), the code is pretty simple: the only thing happening between the RESOLVING and RESOLVED events is one call to g_socket_address_enumerator_next(). So likely that's what we'll have to dive into next.

Also, just to be careful, go ahead and run the original GResolver test script from comment #9 a couple more times, just to be sure you didn't somehow get unlucky and wind up running it when your DNS was going fast. The fact that GResolver seems to work 10x faster than GSocketClient is sufficiently weird that it's worth double-checking to make sure those results are consistent.

Comment 15 Trev Richards 2021-11-22 20:10:09 UTC
Using synchronous `client.connect_to_host()`
```
λ G_MESSAGES_DEBUG=GLib-GIO ./test-socket-client
(gjs:19155): GLib-GIO-DEBUG: 12:08:16.958: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for ‘gio-vfs’
Gjs-Message: 12:08:16.992: JS LOG: Connecting to example.com...
(gjs:19155): GLib-GIO-DEBUG: 12:08:16.992: _g_io_module_get_default: Found default implementation dconf (DConfSettingsBackend) for ‘gsettings-backend’
(gjs:19155): GLib-GIO-DEBUG: 12:08:16.993: _g_io_module_get_default: Found default implementation gnome (GProxyResolverGnome) for ‘gio-proxy-resolver’
Gjs-Message: 12:08:16.993: JS LOG: Resolving example.com...
Gjs-Message: 12:08:20.654: JS LOG: Resolved example.com after 3661.083 ms
Gjs-Message: 12:08:20.655: JS LOG: TCP/IP connecting to example.com...
Gjs-Message: 12:08:20.674: JS LOG: TCP/IP connected to example.com after 18.862 ms
(gjs:19155): GLib-GIO-DEBUG: 12:08:20.682: _g_io_module_get_default: Found default implementation gnutls (GTlsBackendGnutls) for ‘gio-tls-backend’
Gjs-Message: 12:08:20.716: JS LOG: TLS handshaking for example.com...
Gjs-Message: 12:08:20.742: JS LOG: TLS handshaked for example.com after 25.7 ms
Gjs-Message: 12:08:20.743: JS LOG: Finished connection to example.com after 3751.041 ms total
Gjs-Message: 12:08:20.743: JS LOG: Connecting to discussion.fedoraproject.org...
Gjs-Message: 12:08:20.743: JS LOG: Resolving discussion.fedoraproject.org...
Gjs-Message: 12:08:24.403: JS LOG: Resolved discussion.fedoraproject.org after 3660.03 ms
Gjs-Message: 12:08:24.403: JS LOG: TCP/IP connecting to discussion.fedoraproject.org...
Gjs-Message: 12:08:24.422: JS LOG: TCP/IP connected to discussion.fedoraproject.org after 18.838 ms
Gjs-Message: 12:08:24.425: JS LOG: TLS handshaking for discussion.fedoraproject.org...
Gjs-Message: 12:08:24.480: JS LOG: TLS handshaked for discussion.fedoraproject.org after 54.577 ms
Gjs-Message: 12:08:24.480: JS LOG: Finished connection to discussion.fedoraproject.org after 3737.669 ms total
```

Using async `client.connect_to_host_async()`
```
λ G_MESSAGES_DEBUG=GLib-GIO ./test-socket-client-async 
(gjs:19320): GLib-GIO-DEBUG: 12:09:22.874: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for ‘gio-vfs’
Gjs-Message: 12:09:22.909: JS LOG: Connecting to example.com...
(gjs:19320): GLib-GIO-DEBUG: 12:09:22.910: _g_io_module_get_default: Found default implementation dconf (DConfSettingsBackend) for ‘gsettings-backend’
(gjs:19320): GLib-GIO-DEBUG: 12:09:22.910: _g_io_module_get_default: Found default implementation gnome (GProxyResolverGnome) for ‘gio-proxy-resolver’
Gjs-Message: 12:09:22.911: JS LOG: Resolving example.com...
(gjs:19320): GLib-GIO-DEBUG: 12:09:22.911: GSocketClient: Starting new address enumeration
Gjs-Message: 12:09:22.913: JS LOG: Connecting to discussion.fedoraproject.org...
Gjs-Message: 12:09:22.914: JS LOG: Resolving discussion.fedoraproject.org...
(gjs:19320): GLib-GIO-DEBUG: 12:09:22.914: GSocketClient: Starting new address enumeration
(gjs:19320): GLib-GIO-DEBUG: 12:09:26.654: GSocketClient: Address enumeration succeeded
Gjs-Message: 12:09:26.655: JS LOG: Resolved example.com after 3743.624 ms
(gjs:19320): GLib-GIO-DEBUG: 12:09:26.655: GSocketClient: Starting TCP connection attempt
Gjs-Message: 12:09:26.655: JS LOG: TCP/IP connecting to example.com...
(gjs:19320): GLib-GIO-DEBUG: 12:09:26.677: GSocketClient: TCP connection successful
Gjs-Message: 12:09:26.677: JS LOG: TCP/IP connected to example.com after 22.208 ms
(gjs:19320): GLib-GIO-DEBUG: 12:09:26.677: GSocketClient: Starting application layer connection
(gjs:19320): GLib-GIO-DEBUG: 12:09:26.677: GSocketClient: Starting TLS handshake
(gjs:19320): GLib-GIO-DEBUG: 12:09:26.684: _g_io_module_get_default: Found default implementation gnutls (GTlsBackendGnutls) for ‘gio-tls-backend’
Gjs-Message: 12:09:26.718: JS LOG: TLS handshaking for example.com...
(gjs:19320): GLib-GIO-DEBUG: 12:09:26.747: GSocketClient: TLS handshake succeeded
Gjs-Message: 12:09:26.747: JS LOG: TLS handshaked for example.com after 28.777 ms
(gjs:19320): GLib-GIO-DEBUG: 12:09:26.747: GSocketClient: Connection successful!
Gjs-Message: 12:09:26.748: JS LOG: Finished connection to example.com after 3838.553 ms total
(gjs:19320): GLib-GIO-DEBUG: 12:09:30.402: GSocketClient: Address enumeration succeeded
Gjs-Message: 12:09:30.402: JS LOG: Resolved discussion.fedoraproject.org after 7488.558 ms
(gjs:19320): GLib-GIO-DEBUG: 12:09:30.403: GSocketClient: Starting TCP connection attempt
Gjs-Message: 12:09:30.403: JS LOG: TCP/IP connecting to discussion.fedoraproject.org...
(gjs:19320): GLib-GIO-DEBUG: 12:09:30.422: GSocketClient: TCP connection successful
Gjs-Message: 12:09:30.422: JS LOG: TCP/IP connected to discussion.fedoraproject.org after 19.05 ms
(gjs:19320): GLib-GIO-DEBUG: 12:09:30.422: GSocketClient: Starting application layer connection
(gjs:19320): GLib-GIO-DEBUG: 12:09:30.422: GSocketClient: Starting TLS handshake
Gjs-Message: 12:09:30.425: JS LOG: TLS handshaking for discussion.fedoraproject.org...
(gjs:19320): GLib-GIO-DEBUG: 12:09:30.478: GSocketClient: TLS handshake succeeded
Gjs-Message: 12:09:30.478: JS LOG: TLS handshaked for discussion.fedoraproject.org after 52.284 ms
(gjs:19320): GLib-GIO-DEBUG: 12:09:30.478: GSocketClient: Connection successful!
Gjs-Message: 12:09:30.478: JS LOG: Finished connection to discussion.fedoraproject.org after 7565.284 ms total

```

The older GResolver script ran plenty fast 3 times.

Comment 16 Michael Catanzaro 2021-11-22 20:28:34 UTC
OK, so it's slow no matter which path we take. Not surprising.

I think that's as far as we can go using test scripts. I might need to give you an instrumented build of GLib to make further progress.

BTW, does this happen even if you switch to 8.8.8.8 or 1.1.1.1 as your resolver (drop and reestablish your connection afterwards, or restart NetworkManager, to ensure the change is effective)? I suspect it's going to be slow regardless of what DNS server you use, because you've proved that GSocketClient is slow even if GResolver is fast, but... maybe not. If there is some specific public DNS that somehow triggers this, that would make it a lot easier, since I could debug locally instead of passing test builds to you.

Comment 17 Trev Richards 2021-11-22 22:26:33 UTC
(In reply to Michael Catanzaro from comment #16)
> OK, so it's slow no matter which path we take. Not surprising.
> 
> I think that's as far as we can go using test scripts. I might need to give
> you an instrumented build of GLib to make further progress.
> 
> BTW, does this happen even if you switch to 8.8.8.8 or 1.1.1.1 as your
> resolver (drop and reestablish your connection afterwards, or restart
> NetworkManager, to ensure the change is effective)? I suspect it's going to
> be slow regardless of what DNS server you use, because you've proved that
> GSocketClient is slow even if GResolver is fast, but... maybe not. If there
> is some specific public DNS that somehow triggers this, that would make it a
> lot easier, since I could debug locally instead of passing test builds to
> you.

I've replicated the browser issue with both Google & Cloudflare's public DNS and almost never use my local ISP's DNS servers. I'm currently configured to use cloudflare for all connections and have bypassed NetworkManager for DNS.

/etc/NetworkManager/conf.d/00-custom.conf
```
[main]
dns=none
systemd-resolved=false
```

/etc/systemd/resolved.conf.d/00-custom.conf
```
[Resolve]
DNS=1.1.1.1 1.0.0.1 2606:4700:4700::1111 \
2606:4700:4700::1001
DNSOverTLS=yes
```

I'm not 100% sure I've disabled these configurations and tested without them. Mainly because I had trouble with DNS with my ISP in the first place: https://ask.fedoraproject.org/t/dns-resolution-takes-5-seconds-fedora-34/16020

Depending on where I am and which wireless gateway I've used, enabling dns-over-tls has been necessary. I even wrote a shell script for disabling/enabling it so I could quickly change configs.

I'm not on my typical, problematic home ISP supplied wireless gateway, so let's test...

I have 
1. Disabled my custom configs by commenting them out.
2. Restarted NetworkManager and systemd-resolved
3. Cleared the resolved cache
4. Made sure no public gateway is set in NetworkManager for my current connection
5. Ran the scripts
6. Tried Epiphany

GResolve
```
λ ./gresolver         
Gjs-Message: 14:09:50.874: JS LOG: Resolved baidu.com to 220.181.38.148 in 28.142 ms
Gjs-Message: 14:09:50.874: JS LOG: Resolved baidu.com to 220.181.38.251 in 28.142 ms
Gjs-Message: 14:09:50.874: JS LOG: Resolved youtube.com to 2607:f8b0:400a:80a::200e in 29.627 ms
Gjs-Message: 14:09:50.874: JS LOG: Resolved youtube.com to 142.250.69.206 in 29.627 ms
Gjs-Message: 14:09:50.874: JS LOG: Resolved google.com to 2607:f8b0:400a:80b::200e in 30.155 ms
Gjs-Message: 14:09:50.874: JS LOG: Resolved google.com to 142.250.217.110 in 30.155 ms
Gjs-Message: 14:09:50.874: JS LOG: Resolved facebook.com to 2a03:2880:f101:83:face:b00c:0:25de in 28.944 ms
Gjs-Message: 14:09:50.874: JS LOG: Resolved facebook.com to 157.240.3.35 in 28.944 ms
Gjs-Message: 14:09:50.878: JS LOG: Resolved qq.com to 61.129.7.47 in 32.541 ms
Gjs-Message: 14:09:50.878: JS LOG: Resolved qq.com to 123.151.137.18 in 32.541 ms
Gjs-Message: 14:09:50.878: JS LOG: Resolved qq.com to 183.3.226.35 in 32.541 ms
Gjs-Message: 14:09:50.878: JS LOG: Resolved qq.com to 203.205.254.157 in 32.541 ms
Gjs-Message: 14:09:50.892: JS LOG: Resolved tmall.com to 140.205.130.99 in 46.969 ms
Gjs-Message: 14:09:50.892: JS LOG: Resolved tmall.com to 140.205.94.193 in 46.969 ms
Gjs-Message: 14:09:50.896: JS LOG: Resolved taobao.com to 140.205.94.189 in 49.79 ms
Gjs-Message: 14:09:50.896: JS LOG: Resolved taobao.com to 140.205.220.96 in 49.79 ms
Gjs-Message: 14:09:50.899: JS LOG: Resolved amazon.com to 205.251.242.103 in 52.568 ms
Gjs-Message: 14:09:50.899: JS LOG: Resolved amazon.com to 54.239.28.85 in 52.568 ms
Gjs-Message: 14:09:50.899: JS LOG: Resolved amazon.com to 176.32.103.205 in 52.568 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 2001:4998:24:120d::1:1 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 2001:4998:124:1507::f001 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 2001:4998:24:120d::1:0 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 2001:4998:124:1507::f000 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 2001:4998:44:3507::8000 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 2001:4998:44:3507::8001 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 74.6.143.25 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 74.6.231.21 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 74.6.231.20 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 74.6.143.26 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 98.137.11.164 in 55.859 ms
Gjs-Message: 14:09:50.903: JS LOG: Resolved yahoo.com to 98.137.11.163 in 55.859 ms
Gjs-Message: 14:09:50.908: JS LOG: Resolved 360.cn to 36.110.213.10 in 62.126 ms
Gjs-Message: 14:09:50.908: JS LOG: Resolved 360.cn to 36.99.170.84 in 62.126 ms
Gjs-Message: 14:09:50.916: JS LOG: Resolved zoom.us to 3.235.71.132 in 67.918 ms
Gjs-Message: 14:09:50.923: JS LOG: Resolved live.com to 204.79.197.212 in 74.949 ms
Gjs-Message: 14:09:50.929: JS LOG: Resolved netflix.com to 2600:1f14:62a:de81:b848:82ee:2416:447e in 80.933 ms
Gjs-Message: 14:09:50.929: JS LOG: Resolved netflix.com to 2600:1f14:62a:de82:822d:a423:9e4c:da8d in 80.933 ms
Gjs-Message: 14:09:50.929: JS LOG: Resolved netflix.com to 2600:1f14:62a:de80:69a8:7b12:8e5f:855d in 80.933 ms
Gjs-Message: 14:09:50.929: JS LOG: Resolved netflix.com to 52.38.7.83 in 80.933 ms
Gjs-Message: 14:09:50.929: JS LOG: Resolved netflix.com to 44.240.158.19 in 80.933 ms
Gjs-Message: 14:09:50.929: JS LOG: Resolved netflix.com to 44.242.13.161 in 80.933 ms
Gjs-Message: 14:09:50.948: JS LOG: Resolved microsoft.com to 40.112.72.205 in 99.118 ms
Gjs-Message: 14:09:50.948: JS LOG: Resolved microsoft.com to 40.113.200.201 in 99.118 ms
Gjs-Message: 14:09:50.948: JS LOG: Resolved microsoft.com to 13.77.161.179 in 99.118 ms
Gjs-Message: 14:09:50.948: JS LOG: Resolved microsoft.com to 40.76.4.15 in 99.118 ms
Gjs-Message: 14:09:50.948: JS LOG: Resolved microsoft.com to 104.215.148.63 in 99.118 ms
Gjs-Message: 14:09:50.967: JS LOG: Resolved wikipedia.org to 2620:0:861:ed1a::1 in 119.717 ms
Gjs-Message: 14:09:50.967: JS LOG: Resolved wikipedia.org to 208.80.154.224 in 119.717 ms
Gjs-Message: 14:09:51.044: JS LOG: Resolved sohu.com to 211.159.191.77 in 198.388 ms
Gjs-Message: 14:09:51.064: JS LOG: Resolved weibo.com to 36.51.254.229 in 216.48 ms
Gjs-Message: 14:09:51.064: JS LOG: Resolved weibo.com to 36.51.254.228 in 216.48 ms
Gjs-Message: 14:09:51.066: JS LOG: Resolved jd.com to 211.144.24.218 in 218.967 ms
Gjs-Message: 14:09:51.066: JS LOG: Resolved jd.com to 111.13.149.108 in 218.967 ms
Gjs-Message: 14:09:51.066: JS LOG: Resolved jd.com to 211.144.27.126 in 218.967 ms
Gjs-Message: 14:09:51.066: JS LOG: Resolved jd.com to 118.193.98.63 in 218.967 ms
Gjs-Message: 14:09:51.074: JS LOG: Resolved sina.com.cn to 123.125.104.150 in 226.45 ms
Gjs-Message: 14:09:51.074: JS LOG: Resolved sina.com.cn to 123.126.45.205 in 226.45 ms
Gjs-Message: 14:09:51.074: JS LOG: Resolved sina.com.cn to 49.7.37.133 in 226.45 ms
Gjs-Message: 14:09:51.124: JS LOG: Resolved xinhuanet.com to 202.108.119.193 in 276.488 ms
Gjs-Message: 14:09:51.124: JS LOG: Resolved xinhuanet.com to 202.108.119.194 in 276.488 ms
```

Using synchronous `client.connect_to_host()`
```
λ ./test-socket-client
Gjs-Message: 14:12:21.927: JS LOG: Connecting to example.com...
Gjs-Message: 14:12:21.930: JS LOG: Resolving example.com...
 	Gjs-Message: 14:12:46.956: JS LOG: Resolved example.com after 25026.008 ms

(gjs:34565): Gjs-CRITICAL **: 14:12:46.956: JS ERROR: Gio.IOErrorEnum: Timeout was reached
test_connect_to_host@./test-socket-client:60:12
@./test-socket-client:71:29
@./test-socket-client:75:10

Gjs-Message: 14:12:46.956: JS LOG: Connecting to discussion.fedoraproject.org...
Gjs-Message: 14:12:46.957: JS LOG: Resolving discussion.fedoraproject.org...
Gjs-Message: 14:13:11.982: JS LOG: Resolved discussion.fedoraproject.org after 25024.996 ms

(gjs:34565): Gjs-CRITICAL **: 14:13:11.982: JS ERROR: Gio.IOErrorEnum: Timeout was reached
test_connect_to_host@./test-socket-client:60:12
@./test-socket-client:71:29
@./test-socket-client:75:10

^C
```

Note the break command at the end. It didn't want to finish. The time at which I broke the script was 14:25! Things may be worse without public DNS. And sure as can be Epiphany tilted as well.

Comment 18 Michael Catanzaro 2021-11-23 16:28:41 UTC
> Note the break command at the end. It didn't want to finish.

It's because I'm too used to C and forgot the required try/catch to handle exceptions. Oh well, it's just a test script. Worked well enough.

To make further progress, I'm going to have to prepare a test build with some debug added to GNetworkAddressAddressEnumerator to see what it's doing. Everything it does should happen effectively instantaneously, except for its call to GResolver, but more debug will allow us to know for sure.

Comment 19 Michael Catanzaro 2021-11-23 17:21:08 UTC
(In reply to Michael Catanzaro from comment #18) 
> To make further progress, I'm going to have to prepare a test build with
> some debug added to GNetworkAddressAddressEnumerator to see what it's doing.

Please remove or comment out all but one test domain from those test scripts, so each script tests only a single domain at once, to reduce the amount of debug that gets printed. Then repeat both tests using the glib2 packages from here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=79204370

using the G_MESSAGES_DEBUG=GLib-GIO environment variable, and post the results. glib2-2.70.0-5.fc35.x86_64.rpm is the only package that you really need: the rest are there in case you already have them installed and need to replace them. When finished testing, use 'dnf downgrade' to get back to Fedora's glib2.

Comment 20 Trev Richards 2021-11-23 23:53:31 UTC
I attempted to install your test build but ran into a problem with dependency checks:

```
λ sudo dnf install ./glib2-2.70.0-5.fc35.x86_64.rpm 
[sudo] password for trevdev: 
keybase                                                                                                                                                                                                         12 kB/s | 3.3 kB     00:00    
Dependencies resolved.
===============================================================================================================================================================================================================================================
 Package                                               Architecture                                           Version                                                       Repository                                                    Size
===============================================================================================================================================================================================================================================
Downgrading:
 glib2                                                 i686                                                   2.70.0-5.fc35                                                 fedora                                                       2.7 M
 glib2                                                 x86_64                                                 2.70.0-5.fc35                                                 @commandline                                                 2.6 M

Transaction Summary
===============================================================================================================================================================================================================================================
Downgrade  2 Packages

Total size: 5.4 M
Total download size: 2.7 M
Is this ok [y/N]: y
Downloading Packages:
glib2-2.70.0-5.fc35.i686.rpm                                                                                                                                                                                   4.4 MB/s | 2.7 MB     00:00    
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                                                          251 kB/s | 2.7 MB     00:11     
Running transaction check
Transaction check succeeded.
Running transaction test
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Transaction test error:
  file /usr/share/man/man1/gapplication.1.gz conflicts between attempted installs of glib2-2.70.0-5.fc35.i686 and glib2-2.70.0-5.fc35.x86_64
  file /usr/share/man/man1/gdbus.1.gz conflicts between attempted installs of glib2-2.70.0-5.fc35.i686 and glib2-2.70.0-5.fc35.x86_64
  file /usr/share/man/man1/gio-querymodules.1.gz conflicts between attempted installs of glib2-2.70.0-5.fc35.i686 and glib2-2.70.0-5.fc35.x86_64
  file /usr/share/man/man1/gio.1.gz conflicts between attempted installs of glib2-2.70.0-5.fc35.i686 and glib2-2.70.0-5.fc35.x86_64
  file /usr/share/man/man1/glib-compile-schemas.1.gz conflicts between attempted installs of glib2-2.70.0-5.fc35.i686 and glib2-2.70.0-5.fc35.x86_64
  file /usr/share/man/man1/gsettings.1.gz conflicts between attempted installs of glib2-2.70.0-5.fc35.i686 and glib2-2.70.0-5.fc35.x86_64

```

Looks like my version of glib2 is 2.70.1:
```
λ dnf list --installed | grep glib2
glib2.i686                                        2.70.1-1.fc35                          @updates                                       
glib2.x86_64                                      2.70.1-1.fc35                          @updates   
```

Comment 21 Michael Catanzaro 2021-11-24 15:12:03 UTC
Wow, multilib. :/  And it looks like I was missing the latest commits too.

New scratch build here: https://koji.fedoraproject.org/koji/taskinfo?taskID=79228402

You'll have to install the x86_64 and i686 packages at the same time.

Comment 22 Trev Richards 2021-11-24 19:34:20 UTC
There we go!

GResolve
```
λ G_MESSAGES_DEBUG=GLib-GIO ./glib-test.js      
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.490: g_socket_client_connect: BEGIN
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: Enumerated connectable (GUnixSocketAddress)
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: Begin new iteration of primary loop
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: Emitting RESOLVING event...
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: ...finished emitting RESOLVING event.
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: Calling g_socket_address_enumerator_next()...
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: ...finished g_socket_address_enumerator_next()
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: Emitting RESOLVED event...
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: Finished emitting RESOLVED event. ever_resolved=TRUE
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: Continuing on to socket connecting...
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.491: g_socket_client_connect: END
(gjs:10194): GLib-GIO-DEBUG: 11:32:09.493: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for ‘gio-vfs’
Gjs-Message: 11:32:09.529: JS LOG: Resolved xinhuanet.com to 202.108.119.194 in 2.333 ms
Gjs-Message: 11:32:09.529: JS LOG: Resolved xinhuanet.com to 202.108.119.193 in 2.333 ms
```

socket test
```
λ G_MESSAGES_DEBUG=GLib-GIO ./test-socket-client
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: BEGIN
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: Enumerated connectable (GUnixSocketAddress)
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: Begin new iteration of primary loop
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: Emitting RESOLVING event...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: ...finished emitting RESOLVING event.
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: Calling g_socket_address_enumerator_next()...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: ...finished g_socket_address_enumerator_next()
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: Emitting RESOLVED event...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: Finished emitting RESOLVED event. ever_resolved=TRUE
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: Continuing on to socket connecting...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.343: g_socket_client_connect: END
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.345: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for ‘gio-vfs’
Gjs-Message: 11:33:16.378: JS LOG: Connecting to discussion.fedoraproject.org...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.378: g_socket_client_connect: BEGIN
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.378: _g_io_module_get_default: Found default implementation dconf (DConfSettingsBackend) for ‘gsettings-backend’
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: _g_io_module_get_default: Found default implementation gnome (GProxyResolverGnome) for ‘gio-proxy-resolver’
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Enumerated connectable (GNetworkAddress)
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Begin new iteration of primary loop
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Emitting RESOLVING event...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: BEGIN
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Enumerated connectable (GUnixSocketAddress)
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Begin new iteration of primary loop
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Emitting RESOLVING event...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: ...finished emitting RESOLVING event.
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Calling g_socket_address_enumerator_next()...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: ...finished g_socket_address_enumerator_next()
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Emitting RESOLVED event...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Finished emitting RESOLVED event. ever_resolved=TRUE
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Continuing on to socket connecting...
Gjs-Message: 11:33:16.379: JS LOG: Resolving discussion.fedoraproject.org...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: ...finished emitting RESOLVING event.
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Calling g_socket_address_enumerator_next()...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: END
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.143: g_network_address_address_enumerator_next: BEGIN
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.143: g_network_address_address_enumerator_next: addr_enum->addresses == NULL
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.143: g_network_address_address_enumerator_next: !addr->priv->cached_sockaddrs, parsing sockaddr...
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.143: g_network_address_address_enumerator_next: still !addr->priv->cached_sockaddrs, calling g_resolver_lookup_by_name() with hostname=discussion.fedoraproject.org
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.147: g_network_address_address_enumerator_next: finished g_resolver_lookup_by_name()
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: init_and_query_next_address: BEGIN
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: init_and_query_next_address: copied cached_sockaddrs
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: init_and_query_next_address: addr_enum->current_item == NULL
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: init_and_query_next_address: END (next_item, return result)
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: g_network_address_address_enumerator_next: END
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: g_socket_client_connect: ...finished g_socket_address_enumerator_next()
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: g_socket_client_connect: Emitting RESOLVED event...
Gjs-Message: 11:33:20.148: JS LOG: Resolved discussion.fedoraproject.org after 3768.974 ms
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: g_socket_client_connect: Finished emitting RESOLVED event. ever_resolved=TRUE
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.148: g_socket_client_connect: Continuing on to socket connecting...
Gjs-Message: 11:33:20.149: JS LOG: TCP/IP connecting to discussion.fedoraproject.org...
Gjs-Message: 11:33:20.170: JS LOG: TCP/IP connected to discussion.fedoraproject.org after 21.159 ms
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.178: _g_io_module_get_default: Found default implementation gnutls (GTlsBackendGnutls) for ‘gio-tls-backend’
Gjs-Message: 11:33:20.218: JS LOG: TLS handshaking for discussion.fedoraproject.org...
Gjs-Message: 11:33:20.247: JS LOG: TLS handshaked for discussion.fedoraproject.org after 29.293 ms
(gjs:10350): GLib-GIO-DEBUG: 11:33:20.247: g_socket_client_connect: END
Gjs-Message: 11:33:20.247: JS LOG: Finished connection to discussion.fedoraproject.org after 3869.962 ms total
```

socket async test
```
λ G_MESSAGES_DEBUG=GLib-GIO ./test-socket-client-async 
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: BEGIN
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: Enumerated connectable (GUnixSocketAddress)
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: Begin new iteration of primary loop
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: Emitting RESOLVING event...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: ...finished emitting RESOLVING event.
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: Calling g_socket_address_enumerator_next()...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: ...finished g_socket_address_enumerator_next()
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: Emitting RESOLVED event...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: Finished emitting RESOLVED event. ever_resolved=TRUE
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.102: g_socket_client_connect: Continuing on to socket connecting...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.103: g_socket_client_connect: END
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.105: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for ‘gio-vfs’
Gjs-Message: 11:33:49.138: JS LOG: Connecting to discussion.fedoraproject.org...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: _g_io_module_get_default: Found default implementation dconf (DConfSettingsBackend) for ‘gsettings-backend’
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: _g_io_module_get_default: Found default implementation gnome (GProxyResolverGnome) for ‘gio-proxy-resolver’
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: BEGIN
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: Enumerated connectable (GUnixSocketAddress)
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: Begin new iteration of primary loop
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: Emitting RESOLVING event...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: ...finished emitting RESOLVING event.
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: Calling g_socket_address_enumerator_next()...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: ...finished g_socket_address_enumerator_next()
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: Emitting RESOLVED event...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: Finished emitting RESOLVED event. ever_resolved=TRUE
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: Continuing on to socket connecting...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.140: g_socket_client_connect: END
Gjs-Message: 11:33:49.141: JS LOG: Resolving discussion.fedoraproject.org...
(gjs:10402): GLib-GIO-DEBUG: 11:33:49.141: GSocketClient: Starting new address enumeration
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.898: init_and_query_next_address: BEGIN
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.898: init_and_query_next_address: addr_enum->current_item == NULL
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.898: init_and_query_next_address: END (next_item, return result)
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.898: GSocketClient: Address enumeration succeeded
Gjs-Message: 11:33:52.898: JS LOG: Resolved discussion.fedoraproject.org after 3757.317 ms
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.898: GSocketClient: Starting TCP connection attempt
Gjs-Message: 11:33:52.899: JS LOG: TCP/IP connecting to discussion.fedoraproject.org...
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.917: GSocketClient: TCP connection successful
Gjs-Message: 11:33:52.917: JS LOG: TCP/IP connected to discussion.fedoraproject.org after 18.466 ms
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.917: GSocketClient: Starting application layer connection
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.917: GSocketClient: Starting TLS handshake
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.923: _g_io_module_get_default: Found default implementation gnutls (GTlsBackendGnutls) for ‘gio-tls-backend’
Gjs-Message: 11:33:52.957: JS LOG: TLS handshaking for discussion.fedoraproject.org...
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.983: GSocketClient: TLS handshake succeeded
Gjs-Message: 11:33:52.983: JS LOG: TLS handshaked for discussion.fedoraproject.org after 25.632 ms
(gjs:10402): GLib-GIO-DEBUG: 11:33:52.983: GSocketClient: Connection successful!
Gjs-Message: 11:33:52.983: JS LOG: Finished connection to discussion.fedoraproject.org after 3845.07 ms total
```

Comment 23 Michael Catanzaro 2021-11-24 20:37:25 UTC
The test-resolver output looks good. Let's forget about that. Also ignore the async debug output: I didn't add debug for that, so the debug you see is all coming from the unrelated GUnixSocketAddress stuff. That's presumably happening on another thread. Maybe it's from GSettings or something, dunno.

The sync output is confused because GIO modules are also using their own GSocketClients at the same time, presumably on separate threads. That actually could be related to the bug, though. I guess I failed to test this build myself like I should have before passing it off to you, or I would have noticed and added more debug to disambiguate the separate GSocketClient instances. Might have to do that in the end, but that would require another build, so for now let's instead sabotage GIO_MODULE_DIR:

$ GIO_MODULE_DIR=0 G_MESSAGES_DEBUG=GLib-GIO ./test-socket-client

like so. This is guaranteed to break your connection because you won't be able to do the final TLS connection step, but that's extraneous anyway and I think it might print enough to give us a clearer picture. Or alternatively, I wonder if it might sidestep the bug entirely, if something is going wrong with shared state when GSocketClient is used on two threads at once.

Anyway, the most interesting part of your debug output is here:

Gjs-Message: 11:33:16.379: JS LOG: Resolving discussion.fedoraproject.org...
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: ...finished emitting RESOLVING event.
(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: Calling g_socket_address_enumerator_next()...

This is where we really start resolving. Then we have:

(gjs:10350): GLib-GIO-DEBUG: 11:33:16.379: g_socket_client_connect: END

which confused me for a while until I realized it's coming from the unrelated connection on a secondary thread. My guess is that it's for GSettings. So let's ignore that. Then we have:

(gjs:10350): GLib-GIO-DEBUG: 11:33:20.143: g_network_address_address_enumerator_next: BEGIN

and no explanation for the four seconds that were lost. There is really almost *nothing* happening inside g_socket_address_enumerator_next(). It should call klass->next right away, and that is g_network_address_address_enumerator_next():

GSocketAddress *
g_socket_address_enumerator_next (GSocketAddressEnumerator  *enumerator,
				  GCancellable              *cancellable,
				  GError                   **error)
{
  GSocketAddressEnumeratorClass *klass;

  g_return_val_if_fail (G_IS_SOCKET_ADDRESS_ENUMERATOR (enumerator), NULL);

  klass = G_SOCKET_ADDRESS_ENUMERATOR_GET_CLASS (enumerator);

  return (* klass->next) (enumerator, cancellable, error);
}

So why is the thread blocking on seemingly nothing? I might need to add debug within this function just to make sure it really is calling klass->next right away. This would be *really really* crazy, but I wonder if the GObject casts are somehow getting blocked on the other thread... somehow. It would be so very unlikely, but if true could potentially explain a lot. Low confidence. Meh. Let's see what happens with GIO_MODULE_DIR=0 before deciding what to try next.

I was hoping the debug would be enlightening, but this is somehow becoming stranger the deeper we go....

Comment 24 Michael Catanzaro 2021-11-24 20:46:29 UTC
Besides rerunning the sync test script with GIO_MODULE_DIR=0, please also try to catch it in gdb at the time of the hang:

$ gdb gjs
(gdb) set logging on
(gdb) r ./test-socket-client

Note it may take a very long time to download debuginfo. Next, wait for that four-second hang, then hit Ctrl+C during the hang and take a backtrace after you regain control:

^C
(gdb) thread apply all bt

Continue and repeat once or twice just to be sure the backtraces look the same:

(gdb) c
^C

If you get lucky, then maybe it will point to something interesting.

Comment 25 Michael Catanzaro 2021-11-24 20:47:03 UTC
(In reply to Michael Catanzaro from comment #24)
> (gdb) set logging on

BTW this will cause everything to be logged to gdb.txt in the current working directory.

Comment 26 Trev Richards 2021-11-24 21:30:19 UTC
Created attachment 1843475 [details]
GDB Output

Comment 27 Trev Richards 2021-11-24 21:31:30 UTC
Created attachment 1843477 [details]
Socket test

Flags GIO_MODULE_DIR=0 G_MESSAGES_DEBUG=GLib-GIO

Comment 28 Trev Richards 2021-11-24 21:33:25 UTC
(In reply to Michael Catanzaro from comment #24)
> Besides rerunning the sync test script with GIO_MODULE_DIR=0, please also
> try to catch it in gdb at the time of the hang:
> 
> $ gdb gjs
> (gdb) set logging on
> (gdb) r ./test-socket-client
> 
> Note it may take a very long time to download debuginfo. Next, wait for that
> four-second hang, then hit Ctrl+C during the hang and take a backtrace after
> you regain control:
> 
> ^C
> (gdb) thread apply all bt
> 
> Continue and repeat once or twice just to be sure the backtraces look the
> same:
> 
> (gdb) c
> ^C
> 
> If you get lucky, then maybe it will point to something interesting.

I have attached the output for these requested tests :)

https://bugzilla.redhat.com/attachment.cgi?id=1843475
https://bugzilla.redhat.com/attachment.cgi?id=1843477

Comment 29 Trev Richards 2021-11-24 21:36:24 UTC
Created attachment 1843478 [details]
GDB Output

The same GDB output as the previous file, but without the download feedback

Comment 30 Michael Catanzaro 2021-11-24 22:16:09 UTC
So it goes 10x faster with GIO_MODULE_DIR=0. Huhhhhh. Now we're getting somewhere. And it looks like it's blocking on network proxy lookup, which you said you didn't have configured, but maybe something weird is happening. First, what is the output of:

$ gsettings get org.gnome.system.proxy mode
$ gsettings get org.gnome.system.proxy autoconfig-url

Resetting those (with 'gsettings reset') will probably fix the slowness... but I didn't see any mention of proxies in the debug you provided earlier, as I would expect if the mode setting is set to anything other than 'none'. Sure looks like it's blocking to do proxy autoconfig though, which is slow. Anyway, if that fails, then as root, next thing you're going to try is:

 * Move /usr/lib64/gio/modules/libgiognomeproxy.so to someplace safe
 * Run 'sudo gio-querymodules-64 /usr/lib64/gio/modules' to recreate /usr/lib64/io/modules/giomodule.cache

Of course you only want to do that temporarily, for debugging, because modifying files under /usr is naughty. In this case, it is OK to do temporarily because if you put the proxy module back and then run gio-querymodules-64 again, you should get back to a state equivalent to as if you had not messed with it.

In the unlikely case that doesn't help, then I'm going to ask you to bisect /usr/lib64/gio/modules by removing half the untested modules at a  time and repeating until you figure out which module is slowing things down. Pretty sure it will turn out to be the gnomeproxy one, but if not, there's only a few modules there, shouldn't take long to test and find out.

Comment 31 Trev Richards 2021-11-25 00:04:53 UTC
Created attachment 1843484 [details]
Socket Test - Clean?

Comment 32 Trev Richards 2021-11-25 00:09:37 UTC
In terms of proxies I have Tor and ProtonVPN, both of which are currently disabled. ProtonVPN is the only thing that auto-configures anything. I wonder if that has something to do with this.

GSettings:
```
λ gsettings get org.gnome.system.proxy mode
'auto'

λ gsettings get org.gnome.system.proxy autoconfig-url
''
```

Resetting indeed set system.proxy to `none`

I (and most likely you, too) am pretty happy to be writing this response from within Epiphany after browsing and using several websites. I can't rightly recall having explicitly gone to my Settings -> Network -> Network Proxy setting for any reason other than to perhaps glean some better understanding of how I can use Tor.

I think it's fair to say that if the setting is "auto" and there's no proxy configured, it shouldn't break things...but I'm happy and somewhat embarrassed to say I think we've isolated the problem.

Here's the resulting socket test, just to be sure: https://bugzilla.redhat.com/attachment.cgi?id=1843484

Comment 33 Michael Catanzaro 2021-11-26 00:58:23 UTC
> I think it's fair to say that if the setting is "auto" and there's no proxy configured, it shouldn't break things...

Indeed, it shouldn't. Will investigate more next week.

Comment 34 Michael Catanzaro 2021-11-26 01:03:24 UTC
BTW the "auto" setting means "proxy autoconfig."

When the URL is empty, that means "do web proxy autodiscovery" (WPAD). See: https://en.wikipedia.org/wiki/Web_Proxy_Auto-Discovery_Protocol. Presumably that is going slow for some reason. It's certainly not what you want (note there's a *lot* of security considerations) unless you know that you want it, which you don't. I'm surprised you wound up in this state by accident.

Comment 35 Trev Richards 2021-11-26 01:37:46 UTC
(In reply to Michael Catanzaro from comment #34)
> BTW the "auto" setting means "proxy autoconfig."
> 
> When the URL is empty, that means "do web proxy autodiscovery" (WPAD). See:
> https://en.wikipedia.org/wiki/Web_Proxy_Auto-Discovery_Protocol. Presumably
> that is going slow for some reason. It's certainly not what you want (note
> there's a *lot* of security considerations) unless you know that you want
> it, which you don't. I'm surprised you wound up in this state by accident.

Big yikes! Yeah, chances are likely that while I experimented I didn't appreciate what "auto" meant. Perhaps I thought it could figure out a local (to the machine) proxy on its own. I never had problems in 2/3 major browsers, so it took this long to realize there was any sort of problem. We'll just leave that off.

For users who are even dumber than I am, it might be a good idea to have some sort of help info in that interface.

Comment 36 Trev Richards 2021-11-26 19:28:29 UTC
A moment of gratitude for your help Michael. Thank you!

I hope you have a good weekend.

Comment 37 Michael Catanzaro 2021-11-29 20:11:26 UTC
(In reply to Trev Richards from comment #35)
> For users who are even dumber than I am, it might be a good idea to have
> some sort of help info in that interface.

Yeah I think that's what this issue comes down to. That said, there is already a warning in Settings -> Network -> Network Proxy: "Web Proxy Autodiscovery is used when a Configuration URL is not provided. This is not recommended for untrusted public networks." That kinda already hints at "this is not what you want," but perhaps I could make it more clear. How about:

"Never enable automatic proxy configuration unless you understand why you need to do so. Providing a configuration URL is strongly recommended. If not provided, Web Proxy Autodiscovery is used. This should be used only on trusted internal networks, never on untrusted public networks."

Comment 38 Trev Richards 2021-11-29 21:02:56 UTC
(In reply to Michael Catanzaro from comment #37)
> (In reply to Trev Richards from comment #35)
> > For users who are even dumber than I am, it might be a good idea to have
> > some sort of help info in that interface.
> 
> Yeah I think that's what this issue comes down to. That said, there is
> already a warning in Settings -> Network -> Network Proxy: "Web Proxy
> Autodiscovery is used when a Configuration URL is not provided. This is not
> recommended for untrusted public networks." That kinda already hints at
> "this is not what you want," but perhaps I could make it more clear. How
> about:
> 
> "Never enable automatic proxy configuration unless you understand why you
> need to do so. Providing a configuration URL is strongly recommended. If not
> provided, Web Proxy Autodiscovery is used. This should be used only on
> trusted internal networks, never on untrusted public networks."

Yeah! I'm not sure I spotted this warning if I had toggled it myself. It's possible I toggled it, tested something, then when that test didn't work perhaps forgotten to toggle it back to off.

It's obvious that leaving this on is a bad idea already. A stricter message like the one you're suggesting is never a bad idea. The number of times I work on a public network these days are next to none (thank goodness!). When I am I'm behind a VPN. Seeing as it more or less breaks Epiphany is an interesting side-effect. I wonder what Chrome/FireFox does differently. In this case I'm very glad it did "break" things.

Comment 39 Michael Catanzaro 2021-11-29 21:50:32 UTC
> I wonder what Chrome/FireFox does differently.

(1) They have their own proxy settings, and surely ignore GNOME's. (2) I think libproxy might be doing web proxy autodiscovery again for each network request, instead of which is... pretty intense. We do have to run the PAC JavaScript once for each network request, but I suspect libproxy just keeps trying again and again to download it instead of eventually giving up....

Comment 40 Michael Catanzaro 2021-11-29 22:13:35 UTC
Instead of working on new UI for this, I think it's better to just try removing WPAD support and see if anybody complains: https://gitlab.gnome.org/GNOME/glib-networking/-/issues/182

Comment 41 Trev Richards 2021-11-30 18:16:20 UTC
(In reply to Michael Catanzaro from comment #40)
> Instead of working on new UI for this, I think it's better to just try
> removing WPAD support and see if anybody complains:
> https://gitlab.gnome.org/GNOME/glib-networking/-/issues/182

I'm obviously no authority on this topic, but if I may quote "The Zen of Python" - Explicit is better than implicit.