Bug 1151665
|
Description
Changming Sun
2014-10-11 02:00:26 UTC
Thanks for a bug report. It seems to me (from the backtrace), that the Google account was trying to open its associated Calendar, when a network change had been received, which resulted in a notification of an online state change being discovered by the evolution-data-server code and propagated further, through some signals into GLib. I guess from the backtrace that the GLib got stuck in the sort function. Could you provide your glib2 version, please? (rpm -q glib2) Dan (or Philip), does this sound familiar to anyone of you, please? I found an upstream bug report []1, but it doesn't fit the above backtrace, thus it's possible the change there is unrelated to this issue. [1] https://bugzilla.gnome.org/show_bug.cgi?id=736806 Here are backtraces with line numbers: bug #1148247 comment #5, see the backtraces for evolution-addressbook-factory. (In reply to Milan Crha from comment #1) > Could you provide your glib2 version, please? (rpm -q glib2) $ rpm -q glib2 glib2-2.42.0-1.fc21.x86_64 glib2-2.42.0-1.fc21.i686 rpm -q glib2 glib2-2.42.0-1.fc21.x86_64 Thanks for the update. Let's move this to glib2 for further investigation. *** Bug 1152692 has been marked as a duplicate of this bug. *** Dan, could you have a look on this, please? There are happening more same issues in GTask all exhibiting the same symptoms. Hm... so this is the same as https://bugzilla.gnome.org/show_bug.cgi?id=735325. The strace output there showed that the kernel was apparently spamming e-d-s with routing change notifications for some reason, but the reporter stopped being able to reproduce the bug before we could figure out why. It would be useful to get "strace -f -s 256" output of e-d-s while it's stuck in the loop. (The loop is pretty tight, so you only need like 1/10s worth of output.) Also, it might be useful to run NetworkManager with debug logging (to see what it's doing/noticing about the routing). Add this to /etc/NetworkManager/NetworkManager.conf: [logging] level=DEBUG domains=DEFAULT and restart NM. Regardless, I guess I should also add some rate limiting to GNetworkMonitor... Created attachment 948558 [details]
strace -f -s 256 evolution
OK, so the complete netlink message is:
6c 00 00 00 [ length 108 ]
18 00 00 00 [ type RTM_NEWROUTE, flags 0 ]
xx xx xx xx [ serial # ]
yy yy yy yy [ pid ?? ]
0a 00 00 00 [ AF_INET6, src len 0, dst len 0, TOS 0 ]
fe 04 00 01 [ RT_TABLE_MAIN, RTPROT_STATIC, RT_SCOPE_UNIVERSE, RTN_UNICAST]
00 00 00 00 [ flags 0 ]
08 00 0f 00 [ attr of length 8, type RTA_TABLE ]
fe 00 00 00 [ RT_TABLE_MAIN ]
14 00 05 00 [ attr of length 20, type RTA_GATEWAY ]
fe 80 00 00 [ fe80::16:da:e9:ff:fe:f8:a8:f7 ]
00 00 00 00
16 da e9 ff
fe f8 a8 f7
08 00 04 00 [ attr of length 8, type RTA_OIF ]
02 00 00 00 [ device ifindex 2 ]
08 00 06 00 [ attr of length 8, type RTA_PRIORITY ]
00 04 00 00 [ route metric 1024 ]
24 00 0c 00 [ attr of length 36, type RTA_CACHEINFO ]
00 00 00 00
[ truncated? ]
The one thing that this trace tells us that the truncated one in the gnome bug didn't is that the "new" route is actually the same one every time.
The other thing in this trace is that there are also thousands of
[pid 31632] recvmsg(10, {msg_name(0)=NULL, msg_iov(1)=[{"l\4\1\1\210\0\0\0\376s\0\0\233\0\0\0\1\1o\0+\0\0\0/org/freedesktop/NetworkManager/IP6Config/3\0\0\0\0\0\2\1s\0(\0\0\0org.freedesktop.NetworkManager.IP6Config\0\0\0\0\0\0\0\0\3\1s\0\21\0\0\0PropertiesChanged\0\0\0\0\0\0\0\10\1g\0\5a{sv}\0\0\0\0\0\0\200\0\0\0\0\0\0\0\6\0\0\0Routes\0\ta(ayuayu)\0\0\0`\0\0\0\0\0\0\0\20\0\0\0*\2\1\310\1\1\f\204\0\0\0\0\0\0\0\0@\0\0\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 576
meaning that NM is also receiving and reacting to the same flood of RTM_NEWLINK messages.
So... is there something "weird" about your network configuration? Do you have any weird (physical or virtual) network interfaces? Are you running any weird network-related daemons? Weird sysctl settings? Etc...
(The NM DEBUG logs mentioned before might still be useful too.)
Actually, another thing: install radvd, and then (as root) run "radvdump", and attach the output.
Pretty sure this is NM's fault. Mikhail (or anyone else who can reproduce this reliably): can you try installing the NetworkManager package at http://koji.fedoraproject.org/koji/taskinfo?taskID=7926164 (and then restart NetworkManager). I expect that just restarting NM will make the CPU usage go away temporarily, but with old NM it would come back eventually, and with the new it won't. Created attachment 949234 [details]
radvdump
Created attachment 949295 [details]
strace of /usr/libexec/evolution-calendar-factory
Created attachment 949607 [details]
journalctl --since=2014-10-23 --no-pager -u NetworkManager.service
Created attachment 950330 [details]
NetworkManager log
Anything else? Just a side note that evolution-data-server was issuing a server availability checks after each single "network-changed" signal, which can come in a quick rate in junks like 5+ in a row in certain cases. That caused unnecessary performance requirements and so on. I tried to address this upstream at [1]. The only difference is that it's not fully related to this, I think, because the backtrace as such is a new thing, not an old. [1] https://bugzilla.gnome.org/show_bug.cgi?id=712392 (In reply to Mikhail from comment #17) > Anything else? Yes: (In reply to Dan Winship from comment #12) > Mikhail (or anyone else who can reproduce this reliably): can you try > installing the NetworkManager package at > http://koji.fedoraproject.org/koji/taskinfo?taskID=7926164 (and then restart > NetworkManager). > > I expect that just restarting NM will make the CPU usage go away > temporarily, but with old NM it would come back eventually, and with the new > it won't. With this NM http://koji.fedoraproject.org/koji/taskinfo?taskID=7926164 issue still reproducible: $ rpm -qa | grep NetworkManager NetworkManager-wifi-0.9.10.0-8.1danw.git20140704.fc21.x86_64 NetworkManager-debuginfo-0.9.10.0-8.1danw.git20140704.fc21.x86_64 NetworkManager-pptp-gnome-0.9.8.2-6.fc21.x86_64 NetworkManager-0.9.10.0-8.1danw.git20140704.fc21.x86_64 NetworkManager-glib-0.9.10.0-8.1danw.git20140704.fc21.x86_64 NetworkManager-vpnc-gnome-0.9.9.0-6.git20140428.fc21.x86_64 NetworkManager-bluetooth-0.9.10.0-8.1danw.git20140704.fc21.x86_64 NetworkManager-pptp-0.9.8.2-6.fc21.x86_64 NetworkManager-wwan-0.9.10.0-8.1danw.git20140704.fc21.x86_64 NetworkManager-openvpn-0.9.9.0-3.git20140128.fc21.x86_64 NetworkManager-config-connectivity-fedora-0.9.10.0-8.1danw.git20140704.fc21.x86_64 NetworkManager-openvpn-gnome-0.9.9.0-3.git20140128.fc21.x86_64 NetworkManager-adsl-0.9.10.0-8.1danw.git20140704.fc21.x86_64 NetworkManager-openconnect-0.9.8.4-4.fc21.x86_64 NetworkManager-vpnc-0.9.9.0-6.git20140428.fc21.x86_64 Created attachment 950385 [details]
evolution-addressbook-factory strace with new NM
Created attachment 950386 [details]
Network Manager log
OK... new test package with a different fix: http://koji.fedoraproject.org/koji/taskinfo?taskID=7939674 BTW, are you getting constant high CPU usage, or just repeated bursts of high CPU usage? The NM log shows a second or two of high activity, followed by a 30-second gap, then another burst of activity, a 30-second gap, etc. (If evolution-data-server is constantly spinning rather than only sporadically spinning, then there may be a GNetworkMonitor-specific bug too...) (In reply to Dan Winship from comment #23) > OK... new test package with a different fix: > > http://koji.fedoraproject.org/koji/taskinfo?taskID=7939674 Thanks, with this update I don't see issue. > BTW, are you getting constant high CPU usage, or just repeated bursts of > high CPU usage? If this occurs evolution-addressbook-factory process constant utilize CPU core. > The NM log shows a second or two of high activity, followed > by a 30-second gap, then another burst of activity, a 30-second gap, etc. > (If evolution-data-server is constantly spinning rather than only > sporadically spinning, then there may be a GNetworkMonitor-specific bug > too...) Subjectively, two days of work all right. Why is subjective because there is no clear instructions for reproducing problem. But NetworkManager log are still very huge. Created attachment 950824 [details]
Network Manager log after fixing issue
(In reply to Dan Winship from comment #23) > (If evolution-data-server is constantly spinning rather than only > sporadically spinning, then there may be a GNetworkMonitor-specific bug > too...) Or the way evolution used to react on the GNetworkMonitor "network-changed" signal. See comment #18. Hi, today come NetworkManager-0.9.10.0-10.git20140704.fc21.x86_64 and this issue returned :( Created attachment 951025 [details]
NetworkManager-0.9.10.0-10 log
(In reply to Mikhail from comment #24) > Thanks, with this update I don't see issue. Awesome > But NetworkManager log are still very huge. Oh, right, because you still have debugging logging enabled. You'll want to remove the [logging] section from /etc/NetworkManager/NetworkManager.conf (and restart NM) (In reply to Mikhail from comment #27) > Hi, today come NetworkManager-0.9.10.0-10.git20140704.fc21.x86_64 and this > issue returned :( Yes, the package I gave you was just a test build, and after I made it, someone else put out a new release build of NM, which has a higher version, but doesn't contain the test bug fix. A new, fixed package should be coming out soon. Until then, you can just downgrade to the build in koji. (In reply to Dan Winship from comment #29) > Oh, right, because you still have debugging logging enabled. You'll want to > remove the [logging] section from /etc/NetworkManager/NetworkManager.conf > (and restart NM) I just wanted to know, it is normal that NetworkManager always something to do. NetworkManager-0.9.10.0-12.git20140704.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/NetworkManager-0.9.10.0-12.git20140704.fc21 Package NetworkManager-0.9.10.0-12.git20140704.fc21: * should fix your issue, * was pushed to the Fedora 21 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.10.0-12.git20140704.fc21' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2014-13942/NetworkManager-0.9.10.0-12.git20140704.fc21 then log in and leave karma (feedback). And it occurs again with new NetworkManager $ rpm -qa | grep NetworkManager NetworkManager-bluetooth-0.9.10.0-12.git20140704.fc21.x86_64 NetworkManager-glib-0.9.10.0-12.git20140704.fc21.x86_64 NetworkManager-pptp-gnome-0.9.8.2-6.fc21.x86_64 NetworkManager-vpnc-gnome-0.9.9.0-6.git20140428.fc21.x86_64 NetworkManager-pptp-0.9.8.2-6.fc21.x86_64 NetworkManager-wifi-0.9.10.0-12.git20140704.fc21.x86_64 NetworkManager-config-connectivity-fedora-0.9.10.0-12.git20140704.fc21.x86_64 NetworkManager-0.9.10.0-12.git20140704.fc21.x86_64 NetworkManager-openvpn-0.9.9.0-3.git20140128.fc21.x86_64 NetworkManager-openvpn-gnome-0.9.9.0-3.git20140128.fc21.x86_64 NetworkManager-debuginfo-0.9.10.0-12.git20140704.fc21.x86_64 NetworkManager-adsl-0.9.10.0-12.git20140704.fc21.x86_64 NetworkManager-wwan-0.9.10.0-12.git20140704.fc21.x86_64 NetworkManager-openconnect-0.9.8.4-4.fc21.x86_64 NetworkManager-vpnc-0.9.9.0-6.git20140428.fc21.x86_64 Created attachment 952386 [details]
Network Manager log (0.9.10.0-12)
Created attachment 952387 [details]
strace of evolution-addressbook-factory
Created attachment 952388 [details]
screenshot of htop
Created attachment 952921 [details]
screenshot of htop
Created attachment 952922 [details]
Network Manager log
Created attachment 952923 [details]
strace of evolution-source-registry
Created attachment 952924 [details]
strace of evolution-addressbook-factory
Same here. NetworkManager update to NetworkManager-0.9.10.0-12.git20140704.fc21 does not fix the 100% CPU issue in Evolution. Upstream bug report for the task_thread_cancelled(): https://bugzilla.gnome.org/show_bug.cgi?id=738545 I've put a new test build up at http://koji.fedoraproject.org/koji/taskinfo?taskID=8048589. It doesn't actually have any new bug fixes, but adds a little bit more debugging in the hopes that I can figure out where this loop is coming from... (Christian, if you want to try it, see comment 8 about enabling NM debug logging first.) Created attachment 954180 [details]
Network Manager log
new build at http://koji.fedoraproject.org/koji/taskinfo?taskID=8051195. This includes both more debugging, and a potential fix (In reply to Dan Winship from comment #45) > new build at http://koji.fedoraproject.org/koji/taskinfo?taskID=8051195. > This includes both more debugging, and a potential fix I am not sure but seems already all ok with this update :) Created attachment 954301 [details]
Network Manager log
And NetworkManager log become much smaller. NetworkManager-0.9.10.0-12.git20140704.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report. Mikhail, do you know if you have more than one IPv6 router on your LAN? There should be no more than one physical router. [mikhail@localhost ~]$ ping6 -I enp2s0 FF02::2 PING FF02::2(ff02::2) from fe80::96de:80ff:fe6b:dd24 enp2s0: 56 data bytes 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=1 ttl=64 time=4.73 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=2 ttl=64 time=0.466 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=3 ttl=64 time=0.389 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=4 ttl=64 time=0.456 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=5 ttl=64 time=0.372 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=6 ttl=64 time=0.387 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=7 ttl=64 time=0.456 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=8 ttl=64 time=0.417 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=9 ttl=64 time=0.437 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=10 ttl=64 time=0.402 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=11 ttl=64 time=0.461 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=12 ttl=64 time=0.408 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=13 ttl=64 time=0.493 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=14 ttl=64 time=0.368 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=15 ttl=64 time=0.458 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=16 ttl=64 time=0.440 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=17 ttl=64 time=0.419 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=18 ttl=64 time=0.452 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=19 ttl=64 time=0.502 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=20 ttl=64 time=0.458 ms ^C --- FF02::2 ping statistics --- 20 packets transmitted, 20 received, 0% packet loss, time 19001ms rtt min/avg/max/mdev = 0.368/0.648/4.738/0.939 ms [mikhail@localhost ~]$ ping6 -I enp2s0 FF02::1 PING FF02::1(ff02::1) from fe80::96de:80ff:fe6b:dd24 enp2s0: 56 data bytes 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=1 ttl=64 time=7.42 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=2 ttl=64 time=0.465 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=3 ttl=64 time=0.499 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=4 ttl=64 time=0.535 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=5 ttl=64 time=0.476 ms 64 bytes from fe80::faa9:d0ff:fe71:91ae: icmp_seq=5 ttl=64 time=48.0 ms (DUP!) 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=6 ttl=64 time=0.393 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=7 ttl=64 time=0.344 ms 64 bytes from fe80::faa9:d0ff:fe71:91ae: icmp_seq=7 ttl=64 time=92.0 ms (DUP!) 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=8 ttl=64 time=0.479 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=9 ttl=64 time=0.376 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=10 ttl=64 time=0.412 ms 64 bytes from fe80::faa9:d0ff:fe71:91ae: icmp_seq=10 ttl=64 time=62.0 ms (DUP!) 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=11 ttl=64 time=0.471 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=12 ttl=64 time=0.385 ms 64 bytes from fe80::faa9:d0ff:fe71:91ae: icmp_seq=12 ttl=64 time=107 ms (DUP!) 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=13 ttl=64 time=0.493 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=14 ttl=64 time=0.423 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=15 ttl=64 time=0.383 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=16 ttl=64 time=0.355 ms 64 bytes from fe80::faa9:d0ff:fe71:91ae: icmp_seq=16 ttl=64 time=97.1 ms (DUP!) 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=17 ttl=64 time=0.365 ms 64 bytes from fe80::16da:e9ff:fef8:a8f7: icmp_seq=18 ttl=64 time=0.311 ms ^C --- FF02::1 ping statistics --- 18 packets transmitted, 18 received, +5 duplicates, 0% packet loss, time 17006ms rtt min/avg/max/mdev = 0.311/18.305/107.269/34.833 ms I mean, more than one machine sending out IPv6 Router Advertisement packets. I have isolated a case where, if more than one router sends RAs, with one router including DNS servers/domains, and the other not, and the first router either stops responding or stops including servers/domains, NM could get into a tight loop sending a router solicitation, which could trigger the behavior you see. Do you know which machine is sending the router advertisements, and do you control it? (In reply to Dan Williams from comment #52) > I mean, more than one machine sending out IPv6 Router Advertisement packets. In my LAN several computers with Linux Fedora, and phone with Android, but when I created this strace only one computer was worked. Can phone sending out IPv6 Router Advertisement packets I don't know. > Do you know which machine is sending the router advertisements, and do you > control it? I can show my router settings. As I described above it can be phone, but I am not sure. Created attachment 1013767 [details]
router ipv6 settings
This message is a reminder that Fedora 21 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '21'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 21 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed. |