Description of problem: Started ekiga normally, with 2 accounts enabled: nextiva and ekiga.net. Worked normally. Left ekiga running overnight, with both accounts registered. In the morning, ekiga was using near 100% CPU. I tried disabling both accounts, but this did not stop the run loop. Then I used kill -6 in the hopes it could shed some light on where it was running away. Version-Release number of selected component: ekiga-4.0.0-1.fc17 Additional info: libreport version: 2.0.18 abrt_version: 2.0.18 backtrace_rating: 4 cmdline: ekiga kernel: 3.6.9-2.fc17.i686
Created attachment 664415 [details] File: environ
Created attachment 664416 [details] File: backtrace
Created attachment 664417 [details] File: limits
Created attachment 664418 [details] File: gconf_subtree
Created attachment 664419 [details] File: cgroup
Created attachment 664420 [details] File: smolt_data
Created attachment 664421 [details] File: executable
Created attachment 664422 [details] File: maps
Created attachment 664423 [details] File: dso_list
Created attachment 664424 [details] File: proc_pid_status
Created attachment 664425 [details] File: build_ids
Created attachment 664426 [details] File: open_fds
Created attachment 664427 [details] File: var_log_messages
Ok, after running for 6 hours, it is now using about 25% CPU. I had noticed, at work, that shutting down ekiga at the end of the day would take a long time, with lots of CPU. I was going to report that, but when shutting down shortly after startup, there is no such CPU use or long delay on shutdown. I guess the next question is whether this is related to bug#885769
It is related (the problem appears when sengind OPTIONS packets), but I am not sure that it is the same bug. The best is to start ekiga like this: ekiga -d 5 2>log (note that we use -d 5) and when the CPU is 100% look with wireshark if many OPTIONS packets are sent each second, and finally kill -6 like you did. Give us the log and the backtrace and tell if numerous OPTIONS packets are sent each second. Thank you!
I waited until it was up to 25% again, and checked with wireshark. There are still exactly 2 (two) OPTIONS packets every 10 seconds. Maybe, it puts the requests on a queue, and doesn't remove them with the Forbidden response. As the queue gets longer and longer, it takes longer to scan for each response. With that theory, the CPU use should come in pulses spaced 10 secs apart. It is going to be one humongous log file.
You could be right.
After letting CPU use grow to 15%, the log file is 46M. Should I upload that? Compress it first? Interestingly, the CPU use does not grow on my work computer (at least it didn't grow significantly during an 8 hour workday). My work computer is a single CPU. My home computer is dual CPU. Could that be related?
Yes please, compress it. xz is probably best
Created attachment 671876 [details] ekiga -d 5 output while CPU use builds Several calls were made, as I left the log running while using the program. CPU use (while idle) slowly built up to around 15% when I quit the program. If left running, it builds to 100% of a single CPU in about 24 hours.
Ok, it *does* have the loop on my single processor work machine. It just takes longer to build up. After a day and a half, it was 70%. My cynical guess would be that CPU use grows twice as fast on a dual processor, and maybe 4 times as fast on a quad!
After looking at the log, the CPU time increase is not due to OPTIONS packets. I do not know the cause. This can be tackled in two manners: - either start in gdb, and when CPU grows up press several times ctrl-c and thread apply all bt, to see where the CPU is - or use a real profiling I have not yet tries these solutions, I need a bit of time to make a procedure. I will recontact you.
Stuart, the simplest method (for me) to understand what happens is to start ekiga inside gdb (gdb --args ekiga -d 4) and once CPU grows to 100% press in gdb ctrl-c and thread apply all bt, and do this 5 times at 5-10 secs interval, and give us the whole log. This will give us what the program is executing at those times.
Cand you do what I proposed in previous comment, Stuart? Maybe we can fix it for 4.0.1...
Yes, I just need to remember to do that when starting up in the morning ... :-P
Is there a variable I could patch from gdb to turn on -d 4 after already starting ekiga? I started ekiga with gdb last night, but forgot the -d 4. It is at 80% CPU now. I've gotten 3 backtraces (which are identical).
There is not. Give me the output of thread apply all bt in all the cases, perhaps it is sufficient.
Created attachment 686877 [details] 5 thread dumps while looping at 100% I forgot to add the -d 4, but hopefully these will be helpful. It does show the time is spent in PAbstractList::GetObjectsIndex. Some list is getting really long! Here are the tops of threads that were not at vsyscall: #0 0x450486c3 in PAbstractList::GetObjectsIndex(PObject const*) const () from /lib/libpt.so.2.10.9 #1 0x4504730f in PSafePtrBase::Next() () from /lib/libpt.so.2.10.9 #2 0x491ec6c0 in operator++ (this=<optimized out>) at /usr/include/ptlib/safecoll.h:896 #3 SIPEndPoint::GarbageCollection (this=0x81edf70) at /home/stuart/rpm/BUILD/opal-3.10.9/src/sip/sipep.cxx:413 #4 0x48dafe4e in OpalManager::GarbageCollection (this=this@entry=0x81e2750) at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1783 #5 0x48daff08 in OpalManager::GarbageMain (this=0x81e2750) at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1806 #0 0x450486c3 in PAbstractList::GetObjectsIndex(PObject const*) const () from /lib/libpt.so.2.10.9 #1 0x4504730f in PSafePtrBase::Next() () from /lib/libpt.so.2.10.9 #2 0x491ed888 in operator++ (this=<optimized out>) at /usr/include/ptlib/safecoll.h:896 #3 SIPEndPoint::NATBindingRefresh (this=<optimized out>) at /home/stuart/rpm/BUILD/opal-3.10.9/src/sip/sipep.cxx:215 #4 0x491f8dc5 in SIPEndPoint::NATBindingRefresh_PNotifier::Call ( this=0x81e2fe0, note=..., extra=1) at /home/stuart/rpm/BUILD/opal-3.10.9/include/sip/sipep.h:988 #0 0x450486c0 in PAbstractList::GetObjectsIndex(PObject const*) const () from /lib/libpt.so.2.10.9 #1 0x4504730f in PSafePtrBase::Next() () from /lib/libpt.so.2.10.9 #2 0x491ec6c0 in operator++ (this=<optimized out>) at /usr/include/ptlib/safecoll.h:896 #3 SIPEndPoint::GarbageCollection (this=0x81edf70) at /home/stuart/rpm/BUILD/opal-3.10.9/src/sip/sipep.cxx:413 #4 0x48dafe4e in OpalManager::GarbageCollection (this=this@entry=0x81e2750) at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1783 #5 0x48daff08 in OpalManager::GarbageMain (this=0x81e2750) at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1806 #0 0x450486c3 in PAbstractList::GetObjectsIndex(PObject const*) const () from /lib/libpt.so.2.10.9 #1 0x4504730f in PSafePtrBase::Next() () from /lib/libpt.so.2.10.9 #2 0x491ec6c0 in operator++ (this=<optimized out>) at /usr/include/ptlib/safecoll.h:896 #3 SIPEndPoint::GarbageCollection (this=0x81edf70) at /home/stuart/rpm/BUILD/opal-3.10.9/src/sip/sipep.cxx:413 #4 0x48dafe4e in OpalManager::GarbageCollection (this=this@entry=0x81e2750) at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1783 #5 0x48daff08 in OpalManager::GarbageMain (this=0x81e2750) at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1806
This is fixed with http://opalvoip.svn.sourceforge.net/viewvc/opalvoip?view=revision&revision=29036.
Well, it no longer goes into the run loop, but now it doesn't send OPTIONS packets at all, and NAT times out after about 10 minutes on my networks, cutting off my phone calls.
I verified that backing out the patch in comment#29 fixes the NAT timeout problem (resulting in phone calls getting cut off) - but brings back the run loop problem. So comment#29 is not the correct fix.
It is fixed by a later commit. Check 4.0.1 once it appears in fedora and close this bug afterwards. Thank you for bug reporting so far.
ptlib-2.10.10-1.fc17,opal-3.10.10-1.fc17,ekiga-4.0.1-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/ptlib-2.10.10-1.fc17,opal-3.10.10-1.fc17,ekiga-4.0.1-1.fc17
ptlib-2.10.10-1.fc18,opal-3.10.10-1.fc18,ekiga-4.0.1-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/ptlib-2.10.10-1.fc18,opal-3.10.10-1.fc18,ekiga-4.0.1-1.fc18
Package ptlib-2.10.10-1.fc17, opal-3.10.10-1.fc17, ekiga-4.0.1-1.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing ptlib-2.10.10-1.fc17 opal-3.10.10-1.fc17 ekiga-4.0.1-1.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-2890/ptlib-2.10.10-1.fc17,opal-3.10.10-1.fc17,ekiga-4.0.1-1.fc17 then log in and leave karma (feedback).
ptlib-2.10.10-1.fc18, opal-3.10.10-1.fc18, ekiga-4.0.1-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
ptlib-2.10.10-1.fc17, opal-3.10.10-1.fc17, ekiga-4.0.1-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.