Bug 887602 - ekiga-4.0.0-1.fc17: run loop after 1 day uptime
Summary: ekiga-4.0.0-1.fc17: run loop after 1 day uptime
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: ekiga
Version: 17
Hardware: i686
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Peter Robinson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:fd91e5983b18f812afa8872cf89...
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-12-16 18:35 UTC by Stuart D Gathman
Modified: 2013-03-03 22:39 UTC (History)
2 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-03-03 22:33:10 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
File: environ (1.54 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: backtrace (48.95 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: limits (1.29 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: gconf_subtree (26.06 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: cgroup (129 bytes, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: smolt_data (2.87 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: executable (14 bytes, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: maps (47.52 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: dso_list (11.68 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: proc_pid_status (783 bytes, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: build_ids (6.01 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: open_fds (1.52 KB, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
File: var_log_messages (228 bytes, text/plain)
2012-12-16 18:36 UTC, Stuart D Gathman
no flags Details
ekiga -d 5 output while CPU use builds (2.23 MB, application/x-xz)
2013-01-03 05:38 UTC, Stuart D Gathman
no flags Details
5 thread dumps while looping at 100% (69.05 KB, text/plain)
2013-01-24 17:57 UTC, Stuart D Gathman
no flags Details

Description Stuart D Gathman 2012-12-16 18:35:55 UTC
Description of problem:
Started ekiga normally, with 2 accounts enabled: nextiva and ekiga.net.  Worked normally.  Left ekiga running overnight, with both accounts registered.  In the morning, ekiga was using near 100% CPU.  I tried disabling both accounts, but this did not stop the run loop.  Then I used kill -6 in the hopes it could shed some light on where it was running away.

Version-Release number of selected component:
ekiga-4.0.0-1.fc17

Additional info:
libreport version: 2.0.18
abrt_version:   2.0.18
backtrace_rating: 4
cmdline:        ekiga
kernel:         3.6.9-2.fc17.i686

Comment 1 Stuart D Gathman 2012-12-16 18:36:00 UTC
Created attachment 664415 [details]
File: environ

Comment 2 Stuart D Gathman 2012-12-16 18:36:02 UTC
Created attachment 664416 [details]
File: backtrace

Comment 3 Stuart D Gathman 2012-12-16 18:36:04 UTC
Created attachment 664417 [details]
File: limits

Comment 4 Stuart D Gathman 2012-12-16 18:36:05 UTC
Created attachment 664418 [details]
File: gconf_subtree

Comment 5 Stuart D Gathman 2012-12-16 18:36:07 UTC
Created attachment 664419 [details]
File: cgroup

Comment 6 Stuart D Gathman 2012-12-16 18:36:08 UTC
Created attachment 664420 [details]
File: smolt_data

Comment 7 Stuart D Gathman 2012-12-16 18:36:10 UTC
Created attachment 664421 [details]
File: executable

Comment 8 Stuart D Gathman 2012-12-16 18:36:11 UTC
Created attachment 664422 [details]
File: maps

Comment 9 Stuart D Gathman 2012-12-16 18:36:13 UTC
Created attachment 664423 [details]
File: dso_list

Comment 10 Stuart D Gathman 2012-12-16 18:36:16 UTC
Created attachment 664424 [details]
File: proc_pid_status

Comment 11 Stuart D Gathman 2012-12-16 18:36:18 UTC
Created attachment 664425 [details]
File: build_ids

Comment 12 Stuart D Gathman 2012-12-16 18:36:19 UTC
Created attachment 664426 [details]
File: open_fds

Comment 13 Stuart D Gathman 2012-12-16 18:36:21 UTC
Created attachment 664427 [details]
File: var_log_messages

Comment 14 Stuart D Gathman 2012-12-17 01:48:56 UTC
Ok, after running for 6 hours, it is now using about 25% CPU.  I had noticed, at work, that shutting down ekiga at the end of the day would take a long time, with lots of CPU.  I was going to report that, but when shutting down shortly after startup, there is no such CPU use or long delay on shutdown.

I guess the next question is whether this is related to bug#885769

Comment 15 Eugen Dedu 2012-12-17 21:33:50 UTC
It is related (the problem appears when sengind OPTIONS packets), but I am not sure that it is the same bug.

The best is to start ekiga like this:
ekiga -d 5 2>log
(note that we use -d 5)
and when the CPU is 100% look with wireshark if many OPTIONS packets are sent each second, and finally kill -6 like you did.  Give us the log and the backtrace and tell if numerous OPTIONS packets are sent each second.

Thank you!

Comment 16 Stuart D Gathman 2012-12-18 01:03:51 UTC
I waited until it was up to 25% again, and checked with wireshark.  There are still exactly 2 (two) OPTIONS packets every 10 seconds.  Maybe, it puts the requests on a queue, and doesn't remove them with the Forbidden response.  As the queue gets longer and longer, it takes longer to scan for each response.  With that theory, the CPU use should come in pulses spaced 10 secs apart.

It is going to be one humongous log file.

Comment 17 Eugen Dedu 2012-12-18 07:12:34 UTC
You could be right.

Comment 18 Stuart D Gathman 2013-01-02 19:35:51 UTC
After letting CPU use grow to 15%, the log file is 46M.  Should I upload that?  Compress it first?

Interestingly, the CPU use does not grow on my work computer (at least it didn't grow significantly during an 8 hour workday).  My work computer is a single CPU.  My home computer is dual CPU.  Could that be related?

Comment 19 Peter Robinson 2013-01-02 21:51:19 UTC
Yes please, compress it. xz is probably best

Comment 20 Stuart D Gathman 2013-01-03 05:38:40 UTC
Created attachment 671876 [details]
ekiga -d 5 output while CPU use builds

Several calls were made, as I left the log running while using the program.  CPU use (while idle) slowly built up to around 15% when I quit the program.  If left running, it builds to 100% of a single CPU in about 24 hours.

Comment 21 Stuart D Gathman 2013-01-04 18:18:22 UTC
Ok, it *does* have the loop on my single processor work machine.  It just takes longer to build up.  After a day and a half, it was 70%.  My cynical guess would be that CPU use grows twice as fast on a dual processor, and maybe 4 times as fast on a quad!

Comment 22 Eugen Dedu 2013-01-09 20:07:25 UTC
After looking at the log, the CPU time increase is not due to OPTIONS packets.  I do not know the cause.  This can be tackled in two manners:
- either start in gdb, and when CPU grows up press several times ctrl-c and thread apply all bt, to see where the CPU is
- or use a real profiling

I have not yet tries these solutions, I need a bit of time to make a procedure.  I will recontact you.

Comment 23 Eugen Dedu 2013-01-11 09:11:18 UTC
Stuart, the simplest method (for me) to understand what happens is to start ekiga inside gdb (gdb --args ekiga -d 4) and once CPU grows to 100% press in gdb ctrl-c and thread apply all bt, and do this 5 times at 5-10 secs interval, and give us the whole log.  This will give us what the program is executing at those times.

Comment 24 Eugen Dedu 2013-01-20 22:33:38 UTC
Cand you do what I proposed in previous comment, Stuart?  Maybe we can fix it for 4.0.1...

Comment 25 Stuart D Gathman 2013-01-22 17:31:02 UTC
Yes, I just need to remember to do that when starting up in the morning ... :-P

Comment 26 Stuart D Gathman 2013-01-24 15:09:01 UTC
Is there a variable I could patch from gdb to turn on -d 4 after already starting ekiga?  I started ekiga with gdb last night, but forgot the -d 4.  It is at 80% CPU now.  I've gotten 3 backtraces (which are identical).

Comment 27 Eugen Dedu 2013-01-24 15:41:41 UTC
There is not.

Give me the output of thread apply all bt in all the cases, perhaps it is sufficient.

Comment 28 Stuart D Gathman 2013-01-24 17:57:55 UTC
Created attachment 686877 [details]
5 thread dumps while looping at 100%

I forgot to add the -d 4, but hopefully these will be helpful.  It does show the time is spent in PAbstractList::GetObjectsIndex.  Some list is getting really long!  Here are the tops of threads that were not at vsyscall:

#0  0x450486c3 in PAbstractList::GetObjectsIndex(PObject const*) const ()
   from /lib/libpt.so.2.10.9
#1  0x4504730f in PSafePtrBase::Next() () from /lib/libpt.so.2.10.9
#2  0x491ec6c0 in operator++ (this=<optimized out>)
    at /usr/include/ptlib/safecoll.h:896
#3  SIPEndPoint::GarbageCollection (this=0x81edf70)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/sip/sipep.cxx:413
#4  0x48dafe4e in OpalManager::GarbageCollection (this=this@entry=0x81e2750)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1783
#5  0x48daff08 in OpalManager::GarbageMain (this=0x81e2750)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1806

#0  0x450486c3 in PAbstractList::GetObjectsIndex(PObject const*) const ()
   from /lib/libpt.so.2.10.9
#1  0x4504730f in PSafePtrBase::Next() () from /lib/libpt.so.2.10.9
#2  0x491ed888 in operator++ (this=<optimized out>)
    at /usr/include/ptlib/safecoll.h:896
#3  SIPEndPoint::NATBindingRefresh (this=<optimized out>)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/sip/sipep.cxx:215
#4  0x491f8dc5 in SIPEndPoint::NATBindingRefresh_PNotifier::Call (
    this=0x81e2fe0, note=..., extra=1)
    at /home/stuart/rpm/BUILD/opal-3.10.9/include/sip/sipep.h:988

#0  0x450486c0 in PAbstractList::GetObjectsIndex(PObject const*) const ()
   from /lib/libpt.so.2.10.9
#1  0x4504730f in PSafePtrBase::Next() () from /lib/libpt.so.2.10.9
#2  0x491ec6c0 in operator++ (this=<optimized out>)
    at /usr/include/ptlib/safecoll.h:896
#3  SIPEndPoint::GarbageCollection (this=0x81edf70)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/sip/sipep.cxx:413
#4  0x48dafe4e in OpalManager::GarbageCollection (this=this@entry=0x81e2750)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1783
#5  0x48daff08 in OpalManager::GarbageMain (this=0x81e2750)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1806

#0  0x450486c3 in PAbstractList::GetObjectsIndex(PObject const*) const ()
   from /lib/libpt.so.2.10.9
#1  0x4504730f in PSafePtrBase::Next() () from /lib/libpt.so.2.10.9
#2  0x491ec6c0 in operator++ (this=<optimized out>)
    at /usr/include/ptlib/safecoll.h:896
#3  SIPEndPoint::GarbageCollection (this=0x81edf70)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/sip/sipep.cxx:413
#4  0x48dafe4e in OpalManager::GarbageCollection (this=this@entry=0x81e2750)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1783
#5  0x48daff08 in OpalManager::GarbageMain (this=0x81e2750)
    at /home/stuart/rpm/BUILD/opal-3.10.9/src/opal/manager.cxx:1806

Comment 30 Stuart D Gathman 2013-02-14 19:12:23 UTC
Well, it no longer goes into the run loop, but now it doesn't send OPTIONS packets at all, and NAT times out after about 10 minutes on my networks, cutting off my phone calls.

Comment 31 Stuart D Gathman 2013-02-20 15:54:51 UTC
I verified that backing out the patch in comment#29 fixes the NAT timeout problem (resulting in phone calls getting cut off) - but brings back the run loop problem.  So comment#29 is not the correct fix.

Comment 32 Eugen Dedu 2013-02-20 22:36:37 UTC
It is fixed by a later commit.  Check 4.0.1 once it appears in fedora and close this bug afterwards.  Thank you for bug reporting so far.

Comment 33 Fedora Update System 2013-02-21 16:13:35 UTC
ptlib-2.10.10-1.fc17,opal-3.10.10-1.fc17,ekiga-4.0.1-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/ptlib-2.10.10-1.fc17,opal-3.10.10-1.fc17,ekiga-4.0.1-1.fc17

Comment 34 Fedora Update System 2013-02-21 16:17:35 UTC
ptlib-2.10.10-1.fc18,opal-3.10.10-1.fc18,ekiga-4.0.1-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/ptlib-2.10.10-1.fc18,opal-3.10.10-1.fc18,ekiga-4.0.1-1.fc18

Comment 35 Fedora Update System 2013-02-24 08:32:20 UTC
Package ptlib-2.10.10-1.fc17, opal-3.10.10-1.fc17, ekiga-4.0.1-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing ptlib-2.10.10-1.fc17 opal-3.10.10-1.fc17 ekiga-4.0.1-1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-2890/ptlib-2.10.10-1.fc17,opal-3.10.10-1.fc17,ekiga-4.0.1-1.fc17
then log in and leave karma (feedback).

Comment 36 Fedora Update System 2013-03-03 22:33:12 UTC
ptlib-2.10.10-1.fc18, opal-3.10.10-1.fc18, ekiga-4.0.1-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 37 Fedora Update System 2013-03-03 22:39:30 UTC
ptlib-2.10.10-1.fc17, opal-3.10.10-1.fc17, ekiga-4.0.1-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.