Description of problem:
Trying to start evolution

Version-Release number of selected component:
evolution-data-server-3.10.4-1.fc20

Additional info:
reporter:         libreport-2.1.12
backtrace_rating: 4
cmdline:          /usr/libexec/evolution-calendar-factory
crash_function:   g_thread_new
executable:       /usr/libexec/evolution-calendar-factory
kernel:           3.13.3-201.fc20.x86_64
runlevel:         N 5
type:             CCpp
uid:              1001

Truncated backtrace:
Thread no. 1 (9 frames)
 #2 g_thread_new at gthread.c:840
 #3 initialize_backend at /usr/lib64/evolution-data-server/calendar-backends/libecalbackendcaldav.so
 #4 caldav_do_open at /usr/lib64/evolution-data-server/calendar-backends/libecalbackendcaldav.so
 #5 cal_backend_open at /lib64/libedata-cal-1.2.so.23
 #6 cal_backend_open_thread at /lib64/libedata-cal-1.2.so.23
 #7 cal_backend_dispatch_thread at /lib64/libedata-cal-1.2.so.23
 #8 io_job_thread at gioscheduler.c:89
 #9 g_task_thread_pool_thread at gtask.c:1245
 #11 g_thread_proxy at gthread.c:798
Created attachment 866216 [details] File: backtrace
Created attachment 866217 [details] File: cgroup
Created attachment 866218 [details] File: core_backtrace
Created attachment 866219 [details] File: dso_list
Created attachment 866220 [details] File: environ
Created attachment 866221 [details] File: limits
Created attachment 866222 [details] File: maps
Created attachment 866223 [details] File: open_fds
Created attachment 866224 [details] File: proc_pid_status
Created attachment 866225 [details] File: var_log_messages
Thanks for the bug report. The reason for the crash is:
> creating thread '': Error creating thread: Resource temporarily unavailable

In the case of the pthread library, that means:

   EAGAIN Insufficient resources to create another thread, or a
          system-imposed limit on the number of threads was encountered.
          The latter case may occur in two ways: the RLIMIT_NPROC soft
          resource limit (set via setrlimit(2)), which limits the number
          of processes for a real user ID, was reached; or the kernel's
          system-wide limit on the number of threads,
          /proc/sys/kernel/threads-max, was reached.

I do not see too many threads in the backtrace, only 18, which is not that many. The /proc/pid/status (comment #9) shows that you are close to the memory allocation limit (the VmPeak and VmSize values), which is probably the reason for this. That may also mean that the evolution-calendar-factory process is leaking memory somewhere, or that your calendars are too large to fit into memory. I see multiple CalDAV calendars being opened during the crash (though the backtrace doesn't contain debug information for evolution-data-server, because you've updated only the binary package, not the debuginfo package).

Could you provide your calendar settings, please? Mainly how many calendars you've configured of certain types (On This Computer/CalDAV/On The Web/...) and how many of them are enabled for Reminders (Edit->Preferences->Calendar and Tasks->Reminders tab), because I suspect this is also related to the evolution-alarm-notify process, which opens all calendars configured for Reminders, even if you run evolution only in the Mailer part.
*** Bug 1068748 has been marked as a duplicate of this bug. ***
*** Bug 1068755 has been marked as a duplicate of this bug. ***
*** Bug 1068768 has been marked as a duplicate of this bug. ***
*** Bug 1068957 has been marked as a duplicate of this bug. ***
(In reply to Milan Crha from comment #11)
> The /proc/pid/status (comment #9) shows that you are close to memory
> allocation (the VmPeak and VmSize values), which can be probably the
> reason for this.

I'm not sure I'm following what you mean there. All those numbers are telling you is that the VM of the process is close to 2GB. What's the problem there? This machine has 14 GB of RAM and at least that much swap.

> That may also mean that the evolution-calendar-factory process is
> leaking memory somewhere, or your calendars are too large to fit into
> memory.

Again, 14 GB of memory on this machine. I think this is doubtful, since I have all of the same calendars open all of the time.

> I see multiple CalDAV calendars being opened during the crash
> (though the backtrace doesn't contain debug information for
> evolution-data-server, because you've updated only the binary package,
> not the debuginfo package).

That would be abrt's fault then, I guess.

> Could you provide your calendar settings, please? Mainly how many
> calendars you've configured of certain types (On This Computer/CalDAV/On
> The Web/...)

1 "On This Computer" calendar, enabled
1 "Contacts" calendar, enabled
5 Google calendars, 4 enabled
2 CalDAV calendars, enabled
1 EWS calendar, enabled

But this has been my configuration for many months, if not years.

> and how many of them are enabled for Reminders (Edit->Preferences->
> Calendar and Tasks->Reminders tab),

All but 2.
Got two more of these this morning. Abrt doesn't allow you to provide more data to existing bugs though. I will keep them around in case you want me to manually pick anything out of them.
Thanks for the update. I see, I didn't check the set limits, of which you have basically none, and I assumed that you'd gotten near the memory limit without making sure. I'm sorry, my fault.

You mentioned gnome-online-accounts in bug #1068768 comment #13; are any of your calendars configured through it? I'd expect that at least one Google calendar is, and probably also EWS?

By the way, if this started happening only recently, could the actual crashing be caused by a certain update? It would be good to check what changed around the time you started receiving these crashes, in the /var/log/yum.log* files, and whether downgrading something from there would fix this. I would focus on evolution packages and gnome-online-accounts related packages at the beginning.
Yeah, it would seem that one of my Google calendars is configured through it, but TBH I don't need that calendar, so I've disabled it this morning to see if the calendar factory stops crashing. It's crashed half a dozen times in just the last 20 minutes of trying to read e-mail. In fact, I deleted it completely.

I don't think my EWS calendar is using gnome-online-accounts though. It uses Active Directory (so Kerberos and NTLM).

There's nothing suspicious-looking in the most recent round of yum updates.
Ahhh. So, I went back to a previous round of yum updates done on Feb. 20, and evolution was upgraded from evolution-3.10.3-1.fc20.x86_64 to evolution-3.10.4-1.fc20.x86_64.

I suppose I can try downgrading that and see if the problem goes away.
(In reply to Brian J. Murrell from comment #20)
> Ahhh. So, I went back to a previous round of yum updates done on Feb. 20
> and evolution was upgraded from evolution-3.10.3-1.fc20.x86_64 to
> evolution-3.10.4-1.fc20.x86_64.
>
> I suppose I can try downgrading that and see if the problem goes away.

It might be the whole group of evolution packages, i.e. evolution-data-server, evolution and evolution-ews in your case. Check the versions with `rpm -qa | grep evolution`; all of them should show the same main version, i.e. 3.10.4, except perhaps the debuginfo packages, which are not tied to the binary packages that strictly. When downgrading, remember to downgrade all of the other evolution packages as well. After that, just restart the machine, to make sure the background processes are reloaded as expected.
*** Bug 1071300 has been marked as a duplicate of this bug. ***
*** Bug 1071561 has been marked as a duplicate of this bug. ***
I'm moving this to glibc, because the crash happens to multiple users on various occasions (the last one even at the very beginning of the process), but always due to pthread_create() failing with EAGAIN, which propagates up the call chain as:

   creating thread '': Error creating thread: Resource temporarily unavailable

I cannot see how creating a new thread for a newly started process can fail for lack of resources, thus I suspect that some update of glibc, or of another related library, causes these crashes. I'd like to ask the glibc maintainer for an investigation, at least. Thanks in advance.

P.S.: From `man pthread_create`:

   EAGAIN Insufficient resources to create another thread, or a
          system-imposed limit on the number of threads was encountered.
          The latter case may occur in two ways: the RLIMIT_NPROC soft
          resource limit (set via setrlimit(2)), which limits the number
          of processes for a real user ID, was reached; or the kernel's
          system-wide limit on the number of threads,
          /proc/sys/kernel/threads-max, was reached.
I think you can get EAGAIN from pthread_create if you run out of memory as well. Regardless, this doesn't sound like a bug in glibc to me. It sounds like exactly what the error reports: there's some system resource that cannot be acquired, and thus pthread_create is reporting an error.
(In reply to Jeff Law from comment #25)
> Regardless, this doesn't sound like a bug in glibc to me. It sounds like
> exactly what the bug reports, there's some system resource that can not
> be acquired and thus pthread_create is reporting an error.

Yes. The point is that this began to happen only recently, in Fedora 20, after some update of who-knows-which package. I do not believe this is an application issue, but an issue in some other part of the system.

I currently see 37 bug reports mentioning g_thread_new(), most of them crashes in various applications, which leads me to believe that this is not an application issue but a library issue.

Maybe bug #1011179 is the place to gather all these "duplicates"?
(In reply to Milan Crha from comment #26)
> (In reply to Jeff Law from comment #25)
> > Regardless, this doesn't sound like a bug in glibc to me. It sounds like
> > exactly what the bug reports, there's some system resource that can not
> > be acquired and thus pthread_create is reporting an error.
>
> Yes. The point is that this begun to happen only recently, in Fedora 20,
> after some update of who-knows-which package. I do not believe this is the
> application issue, but some other part of the system.
>
> I currently see 37 bug reports mentioning g_thread_new(), most of them
> crashes in various applications, from which I would believe that this is
> not an application issue, but a library issue.
>
> Maybe bug #1011179 is the place where to place all these "duplicates"?

It is. You *must* handle EAGAIN, either by waiting or by handling the error gracefully. You've run out of system resources.

It *is* possible to get EAGAIN with only 1 thread active. If you're fast enough and the kernel is slow enough at reaping the threads, you can run out of your ability to make new threads. In fact we see this in one of the glibc test suites, and the only solution we had was to delay slightly to let the kernel catch up and reap the threads.

*** This bug has been marked as a duplicate of bug 1011179 ***
I think the main point here is that, while I cannot speak for everyone experiencing this new situation on F20, I can certainly say for myself that nothing has changed here except (Fedora) software. Same hardware, same workflows and patterns, same everything as it has been for a number of Fedora releases (3 at least), and even for F20 for a while before this problem started to happen.

And it happens across many applications. I will even get fork() failures from bash when trying to analyse the situation while this is happening.

So if you say it's a resource problem, let's prove it, please. This system has 14GB of memory, and resource meters show it nowhere near (not even half) full when this problem starts happening. The process table as witnessed by "ps" looks reasonable, at the typical few hundred processes.

$ cat /proc/sys/kernel/threads-max
223203

Seems like a limit to be reaching regularly.

I'm not sure how to measure RLIMIT_NPROC.

So what measurements can I do when this problem is happening to determine which resource is being exhausted?
(In reply to Brian J. Murrell from comment #28)
> I think the main point here is that, and while I cannot speak for everyone
> experiencing this new situation on F20, I can certainly say for myself that
> nothing has changed here except (Fedora) software. Same hardware, same work
> flows and patterns same everything as they have been for a number of Fedora
> release (3 at least) and even for F20 for a while before this problem
> started to happen.
>
> And it happens across many applications. I will even get fork() failures
> from bash trying to analyse the situation when this happens.

That seems like you're hitting the 1024 process security limit, e.g.:

cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.

*          soft    nproc     1024
root       soft    nproc     unlimited

This is only applied in certain cases starting under PAM, but be aware that if you invoke a shell to run that process, it might inherit these limits. I use threads and processes interchangeably, since the kernel technically has just processes, with some of them "named" threads.

> So if you say it's a resource problem, let's prove it please. This system
> has 14GB of memory and resource meters show it nowhere near (not even half)
> full when this problem starts happening. The process table as witnessed by
> "ps" looks reasonable at the typical few hundred processes.

Despite having 14GB of physical memory, the VMA space of the process might be full of tiny allocations that prevent any new contiguous thread stacks from being allocated in the places where the kernel has room. I doubt that's the problem, but it's possible.

> $ cat /proc/sys/kernel/threads-max
> 223203

You can probably reach that, but only if you have < 50% of physical memory free. That's the maximum computed by the kernel in order to limit the internal thread structures to 50% of total physical memory. A given process is limited by the kernel itself to threads-max/2, so you have at most 111601. Even then, each thread you create is limited by needing to allocate the default thread stack, e.g. 2MB on x86-64.

> Seems like a limit to be reaching regularly.
>
> I'm not sure how to measure RLIMIT_NPROC.

Use getrlimit.

> So what measurements can I do when this problem is happening to determine
> which resource is being exhausted?

In this case `strace' is your best tool: `strace -ff -ttt -o test.log`, then...

Look for an mmap that returns ENOMEM. That would indicate you're running out of VMA space to place new thread stacks; this is translated into EAGAIN by the pthreads implementation.

Look for any syscalls returning EAGAIN. For example, clone syscalls that return EAGAIN mean you've hit a max process limit.

I admit it's not easy to determine what's gone wrong, and if it's a resource leak, someone will have to track it down to the component leaking the resources. I hope that helps.
(In reply to Carlos O'Donell from comment #29)
> That seems like you're hitting the 1024 process security limit.
>
> e.g.
> cat /etc/security/limits.d/90-nproc.conf
> # Default limit for number of user's processes to prevent
> # accidental fork bombs.
> # See rhbz #432903 for reasoning.
>
> * soft nproc 1024
> root soft nproc unlimited

So surely I must be able to query how many of these are currently in use by a given user, yes? If so, how? Is there a /proc file for querying the current state of this setting, or some other way?

> Despite having 14GB of physical memory the VMA for the process might be full
> of tiny allocations that prevent any new contiguous thread stacks from being
> allocated in the places the kernel has room for allocation. I doubt that's
> the problem, but it's possible.

Right. So let's look for the more likely causes before we start looking for the unlikely, right?

> > $ cat /proc/sys/kernel/threads-max
> > 223203
>
> You can probably reach that, but only if you have < 50% of physical memory
> free.

And how do I query the current number of these being used?

> > Seems like a limit to be reaching regularly.
> >
> > I'm not sure how to measure RLIMIT_NPROC.
>
> Use getrlimit.

Which is a syscall. Is there no already-available user-space query for this?

> In this case `strace' is your best tool.

I was thinking more along the lines of taking frequent and regular samples of all of the resources that you think might be getting exhausted, to demonstrate which ones are or are not being exhausted. But let's play with strace...

> Look for an mmap that returns ENOMEM.

Nope, that's not it:

# grep -i enomem test.log.*

> That would indicate you're running out
> of VMA space to place new thread stacks.

So this is not the case.

> Look for any syscalls returning EAGAIN.

1393936008.279564 clone(child_stack=0x7f73da7fbeb0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|
CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7f73da7fc9d0, tls=0x7f73da7fc700,
child_tidptr=0x7f73da7fc9d0) = -1 EAGAIN (Resource temporarily unavailable)

> For example clone syscalls that
> return EAGAIN mean you've hit a max process limit.

This is the nproc limit above, which defaults to 1024, right?

But if I'm sceptical that this is actually the case, how can I query where this limit is at any given time, to see if these failures correlate with an elevated value of this limit?
*** Bug 1072085 has been marked as a duplicate of this bug. ***
I've increased nproc on all of the processes owned by my userid with:

$ ps -ubrian fw | sed -e 1d -e 's/^ *//' -e 's/ .*//' | while read pid; do prlimit --nproc=10240: -p $pid; done

Let's see if that alleviates the issue.
(In reply to Brian J. Murrell from comment #30)
> (In reply to Carlos O'Donell from comment #29)
> >
> > That seems like you're hitting the 1024 process security limit.
> >
> > e.g.
> > cat /etc/security/limits.d/90-nproc.conf
> > # Default limit for number of user's processes to prevent
> > # accidental fork bombs.
> > # See rhbz #432903 for reasoning.
> >
> > * soft nproc 1024
> > root soft nproc unlimited
>
> So surely I must be able to query how many of these are currently in use by
> a given user, yes? If so, how? Is there a /proc file for querying the
> current state of this setting or some other way?

You need to walk /proc looking for everything that you started, or use `ps' and count. There is no simple /proc file to give you a straight answer, AFAIK.

e.g. the total process/thread count for me on my box:

[carlos@koi ~]$ ps -eLf | grep carlos | wc -l
255

That's an unloaded system.

> Right. So let's look for the more likely causes before we start looking for
> the unlikely, right?

Sure. This scenario is certainly less likely than others.

> And how do I query the current number of these being used?

cat /proc/$PID/status | grep Threads

or

ps -eLf | grep $PID

> Which is a syscall. Is there no already-available user-space query for this?

See above.

> I was thinking more along the lines of taking frequent and regular samples
> of all of the resources that you think might be getting exhausted to
> demonstrate which ones are or are not being exhausted. But let's play with
> strace...
>
> > Look for an mmap that returns ENOMEM.
>
> Nope, that's not it:
>
> # grep -i enomem test.log.*

Good to know.

> > That would indicate you're running out
> > of VMA space to place new thread stacks.
>
> So this is not the case.
>
> > Look for any syscalls returning EAGAIN.
>
> 1393936008.279564 clone(child_stack=0x7f73da7fbeb0,
> flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|
> CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
> parent_tidptr=0x7f73da7fc9d0, tls=0x7f73da7fc700,
> child_tidptr=0x7f73da7fc9d0) = -1 EAGAIN (Resource temporarily unavailable)

This is glibc's libpthread.so trying to execute pthread_create and failing with EAGAIN.

> > For example clone syscalls that
> > return EAGAIN mean you've hit a max process limit.
>
> This is the nproc limit above, which defaults to 1024, right?

Very likely. The kernel, after being configured by userspace, is limiting your available threads.

> But if I'm sceptical that this is actually the case, how can I query where
> this limit is at any given time to see if these failures correlate to an
> elevated value of this limit?

See above. Hopefully that helps.
(In reply to Carlos O'Donell from comment #33)
> You need to process /proc looking for everything that you started or use
> `ps' and count. There is no simple /proc file to give you a straight
> answer AFAIK.
>
> e.g.
>
> Total process/thread count for me on my box:
> [carlos@koi ~]$ ps -eLf | grep carlos | wc -l
> 255

Aha! This was very useful:

$ ps -ubrian -L | wc -l
1125
$ ps -ubrian -Lf | grep chrome | wc -l
797

It seems that 1024 is not really a reasonable value when using chrome.

> Hopefully that helps.

Indeed. Worth noting that I have seen no crashes since increasing nproc (admittedly excessively, but really only as a debugging measure) to 10240. Probably setting it more permanently to 2K or 4K is reasonable. But I think the general concept of a 1K limit needs to be revisited. Perhaps it's just not reasonable any more.
More likely chrome is doing something that really is not reasonable. Almost 800 threads for something like a web browser just looks like very poor programming style.
*** Bug 1073553 has been marked as a duplicate of this bug. ***
*** Bug 1076179 has been marked as a duplicate of this bug. ***