Bug 384341
Summary: | race on message destruction in rsyslog | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Anton Noskov <lantoni> | ||||
Component: | rsyslog | Assignee: | Peter Vrabec <pvrabec> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 8 | CC: | ben, dkopecek, kas, ohtmvyyn, peterkcc2000, pw, sgrubb, steve, theinric, tmraz, zing | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | rsyslog-1.19.11-3.fc8 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2008-01-07 17:42:47 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 178855 | ||||||
Attachments: |
|
Description
Anton Noskov
2007-11-15 11:00:54 UTC
Could you try this release, please. http://people.redhat.com/pvrabec/rpms/rsyslog/rsyslog-1.19.10-3.fc7.src.rpm Also, it will help a lot if you install debuginfo package. This problem is very hard to track, since we can't reproduce it. Any help is appreciated. If you have some special configuration, your system is under heavy load or anything else send us info please. Output from gdb rsyslog, after install rsyslog-debuginfo #0 0x0000003b8cc30ec5 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x0000003b8cc32970 in abort () at abort.c:88 #2 0x0000003b8cc6b0db in __libc_message (do_abort=2, fmt=0x3b8cd22f08 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170 #3 0x0000003b8cc7318b in _int_malloc (av=0x2aaaac000020, bytes=230) at malloc.c:5891 #4 0x0000003b8cc748fa in __libc_malloc (bytes=230) at malloc.c:3553 #5 0x0000000000411700 in MsgSetMSG (pMsg=0x2aaaac005390, pszMSG=0x2aaaac0041c0 " lALEJHc3030874: ruleset=check_rcpt, arg1=<kladdd.ru>, relay=190-51-190-182.speedy.com.ar [190.51.190.182] (may be forged), reject=550 5.7.1 <kladdd.ru>... Relaying denied. IP name pos"...) at msg.c:1121 #6 0x0000000000407191 in logmsg (pri=<value optimized out>, pMsg=0x2aaaac005390, flags=-1409266960) at syslogd.c:3180 #7 0x0000000000409da3 in printline (hname=0x621900 "univ", msg=0x7fff808bfd84 "Nov 21 17:19:23 sendmail[30874]: lALEJHc3030874: ruleset=check_rcpt, arg1=<kladdd.ru>, relay=190-51-190-182.speedy.com.ar [190.51.190.182] (may be forged), reject=550 5.7.1 <kladdd.ru>"..., bParseHost=0) at syslogd.c:2213 #8 0x0000000000409f8d in printchopped (hname=0x621900 "univ", msg=<value optimized out>, len=93, fd=<value optimized out>, bParseHost=0) at syslogd.c:2139 #9 0x000000000040b5f4 in mainloop () at syslogd.c:5607 #10 0x000000000040c254 in main (argc=<value optimized out>, argv=<value optimized out>) at syslogd.c:6324 could you try this rpm please: http://people.redhat.com/pvrabec/rpms/rsyslog/rsyslog-1.19.10-1.mudflap.src.rpm /etc/init.s/rsyslog (ulimit) and etc/sysconfig/rsyslog (log mudflap to /tmp/rsyslog.mudflap, + "-n" option for output ) are little changed. Binaries are not striped. If you reproduce the segfault, send us: /tmp/rsyslog.mudflap coredump binary thnx. a lot rpm update, this one is much better: http://people.redhat.com/pvrabec/rpms/rsyslog/rsyslog-1.19.10-2.mudflap.src.rpm I wonder if I'm seeing the same bug. I have two high-volume email servers, and rsyslogd just dies with nothing logged. I installed 1.19.10-2.mudflap a week ago. It hasn't crashed yet on either of the two systems. (Stock F8 rsyslog died after less than a day on both systems.) 1.19.11-1.fc8 got picked up by the automatic updates. Now rsyslog is dying after only a few minutes. I'm going to try again with this: http://ftp.kspei.com/pub/steve/rpms/rsyslog-1.19.12-1.mudflap.src.rpm http://ftp.kspei.com/pub/steve/rpms/rsyslog/rsyslog.spec * Wed Dec 05 2007 Steven Pritchard <steve> 1.19.12-1.mudflap - Update to 1.19.12. - Build with debugging, etc. - Fix Source0 URL. We see the same problem, in production we last a few hours but if we turn debug on with "-d" we get about 5 minutes worth of logging out of it before it crashes. Debug info as follows: Successful select, descriptor count = 1, Activity on: 3 -1208858944: Message from UNIX socket: #3 -1208858944: Message length: 79, File descriptor: 3. *** glibc detected *** rsyslogd: corrupted double-linked list: 0x081487c8 *** ======= Backtrace: ========= /lib/libc.so.6[0x1d2537] /lib/libc.so.6(__libc_calloc+0x8a)[0x1d389a] rsyslogd(MsgConstruct+0x1b)[0x80588eb] rsyslogd[0x8050946] rsyslogd(printchopped+0x110)[0x8050b60] rsyslogd[0x80522fa] rsyslogd(main+0xd19)[0x8053ce9] /lib/libc.so.6(__libc_start_main+0xe0)[0x17e390] rsyslogd[0x804c4e1] ======= Memory map: ======== 00110000-0012b000 r-xp 00000000 08:05 82573 /lib/ld-2.7.so 0012b000-0012c000 r-xp 0001a000 08:05 82573 /lib/ld-2.7.so 0012c000-0012d000 rwxp 0001b000 08:05 82573 /lib/ld-2.7.so 0012d000-0012e000 r-xp 0012d000 00:00 0 [vdso] 0012e000-00140000 r-xp 00000000 08:05 83045 /lib/libz.so.1.2.3 00140000-00141000 rwxp 00011000 08:05 83045 /lib/libz.so.1.2.3 00141000-00144000 r-xp 00000000 08:05 82588 /lib/libdl-2.7.so 00144000-00145000 r-xp 00002000 08:05 82588 /lib/libdl-2.7.so 00145000-00146000 rwxp 00003000 08:05 82588 /lib/libdl-2.7.so 00146000-0014d000 r-xp 00000000 08:05 82659 /lib/librt-2.7.so 0014d000-0014e000 r-xp 00007000 08:05 82659 /lib/librt-2.7.so 0014e000-0014f000 rwxp 00008000 08:05 82659 /lib/librt-2.7.so 0014f000-00164000 r-xp 00000000 08:05 82657 /lib/libpthread-2.7.so 00164000-00165000 r-xp 00014000 08:05 82657 /lib/libpthread-2.7.so 00165000-00166000 rwxp 00015000 08:05 82657 /lib/libpthread-2.7.so 00166000-00168000 rwxp 00166000 00:00 0 00168000-002bb000 r-xp 00000000 08:05 82560 /lib/libc-2.7.so 002bb000-002bd000 r-xp 00153000 08:05 82560 /lib/libc-2.7.so 002bd000-002be000 rwxp 00155000 08:05 82560 /lib/libc-2.7.so 002be000-002c1000 rwxp 002be000 00:00 0 002c1000-002cb000 r-xp 00000000 08:05 82650 /lib/libnss_files-2.7.so 002cb000-002cc000 r-xp 00009000 08:05 82650 /lib/libnss_files-2.7.so 002cc000-002cd000 rwxp 0000a000 08:05 82650 /lib/libnss_files-2.7.so 002cd000-002d8000 r-xp 00000000 08:05 82656 /lib/libnss_nisplus-2.7.so 002d8000-002d9000 r-xp 0000a000 08:05 82656 /lib/libnss_nisplus-2.7.so 002d9000-002da000 rwxp 0000b000 08:05 82656 /lib/libnss_nisplus-2.7.so 002da000-002ef000 r-xp 00000000 08:05 82616 /lib/libnsl-2.7.so 002ef000-002f0000 r-xp 00014000 08:05 82616 /lib/libnsl-2.7.so 002f0000-002f1000 rwxp 00015000 08:05 82616 /lib/libnsl-2.7.so 002f1000-002f3000 rwxp 002f1000 00:00 0 002f3000-002f7000 r-xp 00000000 08:05 82648 /lib/libnss_dns-2.7.so 002f7000-002f8000 r-xp 00003000 08:05 82648 /lib/libnss_dns-2.7.so 002f8000-002f9000 rwxp 00004000 08:05 82648 /lib/libnss_dns-2.7.so 002f9000-00309000 r-xp 00000000 08:05 82589 /lib/libresolv-2.7.so 00309000-0030a000 r-xp 00010000 08:05 82589 /lib/libresolv-2.7.so 0030a000-0030b000 rwxp 00011000 08:05 82589 /lib/libresolv-2.7.so 0030b000-0030d000 rwxp 0030b000 00:00 0 0030d000-00318000 r-xp 00000000 08:05 82301 /lib/libgcc_s-4.1.2-20070925.so.1 00318000-00319000 rwxp 0000a000 08:05 82301 /lib/libgcc_s-4.1.2-20070925.so.1 08048000-08068000 r-xp 00000000 08:05 82138 /sbin/rsyslogd 08068000-08069000 rw-p 00020000 08:05 82138 /sbin/rsyslogd 08138000-08159000 rw-p 08138000 00:00 0 b7400000-b7421000 rw-p b7400000 00:00 0 b7421000-b7500000 ---p b7421000 00:00 0 b7523000-b7524000 ---p b7523000 00:00 0 b7524000-b7f27000 rw-p b7524000 00:00 0 b7f31000-b7f32000 rw-p b7f31000 00:00 0 bfbc0000-bfbd5000 rw-p bffea000 00:00 0 [stack] Aborted Dear reproducers, there will be new rsyslog release in F8 testing soon. It is a big possibility, that we fix the problem. Your feedback is appreciated, thnx. https://admin.fedoraproject.org/updates/F8/pending/rsyslog-1.19.11-2.fc8 *** Bug 426003 has been marked as a duplicate of this bug. *** 1.19.11-2 Has lasted 20 hours and counting - 1.19.11-1 was failing very rapidly in the same circumstances so my inclination is to suggest that this fixes the bug although I'd suggest giving it another couple of days of runtime before we're able to say that 100% conclusively. In the mean time though if 1.19.11-1 is giving you pain then 1.19.11-2 is worth a go. *** Bug 426087 has been marked as a duplicate of this bug. *** The 1.19.11-2 will leak message structs though which is bad as well. The proper fix is to either have atomic operations used for reference counting, or to ensure that msg object cannot be destroyed in multiple threads concurrently. Or we could compile rsyslog without pthreads support but this is suboptimal. To put the leaking into context in 24 hours running 1.19.11-2 and logging about 1GB of log messages Virt is unchanged, Res has increased by 88KB. We can do better yes but that is tolerable for the moment and an awful lot better than the continual crashing of the 1.9.11-1 and earlier. Created attachment 290140 [details]
This patch should take care of freeing messages.
(In reply to comment #15) > To put the leaking into context in 24 hours running 1.19.11-2 and logging about > 1GB of log messages Virt is unchanged, Res has increased by 88KB. We can do > better yes but that is tolerable for the moment and an awful lot better than the > continual crashing of the 1.9.11-1 and earlier. Actually the messages were only leaked when the queue was full (10K msgs) or when parsing the message with no free memory, which are pretty rare situations. rsyslog-1.19.11-2.fc8 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update rsyslog' I hope that problem is fixed, since there aren't any new reports. rsyslog-1.19.11-3.fc8 is available in stable repos. thnx. you guys. *** Bug 468367 has been marked as a duplicate of this bug. *** |