Description of problem:

I am getting panics in dovecot. I have installed abrt but haven't captured a crash with it yet, so here is what I have so far:

Apr 17 06:21:37 dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)
Apr 17 06:21:37 dovecot: auth: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x3cbca) [0x7f2b9876abca] -> /usr/lib64/dovecot/libdovecot.so.0(+0x3cc0e) [0x7f2b9876ac0e] -> /usr/lib64/dovecot/libdovecot.so.0(i_fatal+0) [0x7f2b98744db7] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(+0x2e89) [0x7f2b972a4e89] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(+0x3bee) [0x7f2b972a5bee] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(db_ldap_request+0xa1) [0x7f2b972a6441] -> dovecot/auth(auth_request_lookup_user+0x59) [0x40fa99] -> dovecot/auth(auth_request_handler_master_request+0xb3) [0x411743] -> dovecot/auth() [0x40c816] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x48) [0x7f2b987766c8] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f2b98777757] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f2b98776658] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f2b98764973] -> dovecot/auth(main+0x2a9) [0x409d69] -> /lib64/libc.so.6(__libc_start_main+0xed) [0x7f2b97b6243d] -> dovecot/auth() [0x409e79]

I have seen it on two different machines: one a recent fresh install of F15 (as it is now) and one an upgrade from F14.

Version-Release number of selected component (if applicable):
dovecot-2.0.12-2.fc15.x86_64
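For anyone reading along, the panic is dovecot's own bounds check in its dynamic-array helper. A minimal sketch of what that assertion guards, with simplified names (an illustration, not dovecot's exact code):

#include <assert.h>
#include <stddef.h>

/* simplified stand-ins for dovecot's buffer/array pair */
struct buffer { const char *data; size_t used; };              /* bytes written so far */
struct array  { struct buffer *buffer; size_t element_size; };

static const void *array_idx(const struct array *a, unsigned int idx)
{
    /* the failing assertion: the start of element 'idx' must fall
     * within the bytes actually written to the backing buffer */
    assert(idx * a->element_size < a->buffer->used);
    return a->buffer->data + idx * a->element_size;
}

So the panic means some caller, here in the LDAP userdb path per the backtrace (db_ldap_request and friends), asked for an array element past the end of what was stored, which suggests a stale index or request list in that code rather than a bug in array.h itself.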
Was this configuration working with 2.0.11, or is this the first dovecot version you've tried? What is the output of doveconf -n? Once you are able to get a regular backtrace, please attach it here. Thanks.
dovecot-2.0.11-2.fc14.x86_64 works, except that LDAP on the server in question now uses Kerberos for authentication; other than the changes needed to make that happen, the configuration is the same. So there were some changes, and I do not know whether 2.0.11-2 works with those changes in place. I may be able to test this on another machine soon and will let you know. Attaching doveconf -n.
Created attachment 492888 [details]
doveconf -n

I have masked host names; beyond that, this is exactly what is returned.
This may be a bug in openldap (I believe that is the LDAP library dovecot uses). The problem only shows up when Kerberos is enabled for LDAP SASL auth. So: openldap? SASL? The backtrace I will be providing is from dovecot-2.0.11-2.fc14.x86_64; I can't get debuginfo packages for F15 at the moment.
Created attachment 493057 [details]
Backtrace of dovecot-2.0.11-2.fc14.x86_64 w/ problem mentioned for 2.0.12-x.fc15

This also seems to happen only via the LMTP code path of dovecot deliver. I am not seeing it from logins, at least not that I recognize, although once the LMTP code path causes a problem, logins do as well.
I forgot to mention F15 and F14 are both current.
The only changes were the following. In my ldap.conf for dovecot (changes are the new lines starting with *; the * is not in the conf, it just marks the changes):

hosts = example.org
base = dc=example,dc=org
ldap_version = 3
user_attrs = userPrincipalName=user
user_filter = (&(objectClass=person)(|(mail=%u)(sAMAccountName=%u)(userPrincipalName=%u)))
*dn = MACHINEACCOUNT$@EXAMPLE.ORG
*sasl_bind = yes
*sasl_mech = GSSAPI
*sasl_realm = EXAMPLE.ORG
*#sasl_authz_id = MACHINEACCOUNT$@EXAMPLE.ORG
# For using doveadm -A:
iterate_attrs = userPrincipalName=user
iterate_filter = (objectClass=person)

In dovecot.conf:

import_environment = TZ KRB5CCNAME=/etc/dovecot/krb5.cc

I have sent this information directly to the dovecot mailing list as well.
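For completeness: for the GSSAPI bind to work, the credential cache that import_environment points dovecot at has to be kept populated. A sketch of one way to do that from cron, assuming the machine account's key is in a keytab (the keytab path here is hypothetical, not from my actual setup):

# refresh the machine-account ticket cache used by dovecot's LDAP binds
kinit -k -t /etc/dovecot/machine.keytab -c /etc/dovecot/krb5.cc 'MACHINEACCOUNT$@EXAMPLE.ORG'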
> I have sent this information directly to the dovecot mailing list as well.

Thanks, I was going to forward this information there myself.
Created attachment 494920 [details]
additional backtrace from machine st

Some of these backtraces may be duplicates, as they come from different machines.
Created attachment 494921 [details]
another from st
Sorry for responding to my own posts. Neither of the following fixes it:

import_environment = HOME USER TZ KRB5CCNAME=/etc/dovecot/krb5.cc LISTEN_FDS LISTEN_PIDS GDB
import_environment = KRB5CCNAME=/etc/dovecot/krb5.cc

I find it interesting that abrt seems to say the environment is empty/corrupted.
It should be noted that machines with more memory pressure crash more often.
Created attachment 494932 [details]
backtrace from pp
Created attachment 494933 [details]
another from pp
Would it be possible to get a build with http://hg.dovecot.org/dovecot-2.0/rev/3ada82147977 applied so I can try it out? Timo is wondering whether it will solve the problem.
(In reply to comment #15)
> Would it be possible to get a build with
> http://hg.dovecot.org/dovecot-2.0/rev/3ada82147977 applied so I can try it
> out? Timo is wondering whether it will solve the problem.

Sure, just a sec ;)
Packages are here: http://kojipkgs.fedoraproject.org/scratch/mhlavink/task_3062376/
Thank you so much. I have my hands full rebuilding servers and other things, so this helps me tremendously. I have just restarted the three dovecot servers in question with these patches and will post the results in a few days, after they have had time to misbehave. Again, my profound thanks!
This may have indeed fixed the bug. I will wait a few days before saying definitively.
It definitely did not fix this bug. It's just an assert, so it checks whether the data gets broken sooner (before the place where it crashed last time). There can't even be any side effect of this change that would make things work; the change was just to make it easier to find what causes this crash. Just wait until it crashes again.

Btw, don't forget to grab the dovecot-debuginfo package from the link I posted. It will be required to get a new backtrace, and because this is not an official build, it will be automatically deleted by the build system soon.
Yes. I am a C coder, and I thought it was strange that he suggested the assert as a fix (instead of finding the bug). The funny thing is that it used to crash a dozen times or more in 24 hours on two of the three machines (the third was much slower). I do not have any asserts in /var/log/maillog that aren't "normal", and no new crashes. I will keep letting it run. Is this possibly a glibc/gcc bug that has been fixed by a recompile?
This backtrace does not look strange enough for a gcc/glibc bug. Well, I can't disprove it, but the chances are extremely low. Maybe you just need a specific kind of load, or some specific environment, to trigger it. Let's wait.
I am only seeing this on one machine now. I think you must not have cherry-picked just the patch I mentioned, as several bugs have disappeared. For some reason abrtd is NOT picking this crash up.

dovecot: auth: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x3cbea) [0x7fb10b8c9bea] -> /usr/lib64/dovecot/libdovecot.so.0(+0x3cc2e) [0x7fb10b8c9c2e] -> /usr/lib64/dovecot/libdovecot.so.0(i_fatal+0) [0x7fb10b8a3dd7] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(+0x2ea9) [0x7fb10a403ea9] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(+0x3c32) [0x7fb10a404c32] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(db_ldap_request+0xa1) [0x7fb10a405491] -> dovecot/auth(auth_request_lookup_user+0x59) [0x40fab9] -> dovecot/auth(auth_request_userdb_callback+0x4a) [0x40f8ba] -> dovecot/auth() [0x41df8f] -> dovecot/auth(auth_request_lookup_user+0x59) [0x40fab9] -> dovecot/auth() [0x40c562] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x48) [0x7fb10b8d56e8] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7fb10b8d6777] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7fb10b8d5678] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7fb10b8c3993] -> dovecot/auth(main+0x2a9) [0x409d89] -> /lib64/libc.so.6(__libc_start_main+0xed) [0x7fb10acc143d] -> dovecot/auth() [0x409e99]

Versions:
dovecot-2.0.12-2.fc15.bz697325.1.x86_64
dovecot-pigeonhole-2.0.12-2.fc15.bz697325.1.x86_64
dovecot-antispam-plugin-20101012-1_2.0.12_2.fc15.bz697325.1.x86_64
(In reply to comment #23)
> I am only seeing this on one machine now. I think you must not have
> cherry-picked just the patch I mentioned, as several bugs have disappeared.

It's just a coincidence. I've checked it a few times, and it really does use the correct patch.

> For some reason abrtd is NOT picking this crash up.

Does abrt complain about anything in /var/log/messages? Are you able to get a core file and backtrace manually?
I just meant that you must have included other patches as well.

No, no core file. The only backtrace I have is the one from /var/log/maillog included in comment #23. Again, it is only on one machine; the others have stopped crashing.

I missed this originally, thank you for asking:

May 19 04:01:02 abrtd: Directory 'ccpp-2011-05-19-04:01:02-11368' creation detected
May 19 04:01:02 abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2011-05-19-04:01:02-11368 (res:2), deleting

This is at the same time as the backtrace above. The interesting thing is that I am seeing the abrtd message on the other machine as well; the crash is just happening less frequently there (and at very different times). The third machine isn't seeing either. (Sorry that some of the later statements in this message contradict the earlier ones; I am testing things as I write.)

May 16 04:01:01 dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)
May 17 04:01:02 dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)

Interesting: for the first time I have realized that these asserts match the crashes. I have posted the assert to the mail thread on the support list as well.
(In reply to comment #25)
> I just meant that you must have included other patches as well.

Definitely not; I added only one patch, and it is the patch that adds only the assert, so any improvement is just a coincidence. Really :)

> No, no core file. The only backtrace I have is the one from /var/log/maillog
> included in comment #23. Again, it is only on one machine; the others have
> stopped crashing.
>
> I missed this originally, thank you for asking:
>
> May 19 04:01:02 abrtd: Directory 'ccpp-2011-05-19-04:01:02-11368' creation detected
> May 19 04:01:02 abrtd: Corrupted or bad dump
> /var/spool/abrt/ccpp-2011-05-19-04:01:02-11368 (res:2), deleting

Most probably that is because the scratch packages are not gpg signed, and by default abrt processes only signed packages (unless you specify OpenGPGCheck = no in /etc/abrt/abrt.conf).

Anyway, Timo has released a new version (2.0.13) which contains this patch too (and this time even some unrelated bug fixes). It should be available within 24 hours in updates-testing (yum update --enablerepo=updates-testing dovecot), so you can wait for this updated version, which will be gpg signed.
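To spell out that abrt change as a config sketch (option name as used by abrt on F15):

# /etc/abrt/abrt.conf: allow abrt to process crashes from unsigned
# (e.g. koji scratch) packages
OpenGPGCheck = no

# then restart the daemon:
service abrtd restart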
2.0.13 from testing does NOT solve the problem. abrt is still refusing to pick up the backtrace.
What is the message abrt reports?

> /var/spool/abrt/ccpp-2011-05-19-04:01:02-11368 (res:2), deleting

The "res:N" value is the important part.
I no longer get any messages from abrt in /var/log/messages or /var/log/maillog, which is most strange.

I believe Timo just fixed a bug; it is not this one, but it may help produce cleaner output. Would it be possible to get this fix in a signed package, so that abrtd might get a cleaner backtrace?

http://hg.dovecot.org/dovecot-2.0/rev/0e1254dcf86b

Thank you.
That is strange, because abrt should at least report that it deleted the crash.

> Would it be possible to get this fix in a signed package, so
> that abrtd might get a cleaner backtrace?

Well, it would be possible to create a regular update and push it to updates, so that all users would download and install it once the update system signs it, but that would not be very practical. If you want to make abrt happy, just edit /etc/abrt/abrt.conf, set OpenGPGCheck = no, and restart abrt (service abrtd restart).

> http://hg.dovecot.org/dovecot-2.0/rev/0e1254dcf86b

Anyway, are you sure this ("doveadm -A: Crashfix for doveadm server when using commands that print nothing") is the commit you need?
By "nothing from abrt" I mean it isn't even picking things up: there are no messages about a dump being invalid or anything, just no abrt messages of any kind anywhere.

I am now getting this bug on all three servers. Here are the log and backtrace. (How do I decode this thing? I have tried addr2line and it doesn't give me anything!)

Jun 5 01:20:07 PP dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)
Jun 5 01:20:07 PP dovecot: auth: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x3cbba) [0x7f956d494bba] -> /usr/lib64/dovecot/libdovecot.so.0(+0x3cbfe) [0x7f956d494bfe] -> /usr/lib64/dovecot/libdovecot.so.0(i_fatal+0) [0x7f956d46edd7] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(+0x2ea9) [0x7f956bfcfea9] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(+0x3c32) [0x7f956bfd0c32] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(db_ldap_request+0xa1) [0x7f956bfd1491] -> dovecot/auth(auth_request_lookup_user+0x59) [0x40fab9] -> dovecot/auth() [0x40c562] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x48) [0x7f956d4a06b8] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f956d4a1747] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f956d4a0648] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f956d48e963] -> dovecot/auth(main+0x2a9) [0x409d89] -> /lib64/libc.so.6(__libc_start_main+0xed) [0x7f956c88c43d] -> dovecot/auth() [0x409e99]

This is the same problem I have shown elsewhere, but I do not know how to debug it. Apparently whatever calls line 189 of array.h in the backtrace chain is the problem, or at least the place to start.

As for that fix: yes, part of what causes the backtrace above happens at 04:01-04:02 each morning when I run the doveadm expunge jobs (5 of them) on my mail servers. So yes, that patch would fix at least some problems for me. This is one example:

Jun 6 04:01:02 TS dovecot: auth: Panic: file ../../src/lib/array.h: line 189 (array_idx_i): assertion failed: (idx * array->element_size < array->buffer->used)
Jun 6 04:01:02 TS dovecot: auth: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x3cbba) [0x7f886cababba] -> /usr/lib64/dovecot/libdovecot.so.0(+0x3cbfe) [0x7f886cababfe] -> /usr/lib64/dovecot/libdovecot.so.0(i_fatal+0) [0x7f886ca94dd7] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(+0x2ea9) [0x7f886b5f5ea9] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(+0x3c32) [0x7f886b5f6c32] -> /usr/lib64/dovecot/auth/libauthdb_ldap.so(db_ldap_request+0xa1) [0x7f886b5f7491] -> dovecot/auth(auth_request_lookup_user+0x59) [0x40fab9] -> dovecot/auth() [0x40c562] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x48) [0x7f886cac66b8] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f886cac7747] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f886cac6648] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f886cab4963] -> dovecot/auth(main+0x2a9) [0x409d89] -> /lib64/libc.so.6(__libc_start_main+0xed) [0x7f886beb243d] -> dovecot/auth() [0x409e99]

The cron job is set for 01 04, so I have no choice but to believe that bug is at least part of my problem. Plus, shouldn't crash fixes be treated as security updates?
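Side note on decoding: the +0xNNNN values in the raw backtrace are offsets into each shared object, so addr2line has to be pointed at the library with those offsets (not the bracketed absolute addresses), and the debuginfo matching this exact build must be installed. A sketch, untested here, and the path of the auth binary is an assumption:

# offsets into the shared objects, with dovecot-debuginfo installed:
addr2line -f -e /usr/lib64/dovecot/libdovecot.so.0 0x3cbba
addr2line -f -e /usr/lib64/dovecot/auth/libauthdb_ldap.so 0x2ea9

# for the main (non-PIE) binary the bracketed addresses are usable directly:
addr2line -f -e /usr/libexec/dovecot/auth 0x40fab9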
For some reason, even with abrtd running, core dumps were disabled. I don't know how or why that happened, as it was working before. I have done this:

echo 'DAEMON_COREFILE_LIMIT="unlimited"' >> /etc/sysconfig/dovecot

If that is accurate, I should have a crash dump soon.
Ok, I think I have figured out the trigger, but not the problem in the code.

There are three machines: TS, PP, and ST. TS and ST had identical configurations with auth_username_format = %Lu; PP had it set to %u. PP started crashing when I changed it to %Lu.

As mentioned, the kerberos/ldap setup here is Samba4. PP had administrator and guest all lower case; ST had administrator but Guest; TS had Administrator and Guest. When I changed everything to auth_username_format = %u and changed ST's Guest to guest (in userPrincipalName; I didn't touch anything else), ST and PP stopped having any problems (at least for the last 6 hours, even with things like the doveadm calls below, which would always produce at least one crash). I just changed TS to administrator and guest and ran the doveadm commands and some other things: no crashes. See the config sketch after the list below.

So why does it deliver the email but sometimes crash? I do not know.

The doveadm commands:

doveadm expunge -A mailbox TRASH savedbefore 30d
doveadm expunge -A mailbox SPAM savedbefore 30d
doveadm expunge -A mailbox SPAM savedbefore 2d SEEN
doveadm expunge -A mailbox Dangerous savedbefore 1w
doveadm expunge -A mailbox Infected savedbefore 1w
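In config terms, the workaround that stopped the crashes was to stop rewriting the case of the login name, i.e. in dovecot.conf:

# pass the username through unmodified; %Lu lowercases it, which
# mismatched the mixed-case userPrincipalName values in LDAP
auth_username_format = %u

So the crash appears tied to case-mismatched userdb lookups, even though delivery itself still worked.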
Did you check that abrt is running? I tried it on my system (killed dovecot with SIGSEGV) and abrt caught the crash without any problem.

I've just built a test package, a dovecot repository snapshot. It contains all changes up to c0734f08b3f3 (including that doveadm fix), which Timo said should help. Packages can be found here:

http://kojipkgs.fedoraproject.org/scratch/mhlavink/task_3115803/

It's still not a signed package, so you need OpenGPGCheck = no in abrt.conf. If abrt still does not work for you, send me the output of rpm -qa 'abrt*', check systemctl status abrtd.service, and check whether you have any *.rpmnew files in /etc/abrt.
Apparently in F15 you have to set more of the abrt services to start. I am now getting a complete backtrace. The problem is, it seems to be identical; I will upload it shortly.

I still have one of the three machines crashing. The only difference I can find now is that this machine also sets mail, so it can match on both userPrincipalName and mail. It shouldn't ever match on both (at the moment that is guaranteed not to happen).
Ok, all three machines have these packages installed. Hopefully this does indeed fix the problem. It is only one of two problems I have ever had with dovecot. (The other is that you seem to get disconnected, at least when using Thunderbird, if you delete too many messages at once while holding down the delete key. Not a real problem: the server doesn't seem to crash, and Thunderbird just reconnects. It happens more with Kerberos auth than with any other method I have used.)
These packages solved the problem. Thank you.
Ok, let's wait and see if everything is fine. If you don't see any problems, I'll prepare an official dovecot update on Wednesday.
Not only does this patch solve the problem, it also greatly reduces the delete problem mentioned above (in fact, instead of seeing it several times a day, I have only seen it once since installing this patched version).
Once this gets rolled out I think you can close this. I have still had NO problems.
I am closing this as it has been fixed for some time.