This bug has been migrated to another issue-tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
Bug 2217921 - nscd aborts with failed assert in prune_cache [NEEDINFO]
Summary: nscd aborts with failed assert in prune_cache
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: glibc
Version: 8.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: glibc team
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-27 13:36 UTC by yanf
Modified: 2023-08-11 14:43 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-11 14:43:46 UTC
Type: Bug
Target Upstream Version:
Embargoed:
mijjapur: needinfo? (yanf)


Attachments
bt full of ABRT event (5.47 KB, text/plain)
2023-06-27 13:36 UTC, yanf
*actual* nscd backtrace (2.33 KB, text/plain)
2023-06-27 14:52 UTC, yanf
nscd backtrace with all symbols (3.28 KB, text/plain)
2023-06-27 15:08 UTC, yanf


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHEL-1192 | 0 | None | Migrated | None | 2023-08-11 14:43:21 UTC
Red Hat Issue Tracker | RHELPLAN-160950 | 0 | None | None | None | 2023-06-27 13:37:35 UTC

Description yanf 2023-06-27 13:36:57 UTC
Created attachment 1972848
bt full of ABRT event

Description of problem:

nscd aborts (SIGABRT) when reading the `passwd` cache.

Version-Release number of selected component (if applicable):
glibc-2.28-189.1.el8.x86_64

How reproducible:
always

Steps to Reproduce:
1. start nscd
2. nscd aborts almost immediately

Actual results:
strace shows:

[pid 219045] write(2</dev/null>, "nscd: cache.c:426: prune_cache: Assertion `dh->usable' failed.\n", 63) = 63
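
As a triage aid, the assertion message written above can be split into program, source file, line, function and failed expression. This is a minimal Python sketch that assumes the message has exactly the layout shown in the strace output; the regex and the function name parse_assert_message are illustrative and not part of any tool used in this report.

```python
import re

# Layout of the assertion message as it appears in the strace output:
#   <program>: <file>:<line>: <function>: Assertion `<expr>' failed.
ASSERT_RE = re.compile(
    r"^(?P<prog>[^:]+): (?P<file>[^:]+):(?P<line>\d+): "
    r"(?P<func>[^:]+): Assertion `(?P<expr>.*)' failed\.$"
)


def parse_assert_message(message):
    """Split an assertion-failure line into its parts, or return None."""
    match = ASSERT_RE.match(message.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])  # line number as an integer
    return fields
```

Applied to the strace line, it identifies cache.c line 426, function prune_cache, and the failed condition dh->usable.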


Expected results:
runs without error

Additional info:

Debug output (actual IDs redacted for safety and confidentiality):

Mon 26 Jun 2023 06:55:18 PM EDT - 452466: Reloading "<redacted>" in user database cache!
Mon 26 Jun 2023 06:55:18 PM EDT - 452466: Reloading "<redacted>" in user database cache!
Mon 26 Jun 2023 06:55:18 PM EDT - 452466: Reloading "<redacted>" in user database cache!
Mon 26 Jun 2023 06:55:18 PM EDT - 452466: Reloading "<redacted>" in user database cache!
nscd: cache.c:426: prune_cache: Assertion `dh->usable' failed.

Backtrace:

#0  0x00007fd2d11336cc in __nscd_get_map_ref () from /lib64/libc.so.6
#1  0x00007fd2d112fa7a in nscd_getpw_r () from /lib64/libc.so.6
#2  0x00007fd2d112feac in __nscd_getpwuid_r () from /lib64/libc.so.6
#3  0x00007fd2d10c6dbf in getpwuid_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
#4  0x00007fd2d17d0976 in pam_modutil_getpwuid () from /lib64/libpam.so.0
#5  0x00007fd2cdd8cb12 in pam_sm_authenticate () from /usr/lib64/security/pam_succeed_if.so
#6  0x00007fd2d17ca7b4 in _pam_dispatch () from /lib64/libpam.so.0
#7  0x00005630c43259a3 in cron_close_pam ()
#8  0x00005630c43251cf in do_command ()
#9  0x00005630c4324170 in job_runqueue ()
#10 0x00005630c432193c in main ()

bt full attached.

Comment 1 yanf 2023-06-27 13:43:54 UTC
Obviously, if I clear the cache, the problem goes away. I have the problematic passwd cache file, but can't post it here for obvious reasons. I might be able to send it directly under our mutual NDA.

Comment 2 yanf 2023-06-27 14:52:25 UTC
Created attachment 1972857
*actual* nscd backtrace

The previous backtrace was a related one from crond; I was able to reproduce the issue in nscd under gdb, which gave this backtrace.

Comment 3 yanf 2023-06-27 15:08:16 UTC
Created attachment 1972871
nscd backtrace with all symbols

Taken after installing the missing debug symbols package.

Comment 5 Carlos O'Donell 2023-06-30 13:31:59 UTC
If you are a Red Hat customer with an active subscription, please visit the Red Hat Customer Portal [1] for assistance with your issue.

[1] http://access.redhat.com/

Comment 6 Carlos Santos 2023-06-30 14:39:45 UTC
I'm providing the required link to the support ticket in the customer portal.

Comment 14 Florian Weimer 2023-07-25 12:21:08 UTC
I looked at this for some time and I'm still not sure what might be causing this. We need some sort of reproducer, or at least the corrupted mapping that triggers this.

This issue seems different from the known concurrency issues (which I think cannot happen on x86-64 due to its strong memory model). I wonder if it could be caused by inconsistent data coming back from LDAP, triggering expiration of cache entries that is not time-based and hence the assertion failure.
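
The hypothesis above can be illustrated with a deliberately simplified, hypothetical model. This is not glibc's prune_cache: it only assumes, from the assertion itself, that an entry whose timeout has not yet passed is expected to still be marked usable, so an entry invalidated by something other than time-based expiry trips the check. CacheEntry, prune and the reload callback are invented names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CacheEntry:
    # Hypothetical stand-in for an nscd record header; only the two
    # fields relevant to the failed assertion are modelled.
    key: str
    timeout: float  # absolute expiry time
    usable: bool = True


def prune(entries: Iterable[CacheEntry], now: float,
          reload: Callable[[CacheEntry], None]) -> None:
    """Toy prune pass (hypothetical, not glibc code)."""
    for entry in entries:
        if entry.timeout < now:
            # Time-based expiry: hand the entry to the reload callback,
            # which may refresh it or mark it unusable.
            reload(entry)
        else:
            # Not expired yet: nothing should have invalidated it.
            # Non-time-based invalidation (e.g. from inconsistent
            # backend data) violates this invariant.
            assert entry.usable, f"{entry.key}: unexpired entry is not usable"
```

For example, an entry with timeout=200.0 and usable=False, pruned at now=100.0, raises AssertionError, the toy analogue of the `dh->usable' failure.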

Comment 15 yanf 2023-07-28 19:36:59 UTC
@fweimer I uploaded the corrupt nscd passwd cache file, encrypted with aes-256-cbc, to RH case number 03548682.

You will need a password to decrypt it. Please reach out of band.

Comment 16 Murali Prudhvi Ijjapureddi 2023-08-02 11:58:58 UTC
@yanf 

Hello Yan!

I have updated the support ticket and am posting the same information here for your reference:

Please update the support ticket with the requested information, and we will take this further.

Thanks! - Murali

====================================================
>>Hello Yan!

>>Thank you for updating the support ticket.

>>I see that you want to share the decryption password for the file that you have shared here on the support ticket as well as on the Bugzilla ticket.

>>I understand that you want to share the password out of band over email. However, this is not a recommended process.

>>It is best to keep all the communication and information on the support portal for security and tracking purposes.

>>I had a word with Florian, the engineer working on the bug ticket, to get a better understanding of the progress we have made so far on the issue.

>>Let's work together on sharing the password through an alternate secure medium, so that the engineer can access the file and work on the issue.

>>I tried calling you on the number that we have on file for your contact - "2124780000". However, it appears to be a dummy placeholder number, and I wasn't able to reach you.

>>Could you provide your contact number, including the country code, so that we can reach you and discuss this further?

>>Awaiting your response.

>>Thank you!

>>Regards,
>>Murali Prudhvi.
====================================================

Comment 19 RHEL Program Management 2023-08-11 14:43:46 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues.
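
For this bug, the search described above can also be assembled programmatically (the Links section already lists the migrated issue as RHEL-1192). This is a minimal sketch that assumes the search page accepts a percent-encoded JQL string appended to the jql= parameter of the URL given in the notice; the function name migrated_issue_search_url is hypothetical.

```python
from urllib.parse import quote

# Base search URL, as given in the migration notice.
SEARCH_BASE = "https://issues.redhat.com/issues/?jql="


def migrated_issue_search_url(bz_number):
    """Build a search URL for the issue migrated from Bugzilla bug bz_number."""
    jql = f'"Bugzilla Bug" = {bz_number}'
    # Percent-encode the JQL so quotes, spaces and '=' survive in the query string.
    return SEARCH_BASE + quote(jql, safe="")
```

Percent-encoding with safe="" keeps every reserved character out of the raw query string, so the URL can be pasted into a browser or passed to an HTTP client unchanged.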

