Bug 1184069

Summary: group names are not resolved for gid from sssd cache when using IPA backend
Product: Red Hat Enterprise Linux 6
Reporter: Shashikant <shashikant.mundlik>
Component: sssd
Assignee: SSSD Maintainers <sssd-maint>
Status: CLOSED DUPLICATE
QA Contact: Kaushik Banerjee <kbanerje>
Severity: high
Priority: unspecified
Version: 6.6
CC: gawin, grajaiya, jgalipea, jhrozek, lslebodn, mkosek, mzidek, pbrezina, shashikant.mundlik
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-01-29 12:53:14 UTC
Attachments:
sssd logs and ldb content for affected system hd008 (flags: none)
ldb content on working system hd006 (flags: none)

Description Shashikant 2015-01-20 14:25:41 UTC
Description of problem:

When using sssd with the IPA backend, the id command inconsistently resolves GIDs to group names for some of a user's groups. Removing the cache db files and restarting sssd displays correct data for a few hours up to 1 to 2 days, and then the problem repeats.

[root@server008 ~]# sss_cache -UG
Couldn't invalidate user c3013958 
Couldn't invalidate user c9110179
[root@server008 ~]# id -Gn c3001841
c3001841 itapp_eah_admin itapphueadmin itappcmadmin id: cannot find name for group ID 1019412599
1019412599 id: cannot find name for group ID 1019424177
1019424177 id: cannot find name for group ID 1019424180
1019424180
[root@server008 ~]# id -Gn c3001841
c3001841 itapp_eah_admin itapphueadmin itappcmadmin

"id c3001841" shows three groups without names.

[c3001841@server006 hadoop-logs]$ id c3001841 
uid=1019408268(c3001841) gid=1019408268(c3001841) groups=1019408268(c3001841),1019429545(itapp_eah_admin),1019424178(itapphueadmin),1019424179(itappcmadmin),1019412599,1019424177,1019424180 
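
The symptom shown above (GIDs listed without a name in parentheses) can be detected mechanically. The following is a minimal sketch, assuming the usual `id` output layout with a `groups=` field and group names that contain no whitespace; the function name `unresolved_gids` is hypothetical and not part of any sssd tooling.

```python
import re
from typing import List


def unresolved_gids(id_output: str) -> List[int]:
    """Return the GIDs in the groups= field that have no "(name)" suffix."""
    match = re.search(r"groups=(\S+)", id_output)
    if match is None:
        return []
    unresolved = []
    for entry in match.group(1).split(","):
        # A resolved entry looks like 1019424178(itapphueadmin);
        # an unresolved entry is the bare number.
        if entry.isdigit():
            unresolved.append(int(entry))
    return unresolved


# Output of "id c3001841" as quoted in this report.
sample = (
    "uid=1019408268(c3001841) gid=1019408268(c3001841) "
    "groups=1019408268(c3001841),1019429545(itapp_eah_admin),"
    "1019424178(itapphueadmin),1019424179(itappcmadmin),"
    "1019412599,1019424177,1019424180"
)
print(unresolved_gids(sample))  # [1019412599, 1019424177, 1019424180]
```

Group names containing spaces (possible with some directory backends) would need a more careful parser than this sketch.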

Running the id command again does not show these groups at all.

[c3001841@server006 hadoop-logs]$ id c3001841 
uid=1019408268(c3001841) gid=1019408268(c3001841) groups=1019408268(c3001841),1019429545(itapp_eah_admin),1019424178(itapphueadmin),1019424179(itappcmadmin),1019412599,1019424177,1019424180 

[c3001841@server006 hadoop-logs]$ id c3001841
uid=1019408268(c3001841) gid=1019408268(c3001841) groups=1019408268(c3001841),1019429545(itapp_eah_admin),1019424178(itapphueadmin),1019424179(itappcmadmin)

Clearing the cache and restarting sssd corrects the problem.

[root@server006 ~]# rm /var/lib/sss/db/cache_unix.example.com.ldb
rm: remove regular file `/var/lib/sss/db/cache_unix.example.com.ldb'? y
[root@server006 ~]# rm /var/lib/sss/db/ccache_UNIX.EXAMPLE.COM
rm: remove regular file `/var/lib/sss/db/ccache_UNIX.EXAMPLE.COM'? y

[root@server006 ~]# service sssd restart 
Stopping sssd: [ OK ] 
Starting sssd: [ OK ] 

[root@server006 ~]# id c3001841 
uid=1019408268(c3001841) gid=1019408268(c3001841) groups=1019408268(c3001841),1019412599(itsrvrhadmin),1019424179(itappcmadmin),1019429545(itapp_eah_admin),1019424177(itapphue),1019424178(itapphueadmin),1019424180(itappcm)

Version-Release number of selected component (if applicable):

[root@server008 ~]# rpm -qa |grep sssd
python-sssdconfig-1.11.6-30.el6_6.3.noarch
sssd-krb5-common-1.11.6-30.el6_6.3.x86_64
sssd-ldap-1.11.6-30.el6_6.3.x86_64
sssd-client-1.11.6-30.el6_6.3.x86_64
sssd-common-1.11.6-30.el6_6.3.x86_64
sssd-common-pac-1.11.6-30.el6_6.3.x86_64
sssd-ad-1.11.6-30.el6_6.3.x86_64
sssd-krb5-1.11.6-30.el6_6.3.x86_64
sssd-1.11.6-30.el6_6.3.x86_64
sssd-ipa-1.11.6-30.el6_6.3.x86_64
sssd-proxy-1.11.6-30.el6_6.3.x86_64


How reproducible:


Steps to Reproduce:
1. Install and configure ipa-client on a RHEL 6.5/RHEL 6.6 machine with sssd-1.11.6-30.el6_6.3.x86_64 installed.
2. Create users and multiple groups (6 in my case) in IPA, and add the user to them.
3. Run "id -g username" for a user from IPA; it will work correctly.
4. After 15-20 hours or a few days, run "id -Gn username" again; it will not show group names for all the groups.
5. "id -Gn username" will give an error for GIDs which cannot be translated to group names.
6. Run "rm /var/lib/sss/db/*.ldb" and "rm /var/lib/sss/db/ccache*", then restart sssd.
7. You will see all group names correctly until the next occurrence, which could be a few hours or 1-2 days later.
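
Because the failure in steps 4 and 5 appears only after hours or days, it may help to poll group resolution and record the moment it first breaks. The sketch below is one possible approach, not something used in this report: it relies on the Python standard library calls `pwd.getpwnam`, `os.getgrouplist`, and `grp.getgrgid` (which raises `KeyError` when a GID has no name), and the helper names `find_unresolved`, `user_gids`, and `watch` as well as the 10-minute interval are assumptions.

```python
import grp
import os
import pwd
import time


def find_unresolved(gids, lookup=grp.getgrgid):
    """Return the GIDs for which the lookup raises KeyError (no group name)."""
    missing = []
    for gid in gids:
        try:
            lookup(gid)
        except KeyError:
            missing.append(gid)
    return missing


def user_gids(user):
    """Return all GIDs of the user as reported through NSS (and thus sssd)."""
    return os.getgrouplist(user, pwd.getpwnam(user).pw_gid)


def watch(user, interval_seconds=600, rounds=None):
    """Poll until some GID stops resolving; return those GIDs.

    Returns an empty list if the given number of rounds passes without a failure.
    """
    done = 0
    while rounds is None or done < rounds:
        missing = find_unresolved(user_gids(user))
        if missing:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "unresolved GIDs:", missing)
            return missing
        done += 1
        time.sleep(interval_seconds)
    return []
```

Calling `watch("c3001841")` on an affected client would block until the first unresolved GID appears and print a timestamp that can be matched against the sssd logs.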

Actual results:

[root@server008 ~]# id -Gn c3001841
c3001841 itapp_eah_admin itapphueadmin itappcmadmin id: cannot find name for group ID 1019412599
1019412599 id: cannot find name for group ID 1019424177
1019424177 id: cannot find name for group ID 1019424180


Expected results:

[root@server006 ~]# id c3001841 
uid=1019408268(c3001841) gid=1019408268(c3001841) groups=1019408268(c3001841),1019412599(itsrvrhadmin),1019424179(itappcmadmin),1019429545(itapp_eah_admin),1019424177(itapphue),1019424178(itapphueadmin),1019424180(itappcm)

Additional info:
This is very inconsistent, so it may not be reproducible even after 24 hours.

Comment 2 Jakub Hrozek 2015-01-20 15:10:34 UTC
Can you enable sssd debug_level in the nss and domain sections and send us the logs that capture the bug?

Also, when the bug hits you, can you do a dump of the ldb database?

yum -y install ldb-tools
ldbsearch -H /var/lib/sss/db/cache_$domain.ldb

Comment 3 Shashikant 2015-01-20 18:15:35 UTC
Created attachment 981957 [details]
sssd logs and ldb content for affected system hd008

Comment 4 Shashikant 2015-01-20 18:17:05 UTC
Created attachment 981958 [details]
ldb content on working system hd006

Comment 5 Shashikant 2015-01-20 18:39:42 UTC
Thanks Jakub for picking this up early. I have updated the bug with sssd debug logs and ldb content from the system where this issue is currently happening (hd008). I have also attached ldb content from a working system (hd006) where all the groups are present.

Here is what I see in the ldb dump.

These are the three group IDs which are not present in the cache, and whose group names are not displayed by the id command:

GIDs: 1019412599,1019424177,1019424180

On affected system:
[root@server008 ~]# id p3001841
uid=1019408268(p3001841) gid=1019408268(p3001841) groups=1019408268(p3001841),1019429545(itapp_eah_admin),1019424178(itapphueadmin),1019424179(itappcmadmin),1019412599,1019424177,1019424180

[root@server008 ~]# ldbsearch -H /var/lib/sss/db/cache_unix.example.com.ldb |grep gidNumber
asq: Unable to register control with rootdse!
gidNumber: 1019424010
gidNumber: 1019424280
gidNumber: 0
gidNumber: 1019408268
gidNumber: 1019424179
gidNumber: 1019424280
gidNumber: 1019410644
gidNumber: 1019410644
gidNumber: 1019429545
gidNumber: 1019422062
gidNumber: 1019422062
gidNumber: 1019421136
gidNumber: 1019421136
gidNumber: 1019424178
gidNumber: 1019408268


On system where cache is clean:
[root@hlxp0server006 ~]# id p3001841
uid=1019408268(p3001841) gid=1019408268(p3001841) groups=1019408268(p3001841),1019412599(itsrvrhadmin),1019424179(itappcmadmin),1019429545(itapp_eah_admin),1019424177(itapphue),1019424178(itapphueadmin),1019424180(itappcm)
[root@server006 ~]# ldbsearch -H /var/lib/sss/db/cache_unix.example.com.ldb|grep gidNumber
asq: Unable to register control with rootdse!
gidNumber: 1019412599
gidNumber: 1019421281
gidNumber: 1019425825
gidNumber: 1019425210
gidNumber: 1019414733
gidNumber: 1019424180
gidNumber: 1019408268
gidNumber: 1019424179
gidNumber: 1019429545
gidNumber: 1019424178
gidNumber: 1019426225
gidNumber: 1019426648
gidNumber: 1019408268
gidNumber: 1019421665
gidNumber: 1019424177
gidNumber: 1019425826
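
The comparison between the two dumps can also be done programmatically. This is a minimal sketch, assuming `ldbsearch` output with one `gidNumber: N` line per attribute as shown above; the helper names `gid_numbers` and `missing_from_cache` are hypothetical. The sample data is the affected system's dump and the user's GIDs as quoted in this report.

```python
def gid_numbers(ldbsearch_output):
    """Collect every gidNumber value that appears in an ldbsearch dump."""
    gids = set()
    for line in ldbsearch_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "gidNumber":
            gids.add(int(value.strip()))
    return gids


def missing_from_cache(expected_gids, ldbsearch_output):
    """Return the expected GIDs that have no gidNumber entry in the dump."""
    return sorted(set(expected_gids) - gid_numbers(ldbsearch_output))


# gidNumber lines from the affected system (server008).
AFFECTED_DUMP = """asq: Unable to register control with rootdse!
gidNumber: 1019424010
gidNumber: 1019424280
gidNumber: 0
gidNumber: 1019408268
gidNumber: 1019424179
gidNumber: 1019424280
gidNumber: 1019410644
gidNumber: 1019410644
gidNumber: 1019429545
gidNumber: 1019422062
gidNumber: 1019422062
gidNumber: 1019421136
gidNumber: 1019421136
gidNumber: 1019424178
gidNumber: 1019408268
"""

# GIDs the user belongs to, taken from the id output on the working system.
USER_GIDS = [1019408268, 1019412599, 1019424179, 1019429545,
             1019424177, 1019424178, 1019424180]

print(missing_from_cache(USER_GIDS, AFFECTED_DUMP))
# [1019412599, 1019424177, 1019424180]
```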

This is what I see in sssd_nss.log on the affected system for GID 1019412599, which is one of the GIDs having the issue:

/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:22:41 2015) [sssd[nss]] [sss_dp_internal_get_send] (0x0400): Entering request [0x418850:2:1019412599.com]
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:22:41 2015) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0080): No matching domain found for [1019412599]
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:22:42 2015) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1019412599.com]
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:22:42 2015) [sssd[nss]] [sss_ncache_set_str] (0x0400): Adding [NCE/GID/1019412599] to negative cache
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:22:42 2015) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0080): No matching domain found for [1019412599]
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:22:42 2015) [sssd[nss]] [sss_dp_req_destructor] (0x0400): Deleting request: [0x418850:2:1019412599.com]
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:28:43 2015) [sssd[nss]] [nss_cmd_getbyid] (0x0400): Running command [34] with id [1019412599].
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:28:43 2015) [sssd[nss]] [sss_ncache_check_str] (0x2000): Checking negative cache for [NCE/GID/1019412599]
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:28:43 2015) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1019412599.com]
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:28:43 2015) [sssd[nss]] [sss_dp_issue_request] (0x0400): Issuing request for [0x418850:2:1019412599.com]
/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:28:43 2015) [sssd[nss]] [sss_dp_get_account_msg] (0x0400): Creating request for [unix.example.com][4098][1][idnumber=1019412599]
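
The log excerpt shows the GID being added to the negative cache. A rough way to list every GID that was negatively cached is sketched below. The regular expression is derived only from the log lines quoted here and may not match other sssd versions or debug levels; the function name `negatively_cached_gids` is hypothetical.

```python
import re

# Pattern based on the "Adding [NCE/GID/<gid>] to negative cache" message above.
NCE_RE = re.compile(
    r"\[sss_ncache_set_str\].*Adding \[NCE/GID/(\d+)\] to negative cache"
)


def negatively_cached_gids(log_lines):
    """Return the GIDs added to the negative cache, in first-seen order."""
    found = []
    for line in log_lines:
        match = NCE_RE.search(line)
        if match:
            gid = int(match.group(1))
            if gid not in found:
                found.append(gid)
    return found


# Two lines copied from the excerpt above: only the first one adds an entry,
# the second one merely checks the negative cache.
sample_lines = [
    "/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:22:42 2015) [sssd[nss]] "
    "[sss_ncache_set_str] (0x0400): Adding [NCE/GID/1019412599] to negative cache",
    "/var/log/sssd/sssd_nss.log:(Tue Jan 20 18:28:43 2015) [sssd[nss]] "
    "[sss_ncache_check_str] (0x2000): Checking negative cache for [NCE/GID/1019412599]",
]
print(negatively_cached_gids(sample_lines))  # [1019412599]
```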

Please note I have sanitized domain and server names in the comments.

Comment 6 Jakub Hrozek 2015-01-29 12:53:14 UTC
Thank you very much for the info, we'll track the bug together with https://bugzilla.redhat.com/show_bug.cgi?id=1184458

*** This bug has been marked as a duplicate of bug 1184458 ***

Comment 7 Marek Gawinski 2015-03-09 13:18:07 UTC
(In reply to Jakub Hrozek from comment #6)
> Thank you very much for the info, we'll track the bug together with
> https://bugzilla.redhat.com/show_bug.cgi?id=1184458
> 
> *** This bug has been marked as a duplicate of bug 1184458 ***

Hi,

we have the same problem when we get users/groups from AD.
Is it possible to track this bug?
Right now I get "You are not authorized to access bug #1184458...."

Comment 8 Jakub Hrozek 2015-03-09 13:25:36 UTC
(In reply to Marek Gawinski from comment #7)
> (In reply to Jakub Hrozek from comment #6)
> > Thank you very much for the info, we'll track the bug together with
> > https://bugzilla.redhat.com/show_bug.cgi?id=1184458
> > 
> > *** This bug has been marked as a duplicate of bug 1184458 ***
> 
> Hi,
> 
> > we have the same problem when we get users/groups from AD.
> > Is it possible to track this bug?
> > Right now I get "You are not authorized to access bug #1184458...."

I'm pretty sure that's a different bug, the one we track here is IPA-specific. You should open a new one with debugging information. Can you also try the 1.12 series?

Comment 9 Marek Gawinski 2015-03-09 13:42:39 UTC
The effects are the same as in this bug.
No, we have not tried version 1.12.4 because we use Ubuntu 12.04 and have some problems creating packages for this version. We currently use 1.11.5. I will try to debug this and open a new bug for our case.

Comment 10 Lukas Slebodnik 2015-03-09 13:46:07 UTC
(In reply to Marek Gawinski from comment #9)
> The effects are the same as in this bug.
> No, we have not tried version 1.12.4 because we use Ubuntu 12.04 and have
> some problems creating packages for this version. We currently use 1.11.5.
> I will try to debug this and open a new bug for our case.

There are known issues in sssd-1.11.. I would recommend upgrading to sssd-1.11.7.

Comment 11 Marek Gawinski 2015-03-09 14:14:10 UTC
(In reply to Lukas Slebodnik from comment #10)
> (In reply to Marek Gawinski from comment #9)
> > The effects are the same as in this bug.
> > No, we have not tried version 1.12.4 because we use Ubuntu 12.04 and have
> > some problems creating packages for this version. We currently use 1.11.5.
> > I will try to debug this and open a new bug for our case.
> 
> There are known issues in sssd-1.11.. I would recommend upgrading to
> sssd-1.11.7.

I will try this version.