849538 – Lookup and login for new users fails when /var is full

Summary: Lookup and login for new users fails when /var is full

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	sssd
Sub Component:
Version:	7.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	SSSD Maintainers
QA Contact:	Kaushik Banerjee
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1113520

Reported:	2012-08-20 07:02 UTC by Sigbjorn Lie
Modified:	2020-08-20 08:14 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-08-10 05:20:17 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	SSSD sssd issues 2254	0	None	closed	Gracefully handle -ENOSPC	2020-11-25 11:03:19 UTC

Description Sigbjorn Lie 2012-08-20 07:02:33 UTC

Description of problem:
Lookup and login for new users fails when /var is full

Version-Release number of selected component (if applicable):
1.8.0-32


Steps to Reproduce:
Fill up /var. Attempt to log in as a user who is not cached, or run "id not-cached-username" in an already logged-in shell.
The user cannot be looked up, and a "user not found" message is displayed.
  
Expected results:
I should still be able to log in when sssd is online, even if /var is full.

Additional info:
I am using IPA as backend.

Comment 2 Jakub Hrozek 2012-08-20 10:12:27 UTC

Upstream ticket:
https://fedorahosted.org/sssd/ticket/1212

Comment 3 Jakub Hrozek 2012-08-20 10:14:00 UTC

This was already discussed upstream a while ago and got deferred -- see https://fedorahosted.org/sssd/ticket/1212#comment:1

Comment 4 Sigbjorn Lie 2012-08-20 10:37:55 UTC

I see deferring this issue as a bad decision. There is a push to disable root accounts in our organization and in other organizations. In that case, a full /var would prevent admins from logging on to the server and fixing the full-disk issue.

If anything, this should be treated as a high-priority security issue, as an attacker can fill up /var to prevent the administrators from logging in.

Putting /var/lib/sss on a separate partition is an ugly hack. Show me how many people actually do this.

I do not expect cached logins (offline logins) to work when /var is full. Only when sssd is online should logins work regardless of a full /var.

Please reconsider the status of this bug.

Thank you.

Comment 5 Stephen Gallagher 2012-08-20 11:44:28 UTC

(In reply to comment #4)
> I see deferring this issue as a bad decision. There is a push to disable
> root accounts in our organization and in other organizations. In that case,
> a full /var would prevent admins from logging on to the server and
> fixing the full-disk issue.
> 

Disabling root completely and relying on network auth exclusively is a security issue. Anything that manages to remove your network access will result in you being unable to resolve the issues. Root should always be present on the system so that single-user mode is at least available.

> If anything, this should be treated as a high-priority security issue, as an
> attacker can fill up /var to prevent the administrators from logging in.
> 
> Putting /var/lib/sss on a separate partition is an ugly hack. Show me how
> many people actually do this.
> 

Not many people put /var/lib/sss itself on a separate partition. A *great* many people store /var/log on a separate partition (often even a network share), which is the number one reason why /var would fill up.

> I do not expect cached logins (offline logins) to work when /var is
> full. Only when sssd is online should logins work regardless of a full
> /var.
> 

The issue is nowhere near as simple as you think it is. The architecture of the SSSD requires the use of the cache during authentication. It would be a substantial redesign to accommodate what is a genuine edge-case.

> Please reconsider the status of this bug.
> 

We recognize that it is an issue, and we'll be taking your request into account. However, given the invasive nature of the necessary changes compared to the low likelihood of the partition filling up (if /var/log is properly isolated), it's likely that we will continue to defer this in favor of spending our effort on higher-priority issues and enhancements.

Patches from the community to address this issue would be welcomed warmly and we'd be happy to assist anyone who wishes to take this on.

Comment 6 Sigbjorn Lie 2012-08-23 11:03:50 UTC

Yes, the root account will always be available. However, several of our customers set a strong root password and lock it away where only a very few people have access to it. The rest of the admins have to use their network accounts to access the server. At one company I worked for, the only people with direct root access were in a different country, in a different time zone.

Other companies I have worked for have had the root password locked away in a safe.

Whether these practices are best practices can be debated, but that's a separate discussion. Having network access disabled just because of a full disk is unacceptable in these situations. And to my knowledge, more and more companies are moving towards having direct root access "removed", only gaining root privileges through sudo or similar tools.

What these companies have in common is that they all run Red Hat EL. Having SSSD fail on full filesystems might prevent their adoption of SSSD.

You assume that it's /var/log filling up the disk. I would say it's more common practice to separate /var from /. I've seen a great many applications fill up /var, for instance mysql with its default data path /var/lib/mysql. I have also seen cfengine create large log files in its default directory /var/cfengine, large files uploaded to a web directory in /var/www, and the list goes on.

I do not know the internals of SSSD, so I do not know how much work this would take to implement. But as far as I understand, SSSD always attempts to use the cache first, and if that fails because of either an expired or a non-existent record, it continues to the network server.

If the issue lies with the cache engine: what if the db files sssd uses were allocated larger on disk than their actual usage? I'm not speaking of much; a few extra MB would not be a big issue. That way the space would already be allocated when it is needed on a full filesystem.
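The preallocation idea can be illustrated with POSIX `posix_fallocate()`, which asks the filesystem to allocate disk blocks up front so that later writes inside that range need no new space. This is a minimal sketch, assuming a Linux/Unix host; the file name and size are hypothetical, and whether SSSD's actual cache database could make use of such a pre-sized region is not something this thread establishes.

```python
import os
import tempfile


def reserve_space(path: str, nbytes: int) -> None:
    """Allocate nbytes of disk blocks for the file at path right now.

    posix_fallocate() extends the file to at least nbytes and reserves the
    blocks, so writes inside that range cannot fail later with ENOSPC.
    Unix-only; the file name used below is purely illustrative.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, nbytes)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        demo = os.path.join(d, "cache_reserve.bin")  # hypothetical file
        reserve_space(demo, 4 * 1024 * 1024)  # reserve 4 MiB
        print(demo, os.path.getsize(demo))
```

A database that grows by appending would still need to know that the zero-filled tail is free space it may reuse, which is the part that would require changes inside the cache engine itself.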

Comment 10 Jakub Hrozek 2013-01-02 20:04:30 UTC

One thing we could do would be to force a cleanup when the back end encounters the ENOSPC error. That would be a smallish investment, but in many cases, I'm not sure how much it would help with the current cache cleanup design.
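The "force a cleanup on ENOSPC" idea amounts to catching the out-of-space error, running one cleanup pass, and retrying the write once. A minimal sketch, assuming hypothetical `write` and `cleanup` callables (not SSSD functions), where `cleanup` returns how many entries it removed:

```python
import errno
from typing import Callable


def write_with_emergency_cleanup(write: Callable[[], None],
                                 cleanup: Callable[[], int]) -> None:
    """Try a cache write; on ENOSPC run one cleanup pass and retry once.

    Both callables are hypothetical stand-ins for the cache write and the
    cleanup task; cleanup() returns the number of entries it removed.
    """
    try:
        write()
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise  # unrelated I/O error: do not mask it
        if cleanup() == 0:
            raise  # nothing could be freed: report the original ENOSPC
        # The retry can still fail if another process has already
        # refilled /var, which is why this only helps temporarily.
        write()
```

The retry is deliberately limited to one attempt: looping would just race against whatever process is filling the disk.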

We might change the cleanup task a bit to have an "emergency mode" and, e.g., pick and remove a user entry without a cached password (currently we pick a user entry w/o a password that hasn't been refreshed in a long time), or maybe a group with ghost users only, set the group as expired and trim its ghost members. This would be quite simple, but usually the reason /var got full is another program flooding the logs or, as Sigbjorn said, growing its database, such as mysql. This solution would only be temporary if another process is filling up /var.
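The two "emergency mode" selection rules above can be sketched as follows. The data model here is hypothetical and much simpler than SSSD's real cache schema; it only shows the selection logic: any user without a cached password qualifies regardless of how recently it was refreshed (oldest first), and groups that contain only ghost members are expired and trimmed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserEntry:  # hypothetical, simplified cache record
    name: str
    has_cached_password: bool
    last_refresh: int  # e.g. a Unix timestamp


@dataclass
class GroupEntry:  # hypothetical, simplified cache record
    name: str
    members: List[str] = field(default_factory=list)
    ghost_members: List[str] = field(default_factory=list)
    expired: bool = False


def pick_user_to_evict(users: List[UserEntry]) -> Optional[UserEntry]:
    """Emergency mode: any user without a cached password is a candidate,
    however recently it was refreshed; evict the least recently refreshed."""
    candidates = [u for u in users if not u.has_cached_password]
    return min(candidates, key=lambda u: u.last_refresh, default=None)


def trim_ghost_only_groups(groups: List[GroupEntry]) -> int:
    """Expire groups whose only members are ghosts and drop those ghosts.
    Returns the number of groups trimmed."""
    trimmed = 0
    for g in groups:
        if not g.members and g.ghost_members:
            g.ghost_members.clear()
            g.expired = True
            trimmed += 1
    return trimmed
```

Users with cached passwords are never chosen, so offline logins for users who have already authenticated keep working after an emergency cleanup.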

We might also reserve a couple of entries for the ENOSPC situation and use them as a kind of circular buffer, for example. I'm not sure how practical that would be, as there's no guarantee how much space we might need when a new user logs in (the user might be a member of several large groups, for example).
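A reserved circular buffer of this kind, and its main limitation, can be modelled in a few lines. This is a hypothetical sketch, not SSSD code: a fixed number of slots is claimed up front, new entries overwrite the oldest, and a login that needs more entries than were reserved (for example, a user in many large groups) still cannot be served.

```python
from typing import List, Optional


class ReservedRing:
    """Fixed pool of pre-reserved cache slots reused in circular order.

    Hypothetical illustration: each slot holds one serialized entry. The
    space is claimed once up front, so storing never needs new disk space;
    when the ring is full the oldest slot is overwritten.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Optional[str]] = [None] * capacity
        self._next = 0

    def store(self, entries: List[str]) -> bool:
        """Store every entry one login needs, or none of them."""
        if len(entries) > len(self._slots):
            # The login needs more entries than were reserved, so the
            # request still fails: the open question raised above.
            return False
        for entry in entries:
            self._slots[self._next] = entry
            self._next = (self._next + 1) % len(self._slots)
        return True

    def contents(self) -> List[str]:
        return [s for s in self._slots if s is not None]
```

Choosing the capacity is exactly the hard part: any fixed number of slots is either wasteful for small directories or too small for users with large group memberships.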

Returning entries without caching them at all is simply not possible in the SSSD with its current design.

Comment 14 Sigbjorn Lie 2013-12-12 12:27:23 UTC

What is the status of this request? Have any of Jakub's suggestions made it into the SSSD code?

Comment 15 Jakub Hrozek 2013-12-12 12:34:09 UTC

No, sorry, as stated in one of the comments above, any kind of forced cleanup would only be temporary and /var would usually be filled up really quickly again.

The best solution would be to put the database (or, to tackle the problem the other way around, /var/log or /var/cache) on a separate partition.

Comment 16 Sigbjorn Lie 2013-12-20 07:17:25 UTC

I suggest putting /var/lib/sss/ on a separate partition instead. It's the only way to ensure that there will always be enough space for sssd to allow logins.
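One quick way to check whether the cache directory really sits on its own filesystem is to compare device numbers with `os.stat()`. A minimal sketch, assuming the default path /var/lib/sss; comparing `st_dev` is a heuristic (for example, btrfs subvolumes report distinct device numbers while still sharing free space).

```python
import os


def on_separate_filesystem(path: str, parent: str) -> bool:
    """True if path lives on a different device (filesystem) than parent."""
    return os.stat(path).st_dev != os.stat(parent).st_dev


if __name__ == "__main__":
    cache_dir = "/var/lib/sss"  # default location discussed in this bug
    if os.path.isdir(cache_dir):
        print(cache_dir, "separate from /var:",
              on_separate_filesystem(cache_dir, "/var"))
```

If the result is `False`, the cache shares free space with everything else under /var and is exposed to the failure described in this bug.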

Comment 17 Jakub Hrozek 2013-12-20 09:07:49 UTC

Right, that's a good solution.

Comment 18 Sigbjorn Lie 2013-12-20 09:55:51 UTC

What would a reasonable size be? I suppose it depends somewhat on the size of the organization. We are using about 24MB in /var/lib/sss today. How much can we expect this to grow?

Comment 19 Jakub Hrozek 2013-12-20 11:59:06 UTC

It depends on how many users you expect to have cached.

The largest environment I have ready access to has about 7000 users and about the same number of groups. After enumerating the whole directory, the cache is about 115MB in size.

With IPA you'd store more data, primarily because HBAC rules are cached as well, but in any case I'd be surprised if you needed more than 1GB, even with a large directory.
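The data point above (about 115MB for roughly 7000 users plus 7000 groups) gives an average of about 8.4 KiB per cached entry, which can be scaled to another directory size. This is rough arithmetic, assuming "MB" means MiB and that entry cost stays roughly constant; real entries vary with group sizes, and the `headroom` factor for extra IPA data such as HBAC rules is an assumption you would have to pick yourself.

```python
MIB = 1024 * 1024


def bytes_per_entry(cache_bytes: float, users: int, groups: int) -> float:
    """Average on-disk cost of one cached user or group entry."""
    return cache_bytes / (users + groups)


def estimate_cache_mib(users: int, groups: int, per_entry: float,
                       headroom: float = 1.0) -> float:
    """Projected cache size in MiB, optionally scaled by a safety factor."""
    return (users + groups) * per_entry * headroom / MIB


# Data point from comment 19 (assuming "MB" means MiB):
# ~115 MB after enumerating ~7000 users and ~7000 groups.
PER_ENTRY = bytes_per_entry(115 * MIB, 7000, 7000)  # about 8.4 KiB
```

For example, 50,000 users and 50,000 groups at this average come to roughly 820 MiB before any headroom, which is in line with the "less than 1GB" estimate for a large directory.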

Comment 20 Martin Kosek 2015-04-24 11:23:34 UTC

Thank you for taking the time to submit this request for Red Hat Enterprise Linux. Unfortunately, this bug was not given priority and was deferred both in the upstream project and in Red Hat Enterprise Linux.

Given that we are unable to fulfill this request in upcoming Red Hat Enterprise Linux releases, I am closing the Bugzilla as DEFERRED. To request that Red Hat reconsider the decision, please reopen the Bugzilla via the appropriate support channels and provide additional business and/or technical details about its importance to you.

Note that you can still track this request or even contribute patches in the referred upstream Trac ticket.

Comment 21 Abhinay Reddy Peddireddy 2016-08-09 22:56:20 UTC

Reopening this bug at the request of the same customer, "Amway Japan", account number 532521.

Customer comment : 

We want to reopen this bug and request a solution for it.

We want to expedite this to the next level.

We have a valid support contract and last time we needed to wait a full year to get a solution !!
https://access.redhat.com/support/cases/#/case/01360311

We need a solution for this bug as we use the OS in a cloud environment. 
Logging in via ssh is the only method we are able to do.
The cloud environment does not allow for direct root login for obvious reasons.

Comment 22 Lukas Slebodnik 2016-08-10 05:20:17 UTC

SSSD should not replace monitoring of the machine, and a solution is already described in comment 16.
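The kind of free-space monitoring meant here can be as simple as a periodic check on the filesystem holding /var. A minimal sketch using `shutil.disk_usage()`; the 10% threshold is an arbitrary assumption, and `free` reports space available to unprivileged users, so root may still have some reserved blocks left when this fires.

```python
import shutil


def low_on_space(path: str, min_free_fraction: float = 0.10) -> bool:
    """True if the filesystem holding path has less than
    min_free_fraction of its total capacity free."""
    usage = shutil.disk_usage(path)
    return usage.free < usage.total * min_free_fraction


if __name__ == "__main__":
    if low_on_space("/var"):
        print("warning: /var is running out of space")
```

Run from cron or an existing monitoring agent, such a check gives admins a chance to clean up before network logins start failing.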

Comment 23 Jakub Hrozek 2016-08-10 07:41:41 UTC

To expand a bit on Lukas' comment - there is virtually nothing we *can* do, it's not like we just decline to do the work. The cache is an architectural component of SSSD and there is no way around it. Before entries are returned from the cache, they *must* be saved to the cache first. And if the disk is full and we can't save data to the cache, there is just no way we can process the request.

You can see some basic diagrams here:
https://jhrozek.wordpress.com/2015/03/11/anatomy-of-sssd-user-lookup/
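The constraint described in comment 23 can be modelled as a toy lookup in which the reply always comes from the cache, so a result fetched from the server must be saved first. All names here are hypothetical and the real responder/back end split across processes is collapsed into one function; it only shows why a failed cache write surfaces as "user not found".

```python
import errno
from typing import Callable, Dict, Optional

Entry = Dict[str, str]


def lookup_user(name: str,
                cache: Dict[str, Entry],
                fetch_from_server: Callable[[str], Optional[Entry]],
                save_to_cache: Callable[[str, Entry], None]) -> Optional[Entry]:
    """Toy model: answers are always read from the cache, so a server
    result has to be written to the cache before it can be returned."""
    entry = cache.get(name)
    if entry is not None:  # cache hit (expiry checks omitted)
        return entry
    fresh = fetch_from_server(name)  # the back end contacts the server
    if fresh is None:
        return None  # the user really does not exist
    try:
        save_to_cache(name, fresh)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # /var is full: nothing lands in the cache, so there is nothing
            # to return and the user appears not to exist.
            return None
        raise
    return cache.get(name)  # the reply is read back from the cache
```

This matches the symptom in the original report: the server knows the user, but the lookup returns nothing because the intermediate write failed.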

Comment 24 Gideon 2020-08-20 08:14:32 UTC

I realize that this issue is old and closed, but it is the most relevant place I found to post my issue:
putting /var/lib/sss/ on a separate partition does not work.
By the way, it is very disappointing that this bug keeps getting closed. Sigbjorn is absolutely right that companies are shifting to limit the use of local users, root in particular. Jakub Hrozek offered a few good suggestions for resolving the issue with the cache that sound easy to implement (reserving a few entries for cycling credentials sounds like the best one).
