Red Hat Bugzilla – Bug 849538
Lookup and login for new users fails when /var is full
Last modified: 2016-08-10 03:41:41 EDT
Description of problem:
Lookup and login for new users fails when /var is full
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Fill up /var. Attempt to log in as a user who is not cached, or run "id not-cached-username" in an already logged-in shell.
The user cannot be looked up and a "user not found" message is displayed.
I should still be able to log in when sssd is online, even if /var is full.
I am using IPA as backend.
This was already discussed upstream a while ago and got deferred -- see https://fedorahosted.org/sssd/ticket/1212#comment:1
I see deferring this issue as a bad decision. There is a focus on disabling the root accounts in our organization, and in other organizations. A full /var would then prevent admins from logging on to the server and fixing the full-disk issue.
If anything, this should be treated as a high-priority security issue, as an attacker can fill up /var to prevent the administrators from logging in.
Putting /var/lib/sss on a separate partition is an ugly hack. Show me how many people actually do this?
I do not expect the cached logins (offline logins) to work when /var is full. Only when sssd is online should logins work regardless of a full /var.
Please reconsider the status of this bug.
(In reply to comment #4)
> I see deferring this issue as a bad decision. There is a focus on disabling
> the root accounts in our organization, and in other organizations. A full
> /var would then prevent admins from logging on to the server and fixing the
> full-disk issue.
Disabling root completely and relying on network auth exclusively is a security issue. Anything that manages to remove your network access will result in you being unable to resolve the issues. Root should always be present on the system so that single-user mode is at least available.
> If anything, this should be treated as a high-priority security issue, as an
> attacker can fill up /var to prevent the administrators from logging in.
> Putting /var/lib/sss on a separate partition is an ugly hack. Show me how
> many people actually do this?
Not many people put /var/lib/sss itself on a separate partition. A *great* many people store /var/log on a separate partition (often even a network share), which is the number one reason why /var would fill up.
> I do not expect the cached logins (offline logins) to work when /var is
> full. Only when sssd is online should logins work regardless of a full
> /var.
The issue is nowhere near as simple as you think it is. The architecture of the SSSD requires the use of the cache during authentication. It would be a substantial redesign to accommodate what is a genuine edge-case.
> Please reconsider the status of this bug.
We recognize that it is an issue, and we'll be taking your request into account. However, given the invasive nature of the necessary changes compared to the low likelihood of the partition filling up (if /var/log is properly isolated), it's likely that we will continue to defer this in favor of spending our effort on higher-priority issues and enhancements.
Patches from the community to address this issue would be welcomed warmly and we'd be happy to assist anyone who wishes to take this on.
Yes, the root account will always be available. However, several of our customers set a strong password on the root account and lock it away where only very few people have access to it. The rest of the admin users have to use their networked accounts to access the server. One company I worked for did this, and the only people with direct root access were in a different country with a different timezone.
Other companies I have worked for have had the root password locked away in a safe.
Whether these are best practices can be debated, but that's a separate discussion. Having networked access disabled just because of a full disk is unacceptable in these situations. And to my knowledge, more and more companies are moving towards having direct root access "removed", only accessing root privileges through sudo or similar tools.
What these companies have in common is that they all run Red Hat EL. SSSD failing on full filesystems might prevent their adoption of SSSD.
You assume that it's /var/log filling up the disk. I would say it's more common practice to separate /var from /. I've seen a great many applications fill up /var, for instance mysql in its default data path /var/lib/mysql. I have also seen cfengine creating large log files in its default directory /var/cfengine, large files uploaded to a web directory in /var/www, and the list just keeps going on.
I do not know the internals of SSSD, so I do not know the amount of work required to implement this. But as far as I understand, SSSD will always attempt to use the cache, and if that fails due to either an expired record or a nonexistent record, it will continue to the network server.
If the issue lies with the cache engine: what if the db files sssd uses were allocated larger on disk than their actual usage? I'm not speaking of much, but some extra MB would not be a big issue. By doing this, the space would already be allocated when needed in case of a full filesystem.
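As a rough illustration of the preallocation idea only, here is a minimal Python sketch that reserves real disk blocks for a file ahead of time. The function name reserve_space and the file used in the usage note are hypothetical, and whether the SSSD cache database could actually make use of space reserved this way is not established anywhere in this thread.

```python
import os


def reserve_space(path, reserve_bytes):
    """Make sure `path` has at least `reserve_bytes` allocated on disk.

    Returns the resulting file size in bytes. Existing data is left untouched.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < reserve_bytes:
            # posix_fallocate allocates real blocks (not a sparse hole), so
            # later writes inside this range cannot fail with ENOSPC.
            os.posix_fallocate(fd, 0, reserve_bytes)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

For example, reserve_space("/tmp/example-cache.db", 8 * 1024 * 1024) would reserve 8 MiB for a scratch file; the path and size are placeholders, not SSSD defaults.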
One thing we could do would be to force a cleanup when the back end encounters the ENOSPC error. That would be a smallish investment, but in many cases, I'm not sure how much it would help with the current cache cleanup design.
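The forced-cleanup idea can be sketched generically in Python as "catch ENOSPC, free something, retry once". This is not SSSD code: write_entry and cleanup are hypothetical callables standing in for the cache write and the cleanup task.

```python
import errno


def write_with_emergency_cleanup(write_entry, cleanup, entry):
    """Store `entry`; on ENOSPC run `cleanup` once and retry.

    Returns True if the entry was stored, False if the disk was still full
    after the cleanup. Any other OSError is propagated unchanged.
    """
    try:
        write_entry(entry)
        return True
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
    # The disk is full: try to free some space, then retry exactly once.
    cleanup()
    try:
        write_entry(entry)
        return True
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        return False
```

The single retry is deliberate in this sketch: if another process keeps filling /var, looping here would only delay the failure.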
We might change the cleanup task a bit to have an "emergency mode" and, e.g., pick and remove a user entry without a cached password (currently we pick a user entry without a password that hasn't been refreshed in a long time), or maybe pick a group with ghost users only, set the group as expired, and trim its ghost members. This would be quite simple, but usually the reason /var got full is another program flooding the logs or, as Sigbjorn said, growing its database, such as mysql. This solution would only be temporary if another process is filling up /var.
We might also reserve a couple of entries for the ENOSPC situation and use them as some kind of cyclic buffer, for example. I'm not sure how practical it would be, as there's no guarantee on how much space we might need for the login of a new user when he comes in (the user might be a member of several large groups, for example).
Returning entries without caching them at all is simply not possible in the SSSD with its current design.
What is the status of this request? Have any of the suggestions Jakub made found their way into the SSSD code?
No, sorry, as stated in one of the comments above, any kind of forced cleanup would only be temporary and /var would usually be filled up really quickly again.
The best solution would be to put the database (or /var/log and /var/cache, to tackle the problem the other way around) on a separate partition.
I suggest putting /var/lib/sss/ on a separate partition instead. It's the only way to ensure that there will always be enough space for sssd to allow logins.
Right, that's a good solution.
What would a reasonable size be? I suppose it depends somewhat on the size of the organization. We are using about 24MB in /var/lib/sss today. How much can we expect this to grow?
That depends on how many users you expect to have cached.
The largest environment I have ready access to has about 7000 users and about the same number of groups. After enumerating the whole directory, the cache is about 115MB in size.
With IPA you'd store more data, primarily because HBAC rules are cached as well, but anyway, I'd be surprised if you needed more than 1GB, even with a large directory.
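For a back-of-the-envelope partition size, the figures above (about 115MB for roughly 7000 users plus 7000 groups) can be scaled linearly. The sketch below assumes exactly 7000 groups and linear growth; both are assumptions, and the headroom factor is a hypothetical safety margin for extra data such as HBAC rules.

```python
def estimate_cache_mb(users, groups, headroom=2.0):
    """Rough cache size estimate in MB, scaled from the one data point above."""
    observed_mb = 115.0                # cache size reported after full enumeration
    observed_entries = 7000 + 7000     # ~7000 users plus ~7000 groups (assumed)
    mb_per_entry = observed_mb / observed_entries
    return (users + groups) * mb_per_entry * headroom
```

Calling estimate_cache_mb(20000, 20000) gives a planning figure for a directory about three times larger; treat it as an order-of-magnitude guide, not a measurement.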
Thank you for taking the time to submit this request for Red Hat Enterprise Linux. Unfortunately, this bug was not given a priority and was deferred both in the upstream project and in Red Hat Enterprise Linux.
Given that we are unable to fulfill this request in the following Red Hat Enterprise Linux releases, I am closing the Bugzilla as DEFERRED. To request that Red Hat reconsider the decision, please re-open the Bugzilla via the appropriate support channels and provide additional business and/or technical details about its importance to you.
Note that you can still track this request or even contribute patches in the referred upstream Trac ticket.
Re-opening this bug as per the request from the same customer, "Amway Japan", with the account number: 532521.
Customer comment:
We want to reopen this bug and request a solution for it.
We want to expedite this to the next level.
We have a valid support contract and last time we needed to wait a full year to get a solution !!
We need a solution for this bug as we use the OS in a cloud environment.
Logging in via ssh is the only access method available to us.
The cloud environment does not allow for direct root login for obvious reasons.
sssd should not replace monitoring of the machine, and a solution is already described in comment 16.
To expand a bit on Lukas' comment - there is virtually nothing we *can* do, it's not like we just decline to do the work. The cache is an architectural component of SSSD and there is no way around it. Before entries are returned from the cache, they *must* be saved to the cache first. And if the disk is full and we can't save data to the cache, there is just no way we can process the request.
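The "save first, then return from the cache" flow described here can be modelled with a toy Python sketch. It is a simplified illustration, not SSSD's actual code: ToyCache, its capacity limit (standing in for a full disk), and the dict used as the remote backend are all hypothetical.

```python
import errno


class ToyCache:
    """In-memory stand-in for the on-disk cache with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def save(self, name, entry):
        if name not in self.entries and len(self.entries) >= self.capacity:
            # Simulates the write failing because the disk is full.
            raise OSError(errno.ENOSPC, "No space left on device")
        self.entries[name] = entry

    def get(self, name):
        return self.entries.get(name)


def lookup(name, cache, backend):
    """Answer a lookup only from the cache, fetching into it when needed."""
    entry = cache.get(name)
    if entry is not None:
        return entry
    fetched = backend.get(name)  # ask the remote server (a dict in this toy)
    if fetched is None:
        return None
    try:
        cache.save(name, fetched)  # the entry must land in the cache first
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return None  # the caller sees this as "user not found"
        raise
    return cache.get(name)
```

In this model a new user cannot be resolved once the cache is full, even though the backend knows the user, which mirrors the behaviour reported in this bug.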
You can see some basic diagrams here: