Bug 759920 - Installs to SSD should have better defaults
Summary: Installs to SSD should have better defaults
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 25
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Vratislav Podzimek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1021266
Depends On:
Blocks:
 
Reported: 2011-12-04 21:29 UTC by Kevin Cameron
Modified: 2017-12-12 10:29 UTC
CC List: 25 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-12 10:29:14 UTC
Type: ---
Embargoed:



Description Kevin Cameron 2011-12-04 21:29:20 UTC
Description of problem:

I installed Fedora 16 XFCE on my new computer a couple of days ago.  This machine has an SSD and a conventional hard disk.  I was surprised that anaconda didn't produce an optimal configuration for the SSD installation.  In particular, one usually wants to limit the number of writes to the SSD to avoid excessive wear, and, when available, to enable TRIM on the SSD to avoid the performance loss caused by fragmentation.  It is also said that simple FIFO scheduling gives the best performance on an SSD.  Perhaps there are other optimizations that could be applied as well?

Steps to Reproduce:
1.  Install F16 to a machine with an SSD and hard drive.
2.  Use the default storage configuration offered by anaconda.
3.  Find root dirs living on SSD:
    setenv ddd `sudo find / -xdev -maxdepth 2 -wholename '/*/*' | cut -f 2 -d/ | uniq | sed -e 's/\(.*\)/\/\1/'`
4.  Find most recently modified files and dirs:
    sudo find $ddd -xdev -printf '%T+ %p\n' | sort | tail -50
5.  Check for trim functionality, e.g. https://sites.google.com/site/lightrush/random-1/checkiftrimonext4isenabledandworking (see the command sketch after this list)
6.  Check scheduler on SSD drive:
    cat /sys/block/sda/queue/scheduler 
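
For step 5, a minimal sketch of the check, assuming the SSD is /dev/sda (illustrative; substitute your device):

    # Report the device's discard (TRIM) parameters; all-zero values mean no support:
    lsblk --discard /dev/sda
    # Or query the drive's identification data directly:
    sudo hdparm -I /dev/sda | grep -i trim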

Actual results:
The "/" and "/boot" partitions may or may not end up on the SSD.
"/tmp" and "/var" are simple directories under "/".
Many files and directories show up with recent modification times.
Data remains after file deletion.
The SSD's scheduler is set like a hard disk's, e.g. "noop deadline [cfq]".

Expected results:
The system "/" and "/boot" partitions should preferentially go on the SSD.
"/tmp" and "/var" should preferentially go on a hard disk.
Few, if any, SSD files and directories have changed since the last configuration or software change (achieved by adding the "noatime" flag in fstab).
Data zeroed after SSD file deletion if the drive supports TRIM.
SSD scheduler is configured to "noop".  E.g.: "[noop] deadline cfq"
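
As a sketch of what those defaults might look like in practice (assuming, purely for illustration, that the SSD is /dev/sda with root on /dev/sda1):

    # Hypothetical fstab entry: noatime suppresses access-time writes on every read.
    # /dev/sda1  /  ext4  defaults,noatime  1 1

    # Switch the SSD's scheduler at runtime (not persistent across reboots):
    echo noop | sudo tee /sys/block/sda/queue/scheduler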

Comment 1 Chris Lumens 2012-01-06 15:13:39 UTC
anaconda gives you all the tools to decide where to put individual partitions, and we don't even make /tmp and /var separate partitions by default, so we're certainly not going to start doing that only in certain circumstances.

For other settings - our position has long been that if some settings make for sensible defaults, they should be set in the filesystem/kernel/etc. instead of anaconda.  Doing it that way means all tools on the system can take advantage of those defaults, not just when you do your initial partitioning.  Reassigning for someone else to address those settings.

Comment 2 Josh Boyer 2012-01-06 15:44:29 UTC
The kernel can't currently auto-magically assign per-device I/O schedulers to my knowledge.  A manual change by the user or system boot scripts is still required.

As far as TRIM is concerned, I have a vague recollection of "support" being widely varied even among drives that claim to support it.  Always enabling it for all drives claiming support might not be great.  I've CC'd Ric who can hopefully comment on that.

Comment 3 Lukáš Czerner 2012-01-09 10:31:24 UTC
IO scheduler:
I believe the kernel should have enough information about the drive to set the appropriate scheduler. For example, the elevator is turned off when the device is not rotational. However, I really do not know which scheduler is better for which device, as I have never seen any reliable test results.

Discard:
As of now, there are several ways to use TRIM (two of which actually make sense). The first is to mount the file system with -o discard (supported by many file systems), which sends a trim when a range of blocks is released from the file system (a file is unlinked). But that may have a negative performance impact (see my testing results - http://people.redhat.com/lczerner/discard/ext4_discard.html) and it may even brick your device (personal experience). For exactly this reason it is NOT on by default.

The other way to do it is batched discard. It is an ioctl which tells the file system to go and reclaim free space (send the discard command for free ranges), so it does not need to run all the time, but you may want to run it every now and then. The frequency really depends on your workload, but running it once a week, or even once every two days or so, should not hurt IMHO. It is supported by some file systems (ext3, ext4, btrfs, xfs, ...). See 'man fstrim' from util-linux-ng.

Unfortunately we do not have the infrastructure in place to set up a cron job with fstrim automatically; that is something we should investigate. Maybe it would be nice if anaconda could add a weekly cron job when discard support is detected on the device?
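
A batched discard boils down to a single command; a sketch, assuming / sits on a TRIM-capable device and the command is run as root:

    # Ask the filesystem mounted at / to discard all of its free ranges:
    fstrim -v /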

Comment 4 Vivek Goyal 2012-03-01 18:28:26 UTC
(In reply to comment #3)
> IO scheduler:
> I believe the kernel should have enough information about the drive to set
> the appropriate scheduler. For example, the elevator is turned off when the
> device is not rotational. However, I really do not know which scheduler is
> better for which device, as I have never seen any reliable test results.

Turning off the elevator might not always be good, as SSDs can also benefit from merging.

CFQ recognizes the "rotational" flag and cuts down on idling if that is the case.

Are there any numbers which show that noop is much better than CFQ on an SSD? Apart from throughput, we also need to consider READ latencies in the presence of heavy buffered WRITEs.

In RHEL6, we have put in a udev rule where, if the drive is not SATA, we set slice_idle=0 and increase the queue depth. (People were looking for a better out-of-the-box experience for servers, possibly with faster storage.)

If things can be conclusive one way or the other, I think a udev rule is not unreasonable for choosing the IO scheduler.
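
A sketch of the kind of udev rule being discussed (the file name and device match are illustrative; this is not the actual RHEL6 rule):

    # /etc/udev/rules.d/60-ssd-scheduler.rules
    # Select the noop scheduler for any non-rotational SCSI/SATA disk as it appears.
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"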

Comment 5 Peter Oliver 2012-12-30 16:05:56 UTC
(In reply to comment #3)
> Unfortunately we do not have the infrastructure in place to set up a cron job
> with fstrim automatically; that is something we should investigate. Maybe it
> would be nice if anaconda could add a weekly cron job when discard support is
> detected on the device?

Would it be reasonable to simply have an entry in /etc/cron.weekly that would fstrim all mounted filesystems of types known to support fstrim?  You wouldn't need anaconda support for this.  If a filesystem was hosted on rotating media you would get the following error, which would have to be suppressed:

    # fstrim /tmp
    fstrim: /tmp: FITRIM ioctl failed: Operation not supported
    # echo $?
    1
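
A minimal sketch of such a cron entry, with the unsupported-filesystem errors suppressed (the path and filtering are illustrative):

    #!/bin/sh
    # /etc/cron.weekly/fstrim -- trim every mounted filesystem, silently
    # skipping those whose backing store rejects the FITRIM ioctl.
    awk '$2 ~ /^\// {print $2}' /proc/mounts | sort -u | while read -r mnt; do
        fstrim "$mnt" >/dev/null 2>&1 || :
    done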

Comment 6 Mattia Verga 2013-02-08 16:12:50 UTC
+1 for that.

I know that it can be done manually, but it would be nice if the automatic partitioning avoided creating a swap partition and put optimized flags in fstab when it detects that the installation disk is an SSD.

Comment 7 Fedora End Of Life 2013-04-03 15:38:05 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 8 Justin M. Forbes 2013-04-05 19:41:52 UTC
Is this still an issue with the 3.9 kernels in F19?

Comment 9 Jeff Moyer 2013-04-05 21:13:51 UTC
(In reply to comment #8)
> Is this still an issue with the 3.9 kernels in F19?

Yes.

Comment 10 Kevin Cameron 2013-04-06 01:49:39 UTC
I just upgraded to Fedora 19 XFCE (from 17) a couple of weeks ago.  I decided to manually configure the install into the old partitions. Therefore, I don't know where Anaconda would have chosen to install partitions if left to its own devices.  I assume that it hasn't changed and would have thrown the heavily-written /tmp and /var directories onto the SSD.

As far as I can tell, none of the other SSD optimizations I mentioned have been done either.  I'm not complaining.  I know these issues don't have easy solutions.  The fstab file still showed default mount args for the SSD partitions and the scheduler was still on CFQ.

I didn't realize that trim is controversial.  It's a serious problem if drives are being bricked by it.  How common is that?  Surely it only happens on older drives released before trim was in common use.  When do we stop worrying about those?  The performance question is similar: surely modern SSDs implement trim efficiently?  I looked at Lukas' performance link.  Unless I'm missing something, it's not a good test methodology.  To really test trim's performance improvements, one would need to do a bunch of writes and deletes to fill an entire SSD (not just a partition) with fragmented blocks.  The smaller test on the linked page is heavily influenced by the added overhead, since it never gets to a state where the advantages of using trim could be seen.  I don't have the data to support it, but it seems to me that today's SSDs should have trim enabled by default for the best performance on average.

After I opened this bug 16 months ago, I became aware of what might be the biggest problem: file access time stamps.  Hundreds of system files and directories are accessed every time the system boots.  Dozens of libraries are read every time the user starts an application.  Because access time stamps are enabled by default, each file and directory that's read results in a write to its respective directory.  No doubt the kernel's buffer cache helps reduce the number of these writes that actually go to the SSD, but it must still be a lot.  I shudder to think how much wear and tear must have accumulated on the thousands of Linux systems with SSDs around the world.  I'm sure many users ignore or are unaware of this issue.  That includes me during the last two weeks, since I forgot to fix it after my system update.

I should have had another step in my repro instructions:
4.5  Check access times for SSD files and dirs:
    sudo find $ddd -xdev -printf '%A+ %p\n' | sort | tail -50

Are access time stamps used very much?  I don't remember ever needing to use one.  It seems like the "noatime" mount flag should be set by default for SSDs and perhaps for all disks.

How about having Anaconda provide the option of adding common mount flags to fstab while configuring partitions?

Comment 11 Kevin Cameron 2013-04-06 01:59:07 UTC
I should have proof-read better.  Hopefully you all get the gist of it.

Comment 12 David Tonhofer 2013-05-12 17:17:20 UTC
I have been looking into this after installation of Fedora 18 on an SSD. The filesystem was ext4 and the mount options in /etc/fstab were "default".

Some remarks:

- /tmp is now a ramdisk by default (tmpfs)

- I manually added "noatime" to the ext4 options to avoid re-writing access times (an alternative would be to use 'relatime', see http://linux.koolsolutions.com/2009/01/30/installing-linux-on-usb-part-4-noatime-and-relatime-mount-options/). There is also "nodiratime" which, however, is implied by "noatime". The more you know...

- I did not set the "discard" option to issue TRIMs to the SSD ... yet. The kernel doc at https://www.kernel.org/doc/Documentation/filesystems/ext4.txt says: "discard: ... This is useful for SSD devices and sparse/thinly-provisioned LUNs, but it is off by default until sufficient testing has been done."

Comment 13 Josh Boyer 2013-05-13 18:43:32 UTC
I'm moving this to the distribution component.  The suggested fixes are udev rules and cron files.  Doing it in the kernel somehow would need to happen upstream first anyway.

Comment 14 Josh Boyer 2013-12-09 18:15:21 UTC
*** Bug 1021266 has been marked as a duplicate of this bug. ***

Comment 15 Jeff Moyer 2013-12-11 19:25:00 UTC
(In reply to Kevin Cameron from comment #10)

Most SSDs today will be able to sustain multiple complete drive writes per day for the advertised lifetime of the disk.  I highly doubt /var and /tmp will kill your SSD.

fstrim is the preferred method for using trim for a couple of reasons.  First, you don't stall the I/O path (the currently implemented ATA trim command is non-queueable).  Second, if you use a batched discard, you can send larger blocks down to the drive, which can be very beneficial for certain classes of device.
We should definitely have a nightly or weekly cron job to run fstrim.

The relatime mount option is already the default, so I wouldn't worry about atime updates either.

Comment 16 Peter Oliver 2013-12-19 22:14:02 UTC
You're probably way ahead of me here, but I see that a forthcoming version of fstrim will have an "--all" option to trim all filesystems that are able.  See https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=36c370cbf1481aa8724dff8b7b7fec4a8ba9930b.
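
Once that lands, the whole job reduces to a single command (assuming a util-linux new enough to carry the option):

    # Trim every mounted filesystem that supports it, reporting what was discarded:
    sudo fstrim --all --verbose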

Comment 19 Andrea 2014-04-10 07:37:00 UTC
I am in no way an expert in any of this, but I can share my experience, for what it's worth.
I have been following the suggestions of the Arch Linux wiki (https://wiki.archlinux.org/index.php/Solid_State_Drives#Tips_for_Maximizing_SSD_Performance) for more than a year now, basically adding the noatime and discard options to fstab. I have done this with two SSDs in a workstation running Scientific Linux and a ThinkPad X laptop which has seen various distros. Despite very heavy use, particularly on the workstation, I have encountered no issues yet.

Comment 20 dhardy 2014-09-19 08:12:46 UTC
Can I repeat the suggestion to do fstrim via a standard cron job? I have a weekly cron job as follows to do this, though it's not ideal:

1) it should probably only be run when the computer is idle, since it pauses all disk operations
2) it should automatically determine mount points
3) possibly the task should be run more frequently, or be rescheduled if it was not run because the computer was on battery or in use

https://gist.github.com/anonymous/b93d9c73ea149f0fc35f

Comment 21 Vratislav Podzimek 2014-10-01 07:08:58 UTC
(In reply to dhardy from comment #20)
> Can I repeat the suggestion to do fstrim via a standard cron job? I have a
> weekly cron job as follows to do this, though it's not ideal:
> 
> 1) it should probably only be run when the computer is idle, since it pauses
> all disk operations
> 2) it should automatically determine mount points
fstrim does that for you -- have a look at the '-a' option.

Comment 22 Vratislav Podzimek 2014-10-01 07:10:01 UTC
I think the implementation should be done via a systemd service (probably a timer) packaged in something like 'ssd-utils'.
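
A sketch of what such a unit pair might look like (names, paths, and schedule are illustrative; util-linux later shipped very similar units):

    # fstrim.service
    [Unit]
    Description=Discard unused blocks

    [Service]
    Type=oneshot
    ExecStart=/usr/sbin/fstrim -a

    # fstrim.timer
    [Unit]
    Description=Discard unused blocks once a week

    [Timer]
    OnCalendar=weekly
    Persistent=true

    [Install]
    WantedBy=timers.target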

Comment 23 Lukáš Czerner 2014-10-01 11:12:19 UTC
(In reply to Vratislav Podzimek from comment #22)
> I think the implementation should be done via a systemd service (probably a
> timer) packaged in something like 'ssd-utils'.

Does it really have to be unnecessarily complicated? Why would a cron job not be enough?

-Lukas

Comment 24 Vratislav Podzimek 2014-10-01 12:09:21 UTC
(In reply to Lukáš Czerner from comment #23)
> (In reply to Vratislav Podzimek from comment #22)
> > I think the implementation should be done via a systemd service (probably a
> > timer) packaged in something like 'ssd-utils'.
> 
> Does it really have to be unnecessarily complicated? Why would a cron job
> not be enough?
Is there a way to run a cron job when the system is in an idle state? Or is there a way to reschedule a cron job if the system is not idle?

Comment 25 Lukáš Czerner 2014-10-01 12:31:15 UTC
Of course,

check the state before you run fstrim, or even better use nice and ionice.

-Lukas
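
A one-line version of that advice, assuming an fstrim new enough for the -a option (the idle I/O class is only honored by the CFQ scheduler):

    # Run fstrim over all supported filesystems at the lowest CPU and I/O priority:
    sudo ionice -c3 nice -n19 fstrim -a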

Comment 26 Chris Murphy 2015-04-02 07:06:45 UTC
Since Fedora 21 there is a systemd fstrim timer, disabled by default, scheduled to run once a week. This is more appropriate than the discard mount option for the overwhelming majority of SSDs (non-queued trim).
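
Enabling the shipped timer, for users who decide the trade-off is acceptable, takes two commands:

    sudo systemctl enable fstrim.timer
    sudo systemctl start fstrim.timer
    # Confirm it is scheduled:
    systemctl list-timers fstrim.timer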

Comment 27 Jan Kurik 2015-07-15 15:12:26 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 28 Vaclav Tunka 2016-11-14 10:03:05 UTC
Any progress on having fstrim.timer enabled by default when Fedora detects that an SSD drive is installed in the system?

Comment 29 Vratislav Podzimek 2016-11-16 20:36:31 UTC
(In reply to Vaclav Tunka from comment #28)
> Any progress on having fstrim.timer enabled by default when Fedora detects
> that an SSD drive is installed in the system?

Unfortunately, the only progress in Fedora 25 is that blivet-2.1 now automatically adds an internal 'ssd' tag to its Device objects. It would be super easy to add <5 lines of code to Anaconda to automatically enable fstrim.service if the system is installed to such a disk, but this request somehow fell off our radar. :(

Comment 30 Vratislav Podzimek 2016-11-16 20:38:39 UTC
I'm assigning this to myself and to the anaconda component (which now needs a patch). If anybody has any trouble with that, feel free to revert the changes and justify.

Comment 31 Chris Murphy 2016-11-16 22:19:39 UTC
I'd say it's a catch-22 to do this by default. There's a recent thread on the XFS list about this, and the bottom line is that the manufacturers still don't have enough consistency or reliability.

http://www.spinics.net/lists/linux-xfs/msg01885.html
http://www.spinics.net/lists/linux-xfs/msg02128.html
http://www.spinics.net/lists/linux-xfs/msg02135.html

For Fedora Workstation, if a user has an afflicted drive and typical casual usage, a once-a-week trim causing hangs may not be a big deal. But in other cases where the workload involves lots of deletes, it might be annoying. Note that discard and fstrim.timer do the same thing; it's just that one happens once a week and the other on demand. So the problem is more noticeable with discard, and it could seem really spurious when it happens once a week, making it even harder to track down should someone hit it.

My suggestion is not to enable it by default. Unfortunately, for now the user needs to know to enable it manually and then watch out for problems. The reverse is unlikely to work: enabling it and expecting the user to deduce that their once-a-week system hang is due to the fstrim timer.

Another factor is that, in the rarest of cases, it could result in corruption or data loss due to firmware bugs.

Comment 32 Vratislav Podzimek 2016-11-18 07:49:16 UTC
I also remember a recent discussion about 'discard' not being enabled by default on LUKS (which is also a bit of "thin ice" territory). Maybe there should be some general Fedora policy about this? Is this worth a FESCo ticket/discussion?

Comment 33 Chris Murphy 2016-11-18 18:45:10 UTC
cryptsetup open --allow-discards is not the default, and FITRIM fails without it on those block devices. Until the manufacturers get it together, I don't see how there can be any policy other than to not use discard or fstrim by default. The user has to test their use case against their hardware to make a determination.
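
Concretely, opting in on LUKS looks like this (the device and mapping name are illustrative):

    # One-shot: open the volume with discards passed down to the SSD.
    sudo cryptsetup open --allow-discards /dev/sda3 luks-root
    # Persistent: add 'discard' to the options field in /etc/crypttab, e.g.
    # luks-root  UUID=...  none  discard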

Comment 34 Fedora End Of Life 2017-11-16 19:17:17 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 35 Fedora End Of Life 2017-12-12 10:29:14 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

