Red Hat Bugzilla – Bug 1464294
Gnome-Shell deadly lockup with btrfs file system if certain extensions are installed.
Last modified: 2018-02-01 05:06:43 EST
Description of problem:
I have been testing Fedora 26 beta 1-4 and also the nightlies. Installation is on physical hardware (SSD and spinning disks, not a virtual system).
An ext4 developer recommended that SSDs not be formatted with ext4 but with btrfs (Red Hat recommendation). That recommendation is meant to extend SSD life (btrfs does COW and avoids journalling).
BTRFS file system
While trying to set up the Gnome extension TaskBar (by zpydr), the Gnome session (both the Wayland and Xorg options) will lock up solidly. A restore from backup is the only way to recover (killall -u locked_up_user does not restore the session).
EXT4 File System
THERE IS NO PROBLEM if the file system is ext4 or xfs for any of the GNOME extensions
Version-Release number of selected component (if applicable):
All current software as of June 22, 2017
Steps to Reproduce:
1. Set up /home on a small btrfs formatted partition.
2. Install the extension TaskBar by zpydr.
3. Do the alignment settings (if you can).
Gnome will lock up and not be recoverable by rebooting or by forced logoff.
Logging the user back in after a lockup should recover the session.
Have worked multiple tests with extension's author to prove it was not his software.
This could be a show stopper, or at least the installation guide should warn of the problem.
I just set up 3 different Fedora 26 candidates (all from July 2, 2017).
My testing consisted of 3 Fedora 26 Gnome installations on ext4 and 3 Fedora 26 Gnome installations on btrfs.
All my comments mentioned above were reconfirmed: gnome and btrfs do not get along well.
BTRFS Problems with Extensions
Generally, I test with gcc 7 (does not work), I test with the FF browser and I add extensions. I first download the gnome "tweak tool" and subsequently begin to add the extensions I have used with Fedora 22, 23, 24, and 25. With Fedora 22-25, all worked, with no GUI lockups or other problems.
With the Fedora 26 candidate, some extensions will not install, and some will lock up on landing within /home/user/.local/share/gnome-shell/extensions.
When the system locks up, there is no longer a keyboard light. I have to reboot.
When I examine the gear menu (the selection between gnome and gnome-xorg), the default setting is a third entry, "awesome". If I do not reset the login to gnome, a system lockup occurs.
Even with gnome and some reliable extensions lockups occur.
Ext4 problems with Extensions
Why btrfs? Theodore Ts'o, in a publication, suggested that btrfs was better for SSDs than ext4. Journalling will shorten the life of SSDs. Btrfs's COW cuts I/O in half, which is what is wanted for an SSD.
Definitely a gnome-shell btrfs problem.
test with the attachment.
With Gnome 3.26 candidate and Fedora 27, the problem is less severe.
There is definitely an incompatibility when Gnome-shell resides on a btrfs file system.
This problem occurs for the distributed extensions coming with F27 or the extension that I provided. You can use that extension for debugging Gnome.
The above extension, which crashes on installation under Gnome 3.24 (Fedora 26), does not crash with the Gnome 3.26 beta under Fedora 27 beta, except while establishing execution parameters (configuration).
My observation is that the schema is ignored during configuration, causing out-of-bounds values and the crash.
Once installed the extension functions as designed. Gnome Shell is flawed.
Can crash activities configurator if the host file system is btrfs
A good candidate to test gnome-shell is
TaskBar by zpydr
This second extension installs cleanly and works fine with
ext4, xfs, lvm, lvm-thin, f2fs
but not with btrfs.
The problem between btrfs and Gnome has been present since F25.
Works for me with
TaskBar by zpydr
Activities Configurator by nls1729
Once in a few days of testing, an attempt to log on to the system fails.
I logged in as root on a virtual terminal. When the problem occurred, the top command, run as root, showed gnome-shell in a tight loop at 99.5% CPU.
After killing the process, a second logon succeeded.
From then until the next boot, logging in after logging out did not fail.
The failures happen only after a reboot, and randomly, meaning not on every occurrence.
Just providing this additional information in case someone else raises a bug report.
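For anyone who wants to repeat the check above from a root virtual terminal without watching top, here is a minimal sketch. The helper name busy_pids and the default threshold of 90 are my own choices, not anything from gnome-shell or procps, and it assumes the procps version of ps.

```shell
# Print the PIDs whose CPU usage is at or above a threshold.
# Input: lines of "PID %CPU" on stdin, for example from
#   ps -C gnome-shell -o pid=,pcpu=
# busy_pids and the default threshold of 90 are illustrative choices.
busy_pids() {
    awk -v limit="${1:-90}" '$2 + 0 >= limit + 0 { print $1 }'
}
```

Usage would be something like `ps -C gnome-shell -o pid=,pcpu= | busy_pids 90`. Note that ps reports CPU time averaged over the life of the process, so a shell that has only just started looping may read lower than it does in top.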
The problem has returned with the beta Fedora 27
How to demonstrate
Create an installation of F27 beta using btrfs
After all updates (as of Oct 25)
install the following Gnome extensions:
Install Activities Configurator. Did setting up parameters lock up the system (keyboard lost)?
Install Gno-Menu. Did setting up parameters lock up the system (keyboard lost)?
Install TaskBar by zpydr. Did setting up parameters lock up the system?
Ditto for OpenWeather by Jens Lody (it's also distributed within the ISO).
I could go on, but just to advise you that on reboot, one cannot log into Fedora with Gnome or Gnome-xorg
Install with /home as an ext4 or xfs based file system. You will not have any issues with installation of the four listed or any other
I have researched this further and IF ~/.config/dconf/user is on a btrfs file system, gnome will lockup during setup of parameters or randomly thereafter
If ~/.config/dconf/user is placed on an ext4 file system while the rest of /home is on btrfs (I used ln -s to relocate the dconf directory), Gnome will work properly.
It took me days of testing to determine that somehow ~/.config/dconf/user was being corrupted if it was btrfs based.
This is a gnome bug or a btrfs bug. The four extensions listed above work with other than a btrfs host.
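To tell which case a given system is in, one can ask which file system actually holds the dconf database. A minimal sketch, assuming GNU coreutils stat; the helper names fstype_of and on_btrfs are made up for illustration.

```shell
# Report the type of the file system that holds a given path.
# fstype_of is an illustrative helper name; it relies on GNU coreutils stat.
fstype_of() {
    stat -f -c %T -- "$1"
}

# Succeed only when the path lives on btrfs.
on_btrfs() {
    [ "$(fstype_of "$1")" = "btrfs" ]
}
```

For example, `on_btrfs ~/.config/dconf/user && echo "dconf database is on btrfs"`. The check follows symbolic links, so with a relocated dconf directory it reports the file system of the link target. GNU stat prints ext2/ext3 for ext4 file systems, so only the btrfs comparison is used here.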
It may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1469129
Created attachment 1347877 [details]
Systemd journal from test
When I tried to reproduce the issue, I switched TaskBar to a non-existent bottom panel, and the system stalled and woke up after an unknown time. My system is a Sony Vaio E series notebook, Intel i5, 4 cores, 8 GB RAM and AMD/ATI graphics, no SSD.
The systemd journal was flooded with messages (5000 of 7700 lines in total). Of course, I can't say if it is btrfs related. I could not identify any error message referring to SELinux or I/O and the filesystem.
Please search for tty2 to spot the incident. I tried then to login as root.
Gnome Support tried to pass the problem off as an extension bug. I responded to them that the extension runs flawlessly on all but full btrfs systems.
Other extensions experienced similar problems.
Last June I reported the problem, as you can see from above.
I noticed in the past 5 days updates to glibc and other gnome objects. Since then the problems with gnome-shell have diminished greatly.
I am able to use the same unmodified taskbar on Fedora 27 beta with btrfs for / and btrfs for /home.
In a few days, I will wipe a partition and install a fresh Fedora 27 gnome system using both btrfs for / and for /home. If I encounter no issues, I will mark this bug as "works for me". Until then....
This is a pernicious bug. This is what I do to prove it is a gnome-shell bug
a) Create a brand new Fedora 27 beta. The final F27 is out in 6 days.
b) I set btrfs as the default file system. That puts
/ and /home on btrfs ... /home is a subvolume of the btrfs root.
c) I log in as my user leslie.
I install TaskBar by zpydr (other extensions will cause problems, but this one has many additional settings).
During the setup of TaskBar (e.g. spacing between icons, what to display, etc.), Gnome shell will lock up the keyboard and mouse.
I try to log in to Fedora 27 and get a grey screen.
Gnome shell is looping at 105% (quad core cpu). I enter root terminal mode and do a killall -u leslie # leslie is my user.
I relog and I can get in.
This repetition of first login being rejected is a problem, as is the modification in c) above.
Testing /home on ext4
d) On the same drive, I have an ext4 partition /home2. I tar /home, untar it onto /home2, and swap the /home and /home2 entries within /etc/fstab.
e) Now, Fedora 27 works from cold boot, and any setup issues have disappeared.
In plain words, /home with gnome 3.26.1 shell cannot safely reside on a btrfs system.
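The tar and untar part of step d) can be sketched as below. The function name copy_tree is only illustrative, and the /etc/fstab swap is left as a manual step on purpose.

```shell
# Copy the contents of one directory tree into another with tar,
# preserving permissions (ownership is only preserved when run as root).
# copy_tree is an illustrative name; both paths are supplied by the caller.
copy_tree() (
    src=$1
    dst=$2
    if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "copy_tree: both arguments must be existing directories" >&2
        exit 1
    fi
    # The first tar packs everything under src, the second unpacks it under dst.
    tar -C "$src" -cpf - . | tar -C "$dst" -xpf -
)
```

Run as root with the user logged out, this would be `copy_tree /home /home2`, followed by swapping the two entries in /etc/fstab by hand.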
MORE DETAILED TESTS.
Reverting to /home on btrfs and using a symbolic link (ln -s)
to move /home/leslie/.config/dconf to an ext4 file system:
the crashing and boot problem stops occurring. The problem is solved in the same way as in step d) above.
I have been chasing this problem and walking through problematic extensions since the beginning of 3.24.
With the patches/fixes/updates to Gnome 3.26.1, as of November 10, 2017,
the bulk of the problems have almost disappeared.
The extensions do not fail in static operation, but while a setting is being changed,
for example, turning an option on or off,
adding some spacing or width for an icon,
or changing the size of a field.
One can do one or two actions and then POW, the crash lockup occurs: the keyboard is LOST, the mouse pointer moves on the screen, but its buttons are not functioning. Other times there is no keyboard and a black screen.
After several reboots, where some settings are done for each boot, the extension is stable and works from then on.
And to repeat one more time.
Transferring /home to an ext4 or xfs system solves the "extension setting up" problem.
And by the way, the problems are present with the gnome extensions furnished with Fedora 27 via dnf .
A second experiment I did was to relocate the file ~/.config/dconf/user to ext4
using ln -s, and the problem disappears.
Please refer to 1469129 for similar problem and discussion
The weeks between June 22 and November 10 saw a very large number of glib and gnome-shell updates.
I am now able, with one or two reboots (I should not have to reboot), to install any extension onto a "btrfs only" system.
I discovered that if I put one critical file onto an ext4 system, I can do a clean installation of all Gnome extensions without having to reboot. And on logging in after a fresh power on, the problem is solved. That file is
I relocated that file to an ext4 file system using a soft link even though all Fedora partitions save one, on my SSD are btrfs formatted.
My steps were simple because, on Fedora, /boot is created on an ext4 file system. This is what I did to solve the problem.
1) mkdir -p /boot/leslie/.config/dconf
Note again, /boot is an ext4 file system thanks to grub2 requirements.
2) cp -ra /home/leslie/.config/dconf/user /boot/leslie/.config/dconf
3) mv /home/leslie/.config/dconf /home/leslie/.config/dconf.bak #just in case.
4) ln -s /boot/leslie/.config/dconf home/leslie/.config/dconf
5) chown -R leslie:leslie /boot/leslie
6) Log into my Leslie logon. Success.
Does btrfs operate differently from ext4 regarding writing/reading and return codes, or is the problem still unsolved?
4) is missing a / for home/leslie should be /home/leslie
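With the corrected leading / in step 4, the relocation steps can be sketched as one function. This is only a sketch of the workaround as described in this report: the name relocate_dconf is made up, and both paths are whatever the caller passes in.

```shell
# Move a dconf directory onto another file system and leave a symlink behind.
# relocate_dconf is an illustrative name. $1 is the original dconf directory
# (for example /home/leslie/.config/dconf) and $2 is the new location on the
# other file system. Use absolute paths, and run it while the user is logged out.
relocate_dconf() (
    orig=$1
    target=$2
    # Refuse to run when the directory is missing or was already relocated.
    if [ ! -d "$orig" ] || [ -L "$orig" ]; then
        echo "relocate_dconf: $orig is not a real directory" >&2
        exit 1
    fi
    # Create the target, copy the contents, keep the old copy, then link.
    mkdir -p "$target" &&
    cp -a "$orig/." "$target/" &&
    mv "$orig" "$orig.bak" &&
    ln -s "$target" "$orig"
)
```

Run as root, `relocate_dconf /home/leslie/.config/dconf /boot/leslie/.config/dconf` would cover steps 1 to 4. Because cp -a keeps ownership when run as root, the copied files stay owned by the user, but directories newly created by mkdir -p still belong to root and may need the chown from step 5.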
Sorry to say, your workaround looks like a poor solution to me.
Referring to my post https://forums.fedoraforum.org/showthread.php?315877-The-beta-is-almost-the-final-Just-act-as-it-is-the-final&p=1797585#post1797585
I confirm now, openSUSE sets xfs as the default filesystem for /home.
If you choose btrfs for /home, it sets the extended attribute No_COW for all akonadi or baloo database files under ~/.local/share/
So I think that it is not a btrfs issue, but an issue of a bad btrfs implementation. ~/.config/dconf should definitely have the No_COW attribute (chattr +C ...).
Created attachment 1354697 [details]
Repeated test of comment#9 with No_COW attributes set
I repeated my test of comment#9 with No_COW attribute set on ~/.config/dconf.
I confirm now: It works, the system stays responsive all the time.
Thank you for comments 9, 15 and 16.
I must read about modifying btrfs for No_COW.
Should the COW issue be a user/Linux requirement or should Gnome Shell handle it?
Should I set No_COW for /home?
That would of course include all /home subdirectories.
Does it also mean no journalling for /home?
No, only ~/.config/dconf. Otherwise you would disregard your initial statement.
No_COW disables COW (Copy-On-Write). I am not sure, but journalling will/must work instead.
As follows - user is not logged in:
# cd /home/user_name/.config
# mv dconf dconf.old
# mkdir dconf
# chattr +C dconf
# chown user_name.group_name dconf
Then log in and Gnome will create a new file .config/dconf/user as in the attachment of comment #16. chattr on an existing file won't work.
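The same steps can be wrapped in a small function that also shows the resulting attributes. This is a sketch only: the name recreate_nocow is made up, the owner argument is optional, and on a file system other than btrfs the chattr call is expected to fail, so it only prints a warning.

```shell
# Replace a dconf directory with an empty one carrying the No_COW attribute,
# so that files created in it later are not written copy-on-write.
# recreate_nocow is an illustrative name. $1 is the dconf directory and
# $2 an optional owner in user:group form. Run it while the user is logged out.
recreate_nocow() (
    dir=$1
    owner=$2
    if [ -e "$dir.old" ]; then
        echo "recreate_nocow: $dir.old already exists, not overwriting it" >&2
        exit 1
    fi
    if [ -d "$dir" ]; then
        mv "$dir" "$dir.old" || exit 1
    fi
    mkdir "$dir" || exit 1
    # chattr +C is only meaningful on btrfs; elsewhere it fails and we warn.
    chattr +C "$dir" 2>/dev/null ||
        echo "recreate_nocow: could not set +C on $dir (not btrfs?)" >&2
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || exit 1
    fi
    # Show the attributes when lsattr is able to read them.
    lsattr -d "$dir" 2>/dev/null || true
)
```

As root, `recreate_nocow /home/user_name/.config/dconf user_name:group_name` corresponds to the commands above. On btrfs the lsattr line should then include a C flag for the new directory.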
The modification should ideally be made upstream, by the Gnome project. I learnt now that my backup media should also be formatted with btrfs. Otherwise the flag might be lost during backup.
Furthermore, I am no Gnome user. I don't know what other data (zeitgeist?) should be treated the same way. Also I would not place a virtual machine or database files on btrfs.
I am waiting for a decision. Is it a user setting (I don't think so as the .config is a hidden folder) or is it a gnome-shell problem that they must resolve when the file system is btrfs.
I chose to move /home to an xfs partition, bypassing the problem.
I've been trying to find the cause of what I think is this issue.
My root filesystem is BTRFS, and I use gnome-shell (3.26.1, and the problem has been happening to me since at least 3.24).
The underlying problem is that gsettings (backed by the dconf backend) emits "changed" events for settings keys in various circumstances where the values of the relevant settings have not changed.
Meanwhile, lots of JS code in gnome-shell and extensions assumes that this will not be the case. In some circumstances there is infinite recursion, for example when a handler for the "changed" event causes a "changed" event to be dispatched for the same key (even if the value of that key never actually changes).
In one case, the spurious event occurs when JS code sets a setting to a value that it already holds. There is a patch here to address this problem: https://bugzilla.gnome.org/show_bug.cgi?id=790640.
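To illustrate the kind of guard that avoids this first case, here is a sketch using the gsettings command-line tool rather than JS. It is not the patch linked above. The name set_if_changed is made up, and the sketch assumes the caller passes the value in the same text form that `gsettings get` prints.

```shell
# Write a GSettings key only when the stored value differs from the new one,
# so that a no-op update does not cause a write at all.
# set_if_changed is an illustrative name. The value must be given in the
# text form printed by `gsettings get` (for example 5 for an integer);
# that matching rule is an assumption of this sketch.
set_if_changed() (
    current=$(gsettings get "$1" "$2") || exit 1
    if [ "$current" = "$3" ]; then
        exit 0    # the key already holds this value, skip the write
    fi
    gsettings set "$1" "$2" "$3"
)
```

A call would look like `set_if_changed SCHEMA KEY VALUE` with the schema and key of the extension in question. A guard like this only helps when the spurious event comes from writing an unchanged value; it does nothing about events that dconf emits on its own.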
In another case, dconf emits a "changed" event for all keys in the database because it reaches a state where it has lost track of which keys have changed since a client subscribed to the "changed" events: https://bugzilla.gnome.org/show_bug.cgi?id=790640. I suspect that this case may be the one that is related to BTRFS. In this case, I think an infinite loop is possible regardless of whether JS code refrains from doing anything when "changed" handlers are called without a change.
For me, all these problems are difficult to reproduce (probably related to race conditions or something), but most of the problems I have found happen much more frequently when running gnome-shell in valgrind with the Taskbar extension enabled.
Some examples of resulting bugs: https://bugzilla.gnome.org/show_bug.cgi?id=782688, https://bugzilla.gnome.org/show_bug.cgi?id=786186, https://bugzilla.gnome.org/show_bug.cgi?id=788110