Bug 1464294 - Gnome-Shell deadly lockup with btrfs file system if certain extensions are installed.
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: gnome-shell
Version: 27
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Owen Taylor
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-23 01:48 UTC by Leslie Satenstein
Modified: 2018-11-30 19:32 UTC
CC List: 8 users

Last Closed: 2018-11-30 19:32:47 UTC


Attachments
Systemd journal from test (1.16 MB, text/plain)
2017-11-04 19:37 UTC, Ulrich Beckmann
Repeated test of comment#9 with No_COW attributes set (1.29 KB, text/plain)
2017-11-18 13:01 UTC, Ulrich Beckmann

Description Leslie Satenstein 2017-06-23 01:48:59 UTC
Description of problem:

BACKGROUND
I have been testing Fedora 26 beta 1-4 and also the nightlies. Installation is on physical hardware (SSD and spinning disks, not a virtual system).

An ext4 developer recommended that SSDs be formatted with btrfs rather than ext4 (a Red Hat recommendation). The aim of that recommendation is to extend SSD life (btrfs does COW and avoids journalling).

ENDBACKGROUND


BTRFS file system
While trying to set up the Gnome extension TaskBar (by zpydr), the Gnome session (both the Wayland and Xorg options) locks up solidly. A restore from backup is the only way to recover (killall -u locked_up_user does not restore the session).

EXT4 File System
THERE IS NO PROBLEM if the file system is ext4 or xfs for any of the GNOME extensions
 



Version-Release number of selected component (if applicable):

All current software as of June 22, 2017

How reproducible:

Set up /home on a small btrfs-formatted partition.
Install gnome-tweak-tool.
Install the extension TaskBar by zpydr.
Adjust the alignment settings (if you can).

Steps to Reproduce:
1.
2.
3.

Actual results:

Gnome locks up and cannot be recovered by rebooting or by a forced logoff.

Expected results:

Logging the user back in after a lockup should recover the session.


Additional info:

I ran multiple tests with the extension's author to establish that his software was not at fault.

This could be a showstopper, or at least merits a warning about the problem in the installation guide.

Comment 1 Leslie Satenstein 2017-07-05 23:59:45 UTC
I just set up 3 different Fedora 26 candidates (all from July 2, 2017),
using

https://kojipkgs.fedoraproject.org/compose/26/latest-Fedora-26/compose/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-26-1.4.iso

My testing consisted of 3 Fedora 26 Gnome installations on ext4 and 3 Fedora 26 Gnome installations on btrfs.

All my comments mentioned above were reconfirmed -- gnome and btrfs do not get along well.

BTRFS Problems with Extensions
Generally, I test with gcc 7 (does not work), I test with the Firefox browser, and I add extensions. I first download the Gnome "tweak tool" and then begin to add the extensions I have used with Fedora 22, 23, 24, and 25. With Fedora 22-25, all worked, with no GUI lockups or other problems.

With the Fedora 26 candidate, some extensions will not install, and some lock up on landing in /home/user/.local/share/gnome-shell/extensions.
When the system locks up, there is no longer a keyboard light. I have to reboot.
When I examine the gear menu (session selection between gnome and gnome-xorg), the default setting has moved to a third entry, "awesome". If I do not reset the login to gnome, a system lockup occurs.
Even with gnome and some reliable extensions, lockups occur.


Ext4 problems with Extensions
None

Why btrfs? Theodore Ts'o suggested in a publication that btrfs was better for SSDs than ext4. Journalling will shorten the life of SSDs. Btrfs's COW cuts I/O in half, which is what is wanted for an SSD.

Comment 2 Leslie Satenstein 2017-08-01 05:30:27 UTC
Definitely a gnome-shell btrfs problem.

test with the attachment.

Comment 3 Leslie Satenstein 2017-09-03 02:20:43 UTC
With the Gnome 3.26 candidate and Fedora 27, the problem is less severe.

There is definitely an incompatibility between Gnome-shell and the btrfs file system it resides on.

This problem occurs for the extensions distributed with F27 and for the extension that I provided. You can use that extension for debugging Gnome.

The above extension, which crashes on installation with Gnome 3.24 (Fedora 26),
does not crash with the Gnome 3.26 beta under the Fedora 27 beta, except when
establishing execution parameters (configuration).

My observation is that the schema is ignored during configuration, causing out-of-bounds values and the crash.

Once installed, the extension functions as designed. Gnome Shell is flawed.

Comment 4 Leslie Satenstein 2017-09-22 14:15:52 UTC
Activities Configurator can be crashed if the host file system is btrfs.

A good candidate to test gnome-shell is

TaskBar by zpydr

This second extension installs cleanly and works fine with
ext4, xfs, lvm, lvm-thin, f2fs

But not with btrfs.

The problem with btrfs and Gnome has been present since F25.

Comment 5 Leslie Satenstein 2017-09-26 09:04:51 UTC
Works for me with

Fedora-Workstation-netinst-x86_64-27-20170925.n.0.iso

TaskBar by zpydr
Activities Configurator by nls1729

Comment 6 Leslie Satenstein 2017-09-28 23:47:48 UTC
Once every few days of testing, an attempt to log in to the system fails.
I logged in as root on a virtual terminal. When the problem occurred, the top
command (run as root) showed gnome-shell in a tight loop at 99.5% CPU.
After killing the process, a second login succeeded.
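
A minimal sketch of the check described above, for use from a root virtual terminal. The helper name busy_pids and the 90% threshold are illustrative assumptions, not part of the report.

```shell
# Sketch: list processes whose CPU usage exceeds a threshold.
# Reads "PID %CPU" pairs on stdin and prints the PIDs above the limit.
busy_pids() {
    limit=$1
    awk -v limit="$limit" '$2 + 0 > limit + 0 { print $1 }'
}

# Usage (as root, from a virtual terminal):
#   ps -C gnome-shell -o pid= -o pcpu= | busy_pids 90
# A PID printed here can then be inspected with top or killed by hand.
```

Note that ps reports CPU use averaged over the lifetime of the process, whereas top shows the current sampling interval, so the two figures can differ for a shell that only recently started looping.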

However, until the next boot, logging in after logging out did not fail.

The failures are random and occur only after a reboot; random means not on every occurrence.

Just providing this additional information in case someone else raises a bug report.

Comment 7 Leslie Satenstein 2017-10-25 21:14:39 UTC
The problem has returned with the Fedora 27 beta.
How to demonstrate:
Create an installation of the F27 beta using btrfs.
After applying all updates (as of Oct 25),
install the following Gnome extensions in turn:

Install Activities Configurator. Did setting up parameters lock up the system (keyboard lost)?

If not,
install Gno-Menu. Did setting up parameters lock up the system (keyboard lost)?

If not,
install TaskBar by zpydr. Did setting up parameters lock up the system (keyboard lost)?

Ditto for OpenWeather by Jens Lody (it is also distributed within the ISO).

I could go on, but just to advise you: on reboot, one cannot log in to Fedora with Gnome or Gnome-xorg.

TEST2
Install with /home as an ext4 or xfs based file system. You will not have any issues with installation of the four listed extensions or any other.

I have researched this further, and IF ~/.config/dconf/user is on a btrfs file system, gnome will lock up during setup of parameters or randomly thereafter.

If ~/.config/dconf/user is placed on an ext4 file system while the rest of /home is on btrfs (I used ln -s to relocate the dconf directory), Gnome will work properly.

It took me days of testing to determine that somehow ~/.config/dconf/user was being corrupted if it was btrfs based.
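
To check which file system actually holds the dconf database, something like the following may help. It is a sketch that assumes GNU coreutils stat; the function names fs_type_of and dconf_on_btrfs are hypothetical.

```shell
# Sketch: print the type of the file system that holds a path (GNU stat).
fs_type_of() {
    stat -f -c %T -- "$1"
}

# Succeeds when the given path is stored on btrfs.
dconf_on_btrfs() {
    [ "$(fs_type_of "$1")" = "btrfs" ]
}

# Usage:
#   fs_type_of "$HOME/.config/dconf"
#   if dconf_on_btrfs "$HOME/.config/dconf"; then echo "dconf is on btrfs"; fi
```

Because stat -f resolves symbolic links, a dconf directory relocated with ln -s should report the file system of the link target rather than that of /home.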
 
This is a gnome bug or a btrfs bug. The four extensions listed above work on any host file system other than btrfs.

Comment 8 陳鐸元 2017-11-03 06:43:07 UTC
It may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1469129

Comment 9 Ulrich Beckmann 2017-11-04 19:37:54 UTC
Created attachment 1347877 [details]
Systemd journal from test

When I tried to reproduce the issue, I switched TaskBar to a non-existent bottom panel, and the system stalled and woke up after an unknown time. My system is a Sony Vaio E series notebook with an Intel i5 (4 cores), 8 GB RAM and AMD/ATI graphics, no SSD.

The systemd journal was flooded with messages (5000 lines out of 7700 in total). Of course, I can't say whether it is btrfs related. I could not identify any error message referring to SELinux, I/O or the filesystem.

Please search for tty2 to spot the incident. I then tried to log in as root.

Ulrich Beckmann

Comment 10 Leslie Satenstein 2017-11-06 03:41:41 UTC
Hi Ulrich

Gnome Support tried to pass the problem off as an extension bug.  I responded to them that the extension runs flawlessly on all but full btrfs systems.

Other extensions experienced similar problems.  

Last June I reported the problem, as you can see from above.

In the past 5 days I noticed updates to glibc and other gnome objects. Since then the problems with gnome-shell have diminished greatly.
I am able to use the same unmodified taskbar on Fedora 27 beta  with btrfs for / and btrfs for /home.

In a few days, I will wipe a partition and install a fresh Fedora 27 gnome system using both btrfs for / and for /home.  If I encounter no issues, I will mark this bug as "works for me". Until then....

Comment 11 Leslie Satenstein 2017-11-08 19:33:45 UTC
Hi Owen

This is a pernicious bug.  This is what I do to prove it is a gnome-shell bug

a) Create a brand new Fedora 27 beta installation. The final F27 is out in 6 days.

b) I select btrfs as the default file system. That puts
/ and /home on btrfs ... /home is a subvolume of the btrfs root.

c) I log in as my user, leslie.
I install TaskBar by zpydr (other extensions will cause problems, but this one has many additional settings).
During the setup of TaskBar (e.g., spacing between icons, what to display, etc.), Gnome Shell locks up the keyboard and mouse.

I reboot.
I try to log in to Fedora 27 and get a grey screen.
Gnome Shell is looping at 105% CPU (quad-core CPU). I enter root terminal mode and do a killall -u leslie   # leslie is my user.
I log in again and I can get in.

This repeated rejection of the first login is a problem, as is the lockup while changing settings in c) above.
Testing /home on ext4

d) On the same drive, I have an ext4 partition, /home2. I tar /home, untar it onto /home2, and swap the /home and /home2 entries within /etc/fstab.

e) Now Fedora 27 works from a cold boot, and the setup issues have disappeared.

In plain words,  /home with gnome 3.26.1 shell cannot safely reside on a btrfs system.

f)
MORE DETAILED TESTS.

Reverting to /home on btrfs, I use a symbolic link (ln -s) to
move /home/leslie/.config/dconf to an ext4 file system.

The crashes and the boot problem stop occurring. The problem is solved in the same way as in step d) above.
I have been chasing this problem and walking through problematic extensions since the beginning of 3.24.

Comment 12 Leslie Satenstein 2017-11-11 04:31:31 UTC
With the patches/fixes/updates to Gnome 3.26.1 as of November 10, 2017,
most of the problems have disappeared.

The extensions do not fail in static operation, but while a setting is being changed,

for example,   turning an option on or off
               adding some spacing or width for an icon
               changing the size of a field.

One can do one or two actions and then POW, the crash lockup occurs: the keyboard is LOST, and the mouse pointer moves on the screen but its buttons do not function. Other times there is no keyboard and a black screen.

After several reboots, with some settings done on each boot, the extension is stable and works from then on.

And to repeat one more time:

Transferring /home to an ext4 or xfs file system solves the "extension setting up" problem.

And by the way, the problems are also present with the gnome extensions furnished with Fedora 27 via dnf.

A second experiment I did was to relocate the file
~/.config/dconf/user to ext4 using ln -s, and the problem disappears.

Please refer to 1469129 for similar problem and discussion

Comment 13 Leslie Satenstein 2017-11-15 15:14:06 UTC
The weeks from June 22 to November 10 saw a very large number of glib and gnome-shell updates.
I am now able, with one or two reboots (which should not be necessary), to install any extension onto a "btrfs only" system.

I discovered that if I put one critical file onto an ext4 system, I can do a clean installation of all Gnome extensions without having to reboot. And on logging in after a fresh power on, the problem is solved. That file is

~/.config/dconf/user

I relocated that file to an ext4 file system using a soft link, even though all Fedora partitions on my SSD, save one, are btrfs formatted.

My steps were simple because, on Fedora, /boot is created on an ext4 file system. This is what I did to solve the problem.

As root:
1) mkdir -p /boot/leslie/.config/dconf
Note again, /boot is an ext4 file system thanks to grub2 requirements.
2) cp -ra /home/leslie/.config/dconf/user /boot/leslie/.config/dconf
3) mv /home/leslie/.config/dconf /home/leslie/.config/dconf.bak  # just in case.
4) ln -s /boot/leslie/.config/dconf home/leslie/.config/dconf
5) chown -R leslie:leslie /boot/leslie/.config
6) Log in as my user, leslie. Success.

Does btrfs operate differently from ext4 regarding writing/reading and return codes, or is the problem still unsolved?

Comment 14 Leslie Satenstein 2017-11-15 15:41:06 UTC
Step 4) is missing a leading /: home/leslie should be /home/leslie
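
With that correction applied, the steps from comment 13 could be sketched as a single function. This is an illustration only: the function name relocate_dconf and its parameters are assumptions, both directory arguments should be absolute paths, and the chown is applied to the relocated directory. It is meant to be run as root while the user is logged out.

```shell
# Sketch of the relocation workaround from comment 13.
#   $1  the user's .config directory       (e.g. /home/leslie/.config)
#   $2  a directory on the ext4 file system (e.g. /boot/leslie/.config)
#   $3  owner of the relocated files        (e.g. leslie:leslie)
relocate_dconf() {
    src_config=$1
    dst_config=$2
    owner=$3

    mkdir -p -- "$dst_config/dconf" || return 1
    # Copy the database, preserving mode and timestamps.
    cp -a -- "$src_config/dconf/user" "$dst_config/dconf/" || return 1
    # Keep the original directory as a backup, just in case.
    mv -- "$src_config/dconf" "$src_config/dconf.bak" || return 1
    # Replace the directory with a link to the copy on ext4.
    ln -s -- "$dst_config/dconf" "$src_config/dconf" || return 1
    chown -R "$owner" "$dst_config"
}

# Usage, with the paths from comment 13:
#   relocate_dconf /home/leslie/.config /boot/leslie/.config leslie:leslie
```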

Comment 15 Ulrich Beckmann 2017-11-17 15:58:17 UTC
Leslie,

Sorry to say, your workaround looks like a poor solution to me.

Referring to my post https://forums.fedoraforum.org/showthread.php?315877-The-beta-is-almost-the-final-Just-act-as-it-is-the-final&p=1797585#post1797585
I confirm now, openSUSE sets xfs as the default filesystem  for /home.
If you choose btrfs for /home, it sets the extended attribute No_COW for all akonadi or baloo database files under ~/.local/share/

So I think that it is not a btrfs issue, but an issue of a poor btrfs implementation. ~/.config/dconf should definitely have the No_COW attribute (chattr +C ...).

Regards
Ulrich

Comment 16 Ulrich Beckmann 2017-11-18 13:01:33 UTC
Created attachment 1354697 [details]
Repeated test of comment#9 with No_COW attributes set

I repeated my test of comment#9 with No_COW attribute set on ~/.config/dconf.

I confirm now: it works, and the system stays responsive all the time.

Best regards,
Ulrich

Comment 17 Leslie Satenstein 2017-11-19 01:52:17 UTC
Hi Ulrich

Thank you for comments 9, 15 and 16.
I must read about modifying btrfs for No_COW.

Should the COW issue be a user/Linux requirement, or should
Gnome Shell handle it?

Should I set No_COW for /home?
That would of course include all /home subdirectories.

Does it also mean no journalling for /home?

Comment 18 Ulrich Beckmann 2017-11-19 16:23:22 UTC
No, only ~/.config/dconf. Otherwise you would disregard your initial statement.
No_COW disables COW (Copy-On-Write). I am not sure, but journalling will/must work instead.

Proceed as follows while the user is not logged in:

# cd /home/user_name/.config
# mv dconf dconf.old
# mkdir dconf
# chattr +C dconf
# chown user_name:group_name dconf

Then log in, and Gnome will create a new file .config/dconf/user as in the attachment of comment #16. chattr on an existing file won't work.
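
The steps above can be collected into a small function that first confirms the directory is on btrfs. This is a sketch under assumptions: the function name make_nocow_dconf is hypothetical, the file system check relies on GNU stat, and it should be run as root while the user is logged out.

```shell
# Sketch: recreate <config_dir>/dconf as an empty directory carrying the
# No_COW attribute, so that files created in it later inherit the flag.
#   $1  the user's .config directory (e.g. /home/user_name/.config)
#   $2  owner of the new directory   (e.g. user_name:group_name)
make_nocow_dconf() {
    config_dir=$1
    owner=$2

    # The C attribute is only meaningful on btrfs; change nothing elsewhere.
    fstype=$(stat -f -c %T -- "$config_dir") || return 1
    if [ "$fstype" != "btrfs" ]; then
        echo "skipping: $config_dir is on $fstype, not btrfs" >&2
        return 2
    fi

    mv -- "$config_dir/dconf" "$config_dir/dconf.old" || return 1
    mkdir -- "$config_dir/dconf" || return 1
    chattr +C "$config_dir/dconf" || return 1
    chown "$owner" "$config_dir/dconf" || return 1
    # Show the attributes of the new directory; a C should be listed.
    lsattr -d "$config_dir/dconf"
}

# Usage:
#   make_nocow_dconf /home/user_name/.config user_name:group_name
```

The old database is kept as dconf.old, so settings start from defaults at the next login unless they are restored by other means.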

The modification should ideally be made upstream, by the Gnome project. I learnt now that my backup media should also be formatted with btrfs. Otherwise the flag might be lost during backup.

Furthermore, I am no Gnome user. I don't know what other data (zeitgeist?) should be treated the same way. Also I would not place a virtual machine or database files on btrfs. 

Ulrich

Comment 19 Leslie Satenstein 2017-11-24 22:25:24 UTC
I am waiting for a decision. Is it a user setting (I don't think so, as .config is a hidden folder), or is it a gnome-shell problem that they must resolve when the file system is btrfs?

I chose to move /home to an xfs partition, bypassing the problem.

Comment 20 Daniel Playfair Cal 2017-11-27 02:10:23 UTC
I've been trying to find the cause of what I think is this issue.

My root filesystem is BTRFS, and I use gnome-shell (3.26.1; the problem has been happening to me since at least 3.24).

The underlying problem is that gsettings (backed by the dconf backend) emits "changed" events for settings keys in various circumstances where the values of the relevant settings have not changed.

Meanwhile, lots of JS code in gnome-shell and extensions assumes that this will not be the case. In some circumstances there is infinite recursion, for example when a handler for the "changed" event causes a "changed" event to be dispatched for the same key (even if the value of that key never actually changes).

In one case, the spurious event occurs when JS code sets a setting to a value that it already holds. There is a patch here to address this problem: https://bugzilla.gnome.org/show_bug.cgi?id=790640.
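
On the caller side, the same idea (do not write a value the key already holds) can be approximated from a shell with the gsettings command-line tool. This is only a sketch of the guard pattern, not the patch referenced above; the function name set_if_changed is hypothetical, the schema and key in the usage line are placeholders, and the comparison assumes the new value is given in the same text form that gsettings get prints (for example true, or 'bottom' including the quotes).

```shell
# Sketch: write a GSettings key only when the stored value differs.
#   $1 schema, $2 key, $3 new value in the text form printed by "gsettings get"
set_if_changed() {
    schema=$1
    key=$2
    value=$3

    current=$(gsettings get "$schema" "$key") || return 1
    # Skip the write when nothing would change, so this call does not
    # trigger a "changed" notification by itself.
    [ "$current" = "$value" ] || gsettings set "$schema" "$key" "$value"
}

# Usage (placeholder schema and key):
#   set_if_changed org.example.extension panel-position "'bottom'"
```

The read and the write are separate steps, so another writer can still slip in between them; the guard reduces redundant writes but does not make the update atomic.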

In another case, dconf emits a "changed" event for all keys in the database because it reaches a state where it has lost track of which keys have changed since a client subscribed to the "changed" events: https://bugzilla.gnome.org/show_bug.cgi?id=790640. I suspect that this case may be the one that is related to BTRFS. In this case, I think an infinite loop is possible regardless of whether JS code refrains from doing anything when "changed" handlers are called without a change.

For me, all these problems are difficult to reproduce (probably related to race conditions or something), but most of the problems I have found happen much more frequently when running gnome-shell in valgrind with the Taskbar extension enabled.

Some examples of resulting bugs: https://bugzilla.gnome.org/show_bug.cgi?id=782688, https://bugzilla.gnome.org/show_bug.cgi?id=786186, https://bugzilla.gnome.org/show_bug.cgi?id=788110

Comment 21 Ben Cotton 2018-11-27 17:16:09 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30  Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora  'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 22 Ben Cotton 2018-11-30 19:32:47 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

