Bug 1479433 - Prolonged use of Eclipse on btrfs causes failure to boot
Summary: Prolonged use of Eclipse on btrfs causes failure to boot
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-08 14:38 UTC by Pierre Blavy
Modified: 2019-09-06 03:44 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-06 03:44:56 UTC
Type: Bug
Embargoed:


Attachments
This is the list of files from the eclipse folder that crashed my computer. (4.89 MB, text/plain)
2017-08-08 14:45 UTC, Pierre Blavy
boot0 to boot -3 (935.70 KB, application/zip)
2017-08-09 07:15 UTC, Pierre Blavy

Description Pierre Blavy 2017-08-08 14:38:52 UTC
Description of problem:

When many Eclipse plug-ins are installed, Eclipse messes up the file system until the computer is unable to boot. At boot, a few lines appear on the screen (the usual ones, I presume) for less than one second, then everything crashes to a black screen. Linux cannot be started, and the lines disappear too quickly to read.


How reproducible:
Not always. You have to use Eclipse for a while and install a lot of plug-ins to make this crash happen. It doesn't crash right after installation; you have to use Eclipse for it to mess up the file system.




Steps to Reproduce:
1. Install Eclipse.
2. Install a lot of plug-ins (here CDT, the PHP plug-in, TeXlipse).
3. Use it until it messes up the file system. It can take a while and a lot of projects.

Actual results:


- Booting in console-only mode (by adding 3 to the GRUB line) doesn't fix the problem.

- Repairing the btrfs partition that Linux is installed on doesn't fix the problem either. The repair tools say everything is OK.
btrfs check --repair --check-data-csum --progress /dev/sddXXX
btrfs rescue zero-log /dev/sddXXX
btrfs rescue super-recover /dev/sddXXX
btrfs rescue chunk-recover /dev/sddXXX

- The btrfs partition IS readable from a live CD, but the system installed on this partition cannot boot.

- Deleting the eclipse folder FIXES the problem.

I have this problem now, with Eclipse on a btrfs partition. Previously I had a similar mess when Eclipse was on a ZFS partition.



Expected results:
1) Linux should boot, even when the file system is messed up by a ton of files (NOTE: the partition is readable from a live CD).

2) BTRFS should do something to prevent file bombing. I prefer an Eclipse crash to a non-bootable system or a messed-up partition.

3) ZFS has trouble too. zfs-fuse crashes when Eclipse is installed on a ZFS partition, and deleting the eclipse folder is far more complicated on ZFS than on btrfs. zfs-fuse should be patched too, to be robust to file bombing.

4) Maybe, if the stars are aligned, you can explain to the Eclipse developers what a database is, introduce them to SQLite, and explain to them how much easier and better it is to use one instead of file-bombing the file system. But as far as I know Eclipse, if something can be dirty, it will be dirty.


Additional info : WORKAROUNDS (save your data/os)

--- ZFS ---

Previously I had trouble when Eclipse was installed on a ZFS disk. It took me a while to get access to my data partition, as Eclipse had really messed up the file system. Deleting the eclipse folder fixed the problem, but I had a hard time deleting it, because zfs-fuse crashed and the messy file system had the side effect of freezing Linux.

This is the recipe that worked for me:
- Try to delete the eclipse folder (zfs may crash).
- If zfs crashes: try to delete a few files at a time.
- If rm or ls crashes: try to delete a few files at a time with guesses like rm -r a*; rm -r b*; rm -r c*, etc.

Note that the partition remains valid so zfs scrub will NOT fix the problem.
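
The "few files at a time" recipe could be scripted roughly as follows. This is only a sketch: the function name delete_in_batches and the default batch size are illustrative assumptions, not part of the original recipe, and it relies on GNU find and xargs (-print0, -0, -r).

```shell
# Hypothetical helper: empty a directory in small batches, so that no single
# rm invocation has to handle the whole tree at once.
# Usage: delete_in_batches <dir> [batch_size]
delete_in_batches() {
  dir=$1
  batch=${2:-500}
  # First pass: remove everything that is not a directory,
  # at most $batch paths per rm invocation.
  find "$dir" -mindepth 1 -depth ! -type d -print0 |
    xargs -0 -r -n "$batch" rm -f --
  # Second pass: remove the now-empty directories. -depth lists children
  # before their parents, so rmdir always sees empty directories.
  find "$dir" -mindepth 1 -depth -type d -print0 |
    xargs -0 -r -n "$batch" rmdir --
}
```

The top-level directory itself is left in place (because of -mindepth 1), so a failure partway through can be retried with the same arguments.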


--- BTRFS ---
From my experience, BTRFS leads to less trouble, as long as Eclipse IS NOT on the system partition. When things go wrong, you can simply boot from a live CD, mount the BTRFS partition, and do an rm -r eclipse.

Comment 1 Pierre Blavy 2017-08-08 14:45:29 UTC
Created attachment 1310653
This is the list of files from the eclipse folder that crashed my computer, generated with:
find > /tmp/eclipse.log

Comment 2 Mat Booth 2017-08-08 15:58:35 UTC
Hi Pierre, it looks like you are using Eclipse downloaded from eclipse.org rather than the Eclipse RPM from the Fedora repos, so there's nothing I can do support-wise with the Eclipse package against which you filed this bug.

I'm going to re-assign this bug to the kernel component because Eclipse's misuse of the disk aside, a filesystem ought to be able to handle what's thrown at it -- I've never heard of such corruption/crash problems using Eclipse on an ext4 partition for example.

Comment 3 Roland Grunberg 2017-08-08 16:07:47 UTC
Does building headless with Maven cause the issue afterwards (ignoring the installation of Eclipse)? I mean, there are quite a lot of files created/touched by Eclipse, but 75% or so are basically from that git repo.

If this is Eclipse-related then I think we'd need to narrow down what plugin is causing this.
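
One rough way to start narrowing this down would be to count files per directory and see where the bulk of them lives. The sketch below is an assumption about how one might do that, not something run in this report; the function name count_files_per_dir is made up, and -printf requires GNU find.

```shell
# Hypothetical helper: print "<file count> <directory>" for every directory
# under the given root that contains regular files, largest count first.
# Directory names containing newlines are not handled.
count_files_per_dir() {
  root=$1
  # %h is the directory part of each file's path (GNU find).
  find "$root" -type f -printf '%h\n' | sort | uniq -c | sort -rn
}
```

For example, count_files_per_dir ~/eclipse | head would list the ten fullest directories, which may point at the plug-in or repository responsible.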

Comment 4 Pierre Blavy 2017-08-08 16:18:41 UTC
@Mat:
Hi! Thanks for the answer.
 - I'm currently using the eclipse.org version, you're right.
 - I had some problems with the Fedora version of Eclipse, which made me switch to the eclipse.org one, but I'm unsure whether they are related to the file system.
 - I've never tried on ext4.


@Roland
- I don't know how to test that.
- I think that first, it's file-system related. The file system must either refuse to write files when someone is file-bombing, or be robust enough to handle a large number of files.
- I have no idea what "building with maven" means.

@ALL
Maybe we can write a sh script that file-bombs and see if it causes problems for the file system. As Mat said, "a filesystem ought to be able to handle what's thrown at it", so we must first see whether the problem is Eclipse or the file system (here BTRFS and ZFS).

Comment 5 Roland Grunberg 2017-08-08 16:36:35 UTC
Regarding 'building with maven', assuming you start with a working system, you could try:
- Ensuring 'maven' is installed on the system (eg. dnf install maven)
- From $HOME/git/org.eclipse.cdt/ folder, run 'mvn clean verify -DskipTests'

Afterwards, you could try to see if the same behaviour occurs again.

Comment 6 Josh Boyer 2017-08-08 16:49:45 UTC
Fedora doesn't support ZFS, so that's out of scope.

Fedora's BTRFS support is also rather limited.  I suspect you'd be better served by discussing these findings directly with the btrfs developers on the upstream list.

Comment 7 Chris Murphy 2017-08-09 04:54:28 UTC
Try extracting journal files from the btrfs file system using journalctl -D pointed to <mp>/var/log/journal/<machineid>/ for the -b-1 and -b-2 boots; use > to output to a file, then attach here.
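
As a rough sketch of that extraction, assuming the btrfs file system is mounted at <mp> (for example from a live CD) and the commands are run as root: the function name dump_boot_logs, the output file naming, and the JOURNALCTL override are illustrative assumptions.

```shell
# Hypothetical helper: dump the journals of the two previous boots from a
# mounted (not booted) file system into one file per boot.
# Usage: dump_boot_logs <mp> <outdir>
# JOURNALCTL can be overridden (e.g. JOURNALCTL=echo) to preview the commands.
dump_boot_logs() {
  mp=$1
  outdir=$2
  # Each directory under var/log/journal is named after a machine ID.
  for dir in "$mp"/var/log/journal/*/; do
    [ -d "$dir" ] || continue
    for boot in -1 -2; do
      ${JOURNALCTL:-journalctl} -D "$dir" "-b$boot" --no-pager \
        > "$outdir/$(basename "$dir")-boot$boot.log"
    done
  done
}
```

The resulting files can then be compressed and attached to the bug.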

Also useful to include (btrfs commands require root):
btrfs fi us <mp>
btrfs insp dump-t -r <dev>
uname -r
btrfs ver

And optionally download the btrfs-debugfs python script file from upstream git:
https://github.com/kdave/btrfs-progs

btrfs-debugfs -b <mp>


<mp>=mountpoint path

Comment 8 Pierre Blavy 2017-08-09 07:15:59 UTC
Created attachment 1311022
boot0 to boot -3

I'm quite sure the boots were only logged when things worked; the ones where btrfs was buggy were probably not logged.

Comment 9 Pierre Blavy 2017-08-09 07:18:09 UTC
 btrfs fi us /
Overall:
    Device size:		 195.31GiB
    Device allocated:		 135.01GiB
    Device unallocated:		  60.30GiB
    Device missing:		     0.00B
    Used:			  53.40GiB
    Free (estimated):		 140.42GiB	(min: 140.42GiB)
    Data ratio:			      1.00
    Metadata ratio:		      1.00
    Global reserve:		 512.00MiB	(used: 0.00B)

Data,single: Size:133.00GiB, Used:52.88GiB
   /dev/sdd2	 133.00GiB

Metadata,single: Size:2.01GiB, Used:530.58MiB
   /dev/sdd2	   2.01GiB

System,single: Size:4.00MiB, Used:16.00KiB
   /dev/sdd2	   4.00MiB

Unallocated:
   /dev/sdd2	  60.30GiB






btrfs insp dump-t -r /dev/sdd2
btrfs-progs v4.9.1
root tree: 330530816 level 1
chunk tree: 131072 level 0
extent tree key (EXTENT_TREE ROOT_ITEM 0) 329990144 level 2
device tree key (DEV_TREE ROOT_ITEM 0) 58245120 level 0
fs tree key (FS_TREE ROOT_ITEM 0) 4243456 level 0
checksum tree key (CSUM_TREE ROOT_ITEM 0) 329940992 level 2
uuid tree key (UUID_TREE ROOT_ITEM 0) 181370880 level 0
file tree key (257 ROOT_ITEM 0) 329859072 level 2
file tree key (260 ROOT_ITEM 0) 55551868928 level 0
data reloc tree key (DATA_RELOC_TREE ROOT_ITEM 0) 4358144 level 0
total bytes 209715200000
bytes used 57335488512
uuid 189e4036-677d-4052-9613-15335cf954e3




 uname -r
4.11.9-300.fc26.x86_64



btrfs ver
btrfs-progs v4.9.1

Comment 10 Chris Murphy 2017-08-09 16:16:16 UTC
Well, we need more information or it's all just speculation. There's plenty of unallocated space, and no snapshots. There are no kernel complaints about the file system in these logs, so if these logs don't show the problem we need other logs. If you can boot single user mode, and at least get networking and sshd running, then you could remote in and run 'journalctl -f' to capture anything that happens while transitioning to multi-user.target. At the moment there's just nothing to go on.

The alternative is to reproduce in a VM backed by an LVM LV or, if it must be a file on Btrfs, make sure to set chattr +C on the backing file at creation time (e.g. touch, then chattr, then fallocate -l). Then, when booting the VM, use the kernel parameter console=ttyS0, and virsh console to record the failing boot process.
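
The touch, chattr, fallocate sequence for the backing file might look like the sketch below. The function name make_nocow_image is an illustrative assumption. chattr +C is expected to fail on file systems that do not support the No_COW attribute, in which case the function stops before allocating anything.

```shell
# Hypothetical helper: create a VM disk image with copy-on-write disabled.
# +C only takes effect on an empty file, hence the order:
# touch (create empty file) -> chattr +C -> fallocate (reserve the space).
# Usage: make_nocow_image <path> <size>, e.g. make_nocow_image vm.img 20G
make_nocow_image() {
  img=$1
  size=$2
  touch "$img" &&
    chattr +C "$img" &&
    fallocate -l "$size" "$img"
}
```

Running lsattr on the new file should then show the C flag on Btrfs.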

Comment 11 Pierre Blavy 2017-08-25 15:23:00 UTC
I did some testing on a virtual machine; my testing did NOT reproduce the bug.
I got "no space left on device" messages, but Fedora booted successfully.
I configured the VM with a minimal (console-only) install of F26 x86_64 through netinstall in VirtualBox.



--- TEST 1 NO BUG : a lot of files, with content ---

for i in {1..1000}
do
  for j in {1..1000}
  do
    for k in {1..32}
    do
      echo "$i-$j-$k" > "$i-$j-$k.txt";
    done
  done
done


--- TEST 2 NO BUG : a lot of empty files ---

for i in {1..1000}
do
  for j in {1..1000}
  do
    for k in {1..32}
    do
      touch "$i-$j-$k.txt";
    done
  done
done


--- TEST 3 NO BUG : complicated paths ---

for i in {1..1000}
do
  mkdir "diri_$i";
  cd    "diri_$i";
  for j in {1..1000}
  do
    mkdir "dirj_$j";
    cd    "dirj_$j";
    for k in {1..32}
    do
      touch "$i-$j-$k.txt";
    done
    cd ..;
  done
  cd ..;
done

Comment 12 Chris Murphy 2017-08-25 15:51:25 UTC
Try kernel 4.12.6 or newer, which includes commit 4a309747.

Comment 13 Pierre Blavy 2018-02-19 16:14:43 UTC
Eclipse is still a large source of trouble, but I've never experienced the boot failure problem again. I think we can consider it solved for now. Thanks for your help, and sorry for not giving more information; it's still very mysterious to me too.