Description of problem:
When many Eclipse plug-ins are installed, Eclipse messes up the file system until the computer is unable to boot. At boot, a few lines appear on the screen (the usual ones, I presume) for less than one second, then everything crashes to a black screen. I cannot launch Linux and cannot read the lines on the screen.

How reproducible:
Not always. You have to use Eclipse for a while and install a lot of plug-ins to make this crash happen. It doesn't crash right after the install; you have to use Eclipse for the file system to get messed up.

Steps to Reproduce:
1. Install Eclipse.
2. Install a lot of plug-ins (here CDT, the PHP plug-in, TeXlipse).
3. Use it until it messes up the file system. It can take a while and a lot of projects.

Actual results:
- Booting in console-only mode (by adding 3 to the grub line) doesn't fix the problem.
- Fixing the btrfs partition Linux is installed on doesn't fix the problem either. The fix tool says everything is OK.
    btrfs check --repair --check-data-csum --progress /dev/sddXXX
    btrfs rescue zero-log /dev/sddXXX
    btrfs super-recover /dev/sddXXX
    btrfs chunk-recover /dev/sddXXX
- The btrfs partition IS readable from a live CD, but the system installed on this partition cannot boot.
- Deleting the eclipse folder FIXES the problem.

I have this problem now, with Eclipse on a btrfs partition. Previously I had a similar mess on ZFS when Eclipse was on a ZFS partition.

Expected results:
1) Linux should boot, even when the file system is messed up by a ton of files (NOTE: the partition is readable from a live CD).
2) BTRFS should do something to prevent file bombing. I prefer an Eclipse crash to a non-bootable system or a messed-up partition.
3) ZFS has trouble too. zfs-fuse crashes when Eclipse is installed on a ZFS partition, and deleting the eclipse folder is far less complicated on zfs than on btrfs. zfs-fuse should be patched too, to be robust to file bombing.
4) Maybe, if the stars are aligned, you can explain to the Eclipse developers what a database is, introduce them to sqlite, and explain to them how much easier and better it is to use one instead of file-bombing the file system. But as far as I know Eclipse, if something can be dirty, it will be dirty.

Additional info:
WORKAROUNDS (save your data/OS)

--- ZFS ---
Previously I had trouble when Eclipse was installed on a ZFS disk. It took me a while to get access to my data partition, as Eclipse really messed up the file system. Deleting the eclipse folder fixed the problem, but I had a hard time deleting it, because zfs-fuse crashed and the messy file system has the side effect of freezing Linux.

This is the recipe that worked for me:
- Try to delete the eclipse folder (zfs may crash).
- If zfs crashes: try to delete a few files at a time.
- If rm or ls crashes: try to delete a few files at a time with guesses like rm -r a*; rm -r b*; rm -r c*, etc.
Note that the partition remains valid, so zfs scrub will NOT fix the problem.

--- BTRFS ---
From my experience, BTRFS leads to less trouble, as long as Eclipse IS NOT on the system partition. When things go wrong, you can simply boot from a live CD, mount the BTRFS partition, and do an rm -r eclipse.
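
The "delete a few files at a time" step of the ZFS recipe above could be scripted roughly as follows. This is only a sketch: the function name delete_in_batches, the default batch size and the pause between batches are illustrative assumptions, not part of the original recipe, and it has not been tried on a file system in the failing state. It assumes a find and xargs that support -print0 / -0 (GNU and BSD both do).

```shell
# Sketch: remove a directory tree in small batches instead of one big rm -r.
# delete_in_batches, the batch size and the pause are hypothetical choices.
delete_in_batches() {
    target=$1
    batch=${2:-100}   # number of files handed to each rm invocation
    pause=${3:-0}     # seconds to wait between batches

    if [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi

    # Remove everything that is not a directory, $batch entries at a time.
    # Inside the inner sh, $0 holds the pause and "$@" holds the file names.
    find "$target" ! -type d -print0 |
        xargs -0 -n "$batch" sh -c 'rm -f -- "$@"; sleep "$0"' "$pause"

    # Then remove the now-empty directories, deepest first.
    find "$target" -depth -type d -exec rmdir -- {} +
}
```

For example, delete_in_batches /mnt/data/eclipse 50 1 would remove 50 files per rm call and wait one second between calls; the path here is only a placeholder.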
Created attachment 1310653
This is the list of files from the eclipse folder that crashed my computer.

find > /tmp/eclipse.log
Hi Pierre, it looks like you are using Eclipse downloaded from eclipse.org rather than the Eclipse RPM from the Fedora repos, so there's nothing I can do support-wise with the Eclipse package against which you filed this bug. I'm going to re-assign this bug to the kernel component because Eclipse's misuse of the disk aside, a filesystem ought to be able to handle what's thrown at it -- I've never heard of such corruption/crash problems using Eclipse on an ext4 partition for example.
Does building headless with maven cause the issue afterwards (ignoring the installation of Eclipse)? I mean, there are quite a lot of files created/touched by Eclipse, but 75% or so of them are basically from that git repo. If this is Eclipse-related, then I think we'd need to narrow down which plugin is causing this.
@Mat: Hi! Thanks for the answer.
- I'm currently using the eclipse.org version, you're right.
- I had some problems with the Fedora version of Eclipse, which made me switch to the eclipse.org one, but I'm unsure whether they are related to the file system or not.
- I've never tried on ext4.

@Roland:
- I don't know how to test that.
- I think that, first of all, it's file-system related. The file system must either refuse to write files when someone is file-bombing, or be robust enough to handle a large number of files.
- I have no idea what "building with maven" means.

@ALL: Maybe we can write a sh script that file-bombs and see if it causes problems for the file system. As Mat said, "a filesystem ought to be able to handle what's thrown at it", so we must first see whether the problem is Eclipse or the file system (here BTRFS and ZFS).
Regarding 'building with maven', assuming you start with a working system, you could try:
- Ensuring 'maven' is installed on the system (e.g. dnf install maven)
- From the $HOME/git/org.eclipse.cdt/ folder, running 'mvn clean verify -DskipTests'
Afterwards, you could check whether the same behaviour occurs again.
Fedora doesn't support ZFS, so that's out of scope. Fedora's BTRFS support is also rather limited. I suspect you'd be better served by discussing these findings directly with the btrfs developers on the upstream list.
Try extracting journal files from the btrfs file system using journalctl -D pointed to <mp>/var/log/journal/<machineid>/ for the -b-1 and -b-2 boots, use > to redirect the output to a file, then attach the files here.

Also useful to include (btrfs commands require root):
    btrfs fi us <mp>
    btrfs insp dump-t -r <dev>
    uname -r
    btrfs ver

And optionally download the btrfs-debugfs python script from upstream git (https://github.com/kdave/btrfs-progs) and run:
    btrfs-debugfs -b <mp>

<mp> = mountpoint path
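
The collection steps above could be wrapped in a small script along these lines. It is a sketch under stated assumptions: the function names journal_dir and collect_btrfs_info and the output file names are made up for illustration, and the machine ID is assumed to be readable from <mp>/etc/machine-id of the mounted installation. The btrfs subcommands are the long forms of the abbreviations used above.

```shell
# Sketch: gather the requested logs from a btrfs file system mounted at MP.
# journal_dir and collect_btrfs_info are hypothetical helper names.

# Print the persistent journal directory of the installation mounted at MP.
journal_dir() {
    mp=${1%/}
    id=$(cat "$mp/etc/machine-id") || return 1
    printf '%s/var/log/journal/%s\n' "$mp" "$id"
}

# Usage (as root): collect_btrfs_info MP DEV OUTDIR
collect_btrfs_info() {
    mp=$1
    dev=$2
    out=$3
    mkdir -p "$out" || return 1
    jdir=$(journal_dir "$mp") || return 1

    # Previous two boots, as recorded in the journal stored on MP.
    journalctl -D "$jdir" -b -1 > "$out/boot-1.log"
    journalctl -D "$jdir" -b -2 > "$out/boot-2.log"

    btrfs filesystem usage "$mp" > "$out/fi-usage.txt"
    btrfs inspect-internal dump-tree -r "$dev" > "$out/dump-tree-roots.txt"

    # Note: from a live CD this reports the live system's kernel and tools,
    # not those of the installed system.
    uname -r > "$out/uname.txt"
    btrfs version > "$out/btrfs-version.txt"
}
```

Something like collect_btrfs_info /mnt /dev/sdXN /tmp/report would then leave the files to attach in /tmp/report; the mount point, device and output directory are placeholders.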
Created attachment 1311022
boot0 to boot -3

I'm quite sure the boots were only logged when things worked; the ones where btrfs was buggy were probably not logged.
btrfs fi us /
Overall:
    Device size:                 195.31GiB
    Device allocated:            135.01GiB
    Device unallocated:           60.30GiB
    Device missing:                  0.00B
    Used:                         53.40GiB
    Free (estimated):            140.42GiB      (min: 140.42GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:133.00GiB, Used:52.88GiB
   /dev/sdd2     133.00GiB

Metadata,single: Size:2.01GiB, Used:530.58MiB
   /dev/sdd2       2.01GiB

System,single: Size:4.00MiB, Used:16.00KiB
   /dev/sdd2       4.00MiB

Unallocated:
   /dev/sdd2      60.30GiB

btrfs insp dump-t -r /dev/sdd2
btrfs-progs v4.9.1
root tree: 330530816 level 1
chunk tree: 131072 level 0
extent tree key (EXTENT_TREE ROOT_ITEM 0) 329990144 level 2
device tree key (DEV_TREE ROOT_ITEM 0) 58245120 level 0
fs tree key (FS_TREE ROOT_ITEM 0) 4243456 level 0
checksum tree key (CSUM_TREE ROOT_ITEM 0) 329940992 level 2
uuid tree key (UUID_TREE ROOT_ITEM 0) 181370880 level 0
file tree key (257 ROOT_ITEM 0) 329859072 level 2
file tree key (260 ROOT_ITEM 0) 55551868928 level 0
data reloc tree key (DATA_RELOC_TREE ROOT_ITEM 0) 4358144 level 0
total bytes 209715200000
bytes used 57335488512
uuid 189e4036-677d-4052-9613-15335cf954e3

uname -r
4.11.9-300.fc26.x86_64

btrfs ver
btrfs-progs v4.9.1
Well, we need more information or it's all just speculation. There's plenty of unallocated space, and no snapshots. There are no kernel complaints about the file system in these logs, so if these logs don't show the problem we need other logs. If you can boot single user mode, and at least get networking and sshd running, then you could remote in and run 'journalctl -f' to capture anything that happens while transitioning to multi-user.target. At the moment there's just nothing to go on.

The alternative is to reproduce in a VM backed by an LVM LV, or if it must be a file on Btrfs, make sure to set chattr +C on the backing file at creation time (e.g. touch, then chattr, then fallocate -l). Then, when booting the VM, use the kernel parameter console=ttyS0, and virsh console to record the failing boot process.
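
The touch / chattr / fallocate sequence for the backing file might look like the sketch below. The function name make_nocow_image and the refusal to overwrite an existing file are illustrative assumptions; the order of the three commands is the one given above, since +C has to be set while the file is still empty.

```shell
# Sketch: create a VM backing file with copy-on-write disabled.
# make_nocow_image is a hypothetical helper name.
# Usage: make_nocow_image IMAGE SIZE    (SIZE as accepted by fallocate -l, e.g. 20G)
make_nocow_image() {
    img=$1
    size=$2

    # +C must be applied to a new, empty file, so never reuse an existing one.
    if [ -e "$img" ]; then
        echo "refusing to overwrite existing file: $img" >&2
        return 1
    fi

    touch "$img" || return 1

    # Set the No_COW attribute; this fails on file systems without support for it.
    chattr +C "$img" || echo "warning: could not set +C on $img" >&2

    # Only now allocate the space.
    fallocate -l "$size" "$img"
}
```

Running lsattr on the resulting file should show the C flag when the file lives on Btrfs. Capturing the serial console with virsh console is not covered by this sketch.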
I did some testing on a virtual machine; my testing did NOT reproduce the bug. I get "no space left on device" messages, but Fedora successfully booted.

I configured the VM by doing a minimal install (so console only) of F26 x86_64 through netinstall in VirtualBox.

--- TEST 1 NO BUG: a lot of files, with content ---
    for i in {1..1000}
    do
      for j in {1..1000}
      do
        for k in {1..32}
        do
          echo "$i-$j-$k" > "$i-$j-$k.txt";
        done
      done
    done

--- TEST 2 NO BUG: a lot of empty files ---
    for i in {1..1000}
    do
      for j in {1..1000}
      do
        for k in {1..32}
        do
          touch "$i-$j-$k.txt";
        done
      done
    done

--- TEST 3 NO BUG: complicated paths ---
    for i in {1..1000}
    do
      mkdir "diri_$i";
      cd "diri_$i";
      for j in {1..1000}
      do
        mkdir "dirj_$j";
        cd "dirj_$j";
        for k in {1..32}
        do
          touch "$i-$j-$k.txt";
        done
        cd ..;
      done
      cd ..;
    done
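
For repeated runs, the three tests above could be folded into one parameterized function, roughly as sketched here. The name filebomb, its argument order and its small default counts are assumptions made for illustration; the directory and file naming follows TEST 3 and the file content follows TEST 1. It stops at the first write error (for example "no space left on device") and prints how many files exist afterwards.

```shell
# Sketch: create NI x NJ directories with NK small files each under DIR.
# filebomb and its defaults are hypothetical; run it only on a scratch file system.
# Usage: filebomb DIR [NI] [NJ] [NK]
filebomb() {
    dir=$1
    ni=${2:-10}
    nj=${3:-10}
    nk=${4:-32}

    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$ni" ]; do
        j=1
        while [ "$j" -le "$nj" ]; do
            sub="$dir/diri_$i/dirj_$j"
            mkdir -p "$sub" || return 1
            k=1
            while [ "$k" -le "$nk" ]; do
                # Same naming and content scheme as TEST 1.
                echo "$i-$j-$k" > "$sub/$i-$j-$k.txt" || return 1
                k=$((k + 1))
            done
            j=$((j + 1))
        done
        i=$((i + 1))
    done

    # Report the number of regular files now present under DIR.
    find "$dir" -type f | wc -l
}
```

Calling filebomb /mnt/test/bomb 1000 1000 32 would match the scale of the tests above (32 million files); the target path is a placeholder.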
Try kernel 4.12.6 or newer, which includes commit 4a309747.
Eclipse is still a large source of trouble, but I've never experienced the boot failure problem again. I think we can consider it solved for now. Thanks for your help, and sorry for not giving more information; it's still very mysterious to me too.