979439 – X11 blocked if several small files are being written to SW RAID1

Bug 979439 - X11 blocked if several small files are being written to SW RAID1

Summary: X11 blocked if several small files are being written to SW RAID1

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	dmraid
Sub Component:
Version:	18
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Heinz Mauelshagen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-06-28 14:13 UTC by Zdenek Wagner
Modified:	2014-02-05 22:00 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-02-05 22:00:03 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Relevant part of /var/log/messages from the time of permanent block (236.02 KB, text/plain) 2013-07-25 09:45 UTC, Zdenek Wagner	no flags	Details
Relevant part of /var/log/messages, raid check (15.52 KB, text/plain) 2013-07-25 09:55 UTC, Zdenek Wagner	no flags	Details
View All

Description Zdenek Wagner 2013-06-28 14:13:39 UTC

Description of problem:
X11 becomes blocked (no response within a few minutes to keyboard and mouse, sometimes the mouse cursor even disappears) if several small files are being written within a short time interval. It typically happens if a book consists of many small (5 KB or less) TeX files and I make just small changes in several of them, if 100 small XML files are transformed by XSLT, if firefox updates cache. There is ususlly no problem if I edit large images in prepress resolution in gimp and save them unfrequently but when I work with text files, then saving 50 bytes may take 2 minutes, sometimes even 30 minutes. During that time the disk activity LED is on and KDE allows me to switch from one desktop to another but nothing else. The system monitor shows that the processor load is below 5%, memory usage is below 5%, swap usage is zero.

Version-Release number of selected component (if applicable):
dmraid-events-1.0.0.rc16-18.fc18.x86_64
dmraid-1.0.0.rc16-18.fc18.x86_64

How reproducible:
Always but with varying delay

Steps to Reproduce:
1. Save several small files within a short time interval from any text editor or open firefox with two tabs, one with gmail and one with facebook
2.
3.

Actual results:
Extremely long block of X11

Expected results:
RAID1 activity should not block X11

Additional info:

My hardware:
Motherboard: Intel® Desktop Board DH77KC
Processor: Intel® Core™ i5-3570 Processor

SSD connected to a Serial ATA 6Gb/s port, it contains /boot, swap, / and /usr/local, SSD type is: Model Family: Intel 520 Series SSDs Device Model: INTEL SSDSC2CW120A3 User Capacity: 120,034,123,776 bytes [120 GB]

2 HD connected to Serial ATA 3Gb/s ports, SW RAID1 used, LUKS encrypted LVM containing /var/www, /var/lib/mysql, /opt and /home. HD are Caviar Green. Although these HDs are not officially intended for RAID1, the HW vendor told me that they use them in office computers (not in servers) without problems. I had no problems with these HDs in Fedora 17 on an older motherboard. I have to replace the motherboard after its damage and install Fedora 18 because F17 did not recognize the new HW properly.

$ df -h
Filesystem                        Size  Used Avail Use% Mounted on
devtmpfs                           16G     0   16G   0% /dev
tmpfs                              16G   96K   16G   1% /dev/shm
tmpfs                              16G  9.9M   16G   1% /run
tmpfs                              16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/fedora_centos-pradan   46G   11G   33G  26% /
tmpfs                             4.0G  8.4M  4.0G   1% /tmp
/dev/sda2                         976M   98M  828M  11% /boot
/dev/mapper/fedora_centos-jagah    48G   19G   26G  42% /usr/local
/dev/mapper/vg_hadr-opt           3.9G  1.7G  2.1G  45% /opt
/dev/mapper/vg_hadr-lib           976M   72M  854M   8% /var/lib/mysql
/dev/mapper/vg_hadr-web           9.8G  1.4G  7.9G  15% /var/www
/dev/mapper/vg_hadr-ghar          1.4T  452G  840G  35% /home
pi.1.202:                  29G   15G   14G  53% /home/zw/Empty/mount

$ mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=16411260k,nr_inodes=4102815,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
/dev/mapper/fedora_centos-pradan on / type ext4 (rw,relatime,data=ordered)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=31,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
tmpfs on /tmp type tmpfs (rw,size=4194304k)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda2 on /boot type ext4 (rw,relatime,data=ordered)
/dev/mapper/fedora_centos-jagah on /usr/local type ext4 (rw,relatime,data=ordered)
/dev/mapper/vg_hadr-opt on /opt type ext4 (rw,relatime,data=ordered)
/dev/mapper/vg_hadr-lib on /var/lib/mysql type ext4 (rw,relatime,data=ordered)
/dev/mapper/vg_hadr-web on /var/www type ext4 (rw,relatime,data=ordered)
/dev/mapper/vg_hadr-ghar on /home type ext4 (rw,relatime,data=ordered)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
pi.1.202: on /home/zw/Empty/mount type fuse.sshfs (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

Contents of /dev/disk/by-id:
ata-HL-DT-ST_DVDRAM_GH24NS95_K4YCARB2323 -> ../../sr0
ata-INTEL_SSDSC2CW120A3_CVCV2280022X120BGN -> ../../sda
ata-INTEL_SSDSC2CW120A3_CVCV2280022X120BGN-part1 -> ../../sda1
ata-INTEL_SSDSC2CW120A3_CVCV2280022X120BGN-part2 -> ../../sda2
ata-INTEL_SSDSC2CW120A3_CVCV2280022X120BGN-part3 -> ../../sda3
ata-INTEL_SSDSC2CW120A3_CVCV2280022X120BGN-part4 -> ../../sda4
ata-INTEL_SSDSC2CW120A3_CVCV2280022X120BGN-part5 -> ../../sda5
ata-WDC_WD15EADS-00P8B0_WD-WMAVU0298557 -> ../../sdc
ata-WDC_WD15EADS-00P8B0_WD-WMAVU0298557-part1 -> ../../sdc1
ata-WDC_WD15EADS-00P8B0_WD-WMAVU0365712 -> ../../sdb
ata-WDC_WD15EADS-00P8B0_WD-WMAVU0365712-part1 -> ../../sdb1
dm-name-fedora_centos-jagah -> ../../dm-1
dm-name-fedora_centos-pradan -> ../../dm-0
dm-name-luks-3907bc4e-f453-4634-8501-81191014fd52 -> ../../dm-2
dm-name-vg_hadr-ghar -> ../../dm-4
dm-name-vg_hadr-lib -> ../../dm-3
dm-name-vg_hadr-opt -> ../../dm-6
dm-name-vg_hadr-web -> ../../dm-5
dm-uuid-CRYPT-LUKS1-3907bc4ef4534634850181191014fd52-luks-3907bc4e-f453-4634-8501-81191014fd52 -> ../../dm-2
dm-uuid-LVM-ezEOcVQR076mQdISQd23T0zifO2CW6I50WVyufll8XDqLIaBoUq3abrs0T99UN2d -> ../../dm-1
dm-uuid-LVM-ezEOcVQR076mQdISQd23T0zifO2CW6I54Uj1nfwA3w04MHTAESQmOnSIOmjfEdwh -> ../../dm-0
dm-uuid-LVM-sA6JHxqf9RDJZBhbPReZKrEM57YmaUK306t35PFoHDYkgS5etSYc9RXodqE1hQMO -> ../../dm-6
dm-uuid-LVM-sA6JHxqf9RDJZBhbPReZKrEM57YmaUK3k3L4KuLm9gymxhXTappH1PcAejMqdrQw -> ../../dm-4
dm-uuid-LVM-sA6JHxqf9RDJZBhbPReZKrEM57YmaUK3L5rAE0LUeH9OfgBiIRGU8KpRAgSqoFKc -> ../../dm-5
dm-uuid-LVM-sA6JHxqf9RDJZBhbPReZKrEM57YmaUK3Sgt29PWmO7JDeAusRGdTbATUaAD3ZSGE -> ../../dm-3
md-name-centos:0 -> ../../md0
md-uuid-4529b033:c29fa85b:1be7bfe3:df70aa70 -> ../../md0
usb-Generic_Ultra_HS-SD_MMC_T12000087432-0:0 -> ../../sdd
wwn-0x5001480000000000 -> ../../sr0
wwn-0x50014ee001a048da -> ../../sdb
wwn-0x50014ee001a048da-part1 -> ../../sdb1
wwn-0x50014ee0ac4afc7c -> ../../sdc
wwn-0x50014ee0ac4afc7c-part1 -> ../../sdc1
wwn-0x5001517bb29b435e -> ../../sda
wwn-0x5001517bb29b435e-part1 -> ../../sda1
wwn-0x5001517bb29b435e-part2 -> ../../sda2
wwn-0x5001517bb29b435e-part3 -> ../../sda3
wwn-0x5001517bb29b435e-part4 -> ../../sda4
wwn-0x5001517bb29b435e-part5 -> ../../sda5

Running "smartctl -t long" on all disks did not find any error.

It is not the problem of I/O generally, USB disks as well as Raspberry Pi model B mounted via sshfs using ethernet work without any problems and without blocking X11.

It does not seem to be a LUKS problem because I have USB 2.0 sticks with LUKS encrypted ext4 filesystem and there is not speed penalty due to encryption, they are currently faster than the SATA disks.

/var/log/messages does not contain any information explaining why X11 is blocked but as I can see, it becomes worse when dbus processes net.reactivated.Fprint (although no printer is installed).

Comment 1 Heinz Mauelshagen 2013-07-15 14:03:50 UTC

Did you make sure, that your HDDs aren't flawed?
The disk light on for so long and the insane IO delays you're describing
strike me...

Please check dmesg when this occurs.

This isn't during initial full sync, which will explain the disk activity?

Comment 2 Zdenek Wagner 2013-07-15 14:17:54 UTC

(In reply to Heinz Mauelshagen from comment #1)
> Did you make sure, that your HDDs aren't flawed?

As I wrote, "smartctl -t long" says that the disks are OK. Is there a better test or can I somehow verify that it is not caused by different response time of the two HDDs?

> The disk light on for so long and the insane IO delays you're describing
> strike me...
> 
It took half an hour yesterday and several hours on Saturday.

> Please check dmesg when this occurs.
> 
dmesg says nothing that can be related to HDD or dmraid, /var/log/messages contains more. A few months ago bad problems were triggered by packagekit, it helped to remove it but now packagekit was updated and no longer causes problems. Another problem is related to net.reactivated.Fprint. Using Google search I found that others have problems with it too and it was suggested to remove it by "yum remove fprintd". It helped a little but not much. Is there a way to make reporting more verbose? It will of course degrade performance but detailed messages might help. Anyway, /var/log/messages are on SDD in my computer, verbose diagnostics should not degrade RAID much.

> This isn't during initial full sync, which will explain the disk activity?

The disks worked fine for some time but now I have these problems always.

I can send /var/log/messages but it only contains start time end end time of RAID check and RAID1 rebuild but not all problems are that bad that check/rebuild is needed.

Comment 3 Zdenek Wagner 2013-07-25 09:45:41 UTC

Created attachment 778163 [details]
Relevant part of /var/log/messages from the time of permanent block

These are relevant messages from the time of permanent block. After system start I was abble to update my working copy from subversion and run TeX. Afterwards I tried to open the PDF in AR. The HDD LED was on and it did not open within 10 minutes although the size of the PDF file was 50 KB. I went to have a dinner and when I returned after half an hour, the PDF file was open in AR. I did some work on the file and finally committed it. The PDF file was kept open and I just refreshed it after each LaTeX run (LaTeX runs are fast because I use TeX Live from TUG and it is installed on SDD). Afterwards I left the computer running without doing any work. Later when I started gmail in firefox trying to write a mail, firefox locked after typing the first two characters, I was able to do nothing but reboot. The HDD LED was still on since the boot. After reboot the HDD LED was again permanently on but I was able to write the mail. The computer behaved normally only from 02:10 to 02:14.

Comment 4 Zdenek Wagner 2013-07-25 09:55:16 UTC

Created attachment 778167 [details]
Relevant part of /var/log/messages, raid check

This shows the messages. Just a few seconds before the raid check started I was editing a few small files and committing the changes to the svn repository. By inspecting to log files on several computers I found that CentOS 5.x does not perform any raid check while Fedora 17 and Fedora 18 do. The difference is that the server running Fedora 17 periodically reads data from a network, performs some calculations, saves both the source data and results and finally sends the results to another server. Two cores from the quad core comuter are permanently running. Looking at the log file from my program I see no slowing caused by the raid check. In Fedora 18, as you can see in the attached file, I could connect USB stick and run hdup2. The first step in my backup script is a hotcopy of svn repositories. It normally takes a second but during the raid check it takes approximatelly 10 minutes.

Comment 5 Zdenek Wagner 2013-07-25 10:00:54 UTC

I have just attached two parts of my /var/log/messages. I do not see any error message that may explain the source of the problem. The NTP packets come from my Raspberry Pi and I do not wish to have them accepted by chrony.Anyway, I had the same HD problems even before I bought Raspberry. I would greatly appreciate if you could suggest me additional tests that may help to find the source of the problem. I would prefer non-destructive tests but if needed, I know how to remove a disk and mount it on another computer via USB as a degraded RAID. I have tried and have the detailed instructions.

Comment 6 Fedora End Of Life 2013-12-21 14:11:10 UTC

This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Zdenek Wagner 2013-12-21 14:25:10 UTC

The problem seems to be caused by the RAID. After removing one of the disks and formatting the remaining disk as LUKS encrypted LVM (no RAID) the problem disappeared.

Comment 8 Fedora End Of Life 2014-02-05 22:00:03 UTC

Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.