Description of problem:
I can't create or remove standard partitions in gnome-disks. I tried with enforcing=0, doesn't help. No error is displayed, I see no error in journal, simply nothing happens.
In journal I see only polkit messages, nothing else:
Nov 09 13:31:13 f25 dbus-daemon: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.87' (uid=0 pid=2158 comm="/usr/lib/polkit-1/polkit-agent-helper-1 kparal " label="unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023")
Nov 09 13:31:13 f25 systemd: Starting Fingerprint Authentication Daemon...
Nov 09 13:31:13 f25 dbus-daemon: [system] Successfully activated service 'net.reactivated.Fprint'
Nov 09 13:31:13 f25 audit: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=fprintd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 09 13:31:13 f25 systemd: Started Fingerprint Authentication Daemon.
Nov 09 13:31:16 f25 audit: USER_AUTH pid=2158 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=pam_unix acct="kparal" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Nov 09 13:31:16 f25 audit: USER_ACCT pid=2158 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="kparal" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Nov 09 13:31:16 f25 polkitd: Operator of unix-session:2 successfully authenticated as unix-user:kparal to gain TEMPORARY authorization for action org.freedesktop.udisks2.modify-device-system for system-bus-name::1.68 [/usr/bin/gnome-disks --gapplication-service] (owned by unix-user:kparal)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. add some clean drives to a VM
2. boot either a Live or an installed system
3. try to create a new partition in gnome-disks (beware of bug 1393373 when creating the partition table), nothing happens
This seems to break basic application functionality:
"All applications that can be launched using the standard graphical mechanism of a release-blocking desktop after a default installation of that desktop must start successfully and withstand a basic functionality test. "
Same here, happened this morning. Had to use `fdisk` and `mkfs` from command line.
I can reproduce the same problem with cockpit. When creating a partition, the spinner is spinning endlessly, but nothing happens. Nothing interesting in the journal.
Talking to Vratislav Podzimek who is solving the issue in libblockdev: seems like udev is sending different set of events than before and this breaks storaged/libblockdev.
The problem goes away with the testing libblockdev packages from http://koji.fedoraproject.org/koji/taskinfo?taskID=16373088
+1 blocker. I can see people using the live to do re-partitioning
I think I'm +1 blocker on this one, unfortunately; it's a fairly basic function for gnome-disks (the fact that it affects cockpit is also significant, though this functionality isn't explicitly in the cockpit criteria).
I note that what vpodzime actually changed here is to revert two recent explicitly-backported 'fixes' for udev interaction, so I am curious about the background as to why those changes were made in the first place...
+1 blocker. It's also not strictly in the criteria, but having this working in Cockpit for Server deployments is fairly important.
Related failures in the Cockpit integration test suite: https://github.com/cockpit-project/cockpit/pull/5314
That's +4 blocker, setting accepted. We need an update for this; as I can't find tsmetana or vpodzime on IRC, guess I'll do it.
libblockdev-1.9-8.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-76a96e8bf3
+1 blocker and +1 FE
(In reply to Adam Williamson from comment #8)
> I note that what vpodzime actually changed here is to revert two recent
> explicitly-backported 'fixes' for udev interaction, so I am curious about
> the background as to why those changes were made in the first place...
Those were fixes for less frequent/severe bugs. Dropping the patches gives Vratislav some time to find a better fix for them while not blocking the release.
(In reply to Tomas Smetana from comment #4)
> Talking to Vratislav Podzimek who is solving the issue in libblockdev: seems
> like udev is sending different set of events than before and this breaks
Can you point to the change in udevd that causes this change in behavior?
My current understanding is:
- udevd takes an exclusive lock on $DEV before issuing the BLKRRPART ioctl.
- udevd takes a shared lock on $DEV before processing rules for it.
This behavior as part of the udev API.
Storaged takes a shared lock on the partition table device while creating a partition and formatting it. The lock is held all the way from before calling parted until after calling mkfs. This is to make sure that udevd does not issue the BLKRRPART ioctl while storaged is working with the partition table and partition.
(This is important because BLKRRPART is a kind of party crasher: While it runs, device nodes in /dev will actually disappear briefly. This must not happen while Storaged is trying to run wipefs or mkfs on them. It doesn't matter much if it happens after Storaged is done, but I would still argue that BLKRRPART needs to be avoided completely or fixed in the kernel to not issue bogus remove/add event pairs.)
Libblockdev (used to) take an exclusive lock. This might prevent udevd from processing some important rules. Maybe a shared lock is all that is needed.
However, as with storaged, clients of libblockdev probably need to take their own lock anyway if they want to use the block device of the new partition. (There is no telling when exactly udevd will do the BLKRRPART, so we need to lock until we don't care anynmore.)
Maybe libblockdev just needs to document this, if it doesn't need the lock for itself?
(In reply to Fedora Update System from comment #12)
> libblockdev-1.9-8.fc25 has been submitted as an update to Fedora 25.
First of all: Marius, thanks for the detailed info and pointers!!!
Turns out the problem was not with the uevents, but with the locking itself. libblockdev hanged on the flock() call because it was trying to get an *exclusive* lock on the device file. And that wasn't possible due to storaged's shared lock on it.
Let me describe you why libblockdev was trying to get an *exclusive* lock in the first place. It was a fix for failing tests which started to fail on our jenkins slaves after the lvm2 packages were updated. libblockdev is using libparted and to create a partition (or do any change of the partition table, really) it has to call two libparted functions -- commit_to_disk() and commit_to_os(). What started happening was that the latter failed because the device was busy. udev was to blame, of course :), but I thought it was LVM stepping in between the two function calls and opening the device. Because the failures started appearing after *only* the lvm2 packages were updated. So I introduced an *exclusive* lock to prevent udev from generating uevents in between the above function calls.
Turns out, something in LVM or DM got faster making *udev* react on changes on the device faster and stepping in with its BLKRRPART ioctl(). So a *shared* lock seems to be sufficient to prevent the issues and test failures while it doesn't cause issues in combination with storaged. The change is implemented in the following PR - https://github.com/rhinstaller/libblockdev/pull/143 - which I have already merged so that we can do a Copr build of libblockdev and start running storaged tests with it.
One interesting fact to conclude this -- libparted is not using the BLKRRPART ioctl() to tell kernel to reread the partition table on the device. It is using some other ioctl() to tell kernel where exactly partitions start and end. That allows libparted to support partition tables not supported by the kernel. However, udev is ruining this all with its BLKRRPART ioctl() run afterwards.
libblockdev-1.9-8.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-76a96e8bf3
(In reply to Vratislav Podzimek from comment #17)
> First of all: Marius, thanks for the detailed info and pointers!!!
> Turns out the problem was not with the uevents, but with the locking itself.
> libblockdev hanged on the flock() call because it was trying to get an
> *exclusive* lock on the device file. And that wasn't possible due to
> storaged's shared lock on it.
Ahh, of course! I didn't think of that.
> One interesting fact to conclude this -- libparted is not using the
> BLKRRPART ioctl() to tell kernel to reread the partition table on the
> device. It is using some other ioctl() to tell kernel where exactly
> partitions start and end. That allows libparted to support partition tables
> not supported by the kernel.
Yes, and when commit_to_os fails with EBUSY, it does not mean that somebody else has the device open or locked, it means that commit_to_os has tried to tell the kernel about a partition that the kernel already knew about. (Or more precisely, the newly announced partition overlaps with an already existing partition.) This happens when udevd gets its BLKRRPART in between commit_to_disk and commit_to_os.
So one thing libparted could do is to handle EBUSY and check whether the kernel state is already as expected and just carry on.
> However, udev is ruining this all with its
> BLKRRPART ioctl() run afterwards.
Yeah, that ioctl should be unnecessary and it is sometimes harmful. It is useful when someone "dd"s a partition table to a device. Udevd will then reread the partition table automatically. I would argue that in this case, we can expect the user to also run "partprobe" explicitly.
But I also think that using a shared flock is a correct approach, so I am happy with leaving the BLKRRPART ioctl in udev.
libblockdev-1.9-8.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.