Bug 1393379 - partitions can't be created nor removed in gnome-disks, cockpit
Summary: partitions can't be created nor removed in gnome-disks, cockpit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: storaged
Version: 25
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Tomas Smetana
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: F25FinalBlocker
TreeView+ depends on / blocked
 
Reported: 2016-11-09 12:36 UTC by Kamil Páral
Modified: 2016-11-15 13:37 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-15 13:37:08 UTC


Attachments (Terms of Use)

Description Kamil Páral 2016-11-09 12:36:02 UTC
Description of problem:
I can't create or remove standard partitions in gnome-disks. I tried with enforcing=0, doesn't help. No error is displayed, I see no error in journal, simply nothing happens.

In journal I see only polkit messages, nothing else:

Nov 09 13:31:13 f25 dbus-daemon[715]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.87' (uid=0 pid=2158 comm="/usr/lib/polkit-1/polkit-agent-helper-1 kparal " label="unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023")
Nov 09 13:31:13 f25 systemd[1]: Starting Fingerprint Authentication Daemon...
Nov 09 13:31:13 f25 dbus-daemon[715]: [system] Successfully activated service 'net.reactivated.Fprint'
Nov 09 13:31:13 f25 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=fprintd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 09 13:31:13 f25 systemd[1]: Started Fingerprint Authentication Daemon.
Nov 09 13:31:16 f25 audit[2158]: USER_AUTH pid=2158 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=pam_unix acct="kparal" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Nov 09 13:31:16 f25 audit[2158]: USER_ACCT pid=2158 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="kparal" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Nov 09 13:31:16 f25 polkitd[736]: Operator of unix-session:2 successfully authenticated as unix-user:kparal to gain TEMPORARY authorization for action org.freedesktop.udisks2.modify-device-system for system-bus-name::1.68 [/usr/bin/gnome-disks --gapplication-service] (owned by unix-user:kparal)


Version-Release number of selected component (if applicable):
gnome-disk-utility-3.22.0-1.fc25.x86_64
storaged-2.6.2-2.fc25.x86_64


How reproducible:
always

Steps to Reproduce:
1. add some clean drives to a VM
2. boot either a Live or an installed system
3. try to create a new partition in gnome-disks (beware of bug 1393373 when creating the partition table), nothing happens

Comment 1 Kamil Páral 2016-11-09 12:36:25 UTC
This seems to break basic application functionality:
"All applications that can be launched using the standard graphical mechanism of a release-blocking desktop after a default installation of that desktop must start successfully and withstand a basic functionality test. "
https://fedoraproject.org/wiki/Fedora_25_Final_Release_Criteria#Default_application_functionality

Comment 2 Anass Ahmed 2016-11-09 13:52:50 UTC
Same here, happened this morning. Had to use `fdisk` and `mkfs` from command line.

Comment 3 Kamil Páral 2016-11-09 14:42:17 UTC
I can reproduce the same problem with cockpit. When creating a partition, the spinner is spinning endlessly, but nothing happens. Nothing interesting in the journal.

Comment 4 Tomas Smetana 2016-11-09 15:11:27 UTC
Talking to Vratislav Podzimek who is solving the issue in libblockdev: seems like udev is sending different set of events than before and this breaks storaged/libblockdev.

Comment 5 Tomas Smetana 2016-11-09 15:49:42 UTC
The problem goes away with the testing libblockdev packages from http://koji.fedoraproject.org/koji/taskinfo?taskID=16373088

Comment 6 Dennis Gilmore 2016-11-09 17:06:57 UTC
+1 blocker. I can see people using the live to do re-partitioning

Comment 7 Nick Bebout 2016-11-09 17:07:41 UTC
+1 blocker

Comment 8 Adam Williamson 2016-11-09 17:08:08 UTC
I think I'm +1 blocker on this one, unfortunately; it's a fairly basic function for gnome-disks (the fact that it affects cockpit is also significant, though this functionality isn't explicitly in the cockpit criteria).

I note that what vpodzime actually changed here is to revert two recent explicitly-backported 'fixes' for udev interaction, so I am curious about the background as to why those changes were made in the first place...

Comment 9 Stephen Gallagher 2016-11-09 17:09:25 UTC
+1 blocker. It's also not strictly in the criteria, but having this working in Cockpit for Server deployments is fairly important.

Comment 10 Stef Walter 2016-11-09 17:11:19 UTC
Related failures in the Cockpit integration test suite: https://github.com/cockpit-project/cockpit/pull/5314

Comment 11 Adam Williamson 2016-11-09 17:11:35 UTC
That's +4 blocker, setting accepted. We need an update for this; as I can't find tsmetana or vpodzime on IRC, guess I'll do it.

Comment 12 Fedora Update System 2016-11-09 17:13:26 UTC
libblockdev-1.9-8.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-76a96e8bf3

Comment 13 Ben Williams 2016-11-09 17:14:31 UTC
+1 blocker and +1 FE

Comment 14 Tomas Smetana 2016-11-09 18:01:08 UTC
(In reply to Adam Williamson from comment #8)
 
> I note that what vpodzime actually changed here is to revert two recent
> explicitly-backported 'fixes' for udev interaction, so I am curious about
> the background as to why those changes were made in the first place...

Those were fixes for less frequent/severe bugs. Dropping the patches gives Vratislav some time to find a better fix for them while not blocking the release.

Comment 15 Marius Vollmer 2016-11-10 08:37:55 UTC
(In reply to Tomas Smetana from comment #4)
> Talking to Vratislav Podzimek who is solving the issue in libblockdev: seems
> like udev is sending different set of events than before and this breaks
> storaged/libblockdev.

Can you point to the change in udevd that causes this change in behavior?

My current understanding is:

 - udevd takes an exclusive lock on $DEV before issuing the BLKRRPART ioctl.

 - udevd takes a shared lock on $DEV before processing rules for it.

This behavior as part of the udev API.

Storaged takes a shared lock on the partition table device while creating a partition and formatting it.  The lock is held all the way from before calling parted until after calling mkfs.  This is to make sure that udevd does not issue the BLKRRPART ioctl while storaged is working with the partition table and partition.

(This is important because BLKRRPART is a kind of party crasher: While it runs, device nodes in /dev will actually disappear briefly.  This must not happen while Storaged is trying to run wipefs or mkfs on them.  It doesn't matter much if it happens after Storaged is done, but I would still argue that BLKRRPART needs to be avoided completely or fixed in the kernel to not issue bogus remove/add event pairs.)

Libblockdev (used to) take an exclusive lock.  This might prevent udevd from processing some important rules.  Maybe a shared lock is all that is needed.

However, as with storaged, clients of libblockdev probably need to take their own lock anyway if they want to use the block device of the new partition.  (There is no telling when exactly udevd will do the BLKRRPART, so we need to lock until we don't care anynmore.)

Maybe libblockdev just needs to document this, if it doesn't need the lock for itself?

Comment 16 Kamil Páral 2016-11-10 09:48:59 UTC
(In reply to Fedora Update System from comment #12)
> libblockdev-1.9-8.fc25 has been submitted as an update to Fedora 25.
> https://bodhi.fedoraproject.org/updates/FEDORA-2016-76a96e8bf3

Looks fixed.

Comment 17 Vratislav Podzimek 2016-11-10 15:14:23 UTC
First of all: Marius, thanks for the detailed info and pointers!!!

Turns out the problem was not with the uevents, but with the locking itself. libblockdev hanged on the flock() call because it was trying to get an *exclusive* lock on the device file. And that wasn't possible due to storaged's shared lock on it.

Let me describe you why libblockdev was trying to get an *exclusive* lock in the first place. It was a fix for failing tests which started to fail on our jenkins slaves after the lvm2 packages were updated. libblockdev is using libparted and to create a partition (or do any change of the partition table, really) it has to call two libparted functions -- commit_to_disk() and commit_to_os(). What started happening was that the latter failed because the device was busy. udev was to blame, of course :), but I thought it was LVM stepping in between the two function calls and opening the device. Because the failures started appearing after *only* the lvm2 packages were updated. So I introduced an *exclusive* lock to prevent udev from generating uevents in between the above function calls.

Turns out, something in LVM or DM got faster making *udev* react on changes on the device faster and stepping in with its BLKRRPART ioctl(). So a *shared* lock seems to be sufficient to prevent the issues and test failures while it doesn't cause issues in combination with storaged. The change is implemented in the following PR - https://github.com/rhinstaller/libblockdev/pull/143 - which I have already merged so that we can do a Copr build of libblockdev and start running storaged tests with it.

One interesting fact to conclude this -- libparted is not using the BLKRRPART ioctl() to tell kernel to reread the partition table on the device. It is using some other ioctl() to tell kernel where exactly partitions start and end. That allows libparted to support partition tables not supported by the kernel. However, udev is ruining this all with its BLKRRPART ioctl() run afterwards.

Comment 18 Fedora Update System 2016-11-10 19:26:30 UTC
libblockdev-1.9-8.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-76a96e8bf3

Comment 19 Marius Vollmer 2016-11-11 08:12:08 UTC
(In reply to Vratislav Podzimek from comment #17)
> First of all: Marius, thanks for the detailed info and pointers!!!

Sure!

> Turns out the problem was not with the uevents, but with the locking itself.
> libblockdev hanged on the flock() call because it was trying to get an
> *exclusive* lock on the device file. And that wasn't possible due to
> storaged's shared lock on it.

Ahh, of course!  I didn't think of that.

> One interesting fact to conclude this -- libparted is not using the
> BLKRRPART ioctl() to tell kernel to reread the partition table on the
> device. It is using some other ioctl() to tell kernel where exactly
> partitions start and end. That allows libparted to support partition tables
> not supported by the kernel.

Yes, and when commit_to_os fails with EBUSY, it does not mean that somebody else has the device open or locked, it means that commit_to_os has tried to tell the kernel about a partition that the kernel already knew about.  (Or more precisely, the newly announced partition overlaps with an already existing partition.)  This happens when udevd gets its BLKRRPART in between commit_to_disk and commit_to_os.

So one thing libparted could do is to handle EBUSY and check whether the kernel state is already as expected and just carry on.

> However, udev is ruining this all with its
> BLKRRPART ioctl() run afterwards.

Yeah, that ioctl should be unnecessary and it is sometimes harmful.  It is useful when someone "dd"s a partition table to a device.  Udevd will then reread the partition table automatically.  I would argue that in this case, we can expect the user to also run "partprobe" explicitly.

But I also think that using a shared flock is a correct approach, so I am happy with leaving the BLKRRPART ioctl in udev.

Comment 20 Fedora Update System 2016-11-15 13:37:08 UTC
libblockdev-1.9-8.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.