Bug 920681

Summary: /dev/snapshot's design is accident prone
Product: Red Hat Enterprise Linux 6 Reporter: J.H.M. Dassen (Ray) <rdassen>
Component: kernelAssignee: Laszlo Ersek <lersek>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.4CC: bmr, mkarg, rbinkhor
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-04-04 11:18:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description J.H.M. Dassen (Ray) 2013-03-12 14:31:10 UTC
Description of problem:
The design of /dev/snapshot's interfaces is prone to accidents (unexpected side effects).

Version-Release number of selected component (if applicable):
2.6.32-358.0.1.el6

How reproducible:
100%

Steps to Reproduce:
1. Install RHEL6 as a virtualisation guest using a virtualisation technology that supports hot-add/hot-remove of vCPUs (e.g. KVM, vmware); configure at least 2 vCPUs.
2. Verify that CPU1 can be enabled or disabled dynamically:
   # echo -n 0 > /sys/devices/system/cpu/cpu1/online
   # echo -n 1 > /sys/devices/system/cpu/cpu1/online
3. Run
   # cat < /dev/snapshot
4. Try to disable CPU1:
   # echo -n 0 > /sys/devices/system/cpu/cpu1/online

Actual results:
[root@rhel6 ~]# echo -n 0 > /sys/devices/system/cpu/cpu1/online 
-bash: echo: write error: Device or resource busy

Expected results:
CPU1 can be still be disabled dynamically.

Additional info:
While "cat < /dev/snapshot" may look quite artificial, there are very normal commands that an administrator or an administrative tool can invoke that have the same side effect; in particular "sginfo -l".

Comment 1 J.H.M. Dassen (Ray) 2013-03-12 14:44:55 UTC
The "sginfo -l" aspect of this is now being tracked through bug #920687.

Comment 8 Laszlo Ersek 2013-04-04 11:18:09 UTC
(commenting publicly because the bug is private already)

I think this bug has been analyzed completely by Ray (in customer portal case 00796597, associated with bug 920687 too, comment 1 here), and by Sander Keemink from the customer side.

Basically what happens is: customer runs a script periodically that invokes "sginfo -l". "sginfo" in that mode happens to open /dev/snapshot (snapshot_open() in "kernel/power/user.c"), the O_RDONLY mode calls pm_notifier_call_chain(PM_HIBERNATION_PREPARE).

In "kernel/cpu.c" that notifier sets "cpu_hotplug_disabled":

cpu_hotplug_pm_callback()
  cpu_hotplug_disable_before_freeze()
    cpu_hotplug_disabled = 1

From the other side, massaging the CPU "online" files in sysfs enters store_online() in "drivers/base/cpu.c"

store_online()
  cpu_up() [kernel/cpu.c]
    if cpu_hotplug_disabled then return -EBUSY

If the qemu guest agent met this error when hot-(un-)plugging a VCPU, it would propagate the error to libvirtd. I think the entire logic is sane, except that opening a device *maybe* shouldn't immediately emit a significant notification.

The kernel logic is identical in upstream, so this bug should be taken there first... if we (or they) consider it a bug at all. I seem to recall that just opening eg. watchdog devices is not without consequences (*), and I guess the same could be said about opening tape drives or just plain FIFOs in the filesystem (if there's a writer blocked on them, waiting for the first reader to open).

(*) Yup, see "Documentation/watchdog/watchdog-api.txt":

  The simplest API:

  All drivers support the basic mode of operation, where the watchdog
  activates as soon as /dev/watchdog is opened and will reboot unless
  the watchdog is pinged within a certain time [...]

With that in mind I'm closing this BZ as NOTABUG: "sginfo" just shouldn't indiscriminately open device nodes. It could search for names more carefully, and/or call stat() before open() and check the st_rdev field.

Ray, if you disagree with the closure, please feel free to reopen the BZ, but in that case I'll de-assign myself (as the BZ is not virt-related). Thanks.

Comment 9 Laszlo Ersek 2013-04-04 11:23:05 UTC
FWIW the /dev/snapshot behavior is documented in "Documentation/power/userland-swsusp.txt":

  The interface consists of a character device providing the open(),
  release(), read(), and write() operations as well as several ioctl()
  commands defined in include/linux/suspend_ioctls.h .  The major and minor
  numbers of the device are, respectively, 10 and 231, and they can
  be read from /sys/class/misc/snapshot/dev.

  The device can be open either for reading or for writing.  If open for
  reading, it is considered to be in the suspend mode. [...]

My "/sys/class/misc/snapshot/dev" says "10:231", this even gives "sginfo" a hint what to look for in "st_rdev" with makedev()/major()/minor().

Comment 10 J.H.M. Dassen (Ray) 2013-04-15 10:00:17 UTC
Global Support Services has privatised comments pertaining to Red Hat internal workflow considerations and has not identified sensitive data in or attached to this bugzilla. Uncloaking this bugzilla as per Engineering's suggestion to facilitate discussions with upstream (kernel and/or sg3_utils).