Bug 1484130 - [cifs] F_OFD_GETLK implemented wrong with CIFS protocol version 2.0+ (causes 'Failed to get "write" lock' error when trying to run qemu with disk image file on a CIFS share)
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: https://fedoraproject.org/wiki/Common...
Depends On:
Blocks:
 
Reported: 2017-08-22 18:50 UTC by Adam Williamson
Modified: 2019-11-13 17:46 UTC
CC: 42 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-17 20:11:16 UTC
Type: Bug
Embargoed:


Attachments
strace (141.90 KB, text/plain), 2017-08-23 06:36 UTC, Adam Williamson


Links
Linux Kernel bug 200273, last updated 2019-05-14 11:10:01 UTC

Description Adam Williamson 2017-08-22 18:50:18 UTC
My regular 'test' VM that I use for all sorts of testing via virt-manager has its disk image file on a CIFS share. With qemu 2.10 in Fedora 27, this VM refuses to start up, with this error:

Error starting domain: internal error: qemu unexpectedly closed the monitor: 2017-08-22T18:41:52.457937Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/4 (label charserial0)
2017-08-22T18:41:52.499023Z qemu-system-x86_64: -drive file=/share/data/isos/vms/desktop_test_1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0: Failed to get "write" lock
Is another process using the image?

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 89, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 125, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 82, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1489, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1062, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: 2017-08-22T18:41:52.457937Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/4 (label charserial0)
2017-08-22T18:41:52.499023Z qemu-system-x86_64: -drive file=/share/data/isos/vms/desktop_test_1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0: Failed to get "write" lock
Is another process using the image?

(The CIFS share is mounted at /share/data). If I've read it correctly, this is new code in 2.10, introduced in commit 244a5668106297378391b768e7288eb157616f64 . I suspect perhaps this code is broken when dealing with a disk image file on a CIFS share?

Note, this isn't SELinux, at least I don't think so; I've tried with it set to permissive, and there are no obvious related denials logged in the journal.

Comment 1 Richard W.M. Jones 2017-08-22 19:47:21 UTC
Hmm, this is caused by a new feature in qemu (unfortunately a backwards
compat break) where qemu now attempts to take a lock on all disk images
unless you add extra flags to tell it not to.  However it shouldn't
break in this case.  Adding Fam who wrote this.

See also bug 1378241.

Comment 2 Fam Zheng 2017-08-23 02:29:32 UTC
What is the output of "lslocks"? Is there an existing lock applied on the image when you try to boot it?

Could you also collect the output of

  echo q | strace -f $QEMU /share/data/isos/vms/desktop_test_1.qcow2 -monitor stdio

?

Comment 3 Adam Williamson 2017-08-23 05:44:11 UTC
"What is the output of "lslocks"? Is there an existing lock applied on the image when you try to boot it?"

[root@adam qemu (f27 %)]# lslocks | grep qcow
[root@adam qemu (f27 %)]# 

...doesn't look like it.

[root@adam qemu (f27 %)]# echo q | strace -f $QEMU /share/data/isos/vms/desktop_test_1.qcow2 -monitor stdio
execve("/share/data/isos/vms/desktop_test_1.qcow2", ["/share/data/isos/vms/desktop_tes"..., "-monitor", "stdio"], 0x7ffd2b8ba8f8 /* 45 vars */) = -1 EACCES (Permission denied)
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 6), ...}) = 0
write(2, "strace: exec: Permission denied\n", 32strace: exec: Permission denied
) = 32
getpid()                                = 25658
exit_group(1)                           = ?
+++ exited with 1 +++
[root@adam qemu (f27 %)]#

Comment 4 Fam Zheng 2017-08-23 06:29:21 UTC
Thanks for the update, please test the strace command again with $QEMU replaced by the actual qemu binary path in your environment.

Comment 5 Adam Williamson 2017-08-23 06:36:36 UTC
Created attachment 1316954 [details]
strace

Comment 6 Fam Zheng 2017-08-23 09:06:27 UTC
In the strace there is this line (where the lock error comes from):

[pid 26778] fcntl(10, F_OFD_GETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=201, l_len=1, l_pid=18446744073709551615}) = 0

which means the image is locked somewhere. But as you said, no program is reported by "lslocks", and in fact a few lines above in the strace log there is:

[pid 26778] fcntl(10, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=201, l_len=1}) = 0

which means QEMU has successfully acquired a shared lock.

I cannot reproduce with an image in a CIFS mount point on Fedora 26. I'll try to set up a fedora rawhide VM to reproduce this.

Meanwhile, I have a few more questions that may help:

1) do local images work?
2) what is the host kernel version?
3) what is the last known good setup (host version and qemu version)?

Comment 7 Fam Zheng 2017-08-23 09:50:31 UTC
Now I can reproduce the same symptom in a Fedora Rawhide VM, and it looks like a kernel issue.

When I downgrade to a Fedora 26 release kernel (kernel-4.11.8-300.fc26), the same command works:

echo q | qemu-system-x86_64 /mnt/test.qcow2 -nographic -monitor stdio -nodefaults

Bad output:

qemu-system-x86_64: -nodefaults: Failed to get "write" lock
Is another process using the image?

Good output:

QEMU 2.9.93 monitor - type 'help' for more information
(qemu) q

I'm moving the component to kernel so cifs experts can take a look.

Meanwhile I'll write a small test program and attach here.

Comment 8 Fam Zheng 2017-08-23 10:09:37 UTC
Here is the simplified reproducer:

[root@localhost ~]# cat test-ofd-lock.c 
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int ret;
    int fd;
    struct flock fl = {
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,
        .l_type   = F_RDLCK,
    };
    if (argc < 2) {
            fprintf(stderr, "Usage: %s <file>\n", argv[0]);
            return 1;
    }
    fd = open(argv[1], O_RDWR);
    if (fd < 0) {
            perror("open");
            return errno;
    }
    ret = fcntl(fd, F_OFD_SETLK, &fl);
    if (ret) {
            perror("setlk");
            return errno;
    }
    fl.l_type = F_WRLCK;
    ret = fcntl(fd, F_OFD_GETLK, &fl);
    if (ret) {
            perror("getlk");
            return errno;
    }
    if (fl.l_type != F_UNLCK) {
            fprintf(stderr, "get lock test failed\n");
            return 1;
    }
    return 0;
}
[root@localhost ~]# make test-ofd-lock
cc     test-ofd-lock.c   -o test-ofd-lock
[root@localhost ~]# touch /tmp/test && ./test-ofd-lock /tmp/test
[root@localhost ~]# echo $?
0
[root@localhost ~]# touch /mnt/test && ./test-ofd-lock /mnt/test
get lock test failed
[root@localhost ~]# mount | grep /mnt
//192.168.31.1/tddownload on /mnt type cifs (rw,relatime,vers=3.0,cache=strict,username=admin,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.31.1,file_mode=0755,dir_mode=0755,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1,user=admin)


Note that with the Fedora 26 kernel, the error on /mnt/test doesn't happen.

The bad and good kernels I used are:

[root@localhost ~]# rpm -q kernel
kernel-4.13.0-0.rc6.git0.1.fc28.x86_64
kernel-4.11.8-300.fc26.x86_64

Comment 9 Adam Williamson 2017-08-23 14:59:01 UTC
Thanks for the investigation! I'll poke the kernel folks.

Comment 10 Adam Williamson 2017-08-23 17:53:16 UTC
Fam: could you possibly try a bisection to identify the commit responsible for this? https://01.org/linuxgraphics/gfx-docs/drm/admin-guide/bug-bisect.html has some basic instructions. The kernel folks suggest testing a 4.12 kernel from F26 updates to narrow down the target range for the bisection; in fact you can try various kernel builds from https://koji.fedoraproject.org/koji/packageinfo?packageID=8 too. You may be able to narrow it down to something like 'kernel-4.13.0-0.rc4.git0.1.fc27 fails but kernel-4.13.0-0.rc3.git4.1.fc27 works', which would certainly help.

I could do this too, of course, but I've got rather a lot of F27 work on ATM :( If you're busy too, I understand, and I'll try to get to it when I can, using your reproducer.

Comment 11 Fam Zheng 2017-08-24 08:40:18 UTC
OK, I don't have a Fedora tree, so I am bisecting the upstream kernel between current master and v4.11.

Comment 12 Chuck Ebbert 2017-08-24 09:31:21 UTC
Commit eef914a9eb5eb83e60eb498315a491cd1edc13a1 upstream (in 4.13-rc1) changed the default SMB protocol from 1.0 to 3.0. I can't even get an SMB share from a Centos 7 server mounted with that default. Adding "-o vers=1.0" makes everything work -- the share gets mounted and Fam's test program returns 0.

Comment 13 Chuck Ebbert 2017-08-24 10:05:33 UTC
With rawhide kernel 4.13-rc6 on f26 and using default mount options, trying to mount a share from a Centos 7.3 server, I get this error:

  CIFS VFS: ioctl error in smb2_get_dfs_refer rc=-2

Kernel 4.12.5 does not have this problem (and Fam's test program works as well).

Comment 14 Fam Zheng 2017-08-24 10:49:39 UTC
Indeed, my bisection also shows this is the first bad commit [1]. Reverting it on top of current master fixes the issue for me.

FYI, I can mount an SMB share from a Fedora 26 server; I haven't tried a CentOS 7 server.

Due to how wide the impact of this change is, I will leave the deeper investigation to the cifs folks.

[1]: bad commit

commit eef914a9eb5eb83e60eb498315a491cd1edc13a1
Author: Steve French <smfrench>
Date:   Sat Jul 8 17:30:41 2017 -0500

    [SMB3] Improve security, move default dialect to SMB3 from old CIFS
    
    Due to recent publicity about security vulnerabilities in the
    much older CIFS dialect, move the default dialect to the
    widely accepted (and quite secure) SMB3.0 dialect from the
    old default of the CIFS dialect.
    
    We do not want to be encouraging use of less secure dialects,
    and both Microsoft and CERT now strongly recommend not using the
    older CIFS dialect (SMB Security Best Practices
    "recommends disabling SMBv1").
    
    SMB3 is both secure and widely available: in Windows 8 and later,
    Samba and Macs.
    
    Users can still choose to explicitly mount with the less secure
    dialect (for old servers) by choosing "vers=1.0" on the cifs
    mount
    
    Signed-off-by: Steve French <smfrench>
    Reviewed-by: Pavel Shilovsky <pshilov>

Comment 15 Laura Abbott 2017-08-24 14:18:54 UTC
Thanks for the bisect. Did this get reported to upstream anywhere?

Comment 16 Laura Abbott 2017-08-24 14:28:05 UTC
Never mind, the coffee hasn't quite kicked in and I didn't think about what this actually means. Yes, if someone who has deeper knowledge about CIFS can explain what changed that would be useful for deciding how to proceed.

Comment 17 Adam Williamson 2017-08-24 15:28:04 UTC
Oh duh, I should have mentioned that; I actually ran into this change in another context already:

https://bugzilla.redhat.com/show_bug.cgi?id=1474539

of course, we don't really have the full answer yet, right? Is there any *reason* this locking should work with the old dialect but not the new one?

Comment 18 Adam Williamson 2017-08-24 15:31:57 UTC
Oh, and, uh...here's a thing: I'm actually explicitly mounting the share in question using 'vers=2.0' because SMB3 doesn't actually work with my server.

If I remount the share using 'vers=1.0' instead, I get a different error. The error doesn't appear *immediately* on trying to launch the VM as it does with 2.0, it appears after some time, and it's:

Error starting domain: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainCreate)

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 89, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 125, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 82, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1489, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1062, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainCreate)

Comment 19 mlaverdiere 2017-10-21 12:52:36 UTC
I'm on Fedora 27 beta, and I thought at first that I was experiencing two different, unrelated bugs: #1, not able to mount my CIFS shares anymore; #2, Boxes (gnome-boxes) not able to run VMs anymore.

It turned out that solving bug #1 (by adding vers=1.0 to my fstab cifs config lines) also solved bug #2.

As already mentioned, see this for bug #1: Bug 1474539 - Default CIFS protocol changed from SMB 1.0 to SMB 3.0 in kernel 4.13, breaks mounts from some servers

Comment 20 Andrew Roberts 2017-11-18 05:23:43 UTC
This is now being encountered in the wild by Fedora 27 release users. I've had to copy virtual machines to local disk in order to keep running them.

Under Fedora 26 I was already forcing version 3 of SMB using vers=3.0 on the mount line, and it was working fine then.

This is clearly a regression from Fedora 26. The issue was noted in August; it's now three months later, and it is a security-related bug affecting the current Fedora kernel.

Has this bug been forgotten?

Comment 21 Adam Williamson 2017-11-22 01:02:27 UTC
Yeah, this is still making life rather awkward for me :/ Richard, Fam, where are we with this?

Comment 22 Fam Zheng 2017-11-22 13:20:56 UTC
No idea, the bug is pending on a CIFS developer to address.

Comment 23 Adam Williamson 2017-11-22 21:58:29 UTC
Have we taken some steps to contact a CIFS developer, or are we just waiting around for one to show up by chance? :)

Comment 24 Yaniv Kaul 2017-11-23 14:27:33 UTC
I'm seeing the same issue, only without CIFS. My backing file is on /dev/shm/... and the overlay (which virt-sysprep via libguestfs tries to create) is on /tmp.

Comment 25 Dr. David Alan Gilbert 2017-11-23 14:37:31 UTC
(In reply to Yaniv Kaul from comment #24)
> I'm seeing the same issue, only without CIFS. My backing file is on
> /dev/shm/... and the overlay (which virt-sysprep via libguestfs tries to
> create) is on /tmp.

Please take that as a separate bz

Comment 26 bialekr 2017-11-24 13:32:57 UTC
After upgrading to Fedora 27, I face a similar issue trying to start two VMs on the same host using shared devices (though I'm not using CIFS). It worked perfectly in older Fedora releases.

Any workarounds?

sudo virsh start 02_cldb01-GI-12102
error: Failed to start domain 02_cldb01-GI-12102
error: internal error: process exited while connecting to monitor: 2017-11-24T13:25:30.984133Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/5 (label charserial0)
2017-11-24T13:25:31.057312Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,id=virtio-disk2: Failed to get "write" lock
Is another process using the image?

kernel 4.13.13-300.fc27.x86_64
qemu-system-x86-2.10.1-1.fc27.x86_64

Thank you,
Robert

Comment 27 Dr. David Alan Gilbert 2017-11-24 14:49:42 UTC
(In reply to bialekr from comment #26)
> After upgrade to Fedora 27 I face a similar issue trying to start two VMs on
> the same host using shared devices (though, I'm not using CIFS) . It worked
> perfectly in older Fedora releases. 
> 
> Any workarounds?
> 
> sudo virsh start 02_cldb01-GI-12102
> error: Failed to start domain 02_cldb01-GI-12102
> error: internal error: process exited while connecting to monitor:
> 2017-11-24T13:25:30.984133Z qemu-system-x86_64: -chardev pty,id=charserial0:
> char device redirected to /dev/pts/5 (label charserial0)
> 2017-11-24T13:25:31.057312Z qemu-system-x86_64: -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,
> id=virtio-disk2: Failed to get "write" lock
> Is another process using the image?
> 
> kernel 4.13.13-300.fc27.x86_64
> qemu-system-x86-2.10.1-1.fc27.x86_64
> 
> Thank you,
> Robert

Yes, that's expected if you really are sharing devices; what exactly are you sharing? Is it raw or qcow?  If it's raw you can specify allowing sharing I think.
(Not sure of the syntax)
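
For reference, a hedged sketch of the syntax being alluded to, assuming qemu 2.10's per-device "share-rw" property (the paths and IDs here are illustrative, not from this reporter's setup):

```
# Hedged sketch: qemu 2.10 added a per-device "share-rw" property; with
# share-rw=on, qemu takes only a shared lock on the image, so a second
# VM can open the same raw disk. Paths and IDs are illustrative.
qemu-system-x86_64 \
  -drive file=/path/to/shared-disk.img,format=raw,if=none,id=drive-shared \
  -device virtio-blk-pci,drive=drive-shared,share-rw=on
```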

Comment 28 Richard W.M. Jones 2017-11-24 14:58:31 UTC
(In reply to bialekr from comment #26)
> After upgrade to Fedora 27 I face a similar issue trying to start two VMs on
> the same host using shared devices (though, I'm not using CIFS) .
                                              ^^^^^^^^^^^^^^^^^^^
So it's not this bug.  Please see bug 1378242.

Comment 29 bialekr 2017-11-24 14:59:46 UTC
(In reply to Dr. David Alan Gilbert from comment #27)
> (In reply to bialekr from comment #26)
> > After upgrade to Fedora 27 I face a similar issue trying to start two VMs on
> > the same host using shared devices (though, I'm not using CIFS) . It worked
> > perfectly in older Fedora releases. 
> > 
> > Any workarounds?
> > 
> > sudo virsh start 02_cldb01-GI-12102
> > error: Failed to start domain 02_cldb01-GI-12102
> > error: internal error: process exited while connecting to monitor:
> > 2017-11-24T13:25:30.984133Z qemu-system-x86_64: -chardev pty,id=charserial0:
> > char device redirected to /dev/pts/5 (label charserial0)
> > 2017-11-24T13:25:31.057312Z qemu-system-x86_64: -device
> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,
> > id=virtio-disk2: Failed to get "write" lock
> > Is another process using the image?
> > 
> > kernel 4.13.13-300.fc27.x86_64
> > qemu-system-x86-2.10.1-1.fc27.x86_64
> > 
> > Thank you,
> > Robert
> 
> Yes, that's expected if you really are sharing devices; what exactly are you
> sharing? Is it raw or qcow?  If it's raw you can specify allowing sharing I
> think.
> (Not sure of the syntax)


I share raw files between two vms (test installation for Oracle Real Application Clusters). Example:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/work1/vms/cldb01/asm_u01_00.img'/>
      <target dev='vdf' bus='virtio'/>
      <shareable/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' function='0x0'/>
    </disk>

As I already mentioned, this configuration worked perfectly in Fedora 26/25. After upgrade to Fedora 27 trying to start the second vm fails with "Failed to get "write" lock".

Comment 30 Adam Williamson 2017-11-24 17:42:00 UTC
The locking itself is what's new in F27. But your case of having an issue with it is different from ours, as Richard said.

Comment 31 Laura Abbott 2018-02-20 20:00:19 UTC
We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. As kernel maintainers, we try to keep up with bugzilla, but due to the rate at which the upstream kernel project moves, bugs may be fixed without any indication to us. Because of this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.15.3-300.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you experience different issues, please open a new bug report for those.

Comment 32 Andrew Roberts 2018-02-21 04:35:17 UTC
This issue is still there with the latest kernel:

uname -a
Linux ryzen 4.15.3-300.fc27.x86_64 #1 SMP Tue Feb 13 17:02:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Trying to run qemu with a virtual disk on a SMB share:

qemu-system-i386: Failed to get "write" lock
Is another process using the image?
qemu-system-i386: Initialization of device isa-fdc failed: Device initialization failed.

qemu was run as follows:

/usr/bin/qemu-system-i386 \
   -drive file=a.hd,if=floppy,format=raw,index=0 \
   -drive file=c.hd,if=ide,format=raw,index=2 \
   -boot order=ca \
   -m 128M 

I have write permission to the files, and filesystem is mounted read/write,
and more importantly this worked fine under FC26

ls -l *.hd
-rwxr-xr-x. 1 aroberts aroberts   1474560 Aug 26 19:10 a.hd
-rwxr-xr-x. 1 aroberts aroberts 104857600 Aug 27 03:59 c.hd

mount | grep share
//192.168.1.30/share on /mnt/share type cifs (rw,relatime,vers=3.0,cache=strict,username=aroberts,domain=WORKGROUP,uid=1000,forceuid,gid=1000,forcegid,addr=192.168.1.30,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1,x-systemd.automount)

Why are you just mass posting against all the kernel bugs, asking if the rebase fixed it, instead of actually testing things?

I have filed at least three bugs against FC27, none of which have been addressed at all. Two kernel bugs, one in Thunderbird. Both the kernel and thunderbird have been updated multiple times since then, but the bugs persist. This does not encourage people to report issues if they are ignored.

Comment 33 Adam Williamson 2018-03-25 15:24:30 UTC
Yes, I'm still hitting this, and it's *still* very annoying.

Did we find a CIFS/SMB developer to run it by yet?

Comment 34 Adam Williamson 2018-03-27 17:31:24 UTC
Filed upstream, since this seemed to be going nowhere:

https://bugs.launchpad.net/qemu/+bug/1759337

Comment 35 Adam Williamson 2018-03-27 17:32:33 UTC
Andrew: btw - "Why are you just mass posting against all the kernel bugs, asking if the rebase fixed it, instead of actually testing things?"

Because there are hundreds or thousands of kernel bugs, and we have two kernel maintainers. They can't possibly test all of those bugs. (Also, most of them require specific configurations or even particular hardware to test, which they likely don't have).

Comment 36 Adam Williamson 2018-03-27 17:33:57 UTC
Setting back to qemu as this isn't really a kernel bug. The kernel changing the default protocol version was an intentional and appropriate change, qemu is going to have to learn to work with the newer protocol version.

Comment 37 Fam Zheng 2018-03-27 18:55:54 UTC
It is a kernel bug. The code snippet in comment 8 shows clearly that the kernel is doing the wrong thing, which cannot be fixed/worked around by QEMU.

In man 2 fcntl:

       F_OFD_GETLK (struct flock *)
              On input to this call, lock describes an open file description lock we would like to place on the file.  If the lock could  be  placed,  fcntl()  does  not
              actually  place  it,  but  returns F_UNLCK in the l_type field of lock and leaves the other fields of the structure unchanged.  If one or more incompatible
              locks would prevent this lock being placed, then details about one of these locks are returned via lock, as described above for F_GETLK.

which is not the case with the new CIFS behaviour.

Comment 38 Adam Williamson 2018-03-27 19:13:58 UTC
Ah OK, sorry, I didn't realize that part: I thought we'd just assigned it to kernel based on the bisect.

Laura, Justin, any ideas who we could talk to about this?

Comment 39 Laura Abbott 2018-03-27 19:15:37 UTC
Let's try the suggested mailing list at https://fedoraproject.org/wiki/KernelBugTriage

Comment 40 Adam Williamson 2018-06-25 16:51:59 UTC
When you say "let's", do you mean you did it, or I ought to do it?

Comment 41 Adam Williamson 2018-06-25 17:03:52 UTC
bumping release to keep this alive, but note it still affects f27.

Comment 42 Laura Abbott 2018-06-25 17:35:38 UTC
nfs-maint is cc'd on the bug. I do not have high confidence that filing a bugzilla will do anything.

Comment 43 Adam Williamson 2018-06-25 17:38:42 UTC
"nfs-maint is cc'd on the bug"

only as of about 15 minutes ago, when I added it. :)

"I do not have high confidence that filing a bugzilla will do anything."

it can't hurt.

Comment 44 J. Bruce Fields 2018-06-25 18:39:12 UTC
Upstream CIFS bugs should go to Steve French <sfrench> and linux-cifs.org.  Possibly also cc: Pavel Shilovsky <pshilov> and jlayton (who worked on OFD code).

Comment 45 Adam Williamson 2018-06-25 19:51:52 UTC
Could you (or I guess Fam or Laura) possibly send it to them? It'd probably look better coming from someone they know than from some random Fedora monkey...

Comment 46 Laura Abbott 2018-06-26 00:02:16 UTC
e-mail sent. We'll see who responds.

Comment 47 Justin M. Forbes 2018-07-23 15:07:03 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.

Fedora 28 has now been rebased to 4.17.7-200.fc28. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 48 Andrew Roberts 2018-07-23 15:27:07 UTC
uname -a
Linux ryzen 4.17.7-200.fc28.x86_64 #1 SMP Tue Jul 17 16:28:31 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

The issue is still there with the latest kernel:
qemu-system-i386: Initialization of device isa-fdc failed: Failed to get "write" lock

Comment 49 Adam Williamson 2018-10-03 20:04:53 UTC
It seems there have been three different attempts to fix this upstream, but each failed review. First, from Ronnie:

https://www.spinics.net/lists/linux-cifs/msg14745.html
https://www.spinics.net/lists/linux-cifs/msg14746.html

Second, from Paulo Alcantara:

https://www.spinics.net/lists/linux-cifs/msg14794.html
https://www.spinics.net/lists/linux-cifs/msg14795.html

Third, also from Paulo:

https://www.spinics.net/lists/linux-cifs/msg14918.html

That second attempt by Paulo got a review by Pavel Shilovsky:

https://www.spinics.net/lists/linux-cifs/msg14919.html

which Paulo thanked him for, but never seems to have followed up on otherwise. So, this seems to be stuck again.

I will ping those three folks and ask if we can get anywhere further with this.

Comment 50 Ronnie Sahlberg 2018-10-03 23:25:31 UTC
I have sent an updated patch to linux-cifs. Let's see how it goes.

Comment 51 Adam Williamson 2018-10-04 19:32:23 UTC
Ronnie: thanks very much!

Comment 52 Ronnie Sahlberg 2018-11-05 23:19:19 UTC
This has been merged now to upstream. So closing this issue for now.

Comment 53 Adam Williamson 2018-11-06 00:00:54 UTC
That's great! Though note the bug is filed against F28: it'd be nice to get the fix into F28 and F29 if we can (of course it'll filter down from upstream eventually). Thanks a lot for the help in getting this done!

Comment 54 Adam Williamson 2018-11-09 18:25:20 UTC
Bad news, I'm afraid: I just tried with 4.20.0-0.rc1.git1.2.fc30.x86_64, which I'm fairly sure should have the patch:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/cifs?id=9645759ce6b39013231f4fa312834935c93fe5bc

and trying to run a VM with a disk image from a CIFS share still fails:

Error starting domain: internal error: process exited while connecting to monitor: 2018-11-09T18:21:19.411524Z qemu-system-x86_64: -drive file=/share/data/isos/vms/desktop_test_1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0: Failed to get "write" lock
Is another process using the image?

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1400, in startup
    self._backend.create()
  File "/usr/lib64/python3.7/site-packages/libvirt.py", line 1080, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: process exited while connecting to monitor: 2018-11-09T18:21:19.411524Z qemu-system-x86_64: -drive file=/share/data/isos/vms/desktop_test_1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0: Failed to get "write" lock
Is another process using the image?

Comment 55 Justin M. Forbes 2019-01-29 16:26:51 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.

Fedora 28 has now been rebased to 4.20.5-100.fc28. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.

If you experience different issues, please open a new bug report for those.

Comment 56 Laura Abbott 2019-04-09 20:46:49 UTC
We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.
 
Fedora 29 has now been rebased to 5.0.6. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.
 
If you experience different issues, please open a new bug report for those.

Comment 57 Justin M. Forbes 2019-09-17 20:11:16 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 58 Adam Williamson 2019-11-13 02:19:46 UTC
I'm still having a *lot* of trouble with this. (And for bonus points, if I access the disk images via NFS instead, they seem to keep getting corrupted). I'll maybe re-open it or file a new one when I can get some data on the issue.

Comment 59 Benjamin Coddington 2019-11-13 11:42:09 UTC
(In reply to Adam Williamson from comment #58)
> I'm still having a *lot* of trouble with this. (And for bonus points, if I
> access the disk images via NFS instead, they seem to keep getting
> corrupted). I'll maybe re-open it or file a new one when I can get some data
> on the issue.

Hi Adam, I'm no CIFS expert, but I'd be interested to know the details of the NFS problem, we can work on that in another BZ if necessary.  Sorry for all the trouble, this looks painful and should work.

Comment 60 Cole Robinson 2019-11-13 17:46:18 UTC
(In reply to Adam Williamson from comment #58)
> I'm still having a *lot* of trouble with this. (And for bonus points, if I
> access the disk images via NFS instead, they seem to keep getting
> corrupted). I'll maybe re-open it or file a new one when I can get some data
> on the issue.

qemu had some corruption error/warning issues recently that should be resolved with the latest updates-testing, so the corruption issue may be separate.

