Bug 1335383
| Summary: | cannot resume a vm that went to paused state after killing gluster fuse mount process | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | RamaKasturi <knarra> |
| Component: | core | Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED NOTABUG | QA Contact: | Anoop <annair> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.1 | CC: | knarra, pkarampu, rgowdapp, rhinduja, rhs-bugs, sasundar, storage-qa-internal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-05-25 04:10:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1258386 | ||
Description
RamaKasturi
2016-05-12 07:01:27 UTC
The VM I am trying to resume is running on zod.lab.eng.blr.redhat.com, and the VM name is BootStrom_windows_vm-6.

Log snippet from engine logs:
================================
2016-05-12 11:42:04,987 INFO [org.ovirt.engine.core.bll.RunVmCommand] (ajp-/127.0.0.1:8702-3) [296ab749] Lock Acquired to object 'EngineLock:{exclusiveLocks='[319340d7-690d-42c5-b583-809cfa03e82e=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-05-12 11:42:05,161 INFO [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-2) [296ab749] Running command: RunVmCommand internal: false. Entities affected : ID: 319340d7-690d-42c5-b583-809cfa03e82e Type: VMAction group RUN_VM with role type USER
2016-05-12 11:42:05,168 INFO [org.ovirt.engine.core.vdsbroker.ResumeVDSCommand] (org.ovirt.thread.pool-6-thread-2) [296ab749] START, ResumeVDSCommand( ResumeVDSCommandParameters:{runAsync='true', hostId='c7356010-a54c-4848-91c1-6e861dcea129', vmId='319340d7-690d-42c5-b583-809cfa03e82e'}), log id: 2c2a1fea
2016-05-12 11:42:05,170 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ResumeBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-2) [296ab749] START, ResumeBrokerVDSCommand(HostName = hosted_engine_3, ResumeVDSCommandParameters:{runAsync='true', hostId='c7356010-a54c-4848-91c1-6e861dcea129', vmId='319340d7-690d-42c5-b583-809cfa03e82e'}), log id: 2d673cd2
2016-05-12 11:42:05,961 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ResumeBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-2) [296ab749] FINISH, ResumeBrokerVDSCommand, log id: 2d673cd2
2016-05-12 11:42:05,961 INFO [org.ovirt.engine.core.vdsbroker.ResumeVDSCommand] (org.ovirt.thread.pool-6-thread-2) [296ab749] FINISH, ResumeVDSCommand, return: PoweringUp, log id: 2c2a1fea
2016-05-12 11:42:05,962 INFO [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-6-thread-2) [296ab749] Lock freed to object 'EngineLock:{exclusiveLocks='[319340d7-690d-42c5-b583-809cfa03e82e=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-05-12 11:42:05,978 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-2) [296ab749] Correlation ID: 296ab749, Job ID: e1d832e3-96ad-48c4-b3a2-bfa7ee4c9624, Call Stack: null, Custom Event ID: -1, Message: VM BootStrom_windows_vm-6 was resumed by admin@internal (Host: hosted_engine_3).
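All of the engine log lines above belong to one flow, tied together by the bracketed correlation ID ([296ab749] here). A minimal sketch (not part of any oVirt tooling; the helper name is made up) for pulling one flow's lines out of engine.log:

```python
import re

# Engine log lines carry a bracketed 8-hex-digit correlation ID
# (e.g. "[296ab749]" in the snippet above) that ties together all
# lines produced by one user action.
CORR_ID = re.compile(r"\[([0-9a-f]{8})\]")

def lines_for_flow(log_lines, corr_id):
    """Return only the log lines whose correlation ID equals corr_id."""
    matched = []
    for line in log_lines:
        m = CORR_ID.search(line)
        if m and m.group(1) == corr_id:
            matched.append(line)
    return matched
```

Running this over the snippet with corr_id="296ab749" would return every line shown, since the whole resume flow shares that ID.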
I tried the same test with libvirt + qemu-kvm + glusterfs-fuse, excluding RHEV.

Versions:
glusterfs-3.7.9-4.el7rhgs
RHEV 3.6.5
RHEL 7.2

Steps:
1. Fuse mounted the sharded replica 3 gluster volume
2. Created the VM image file
3. Installed the VM with RHEL 6.5 and booted it
4. When the VM was up and running, killed the gluster mount process ( pkill glusterfs )

Observations:
1. The VM went into paused state
2. When the volume was mounted back, the VM remained in paused state
3. Manually resuming the VM also doesn't work ( # virsh resume vm1 )
4. Killing the VM and starting it again helped.

Logs ( /var/log/libvirt/qemu/vm1.log )
--------------------------------------
<snip>
2016-05-12 07:43:42.203+0000: starting up libvirt version: 1.2.17, package: 13.el7_2.4 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-03-02-11:10:27, x86-034.build.eng.bos.redhat.com), qemu version: 1.5.3 (qemu-kvm-1.5.3-105.el7_2.4)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name vm1 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu SandyBridge -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 90d2e762-04d9-4f5e-b001-152d71cce31e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/home/vmstore/vm1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:29:30:8d,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-vm1/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)
block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)
[the line above is repeated many times]
qemu: terminating on signal 15 from pid 12218
</snip>

(In reply to SATHEESARAN from comment #3)
> I tried the same test with libvirt + qemu-kvm + glusterfs-fuse, excluding
> RHEV.
>
> Versions
> glusterfs-3.7.9-4.el7rhgs
> RHEV 3.6.5
> RHEL 7.2

Mistakenly mentioned the RHEV version; there is no RHEV in this test.

Adding the qemu and libvirt versions:
libvirt-1.2.17-13.el7_2.4.x86_64
qemu-kvm-common-1.5.3-105.el7_2.4.x86_64
qemu-kvm-1.5.3-105.el7_2.4.x86_64

sos reports can be found in the link below:
==================================================
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1335383/

Pranith - can you check if this is related to Bug 1330044? Here too, one of the brick processes is killed.

(In reply to Sahina Bose from comment #6)
> Pranith - can you check if this is related to Bug 1330044? Here too, one of
> brick processes is killed

For what it's worth - in this case the mount process is killed and then started again by re-mounting.

Nope, this is not because of either EIO/EINVAL. It seems to be because of ENOTCONN.

Hi Ravi,
Following is what I did to verify the behaviour on a native XFS mount. I wrote the small Python script below and ran ./godown /mnt/fio_test to shut down XFS. The script failed with an input/output error, and once the filesystem was remounted the script did not continue writing to the file.
f = open('/mnt/fio_test/test.txt', 'a')
x = 1
while True:
    f.write("To infinity and beyond! We're getting close, on %d now!" % (x))
    x += 1
Thanks
kasturi
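For anyone re-running this check, here is a hedged variant of the loop above (file path and iteration bound are arbitrary) that forces each write through to the mount and reports the errno that finally stops it. On a fuse mount whose glusterfs process has died this would typically be ENOTCONN (107), the same "Transport endpoint is not connected" error qemu logged; a plain buffered write can mask the error until the stdio buffer flushes.

```python
import errno
import os

def write_until_error(path, max_iters=1000):
    """Append lines to path, pushing every write through the kernel.

    Returns None if all writes succeed, or the symbolic errno name
    (e.g. "ENOTCONN", "EIO") of the OSError that stopped the loop.
    """
    try:
        with open(path, "a") as f:
            for x in range(max_iters):
                f.write("To infinity and beyond! On %d now!\n" % x)
                f.flush()             # stdio buffer -> kernel
                os.fsync(f.fileno())  # kernel -> the underlying mount
        return None
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
```

On a healthy filesystem write_until_error('/tmp/test.txt') returns None; the interesting case is pointing it at a fuse mount and killing the glusterfs process mid-loop.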
Thanks for the confirmation, Kasturi. Closing the BZ as this appears to be expected behaviour even on on-disk file systems, based on comment #10.