Description of problem:
HVM guest got disk read-only after local migration

Version-Release number of selected component (if applicable):
xen-3.0.3-115.el5
kernel-xen-2.6.18-210.el5

How reproducible:
100%

Steps to Reproduce:
1. start a linux hvm guest with "xm create RHEL-5.4-64-hvm.conf"
2. run local migration "xm migrate -l ID localhost"
3. after migration done, run step 2 again

Actual results:
Guest vm got disk read-only error after 2nd migration

Expected results:
Everything should be fine after migration

Additional info:
1. for WinXP hvm guest, guest will hang after first migration
2. this issue happens both with and without pv driver
3. after checking "xenstore-ls" before and after every migration, I found that the vm's name changed strangely:
   before migration:    "/vm/1efb30c3-86fd-9dd7-4934-9b32b6a84432"
   after 1st migration: "/vm/1efb30c3-86fd-9dd7-4934-9b32b6a84432-1"
   after 2nd migration: "/vm/1efb30c3-86fd-9dd7-4934-9b32b6a84432"
4. there is error msg in xend's log after 1st migration:
-----------
[2010-08-10 06:01:48 xend 6808] DEBUG (DevController:160) Waiting for devices usb.
[2010-08-10 06:01:48 xend 6808] DEBUG (DevController:160) Waiting for devices vbd.
[2010-08-10 06:01:48 xend 6808] DEBUG (DevController:166) Waiting for 768.
[2010-08-10 06:01:48 xend 6808] DEBUG (DevController:538) hotplugStatusCallback /local/domain/0/backend/vbd/26/768/hotplug-status.
[2010-08-10 06:01:48 xend 6808] DEBUG (DevController:552) hotplugStatusCallback 5.
[2010-08-10 06:01:48 xend 6808] ERROR (XendCheckpoint:356) Device 768 (vbd) could not be connected. File /home/ovirt-VMs/RHEL-Server-5.4-64-hvm.raw is loopback-mounted through /dev/loop0, which is mounted in a guest domain, and so cannot be mounted now.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 354, in restore
    dominfo.waitForDevices() # Wait for backends to set up
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2440, in waitForDevices
    self.waitForDevices_(c)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1453, in waitForDevices_
    return self.getDeviceController(deviceClass).waitForDevices()
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/DevController.py", line 162, in waitForDevices
    return map(self.waitForDevice, self.deviceIDs())
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/DevController.py", line 196, in waitForDevice
    raise VmError("Device %s (%s) could not be connected.\n%s" %
VmError: Device 768 (vbd) could not be connected. File /home/ovirt-VMs/RHEL-Server-5.4-64-hvm.raw is loopback-mounted through /dev/loop0, which is mounted in a guest domain, and so cannot be mounted now.
-------------------
Created attachment 437615 [details]
logs for rhgz622501

---logs in /var/log/xen/--------
qemu-dm.16446.log
qemu-dm.16857.log
qemu-dm.17080.log
qemu-dm.17204.log
qemu-dm.17628.log
qemu-dm.17850.log
qemu-dm.17980.log
qemu-dm.18380.log
xend.log
xen-hotplug.log

---xenstore-ls log: before, 1st migration, 2nd migration, with pv driver----
xenstore-ls-ID22-rhel54-before-migrate
xenstore-ls-ID23-rhel54-after-migrate-1st
xenstore-ls-ID24-rhel54-after-migrate-2nd

---xenstore-ls log: before, 1st migration, 2nd migration, without pv driver----
xenstore-ls-ID25-rhel54-nopv-before-migrate
xenstore-ls-ID26-rhel54-nopv-after-migrate-1st
xenstore-ls-ID27-rhel54-nopv-after-migrate-2nd

---xenstore-ls log: before, 1st migration, WinXP 32bit with pv driver----
xenstore-ls-ID28-winxp-before-migrate
xenstore-ls-ID29-winxp-after-migrate-1st

---xm dmesg---
xm-dmesg.log
-----------winxp config file------------
# Xen configuration generated by xen-autotest
vncunused = "1"
kernel = "/usr/lib/xen/boot/hvmloader"
uuid = "1efb30c3-86fd-9dd7-4934-9b72b6a84422"
on_poweroff = "destroy"
vif = ['mac=00:21:7F:B7:11:02,script=vif-bridge,bridge=xenbr0,type=netfront']
name = "winXP-32bit"
on_reboot = "restart"
localtime = "0"
builder = "hvm"
apic = "1"
sdl = "0"
device_model = "/usr/lib64/xen/bin/qemu-dm"
vcpus = "4"
pae = "1"
memory = "512"
vnclisten = "0.0.0.0"
vnc = "1"
disk = ['file:/home/ovirt-VMs/WinXP-32-hvm.raw,xvda,w']
acpi = "1"
maxmem = "512"
soundhw = "es1370"

---------RHEL5.4-64bit config file------------
# Xen configuration generated by xen-autotest
vncunused = "1"
kernel = "/usr/lib/xen/boot/hvmloader"
uuid = "1efb30c3-86fd-9dd7-4934-9b32b6a84432"
on_poweroff = "destroy"
vif = ['mac=00:21:7F:B7:43:02,script=vif-bridge,bridge=xenbr0']
name = "RHEL5.4-64bit-hv"
on_reboot = "restart"
localtime = "0"
builder = "hvm"
apic = "1"
sdl = "0"
device_model = "/usr/lib64/xen/bin/qemu-dm"
vcpus = "2"
pae = "1"
memory = "1024"
vnclisten = "0.0.0.0"
vnc = "1"
#disk = ['file:/home/ovirt-VMs/RHEL-Server-5.4-64-hvm.raw,xvda,w']
disk = ['file:/home/ovirt-VMs/RHEL-Server-5.4-64-hvm.raw,hda,w']
acpi = "1"
maxmem = "1024"
soundhw = "sb16"
This is strange. Studying those logs, it appears that before the first migration there's VBD drive 51712 defined in /vm/$UUID, and the frontend:

vbd = ""
 51712 = ""
  frontend = "/local/domain/22/device/vbd/51712"
  frontend-id = "22"
  backend-id = "0"
  backend = "/local/domain/0/backend/vbd/22/51712"
...

and also the backend:

vbd = ""
 22 = ""
  51712 = ""
   domain = "RHEL5.4-64bit-hv"
   frontend = "/local/domain/22/device/vbd/51712"
   dev = "xvda"
   state = "4"
   params = "/home/ovirt-VMs/RHEL-Server-5.4-64-hvm.raw"
   mode = "w"
   online = "1"
   frontend-id = "22"
   type = "file"
   node = "/dev/loop0"
   physical-device = "7:0"
   hotplug-status = "connected"
   sectors = "16777216"
   info = "0"
   sector-size = "512"

so all the paths are valid. But after the first migration the /vm/$UUID is missing and there's no trace of vbd in the file. A diff between the state after the first migration and after the second shows no change for the vbd device, i.e. the device is still missing. So it appears that after the first migration the guest still has the information it needs to boot, which is strange - it should not have that information.

Also, the Windows HVM guest still has those vbd entries in the xenstore, but the error message is printed here:

+ 29 = ""
   51712 = ""
    domain = "winXP-32bit"
-   frontend = "/local/domain/28/device/vbd/51712"
+   frontend = "/local/domain/29/device/vbd/51712"
    dev = "xvda"
-   state = "4"
+   state = "5"
    params = "/home/ovirt-VMs/WinXP-32-hvm.raw"
    mode = "w"
-   online = "1"
-   frontend-id = "28"
+   online = "0"
+   frontend-id = "29"
    type = "file"
-   node = "/dev/loop0"
-   physical-device = "7:0"
-   hotplug-status = "connected"
-   sectors = "20971520"
-   info = "0"
-   sector-size = "512"
+   hotplug-error = "File /home/ovirt-VMs/WinXP-32-hvm.raw is loopback-mo..."
+   hotplug-status = "busy"

This means that the drive parameters are being removed, the state is being changed from connected (4) to closing (5), and online is being set to 0 - that's why there's the issue.
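This kind of regression is easiest to spot by diffing saved `xenstore-ls` dumps from before and after the migration. A minimal sketch of that comparison - the dump file names and their contents below are made-up stand-ins for real `xenstore-ls` output, not taken from the attached logs:

```shell
#!/bin/sh
# Sketch: diff two saved xenstore-ls dumps to spot disk entries that
# changed or disappeared after a migration.  The file names and the
# sample contents are hypothetical stand-ins for real dumps.
before=xenstore-before.txt
after=xenstore-after.txt

# Create two tiny sample dumps standing in for real xenstore-ls output.
cat > "$before" <<'EOF'
vbd = ""
 51712 = ""
  state = "4"
  online = "1"
  hotplug-status = "connected"
EOF

cat > "$after" <<'EOF'
vbd = ""
 51712 = ""
  state = "5"
  online = "0"
  hotplug-status = "busy"
EOF

# Show only what changed; '|| true' because diff exits 1 when files differ.
diff -u "$before" "$after" || true
```

On a real host the dumps would be captured with `xenstore-ls > xenstore-before.txt` before migrating and again afterwards, as was done for the attached ID22/ID23/ID24 logs.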
So, Pengzhen, the Windows HVM guest with PV drivers will hang, and the Linux guest has the drive mounted as read-only - does it actually see the drive as read-only? Or does it hang/panic on not seeing the disk at all?

Thanks,
Michal
(In reply to comment #3)
> So, Pengzhen, the Windows HVM guest with PV drivers will hang and Linux guest
> is having the drive mounted as read-only and is it seeing it as read-only?

Hi Michal,
1. Yes, the Windows HVM guest will hang (after the 1st migration); the Linux HVM guest will not hang and has the drive mounted read-only (after the 2nd migration).
2. Yes, it can see the drive. fdisk shows the drive is still online. You can check "guest-vm-dmesg" in the attached tarball.
Created attachment 437852 [details]
Patch to fix local migrations

Well, I was able to reproduce this and did the investigation. Finally I found out that this is caused by the message saying "File /home/ovirt-VMs/RHEL-Server-5.4-64-hvm.raw is loopback-mounted through /dev/loop0". The issue is that for Linux guests the block device script doesn't write the error into xenstore, which makes it work after the first live localhost migration but not after the second one. The Windows PV drivers most probably have a slightly different implementation, which causes them to fail immediately after the first live localhost migration.

This patch basically checks whether the same path is being used, and if it is, it checks the domain name. If the name matches (we can't use the UUID, since we change the UUID for localhost migration purposes), then we can assume the migration is a localhost one, so we can skip the check for sharing of this image file.

I tested it using a RHEL-5 x86_64 guest on a RHEL-5 x86_64 dom0: it was returning I/O errors after the second migration when this patch was not applied. With the patch applied, I managed to do 10 localhost migrations of the same guest in a row and the disk was always mounted read-write with no I/O errors.

I also tried a Windows 2003 x86 guest without PV drivers, and a loop of several live localhost migrations in a row worked fine, but the very same guest with the PV drivers failed to migrate to localhost even the first time, so I guess that part is purely a Windows drivers issue.

Thanks,
Michal
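The core idea of the patch as described - compare domain names when two users of the same image file are found, and treat a name match as a localhost migration - can be sketched as a shell fragment in the spirit of the block hotplug script. This is NOT the actual attachment: the xenstore reads are stubbed with plain variables, and all names here are hypothetical.

```shell
#!/bin/sh
# Sketch of the localhost-migration check described above -- not the real
# patch.  In the real /etc/xen/scripts/block these two values would come
# from xenstore-read on the existing backend and the new domain.

existing_domname="RHEL5.4-64bit-hv"   # domain already using the image file (stub)
new_domname="RHEL5.4-64bit-hv"        # domain being restored/migrated in (stub)

check_sharing() {
    # If another domain already uses the image file, allow it only when
    # the domain names match: that indicates a localhost migration, where
    # the UUID changes (-1 suffix) but the name stays the same.
    if [ "$existing_domname" = "$new_domname" ]; then
        echo "ok"      # same name: assume local migration, skip the sharing check
    else
        echo "busy"    # different guest: refuse to share the image file
    fi
}

check_sharing
```

The point of the name comparison is exactly the UUID juggling visible in the xenstore dumps above: during a localhost migration the incoming domain temporarily gets a "$UUID-1" path, so the UUID cannot be used to recognize the guest as the same one.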
Pengzhen, I recommend filing a bug against xenpv-win for this issue too, since it works fine without PV drivers but not with them. Will you please file that bug, including all the version information for the XenPV Windows drivers and the relevant information from your testing?

Thanks,
Michal
(In reply to comment #5)
> Created an attachment (id=437852) [details]
> Patch to fix local migrations
> [...]

Hi Michal,
Verified that the patch works for the Linux HVM guest: local migration succeeded 3 times.
However, it still fails for the Windows HVM guest, even on the first local migration. So I do not think it is purely a xenpv-win driver issue.
Maybe it is because Windows handles block devices in a different way than Linux does. You can use a WinXP 32-bit guest for the test. Local migration for Win2003 32-bit will PASS even with the xenpv-win driver.

Regards,
Pengzhen
(In reply to comment #7)
> [...]
> Hi Michal,
> Verified the patch work for linux hvm guest. Local migration succeed with 3 times.
> However, It still failed for windows hvm guest even for the first local migration.
> So I do not think it is purely xenpv-win driver issue. Maybe it is
> due to windows handle block device in a different way to linux.
> [...]

Pengzhen,
please read comment 6. I mentioned that this is not working for Windows 2003 with PV drivers, but it is working for the guest *without* PV drivers, so that is the issue you should file against the xenpv-win component, which is the component for the Windows PV drivers. It can pass for Windows 2003 x86 (32-bit) now because I've found out I may have been using an older version of the PV drivers, and the latest may have it fixed. Nevertheless, if the guest that fails the localhost migration is using PV drivers, try to uninstall them and retest.

I did try both Windows 2003 x86 and Windows XP x86 without PV drivers and was unable to reproduce, but I did manage to reproduce it when PV drivers were installed. If it's reproducible only when using PV drivers and not otherwise, you should file a bug against xenpv-win instead.

Regards,
Michal
(In reply to comment #8)
> [...]
> Pengzhen,
> please read the comment 6.
> [...] If it's reproducible only when using PV drivers and not otherwise
> you should file a bug against xenpv-win instead.

Hi Michal,
I have tried WinXP *without* the xenpv-win driver and it could migrate. However, when you try to click the Start menu to shut down the VM, or run "dir" on C:\, the guest hangs immediately. And there is a chance that the WinXP VM will hang after migration without any operation inside the VM. I did the above test with your patch. So this is not only a xenpv-win issue.

Regards,
Pengzhen
Could you check again with a Windows HVM guest without the PV driver?
Yeah, I tried again and it's working fine for Windows XP and 2003 *without* PV drivers. What version of Xen are you using? Have you applied this patch to the latest virttest packages? I was unable to reproduce the results from comment 9, so I don't know.

Michal
Oh, one more thing: this bug is about migrations, and if it happens when you run the guest without any migration at all, that's pretty strange. Please file a new bugzilla with *exact* steps to reproduce it, including the version information (and information about the guest - Windows XP? 2003? 32-bit? 64-bit? etc.)

Thanks,
Michal
(In reply to comment #12) > Oh, one more thing: This is about migrations and if it happens without any > migrations when you run the guest it's pretty strange. Please file a new > bugzilla with *exact* steps to reproduce it including the version information > (also including information about the guest - Windows XP? 2003? 32-bit? 64-bit? > etc.) > > Thanks, > Michal Hi Michal, I am using "xen-3.0.3-115_x86_64" and "kernel-xen-2.6.18-210". Guest is "WinXP 32bit". I mean the vm is OK without migration. It will only hang after migration, even without PV driver. Regards, Pengzhen
(In reply to comment #13)
> I am using "xen-3.0.3-115_x86_64" and "kernel-xen-2.6.18-210".
> Guest is "WinXP 32bit".
> [...]

Hi Pengzhen,
that is the issue, then. The -115 version of the Xen package doesn't have the fix applied. I've checked mrezanin's virttest package and it isn't applied there either, so I created my own version of the Xen package with this patch applied - before you told me you use the -115 package, I thought you had your own recompiled Xen package with the patch applied, but apparently you don't.

The version I recompiled is -virttest31 based and is named 'xen-3.0.3-115.el5virttest31.g7e4798b'; the package is located at:

http://people.redhat.com/minovotn/xen/

Please test using this version of the Xen package,
Michal

Note: I didn't mean this is not a bug - it surely is (that's why it already has a patch and why it is in the POST state) - but there's also a bug in the Windows PV drivers, and you need to file a new bug against xenpv-win for that.
(In reply to comment #14)
> The version I recompiled is -virttest31 based and it's named
> 'xen-3.0.3-115.el5virttest31.g7e4798b' [...]
> Please test using this version of Xen package,

Hi Michal,
I was using xen-115, but I had patched "/etc/xen/scripts/block" manually with your patch. And I have just tried with "xen-3.0.3-115.el5virttest31.g7e4798b"; it still does not work for the Windows HVM guest. Can you have a look at my server?

Regards,
Pengzhen
Well, Pengzhen, we've been investigating this and we've found out that both local and remote migration work fine on Intel, but neither local nor remote migration works on AMD.

I don't know whether it's relevant to the hypervisor/kernel, but I'm seeing the following messages on your AMD machine:

Aug 12 20:57:16 amd-B95-8-1 kernel: Warning Timer ISR/1: Time went backwards: delta=-11000403 delta_cpu=636999597 shadow=18650886977204 off=210022848 processed=18651108000000 cpu_processed=18650460000000
Aug 12 20:57:16 amd-B95-8-1 kernel: 0: 18651084000000
Aug 12 20:57:16 amd-B95-8-1 kernel: 1: 18650460000000
Aug 12 20:57:16 amd-B95-8-1 kernel: 2: 18651032000000
Aug 12 20:57:16 amd-B95-8-1 kernel: 3: 18651104000000
Aug 12 20:57:16 amd-B95-8-1 kernel: Warning Timer ISR/0: Time went backwards: delta=-10955058 delta_cpu=13044942 shadow=18650696978825 off=400067434 processed=18651108000000 cpu_processed=18651084000000
Aug 12 20:57:16 amd-B95-8-1 kernel: 0: 18651084000000
Aug 12 20:57:16 amd-B95-8-1 kernel: 1: 18651092000000
Aug 12 20:57:16 amd-B95-8-1 kernel: 2: 18651032000000
Aug 12 20:57:16 amd-B95-8-1 kernel: 3: 18651104000000

I can see no error in xend.log, but in the `xm dmesg` output I've discovered the following messages at the end:

(XEN) traps.c:1877:d0 Domain attempted WRMSR 00000000c001001f from 00582000:00000008 to 00586000:00000008.
(XEN) save.c:174:d0 HVM restore: Xen changeset was not saved.
(XEN) lapic_load to rearm the actimer:bus cycle is 10ns, saved tmict count 116770000, period 1167700000ns, irq=253
(XEN) save.c:174:d0 HVM restore: Xen changeset was not saved.
(XEN) lapic_load to rearm the actimer:bus cycle is 10ns, saved tmict count 116770000, period 1167700000ns, irq=253
(XEN) save.c:174:d0 HVM restore: Xen changeset was not saved.
(XEN) lapic_load to rearm the actimer:bus cycle is 10ns, saved tmict count 2562350000, period 4148663520ns, irq=253
(XEN) save.c:174:d0 HVM restore: Xen changeset was not saved.
(XEN) lapic_load to rearm the actimer:bus cycle is 10ns, saved tmict count 2562350000, period 4148663520ns, irq=253
(XEN) save.c:174:d0 HVM restore: Xen changeset was not saved.
(XEN) lapic_load to rearm the actimer:bus cycle is 10ns, saved tmict count 3403050000, period 3965728928ns, irq=253

So I guess this is something hypervisor related, since according to the testing it always works on Intel but never on AMD.

Regards,
Michal
(In reply to comment #16)
> So I guess this is something hypervisor related since according to the testing
> it's always working on Intel but never on AMD.

Could be. This looks like the biggest clue to me:

(XEN) save.c:174:d0 HVM restore: Xen changeset was not saved.

I hear we can't even save+restore though, so this is unrelated to the local migration problem and needs its own bug. In that bug we need to figure out whether it's machine dependent, processor dependent, guest dependent, etc.

For this bug, QA should avoid trying to test on machines where they can't even save and restore.
(In reply to comment #16)
> Well, Pengzhen, we've been investigating this and we've found out both local
> and remote migration is working fine on Intel but neither local nor remote
> migration was not working on AMD.
> [...]
> So I guess this is something hypervisor related since according to the testing
> it's always working on Intel but never on AMD.

Maybe. I had cloned a bug for this error msg; it is a regression of rhbz437252.

https://bugzilla.redhat.com/show_bug.cgi?id=617043
https://bugzilla.redhat.com/show_bug.cgi?id=437252

Regards,
Pengzhen
(In reply to comment #17)
> For this bug, QA should avoid trying to test on machines where they can't even
> save and restore.

Hi Andrew,
Yes, the fix is actually working for both Windows and Linux HVM guests on the Intel machine, and this issue should be considered fixed. Then there should be a separate bug for the migration and save/restore issue on AMD machines - what do you think?

Regards,
Pengzhen
(In reply to comment #19)
> Then there should be a separate bug for the migration and save/restore issue on
> AMD machine, what do you think?

Agreed. Although, I also think we should try to hunt down an AMD machine that doesn't have the "time went backwards" issues in order to do a clean test, i.e. determine whether we're looking at a machine-dependent problem or a processor-dependent problem here.

Drew
New BZ opened already and is here bug 623729.
*** Bug 608964 has been marked as a duplicate of this bug. ***
(In reply to comment #20)
> Agreed. Although, I also think we should try to hunt down an AMD machine that
> doesn't have the "time went backwards" issues in order to do a clean test [...]

Oh, just for clarification: this was not on colossus. It was some other AMD machine; I had access to 2 machines (for remote migration testing) and was able to see this on both of them. I don't know the CPUs now, but what I know for sure is that one machine had a Phenom B2 processor.

Michal
Created attachment 438920 [details]
Patch to fix local migrations v2

This is the patch for BZ 622501 that basically checks for local migrations in progress. If two guests are trying to use the same image file, it checks whether the names of the guests match (we can't use the UUID, since we change the UUID for localhost migration purposes); if the names are the same, we can assume the migration is a localhost one, so we can skip the check for sharing of this image file.

Differences between version 1 and version 2 (this one):
- Fixed bugs in the comparison signs and chose a different approach
- Tested with multiple guests to make sure the check is not disabled entirely
- A few optimizations of xenstore-read calls

Michal
Created attachment 439115 [details]
Patch to fix local migrations v3

This is the patch for BZ 622501 that basically checks for local migrations in progress. If two guests are trying to use the same image file, it checks whether the names of the guests match (we can't use the UUID, since we change the UUID for localhost migration purposes); if the names are the same, we can assume the migration is a localhost one, so we can skip the check for sharing of this image file.

Differences between version 1 and version 2:
- Fixed bugs in the comparison signs and chose a different approach
- Tested with multiple guests to make sure the check is not disabled entirely
- A few optimizations of xenstore-read calls

Differences between version 2 and version 3 (this one):
- The new check loop has been merged into the previous loop to avoid two almost identical loops

Michal
Created attachment 462020 [details]
Patch to fix local migration v4

Patch for new codebase with vbd backports implemented.

Michal
Created attachment 472037 [details]
Patch v5

New version of the patch, where the local migration check has been moved to the check_sharing function.

Michal
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
Irreproducible on the -124.el5 version of the Xen package, so closing as CURRENTRELEASE.

Michal
It's fixed by the patch of bug 679280.