Bug 589864

| Summary: | [RHEL6]: Migration failure with error 'internal error canonical hostname pointed to localhost' | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | dyuan |
| Component: | libvirt | Assignee: | Chris Lalancette <clalance> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Priority: | low |
| Version: | 6.0 | Target Milestone: | rc |
| Hardware: | All | OS: | Linux |
| Fixed In Version: | libvirt-0.8.1-8.el6 | Doc Type: | Bug Fix |
| Last Closed: | 2010-11-11 14:48:45 UTC | CC: | bugproxy, clalance, dallan, hbrock, hye, llim, markwiz, mishu, nzhang, rstrode, xen-maint, yoyzhang |
Description by dyuan, 2010-05-07 07:01:14 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

I talked to clalance about this today, and the conclusion is that this is actually a libvirt bug. Reassigning.

If /etc/hosts contains the hostname on any line other than one including 127.0.0.1, NetworkManager will not add the hostname to a 127.0.0.1 line. But your hostname *must* map to something (either in /etc/hosts or via DNS), otherwise quite a few other things like X and ssh will break...

*** Bug 580827 has been marked as a duplicate of this bug. ***

Yuan Dan,
After discussing this problem with the NetworkManager people and other userspace people, it seems that this is actually a bug in libvirt. I've come up with a patch which should help the situation; I've uploaded it to:
http://people.redhat.com/clalance/bz589864
Can you test this package out on a machine which still has the migration problem, and see if it fixes it for you?
Thanks,
Chris Lalancette

The same error as bug 580827 occurs after installing the following 5 packages from http://people.redhat.com/clalance/bz589864/ on both machines:

libvirt-0.8.1-3bz591661.el6.x86_64.rpm
libvirt-devel-0.8.1-3bz591661.el6.x86_64.rpm
libvirt-client-0.8.1-3bz591661.el6.x86_64.rpm
libvirt-python-0.8.1-3bz591661.el6.x86_64.rpm
libvirt-debuginfo-0.8.1-3bz591661.el6.x86_64.rpm

# rpm -qa | grep libvirt
libvirt-python-0.8.1-3bz591661.el6.x86_64
libvirt-java-devel-0.4.2-2.el6.noarch
libvirt-devel-0.8.1-3bz591661.el6.x86_64
libvirt-client-0.8.1-3bz591661.el6.x86_64
libvirt-java-0.4.2-2.el6.noarch
libvirt-debuginfo-0.8.1-3bz591661.el6.x86_64
libvirt-0.8.1-3bz591661.el6.x86_64

Restart NetworkManager on both the source and target machines.
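The /etc/hosts behavior described above can be checked mechanically. Below is a minimal sketch, not NetworkManager's or libvirt's actual code; the helper name is made up, and the sample layouts are taken from the hosts files shown in this report:

```python
def hostname_on_loopback_line(hosts_text, hostname):
    """Return True if `hostname` appears on a 127.0.0.1 line of the
    given /etc/hosts-style text ('#' comments are ignored)."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == "127.0.0.1" and hostname in fields[1:]:
            return True
    return False

# Layout from the failing source machine in this report: the real
# hostname was placed on the 127.0.0.1 line itself.
failing_hosts = "127.0.0.1 dhcp-66-70-43.nay.redhat.com localhost.localdomain localhost\n"

# Layout from the verified-fixed setup later in this report: the hostname
# lives on its routable address line; 127.0.0.1 carries only localhost names.
fixed_hosts = (
    "10.66.93.181 dhcp-93-181.nay.redhat.com dhcp-93-181  # Added by NetworkManager\n"
    "127.0.0.1 localhost.localdomain localhost\n"
)

print(hostname_on_loopback_line(failing_hosts, "dhcp-66-70-43.nay.redhat.com"))  # True
print(hostname_on_loopback_line(fixed_hosts, "dhcp-93-181.nay.redhat.com"))      # False
```

In the failing setup the hostname maps to 127.0.0.1, which is exactly the condition the migration error complains about.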
# service NetworkManager restart

On the target machine:

# more /etc/hosts
10.66.70.12 dhcp-66-70-12.nay.redhat.com localhost.localdomain localhost
127.0.0.1 dhcp-66-70-12.nay.redhat.com localhost.localdomain localhost
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

On the source machine:

# more /etc/hosts
10.66.70.12 dhcp-66-70-12.nay.redhat.com localhost.localdomain localhost
127.0.0.1 dhcp-66-70-43.nay.redhat.com localhost.localdomain localhost
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

# virsh migrate --live mig qemu+ssh://10.66.70.12/system
root@10.66.70.12's password:
error: internal error canonical hostname pointed to localhost, but this is not allowed

(In reply to comment #9)
> # virsh migrate --live mig qemu+ssh://10.66.70.12/system
> root@10.66.70.12's password:
> error: internal error canonical hostname pointed to localhost, but this is not
> allowed

Hm, that doesn't seem right; we aren't even calling that function anymore. Did you restart libvirtd on both the source and destination after upgrading the packages? If not, then that's the problem. Please restart libvirtd on both and re-test.
Thanks,
Chris Lalancette

1. Erase the current version:

# yum erase libvirt libvirt-client

2. Install from http://people.redhat.com/clalance/bz589864/:

# ls
libvirt-0.8.1-3bz591661.el6.x86_64.rpm
libvirt-client-0.8.1-3bz591661.el6.x86_64.rpm
libvirt-debuginfo-0.8.1-3bz591661.el6.x86_64.rpm
libvirt-devel-0.8.1-3bz591661.el6.x86_64.rpm
libvirt-python-0.8.1-3bz591661.el6.x86_64.rpm
# rpm -i libvirt*

3. Restart libvirtd:

# service libvirtd restart
Stopping libvirtd daemon: [Failed]
Starting libvirtd daemon: [ OK ]

4. Execute the migration:

# virsh migrate --live mig qemu+ssh://10.66.70.12/system
root@10.66.70.12's password:
error: internal error canonical hostname pointed to localhost, but this is not allowed

5. Reboot the system:

# reboot
# rpm -qa | grep libvirt
libvirt-0.8.1-3bz591661.el6.x86_64
libvirt-client-0.8.1-3bz591661.el6.x86_64
libvirt-devel-0.8.1-3bz591661.el6.x86_64
libvirt-debuginfo-0.8.1-3bz591661.el6.x86_64
libvirt-python-0.8.1-3bz591661.el6.x86_64
# service libvirtd status
libvirtd (pid 2855) is running...
# service libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
# virsh migrate --live mig qemu+ssh://10.66.70.12/system
root@10.66.70.12's password:
error: internal error canonical hostname pointed to localhost, but this is not allowed

I'm very sorry. I uploaded the wrong RPMs to that location. I couldn't understand why we were still getting the same error message until I noticed that the names on the packages were:
libvirt-0.8.1-3bz591661.el6.x86_64
So I apologize. I've now uploaded what should be the correct packages to:
http://people.redhat.com/clalance/bz589864/
Could you re-test with those?
Thanks,
Chris Lalancette

Tested with libvirt-0.8.1-3bz591661.el6.x86_64:
No 'error: internal error canonical hostname pointed to localhost, but this is not allowed' output, but the guest seems dead after migration and cannot respond to any input.

(In reply to comment #13)
> Test with libvirt-0.8.1-3bz591661.el6.x86_64
>
> no error 'error: internal error canonical hostname pointed to localhost, but
> this is not allowed' output , but the guest seems dead after migration, cannot
> respond any input.

I will track the issue in bug 578889 (QEMU security driver kills disk access after migration) using the latest version, libvirt-0.8.1-6.el6.

(In reply to comment #13)
> Test with libvirt-0.8.1-3bz591661.el6.x86_64
>
> no error 'error: internal error canonical hostname pointed to localhost, but
> this is not allowed' output , but the guest seems dead after migration, cannot
> respond any input.

Great! The fact that the guest can't respond is a different bug (as you point out); libvirt is done with it by that time.
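The 'canonical hostname pointed to localhost' condition that the migration error reports can be sketched as follows. This is only an illustrative approximation of the condition the error message describes, not libvirt's actual implementation, and the function name is invented:

```python
import ipaddress

def canonical_hostname_is_localhost(resolved_addresses):
    """Approximation of the rejected condition: the canonical hostname
    resolves only to loopback addresses (127.0.0.0/8 or ::1)."""
    return all(ipaddress.ip_address(a).is_loopback for a in resolved_addresses)

# Hostname mapped only to 127.0.0.1, as in the broken /etc/hosts above:
print(canonical_hostname_is_localhost(["127.0.0.1"]))                 # True
# Hostname also mapped to a routable address; migration can proceed:
print(canonical_hostname_is_localhost(["10.66.70.12", "127.0.0.1"]))  # False
```

This is why moving the hostname onto its routable-address line in /etc/hosts (rather than the 127.0.0.1 line) avoids the error.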
So this means that my patch to fix the libvirt portion of it at least works around the problem, even if it is not the greatest solution. I'll get it queued up for RHEL.
Thanks,
Chris Lalancette

Oops. While discussing it upstream, it seems like we want to make some changes to the actual patch. I have one more libvirt package for you to test:
http://people.redhat.com/clalance/bz589864-bz591839/
Can you please verify that it still fixes the problem for you?
Thank you,
Chris Lalancette

I installed the packages on 2 machines; one works fine, but the other does not. libvirtd dies a while after it is started.

# cat /var/log/messages
May 21 10:05:07 dhcp-66-70-43 kernel: libvirtd[3730]: segfault at 100000004 ip 0000000100000004 sp 00007f1ea5c79cd0 error 14 in ld-2.12.so[3321400000+1e000]
May 21 10:05:09 dhcp-66-70-43 abrt[3774]: saved core dump of pid 3726 (/usr/sbin/libvirtd) to /var/cache/abrt/ccpp-1274407507-3726.new/coredump (90828800 bytes)
May 21 10:05:09 dhcp-66-70-43 abrtd: Directory 'ccpp-1274407507-3726' creation detected
May 21 10:05:09 dhcp-66-70-43 abrtd: Package 'libvirt' isn't signed with proper key
May 21 10:05:09 dhcp-66-70-43 abrtd: Corrupted or bad crash /var/cache/abrt/ccpp-1274407507-3726 (res:5), deleting

Hm, that's unfortunate. Even worse is that it looks like abrtd deleted the core after it was done :(. I took a look at the code and I can't see where my new code would cause a crash, so I am sort of at a loss here. Here's what I would suggest:

1) On the machine that had the problem, edit /etc/abrt/abrt.conf and set OpenGPGCheck = "no". Hopefully that will keep abrtd from deleting the core next time. Remember to restart abrtd after making this change.
2) Install the libvirt -debuginfo package from the link above (I've now uploaded the debuginfo there).
3) Run the test again; hopefully abrtd will collect the crash in something like /var/cache/abrt/ccpp-<number>/coredump
4) Run "gdb /usr/sbin/libvirtd /var/cache/abrt/ccpp-<number>/coredump"
5) Run "thread apply all bt".

Then give me the output from step 5). That should hopefully give me a stack trace of where things failed, so I can debug it further.
Thanks,
Chris Lalancette

(gdb) thread apply all bt

Thread 7 (Thread 10564):
#0  0x000000332240b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004331f6 in virCondWait ()
#2  0x0000000000419a65 in qemudWorker ()
#3  0x0000003322407761 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003321ce14fd in clone () from /lib64/libc.so.6

Thread 6 (Thread 10559):
#0  0x0000003322407fbd in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000041cc03 in main ()

Thread 5 (Thread 10565):
#0  0x000000332240b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004331f6 in virCondWait ()
#2  0x0000000000419a65 in qemudWorker ()
#3  0x0000003322407761 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003321ce14fd in clone () from /lib64/libc.so.6

Thread 4 (Thread 10563):
#0  0x000000332240b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004331f6 in virCondWait ()
#2  0x0000000000419a65 in qemudWorker ()
#3  0x0000003322407761 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003321ce14fd in clone () from /lib64/libc.so.6

Thread 3 (Thread 10560):
#0  0x0000003321cd7e03 in poll () from /lib64/libc.so.6
#1  0x0000000000416bf5 in virEventRunOnce ()
#2  0x0000000000418b46 in qemudOneLoop ()
#3  0x0000000000418e13 in qemudRunLoop ()
#4  0x0000003322407761 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003321ce14fd in clone () from /lib64/libc.so.6

Thread 2 (Thread 10562):
#0  0x000000332240b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004331f6 in virCondWait ()
#2  0x0000000000419a65 in qemudWorker ()
#3  0x0000003322407761 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003321ce14fd in clone () from /lib64/libc.so.6

Thread 1 (Thread 10561):
#0  0x0000003321d108e3 in xdr_string_internal () from /lib64/libc.so.6
#1  0x000000000042b05e in xdr_remote_nonnull_string ()
#2  0x0000003321d11f70 in xdr_reference_internal () from /lib64/libc.so.6
#3  0x0000003321d11f31 in xdr_pointer () from /lib64/libc.so.6
#4  0x000000000042ac33 in xdr_remote_string ()
#5  0x000000000042ae69 in xdr_remote_open_args ()
#6  0x0000000000428347 in remoteDispatchClientCall ()
#7  0x0000000000428823 in remoteDispatchClientRequest ()
#8  0x0000000000419af8 in qemudWorker ()
#9  0x0000003322407761 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003321ce14fd in clone () from /lib64/libc.so.6
(gdb)

Reproduced it with the following simple commands:

# virsh list --all
Id Name State
----------------------------------

# virsh list --all
Id Name State
----------------------------------
##### list --all shows nothing, but no error is reported.

# virsh pool-list --all
Name State Autostart
-----------------------------------------

# virsh pool-list --all
error: server closed connection: error: failed to connect to the hypervisor
##### pool-list leads to libvirtd dying.
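Steps 4) and 5) from the debugging instructions above can also be run non-interactively via gdb's batch mode. A minimal sketch that only builds the command line; the helper name is invented and the core path is hypothetical, following the abrt naming pattern seen in this report:

```python
def gdb_all_threads_backtrace_cmd(binary, corefile):
    """Build a gdb invocation that loads `corefile` against `binary`,
    prints a backtrace for every thread, and exits (batch mode)."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", binary, corefile]

cmd = gdb_all_threads_backtrace_cmd(
    "/usr/sbin/libvirtd",
    "/var/cache/abrt/ccpp-1274407507-3726/coredump",  # hypothetical core path
)
print(" ".join(cmd))
```

Running the resulting command (e.g. via subprocess.run) produces the same "thread apply all bt" output shown above, without an interactive gdb session.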
The correct result should look like this:

# virsh list --all
Id Name State
----------------------------------
- demo shut off

# virsh pool-list --all
Name State Autostart
-----------------------------------------
default active yes

Additional info:

# virsh pool-list --all
13:44:04.727: debug : virInitialize:341 : register drivers
13:44:04.727: debug : virRegisterDriver:929 : registering Test as driver 0
13:44:04.727: debug : virRegisterNetworkDriver:735 : registering Test as network driver 0
13:44:04.727: debug : virRegisterInterfaceDriver:766 : registering Test as interface driver 0
13:44:04.727: debug : virRegisterStorageDriver:797 : registering Test as storage driver 0
13:44:04.727: debug : virRegisterDeviceMonitor:828 : registering Test as device driver 0
13:44:04.727: debug : virRegisterSecretDriver:859 : registering Test as secret driver 0
13:44:04.727: debug : virRegisterNWFilterDriver:890 : registering Test as network filter driver 0
13:44:04.727: debug : virRegisterDriver:929 : registering ESX as driver 1
13:44:04.727: debug : virRegisterInterfaceDriver:766 : registering ESX as interface driver 1
13:44:04.727: debug : virRegisterNetworkDriver:735 : registering ESX as network driver 1
13:44:04.727: debug : virRegisterStorageDriver:797 : registering ESX as storage driver 1
13:44:04.728: debug : virRegisterDeviceMonitor:828 : registering ESX as device driver 1
13:44:04.728: debug : virRegisterSecretDriver:859 : registering ESX as secret driver 1
13:44:04.728: debug : virRegisterNWFilterDriver:890 : registering ESX as network filter driver 1
13:44:04.728: debug : virRegisterDriver:929 : registering remote as driver 2
13:44:04.728: debug : virRegisterNetworkDriver:735 : registering remote as network driver 2
13:44:04.728: debug : virRegisterInterfaceDriver:766 : registering remote as interface driver 2
13:44:04.728: debug : virRegisterStorageDriver:797 : registering remote as storage driver 2
13:44:04.728: debug : virRegisterDeviceMonitor:828 : registering remote as device driver 2
13:44:04.728: debug : virRegisterSecretDriver:859 : registering remote as secret driver 2
13:44:04.728: debug : virRegisterNWFilterDriver:890 : registering remote as network filter driver 2
13:44:04.728: debug : virConnectOpenAuth:1471 : name=(null), auth=0x7fee59515ac0, flags=0
13:44:04.728: debug : do_open:1208 : no name, allowing driver auto-select
13:44:04.728: debug : do_open:1216 : trying driver 0 (Test) ...
13:44:04.728: debug : do_open:1222 : driver 0 Test returned DECLINED
13:44:04.728: debug : do_open:1216 : trying driver 1 (ESX) ...
13:44:04.728: debug : do_open:1222 : driver 1 ESX returned DECLINED
13:44:04.728: debug : do_open:1216 : trying driver 2 (remote) ...
13:44:04.728: debug : remoteOpen:1093 : Auto-probe remote URI
13:44:04.728: debug : doRemoteOpen:570 : proceeding with name =
13:44:04.728: debug : remoteIO:9748 : Do proc=66 serial=0 length=28 wait=(nil)
13:44:04.728: debug : remoteIO:9823 : We have the buck 66 0x7fee532ca010 0x7fee532ca010
13:44:04.728: debug : remoteIODecodeMessageLength:9179 : Got length, now need 64 total (60 more)
13:44:04.728: debug : remoteIOEventLoop:9674 : Giving up the buck 66 0x7fee532ca010 (nil)
13:44:04.728: debug : remoteIO:9854 : All done with our call 66 (nil) 0x7fee532ca010
13:44:04.728: debug : remoteIO:9748 : Do proc=1 serial=1 length=40 wait=(nil)
13:44:04.728: debug : remoteIO:9823 : We have the buck 1 0x2184cc0 0x2184cc0
13:44:05.832: debug : remoteIOEventLoop:9695 : Giving up the buck due to I/O error 1 0x2184cc0 (nil)
13:44:05.832: debug : do_open:1222 : driver 2 remote returned ERROR
13:44:05.832: debug : virUnrefConnect:294 : unref connection 0x2181730 1
13:44:05.832: debug : virReleaseConnect:249 : release connection 0x2181730
error: server closed connection: error: failed to connect to the hypervisor

(In reply to comment #20)
> reproduce it through the following simple command:
>
> # virsh list --all
> Id Name State
> ----------------------------------
>
> # virsh list --all
> Id Name State
> ----------------------------------
> #####list --all show nothing but no error reported.
>
> # virsh pool-list --all
> Name State Autostart
> -----------------------------------------
>
> # virsh pool-list --all
> error: server closed connection:
> error: failed to connect to the hypervisor
> #####pool-list will lead to the libvirtd dead.

Hm, unfortunately I wasn't able to reproduce this failure. Every time I do virsh list or virsh pool-list, it does the correct thing. I probably have some configuration on my machine different than yours, which is causing the discrepancy. Can you either:

1) Give me access to this machine? (I'll need the IP address and root password), or
2) Upload the abrtd core you've collected somewhere, so I can take a look at what caused the crash.

Thanks,
Chris Lalancette

OK, thanks for the access information. With that, I was able to log in and poke around. I think the problem ended up being that I built the package locally on one of my test boxes and not in brew (which was having problems at the time). With the original package I gave you, running your reproducer in Comment #20 caused the crash. I then rebuilt the same package in brew (which is now working), installed it on your machine, and it seems to be working now. If you could now try to test out the original problem here (which is the live migration problem), that would be great.
Thanks again,
Chris Lalancette

Re-tested the live migration with the packages (from http://people.redhat.com/clalance/bz589864-bz591839/) again; it works fine, and of course the hostname has been added to 127.0.0.1 :-)
Yuan Dan

(In reply to comment #24)
> Re-test the live migration with the packages(from
> http://people.redhat.com/clalance/bz589864-bz591839/) again, it does work fine,
> of course the hostname have been added to 127.0.0.1 :-)

Perfect, thank you very much!
Chris Lalancette

This is now upstream as:
0117b7da68fdb7435655318881fc925d43396e26

Chris Lalancette

------- Comment From edpollar.com 2010-05-26 13:05 EDT-------
Reverse mirror of RHBZ589864 - [RHEL6]: Migration failure with error 'internal error canonical hostname pointed to localhost'
https://bugzilla.redhat.com/show_bug.cgi?id=589864

libvirt-0.8.1-8.el6 has been built in RHEL-6-candidate with the fix.
Dave

------- Comment From tpnoonan.com 2010-06-11 11:10 EDT-------
Will be in snapshot7; may be in the prebeta2 build.

Verified PASSED with libvirt-0.8.1-8.el6.

------- Comment From dkalaker.com 2010-06-16 08:31 EDT-------
> verified PASSED with libvirt-0.8.1-8.el6.
Hello Red Hat,
I have verified the migration on rhel6 pre beta2 with the network manager installed and the migration works fine.
Thanks!!!
Deepti.
------- Comment From edpollar.com 2010-06-22 17:16 EDT-------
Looks like this has been tested both at IBM and Red Hat. I am going to set it to closed; if there is a problem we can reopen.

Verified with libvirt-0.8.1-27.el6.x86_64 & qemu-kvm-0.12.1.2-2.113.el6.x86_64.

On source:

# cat /etc/hosts
10.66.93.181 dhcp-93-181.nay.redhat.com dhcp-93-181 # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 dhcp-93-181.nay.redhat.com dhcp-93-181 localhost6.localdomain6 localhost6

# service NetworkManager restart
Stopping NetworkManager daemon: [ OK ]
Setting network parameters... [ OK ]
Starting NetworkManager daemon: [ OK ]

# virsh start rhel6
Domain rhel6 started

# virsh list --all
Id Name State
----------------------------------
1 rhel6 running

# virsh migrate --live rhel6 qemu+ssh://10.66.65.186/system
root@10.66.65.186's password:

On target:

# cat /etc/hosts
10.66.65.186 dhcp-65-186.nay.redhat.com # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 dhcp-65-186.nay.redhat.com localhost6.localdomain6 localhost6

# service NetworkManager restart
Stopping NetworkManager daemon: [ OK ]
Setting network parameters... [ OK ]
Starting NetworkManager daemon: [ OK ]

# virsh list
Id Name State
----------------------------------
4 rhel6 running

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.