Bug 1392295 - RFE: Ability to live migrate instance after moving existing ceph-mon service on different nodes
Summary: RFE: Ability to live migrate instance after moving existing ceph-mon service ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ga
Target Release: 10.0 (Newton)
Assignee: Angus Thomas
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-07 06:55 UTC by Marius Cornea
Modified: 2016-11-11 15:54 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-11 15:54:52 UTC
Target Upstream Version:


Attachments
Source & destination qemu logs (from comment #7) (10.60 KB, text/plain)
2016-11-07 17:49 UTC, Kashyap Chamarthy

Description Marius Cornea 2016-11-07 06:55:44 UTC
Description of problem:
After moving the ceph-mon services to different nodes I cannot live migrate an instance. The compute node where the instance is running fails with an "error connecting: Connection timed out" message because it is trying to reach the old nodes where the ceph-mon service was initially running. Note that the Ceph cluster reports a HEALTH_OK state and new instances can be deployed. /etc/ceph/ceph.conf on the compute nodes also references only the new nodes running the ceph-mon service.

Version-Release number of selected component (if applicable):
openstack-nova-compute-14.0.1-5.el7ost.noarch

How reproducible:
1/1 

Steps to Reproduce:
1. Deploy overcloud with 3 x monolithic controllers with ceph storage. Check ceph cluster health:
     cluster d825caf0-a446-11e6-91fe-525400a81fbf
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=10.0.0.146:6789/0,overcloud-controller-1=10.0.0.142:6789/0,overcloud-controller-2=10.0.0.139:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
     osdmap e29: 6 osds: 6 up, 6 in
            flags sortbitwise
      pgmap v68: 224 pgs, 6 pools, 218 MB data, 33 objects
            1510 MB used, 118 GB

2. Run an overcloud instance

3. Deploy 2 additional nodes of a new role running the CephMON service

4. Remove the initial ceph mons running on the controllers:
sudo systemctl stop ceph-mon.target; sudo ceph mon remove controller-0
sudo systemctl stop ceph-mon.target; sudo ceph mon remove controller-1
sudo systemctl stop ceph-mon.target; sudo ceph mon remove controller-2

5. Make sure that the cluster health looks ok:

cluster d825caf0-a446-11e6-91fe-525400a81fbf
     health HEALTH_OK
     monmap e6: 2 mons at {overcloud-serviceapi-0=10.0.0.154:6789/0,overcloud-serviceapi-1=10.0.0.153:6789/0}
            election epoch 24, quorum 0,1 overcloud-serviceapi-1,overcloud-serviceapi-0
     osdmap e33: 6 osds: 6 up, 6 in
            flags sortbitwise
      pgmap v1426: 224 pgs, 6 pools, 3183 MB data, 4866 objects
            9564 MB used, 110 GB / 119 GB avail
                 224 active+clean

6. Live migrate the instance started on step 2
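   A hedged sketch of this step using the standard client (the instance UUID is the one from the failure log below; the exact invocation used may have differed):
   nova live-migration 8c9915f4-84c3-44f1-b409-f593e385f1d2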

Actual results:
Live migration fails:

2016-11-07 06:29:16.831 6204 ERROR nova.virt.libvirt.driver [req-23f7dd7b-2087-4998-886b-2cd7da1f1bda edb5c79dd9fb4813991048b50cad4ae7 f9615fbeb4fe4bdb87b73d5d004ba876 - - -] [instance: 8c9915f4-84c3-44f1-b409-f593e385f1d2] Live Migration failure: internal error: qemu unexpectedly closed the monitor: 2016-11-07T06:29:16.611698Z qemu-kvm: -drive file=rbd:vms/8c9915f4-84c3-44f1-b409-f593e385f1d2_disk:id=openstack:auth_supported=cephx\;none:mon_host=10.0.0.139\:6789\;10.0.0.142\:6789\;10.0.0.146\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap: error connecting: Connection timed out

Expected results:
Live migration succeeds and QEMU uses the new Ceph MONs instead of the old ones.

Additional info:
I tried manually restarting libvirtd and openstack-nova-compute but I couldn't make it work. Please let me know if there's any other step that I missed. Thank you.

Comment 1 Marius Cornea 2016-11-07 06:59:17 UTC
ceph.conf on the compute node:

[global]
osd_pool_default_min_size = 1
auth_service_required = cephx
mon_initial_members = overcloud-serviceapi-0,overcloud-serviceapi-1,overcloud-serviceapi-2
fsid = d825caf0-a446-11e6-91fe-525400a81fbf
cluster_network = 192.168.0.18/25
auth_supported = cephx
auth_cluster_required = cephx
mon_host = 10.0.0.154,10.0.0.153,10.0.0.157
auth_client_required = cephx
public_network = 10.0.0.144/25

Comment 2 Giulio Fidente 2016-11-07 10:17:50 UTC
I think we need to restart the qemu process; by stopping/restarting the VM, qemu should reinitialize its rbd connection with the new config settings.

Comment 3 Marius Cornea 2016-11-07 10:30:28 UTC
Yep, it looks so. After doing nova stop/start I no longer got the old MONs timeout error message.

Comment 4 Eoghan Glynn 2016-11-07 12:40:05 UTC
@mcornea: can you clarify the apparent contradiction between comment #0:

"I tried manually restarting libvirtd and openstack-nova-compute but I couldn't make it work."

and comment #3:

"after doing nova stop/start I no longer got the old MONs timeout error message"

i.e. what's the exact difference between manually restarting openstack-nova-compute and nova stop/start?

Comment 5 Eoghan Glynn 2016-11-07 12:43:05 UTC
@mcornea: can you attach the full QEMU log file from both source and destination hosts?

Comment 6 Marius Cornea 2016-11-07 12:45:58 UTC
(In reply to Eoghan Glynn from comment #4)
> @mcornea: can you clarify the apparent contradiction between comment #0:
> 
> "I tried manually restarting libvirtd and openstack-nova-compute but I
> couldn't make it work."
> 
> and comment #3:
> 
> "after doing nova stop/start I no longer got the old MONs timeout error
> message"
> 
> i.e. what's the exact difference between manually restarting
> openstack-nova-compute and nova stop/start?

nova stop/start refers to the instance (nova stop $instance; nova start $instance), while restarting openstack-nova-compute means systemctl restart openstack-nova-compute.

Comment 7 Marius Cornea 2016-11-07 17:24:04 UTC
(In reply to Eoghan Glynn from comment #5)
> @mcornea: can you attach the full QEMU log file from both source and
> destination hosts?

The QEMU logs:

http://paste.openstack.org/show/588289/

Comment 9 Dr. David Alan Gilbert 2016-11-07 17:48:28 UTC
From the destination in that pastebin:
2016-11-07 16:31:05.351+0000: starting up libvirt version: 2.0.0, package: 10.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-09-21-10:15:26, x86-038.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-27.el7), hostname: overcloud-compute-1.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000004,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-instance-00000004/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Broadwell,+vme,+ss,+vmx,+osxsave,+f16c,+rdrand,+hypervisor,+arat,+tsc_adjust,+xsaveopt,+pdpe1gb,+abm,+rtm,+hle -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid fe2952db-6dc8-44c2-b26a-0f0300065d21 -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.1-5.el7ost,serial=135cfcf4-8659-45b3-ab94-bb5027185027,uuid=fe2952db-6dc8-44c2-b26a-0f0300065d21,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-5-instance-00000004/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -object secret,id=virtio-disk0-secret0,data=2gomEsaWRmx2CEa5VR2wxUgUionIYSJ0h/mly+/xEfE=,keyid=masterKey0,iv=P24YU3se79jC+QMXAhHdig==,format=base64 -drive 'file=rbd:vms/fe2952db-6dc8-44c2-b26a-0f0300065d21_disk:id=openstack:auth_supported=cephx\;none:mon_host=10.0.0.138\:6789\;10.0.0.141\:6789\;10.0.0.149\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -object secret,id=virtio-disk1-secret0,data=iorN1kMCSA3jz/7wgQyPkKSxqEeNkNb/asu4rF96CyU=,keyid=masterKey0,iv=ETKHh1NgX/xU8uCKB2vzWQ==,format=base64 -drive 'file=rbd:volumes/volume-30f8f80c-8bb3-4d1a-ab7c-ec906aad8517:id=openstack:auth_supported=cephx\;none:mon_host=10.0.0.140\:6789\;10.0.0.142\:6789\;10.0.0.155\:6789,file.password-secret=virtio-disk1-secret0,format=raw,if=none,id=drive-virtio-disk1,serial=30f8f80c-8bb3-4d1a-ab7c-ec906aad8517,cache=none' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=36 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:7f:cc:23,bus=pci.0,addr=0x3 -add-fd set=2,fd=38 -chardev file,id=charserial0,path=/dev/fdset/2,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:4 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
char device redirected to /dev/pts/4 (label charserial1)
2016-11-07T16:36:05.457888Z qemu-kvm: -drive file=rbd:vms/fe2952db-6dc8-44c2-b26a-0f0300065d21_disk:id=openstack:auth_supported=cephx\;none:mon_host=10.0.0.138\:6789\;10.0.0.141\:6789\;10.0.0.149\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap: error connecting: Connection timed out
2016-11-07 16:36:05.467+0000: shutting down


so I *think* that means it's the rbd connection timing out?

Comment 10 Kashyap Chamarthy 2016-11-07 17:49:41 UTC
Created attachment 1218161 [details]
Source & destination qemu logs (from comment #7)

Attaching them as a plain text file to this bug, because paste-bins expire.

Comment 11 Marius Cornea 2016-11-07 17:55:24 UTC
The rbd connection is timing out because these mons (10.0.0.138, 10.0.0.141, 10.0.0.149) are no longer part of the cluster; they were removed in step 4. Given that the current cluster status is OK, I'd expect the rbd connection to be using the new MONs and be able to reach the cluster.

Comment 12 Daniel Berrangé 2016-11-11 15:11:18 UTC
The initial monitor hosts are queried by Nova when it first starts the guest. They are then put in the XML given to libvirt, which in turn passes them to QEMU. If you decommission those monitor hosts, it will inevitably break any existing QEMU guests, since their XML config will be pointing to hosts that no longer exist. This will certainly break live migration, since QEMU on the target host will be trying to connect to the same monitors it had on the source.
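
For illustration (a rough sketch of the general shape, not the exact XML from this environment), those monitor addresses end up hard-coded in the rbd disk definition of the guest's libvirt domain XML, roughly:

<disk type='network' device='disk'>
  <source protocol='rbd' name='vms/8c9915f4-84c3-44f1-b409-f593e385f1d2_disk'>
    <host name='10.0.0.139' port='6789'/>
    <host name='10.0.0.142' port='6789'/>
    <host name='10.0.0.146' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>

which libvirt turns into the mon_host portion of the qemu -drive string quoted in comment #9, so the same stale addresses are reproduced on the migration destination.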

Dealing with decommissioning of ceph monitors is not something Nova has ever attempted to address, so this is really an RFE.

Comment 13 Marius Cornea 2016-11-11 15:16:44 UTC
(In reply to Daniel Berrange from comment #12)
> Dealing with decommissioning of ceph monitors is not something Nova has ever
> attempted to address, so this is really an RFE.

Thanks, marking it as an RFE in this case.

Comment 14 Eoghan Glynn 2016-11-11 15:54:52 UTC
This was further discussed on the compute DFG triage call today and the consensus was that moving the monitors in this way is operationally incorrect. If you want the flexibility to move services around like that, then that's what VIPs are for. If the monitor sat behind a virtual IP, then moving it would not require these highly awkward changes to static config as a knock-on impact; instead, everything should continue to work.
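
For illustration only (the address below is a hypothetical VIP, not one from this deployment), the idea is that clients would carry a single stable monitor endpoint in /etc/ceph/ceph.conf, so the address captured in guest definitions stays valid even if the monitor service itself moves between hosts:

mon_host = 10.0.0.200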

