A detailed analysis of the OpenStack virtual machine live migration code

  • 2020-05-24 06:34:56
  • OfStack

Virtual machine migration is divided into live (hot) migration and cold migration. In the words of Baidu, live migration (Live Migration, also called dynamic migration or real-time migration) is a save/restore (Save/Restore) of the virtual machine: the entire running state of the virtual machine is saved completely and can then be quickly restored on the original hardware platform or even on a different one. After recovery, the virtual machine keeps running smoothly and the user does not notice any difference. OpenStack's virtual machine migration is based on Libvirt. Let's take a look at the specific code that implements live migration in OpenStack.

First of all, the request enters through the API in nova/api/openstack/compute/contrib/admin_actions.py:


  @wsgi.action('os-migrateLive')
  def _migrate_live(self, req, id, body):
    """Permit admins to (live) migrate a server to a new host."""
    context = req.environ["nova.context"]
    authorize(context, 'migrateLive')

    try:
      block_migration = body["os-migrateLive"]["block_migration"]
      disk_over_commit = body["os-migrateLive"]["disk_over_commit"]
      host = body["os-migrateLive"]["host"]
    except (TypeError, KeyError):
      msg = _("host, block_migration and disk_over_commit must "
          "be specified for live migration.")
      raise exc.HTTPBadRequest(explanation=msg)

    try:
      block_migration = strutils.bool_from_string(block_migration,
                            strict=True)
      disk_over_commit = strutils.bool_from_string(disk_over_commit,
                             strict=True)
    except ValueError as err:
      raise exc.HTTPBadRequest(explanation=str(err))

    try:
      instance = self.compute_api.get(context, id, want_objects=True)
      self.compute_api.live_migrate(context, instance, block_migration,
                     disk_over_commit, host)
    except (exception.ComputeServiceUnavailable,
        exception.InvalidHypervisorType,
        exception.UnableToMigrateToSelf,
        exception.DestinationHypervisorTooOld,
        exception.NoValidHost,
        exception.InvalidLocalStorage,
        exception.InvalidSharedStorage,
        exception.MigrationPreCheckError) as ex:
      raise exc.HTTPBadRequest(explanation=ex.format_message())
    except exception.InstanceNotFound as e:
      raise exc.HTTPNotFound(explanation=e.format_message())
    except exception.InstanceInvalidState as state_error:
      common.raise_http_conflict_for_instance_invalid_state(state_error,
          'os-migrateLive')
    except Exception:
      if host is None:
        msg = _("Live migration of instance %s to another host "
            "failed") % id
      else:
        msg = _("Live migration of instance %(id)s to host %(host)s "
            "failed") % {'id': id, 'host': host}
      LOG.exception(msg)
      # Return messages from scheduler
      raise exc.HTTPBadRequest(explanation=msg)

    return webob.Response(status_int=202)

Here you can see that the action name in the @wsgi.action('os-migrateLive') decorator corresponds to the top-level key of the request body in the API document:


 {
  "os-migrateLive": {
    "host": "0443e9a1254044d8b99f35eace132080",
    "block_migration": false,
    "disk_over_commit": false
  }
}
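
To make the validation in _migrate_live concrete, here is a minimal sketch of how such a request body could be checked. The helper names parse_strict_bool and parse_live_migrate_body are hypothetical, and parse_strict_bool is only a simplified stand-in for strutils.bool_from_string(strict=True), not the library's implementation.

```python
def parse_strict_bool(value):
    """Simplified stand-in for strutils.bool_from_string(strict=True)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("1", "t", "true", "on", "y", "yes"):
        return True
    if text in ("0", "f", "false", "off", "n", "no"):
        return False
    # Strict mode: anything unrecognized is an error, not a default.
    raise ValueError("Unrecognized boolean value: %r" % (value,))


def parse_live_migrate_body(body):
    """Return (host, block_migration, disk_over_commit) or raise ValueError."""
    try:
        params = body["os-migrateLive"]
        host = params["host"]
        block_migration = params["block_migration"]
        disk_over_commit = params["disk_over_commit"]
    except (TypeError, KeyError):
        # All three keys are mandatory, as in the handler shown earlier.
        raise ValueError("host, block_migration and disk_over_commit must "
                         "be specified for live migration.")
    return (host,
            parse_strict_bool(block_migration),
            parse_strict_bool(disk_over_commit))
```

In the real handler both kinds of failure are turned into an HTTP 400 response; the sketch raises ValueError so that it stays free of web framework dependencies.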

The call in this handler that actually starts the migration is:


self.compute_api.live_migrate(context, instance, block_migration,
                 disk_over_commit, host)

This statement leads into nova/compute/api.py, whose source code is as follows:


  @check_instance_cell
  @check_instance_state(vm_state=[vm_states.ACTIVE])
  def live_migrate(self, context, instance, block_migration,
           disk_over_commit, host_name):
    """Migrate a server lively to a new host."""
    LOG.debug(_("Going to try to live migrate instance to %s"),
         host_name or "another host", instance=instance)

    instance.task_state = task_states.MIGRATING
    instance.save(expected_task_state=[None])

    self.compute_task_api.live_migrate_instance(context, instance,
        host_name, block_migration=block_migration,
        disk_over_commit=disk_over_commit)
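

The @check_instance_state decorator on this method acts as a guard that runs before the method body. The following is a minimal sketch of that idea, not Nova's implementation: the exception class, the dictionary-based instance and FakeComputeAPI are all hypothetical stand-ins.

```python
import functools


class InstanceInvalidState(Exception):
    """Hypothetical stand-in for nova.exception.InstanceInvalidState."""


def check_instance_state(vm_state=None):
    """Reject the call unless instance['vm_state'] is in the allowed list."""
    def outer(func):
        @functools.wraps(func)
        def inner(self, context, instance, *args, **kwargs):
            if vm_state is not None and instance["vm_state"] not in vm_state:
                raise InstanceInvalidState(
                    "%s is not allowed while vm_state is %s"
                    % (func.__name__, instance["vm_state"]))
            return func(self, context, instance, *args, **kwargs)
        return inner
    return outer


class FakeComputeAPI(object):
    """Illustrative class; only the decorator usage mirrors the listing."""

    @check_instance_state(vm_state=["active"])
    def live_migrate(self, context, instance, host_name):
        # Mark the instance as migrating, as the real method does via
        # instance.task_state before handing off to the conductor.
        instance["task_state"] = "migrating"
        return instance
```

Calling FakeComputeAPI().live_migrate with an instance whose vm_state is "stopped" raises InstanceInvalidState before the method body runs.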

The check_instance_state decorator checks the state of the virtual machine and/or its task before the API method is entered. If the instance is in the wrong state, an exception is raised; here only instances in the ACTIVE state are accepted. The method then starts the live migration to the new host: it sets the task state of the virtual machine to "migrating" and then calls live_migrate_instance in nova/conductor/api.py:


def live_migrate_instance(self, context, instance, host_name,
                block_migration, disk_over_commit):
     scheduler_hint = {'host': host_name}
     self._manager.migrate_server(
       context, instance, scheduler_hint, True, False, None,
       block_migration, disk_over_commit, None)

This puts the host name into the scheduler_hint dictionary and then calls the migrate_server method in nova/conductor/manager.py:


def migrate_server(self, context, instance, scheduler_hint, live, rebuild,
      flavor, block_migration, disk_over_commit, reservations=None):
    if instance and not isinstance(instance, instance_obj.Instance):
      # NOTE(danms): Until v2 of the RPC API, we need to tolerate
      # old-world instance objects here
      attrs = ['metadata', 'system_metadata', 'info_cache',
           'security_groups']
      instance = instance_obj.Instance._from_db_object(
        context, instance_obj.Instance(), instance,
        expected_attrs=attrs)
    if live and not rebuild and not flavor:
      self._live_migrate(context, instance, scheduler_hint,
                block_migration, disk_over_commit)
    elif not live and not rebuild and flavor:
      instance_uuid = instance['uuid']
      with compute_utils.EventReporter(context, self.db,
                     'cold_migrate', instance_uuid):
        self._cold_migrate(context, instance, flavor,
                  scheduler_hint['filter_properties'],
                  reservations)
    else:
      raise NotImplementedError()
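

The branching on live, rebuild and flavor in migrate_server can be isolated into a small function to make the three outcomes easier to see. This is only a sketch: the function name select_migration_path and its string return values are hypothetical.

```python
def select_migration_path(live, rebuild, flavor):
    """Mirror the if/elif/else of migrate_server and name the chosen path."""
    if live and not rebuild and not flavor:
        return "live"   # handled by _live_migrate
    elif not live and not rebuild and flavor:
        return "cold"   # handled by _cold_migrate
    # Every other combination is unsupported in the listing above.
    raise NotImplementedError()
```

With the arguments used for live migration, select_migration_path(True, False, None) takes the first branch.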

Because nova/conductor/api.py passes in the following arguments:


self._manager.migrate_server(
       context, instance, scheduler_hint, True, False, None,
       block_migration, disk_over_commit, None)

live is True, rebuild is False and flavor is None, so the first branch is executed:


if live and not rebuild and not flavor:
  self._live_migrate(context, instance, scheduler_hint,
                     block_migration, disk_over_commit)

The code of _live_migrate is as follows:


def _live_migrate(self, context, instance, scheduler_hint,
           block_migration, disk_over_commit):
    destination = scheduler_hint.get("host")
    try:
      live_migrate.execute(context, instance, destination,
               block_migration, disk_over_commit)
    except (exception.NoValidHost,
        exception.ComputeServiceUnavailable,
        exception.InvalidHypervisorType,
        exception.InvalidCPUInfo,
        exception.UnableToMigrateToSelf,
        exception.DestinationHypervisorTooOld,
        exception.InvalidLocalStorage,
        exception.InvalidSharedStorage,
        exception.HypervisorUnavailable,
        exception.MigrationPreCheckError) as ex:
      with excutils.save_and_reraise_exception():
        #TODO(johngarbutt) - eventually need instance actions here
        request_spec = {'instance_properties': {
          'uuid': instance['uuid'], },
        }
        scheduler_utils.set_vm_state_and_notify(context,
            'compute_task', 'migrate_server',
            dict(vm_state=instance['vm_state'],
               task_state=None,
               expected_task_state=task_states.MIGRATING,),
            ex, request_spec, self.db)
    except Exception as ex:
      LOG.error(_('Migration of instance %(instance_id)s to host'
            ' %(dest)s unexpectedly failed.'),
            {'instance_id': instance['uuid'], 'dest': destination},
            exc_info=True)
      raise exception.MigrationError(reason=ex)
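

The error handling in _live_migrate follows a two-tier pattern: known errors reset the task state and are re-raised unchanged, while anything unexpected is wrapped in a MigrationError. Below is a minimal sketch of that pattern, assuming a dictionary-based instance; the exception classes and the function live_migrate_with_recovery are hypothetical stand-ins, and the real code additionally sends a notification.

```python
class MigrationPreCheckError(Exception):
    """Stand-in for one of the expected, known failure types."""


class MigrationError(Exception):
    """Stand-in for the wrapper raised on unexpected failures."""


def live_migrate_with_recovery(instance, run_migration):
    """Run the migration callable and apply the two-tier error handling."""
    try:
        run_migration()
    except MigrationPreCheckError:
        # Known failure: clear the MIGRATING task state, then re-raise the
        # original exception so that the API layer can report it.
        instance["task_state"] = None
        raise
    except Exception as ex:
        # Unexpected failure: wrap it so that callers see a single type.
        raise MigrationError("unexpected failure: %s" % ex)
```

Listing the specific exception first matters here, because the generic except Exception clause would otherwise catch the known errors as well.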

First, the host name is read from scheduler_hint and assigned to destination, and then the migration is executed; the rest of the method is exception handling. The code that executes the migration is divided into two parts. Part 1 is at around line 184 of nova/conductor/tasks/live_migrate.py:


def execute(context, instance, destination,
      block_migration, disk_over_commit):
  task = LiveMigrationTask(context, instance,
               destination,
               block_migration,
               disk_over_commit)
  #TODO(johngarbutt) create a superclass that contains a safe_execute call
  return task.execute()

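This function creates a LiveMigrationTask and returns the result of its execute() method; the TODO comment notes that a superclass containing a safe_execute call should eventually be created. Part 2 is that execute() method, which performs the migration, at around line 54 of the same file: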


def execute(self):
    self._check_instance_is_running()
    self._check_host_is_up(self.source)

    if not self.destination:
      self.destination = self._find_destination()
    else:
      self._check_requested_destination()

    #TODO(johngarbutt) need to move complexity out of compute manager
    return self.compute_rpcapi.live_migration(self.context,
        host=self.source,
        instance=self.instance,
        dest=self.destination,
        block_migration=self.block_migration,
        migrate_data=self.migrate_data)
        #TODO(johngarbutt) disk_over_commit?

After checking that the instance is running and that the source host is up, this method has three parts:

If no destination host was specified, the scheduler selects a target host and runs the related checks to make sure that live migration is possible.

If a destination host was specified, the related checks are run on it directly to make sure that live migration is possible.

The migration operation is performed.
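
The three parts above can be sketched as a small class. This is an illustration under simplifying assumptions, not the LiveMigrationTask code: the class name, the candidates and up_hosts arguments, and the specific checks are hypothetical, and the final RPC call is replaced by returning the chosen pair of hosts.

```python
class LiveMigrationSketch(object):
    """Illustrative destination selection for a live migration."""

    def __init__(self, source, destination, candidates, up_hosts):
        self.source = source
        self.destination = destination
        self.candidates = candidates    # hosts a scheduler might offer
        self.up_hosts = set(up_hosts)   # hosts whose service is up

    def _find_destination(self):
        # Part 1: no host requested, so pick the first usable candidate.
        for host in self.candidates:
            if host != self.source and host in self.up_hosts:
                return host
        raise LookupError("no valid host found")

    def _check_requested_destination(self):
        # Part 2: a host was requested, so only validate it.
        if self.destination == self.source:
            raise ValueError("cannot migrate to the source host")
        if self.destination not in self.up_hosts:
            raise LookupError("destination host is not up")

    def execute(self):
        if not self.destination:
            self.destination = self._find_destination()
        else:
            self._check_requested_destination()
        # Part 3: the real code issues the live_migration RPC call here.
        return (self.source, self.destination)
```

For example, LiveMigrationSketch("node-1", None, ["node-1", "node-2"], ["node-1", "node-2"]).execute() skips the source host and selects "node-2".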

Leaving the first two parts aside, the code for part 3 is in nova/compute/rpcapi.py:


[code listing missing in the source]

Live migration then begins:


[code listing missing in the source]

In this method, a green thread is created to run the _live_migration method, which performs the live migration:

spawn: creates a green thread to run "func(*args, **kwargs)", which here means running the _live_migration method;

_live_migration: performs the live migration. It mainly calls the libvirt python interface method virDomainMigrateToURI to migrate the domain object from the current host to the given target host.
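
The spawn pattern can be illustrated with the standard library. The sketch below uses threading.Thread as a stand-in for a green thread, which is an assumption made only to keep the example self-contained; the function bodies and the results list are hypothetical placeholders, and no libvirt call is made.

```python
import threading


def spawn(func, *args, **kwargs):
    """Run func(*args, **kwargs) in the background and return the worker."""
    worker = threading.Thread(target=func, args=args, kwargs=kwargs)
    worker.start()
    return worker


def _live_migration(instance_name, dest, results):
    # Placeholder for the real work, where the libvirt migration call
    # would be made. Here the request is only recorded.
    results.append((instance_name, dest))


def live_migration(instance_name, dest, results):
    # Return immediately and let the migration run in the background.
    return spawn(_live_migration, instance_name, dest, results)
```

The caller gets control back right away and can wait on the returned worker with join() if needed. A green thread differs in that it is scheduled cooperatively rather than by the operating system.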

The _live_migration method is then called in the green thread:


[code listing missing in the source]

if block_migration:
  flaglist = CONF.libvirt.block_migration_flag.split(',')

This gets the list of block migration flags; the block_migration_flag option defines the migration flags used for block migration.


[code listing missing in the source]

This part gets the list of live migration flags; the live_migration_flag option defines the migration flags used for live migration.
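
A comma-separated flag option like this is typically turned into a single bitmask by looking up each name and combining the values with bitwise OR. The sketch below shows that technique under stated assumptions: FakeLibvirt is a hypothetical stand-in for the libvirt module, its numeric values are illustrative, and migration_flags is not a Nova function.

```python
from functools import reduce
import operator


class FakeLibvirt(object):
    """Stand-in namespace; the numeric values are illustrative only."""
    VIR_MIGRATE_LIVE = 1
    VIR_MIGRATE_PEER2PEER = 2
    VIR_MIGRATE_UNDEFINE_SOURCE = 16
    VIR_MIGRATE_NON_SHARED_INC = 128


def migration_flags(flag_string, namespace=FakeLibvirt):
    """Turn 'FLAG_A, FLAG_B' into the bitwise OR of the named constants."""
    names = [name.strip() for name in flag_string.split(",")]
    values = [getattr(namespace, name) for name in names]
    return reduce(operator.or_, values, 0)
```

Stripping each name makes the parsing tolerant of spaces after the commas, and an unknown flag name fails early with AttributeError instead of being silently ignored.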


[code listing missing in the source]

Retrieves the libvirt domain object based on the given instance name.


[code listing missing in the source]

This part waits for the live migration to complete.
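
Waiting for completion is commonly done by polling at a fixed interval. The following sketch assumes that completion is detected when the instance can no longer be found on the source host; that assumption, the function name and its parameters are all hypothetical and are not taken from the listing.

```python
import time


def wait_for_live_migration(instance_exists, on_complete,
                            interval=0.5, sleep=time.sleep):
    """Poll until instance_exists() is false, then call on_complete()."""
    polls = 0
    while instance_exists():
        polls += 1
        sleep(interval)   # the sleep function is injectable for testing
    on_complete()
    return polls
```

Passing the sleep function as a parameter keeps the loop testable without real delays.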

This concludes the walkthrough of the live migration code.

