Module#

A Module represents a class that can be sent to and used on remote clusters and environments. Modules can live on remote hardware, and their class methods can be called remotely.

Module Factory Method#

runhouse.module(cls: [Type] = None, name: str | None = None, system: str | Cluster | None = None, env: str | Env | None = None, dryrun: bool = False)[source]#

Returns a Module object, which can be used to instantiate and interact with the class remotely.

The behavior of Modules (and subclasses thereof) is as follows:
  • Any callable public method of the module is intercepted and executed remotely over RPC, with the exception of certain functions Python doesn’t make interceptable (e.g. __call__, __init__) and methods of the Module class itself (e.g. to, fetch, etc.). Properties and private methods are not intercepted, and will be executed locally.

  • Any method which executes remotely may be called normally, e.g. model.forward(x), or asynchronously, e.g. key = model.forward.run(x) (which returns a key to retrieve the result with cluster.get(key)), or with run_obj = model.train.remote(x), which runs synchronously but returns a remote object to avoid passing heavy results back over the network.

  • Setting attributes, both public and private, will be executed remotely, with the new values only being set in the remote module and not the local one. This excludes any methods or attributes of the Module class proper (e.g. system or name), which will be set locally.

  • Attributes and private properties can be fetched with the remote property, and the full resource can be fetched using .fetch(), e.g. model.remote.weights, model.remote.__dict__, model.fetch().

  • When a module is sent to a cluster, its public attributes are serialized, sent over, and repopulated in the remote instance. This means that any changes to the local module’s attributes will not be reflected in the remote one (see the sketch after this list).
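A short sketch of these conventions, assuming SomeModel is a plain (non-Module) class already defined locally; my_env, my_gpu, forward, batch_size, weights, and x are hypothetical placeholders for your own env, cluster, methods, attributes, and data:
>>> RemoteModel = rh.module(SomeModel, env="my_env")
>>> model = RemoteModel(model_id="some-model").to(system="my_gpu")  # __init__ runs locally, then the instance is sent over
>>> result = model.forward(x)           # Runs remotely; the result is returned over the network
>>> key = model.forward.run(x)          # Runs asynchronously; retrieve the result with cluster.get(key)
>>> ref = model.forward.remote(x)       # Runs synchronously but returns a remote object instead of the result
>>> model.batch_size = 32               # Set on the remote module only, not the local copy
>>> weights = model.remote.weights      # Fetch a single remote attribute
>>> state = model.fetch()               # Fetch the full remote module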

Parameters:
  • cls – The class to instantiate.

  • name (Optional[str]) – Name to give the module object, to be reused later on.

  • env (Optional[str or Env]) – Environment in which the module should live on the cluster, if system is cluster.

  • dryrun (bool) – Whether to create the module if it doesn’t exist, or load the module object as a dryrun. (Default: False)

Returns:

The resulting module.

Return type:

Module

Example - creating a module by defining an rh.Module subclass:
>>> import runhouse as rh
>>> import transformers
>>>
>>> # Sample rh.Module class
>>> class Model(rh.Module):
>>>    def __init__(self, model_id, device="cpu", env=None):
>>>        # Note that the code here will be run in your local environment prior to being sent
>>>        # to a cluster. For loading large models/datasets that are only meant to be used remotely,
>>>        # we recommend using lazy initialization (see tokenizer and model attributes below).
>>>        super().__init__(env=env)
>>>        self.model_id = model_id
>>>        self.device = device
>>>
>>>    @property
>>>    def tokenizer(self):
>>>        # Lazily initialize the tokenizer remotely only when it is needed
>>>        if not hasattr(self, '_tokenizer'):
>>>            self._tokenizer = transformers.AutoTokenizer.from_pretrained(self.model_id)
>>>        return self._tokenizer
>>>
>>>    @property
>>>    def model(self):
>>>        if not hasattr(self, '_model'):
>>>            self._model = transformers.AutoModel.from_pretrained(self.model_id).to(self.device)
>>>        return self._model
>>>
>>>    def predict(self, x):
>>>        x = self.tokenizer(x, return_tensors="pt")
>>>        return self.model(**x)
>>> # Creating rh.Module instance
>>> model = Model(model_id="bert-base-uncased", device="cuda", env="my_env")
>>> model = model.to(system="my_gpu")
>>> model.predict("Hello world!")   # Runs on system in env
>>> tok = model.remote.tokenizer    # Returns remote tokenizer
>>> id = model.local.model_id       # Returns local model_id, if any
>>> model_id = model.model_id       # Returns local model_id (not remote)
>>> model.fetch()                   # Returns full remote module, including model and tokenizer
>>>
Example - creating a Module from an existing class, via the rh.module() factory method:
>>> other_model = Model(model_id="bert-base-uncased", device="cuda").to("my_gpu", "my_env")
>>>
>>> # Another method: Create a module instance from an existing non-Module class using rh.module()
>>> RemoteModel = rh.module(cls=BERTModel, env="my_env")
>>> remote_model = RemoteModel(model_id="bert-base-uncased", device="cuda").to(system="my_gpu")
>>> remote_model.predict("Hello world!")  # Runs on system in env
>>>
>>> # You can also call remote class methods
>>> model_size = RemoteModel.get_model_size("bert-base-uncased")
>>> # Loading a module
>>> my_local_module = rh.module(name="~/my_module")
>>> my_s3_module = rh.module(name="@/my_module")

Module Class#

class runhouse.Module(pointers: Tuple | None = None, signature: dict | None = None, endpoint: str | None = None, name: str | None = None, system: Cluster | str | None = None, env: Env | None = None, dryrun: bool = False, provenance: dict | None = None, **kwargs)[source]#
__init__(pointers: Tuple | None = None, signature: dict | None = None, endpoint: str | None = None, name: str | None = None, system: Cluster | str | None = None, env: Env | None = None, dryrun: bool = False, provenance: dict | None = None, **kwargs)[source]#

Runhouse Module object

endpoint(external: bool = False)[source]#

The endpoint of the module on the cluster. Returns an endpoint if one was manually set (e.g. if loaded from a config). If not, requests the endpoint from the Module’s system.

Parameters:

external – If True and getting the endpoint from the system, only return an endpoint if it’s externally accessible (i.e. not on localhost, not connected through an SSH tunnel). If False, return the endpoint even if it’s not externally accessible.
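A minimal usage sketch, assuming my_module already lives on a cluster:

>>> my_module.endpoint()                 # Endpoint as reported by the module's system
>>> my_module.endpoint(external=True)    # Only returns an endpoint if it is externally accessible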

fetch(item: str | None = None, **kwargs)[source]#

Helper method to allow for access to remote state, both public and private. Fetching functions is not advised. system.get(module.name).resolved_state() is roughly equivalent to module.fetch().

Example

>>> my_module.fetch("my_property")
>>> my_module.fetch("my_private_property")
>>> MyRemoteClass = rh.module(my_class).to(system)
>>> MyRemoteClass(*args).fetch() # Returns a my_class instance, populated with the remote state
>>> my_blob.fetch() # Returns the data of the blob, due to the overloaded resolved_state method
>>> class MyModule(rh.Module):
>>>     # ...
>>>
>>> MyModule(*args).to(system).fetch() # Returns the full remote module, including private and public state
async fetch_async(key: str, remote: bool = False, stream_logs: bool = False)[source]#

Async version of fetch. Can’t be a property like fetch because __getattr__ can’t be awaited.

Example

>>> await my_module.fetch_async("my_property")
>>> await my_module.fetch_async("_my_private_property")
get_or_to(system: str | Cluster, env: str | List[str] | Env | None = None, name: str | None = None)[source]#

Check if the module already exists on the cluster, and if so return the module object. If not, put the module on the cluster and return the remote module.

Example

>>> remote_model = Model().get_or_to(my_cluster, name="remote_model")
property local#

Helper property to allow for access to local properties, both public and private.

Example

>>> my_module.local.my_property
>>> my_module.local._my_private_property
>>> my_module.local.size = 14
method_signature(method)[source]#

Extracts the properties of a method that we want to preserve when sending the method over the wire.
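A minimal sketch, assuming MyModule is an rh.Module subclass with a predict method; the exact contents of the returned signature are internal to Runhouse:

>>> sig = my_module.method_signature(MyModule.predict)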

openapi_spec(spec_name: str | None = None)[source]#

Generate an OpenAPI spec for the module.

TODO: This breaks if the module has type annotations that are classes, and not standard library or typing types.

Maybe we can do something using: kuimono/openapi-schema-pydantic to allow nested Pydantic models easily as schemas?

TODO: What happens if there is an empty function, will this work with an empty body even though it is marked as required?
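A minimal sketch; spec_name is optional, and per the TODO above the module’s methods are assumed to use only standard-library or typing annotations:

>>> spec = my_module.openapi_spec()
>>> named_spec = my_module.openapi_spec(spec_name="my_module_spec")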

refresh()[source]#

Update the resource in the object store.
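A minimal sketch, assuming my_module has already been sent to a cluster:

>>> my_module.refresh()   # Update the resource in the object store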

property remote#

Helper property to allow for access to remote properties, both public and private. Returning functions is not advised.

Example

>>> my_module.remote.my_property
>>> my_module.remote._my_private_property
>>> my_module.remote.size = 14
rename(name: str)[source]#

Rename the module.
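A minimal sketch; the new name is a hypothetical placeholder:

>>> my_module.rename("my_new_module_name")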

replicate(num_replicas=1, names=None, envs=None, parallel=False)[source]#

Replicate the module on the cluster in a new env and return the new modules.
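A minimal sketch; the replica names and envs are hypothetical placeholders and, if provided, presumably need one entry per replica:

>>> replicas = my_module.replicate(num_replicas=2, names=["replica1", "replica2"], parallel=True)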

resolve()[source]#

Specify that the module should resolve to a particular state when passed into a remote method. This is useful if you want to revert the module’s state to some “Runhouse-free” state once it is passed into a Runhouse-unaware function. For example, if you call a Runhouse-unaware function with .remote(), you will be returned a Blob which wraps your data. If you want to pass that Blob into another function that operates on the original data (e.g. a function that takes a numpy array), you can call my_second_fn(my_blob.resolve()), and my_blob will be replaced with the contents of its .data on the cluster before being passed into my_second_fn.

Resolved state is defined by the resolved_state method. By default, modules created with the rh.module factory constructor will be resolved to their original non-module-wrapped class (or best attempt). Modules which are defined as a subclass of Module will be returned as-is, as they have no other “original class.”

Example

>>> my_module = rh.module(my_class)
>>> my_remote_fn(my_module.resolve()) # my_module will be replaced with the original class `my_class`
>>> my_result_blob = my_remote_fn.call.remote(args)
>>> my_other_remote_fn(my_result_blob.resolve()) # my_result_blob will be replaced with its data
resolved_state()[source]#

Return the resolved state of the module. By default, this is the original class of the module if it was created with the module factory constructor.
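A minimal sketch, mirroring the note under fetch() that system.get(module.name).resolved_state() is roughly equivalent to module.fetch():

>>> original = system.get(my_module.name).resolved_state()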

save(name: str | None = None, overwrite: bool = True, folder: str | None = None)[source]#

Register the resource and save to local working_dir config and RNS config store.
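A minimal sketch; the name is a hypothetical placeholder:

>>> my_module.save(name="my_saved_module")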

async set_async(key: str, value)[source]#

Async version of property setter.

Example

>>> await my_module.set_async("my_property", my_value)
>>> await my_module.set_async("_my_private_property", my_value)
share(*args, visibility=None, **kwargs)[source]#

Grant access to the resource for a list of users (or a single user). If a user has a Runhouse account they will receive an email notifying them of their new access. If the user does not have a Runhouse account they will also receive instructions on creating one, after which they will be able to have access to the Resource. If visibility is set to public, users will not be notified.

Note

You can only grant access to other users if you have write access to the resource.

Parameters:
  • users (Union[str, list], optional) – Single user or list of user emails and / or runhouse account usernames. If none are provided and visibility is set to public, resource will be made publicly available to all users.

  • access_level (ResourceAccess, optional) – Access level to provide for the resource. Defaults to read.

  • visibility (ResourceVisibility, optional) – Type of visibility to provide for the shared resource. Defaults to private.

  • notify_users (bool, optional) – Whether to send an email notification to users who have been given access. Note: This is relevant for resources which are not shareable. Defaults to True.

  • headers (dict, optional) – Request headers to provide for the request to RNS. Contains the user’s auth token. Example: {"Authorization": f"Bearer {token}"}

Returns:

added_users:

Users who already have a Runhouse account and have been granted access to the resource.

new_users:

Users who do not have Runhouse accounts and received notifications via their emails.

valid_users:

Set of valid usernames and emails from users parameter.

Return type:

Tuple(Dict, Dict, Set)

Example

>>> # Write access to the resource for these specific users.
>>> # Visibility will be set to private (users can search for and view resource in Den dashboard)
>>> my_resource.share(users=["username1", "[email protected]"], access_level='write')
>>> # Make resource public, with read access to the resource for all users
>>> my_resource.share(visibility='public')
to(system: str | Cluster, env: str | List[str] | Env | None = None, name: str | None = None, force_install: bool = False)[source]#

Put a copy of the module on the destination system and env, and return the new module.

Example

>>> local_module = rh.module(my_class)
>>> cluster_module = local_module.to("my_cluster")