Blob

A Blob is a Runhouse primitive that represents an entity for storing data and lives inside a Folder.

Blob Factory Method

runhouse.blob(data: [Any] = None, name: str | None = None, path: str | Path | None = None, system: str | None = None, env: str | Env | None = None, data_config: Dict | None = None, load: bool = True, dryrun: bool = False)

Returns a Blob object, which can be used to interact with the resource at the given path.

Parameters:
  • data – Blob data. The data to persist either on the cluster or in the filesystem.

  • name (Optional[str]) – Name to give the blob object, to be reused later on.

  • path (Optional[str or Path]) – Path to the blob object. Specifying a path forces the blob to be saved to the filesystem rather than persisted in the cluster’s object store.

  • system (Optional[str or Cluster]) – File system or cluster name. If providing a file system, it must be one of: [file, github, sftp, ssh, s3, gs, azure]; support for additional file systems is in progress. If providing a cluster, this must be a cluster object or name, and whether the data is saved to the cluster’s object store or to its filesystem depends on whether a path is specified.

  • env (Optional[Env or str]) – Environment for the blob. If left empty, defaults to base environment. (Default: None)

  • data_config (Optional[Dict]) – The data config to pass to the underlying fsspec handler (when saving to the filesystem).

  • load (bool) – Whether to try to load the Blob object from RNS. (Default: True)

  • dryrun (bool) – Whether to create the Blob if it doesn’t exist, or load a Blob object as a dryrun. (Default: False)
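The `path` and `system` descriptions above together decide where a blob's data ends up. The sketch below restates those rules as a small decision function, matching the cases in the example further down. It is a hypothetical illustration, not Runhouse's implementation: `resolve_storage` and `FILE_SYSTEMS` are invented names, and any `system` string that is not a listed file system is assumed to be a cluster name.

```python
from typing import Optional

# File system names accepted by `system`, as listed in the parameter docs.
FILE_SYSTEMS = {"file", "github", "sftp", "ssh", "s3", "gs", "azure"}


def resolve_storage(path: Optional[str] = None, system: Optional[str] = None) -> str:
    """Hypothetical helper: describe where a blob's data would be stored.

    Restates the documented rules for `path` and `system`; any other
    `system` string is assumed to name a cluster.
    """
    if system in FILE_SYSTEMS:
        # A file system was named: data goes there (a default path is
        # used when `path` is None).
        return f"filesystem:{system}"
    if path is not None:
        # A path forces filesystem storage: on the cluster if one is
        # given, otherwise on the local file system.
        return "filesystem:cluster" if system is not None else "filesystem:local"
    # No path: the data is kept in an object store, the cluster's or the local one.
    return "object_store:cluster" if system is not None else "object_store:local"
```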

Returns:

The resulting blob.

Return type:

Blob
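The `load` and `dryrun` parameters control whether the factory reuses a saved blob or creates a new one, and whether a new one is persisted. The sketch below models that load-or-create pattern with a plain dict standing in for RNS. It assumes a blob is only loaded when a name is given without data, and that a dryrun skips persistence. `REGISTRY` and `get_or_create_blob` are hypothetical, and the exact Runhouse behavior may differ.

```python
from typing import Any, Dict

# Hypothetical stand-in for the Runhouse name server (RNS): name -> saved data.
REGISTRY: Dict[str, Any] = {}


def get_or_create_blob(name: str, data: Any = None, load: bool = True,
                       dryrun: bool = False) -> Dict[str, Any]:
    """Hypothetical illustration of `load` and `dryrun` (not Runhouse code)."""
    if load and data is None and name in REGISTRY:
        # Only a name was given and `load` is set: reuse the saved blob.
        return {"name": name, "data": REGISTRY[name], "loaded": True}
    blob = {"name": name, "data": data, "loaded": False}
    if not dryrun:
        # Outside a dryrun, the newly constructed blob is persisted.
        REGISTRY[name] = data
    return blob
```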

Example

>>> import runhouse as rh
>>> import json
>>> from pathlib import Path
>>>
>>> data = list(range(50))
>>> serialized_data = json.dumps(data)
>>>
>>> # Local blob with name and no path (saved to Runhouse object store)
>>> rh.blob(name="@/my-blob", data=data)
>>>
>>> # Remote blob with name and no path (saved to cluster's Runhouse object store);
>>> # assumes my_cluster is an existing cluster object or cluster name
>>> rh.blob(name="@/my-blob", data=data, system=my_cluster)
>>>
>>> # Remote blob with name, filesystem, and no path (saved to filesystem with default path)
>>> rh.blob(name="@/my-blob", data=serialized_data, system="s3")
>>>
>>> # Remote blob with name and path (saved to remote filesystem)
>>> rh.blob(name='@/my-blob', data=serialized_data, path='/runhouse-tests/my_blob.pickle', system='s3')
>>>
>>> # Local blob with path and no system (saved to local filesystem)
>>> rh.blob(data=serialized_data, path=str(Path.cwd() / "my_blob.pickle"))
>>>
>>> # Loading a blob by name
>>> my_local_blob = rh.blob(name="~/my_blob")
>>> my_s3_blob = rh.blob(name="@/my-blob")

Blob Class

class runhouse.Blob(name: str | None = None, system: Cluster | str | None = None, env: Env | None = None, dryrun: bool = False, **kwargs)
__init__(name: str | None = None, system: Cluster | str | None = None, env: Env | None = None, dryrun: bool = False, **kwargs)

Runhouse Blob object.

Note

To build a Blob, please use the factory method blob().

exists_in_system()

Check whether the blob exists in the file system.

Example

>>> blob = rh.blob(data)
>>> blob.exists_in_system()
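For a blob saved to the local file system, an existence check reduces to testing whether the blob's path exists. The sketch below shows only that local case using `pathlib`. `exists_in_local_system` is a hypothetical helper, and remote file systems (s3, gs, and so on) would need their own backend check.

```python
import tempfile
from pathlib import Path


def exists_in_local_system(path: str) -> bool:
    """Hypothetical: existence check for a blob stored on the local file system."""
    return Path(path).exists()


with tempfile.TemporaryDirectory() as tmp:
    blob_path = Path(tmp) / "my_blob.pickle"
    before = exists_in_local_system(str(blob_path))  # nothing written yet
    blob_path.write_bytes(b"data")
    after = exists_in_local_system(str(blob_path))   # file now exists
```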

resolved_state(_state_dict=None)

Return the resolved state of the blob, which is the data.

Primarily used to define the behavior of the fetch method.

Example

>>> blob = rh.blob(data)
>>> blob.resolved_state()
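Because the resolved state of a blob is its data, a `fetch`-style method can simply defer to `resolved_state()`. The toy class below illustrates that delegation. `ToyBlob` is hypothetical and only models the relationship described above, not Runhouse's actual `fetch` machinery.

```python
from typing import Any


class ToyBlob:
    """Hypothetical class showing fetch() deferring to resolved_state()."""

    def __init__(self, data: Any) -> None:
        self.data = data

    def resolved_state(self) -> Any:
        # For a blob, the resolved state is simply its data.
        return self.data

    def fetch(self) -> Any:
        # fetch() returns whatever resolved_state() reports.
        return self.resolved_state()


blob = ToyBlob(list(range(5)))
```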

rm()

Delete the blob from wherever it’s stored.

Example

>>> blob = rh.blob(data)
>>> blob.rm()
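"Wherever it's stored" covers the two locations described for the factory method: an object store or a filesystem path. The sketch below handles both cases with a dict as the object store and a local file for the filesystem case. `OBJECT_STORE` and `rm_blob` are hypothetical names, and making deletion a no-op for already-missing blobs is a design choice of this sketch, not documented Runhouse behavior.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical in-memory object store: blob name -> data.
OBJECT_STORE: Dict[str, Any] = {}


def rm_blob(name: str, path: Optional[Path] = None) -> None:
    """Hypothetical rm(): remove a blob from wherever it is stored."""
    if path is not None:
        # Filesystem-backed blob: delete the file (no error if already gone).
        path.unlink(missing_ok=True)
    else:
        # Object-store blob: drop the key (no error if already gone).
        OBJECT_STORE.pop(name, None)


OBJECT_STORE["my-blob"] = [1, 2]
rm_blob("my-blob")

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "my_blob.pickle"
    file_path.write_bytes(b"x")
    rm_blob("my-file-blob", path=file_path)
    file_removed = not file_path.exists()
```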

to(system: str | Cluster, env: str | Env | None = None, path: str | None = None, data_config: dict | None = None)

Return a copy of the blob on the destination system and, optionally, at the given path.

Example

>>> local_blob = rh.blob(data)
>>> s3_blob = local_blob.to("s3")
>>> cluster_blob = local_blob.to(my_cluster)
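`to()` returns a copy and leaves the original in place, keeping the blob's name unless a new path is given. The sketch below mimics that on the local disk, with one directory standing in for the destination system. `copy_blob` is hypothetical, and `path` is assumed to be relative to the destination directory.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def copy_blob(src: Path, dest_dir: Path, path: Optional[str] = None) -> Path:
    """Hypothetical local analogue of Blob.to(): copy the blob's file into
    dest_dir, keeping its file name unless a relative `path` is given."""
    dest = dest_dir / (path if path is not None else src.name)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest  # the source file is left untouched


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "local" / "my_blob.pickle"
    src.parent.mkdir()
    src.write_bytes(b"payload")
    copy = copy_blob(src, root / "remote", path="runhouse-tests/my_blob.pickle")
    copied_ok = copy.read_bytes() == b"payload" and src.exists()
    copy_rel = copy.relative_to(root).as_posix()
```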

write(data)

Save the underlying blob to its cluster’s store.

Example

>>> blob = rh.blob(data)
>>> blob.write(data)
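Writing saves the blob's data into the object store of the cluster it belongs to. The toy model below uses a dict-backed store keyed by blob name. `ToyObjectStore` and `ToyClusterBlob` are hypothetical, and the sketch's choice that a second write replaces the first is an assumption rather than documented Runhouse behavior.

```python
from typing import Any, Dict


class ToyObjectStore:
    """Hypothetical stand-in for a cluster's object store (a plain dict)."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._objects[key] = value

    def get(self, key: str) -> Any:
        return self._objects[key]


class ToyClusterBlob:
    """Hypothetical blob bound to one cluster's store; not Runhouse's class."""

    def __init__(self, name: str, store: ToyObjectStore) -> None:
        self.name = name
        self.store = store
        self.data: Any = None

    def write(self, data: Any) -> None:
        # Keep the data on the blob and save it under the blob's name,
        # overwriting any previously stored value.
        self.data = data
        self.store.put(self.name, data)


store = ToyObjectStore()
blob = ToyClusterBlob("my-blob", store)
blob.write(list(range(3)))
blob.write({"step": 2})  # replaces the first write in this sketch
```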