API Reference¶
adlfs.AzureBlobFileSystem provides an interface for Azure Blob Storage.
- class adlfs.AzureBlobFileSystem(*args, **kwargs)¶
Bases:
AsyncFileSystem
Access Azure Datalake Gen2 and Azure Storage as if it were a file system, using Multiprotocol Access
- Parameters:
- account_name: str
The storage account name. This is used to authenticate requests signed with an account key and to construct the storage endpoint. It is required unless a connection string is given, or if a custom domain is used with anonymous authentication.
- account_key: str
The storage account key. This is used for shared key authentication. If none of account key, sas token or client_id is specified, anonymous access will be used.
- sas_token: str
A shared access signature token to use to authenticate requests instead of the account key. If account key and sas token are both specified, account key will be used to sign. If none of account key, sas token or client_id are specified, anonymous access will be used.
- request_session: Session
The session object to use for http requests.
- connection_string: str
If specified, this will override all other parameters besides request session. See http://azure.microsoft.com/en-us/documentation/articles/storage-configure-connection-string/ for the connection string format.
- credential: azure.core.credentials_async.AsyncTokenCredential or SAS token
The credentials with which to authenticate. Optional if the account URL already has a SAS token. Can include an instance of TokenCredential class from azure.identity.aio.
- blocksize: int
The block size to use for download/upload operations. Defaults to the hardcoded value of BlockBlobService.MAX_BLOCK_SIZE.
- client_id: str
Client ID to use when authenticating using an AD Service Principal client/secret.
- client_secret: str
Client secret to use when authenticating using an AD Service Principal client/secret.
- tenant_id: str
Tenant ID to use when authenticating using an AD Service Principal client/secret.
- anon: boolean, optional
The value to use for whether to attempt anonymous access if no other credential is passed. By default (None), the AZURE_STORAGE_ANON environment variable is checked. False values (false, 0, f) will resolve to False and anonymous access will not be attempted. Otherwise the value for anon resolves to True. (See the instantiation sketch after this parameter list.)
- default_fill_cache: bool = True
Whether to use cache filling with open by default
- default_cache_type: string (‘bytes’)
If given, the default cache_type value used for “open()”. Set to None if no caching is desired. Documented in fsspec.
- version_aware: bool (False)
Whether to support blob versioning. If enabled, this will require the user to have the necessary permissions for dealing with versioned blobs.
- assume_container_exists: Optional[bool] (None)
Set this to True to skip checking for the existence of containers entirely, assuming they exist. None (default) means to warn in case of a failure when checking for the existence of a container. False throws if retrieving container properties fails, which might happen if your authentication is only valid at the storage container level, and not the storage account level.
- max_concurrency:
The number of concurrent connections to use when uploading or downloading a blob. If None it will be inferred from fsspec.asyn._get_batch_size().
- timeout: int
Sets the server-side timeout when uploading or downloading a blob.
- connection_timeout: int
The number of seconds the client will wait to establish a connection to the server when uploading or downloading a blob.
- read_timeout: int
The number of seconds the client will wait, between consecutive read operations, for a response from the server while uploading or downloading a blob.
- account_host: str
The storage account host. This string is the entire URL for the storage account after the https://, i.e. “https://{account_host}”. This parameter is only required for Azure clouds where account urls do not end with “blob.core.windows.net”. Note that the account_name parameter is still required.
- Pass on to fsspec:
- skip_instance_cache: to control reuse of instances
- use_listings_cache, listings_expiry_time, max_paths: to control reuse of directory listings
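As a minimal instantiation sketch (the connection string and account name below are placeholders, not working values), the filesystem can be built from a connection string, or with anonymous access requested explicitly:
>>> abfs = AzureBlobFileSystem(connection_string=CONNECTION_STRING)
>>> abfs = AzureBlobFileSystem(account_name="myaccount", anon=True)  # public data, no credential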
- Attributes:
- fsid
Persistent filesystem id that can be used to compare filesystems across sessions.
- loop
- transaction
A context within which files are committed together upon exit
Methods
- cat(path[, recursive, on_error]): Fetch (potentially multiple) paths' contents
- cat_file(path[, start, end]): Get the content of a file
- cat_ranges(paths, starts, ends[, max_gap, ...]): Get the contents of byte ranges from one or more files
- checksum(path): Unique value for current version of file
- clear_instance_cache(): Clear the cache of filesystem instances
- copy(path1, path2[, recursive, maxdepth, ...]): Copy within two locations in the filesystem
- cp(path1, path2, **kwargs): Alias of AbstractFileSystem.copy
- cp_file(path1, path2, **kwargs): Copy the file at path1 to path2
- created(path): Return the created timestamp of a file as a datetime.datetime
- current(): Return the most recently instantiated FileSystem
- delete(path[, recursive, maxdepth]): Alias of AbstractFileSystem.rm
- disk_usage(path[, total, maxdepth]): Alias of AbstractFileSystem.du
- do_connect(): Connect to the BlobServiceClient, using user-specified connection details
- download(rpath, lpath[, recursive]): Alias of FilesystemSpec.get
- du(path[, total, maxdepth, withdirs]): Space used by files and optionally directories within a path
- end_transaction(): Finish write transaction, non-context version
- exists(path): Is there a file at the given path
- expand_path(path[, recursive, maxdepth, ...]): Turn one or more globs or directories into a list of all matching paths to files or directories
- find(path[, maxdepth, withdirs, detail]): List all files below path
- from_dict(dct): Recreate a filesystem instance from dictionary representation
- from_json(blob): Recreate a filesystem instance from JSON representation
- get(rpath, lpath[, recursive, callback, ...]): Copy file(s) to local
- get_file(rpath, lpath[, recursive, ...]): Copy single file remote to local
- get_mapper([root, check, create, ...]): Create key/value store based on this file-system
- glob(path[, maxdepth]): Find files by glob-matching
- head(path[, size]): Get the first size bytes from file
- info(path, **kwargs): Give details of entry at path
- invalidate_cache([path]): Discard any cached directory information
- isdir(path): Is this entry directory-like?
- isfile(path): Is this entry file-like?
- lexists(path, **kwargs): If there is a file at the given path (including broken links)
- listdir(path[, detail]): Alias of AbstractFileSystem.ls
- ls(path[, detail]): List objects at path
- makedir(path[, exist_ok]): Create directory entry at path
- makedirs(path[, exist_ok]): Recursively make directories
- mkdir(path[, create_parents, delimiter]): Mkdir is a no-op for creating anything except top-level containers
- mkdirs(path[, exist_ok]): Alias of AbstractFileSystem.makedirs
- modified(path): Return the modified timestamp of a file as a datetime.datetime
- move(path1, path2, **kwargs): Alias of AbstractFileSystem.mv
- mv(path1, path2[, recursive, maxdepth]): Move file(s) from one location to another
- open(path[, mode, block_size, ...]): Return a file-like object from the filesystem
- pipe(path[, value]): Put value into path
- pipe_file(path, value[, overwrite, ...]): Set the bytes of given file
- put(lpath, rpath[, recursive, callback, ...]): Copy file(s) from local
- put_file(lpath, rpath[, delimiter, ...]): Copy single file to remote
- read_block(fn, offset, length[, delimiter]): Read a block of bytes from a file
- read_bytes(path[, start, end]): Alias of AbstractFileSystem.cat_file
- read_text(path[, encoding, errors, newline]): Get the contents of the file as a string
- rename(path1, path2, **kwargs): Alias of AbstractFileSystem.mv
- rm(path[, recursive, maxdepth, delimiter, ...]): Delete files
- rm_file(path): Delete a file
- rmdir(path[, delimiter]): Remove a directory, if empty
- sign(path[, expiration]): Create a signed URL representing the given path
- size(path): Size in bytes of file
- sizes(paths): Size in bytes of each file in a list of paths
- split_path(path[, delimiter, return_container]): Normalize ABFS path string into bucket and key
- start_transaction(): Begin write transaction for deferring files, non-context version
- stat(path, **kwargs): Alias of AbstractFileSystem.info
- tail(path[, size]): Get the last size bytes from file
- to_dict(*[, include_password]): JSON-serializable dictionary representation of this filesystem instance
- to_json(*[, include_password]): JSON representation of this filesystem instance
- touch(path[, truncate]): Create empty file, or update timestamp
- transaction_type: alias of Transaction
- ukey(path): Hash of file properties, to tell if it has changed
- unstrip_protocol(name): Format FS-specific path to generic, including protocol
- upload(lpath, rpath[, recursive]): Alias of FilesystemSpec.put
- walk(path[, maxdepth, topdown, on_error]): Return all files below path
- write_bytes(path, value, **kwargs): Alias of AbstractFileSystem.pipe_file
- write_text(path, value[, encoding, errors, ...]): Write the text to the given file
- getxattr
- open_async
- setxattrs
- url
Examples
Authentication with an account_key
>>> abfs = AzureBlobFileSystem(account_name="XXXX", account_key="XXXX")
>>> abfs.ls('')
Authentication with an Azure ServicePrincipal
>>> abfs = AzureBlobFileSystem(account_name="XXXX", tenant_id=TENANT_ID,
...                            client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
>>> abfs.ls('')
Authentication with DefaultAzureCredential
>>> abfs = AzureBlobFileSystem(account_name="XXXX", anon=False)
>>> abfs.ls('')
Read files with Dask as:
>>> import dask.dataframe as dd
>>> ddf = dd.read_csv('abfs://container_name/folder/*.csv', storage_options={
...     'account_name': ACCOUNT_NAME, 'tenant_id': TENANT_ID, 'client_id': CLIENT_ID,
...     'client_secret': CLIENT_SECRET})
Sharded Parquet & csv files can be read as:
>>> ddf = dd.read_csv('abfs://container_name/folder/*.csv', storage_options={
...     'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY})
>>> ddf = dd.read_parquet('abfs://container_name/folder.parquet', storage_options={
...     'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY})
- cat(path, recursive=False, on_error='raise', **kwargs)¶
Fetch (potentially multiple) paths’ contents. Returns a dict of {path: contents} if there are multiple paths or the path has been otherwise expanded.
on_error : “raise”, “omit”, “return”
If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exception will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
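For illustration only (container, blob names and output are hypothetical), fetching every blob under a prefix while skipping unreadable ones might look like:
>>> abfs.cat("container/folder", recursive=True, on_error="omit")
{'container/folder/a.csv': b'...', 'container/folder/b.csv': b'...'}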
- cp_file(path1, path2, **kwargs)¶
Copy the file at path1 to path2
- created(path: str) datetime ¶
Return the created timestamp of a file as a datetime.datetime
- do_connect()¶
Connect to the BlobServiceClient, using user-specified connection details. Tries credentials first, then connection string and finally account key
- Raises:
- ValueError if none of the connection details are available
- download(rpath, lpath, recursive=False, **kwargs)¶
Alias of FilesystemSpec.get.
- exists(path)¶
Is there a file at the given path
- expand_path(path, recursive=False, maxdepth=None, skip_noexist=True)¶
Turn one or more globs or directories into a list of all matching paths to files or directories.
kwargs are passed to glob or find, which may in turn call ls.
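A hypothetical illustration of glob expansion (paths and output are made up):
>>> abfs.expand_path("container/folder/*.csv")
['container/folder/a.csv', 'container/folder/b.csv']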
- get_file(rpath, lpath, recursive=False, delimiter='/', callback=None, max_concurrency=None, **kwargs)¶
Copy single file remote to local
- invalidate_cache(path=None)¶
Discard any cached directory information
- Parameters:
- path: string or None
If None, clear all listings cached else listings at or under given path.
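For example (the path is hypothetical):
>>> abfs.invalidate_cache("container/folder")  # drop listings cached at or under this path
>>> abfs.invalidate_cache()                    # drop all cached listings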
- isdir(path)¶
Is this entry directory-like?
- isfile(path)¶
Is this entry file-like?
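A short sketch against a hypothetical container layout:
>>> abfs.isdir("container/folder")
True
>>> abfs.isfile("container/folder/data.csv")
True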
- makedir(path, exist_ok=False)¶
Create directory entry at path
- Parameters:
- path: str
The path to create
- delimiter: str
Delimiter to use when splitting the path
- exist_ok: bool
If False (default), raise an error if the directory already exists.
- mkdir(path, create_parents=True, delimiter='/', **kwargs)¶
Mkdir is a no-op for creating anything except top-level containers. This aligns with the Azure Blob Filesystem flat hierarchy
- Parameters:
- path: str
The path to create
- create_parents: bool
If True (default), create the Azure Container if it does not exist
- delimiter: str
Delimiter to use when splitting the path
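Since anything below the container level is a no-op, a sketch (container name hypothetical) looks like:
>>> abfs.mkdir("new-container")             # creates the container if it does not already exist
>>> abfs.mkdir("new-container/sub/folder")  # no-op below the top-level container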
- modified(path: str) datetime ¶
Return the modified timestamp of a file as a datetime.datetime
- pipe_file(path, value, overwrite=True, max_concurrency=None, **kwargs)¶
Set the bytes of given file
- put_file(lpath, rpath, delimiter='/', overwrite=True, callback=None, max_concurrency=None, **kwargs)¶
Copy single file to remote
- Parameters:
lpath – Path to local file
rpath – Path to remote file
delimiter – Filepath delimiter
overwrite – Boolean (True). Whether to overwrite any existing file (True) or raise if one already exists (False).
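A hedged sketch of copying a single file in each direction (local and remote paths are hypothetical):
>>> abfs.put_file("local/data.csv", "container/folder/data.csv", overwrite=True)
>>> abfs.get_file("container/folder/data.csv", "local/data_copy.csv")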
- rm(path: str | List[str], recursive: bool = False, maxdepth: int | None = None, delimiter: str = '/', expand_path: bool = True, **kwargs)¶
Delete files.
- Parameters:
- path: str or list of str
File(s) to delete.
- recursive: bool
Defaults to False. If file(s) are directories, recursively delete contents and then also remove the directory. Only used if expand_path.
- maxdepth: int or None
Defaults to None. Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible. Only used if expand_path.
- expand_path: bool
Defaults to True. If False, the self._expand_path call will be skipped. This is more efficient when path expansion is not needed.
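For example, removing a whole pseudo-directory (path hypothetical) can be sketched as:
>>> abfs.rm("container/old-folder", recursive=True)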
- rmdir(path: str, delimiter='/', **kwargs)¶
Remove a directory, if empty
- sign(path, expiration=100, **kwargs)¶
Create a signed URL representing the given path.
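A sketch for a hypothetical blob, assuming expiration is given in seconds as in the fsspec sign API:
>>> url = abfs.sign("container/folder/data.csv", expiration=3600)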
- size(path)¶
Size in bytes of file
- split_path(path, delimiter='/', return_container: bool = False, **kwargs) Tuple[str, str, str | None] ¶
Normalize ABFS path string into bucket and key.
- Parameters:
- path: string
Input path, like abfs://my_container/path/to/file
- delimiter: string
Delimiter used to split the path
- return_container: bool
Examples
>>> split_path("abfs://my_container/path/to/file") ['my_container', 'path/to/file']
>>> split_path("abfs://my_container/path/to/versioned_file?versionid=some_version_id") ['my_container', 'path/to/versioned_file', 'some_version_id']
- upload(lpath, rpath, recursive=False, **kwargs)¶
Alias of FilesystemSpec.put.
- class adlfs.AzureBlobFile(fs: AzureBlobFileSystem, path: str, mode: str = 'rb', block_size='default', autocommit: bool = True, cache_type: str = 'bytes', cache_options: dict = {}, metadata=None, version_id: str | None = None, **kwargs)¶
Bases:
AbstractBufferedFile
File-like operations on Azure Blobs
- Attributes:
- closed
- details
- full_name
Methods
- close(): Close file and azure client
- commit(): Move from temp to final destination
- connect_client(): Connect to the Asynchronous BlobServiceClient, using user-specified connection details
- discard(): Throw away temporary file
- fileno(/): Return underlying file descriptor if one exists
- flush([force]): Write buffered data to backend store
- info(): File information about this path
- isatty(/): Return whether this is an 'interactive' stream
- read([length]): Return data from cache, or fetch pieces as necessary
- readable(): Whether opened for reading
- readinto(b): Mirrors builtin file's readinto method
- readline(): Read until first occurrence of newline character
- readlines(): Return all data, split by the newline character
- readuntil([char, blocks]): Return data between current position and first occurrence of char
- seek(loc[, whence]): Set current file location
- seekable(): Whether is seekable (only in read mode)
- tell(): Current file location
- truncate([size]): Truncate file to size bytes
- writable(): Whether opened for writing
- write(data): Write data to buffer
- writelines(lines, /): Write a list of lines to stream
- readinto1
- close()¶
Close file and azure client.
- connect_client()¶
Connect to the Asynchronous BlobServiceClient, using user-specified connection details. Tries credentials first, then connection string and finally account key
- Raises:
- ValueError if none of the connection details are available
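AzureBlobFile instances are normally obtained through AzureBlobFileSystem.open rather than constructed directly. A minimal sketch (account and paths hypothetical):
>>> abfs = AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)
>>> with abfs.open("container/folder/data.csv", "rb") as f:
...     header = f.readline()
>>> with abfs.open("container/folder/new.txt", "wb") as f:
...     f.write(b"hello")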