API Reference#

adlfs.AzureBlobFileSystem provides an interface for Azure Blob Storage.

class adlfs.AzureBlobFileSystem(*args, **kwargs)#

Bases: AsyncFileSystem

Access Azure Data Lake Gen2 and Azure Storage as if it were a file system, using Multiprotocol Access

Parameters:
account_name: str

The storage account name. This is used to authenticate requests signed with an account key and to construct the storage endpoint. It is required unless a connection string is given, or if a custom domain is used with anonymous authentication.

account_key: str

The storage account key. This is used for shared key authentication. If any of account key, sas token or client_id is specified, anonymous access will not be used.

sas_token: str

A shared access signature token to use to authenticate requests instead of the account key. If account key and sas token are both specified, account key will be used to sign. If any of account key, sas token or client_id are specified, anonymous access will not be used.

request_session: Session

The session object to use for http requests.

connection_string: str

If specified, this will override all other parameters besides request session. See http://azure.microsoft.com/en-us/documentation/articles/storage-configure-connection-string/ for the connection string format.

credential: azure.core.credentials_async.AsyncTokenCredential or SAS token

The credentials with which to authenticate. Optional if the account URL already has a SAS token. Can include an instance of TokenCredential class from azure.identity.aio.

blocksize: int

The block size to use for download/upload operations. Defaults to the hardcoded value of BlockBlobService.MAX_BLOCK_SIZE.

client_id: str

Client ID to use when authenticating using an AD Service Principal client/secret.

client_secret: str

Client secret to use when authenticating using an AD Service Principal client/secret.

tenant_id: str

Tenant ID to use when authenticating using an AD Service Principal client/secret.

anon: boolean, optional

Whether to attempt anonymous access if no other credential is passed. By default (None), the AZURE_STORAGE_ANON environment variable is checked. False values ("false", "0", "f") resolve to False and anonymous access will not be attempted; otherwise anon resolves to True.

default_fill_cache: bool = True

Whether to use cache filling with open by default

default_cache_type: string (‘bytes’)

If given, the default cache_type value used for open(). Set to "none" if no caching is desired. See the fsspec documentation for details.

version_aware: bool (False)

Whether to support blob versioning. If enabled, the user will need the permissions required for dealing with versioned blobs.

assume_container_exists: Optional[bool] (None)

Set this to True to not check for the existence of containers at all, assuming they exist. None (default) means to warn in case of a failure when checking for the existence of a container. False raises an error if retrieving container properties fails, which might happen if your authentication is only valid at the storage container level and not the storage account level.

max_concurrency: int, optional

The number of concurrent connections to use when uploading or downloading a blob. If None it will be inferred from fsspec.asyn._get_batch_size().

timeout: int

Sets the server-side timeout when uploading or downloading a blob.

connection_timeout: int

The number of seconds the client will wait to establish a connection to the server when uploading or downloading a blob.

read_timeout: int

The number of seconds the client will wait, between consecutive read operations, for a response from the server while uploading or downloading a blob.

Passed on to fsspec:
skip_instance_cache: to control reuse of instances
use_listings_cache, listings_expiry_time, max_paths: to control reuse of directory listings

Examples

Authentication with an account_key

>>> abfs = AzureBlobFileSystem(account_name="XXXX", account_key="XXXX")
>>> abfs.ls('')

Authentication with an Azure ServicePrincipal

>>> abfs = AzureBlobFileSystem(account_name="XXXX", tenant_id=TENANT_ID,
...                            client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
>>> abfs.ls('')

Authentication with DefaultAzureCredential

>>> abfs = AzureBlobFileSystem(account_name="XXXX", anon=False)
>>> abfs.ls('')
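
Authentication with a connection string (a sketch; CONNECTION_STRING stands in for a full Azure Storage connection string)

>>> abfs = AzureBlobFileSystem(connection_string=CONNECTION_STRING)
>>> abfs.ls('')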

Read files as

>>> ddf = dd.read_csv('abfs://container_name/folder/*.csv', storage_options={
...     'account_name': ACCOUNT_NAME, 'tenant_id': TENANT_ID, 'client_id': CLIENT_ID,
...     'client_secret': CLIENT_SECRET})

Sharded Parquet and CSV files can be read as:

>>> ddf = dd.read_csv('abfs://container_name/folder/*.csv', storage_options={
...                   'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY})
>>> ddf = dd.read_parquet('abfs://container_name/folder.parquet', storage_options={
...                       'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY})

Attributes:
fsid

Persistent filesystem id that can be used to compare filesystems across sessions.

loop
transaction

A context within which files are committed together upon exit

Methods

cat(path[, recursive, on_error])

Fetch (potentially multiple) paths' contents. Returns a dict of {path: contents} if there are multiple paths or the path has been otherwise expanded. on_error: "raise", "omit", "return". If "raise", an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if "omit", keys with exceptions will simply not be included in the output; if "return", all keys are included in the output, but the value will be bytes or an exception instance.

cat_file(path[, start, end])

Get the content of a file

cat_ranges(paths, starts, ends[, max_gap, ...])

Get the contents of byte ranges from one or more files

checksum(path)

Unique value for current version of file

clear_instance_cache()

Clear the cache of filesystem instances.

copy(path1, path2[, recursive, maxdepth, ...])

Copy within two locations in the filesystem

cp(path1, path2, **kwargs)

Alias of AbstractFileSystem.copy.

cp_file(path1, path2, **kwargs)

Copy the file at path1 to path2

created(path)

Return the created timestamp of a file as a datetime.datetime

current()

Return the most recently instantiated FileSystem

delete(path[, recursive, maxdepth])

Alias of AbstractFileSystem.rm.

disk_usage(path[, total, maxdepth])

Alias of AbstractFileSystem.du.

do_connect()

Connect to the BlobServiceClient, using user-specified connection details.

download(rpath, lpath[, recursive])

Alias of FilesystemSpec.get.

du(path[, total, maxdepth, withdirs])

Space used by files and optionally directories within a path

end_transaction()

Finish write transaction, non-context version

exists(path)

Is there a file at the given path

expand_path(path[, recursive, maxdepth, ...])

Turn one or more globs or directories into a list of all matching paths to files or directories.

find(path[, maxdepth, withdirs, detail])

List all files below path.

from_json(blob)

Recreate a filesystem instance from JSON representation

get(rpath, lpath[, recursive, callback, ...])

Copy file(s) to local.

get_file(rpath, lpath[, recursive, ...])

Copy single file remote to local

get_mapper([root, check, create, ...])

Create key/value store based on this file-system

glob(path[, maxdepth])

Find files by glob-matching.

head(path[, size])

Get the first size bytes from file

info(path, **kwargs)

Give details of entry at path

invalidate_cache([path])

Discard any cached directory information

isdir(path)

Is this entry directory-like?

isfile(path)

Is this entry file-like?

lexists(path, **kwargs)

If there is a file at the given path (including broken links)

listdir(path[, detail])

Alias of AbstractFileSystem.ls.

ls(path[, detail])

List objects at path.

makedir(path[, exist_ok])

Create directory entry at path

makedirs(path[, exist_ok])

Recursively make directories

mkdir(path[, create_parents, delimiter])

Mkdir is a no-op for creating anything except top-level containers.

mkdirs(path[, exist_ok])

Alias of AbstractFileSystem.makedirs.

modified(path)

Return the modified timestamp of a file as a datetime.datetime

move(path1, path2, **kwargs)

Alias of AbstractFileSystem.mv.

mv(path1, path2[, recursive, maxdepth])

Move file(s) from one location to another

open(path[, mode, block_size, ...])

Return a file-like object from the filesystem

pipe(path[, value])

Put value into path

pipe_file(path, value[, overwrite, ...])

Set the bytes of given file

put(lpath, rpath[, recursive, callback, ...])

Copy file(s) from local.

put_file(lpath, rpath[, delimiter, ...])

Copy single file to remote

read_block(fn, offset, length[, delimiter])

Read a block of bytes from a file

read_bytes(path[, start, end])

Alias of AbstractFileSystem.cat_file.

read_text(path[, encoding, errors, newline])

Get the contents of the file as a string.

rename(path1, path2, **kwargs)

Alias of AbstractFileSystem.mv.

rm(path[, recursive, maxdepth, delimiter, ...])

Delete files.

rm_file(path)

Delete a file

rmdir(path[, delimiter])

Remove a directory, if empty

sign(path[, expiration])

Create a signed URL representing the given path.

size(path)

Size in bytes of file

sizes(paths)

Size in bytes of each file in a list of paths

split_path(path[, delimiter, return_container])

Normalize ABFS path string into bucket and key.

start_transaction()

Begin write transaction for deferring files, non-context version

stat(path, **kwargs)

Alias of AbstractFileSystem.info.

tail(path[, size])

Get the last size bytes from file

to_json()

JSON representation of this filesystem instance

touch(path[, truncate])

Create empty file, or update timestamp

transaction_type

alias of Transaction

ukey(path)

Hash of file properties, to tell if it has changed

unstrip_protocol(name)

Format FS-specific path to generic, including protocol

upload(lpath, rpath[, recursive])

Alias of FilesystemSpec.put.

walk(path[, maxdepth, topdown, on_error])

Return all files below path

write_bytes(path, value, **kwargs)

Alias of AbstractFileSystem.pipe_file.

write_text(path, value[, encoding, errors, ...])

Write the text to the given file.

getxattr

open_async

setxattrs

url

cat(path, recursive=False, on_error='raise', **kwargs)#

Fetch (potentially multiple) paths' contents. Returns a dict of {path: contents} if there are multiple paths or the path has been otherwise expanded.

on_error: "raise", "omit", "return"

If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exception will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
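
For example (a sketch, assuming abfs is an AzureBlobFileSystem instance; the container and blob names are placeholders):

>>> abfs.cat("container/folder/file.txt")             # single path, returns bytes
>>> abfs.cat(["container/a.txt", "container/b.txt"],  # multiple paths, returns a dict
...          on_error="omit")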

cp_file(path1, path2, **kwargs)#

Copy the file at path1 to path2

created(path: str) → datetime#

Return the created timestamp of a file as a datetime.datetime

do_connect()#

Connect to the BlobServiceClient, using user-specified connection details. Tries credentials first, then connection string, and finally account key.

Raises:
ValueError if none of the connection details are available

download(rpath, lpath, recursive=False, **kwargs)#

Alias of FilesystemSpec.get.

exists(path)#

Is there a file at the given path

expand_path(path, recursive=False, maxdepth=None, skip_noexist=True)#

Turn one or more globs or directories into a list of all matching paths to files or directories.

kwargs are passed to glob or find, which may in turn call ls
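
For example (a sketch; abfs is an AzureBlobFileSystem instance and the container and glob pattern are placeholders):

>>> abfs.expand_path("container/folder/*.csv")            # expand a glob into matching paths
>>> abfs.expand_path("container/folder", recursive=True)  # all paths under a directory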

get_file(rpath, lpath, recursive=False, delimiter='/', callback=None, max_concurrency=None, **kwargs)#

Copy single file remote to local
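
For example (a sketch; abfs is an AzureBlobFileSystem instance and the paths are placeholders):

>>> abfs.get_file("container/folder/data.csv", "/tmp/data.csv")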

invalidate_cache(path=None)#

Discard any cached directory information

Parameters:
path: string or None

If None, clear all listings cached else listings at or under given path.
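
For example (a sketch; abfs is an AzureBlobFileSystem instance and the path is a placeholder):

>>> abfs.invalidate_cache("container/folder")   # drop cached listings at or under this path
>>> abfs.invalidate_cache()                     # drop all cached listings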

isdir(path)#

Is this entry directory-like?

isfile(path)#

Is this entry file-like?

makedir(path, exist_ok=False)#

Create directory entry at path

Parameters:
path: str

The path to create

delimiter: str

Delimiter to use when splitting the path

exist_ok: bool

If False (default), raise an error if the directory already exists.

mkdir(path, create_parents=True, delimiter='/', **kwargs)#

Mkdir is a no-op for creating anything except top-level containers. This aligns with the flat hierarchy of Azure Blob Storage.

Parameters:
path: str

The path to create

create_parents: bool

If True (default), create the Azure Container if it does not exist

delimiter: str

Delimiter to use when splitting the path
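
For example (a sketch; abfs is an AzureBlobFileSystem instance and the container name is a placeholder):

>>> abfs.mkdir("new-container")   # creates the top-level container if it does not exist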

modified(path: str) → datetime#

Return the modified timestamp of a file as a datetime.datetime

pipe_file(path, value, overwrite=True, max_concurrency=None, **kwargs)#

Set the bytes of given file
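
For example (a sketch; abfs is an AzureBlobFileSystem instance and the path is a placeholder):

>>> abfs.pipe_file("container/folder/hello.txt", b"hello, world")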

put_file(lpath, rpath, delimiter='/', overwrite=True, callback=None, max_concurrency=None, **kwargs)#

Copy single file to remote

Parameters:
  • lpath – Path to local file

  • rpath – Path to remote file

  • delimiter – Filepath delimiter

  • overwrite – Boolean (True). Whether to overwrite any existing file (True) or raise if one already exists (False).
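
For example (a sketch; abfs is an AzureBlobFileSystem instance and the paths are placeholders):

>>> abfs.put_file("/tmp/local.csv", "container/folder/local.csv", overwrite=True)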

rm(path: str | List[str], recursive: bool = False, maxdepth: int | None = None, delimiter: str = '/', expand_path: bool = True, **kwargs)#

Delete files.

Parameters:
path: str or list of str

File(s) to delete.

recursive: bool

Defaults to False. If file(s) are directories, recursively delete contents and then also remove the directory. Only used if expand_path is True.

maxdepth: int or None

Defaults to None. Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible. Only used if expand_path is True.

expand_path: bool

Defaults to True. If False, the self._expand_path call will be skipped. This is more efficient when path expansion is not needed.
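
For example (a sketch; abfs is an AzureBlobFileSystem instance and the paths are placeholders):

>>> abfs.rm("container/folder/file.txt")          # delete a single blob
>>> abfs.rm("container/folder", recursive=True)   # delete a directory and its contents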

rmdir(path: str, delimiter='/', **kwargs)#

Remove a directory, if empty

sign(path, expiration=100, **kwargs)#

Create a signed URL representing the given path.
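
For example (a sketch; abfs is an AzureBlobFileSystem instance, the path is a placeholder, and expiration is assumed to be in seconds):

>>> url = abfs.sign("container/folder/file.txt", expiration=3600)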

size(path)#

Size in bytes of file

split_path(path, delimiter='/', return_container: bool = False, **kwargs) → Tuple[str, str, str | None]#

Normalize ABFS path string into bucket and key.

Parameters:
path: string

Input path, like abfs://my_container/path/to/file

delimiter: string

Delimiter used to split the path

return_container: bool

Examples

>>> split_path("abfs://my_container/path/to/file")
['my_container', 'path/to/file']
>>> split_path("abfs://my_container/path/to/versioned_file?versionid=some_version_id")
['my_container', 'path/to/versioned_file', 'some_version_id']

upload(lpath, rpath, recursive=False, **kwargs)#

Alias of FilesystemSpec.put.

class adlfs.AzureBlobFile(fs: AzureBlobFileSystem, path: str, mode: str = 'rb', block_size='default', autocommit: bool = True, cache_type: str = 'bytes', cache_options: dict = {}, metadata=None, version_id: str | None = None, **kwargs)#

Bases: AbstractBufferedFile

File-like operations on Azure Blobs
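
AzureBlobFile objects are normally obtained from AzureBlobFileSystem.open() rather than constructed directly. For example (a sketch; abfs is an AzureBlobFileSystem instance and the path is a placeholder):

>>> with abfs.open("container/folder/file.txt", "rb") as f:
...     data = f.read()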

Attributes:
closed
details
full_name

Methods

close()

Close file and azure client.

commit()

Move from temp to final destination

connect_client()

Connect to the Asynchronous BlobServiceClient, using user-specified connection details.

discard()

Throw away temporary file

fileno(/)

Return underlying file descriptor if one exists.

flush([force])

Write buffered data to backend store.

info()

File information about this path

isatty(/)

Return whether this is an 'interactive' stream.

read([length])

Return data from cache, or fetch pieces as necessary

readable()

Whether opened for reading

readinto(b)

Mirrors the builtin file's readinto method

readline()

Read until first occurrence of newline character

readlines()

Return all data, split by the newline character

readuntil([char, blocks])

Return data between current position and first occurrence of char

seek(loc[, whence])

Set current file location

seekable()

Whether is seekable (only in read mode)

tell()

Current file location

truncate([size])

Truncate file to size bytes.

writable()

Whether opened for writing

write(data)

Write data to buffer.

writelines(lines, /)

Write a list of lines to stream.

readinto1

close()#

Close file and azure client.

connect_client()#

Connect to the Asynchronous BlobServiceClient, using user-specified connection details. Tries credentials first, then connection string, and finally account key.

Raises:
ValueError if none of the connection details are available