API Reference¶
adlfs.AzureBlobFileSystem provides an interface for Azure Blob Storage.
- class adlfs.AzureBlobFileSystem(*args, **kwargs)¶
Bases:
AsyncFileSystem
Access Azure Datalake Gen2 and Azure Storage as if it were a file system, using Multiprotocol Access
- Parameters:
- account_name: str
The storage account name. This is used to authenticate requests signed with an account key and to construct the storage endpoint. It is required unless a connection string is given, or if a custom domain is used with anonymous authentication.
- account_key: str
The storage account key. This is used for shared key authentication. If none of account key, sas token or client_id is specified, anonymous access will be used.
- sas_token: str
A shared access signature token to use to authenticate requests instead of the account key. If account key and sas token are both specified, account key will be used to sign. If none of account key, sas token or client_id are specified, anonymous access will be used.
- request_session: Session
The session object to use for http requests.
- connection_string: str
If specified, this will override all other parameters besides request session. See http://azure.microsoft.com/en-us/documentation/articles/storage-configure-connection-string/ for the connection string format.
- credential: azure.core.credentials_async.AsyncTokenCredential or SAS token
The credentials with which to authenticate. Optional if the account URL already has a SAS token. Can include an instance of TokenCredential class from azure.identity.aio.
- blocksize: int
The block size to use for download/upload operations. Defaults to the hardcoded value of BlockBlobService.MAX_BLOCK_SIZE.
- client_id: str
Client ID to use when authenticating using an AD Service Principal client/secret.
- client_secret: str
Client secret to use when authenticating using an AD Service Principal client/secret.
- tenant_id: str
Tenant ID to use when authenticating using an AD Service Principal client/secret.
- anon: boolean, optional
The value to use for whether to attempt anonymous access if no other credential is passed. By default (None), the AZURE_STORAGE_ANON environment variable is checked. False values (false, 0, f) will resolve to False and anonymous access will not be attempted. Otherwise the value for anon resolves to True. (See the instantiation sketch after this parameter list.)
- default_fill_cache: bool = True
Whether to use cache filling with open by default
- default_cache_type: string (‘bytes’)
If given, the default cache_type value used for “open()”. Set to None if no caching is desired. Documented in fsspec.
- version_aware: bool (False)
Whether to support blob versioning. If enabled, this will require the user to have the necessary permissions for dealing with versioned blobs.
- assume_container_exists: Optional[bool] (None)
Set this to True to skip checking for the existence of containers entirely, assuming they exist. None (default) means to warn in case of a failure when checking for the existence of a container. False throws if retrieving container properties fails, which might happen if your authentication is only valid at the storage container level, and not the storage account level.
- max_concurrency:
The number of concurrent connections to use when uploading or downloading a blob. If None it will be inferred from fsspec.asyn._get_batch_size().
- timeout: int
Sets the server-side timeout when uploading or downloading a blob.
- connection_timeout: int
The number of seconds the client will wait to establish a connection to the server when uploading or downloading a blob.
- read_timeout: int
The number of seconds the client will wait, between consecutive read operations, for a response from the server while uploading or downloading a blob.
- account_host: str
The storage account host. This string is the entire URL for the storage account after the https://, i.e. “https://{account_host}”. This parameter is only required for Azure clouds where account urls do not end with “blob.core.windows.net”. Note that the account_name parameter is still required.
- Pass on to fsspec:
- skip_instance_cache: to control reuse of instances
- use_listings_cache, listings_expiry_time, max_paths: to control reuse of directory listings
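As a minimal instantiation sketch (the connection string and account name below are placeholders, not working values), the filesystem can be built from a connection string, or with anonymous access requested explicitly:
>>> abfs = AzureBlobFileSystem(connection_string=CONNECTION_STRING)
>>> abfs = AzureBlobFileSystem(account_name="myaccount", anon=True)  # public data, no credential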
- Attributes:
- fsid
Persistent filesystem id that can be used to compare filesystems across sessions.
- loop
- transaction
A context within which files are committed together upon exit
Methods
- cat(path[, recursive, on_error]): Fetch (potentially multiple) paths' contents
- cat_file(path[, start, end]): Get the content of a file
- cat_ranges(paths, starts, ends[, max_gap, ...]): Get the contents of byte ranges from one or more files
- checksum(path): Unique value for current version of file
- clear_instance_cache(): Clear the cache of filesystem instances
- copy(path1, path2[, recursive, maxdepth, ...]): Copy within two locations in the filesystem
- cp(path1, path2, **kwargs): Alias of AbstractFileSystem.copy
- cp_file(path1, path2, **kwargs): Copy the file at path1 to path2
- created(path): Return the created timestamp of a file as a datetime.datetime
- current(): Return the most recently instantiated FileSystem
- delete(path[, recursive, maxdepth]): Alias of AbstractFileSystem.rm
- disk_usage(path[, total, maxdepth]): Alias of AbstractFileSystem.du
- do_connect(): Connect to the BlobServiceClient, using user-specified connection details
- download(rpath, lpath[, recursive]): Alias of FilesystemSpec.get
- du(path[, total, maxdepth, withdirs]): Space used by files and optionally directories within a path
- end_transaction(): Finish write transaction, non-context version
- exists(path): Is there a file at the given path
- expand_path(path[, recursive, maxdepth, ...]): Turn one or more globs or directories into a list of all matching paths to files or directories
- find(path[, maxdepth, withdirs, detail]): List all files below path
- from_dict(dct): Recreate a filesystem instance from dictionary representation
- from_json(blob): Recreate a filesystem instance from JSON representation
- get(rpath, lpath[, recursive, callback, ...]): Copy file(s) to local
- get_file(rpath, lpath[, recursive, ...]): Copy single file remote to local
- get_mapper([root, check, create, ...]): Create key/value store based on this file-system
- glob(path[, maxdepth]): Find files by glob-matching
- head(path[, size]): Get the first size bytes from file
- info(path, **kwargs): Give details of entry at path
- invalidate_cache([path]): Discard any cached directory information
- isdir(path): Is this entry directory-like?
- isfile(path): Is this entry file-like?
- lexists(path, **kwargs): If there is a file at the given path (including broken links)
- listdir(path[, detail]): Alias of AbstractFileSystem.ls
- ls(path[, detail]): List objects at path
- makedir(path[, exist_ok]): Create directory entry at path
- makedirs(path[, exist_ok]): Recursively make directories
- mkdir(path[, create_parents, delimiter]): Mkdir is a no-op for creating anything except top-level containers
- mkdirs(path[, exist_ok]): Alias of AbstractFileSystem.makedirs
- modified(path): Return the modified timestamp of a file as a datetime.datetime
- move(path1, path2, **kwargs): Alias of AbstractFileSystem.mv
- mv(path1, path2[, recursive, maxdepth]): Move file(s) from one location to another
- open(path[, mode, block_size, ...]): Return a file-like object from the filesystem
- pipe(path[, value]): Put value into path
- pipe_file(path, value[, overwrite, ...]): Set the bytes of given file
- put(lpath, rpath[, recursive, callback, ...]): Copy file(s) from local
- put_file(lpath, rpath[, delimiter, ...]): Copy single file to remote
- read_block(fn, offset, length[, delimiter]): Read a block of bytes from a file
- read_bytes(path[, start, end]): Alias of AbstractFileSystem.cat_file
- read_text(path[, encoding, errors, newline]): Get the contents of the file as a string
- rename(path1, path2, **kwargs): Alias of AbstractFileSystem.mv
- rm(path[, recursive, maxdepth, delimiter, ...]): Delete files
- rm_file(path): Delete a file
- rmdir(path[, delimiter]): Remove a directory, if empty
- sign(path[, expiration]): Create a signed URL representing the given path
- size(path): Size in bytes of file
- sizes(paths): Size in bytes of each file in a list of paths
- split_path(path[, delimiter, return_container]): Normalize ABFS path string into bucket and key
- start_transaction(): Begin write transaction for deferring files, non-context version
- stat(path, **kwargs): Alias of AbstractFileSystem.info
- tail(path[, size]): Get the last size bytes from file
- to_dict(*[, include_password]): JSON-serializable dictionary representation of this filesystem instance
- to_json(*[, include_password]): JSON representation of this filesystem instance
- touch(path[, truncate]): Create empty file, or update timestamp
- transaction_type: alias of Transaction
- ukey(path): Hash of file properties, to tell if it has changed
- unstrip_protocol(name): Format FS-specific path to generic, including protocol
- upload(lpath, rpath[, recursive]): Alias of FilesystemSpec.put
- walk(path[, maxdepth, topdown, on_error]): Return all files below path
- write_bytes(path, value, **kwargs): Alias of AbstractFileSystem.pipe_file
- write_text(path, value[, encoding, errors, ...]): Write the text to the given file
- getxattr
- open_async
- setxattrs
- url
Examples
Authentication with an account_key
>>> abfs = AzureBlobFileSystem(account_name="XXXX", account_key="XXXX")
>>> abfs.ls('')
Authentication with an Azure ServicePrincipal
>>> abfs = AzureBlobFileSystem(account_name="XXXX", tenant_id=TENANT_ID,
...                            client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
>>> abfs.ls('')
Authentication with DefaultAzureCredential
>>> abfs = AzureBlobFileSystem(account_name="XXXX", anon=False)
>>> abfs.ls('')
Read files with Dask as:
>>> import dask.dataframe as dd
>>> ddf = dd.read_csv('abfs://container_name/folder/*.csv', storage_options={
...     'account_name': ACCOUNT_NAME, 'tenant_id': TENANT_ID, 'client_id': CLIENT_ID,
...     'client_secret': CLIENT_SECRET})
Sharded Parquet & csv files can be read as:
>>> ddf = dd.read_csv('abfs://container_name/folder/*.csv', storage_options={
...     'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY})
>>> ddf = dd.read_parquet('abfs://container_name/folder.parquet', storage_options={
...     'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY})
- cat(path, recursive=False, on_error='raise', **kwargs)¶
Fetch (potentially multiple) paths’ contents. Returns a dict of {path: contents} if there are multiple paths or the path has been otherwise expanded.
on_error : “raise”, “omit”, “return”
If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exception will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
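For illustration only (container, blob names and output are hypothetical), fetching every blob under a prefix while skipping unreadable ones might look like:
>>> abfs.cat("container/folder", recursive=True, on_error="omit")
{'container/folder/a.csv': b'...', 'container/folder/b.csv': b'...'}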
- cp_file(path1, path2, **kwargs)¶
Copy the file at path1 to path2
- created(path: str) datetime ¶
Return the created timestamp of a file as a datetime.datetime
- do_connect()¶
Connect to the BlobServiceClient, using user-specified connection details. Tries credentials first, then connection string and finally account key
- Raises:
- ValueError if none of the connection details are available
- download(rpath, lpath, recursive=False, **kwargs)¶
Alias of FilesystemSpec.get.
- exists(path)¶
Is there a file at the given path
- expand_path(path, recursive=False, maxdepth=None, skip_noexist=True)¶
Turn one or more globs or directories into a list of all matching paths to files or directories.
kwargs are passed to glob or find, which may in turn call ls.
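A hypothetical illustration of glob expansion (paths and output are made up):
>>> abfs.expand_path("container/folder/*.csv")
['container/folder/a.csv', 'container/folder/b.csv']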
- get_file(rpath, lpath, recursive=False, delimiter='/', callback=None, max_concurrency=None, **kwargs)¶
Copy single file remote to local
- invalidate_cache(path=None)¶
Discard any cached directory information
- Parameters:
- path: string or None
If None, clear all listings cached else listings at or under given path.
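For example (the path is hypothetical):
>>> abfs.invalidate_cache("container/folder")  # drop listings cached at or under this path
>>> abfs.invalidate_cache()                    # drop all cached listings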
- isdir(path)¶
Is this entry directory-like?
- isfile(path)¶
Is this entry file-like?
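A short sketch against a hypothetical container layout:
>>> abfs.isdir("container/folder")
True
>>> abfs.isfile("container/folder/data.csv")
True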
- makedir(path, exist_ok=False)¶
Create directory entry at path
- Parameters:
- path: str
The path to create
- delimiter: str
Delimiter to use when splitting the path
- exist_ok: bool
If False (default), raise an error if the directory already exists.
- mkdir(path, create_parents=True, delimiter='/', **kwargs)¶
Mkdir is a no-op for creating anything except top-level containers. This aligns with the Azure Blob Filesystem flat hierarchy
- Parameters:
- path: str
The path to create
- create_parents: bool
If True (default), create the Azure Container if it does not exist
- delimiter: str
Delimiter to use when splitting the path
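Since anything below the container level is a no-op, a sketch (container name hypothetical) looks like:
>>> abfs.mkdir("new-container")             # creates the container if it does not already exist
>>> abfs.mkdir("new-container/sub/folder")  # no-op below the top-level container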
- modified(path: str) datetime ¶
Return the modified timestamp of a file as a datetime.datetime
- pipe_file(path, value, overwrite=True, max_concurrency=None, **kwargs)¶
Set the bytes of given file
- put_file(lpath, rpath, delimiter='/', overwrite=True, callback=None, max_concurrency=None, **kwargs)¶
Copy single file to remote
- Parameters:
lpath – Path to local file
rpath – Path to remote file
delimiter – Filepath delimiter
overwrite – Boolean (True). Whether to overwrite any existing file (True) or raise if one already exists (False).
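A hedged sketch of copying a single file in each direction (local and remote paths are hypothetical):
>>> abfs.put_file("local/data.csv", "container/folder/data.csv", overwrite=True)
>>> abfs.get_file("container/folder/data.csv", "local/data_copy.csv")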
- rm(path: str | List[str], recursive: bool = False, maxdepth: int | None = None, delimiter: str = '/', expand_path: bool = True, **kwargs)¶
Delete files.
- Parameters:
- path: str or list of str
File(s) to delete.
- recursive: bool
Defaults to False. If file(s) are directories, recursively delete contents and then also remove the directory. Only used if expand_path.
- maxdepth: int or None
Defaults to None. Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible. Only used if expand_path.
- expand_path: bool
Defaults to True. If False, the self._expand_path call will be skipped. This is more efficient when path expansion is not needed.
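For example, removing a whole pseudo-directory (path hypothetical) can be sketched as:
>>> abfs.rm("container/old-folder", recursive=True)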
- rmdir(path: str, delimiter='/', **kwargs)¶
Remove a directory, if empty
- sign(path, expiration=100, **kwargs)¶
Create a signed URL representing the given path.
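A sketch for a hypothetical blob, assuming expiration is given in seconds as in the fsspec sign API:
>>> url = abfs.sign("container/folder/data.csv", expiration=3600)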
- size(path)¶
Size in bytes of file
- split_path(path, delimiter='/', return_container: bool = False, **kwargs) Tuple[str, str, str | None] ¶
Normalize ABFS path string into bucket and key.
- Parameters:
- path: string
Input path, like abfs://my_container/path/to/file
- delimiter: string
Delimiter used to split the path
- return_container: bool
Examples
>>> split_path("abfs://my_container/path/to/file") ['my_container', 'path/to/file']
>>> split_path("abfs://my_container/path/to/versioned_file?versionid=some_version_id") ['my_container', 'path/to/versioned_file', 'some_version_id']
- upload(lpath, rpath, recursive=False, **kwargs)¶
Alias of FilesystemSpec.put.
- class adlfs.AzureBlobFile(fs: AzureBlobFileSystem, path: str, mode: str = 'rb', block_size='default', autocommit: bool = True, cache_type: str = 'bytes', cache_options: dict = {}, metadata=None, version_id: str | None = None, **kwargs)¶
Bases:
AbstractBufferedFile
File-like operations on Azure Blobs
- Attributes:
- closed
- details
- full_name
Methods
- close(): Close file and azure client
- commit(): Move from temp to final destination
- connect_client(): Connect to the Asynchronous BlobServiceClient, using user-specified connection details
- discard(): Throw away temporary file
- fileno(/): Return underlying file descriptor if one exists
- flush([force]): Write buffered data to backend store
- info(): File information about this path
- isatty(/): Return whether this is an 'interactive' stream
- read([length]): Return data from cache, or fetch pieces as necessary
- readable(): Whether opened for reading
- readinto(b): Mirrors builtin file's readinto method
- readline(): Read until first occurrence of newline character
- readlines(): Return all data, split by the newline character
- readuntil([char, blocks]): Return data between current position and first occurrence of char
- seek(loc[, whence]): Set current file location
- seekable(): Whether is seekable (only in read mode)
- tell(): Current file location
- truncate([size]): Truncate file to size bytes
- writable(): Whether opened for writing
- write(data): Write data to buffer
- writelines(lines, /): Write a list of lines to stream
- readinto1
- close()¶
Close file and azure client.
- connect_client()¶
Connect to the Asynchronous BlobServiceClient, using user-specified connection details. Tries credentials first, then connection string and finally account key
- Raises:
- ValueError if none of the connection details are available
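AzureBlobFile instances are normally obtained through AzureBlobFileSystem.open rather than constructed directly. A minimal sketch (account and paths hypothetical):
>>> abfs = AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)
>>> with abfs.open("container/folder/data.csv", "rb") as f:
...     header = f.readline()
>>> with abfs.open("container/folder/new.txt", "wb") as f:
...     f.write(b"hello")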