adlfs¶
adlfs provides an fsspec-compatible interface to Azure Blob storage, Azure Data Lake Storage Gen2, and Azure Data Lake Storage Gen1.
Installation¶
adlfs can be installed using pip:
pip install adlfs
or using conda from the conda-forge channel:
conda install -c conda-forge adlfs
fsspec protocols¶
adlfs registers the following protocols with fsspec.
| protocol | filesystem |
|---|---|
| abfs | AzureBlobFileSystem |
| az | AzureBlobFileSystem |
| adl | AzureDatalakeFileSystem |
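As a sketch of how these protocols appear in practice, an fsspec-style URL carries the protocol as its scheme and, for Blob Storage, the container as the first path component. This stdlib-only illustration needs no Azure access; the example URL is hypothetical:

```python
from urllib.parse import urlsplit

# An fsspec URL for Azure Blob Storage uses one of the registered
# protocols ("abfs" or "az") as its scheme.
url = "abfs://gbif/occurrence/2021-04-13"

parts = urlsplit(url)
scheme = parts.scheme            # protocol registered by adlfs
container = parts.netloc         # container name
blob_path = parts.path.lstrip("/")  # path within the container
```
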
Authentication¶
The AzureBlobFileSystem implementation uses the azure.storage.blob library internally. For the most
part, you can authenticate with Azure using any of the methods it supports.
For anonymous authentication, simply provide the storage account name:
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest")
For operations to succeed, the storage container must allow anonymous access.
For authenticated access, you have several options:
- Using a SAS token
- Using an account key
- Using a managed identity
Regardless of the method you're using, you provide the value using the credential argument.
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest", credential=SAS_TOKEN)
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest", credential=ACCOUNT_KEY)
>>> fs = adlfs.AzureBlobFileSystem(
... account_name="ai4edataeuwest",
... credential=azure.identity.DefaultAzureCredential()
... )
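The choice between these options can be driven by configuration. The helper below is purely hypothetical (not part of adlfs) and just shows one way to assemble the keyword arguments for AzureBlobFileSystem from environment variables; the variable names are assumptions, not an adlfs convention:

```python
import os

def blob_fs_kwargs(environ=os.environ):
    """Build kwargs for adlfs.AzureBlobFileSystem from the environment.

    Hypothetical helper: the environment variable names below are
    assumptions, not something adlfs itself reads.
    """
    kwargs = {"account_name": environ["AZURE_STORAGE_ACCOUNT"]}
    # Prefer an explicit secret if one is set; otherwise fall back to
    # anonymous access (which only works on public containers).
    for var in ("AZURE_STORAGE_SAS_TOKEN", "AZURE_STORAGE_KEY"):
        if environ.get(var):
            kwargs["credential"] = environ[var]
            break
    return kwargs
```

The resulting dict can be splatted into the constructor: `adlfs.AzureBlobFileSystem(**blob_fs_kwargs())`.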
Additionally, the account URL and authentication credentials may be bundled into a single connection string. To use one, provide just connection_string:
>>> fs = adlfs.AzureBlobFileSystem(connection_string=CONNECTION_STRING)
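Azure connection strings are semicolon-separated key=value pairs (e.g. AccountName=...;AccountKey=...). A stdlib-only sketch of pulling fields out of one; the real parsing happens inside azure.storage.blob, and the example values are made up:

```python
def parse_connection_string(conn_str):
    """Split an Azure-style connection string into a dict.

    Sketch only; azure.storage.blob does this internally.
    """
    parts = {}
    for segment in conn_str.split(";"):
        if segment:
            # partition on the first "=" so base64 keys with trailing
            # "=" padding survive intact
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

conn = "DefaultEndpointsProtocol=https;AccountName=ai4edataeuwest;AccountKey=abc123=="
parsed = parse_connection_string(conn)
```
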
Usage¶
See the fsspec documentation on usage.
Note that adlfs generally uses just the “name” portion of an account name. For example, you would provide
account_name="ai4edataeuwest" rather than account_name="https://ai4edataeuwest.blob.core.windows.net".
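If you start from a full blob endpoint URL, the account name is simply the first label of the hostname. A stdlib-only sketch of extracting it (hypothetical helper, not part of adlfs):

```python
from urllib.parse import urlsplit

def account_name_from_url(url):
    # "https://ai4edataeuwest.blob.core.windows.net" -> "ai4edataeuwest"
    return urlsplit(url).netloc.split(".")[0]
```
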
When working with Azure Blob Storage, the container name is included in path operations. For example,
to list all the files or directories in the top-level of a storage container, you would call fs.ls("<container_name>"):
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest")
>>> fs.ls("gbif")
['gbif/occurrence']
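The paths returned by fs.ls follow the same convention: the container name is the first component. A small stdlib sketch of splitting such a path back into container and blob path (hypothetical helper, not part of adlfs):

```python
def split_container(path):
    """Split "container/blob/name" into (container, blob_path).

    Hypothetical helper mirroring how adlfs paths embed the container.
    """
    container, _, blob = path.strip("/").partition("/")
    return container, blob
```
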
Note: when uploading a blob (with write_bytes or write_text), you can pass keyword arguments that are forwarded directly to the underlying upload_blob method:
>>> from azure.storage.blob import ContentSettings
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest")
>>> fs.write_bytes(
...     path="path",
...     value=data,
...     overwrite=True,
...     content_settings=ContentSettings(content_type="application/json", content_encoding="br"),
... )
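When picking a content_type for such uploads, the standard library's mimetypes module can guess one from the file name. This sketch uses Python's built-in extension table, nothing adlfs-specific; the fallback default is an assumption:

```python
import mimetypes

def guess_content_type(filename, default="application/octet-stream"):
    # mimetypes.guess_type returns (type, encoding); fall back to a
    # generic binary type when the extension is unknown.
    content_type, _ = mimetypes.guess_type(filename)
    return content_type or default
```
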