adlfs
adlfs provides an fsspec-compatible interface to Azure Blob Storage, Azure Data Lake Storage Gen2, and Azure Data Lake Storage Gen1.
Installation
adlfs can be installed with pip:
pip install adlfs
or with conda from the conda-forge channel:
conda install -c conda-forge adlfs
fsspec protocols
adlfs registers the following protocols with fsspec:

protocol | filesystem
---|---
abfs | AzureBlobFileSystem
az | AzureBlobFileSystem
adl | AzureDatalakeFileSystem
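When you hand fsspec a URL using one of these protocols, the container name ends up as the first path component. As a rough illustration (a hypothetical helper, not part of adlfs, which does equivalent parsing internally), the decomposition can be sketched with the standard library:

```python
from urllib.parse import urlsplit

def split_abfs_url(url: str) -> tuple[str, str]:
    """Split an abfs://, az://, or adl:// URL into (container, blob path).

    Hypothetical helper for illustration only; adlfs performs this kind
    of parsing itself when you open fsspec protocol URLs.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("abfs", "az", "adl"):
        raise ValueError(f"unexpected protocol: {parts.scheme!r}")
    # netloc is the container; the path is the blob key within it
    return parts.netloc, parts.path.lstrip("/")

print(split_abfs_url("abfs://gbif/occurrence"))
# → ('gbif', 'occurrence')
```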
Authentication
The AzureBlobFileSystem implementation uses the azure.storage.blob library internally. For the most part, you can authenticate with Azure using any of the methods that library supports.
For anonymous authentication, simply provide the storage account name:
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest")
For operations to succeed, the storage container must allow anonymous access.
For authenticated access, you have several options:

- Using a SAS token
- Using an account key
- Using a managed identity

Regardless of the method you're using, you provide the value through the credential argument.
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest", credential=SAS_TOKEN)
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest", credential=ACCOUNT_KEY)
>>> fs = adlfs.AzureBlobFileSystem(
... account_name="ai4edataeuwest",
... credential=azure.identity.DefaultAzureCredential()
... )
Additionally, the account URL and authentication credentials can be combined into a single connection string. To use this, provide just connection_string:
>>> fs = adlfs.AzureBlobFileSystem(connection_string=CONNECTION_STRING)
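An Azure Storage connection string is a semicolon-separated list of key=value pairs (AccountName, AccountKey, EndpointSuffix, and so on). To show what such a string carries, here is a hypothetical parser, a sketch assuming the standard format; azure.storage.blob does this for you when you pass connection_string:

```python
def parse_connection_string(conn_str: str) -> dict:
    """Parse an Azure Storage connection string into a dict.

    Hypothetical helper for illustration only. Account keys are
    base64 and may end in '=' padding, so split on the first '=' only.
    """
    pairs = (part.split("=", 1) for part in conn_str.split(";") if part)
    return {key: value for key, value in pairs}

conn = ("DefaultEndpointsProtocol=https;"
        "AccountName=ai4edataeuwest;"
        "AccountKey=abc123==;"
        "EndpointSuffix=core.windows.net")
print(parse_connection_string(conn)["AccountName"])  # → ai4edataeuwest
```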
Usage¶
See the fsspec documentation on usage.
Note that adlfs generally uses just the “name” portion of an account name. For example, you would provide account_name="ai4edataeuwest" rather than account_name="https://ai4edataeuwest.blob.core.windows.net".
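If you have a full blob endpoint URL on hand, the bare account name is simply the first label of its hostname. A hypothetical convenience (not part of adlfs) to extract it:

```python
from urllib.parse import urlsplit

def account_name_from_url(url: str) -> str:
    """Extract the bare storage account name from a blob endpoint URL.

    Hypothetical helper; adlfs expects just this bare name for
    account_name, not the full URL.
    """
    host = urlsplit(url).netloc
    # the account name is the first dot-separated label of the hostname
    return host.split(".", 1)[0]

print(account_name_from_url("https://ai4edataeuwest.blob.core.windows.net"))
# → ai4edataeuwest
```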
When working with Azure Blob Storage, the container name is included in path operations. For example,
to list all the files or directories in the top-level of a storage container, you would call fs.ls("<container_name>")
:
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest")
>>> fs.ls("gbif")
['gbif/occurrence']
Note: When uploading a blob (with write_bytes or write_text) you can pass keyword arguments through to the underlying upload_blob method:
>>> from azure.storage.blob import ContentSettings
>>> fs = adlfs.AzureBlobFileSystem(account_name="ai4edataeuwest")
>>> fs.write_bytes(path="path", value=data, overwrite=True, content_settings=ContentSettings(content_type="application/json", content_encoding="br"))