Python Integration
The Oceanum Python library includes a storage module that follows the fsspec specification, providing seamless integration with Oceanum Storage in your Python scripts and notebooks.
Installation
Install the Oceanum library:
pip install oceanum
Authentication
The storage module uses your Datamesh token for authentication. You can either:
- Set the DATAMESH_TOKEN environment variable:
export DATAMESH_TOKEN="your-datamesh-token"
- Or pass the token directly to functions:
from oceanum import storage
storage.ls("/", token="your-datamesh-token")
To obtain your Datamesh token, see the Token documentation.
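It can be convenient to resolve the token in one place before calling any storage function. The following is a minimal sketch of that idea, assuming an explicitly passed token should take precedence over DATAMESH_TOKEN. The `resolve_token` helper and its precedence order are illustrative assumptions, not part of the Oceanum library.

```python
import os


def resolve_token(explicit=None, env=None):
    """Return the token to use, preferring an explicit value over the environment.

    `resolve_token` is a hypothetical helper, not part of the oceanum package.
    """
    env = os.environ if env is None else env
    token = explicit or env.get("DATAMESH_TOKEN")
    if not token:
        raise RuntimeError(
            "No Datamesh token found: set DATAMESH_TOKEN or pass token=..."
        )
    return token


# Demo with a stand-in mapping instead of the real os.environ
demo_env = {"DATAMESH_TOKEN": "token-from-env"}
from_env = resolve_token(env=demo_env)
from_arg = resolve_token("token-from-arg", env=demo_env)
```

The resolved value can then be passed as the token argument, for example `storage.ls("/", token=resolve_token())`.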
Simple Functions
The storage module provides simple functions for common operations:
List Files
from oceanum import storage
# List root directory
files = storage.ls("/")
for f in files:
    print(f)
# List with details
files = storage.ls("/my-folder", detail=True)
for f in files:
    print(f"{f['name']} - {f['size']} bytes")
# Recursive listing
files = storage.ls("/my-folder", recursive=True)
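A detailed listing can be summarised with plain Python. The sketch below assumes each entry is a dict with 'name' and 'size' keys, as in the detail=True example above, and that a size may be missing or None (for instance for directories). The `summarize_listing` helper and the sample data are hypothetical.

```python
def summarize_listing(entries):
    """Return (entry_count, total_bytes) for a detailed listing.

    Assumes each entry is a dict with 'name' and 'size' keys; entries with a
    missing or None size count as 0 bytes.
    """
    total = sum(entry.get("size") or 0 for entry in entries)
    return len(entries), total


# Stand-in data shaped like the detail=True output (values are made up)
sample = [
    {"name": "/my-folder/a.nc", "size": 1024},
    {"name": "/my-folder/b.nc", "size": 2048},
    {"name": "/my-folder/sub", "size": None},
]
count, total_bytes = summarize_listing(sample)
print(f"{count} entries, {total_bytes} bytes")
```

With real data, the input would be the result of `storage.ls("/my-folder", detail=True)`.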
Upload Files
from oceanum import storage
# Upload a single file
storage.put("local_file.nc", "/remote/path/file.nc")
# Upload a directory recursively
storage.put("./local_folder", "/remote/folder", recursive=True)
Download Files
from oceanum import storage
# Download a single file
storage.get("/remote/path/file.nc", "local_file.nc")
# Download a directory recursively
storage.get("/remote/folder", "./local_folder", recursive=True)
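When only a handful of remote files are needed rather than a whole directory, the single-file call can be wrapped in a loop. This is a sketch under the assumption that `get(remote, local)` behaves as in the example above. The `download_all` helper is hypothetical, and the demo uses a stand-in object that records calls instead of contacting storage.

```python
import os
import posixpath
import tempfile
from unittest.mock import MagicMock


def download_all(store, remote_paths, local_dir):
    """Download each remote file into local_dir, keeping its base name.

    `store` is anything exposing get(remote, local), such as the
    oceanum.storage module. Returns the local target paths.
    """
    os.makedirs(local_dir, exist_ok=True)
    local_paths = []
    for remote in remote_paths:
        local = os.path.join(local_dir, posixpath.basename(remote))
        store.get(remote, local)
        local_paths.append(local)
    return local_paths


# Demo: the MagicMock records get() calls; nothing is actually downloaded
fake_store = MagicMock()
with tempfile.TemporaryDirectory() as tmp:
    written = download_all(
        fake_store, ["/remote/path/a.nc", "/remote/path/b.nc"], tmp
    )
    names = [os.path.basename(p) for p in written]
```

In a real script, `from oceanum import storage` followed by `download_all(storage, paths, "./data")` would take the place of the stand-in.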
Delete Files
from oceanum import storage
# Delete a file
storage.rm("/remote/path/old_file.nc")
# Delete a directory recursively
storage.rm("/remote/folder", recursive=True)
Check Files
from oceanum import storage
# Check if path exists
if storage.exists("/remote/path/file.nc"):
    print("File exists")
# Check if path is a file
if storage.isfile("/remote/path/file.nc"):
    print("It's a file")
# Check if path is a directory
if storage.isdir("/remote/folder"):
    print("It's a directory")
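The check functions combine naturally with rm to guard against accidental recursive deletes. The sketch below is one possible pattern, not a library feature: `safe_rm` and its `allow_recursive` parameter are hypothetical, and the demo uses a stand-in object rather than real storage.

```python
from unittest.mock import MagicMock


def safe_rm(store, path, allow_recursive=False):
    """Delete path only if it exists; refuse directories unless allowed.

    Returns True if a delete was issued, False if the path was absent.
    """
    if not store.exists(path):
        return False
    if store.isdir(path):
        if not allow_recursive:
            raise ValueError(f"{path} is a directory; pass allow_recursive=True")
        store.rm(path, recursive=True)
    else:
        store.rm(path)
    return True


# Demo with a stand-in that pretends the path is an existing directory
fake_store = MagicMock()
fake_store.exists.return_value = True
fake_store.isdir.return_value = True
deleted = safe_rm(fake_store, "/remote/folder", allow_recursive=True)
```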
FileSystem Class
For more control, use the FileSystem class directly:
from oceanum.storage import FileSystem
# Initialize with token
fs = FileSystem(token="your-datamesh-token")
# List files
files = fs.ls("/my-folder")
# Get file info
info = fs.info("/my-folder/file.nc")
print(f"Size: {info['size']}, Modified: {info['mtime']}")
# Read file content
content = fs.cat("/my-folder/file.txt")
# Write content
fs.pipe("/my-folder/new_file.txt", b"Hello, World!")
# Create directory
fs.mkdir("/my-folder/new-dir")
# Copy files
fs.cp("/source/file.nc", "/dest/file.nc")
# Move files
fs.mv("/old/path/file.nc", "/new/path/file.nc")
# Generate a signed URL valid for 3600 seconds (the default is 100 seconds)
url = fs.sign("/my-folder/file.nc", expiration=3600)
print(url)
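The pipe and cat methods shown above can be paired to confirm that a small write landed as expected. This sketch assumes cat returns the file content as bytes, as fsspec filesystems generally do. `write_and_verify` is a hypothetical helper, and `InMemoryFS` is a tiny stand-in used only so the example runs without a token.

```python
def write_and_verify(fs, path, data):
    """Write bytes with pipe(), read them back with cat(), and compare."""
    fs.pipe(path, data)
    if fs.cat(path) != data:
        raise IOError(f"Read-back mismatch for {path}")
    return len(data)


class InMemoryFS:
    """Stand-in implementing only pipe() and cat(), for the demo."""

    def __init__(self):
        self._files = {}

    def pipe(self, path, value):
        self._files[path] = value

    def cat(self, path):
        return self._files[path]


demo_fs = InMemoryFS()
written = write_and_verify(demo_fs, "/my-folder/new_file.txt", b"Hello, World!")
```

With the real class, `fs = FileSystem(token=...)` would replace `demo_fs`.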
Using with fsspec
The storage filesystem integrates with fsspec, so files can be addressed with the oceanum:// protocol:
import fsspec
# Open a file using fsspec
with fsspec.open("oceanum://my-folder/file.txt", "r", token="your-token") as f:
    content = f.read()
# Write a file
with fsspec.open("oceanum://my-folder/output.txt", "w", token="your-token") as f:
    f.write("Hello, World!")
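The examples on this page use both storage paths such as /my-folder/file.txt and URLs such as oceanum://my-folder/file.txt. The sketch below converts one into the other, assuming the URL is simply the protocol prefix followed by the path without its leading slash. That mapping is inferred from the examples here rather than from a documented rule, and `to_url` is a hypothetical helper.

```python
PROTOCOL = "oceanum://"


def to_url(path):
    """Convert a storage path such as '/my-folder/file.txt' to an oceanum:// URL.

    Paths that already carry the protocol prefix are returned unchanged.
    """
    if path.startswith(PROTOCOL):
        return path
    return PROTOCOL + path.lstrip("/")


url = to_url("/my-folder/file.txt")
print(url)
```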
Working with xarray
Use the fsspec integration to work with NetCDF and Zarr datasets:
import xarray as xr
# Open a NetCDF file from storage
ds = xr.open_dataset(
    "oceanum://data/ocean_temps.nc",
    engine="h5netcdf",
    storage_options={"token": "your-token"},
)
# Open a Zarr store from storage
ds = xr.open_zarr(
    "oceanum://data/large_dataset.zarr",
    storage_options={"token": "your-token"},
)
# Save to storage
ds.to_zarr(
    "oceanum://data/output.zarr",
    storage_options={"token": "your-token"},
)
Working with Dask
The storage filesystem works with Dask for distributed computing, using the same oceanum:// URLs and storage_options:
import dask.dataframe as dd
# Read CSV files with Dask
df = dd.read_csv(
    "oceanum://data/*.csv",
    storage_options={"token": "your-token"},
)
# Read Parquet files
df = dd.read_parquet(
    "oceanum://data/dataset.parquet",
    storage_options={"token": "your-token"},
)
Using with Datamesh
Storage paths can be referenced in Datamesh using the oceanum:// protocol:
from oceanum.datamesh import Connector
# Connect to datamesh
connector = Connector(token="your-token")
# Reference storage files in datasource connections
# The oceanum:// protocol is recognized by Datamesh
Environment Variables
| Variable | Description |
|---|---|
| DATAMESH_TOKEN | Your Datamesh authentication token |