Python Integration

The Oceanum Python library includes a storage module that follows the fsspec specification, so you can work with Oceanum Storage from Python scripts and notebooks through the familiar fsspec filesystem interface.

Installation

Install the Oceanum library:

pip install oceanum

Authentication

The storage module uses your Datamesh token for authentication. You can either:

  1. Set the DATAMESH_TOKEN environment variable:
export DATAMESH_TOKEN="your-datamesh-token"
  2. Or pass the token directly to functions:
from oceanum import storage

storage.ls("/", token="your-datamesh-token")

To obtain your Datamesh token, see the Token documentation.
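
The two options above can be combined in a small lookup that prefers an explicit token and falls back to the environment variable. This is only a sketch: resolve_token is a hypothetical helper, not part of the oceanum package, and the error message is illustrative.

```python
import os


def resolve_token(token=None, env_var="DATAMESH_TOKEN"):
    """Return the explicit token if given, else the environment variable.

    Hypothetical helper for illustration; not part of the oceanum package.
    """
    if token:
        return token
    value = os.environ.get(env_var, "").strip()
    if not value:
        # Fail early with a clear message instead of a later auth error.
        raise RuntimeError(f"No token supplied and {env_var} is not set or is empty")
    return value
```

The result can then be passed on, for example storage.ls("/", token=resolve_token()).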

Simple Functions

The storage module provides simple functions for common operations:

List Files

from oceanum import storage

# List root directory
files = storage.ls("/")
for f in files:
    print(f)

# List with details
files = storage.ls("/my-folder", detail=True)
for f in files:
    print(f"{f['name']} - {f['size']} bytes")

# Recursive listing
files = storage.ls("/my-folder", recursive=True)
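
A detailed listing is convenient for quick summaries. The sketch below totals sizes per file suffix. It assumes each entry is a dict with 'name' and 'size' keys, as in the detailed listing above; size_by_suffix and the sample values are made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def size_by_suffix(entries):
    """Sum the 'size' of each entry, grouped by file suffix (e.g. '.nc')."""
    totals = defaultdict(int)
    for entry in entries:
        suffix = PurePosixPath(entry["name"]).suffix or "(none)"
        # Directories may report no size; treat a missing or None size as 0.
        totals[suffix] += entry.get("size") or 0
    return dict(totals)


# Sample entries shaped like a detailed listing (values are made up).
sample = [
    {"name": "/my-folder/a.nc", "size": 1200},
    {"name": "/my-folder/b.nc", "size": 800},
    {"name": "/my-folder/notes.txt", "size": 50},
]
print(size_by_suffix(sample))  # {'.nc': 2000, '.txt': 50}
```

With real data, the argument would be the result of storage.ls("/my-folder", detail=True).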

Upload Files

from oceanum import storage

# Upload a single file
storage.put("local_file.nc", "/remote/path/file.nc")

# Upload a directory recursively
storage.put("./local_folder", "/remote/folder", recursive=True)
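
To avoid re-uploading a file that is already in storage, an upload can be guarded with an existence check. The following is a sketch, not a library feature: put_if_missing is a hypothetical name, and the check followed by the upload is not atomic, so a concurrent writer could still create the file in between.

```python
def put_if_missing(store, local_path, remote_path):
    """Upload local_path unless remote_path already exists.

    `store` is any object exposing exists() and put(), for example the
    oceanum.storage module. Returns True if an upload happened.
    """
    if store.exists(remote_path):
        return False
    store.put(local_path, remote_path)
    return True
```

A call would look like put_if_missing(storage, "local_file.nc", "/remote/path/file.nc").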

Download Files

from oceanum import storage

# Download a single file
storage.get("/remote/path/file.nc", "local_file.nc")

# Download a directory recursively
storage.get("/remote/folder", "./local_folder", recursive=True)

Delete Files

from oceanum import storage

# Delete a file
storage.rm("/remote/path/old_file.nc")

# Delete a directory recursively
storage.rm("/remote/folder", recursive=True)
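
Because a recursive delete removes a whole folder, it can be worth wrapping rm in a guard that refuses directories unless that is requested explicitly. This is a minimal sketch that relies on the module's exists() and isdir() functions; guarded_rm and its allow_recursive parameter are hypothetical names.

```python
def guarded_rm(store, path, allow_recursive=False):
    """Delete path, refusing directories unless allow_recursive is True.

    `store` is any object exposing exists(), isdir() and rm(), for example
    the oceanum.storage module. Returns True if something was deleted and
    False if the path did not exist.
    """
    if not store.exists(path):
        return False
    if store.isdir(path):
        if not allow_recursive:
            raise ValueError(f"{path} is a directory; pass allow_recursive=True")
        store.rm(path, recursive=True)
    else:
        store.rm(path)
    return True
```

For example, guarded_rm(storage, "/remote/folder") raises ValueError, while guarded_rm(storage, "/remote/folder", allow_recursive=True) deletes the folder.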

Check Files

from oceanum import storage

# Check if path exists
if storage.exists("/remote/path/file.nc"):
    print("File exists")

# Check if path is a file
if storage.isfile("/remote/path/file.nc"):
    print("It's a file")

# Check if path is a directory
if storage.isdir("/remote/folder"):
    print("It's a directory")

FileSystem Class

For more control, use the FileSystem class directly:

from oceanum.storage import FileSystem

# Initialize with token
fs = FileSystem(token="your-datamesh-token")

# List files
files = fs.ls("/my-folder")

# Get file info
info = fs.info("/my-folder/file.nc")
print(f"Size: {info['size']}, Modified: {info['mtime']}")

# Read file content
content = fs.cat("/my-folder/file.txt")

# Write content
fs.pipe("/my-folder/new_file.txt", b"Hello, World!")

# Create directory
fs.mkdir("/my-folder/new-dir")

# Copy files
fs.cp("/source/file.nc", "/dest/file.nc")

# Move files
fs.mv("/old/path/file.nc", "/new/path/file.nc")

# Generate a signed URL (valid for 100 seconds by default; here 3600 seconds)
url = fs.sign("/my-folder/file.nc", expiration=3600)
print(url)
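
The pipe() and cat() calls above are enough to store small structured objects such as JSON. The sketch below assumes that cat() returns bytes, which is the usual fsspec behaviour but is not stated here; write_json and read_json are hypothetical helpers.

```python
import json


def write_json(fs, path, obj):
    """Serialise obj to UTF-8 JSON and store it with fs.pipe()."""
    fs.pipe(path, json.dumps(obj, sort_keys=True).encode("utf-8"))


def read_json(fs, path):
    """Fetch bytes with fs.cat() (assumed to return bytes) and decode as JSON."""
    return json.loads(fs.cat(path).decode("utf-8"))
```

With the fs object created above, write_json(fs, "/my-folder/config.json", {"region": "nz"}) followed by read_json(fs, "/my-folder/config.json") should return the same dictionary; the path and contents are placeholders.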

Using with fsspec

The storage filesystem is registered with fsspec, so files can be addressed with oceanum:// URLs:

import fsspec

# Open a file using fsspec
with fsspec.open("oceanum://my-folder/file.txt", "r", token="your-token") as f:
    content = f.read()

# Write a file
with fsspec.open("oceanum://my-folder/output.txt", "w", token="your-token") as f:
    f.write("Hello, World!")
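
The examples use storage paths such as /my-folder/file.txt with the simple functions and URLs such as oceanum://my-folder/file.txt with fsspec. A small converter keeps the two consistent. It assumes the URL is the protocol prefix followed by the path without its leading slash, which matches the examples but is not confirmed here; to_oceanum_url is a hypothetical helper.

```python
def to_oceanum_url(path):
    """Turn a storage path like '/my-folder/file.txt' into an oceanum:// URL.

    Assumption: the URL is 'oceanum://' plus the path without its leading
    slash. Paths that are already oceanum:// URLs are returned unchanged.
    """
    prefix = "oceanum://"
    if path.startswith(prefix):
        return path
    return prefix + path.lstrip("/")
```

For instance, to_oceanum_url("/my-folder/file.txt") gives "oceanum://my-folder/file.txt".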

Working with xarray

Use fsspec integration to work with NetCDF and Zarr datasets:

import xarray as xr

# Open a NetCDF file from storage
ds = xr.open_dataset(
    "oceanum://data/ocean_temps.nc",
    engine="h5netcdf",
    storage_options={"token": "your-token"},
)

# Open a Zarr store from storage
ds = xr.open_zarr(
    "oceanum://data/large_dataset.zarr",
    storage_options={"token": "your-token"},
)

# Save to storage
ds.to_zarr(
    "oceanum://data/output.zarr",
    storage_options={"token": "your-token"},
)

Working with Dask

Dask reads from storage through the same fsspec integration, with the token passed via storage_options:

import dask.dataframe as dd

# Read CSV files with Dask
df = dd.read_csv(
    "oceanum://data/*.csv",
    storage_options={"token": "your-token"},
)

# Read Parquet files
df = dd.read_parquet(
    "oceanum://data/dataset.parquet",
    storage_options={"token": "your-token"},
)

Using with Datamesh

Storage paths can be referenced in Datamesh using the oceanum:// protocol:

from oceanum.datamesh import Connector

# Connect to datamesh
connector = Connector(token="your-token")

# Reference storage files in datasource connections
# The oceanum:// protocol is recognized by Datamesh

Environment Variables

Variable          Description
DATAMESH_TOKEN    Your Datamesh authentication token