
Blob Storage

Azure Blob Storage for scalable object storage and data lake scenarios


Azure Blob Storage — Cloud Services

You are an expert in Azure Blob Storage for building scalable object storage solutions, data lakes, and content delivery systems on Microsoft Azure.

Core Philosophy

Blob Storage is Azure's universal data store -- the place where everything from user-uploaded photos to multi-terabyte data lake datasets lives. Its power lies in its simplicity: blobs go in, blobs come out, at virtually unlimited scale. The key architectural decision is not whether to use Blob Storage (you almost certainly should), but how to secure it, tier it, and organize it so that costs stay proportional to value and sensitive data stays protected.

Managed identity and RBAC should replace connection strings wherever possible. Storage account keys are all-or-nothing: anyone with the key has full read-write access to every container and blob. Managed identity with roles like Storage Blob Data Reader or Storage Blob Data Contributor provides scoped, auditable access that can be revoked instantly. For temporary external access, use User Delegation SAS tokens (backed by Azure AD) rather than Account Key SAS tokens.

Access tiers are cost controls that should be automated, not manual. Data that is hot today is rarely hot a month from now. Lifecycle management policies that automatically transition blobs from Hot to Cool to Cold to Archive -- and eventually delete them -- are the difference between a storage bill that scales linearly with time and one that stays proportional to actively used data. Set lifecycle policies on day one, not after your first bill shock.

Anti-Patterns

  • Storing account keys in application code or config files -- Account keys grant unrestricted access to the entire storage account. Use managed identity with RBAC assignments, and store any necessary keys in Key Vault.
  • Leaving blobs in Hot tier indefinitely -- Hot tier is the most expensive storage class. Data accessed less than monthly should be in Cool or Cold tier. Use lifecycle management policies to automate transitions.
  • Allowing public blob access at the account level -- Set --allow-blob-public-access false by default. Public containers are a common source of data breaches. Use SAS tokens or Azure CDN for controlled public access.
  • Downloading large blobs into memory -- Loading multi-gigabyte blobs into a buffer exhausts application memory. Use streaming APIs (downloadToFile, or download() and consume the returned stream) for files larger than a few hundred megabytes.
  • Skipping soft delete and versioning on critical containers -- Without these, an accidental delete or overwrite is permanent. Soft delete retains deleted blobs for a configurable period; versioning preserves every overwrite.
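
The buffering-vs-streaming distinction in the fourth bullet comes down to one pattern: pipe chunks through a fixed-size buffer instead of accumulating them. The same shape applies to the stream returned by blobClient.download() in @azure/storage-blob; an in-memory source stands in here so the sketch runs without credentials.

```typescript
import { Readable, Writable } from "stream";
import { pipeline } from "stream/promises";

// Copy a readable stream chunk by chunk. Memory stays bounded by the
// chunk size no matter how large the source is -- unlike downloadToBuffer,
// which must hold the entire blob at once.
async function copyStream(source: Readable): Promise<number> {
  let total = 0;
  const sink = new Writable({
    write(chunk: Buffer, _enc, cb) {
      total += chunk.length; // process each chunk; never hold the whole blob
      cb();
    },
  });
  await pipeline(source, sink);
  return total;
}

// Simulate a four-chunk "blob" of 1 MiB chunks.
const chunks = Array.from({ length: 4 }, () => Buffer.alloc(1024 * 1024));
copyStream(Readable.from(chunks)).then((n) => console.log(n)); // 4194304
```

With a real blob, `source` would be `(await blobClient.download()).readableStreamBody` and `sink` a file write stream.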

Overview

Azure Blob Storage is Microsoft's massively scalable object storage service for unstructured data. It is optimized for storing text, binary data, images, documents, streaming media, and backup/archive data. Blob Storage also serves as the foundation for Azure Data Lake Storage Gen2.

Key concepts:

  • Storage Account: Top-level namespace providing a unique DNS endpoint
  • Container: Organizational unit within a storage account (similar to a directory)
  • Blob Types: Block blobs (general-purpose), Append blobs (log/append workloads), Page blobs (VM disks)
  • Access Tiers: Hot (frequent access), Cool (infrequent, 30-day minimum), Cold (rare, 90-day minimum), Archive (offline, 180-day minimum)
  • Redundancy: LRS, ZRS, GRS, RA-GRS, GZRS, RA-GZRS
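
These names compose directly into the blob endpoint URL, which SAS tokens, Event Grid subjects, and CDN origins are all expressed against. A small sketch (the account, container, and blob names are illustrative):

```typescript
// Build the canonical blob URL from the naming hierarchy:
//   https://<account>.blob.core.windows.net/<container>/<blob>
// Blob names may contain "/" to form virtual directories; each path
// segment is percent-encoded so special characters survive.
function blobUrl(account: string, container: string, blob: string): string {
  const encodedBlob = blob.split("/").map(encodeURIComponent).join("/");
  return `https://${account}.blob.core.windows.net/${container}/${encodedBlob}`;
}

console.log(blobUrl("mystorageacct", "uploads", "2024/report v2.pdf"));
// https://mystorageacct.blob.core.windows.net/uploads/2024/report%20v2.pdf
```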

Setup & Configuration

Create a storage account

# Create a general-purpose v2 storage account with ZRS redundancy
az storage account create \
  --name mystorageacct \
  --resource-group myResourceGroup \
  --location eastus \
  --sku Standard_ZRS \
  --kind StorageV2 \
  --access-tier Hot \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false

# Create a container
az storage container create \
  --name uploads \
  --account-name mystorageacct \
  --auth-mode login

# Enable soft delete for blob protection
az storage account blob-service-properties update \
  --account-name mystorageacct \
  --resource-group myResourceGroup \
  --enable-delete-retention true \
  --delete-retention-days 30

# Enable versioning
az storage account blob-service-properties update \
  --account-name mystorageacct \
  --resource-group myResourceGroup \
  --enable-versioning true

Enable Data Lake Storage Gen2 (hierarchical namespace)

az storage account create \
  --name mydatalakeacct \
  --resource-group myResourceGroup \
  --location eastus \
  --sku Standard_ZRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true

Core Patterns

Upload and download blobs (JavaScript SDK)

import { BlobServiceClient } from "@azure/storage-blob";
import { DefaultAzureCredential } from "@azure/identity";
import { Readable } from "stream";

// Use managed identity / DefaultAzureCredential (preferred)
const blobServiceClient = new BlobServiceClient(
  `https://mystorageacct.blob.core.windows.net`,
  new DefaultAzureCredential()
);

const containerClient = blobServiceClient.getContainerClient("uploads");

// Upload a file
async function uploadFile(blobName: string, filePath: string): Promise<string> {
  const blockBlobClient = containerClient.getBlockBlobClient(blobName);

  await blockBlobClient.uploadFile(filePath, {
    blobHTTPHeaders: { blobContentType: "application/pdf" },
    metadata: { uploadedBy: "my-service", environment: "production" },
    tags: { department: "engineering" },
    tier: "Hot",
  });

  return blockBlobClient.url;
}

// Upload from a stream
async function uploadStream(blobName: string, stream: Readable) {
  const blockBlobClient = containerClient.getBlockBlobClient(blobName);

  // 4 MiB buffer size, up to 20 concurrent block uploads
  await blockBlobClient.uploadStream(stream, 4 * 1024 * 1024, 20, {
    blobHTTPHeaders: { blobContentType: "application/octet-stream" },
  });
}

// Download a blob to a buffer (small blobs only; stream large files instead)
async function downloadBlob(blobName: string): Promise<Buffer> {
  const blobClient = containerClient.getBlobClient(blobName);
  return blobClient.downloadToBuffer();
}

// Download a blob to file
async function downloadToFile(blobName: string, destPath: string) {
  const blobClient = containerClient.getBlobClient(blobName);
  await blobClient.downloadToFile(destPath);
}

Generate SAS tokens for secure temporary access

import {
  BlobSASPermissions,
  generateBlobSASQueryParameters,
  StorageSharedKeyCredential,
  SASProtocol,
} from "@azure/storage-blob";

function generateReadOnlySasUrl(
  accountName: string,
  accountKey: string,
  containerName: string,
  blobName: string,
  expiresInMinutes: number = 60
): string {
  const sharedKeyCredential = new StorageSharedKeyCredential(accountName, accountKey);

  const sasToken = generateBlobSASQueryParameters(
    {
      containerName,
      blobName,
      permissions: BlobSASPermissions.parse("r"), // read-only
      startsOn: new Date(),
      expiresOn: new Date(Date.now() + expiresInMinutes * 60 * 1000),
      protocol: SASProtocol.Https,
    },
    sharedKeyCredential
  ).toString();

  return `https://${accountName}.blob.core.windows.net/${containerName}/${blobName}?${sasToken}`;
}

// Prefer User Delegation SAS (uses Azure AD, no account key needed)
async function generateUserDelegationSasUrl(
  blobServiceClient: BlobServiceClient,
  containerName: string,
  blobName: string
): Promise<string> {
  const delegationKey = await blobServiceClient.getUserDelegationKey(
    new Date(),
    new Date(Date.now() + 3600 * 1000)
  );

  const sasToken = generateBlobSASQueryParameters(
    {
      containerName,
      blobName,
      permissions: BlobSASPermissions.parse("r"),
      startsOn: new Date(),
      expiresOn: new Date(Date.now() + 3600 * 1000),
    },
    delegationKey,
    blobServiceClient.accountName
  ).toString();

  return `https://${blobServiceClient.accountName}.blob.core.windows.net/${containerName}/${blobName}?${sasToken}`;
}

List blobs with pagination and prefix filtering

async function listBlobsByPrefix(prefix: string, maxResults: number = 100) {
  const blobs: string[] = [];

  for await (const blob of containerClient.listBlobsFlat({
    prefix,
    includeMetadata: true,
  })) {
    blobs.push(blob.name);
    if (blobs.length >= maxResults) break;
  }

  return blobs;
}

// List blobs in a virtual directory hierarchy
async function listBlobHierarchy(prefix: string) {
  const result: { directories: string[]; files: string[] } = { directories: [], files: [] };

  for await (const item of containerClient.listBlobsByHierarchy("/", { prefix })) {
    if (item.kind === "prefix") {
      result.directories.push(item.name);
    } else {
      result.files.push(item.name);
    }
  }

  return result;
}

Lifecycle management policies

# Create a lifecycle policy to tier and delete old blobs
az storage account management-policy create \
  --account-name mystorageacct \
  --resource-group myResourceGroup \
  --policy '{
    "rules": [
      {
        "name": "archiveOldBlobs",
        "enabled": true,
        "type": "Lifecycle",
        "definition": {
          "actions": {
            "baseBlob": {
              "tierToCool": { "daysAfterModificationGreaterThan": 30 },
              "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
              "delete": { "daysAfterModificationGreaterThan": 365 }
            },
            "snapshot": {
              "delete": { "daysAfterCreationGreaterThan": 90 }
            }
          },
          "filters": {
            "blobTypes": ["blockBlob"],
            "prefixMatch": ["logs/", "backups/"]
          }
        }
      }
    ]
  }'
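
The rule above implements a simple age-based schedule. Its effect can be sanity-checked with a small model of the transitions (thresholds copied from the policy; tier names match the policy actions):

```typescript
// Model the lifecycle rule: Hot until day 30, Cool until day 90,
// Archive until day 365, then deleted.
type Tier = "Hot" | "Cool" | "Archive" | "Deleted";

function tierAfter(daysSinceModification: number): Tier {
  if (daysSinceModification > 365) return "Deleted";
  if (daysSinceModification > 90) return "Archive";
  if (daysSinceModification > 30) return "Cool";
  return "Hot";
}

console.log(tierAfter(10), tierAfter(45), tierAfter(200), tierAfter(400));
// Hot Cool Archive Deleted
```

Note the schedule moves blobs straight from Cool to Archive; add a `tierToCold` action if you want a Cold stage in between.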

Event-driven processing with Event Grid

# Subscribe to blob creation events
az eventgrid event-subscription create \
  --name blobCreatedSubscription \
  --source-resource-id "/subscriptions/<sub-id>/resourceGroups/myResourceGroup/providers/Microsoft.Storage/storageAccounts/mystorageacct" \
  --endpoint "https://myfunctionapp.azurewebsites.net/api/blobHandler" \
  --included-event-types Microsoft.Storage.BlobCreated \
  --subject-begins-with "/blobServices/default/containers/uploads/"
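
On the receiving end, a webhook endpoint must first answer Event Grid's subscription-validation handshake before it receives real events. A sketch of the handler logic, decoupled from any web framework (the function name and minimal event shape are illustrative; the event type strings follow the Event Grid schema):

```typescript
// Minimal Event Grid webhook logic: answer the validation handshake,
// otherwise collect the URLs of newly created blobs.
interface EventGridEvent {
  eventType: string;
  data: { validationCode?: string; url?: string };
}

function handleEvents(
  events: EventGridEvent[]
): { validationResponse: string } | string[] {
  const validation = events.find(
    (e) => e.eventType === "Microsoft.EventGrid.SubscriptionValidationEvent"
  );
  if (validation?.data.validationCode) {
    // Echo the code back so Event Grid activates the subscription.
    return { validationResponse: validation.data.validationCode };
  }
  return events
    .filter((e) => e.eventType === "Microsoft.Storage.BlobCreated")
    .map((e) => e.data.url ?? "");
}
```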

Best Practices

  1. Use managed identity and RBAC: Assign Storage Blob Data Contributor or Storage Blob Data Reader roles instead of using account keys.

    az role assignment create \
      --role "Storage Blob Data Contributor" \
      --assignee <principal-id> \
      --scope "/subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.Storage/storageAccounts/mystorageacct"
    
  2. Disable public blob access: Set --allow-blob-public-access false at the account level unless you specifically need public containers.

  3. Use access tiers appropriately: Move infrequently accessed data to Cool or Cold. Use lifecycle management policies for automatic tiering.

  4. Enable soft delete and versioning: Protect against accidental deletion. Soft delete retains deleted blobs for a configurable period.

  5. Use AzCopy for bulk transfers: For large-scale data movement, AzCopy outperforms SDK-based uploads.

    azcopy copy "/local/path/*" "https://mystorageacct.blob.core.windows.net/uploads/?<SAS>" --recursive
    
  6. Use immutability policies for compliance: Legal hold or time-based retention prevents deletion or modification.

  7. Choose the right redundancy: Use ZRS for single-region high availability, GRS/GZRS for cross-region disaster recovery.

Common Pitfalls

  • Storing account keys in code or config: Use managed identity or Key Vault references. Rotate keys regularly if you must use them.

  • Not handling large file uploads correctly: For files over 256 MB, use staged block uploads or uploadStream. The SDK handles this automatically with uploadFile, but set an appropriate concurrency.

  • Ignoring access tier costs: Archive tier is cheap for storage but expensive and slow (hours) for retrieval. Ensure you understand the rehydration cost before archiving.

  • Flat namespace performance: With millions of blobs in a flat listing, prefix-based filtering is essential. Consider Data Lake Storage Gen2 for true hierarchical directory operations.

  • Missing CORS for browser-based access: Configure CORS rules on the storage account for direct browser uploads.

    az storage cors add \
      --account-name mystorageacct \
      --services b \
      --methods GET PUT \
      --origins "https://myapp.com" \
      --allowed-headers "*" \
      --max-age 3600
    
  • Not using content-type headers: Always set blobContentType on upload. Without it, browsers may not handle downloads correctly.

  • Forgetting network rules: If you restrict the storage account to a VNET, ensure your services (Functions, App Service) have VNET integration or use private endpoints.
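
For the content-type pitfall above, a small extension-to-MIME lookup run before upload avoids the problem. This is a sketch covering only a few illustrative types; in a real service a library such as mime-types is preferable.

```typescript
// Resolve blobContentType from a file name before upload; fall back to
// application/octet-stream when the extension is unknown.
const MIME_TYPES: Record<string, string> = {
  ".pdf": "application/pdf",
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".json": "application/json",
  ".csv": "text/csv",
};

function contentTypeFor(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  return MIME_TYPES[ext] ?? "application/octet-stream";
}

console.log(contentTypeFor("report.PDF")); // application/pdf
console.log(contentTypeFor("archive.tar.gz")); // application/octet-stream
```

Pass the result as `blobHTTPHeaders: { blobContentType: contentTypeFor(name) }` on upload.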
