Data Lifecycle Audit
Purpose
Verify that data is handled correctly across its entire lifecycle: creation, update, duplication, archival, export, import, and deletion. Data lifecycle bugs are among the most common causes of storage waste, data loss, inconsistent state, and compliance violations. This audit ensures that every data operation leaves the system in a consistent state.
Scope
| Lifecycle Phase | What We Test |
|---|---|
| Creation | Are all related records created atomically? |
| Update | Are related records updated consistently? |
| Duplication | Are all linked records deep-copied correctly? |
| Archival | Is archived data accessible but excluded from active queries? |
| Restoration | Does restore return data to its pre-archive state? |
| Export | Does exported data match the source completely? |
| Import | Does imported data recreate the original correctly? |
| Deletion | Are all related records and files cleaned up? |
| Versioning | Are previous versions preserved when assets are regenerated? |
Risk Pattern Table
| Pattern | What It Hits | Risk | Symptom |
|---|---|---|---|
| Orphaned files on deletion | Storage | HIGH | Files accumulate in storage after records deleted; costs grow |
| Incomplete cascade delete | Data integrity | HIGH | Parent deleted, children remain; broken references |
| Shallow copy on duplication | Data integrity | HIGH | Duplicate shares references with original; edit one, both change |
| Missing files on export | Data portability | MEDIUM | Export contains metadata but not actual files |
| ID collision on import | Data integrity | HIGH | Imported records overwrite existing records with same IDs |
| No versioning on regenerate | Data loss | HIGH | Regenerating asset overwrites previous version permanently |
| Archive still in active queries | Performance, UX | MEDIUM | Archived items appear in listings, slow down queries |
| Restore loses state | Data integrity | MEDIUM | Restored item loses collaborators, tags, or linked records |
| Broken foreign keys | Data integrity | HIGH | Deletion removes parent but children reference non-existent ID |
| Inconsistent soft delete | Data integrity | MEDIUM | Some queries filter deleted records, others don't; ghost data |
Concrete Test Cases
TEST-DL-001: Duplicate Large Project, Inspect Linked Records
Objective: Verify that duplicating a project creates independent, complete copies of all linked data.
Steps:
- Create Project X with:
- 10+ assets (with files in storage)
- 5+ scenes with asset assignments
- 3+ collaborators
- Custom settings/configuration
- Tags and metadata
- Duplicate Project X to create Project X-Copy.
- Inspect every linked record.
Verification Checklist:
| Record Type | Copied? | Independent? | Files Copied? |
|---|---|---|---|
| Project metadata | [ ] | [ ] | N/A |
| Assets | [ ] | [ ] | [ ] |
| Scenes | [ ] | [ ] | N/A |
| Scene-asset links | [ ] | [ ] | N/A |
| Settings | [ ] | [ ] | N/A |
| Tags | [ ] | [ ] | N/A |
| Collaborators | [ ] | [ ] | N/A |
| Permissions/ACLs | [ ] | [ ] | N/A |
Pass Criteria:
- All records are duplicated (not referenced from original).
- X-Copy has new IDs for all records (no shared IDs with original).
- Editing an asset in X-Copy does not affect the original.
- Deleting X-Copy does not affect the original.
- Files are either deep-copied or copy-on-write (not shared mutable references).
- Internal references within X-Copy point to X-Copy's records (not original's).
Deep Copy Verification:
-- Check: do any of X-Copy's records reference X's IDs?
SELECT * FROM scenes
WHERE project_id = 'x-copy-id'
AND asset_references && (SELECT array_agg(id) FROM assets WHERE project_id = 'original-x-id');
-- Result should be EMPTY (no cross-references)
-- Check: are file paths different?
SELECT a.file_path as original, b.file_path as copy
FROM assets a JOIN assets b ON a.name = b.name
WHERE a.project_id = 'original-x-id' AND b.project_id = 'x-copy-id';
-- Paths should differ (or use copy-on-write with versioned paths)
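The same independence checks can also run as an integration test after duplication. This is a minimal sketch, assuming a hypothetical record shape (dicts with `id`, `file_path`, and `asset_ids` keys); adapt it to the real ORM models.

```python
def assert_independent_copy(original, copy):
    """Fail if a duplicated project shares any IDs or file paths with
    the original, or if the copy's scenes reference the original's
    assets. Record shape is hypothetical: {'assets': [...], 'scenes': [...]}."""
    orig_ids = {a["id"] for a in original["assets"]}
    copy_ids = {a["id"] for a in copy["assets"]}
    assert orig_ids.isdisjoint(copy_ids), "shared asset IDs with original"

    orig_paths = {a["file_path"] for a in original["assets"]}
    copy_paths = {a["file_path"] for a in copy["assets"]}
    assert orig_paths.isdisjoint(copy_paths), "shared storage paths (shallow copy)"

    # Internal references in the copy must resolve to the copy's own assets
    for scene in copy["scenes"]:
        assert set(scene["asset_ids"]) <= copy_ids, "scene references original's assets"
```

Run it once against a known-good duplicate and once against a deliberately shallow copy to confirm the test itself catches the failure mode.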
TEST-DL-002: Delete Project, Verify Storage Cleanup
Objective: Verify that deleting a project removes all associated files from storage.
Steps:
- Create Project X with 10 assets (files in cloud storage).
- Record all storage paths for Project X's files.
- Delete Project X.
- Wait for cleanup job (if async).
- Check each storage path.
Pass Criteria:
- All asset files deleted from storage (or marked for deferred deletion).
- All thumbnail/preview files deleted.
- All temporary/intermediate files deleted.
- Database records deleted (or soft-deleted).
- No orphaned storage files remain.
- Storage space reclaimed (verify with bucket size if possible).
Cleanup Timing:
| Cleanup Type | Acceptable Delay | Implementation |
|-------------|-----------------|----------------|
| Database records | Immediate (sync) | CASCADE DELETE or application logic |
| Storage files | < 24 hours (async OK) | Cleanup job triggered by delete event |
| CDN cache | < 1 hour | Cache purge on delete |
| Search index | < 5 minutes | Index update on delete |
| Analytics data | Retained (anonymized) | Separate from operational delete |
Orphan Detection Query:
-- Find storage references with no parent record
SELECT file_path FROM assets WHERE project_id NOT IN (SELECT id FROM projects);
-- These are orphaned records
-- Find storage files with no DB reference (requires storage listing)
-- Compare: storage bucket listing vs assets.file_path values
-- Diff = orphaned files
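The storage-vs-database comparison described in the comments above is a set difference; a minimal sketch, assuming both sides are normalized to the same path format (e.g. bucket-relative):

```python
def diff_storage_and_db(storage_paths, db_paths):
    """Return (orphaned_files, broken_refs): files present in storage
    with no DB reference, and DB references pointing at missing files.
    Both inputs must use the same path normalization."""
    storage, db = set(storage_paths), set(db_paths)
    return sorted(storage - db), sorted(db - storage)
```

Either non-empty result is a finding: orphaned files waste storage; broken references mean user-visible data is missing.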
TEST-DL-003: Archive and Restore Cycle
Objective: Verify that archiving and restoring a project preserves all data.
Steps:
- Create Project X with full data (assets, scenes, collaborators, settings).
- Record the complete state (snapshot).
- Archive Project X.
- Verify Project X is excluded from active queries.
- Restore Project X.
- Compare restored state with snapshot.
Pass Criteria:
- Archived project does not appear in project listings.
- Archived project does not appear in search results.
- Archived project's assets do not appear in global asset search.
- Archived project IS accessible via direct URL (read-only) or admin panel.
- Restored project matches pre-archive state completely:
- All assets present and accessible
- All scenes intact
- All collaborators restored
- All settings preserved
- All tags/metadata preserved
- Archive/restore cycle can be repeated without data loss.
Archive Implementation Check:
[ ] Soft delete with is_archived flag (not hard delete)
[ ] All listing queries filter by is_archived = false
[ ] Direct access queries do NOT filter by is_archived (allows admin access)
[ ] Archive timestamp recorded
[ ] Restore clears archive flag and timestamp
[ ] Archived resources do not consume active quotas
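The implementation checks above can be exercised against an in-memory sketch of the soft-archive pattern; the store and field names here are hypothetical stand-ins for the real tables:

```python
from datetime import datetime, timezone

class ProjectStore:
    """Minimal sketch of the archive pattern: soft flag with timestamp,
    filtered listings, unfiltered direct access, restore clears the flag."""
    def __init__(self):
        self._rows = {}

    def create(self, pid, data):
        self._rows[pid] = {"data": data, "archived_at": None}

    def archive(self, pid):
        self._rows[pid]["archived_at"] = datetime.now(timezone.utc)

    def restore(self, pid):
        self._rows[pid]["archived_at"] = None  # clears flag and timestamp

    def list_active(self):
        # Listing queries filter by the archive flag
        return [pid for pid, r in self._rows.items() if r["archived_at"] is None]

    def get(self, pid):
        # Direct access does NOT filter (admin/read-only access stays possible)
        return self._rows[pid]["data"]
```

An archive/restore round-trip test snapshots the data first, then asserts the listing exclusion, direct accessibility, and snapshot equality after restore.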
TEST-DL-004: Regenerate Asset, Confirm Versioning
Objective: Verify that regenerating an existing asset preserves the previous version.
Steps:
- Generate Asset A in Project X. Record: file path, content hash, metadata.
- Regenerate Asset A (same prompt or modified prompt).
- After regeneration completes, check the criteria below.
Pass Criteria:
- New version is the active/current version.
- Previous version is preserved and accessible.
- Version history is visible (UI or API).
- User can revert to previous version.
- Storage contains both versions (old not overwritten).
- Version metadata includes: timestamp, prompt/parameters, generation method.
Fail Criteria:
- Previous version permanently overwritten.
- No version history exists.
- Old file deleted from storage on regeneration.
Version Storage Pattern:
Good: Versioned paths
v1: /projects/x/assets/hero/v1/hero.png
v2: /projects/x/assets/hero/v2/hero.png
Current pointer: assets.current_version_id -> v2
Good: Append-only versions table
| asset_id | version | file_path | created_at | is_current |
| asset_1 | 1 | /path/v1 | 2024-01-01 | false |
| asset_1 | 2 | /path/v2 | 2024-01-15 | true |
Bad: Overwrite in place
/projects/x/assets/hero.png <- always overwritten, no history
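The append-only versions table reduces to a small amount of application logic. A sketch using hypothetical row dicts mirroring the columns above:

```python
def add_version(versions, new_path, created_at):
    """Append a new version row and move the is_current pointer.
    Existing rows are never deleted or overwritten (append-only).
    Row keys mirror the table above: version, file_path, created_at, is_current."""
    for row in versions:
        row["is_current"] = False
    versions.append({
        "version": versions[-1]["version"] + 1 if versions else 1,
        "file_path": new_path,
        "created_at": created_at,
        "is_current": True,
    })
    return versions
```

Reverting is then just moving the `is_current` flag back; no file is ever at risk.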
TEST-DL-005: Export and Import Comparison
Objective: Verify that exporting and importing a project produces an identical copy.
Steps:
- Create Project X with comprehensive data.
- Export Project X (file download or API endpoint).
- Inspect the export package.
- Import the export into a new project.
- Compare original and imported project.
Export Package Verification:
[ ] Metadata included (project name, settings, configuration)
[ ] All assets included (files, not just references)
[ ] Scene definitions included
[ ] Scene-to-asset mappings included
[ ] Tags and custom metadata included
[ ] Export format is documented
[ ] Export is self-contained (no external dependencies)
[ ] Sensitive data excluded (API keys, tokens, internal IDs)
Import Verification:
[ ] All exported data is imported
[ ] New IDs generated (no collision with existing data)
[ ] Internal references updated to new IDs
[ ] Files stored in new locations (not overwriting existing)
[ ] Import is atomic (all-or-nothing; partial import is rolled back)
[ ] Duplicate import does not create duplicates (idempotent or warned)
Comparison Matrix:
| Field | Original | Imported | Match? |
|-------|----------|----------|--------|
| Project name | "My Project" | "My Project (Imported)" | Acceptable |
| Asset count | 10 | 10 | [ ] |
| Scene count | 5 | 5 | [ ] |
| Asset file hashes | [hash1, hash2, ...] | [hash1, hash2, ...] | [ ] |
| Settings | {...} | {...} | [ ] |
| Tags | [tag1, tag2] | [tag1, tag2] | [ ] |
| Collaborators | [user1, user2] | NOT imported | Expected |
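The comparison matrix can be automated by hashing asset contents rather than comparing IDs or names, which legitimately differ after import. A sketch assuming a hypothetical project dict holding raw asset bytes; collaborators are deliberately excluded, matching the "NOT imported" row above:

```python
import hashlib

def export_manifest(project):
    """Summarize the comparable fields: counts, sorted content hashes,
    settings, and tags. IDs and names are excluded because they are
    expected to change on import."""
    return {
        "asset_count": len(project["assets"]),
        "scene_count": len(project["scenes"]),
        "asset_hashes": sorted(
            hashlib.sha256(a["content"]).hexdigest() for a in project["assets"]
        ),
        "settings": project["settings"],
        "tags": sorted(project["tags"]),
    }

def round_trip_matches(original, imported):
    # Equal manifests mean the export/import round trip preserved content
    return export_manifest(original) == export_manifest(imported)
```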
TEST-DL-006: Cascade Delete Completeness
Objective: Verify that deleting a parent entity removes or handles all child entities.
Steps:
- Map all parent-child relationships in the data model.
- For each parent entity type, create a parent with children.
- Delete the parent.
- Check for orphaned children.
Relationship Map Template:
| Parent | Child | Relationship | On Delete | Verified |
|--------|-------|-------------|-----------|----------|
| Project | Asset | 1:N | CASCADE | [ ] |
| Project | Scene | 1:N | CASCADE | [ ] |
| Scene | SceneAsset | 1:N | CASCADE | [ ] |
| Asset | AssetVersion | 1:N | CASCADE | [ ] |
| Asset | StorageFile | 1:1 | DELETE FILE | [ ] |
| Project | Collaborator | N:N | REMOVE LINK | [ ] |
| User | Project (owned) | 1:N | TRANSFER/ARCHIVE | [ ] |
| User | Collaborator | 1:N | REMOVE LINK | [ ] |
| User | Session | 1:N | CASCADE | [ ] |
Orphan Detection After Delete:
-- Check for orphaned assets (no parent project)
SELECT id FROM assets WHERE project_id NOT IN (SELECT id FROM projects);
-- Check for orphaned scenes
SELECT id FROM scenes WHERE project_id NOT IN (SELECT id FROM projects);
-- Check for orphaned storage references
SELECT id, file_path FROM assets WHERE file_path IS NOT NULL
AND NOT EXISTS (SELECT 1 FROM storage_files WHERE path = assets.file_path);
-- Check for dangling foreign keys (if no CASCADE)
SELECT * FROM scene_assets WHERE scene_id NOT IN (SELECT id FROM scenes);
SELECT * FROM scene_assets WHERE asset_id NOT IN (SELECT id FROM assets);
TEST-DL-007: Update Consistency Across Related Records
Objective: Verify that updating a record also updates all derived/cached/related data.
Steps:
- Update an asset's name/metadata.
- Check: is the name updated everywhere it appears?
- In the asset record
- In scene references
- In export manifests
- In search index
- In CDN/cache
Pass Criteria:
- All denormalized copies are updated (or invalidated).
- Search index is updated within 5 minutes.
- CDN/cache is invalidated for updated content.
- Related records referencing by name (not ID) are updated.
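A sketch of an update path that keeps denormalized copies in sync; the store, index, and cache here are plain dicts standing in for the real database, search index, and CDN:

```python
class AssetIndex:
    """Sketch of rename propagation: the denormalized search entry is
    rewritten and the cached payload invalidated in the same operation,
    so no stale copy survives the update."""
    def __init__(self):
        self.records = {}   # source of truth: id -> name
        self.search = {}    # denormalized index: name -> id
        self.cache = {}     # rendered payloads keyed by id

    def create(self, aid, name):
        self.records[aid] = name
        self.search[name] = aid
        self.cache[aid] = f"<card>{name}</card>"

    def rename(self, aid, new_name):
        old = self.records[aid]
        self.records[aid] = new_name
        # Update the denormalized index and invalidate the cache together
        del self.search[old]
        self.search[new_name] = aid
        self.cache.pop(aid, None)
```

The test to write is the one this audit calls for: rename, then assert the old name resolves nowhere and the cache entry is gone, not stale.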
Post-Audit Checklist
[ ] Duplication creates independent copies (no shared references)
[ ] Deletion removes all related records and storage files
[ ] Cascade relationships are defined and tested for every parent-child pair
[ ] Archive excludes from active queries but preserves data
[ ] Restore returns data to pre-archive state completely
[ ] Regeneration preserves previous versions
[ ] Export includes all data (files + metadata)
[ ] Import generates new IDs and updates internal references
[ ] Export/import round-trip produces equivalent data
[ ] Orphan detection runs periodically
[ ] Storage cleanup job exists and runs on schedule
[ ] Soft delete is consistent (all queries respect it)
[ ] Data integrity checks are automated
[ ] Foreign key constraints enforce referential integrity
What Earlier Audits Miss
Standard testing verifies CRUD operations work. This audit matters because:
- Unit tests exercise create and delete independently. They rarely test create-then-delete-then-check-for-orphans.
- Integration tests verify happy-path lifecycle but miss edge cases: what happens when you duplicate a project that was partially generated? What about deleting a project mid-export?
- Code reviews catch missing CASCADE constraints but miss that storage files are not cleaned up by the database cascade.
- QA testing rarely checks storage buckets after deletion. Files accumulate silently for months.
- Schema validation ensures foreign keys exist but not that the ON DELETE action is correct (CASCADE vs SET NULL vs RESTRICT).
This is the gap a Data Lifecycle Audit closes: it specifically tests whether all related records, files, and references are correctly handled under creation, duplication, update, archival, export, import, and deletion.
Extended Orphan Detection Scripts
Storage Orphan Detection
# List all files in storage bucket
gsutil ls -r gs://your-bucket/assets/ > storage_files.txt
# List all file paths referenced in database
psql -c "SELECT file_path FROM assets WHERE file_path IS NOT NULL
UNION ALL
SELECT thumbnail_path FROM assets WHERE thumbnail_path IS NOT NULL
UNION ALL
SELECT file_path FROM asset_versions WHERE file_path IS NOT NULL" \
-t -A > db_references.txt
# Find orphaned storage files (in storage but not in DB)
comm -23 <(sort storage_files.txt) <(sort db_references.txt) > orphaned_files.txt
wc -l orphaned_files.txt
# If count > 0: orphaned files exist -- calculate wasted storage cost
# Find broken references (in DB but not in storage)
comm -13 <(sort storage_files.txt) <(sort db_references.txt) > broken_refs.txt
wc -l broken_refs.txt
# If count > 0: database references point to missing files
Comprehensive Orphan Detection SQL
-- 1. Assets with no parent project
SELECT a.id, a.name, a.file_path, a.created_at
FROM assets a
LEFT JOIN projects p ON a.project_id = p.id
WHERE p.id IS NULL;
-- 2. Scene-asset links pointing to deleted assets or scenes
SELECT sa.id, sa.scene_id, sa.asset_id
FROM scene_assets sa
LEFT JOIN scenes s ON sa.scene_id = s.id
LEFT JOIN assets a ON sa.asset_id = a.id
WHERE s.id IS NULL OR a.id IS NULL;
-- 3. Asset versions orphaned from their parent asset
SELECT av.id, av.asset_id, av.version, av.file_path
FROM asset_versions av
LEFT JOIN assets a ON av.asset_id = a.id
WHERE a.id IS NULL;
-- 4. Collaborator records for deleted users or projects
SELECT c.id, c.user_id, c.project_id
FROM collaborators c
LEFT JOIN users u ON c.user_id = u.id
LEFT JOIN projects p ON c.project_id = p.id
WHERE u.id IS NULL OR p.id IS NULL;
-- 5. Jobs referencing deleted entities
SELECT j.id, j.status, j.entity_type, j.entity_id, j.created_at
FROM jobs j
WHERE (j.entity_type = 'project' AND j.entity_id NOT IN (SELECT id FROM projects))
   OR (j.entity_type = 'asset' AND j.entity_id NOT IN (SELECT id FROM assets));
-- 6. Soft-deleted records still referenced by active records
SELECT 'active_scene_refs_deleted_asset' as issue, COUNT(*)
FROM scene_assets sa
JOIN assets a ON sa.asset_id = a.id
WHERE a.deleted_at IS NOT NULL AND sa.deleted_at IS NULL;
-- 7. Storage size by orphan status
SELECT
CASE WHEN p.id IS NULL THEN 'orphaned' ELSE 'valid' END as status,
COUNT(*) as file_count,
SUM(a.file_size_bytes) / 1024 / 1024 as total_mb
FROM assets a
LEFT JOIN projects p ON a.project_id = p.id
WHERE a.file_path IS NOT NULL
GROUP BY CASE WHEN p.id IS NULL THEN 'orphaned' ELSE 'valid' END;
Automation Opportunities
| Test | Automatable? | Method |
|---|---|---|
| TEST-DL-001: Duplication integrity | YES | Integration test: duplicate, compare record counts and IDs |
| TEST-DL-002: Delete cleanup | PARTIAL | Test DB cleanup (auto); verify storage cleanup (scheduled script) |
| TEST-DL-003: Archive/restore | YES | Integration test: archive, assert excluded from listings, restore, assert present |
| TEST-DL-004: Versioning | YES | Integration test: regenerate, assert both versions exist |
| TEST-DL-005: Export/import | YES | Integration test: export, import, compare field-by-field |
| TEST-DL-006: Cascade delete | YES | Integration test: delete parent, assert no orphaned children |
| TEST-DL-007: Update consistency | PARTIAL | Integration test for DB; manual for CDN/search index |
| Orphan detection | YES | Scheduled cron job running the SQL queries above; alert on count > 0 |
#!/bin/bash
# Orphan detection cron job (run daily); the shebang must be the first line
# crontab: 0 2 * * * /opt/scripts/check_orphans.sh
ORPHAN_COUNT=$(psql -t -A -c "
SELECT COUNT(*) FROM assets
WHERE project_id NOT IN (SELECT id FROM projects WHERE deleted_at IS NULL)
AND deleted_at IS NULL;
")
if [ "$ORPHAN_COUNT" -gt 0 ]; then
  # Slack incoming webhooks expect a JSON payload with a "text" field
  curl -X POST -H 'Content-type: application/json' \
    --data "{\"text\": \"ALERT: Found $ORPHAN_COUNT orphaned assets\"}" \
    https://hooks.slack.com/your-webhook
fi
Reusable Audit Report Template
# Data Lifecycle Audit Report
## System: _______________
## Date: YYYY-MM-DD
## Auditor: _______________
## Entity Relationship Map
[List all parent-child relationships and ON DELETE behavior]
## Test Results
| Test ID | Description | Result | Evidence |
|---------|-------------|--------|----------|
| TEST-DL-001 | Duplication integrity | PASS/FAIL | ___ records shared between original and copy |
| TEST-DL-002 | Delete cleanup | PASS/FAIL | ___ orphaned files after deletion |
| TEST-DL-003 | Archive/restore | PASS/FAIL | ___ fields lost on restore |
| TEST-DL-004 | Versioning | PASS/FAIL | Previous version preserved: yes/no |
| TEST-DL-005 | Export/import | PASS/FAIL | ___ fields mismatched |
| TEST-DL-006 | Cascade delete | PASS/FAIL | ___ orphaned children |
| TEST-DL-007 | Update consistency | PASS/FAIL | ___ stale references found |
## Orphan Detection Results
| Check | Count | Storage Impact |
|-------|-------|---------------|
| Orphaned assets | ___ | ___ MB |
| Broken file references | ___ | N/A |
| Orphaned scenes | ___ | N/A |
| Dangling foreign keys | ___ | N/A |
## Score: PASS / PARTIAL / FAIL
Priority Targeting
Run this audit FIRST if:
- Storage costs are growing faster than user activity
- Users report "phantom" or missing data after operations
- Duplication feature was recently added
- Deletion shows inconsistent behavior
- Export/import feature exists or is planned
- Soft delete was recently implemented
- Database migrations have modified table relationships
Install this skill directly: skilldb add production-audit-skills