Citemark API

A Node.js/TypeScript API server for managing AI workflows in the Citemark application. Built with Fastify and PostgreSQL, featuring JWT authentication and role-based access control.

Features

  • Authentication & Authorization: JWT-based authentication with role-based permissions
  • User Management: User registration, login, and profile management
  • Organization Management: Multi-tenant organization support
  • Project Management: Project creation and management within organizations
  • Role-Based Access Control: Granular permissions system with custom roles
  • AI Workflow Management: Core functionality for managing AI workflows
  • Skills Framework: AI agent execution system with VoltAgent integration, workflow orchestration, MCP support, and shadow mode execution

Technology Stack

  • Runtime: Node.js with TypeScript
  • Framework: Fastify
  • Database: PostgreSQL
  • Authentication: JWT (JSON Web Tokens)
  • Security: Helmet, rate limiting, secure sessions
  • Error Tracking: Sentry integration for production monitoring
  • Testing: Vitest with comprehensive unit and integration tests

API Endpoints

Authentication (/auth)

  • User registration and login
  • JWT token management
  • Password reset functionality

Users (/users)

  • User profile management
  • User listing and search
  • User role assignments

Organizations (/organizations)

  • Organization creation and management
  • Multi-tenant organization support
  • Organization member management

Projects (/projects)

  • Project creation within organizations
  • Project management and configuration
  • Project member assignments

Roles & Permissions (/roles, /permissions)

  • Role-based access control
  • Permission management
  • User role assignments

Files (/files)

  • File upload to projects and organizations
  • File download and management
  • Duplicate Detection: Three-tier detection (exact path+filename, content hash, filename) to prevent duplicate file processing
  • Duplicate Handling Options: When duplicates are detected, users can copy, move, upload as new, replace, or replace with new chunks
  • Text Extraction Integration: Automatically triggers text extraction via message queue when new files are uploaded (not triggered for copies/moves)
  • Signed URLs:
    • Project scope: POST /file-management/projects/:projectId/files/signed-url (checks project access)
    • Organization scope: POST /file-management/organizations/:organizationId/files/signed-url (checks org membership; no project required). Used by the PDF viewer when no project is selected or for org-level files. Returns 401 (no/invalid token), 403 (not an org member), 404 (file or bucket missing), 200 with signedUrl/expiresAt on success.
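
The snippet below is a minimal sketch of requesting an org-scoped signed URL from a frontend client. It only assumes what is documented above; the request body field (fileId) and the response parsing are illustrative assumptions, not the exact schema.

// Hypothetical helper; the fileId body field is an assumption
const getOrgSignedUrl = async ({ organizationId, fileId, token }: {
  organizationId: string
  fileId: string
  token: string
}) => {
  const res = await fetch(`/file-management/organizations/${organizationId}/files/signed-url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ fileId }),
  })
  // 401/403/404 map to the error cases listed above
  if (!res.ok) throw new Error(`Signed URL request failed: ${res.status}`)
  return (await res.json()) as { signedUrl: string; expiresAt: string }
}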

Documents (/documents)

  • Document Status Streaming: GET /documents/:documentId/status/stream - Server-Sent Events (SSE) stream for real-time document extraction status updates
  • Document Status Polling: GET /documents/:documentId/status - REST endpoint to fetch current document extraction status (used as fallback when SSE fails)
  • Document Retry: POST /documents/:documentId/retry - Re-queue a failed document for full extraction processing (both fast and Docling phases)
  • Docling Retry: POST /documents/:documentId/retry-docling - Re-queue a document for Docling enhanced extraction only (Phase 2). Only available when docling_extraction_status = 'docling_failed'
  • Status Updates: Real-time progress updates published every 10% during extraction (10%, 20%, 30%, etc.)
  • Progress Metadata: Includes percent_complete, pages_processed, total_pages, chunks_stored, and stage information
  • Two-Phase Status: Supports fast_extraction_status and docling_extraction_status fields for tracking separate extraction phases
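
As a rough sketch of how a client might consume the streaming and polling endpoints above: open the SSE stream and fall back to polling if it errors. Auth handling is an assumption here (a browser EventSource cannot set an Authorization header), and the 5-second poll interval is illustrative.

const watchDocumentStatus = ({ documentId, onStatus }: {
  documentId: string
  onStatus: (status: unknown) => void
}) => {
  const source = new EventSource(`/documents/${documentId}/status/stream`)
  source.onmessage = (event) => onStatus(JSON.parse(event.data))
  source.onerror = () => {
    // SSE failed: close the stream and fall back to the REST status endpoint
    source.close()
    const poll = async () => {
      const res = await fetch(`/documents/${documentId}/status`)
      if (res.ok) onStatus(await res.json())
    }
    setInterval(poll, 5000) // caller should clear this once extraction reaches a terminal status
  }
  return source
}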

Capabilities (/capabilities)

  • GET /capabilities — returns environment-level feature availability for the frontend (auth required)

Skills (/skills)

  • Skills Framework: AI agent execution system using VoltAgent executor
  • Skill Creation: Create skills via POST /skills/author API endpoint (Expert users). Skills are created directly in the Citemark app, not just in VoltAgent Console.
  • Skill Versions: Create draft versions via POST /skills/:skillId/versions, update via PATCH /skills/:skillId/versions/:versionId, and publish via POST /skills/:skillId/versions/:versionId/publish
  • Skill Execution: POST /skills/:skillId/run — Execute skills with SSE streaming using VoltAgent executor
  • Tool Steps: Tool steps use LLM-driven invocation where the LLM decides whether to call tools based on constructed prompts
  • Workflow Management: Suspend/resume workflows with checkpoint support for human-in-the-loop review
  • MCP Integration: Model Context Protocol support for external tool access
  • VoltAgent Docker Service: Separate Docker container (voltagent service in docker-compose.yml) running standalone Hono server for VoltAgent Console integration
  • See docs/planning/voltagent-integration-plan.md for detailed architecture
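
Because the run endpoint streams SSE over a POST response, a browser client would read the response body directly rather than use EventSource. A minimal sketch, assuming a JSON request body of { input } (the actual schema may differ):

const runSkill = async ({ skillId, input, token }: {
  skillId: string
  input: Record<string, unknown>
  token: string
}) => {
  const res = await fetch(`/skills/${skillId}/run`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'text/event-stream',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ input }),
  })
  if (!res.ok || !res.body) throw new Error(`Skill run failed: ${res.status}`)
  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // Each chunk carries one or more "data: ..." SSE lines
    console.log(decoder.decode(value, { stream: true }))
  }
}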

Conversations (/conversations)

  • Per-user, per-organization conversation persistence (PostgreSQL + localStorage hybrid on the frontend)
  • Endpoints:
    • GET /conversations — list conversations (paginated)
    • POST /conversations — create (enforces per-user/org cap)
    • GET /conversations/:id — fetch with messages (cursor-based pagination)
    • PATCH /conversations/:id — rename
    • DELETE /conversations/:id — soft delete
    • POST /conversations/:id/messages — add message
    • POST /conversations/sync — bulk sync (idempotent upsert, returns maxAcceptedTimestamp)
    • GET /conversations/count — returns current count and cap
  • Security: JWT auth + org ownership checks + RLS (app.current_user_id, app.current_org_id) + per-route rate limits keyed by verified user/org.
  • Sync guardrails: max 100 messages per request; conflict-safe upserts; updated_at only bumps on actual inserts/updates or deletes.
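
A minimal sketch of the bulk sync call, assuming a { conversations } request body; the exact payload schema is not documented here, so the shapes below are illustrative.

const syncConversations = async ({ conversations, token }: {
  conversations: Array<{ id: string; title: string; messages: unknown[] }>
  token: string
}) => {
  // Guardrail from the list above: keep each request under 100 messages total
  const res = await fetch('/conversations/sync', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ conversations }),
  })
  if (!res.ok) throw new Error(`Sync failed: ${res.status}`)
  return (await res.json()) as { maxAcceptedTimestamp: string }
}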

Getting Started

Prerequisites

  • Docker Desktop (no Node.js required on host)
  • PostgreSQL database (can be containerized or external)

Development Setup

This service runs in a Docker container with bind mounts for hot reloading.

  1. Start containers from the root directory:

    docker compose up -d
  2. Access the API at http://localhost:3000 (Swagger UI at /docs when enabled)

Hot Reloading

Code changes on the host are immediately reflected in the container. TypeScript files are automatically recompiled using tsx watch:

  • Edit files in workspace-api/ on your host machine
  • Changes are automatically detected by tsx watch
  • Server restarts automatically with new code

Environment Variables

Configured in root .env file (unified for all services). See root example.env for all available variables.

Database Authentication

The application uses standard PostgreSQL username/password authentication with SSL:

  • Environment Variables Required:

    • POSTGRES_HOST: Database host
    • POSTGRES_PORT: Database port (default: 25060)
    • POSTGRES_USER: Database username (e.g., doadmin)
    • POSTGRES_PASSWORD: Database password
    • POSTGRES_DATABASE: Database name (default: coda_app)
    • POSTGRES_SSLMODE: SSL mode (default: require)
    • POSTGRES_CA_CERT: Optional path to CA certificate for SSL verification
  • Usage: Set these variables in your .env file for both development and production
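
A minimal sketch of a pg connection pool built from these variables (assuming the node-postgres client; the repo's actual connection helper may differ):

import { readFileSync } from 'node:fs'
import { Pool } from 'pg'

const {
  POSTGRES_HOST,
  POSTGRES_PORT = '25060',
  POSTGRES_USER,
  POSTGRES_PASSWORD,
  POSTGRES_DATABASE = 'coda_app',
  POSTGRES_SSLMODE = 'require',
  POSTGRES_CA_CERT,
} = process.env

export const pool = new Pool({
  host: POSTGRES_HOST,
  port: Number(POSTGRES_PORT),
  user: POSTGRES_USER,
  password: POSTGRES_PASSWORD,
  database: POSTGRES_DATABASE,
  ssl: POSTGRES_SSLMODE === 'disable'
    ? false
    : POSTGRES_CA_CERT
      ? { ca: readFileSync(POSTGRES_CA_CERT, 'utf8') } // verify against the provided CA
      : { rejectUnauthorized: false }, // no CA provided: accept the server certificate (assumption)
})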

API Documentation (Swagger/OpenAPI)

  • The API exposes OpenAPI docs via Swagger when enabled.
  • Default behavior: enabled in non-production or when WORKSPACE_API_DOCS_ENABLED=true.
  • UI route prefix is configurable with WORKSPACE_API_DOCS_ROUTE (default /docs).

Environment variables:

WORKSPACE_API_DOCS_ENABLED=true   # Force enable in any environment (default true in non-prod)
WORKSPACE_API_DOCS_ROUTE=/docs # Route for Swagger UI

Once the server is running:

  • Swagger UI: http://localhost:3000/docs
  • Raw OpenAPI JSON: http://localhost:3000/docs/json
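
The docs wiring likely looks roughly like the sketch below, using @fastify/swagger and @fastify/swagger-ui; the actual registration in this codebase may differ in options and placement.

import Fastify from 'fastify'
import swagger from '@fastify/swagger'
import swaggerUi from '@fastify/swagger-ui'

const { WORKSPACE_API_DOCS_ENABLED, WORKSPACE_API_DOCS_ROUTE = '/docs', NODE_ENV } = process.env
const docsEnabled = NODE_ENV !== 'production' || WORKSPACE_API_DOCS_ENABLED === 'true'

const app = Fastify()
if (docsEnabled) {
  await app.register(swagger, { openapi: { info: { title: 'Citemark API', version: '1.0.0' } } })
  await app.register(swaggerUi, { routePrefix: WORKSPACE_API_DOCS_ROUTE })
}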

Database Setup

The database schema is managed via database/postgres_run.ts script:

Full Setup (Schema + Seed Data):

. database/init.sh

Or using npm:

npm run db:init

Schema Only:

npm run db:schema

Seed Data Only:

npm run db:seed

Schema Features:

  • Idempotent Execution: Can be run multiple times safely - existing constraints and objects are skipped
  • Dual Schema Support: Creates both staging (app_data_*) and test (test_app_data_*) schemas
  • Extracted Constraints: All constraints are defined separately with existence checks for safe re-runs
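
The existence-check pattern looks roughly like the sketch below, expressed as a TypeScript helper around node-postgres; the table and constraint names are hypothetical, and the real script may structure this differently.

// Hypothetical example of an idempotent constraint: safe to run repeatedly
import { pool } from './db' // assumed shared pg Pool

export const addUsersEmailUnique = async () => {
  await pool.query(`
    DO $$
    BEGIN
      IF NOT EXISTS (
        SELECT 1 FROM pg_constraint WHERE conname = 'app_data_users_email_key'
      ) THEN
        ALTER TABLE app_data_users
          ADD CONSTRAINT app_data_users_email_key UNIQUE (email);
      END IF;
    END $$;
  `)
}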

Required Environment Variables

Set the database variables described under Database Authentication above, then:

  1. Set up the database:

. database/init.sh

  2. Build and start the application:

npm run build
npm start

Development Guidelines

Code Style

  • Use arrow functions wherever possible
  • Extract environment variables at the top of scripts
  • Pass objects to functions and destructure parameters instead of multiple arguments
  • Export constants using export const
  • Use single-line syntax when it improves readability

API Structure

  • API endpoints are organized in versioned folders (e.g., /routes/)
  • New environment variables should be added to /.env with placeholder values
  • Follow the established patterns for authentication and error handling

Example Function Structure

// Good: Arrow function with object destructuring
export const createUser = async ({ email, firstName, lastName, ...additionalData }) => {
  // Implementation
}

// Good: Single-line conditional
if (userExists) return existingUser

// Good: Environment variable extraction
const { NODE_ENV, DATABASE_URL } = process.env

Storage Integration

The application uses S3-compatible object storage for file storage. Storage operations use environment variables for consistent configuration.

Storage Configuration

Set the following environment variables in your .env file:

SPACES_ENDPOINT=your-spaces-endpoint
SPACES_REGION=nyc3
SPACES_KEY_ID=your-access-key-id
SPACES_SECRET_KEY=your-secret-access-key
SPACES_BUCKET_PREFIX=coda-org-
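
A minimal sketch of an S3-compatible client driven by these variables, assuming @aws-sdk/client-s3 and per-organization buckets named with SPACES_BUCKET_PREFIX; the repo's real storage helper may differ.

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const { SPACES_ENDPOINT, SPACES_REGION, SPACES_KEY_ID, SPACES_SECRET_KEY, SPACES_BUCKET_PREFIX } = process.env

export const storageClient = new S3Client({
  endpoint: SPACES_ENDPOINT, // assumed to be a full URL, e.g. https://nyc3.digitaloceanspaces.com
  region: SPACES_REGION,
  credentials: {
    accessKeyId: SPACES_KEY_ID ?? '',
    secretAccessKey: SPACES_SECRET_KEY ?? '',
  },
})

// Illustrative upload into an organization's bucket
export const putObject = async ({ organizationId, key, body }: {
  organizationId: string
  key: string
  body: Buffer
}) =>
  storageClient.send(new PutObjectCommand({
    Bucket: `${SPACES_BUCKET_PREFIX}${organizationId}`,
    Key: key,
    Body: body,
  }))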

Available Scripts

Testing and Validation

  • npm run test:storage - Test object storage operations
  • npm run test:db - Test database connection

Scripts

  • npm run build - Compile TypeScript to JavaScript
  • npm start - Start the production server
  • npm run db:init or . database/init.sh - Set up database schema and initial data (full setup)
  • npm run db:schema - Set up database schema only
  • npm run db:seed - Insert seed data only

Testing Scripts

  • npm test - Run all tests in watch mode
  • npm run test:run - Run all tests once
  • npm run test:unit - Run unit tests only (no database required)
  • npm run test:integration - Run integration tests (requires PostgreSQL)
  • npm run test:coverage - Run tests with coverage report
  • npm run test:auth - Run authentication tests
  • npm run test:users - Run user model tests

Database

The application uses PostgreSQL with a comprehensive schema for:

  • User management and authentication
  • Organization and project structure
  • Role-based permissions system
  • AI workflow data

Schema Management

The database schema is defined in database/postgres.sql and deployed via database/postgres_run.ts:

Schema Structure:

  • Staging Schema: app_data_* tables (production data)
  • Test Schema: test_app_data_* tables (test data)
  • Idempotent Constraints: All constraints are extracted into separate ALTER TABLE statements with existence checks

Deployment:

# Full setup (schema + seed data)
npm run db:init

# Schema only
npm run db:schema

# Seed data only
npm run db:seed

Connection Configuration:

  • Development: Uses password authentication when POSTGRES_USER and POSTGRES_PASSWORD are set
  • Production: Uses IAM authentication with service account credentials
  • Automatic Selection: Authentication method is automatically selected based on environment variables

See the Database Authentication section above for detailed connection setup.

Error Tracking with Sentry

The application includes Sentry integration for production error monitoring and performance tracking.

Configuration

Sentry is configured via environment variables in your .env file:

# Enable/disable Sentry
SENTRY_ENABLED=true

# Your Sentry DSN (get this from your Sentry project settings)
SENTRY_DSN=https://your-dsn-here@o4510131393986560.ingest.us.sentry.io/4510131396280320

# Optional: Release version for tracking
SENTRY_RELEASE=v1.0.0
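
A minimal sketch of how these variables could drive initialization with @sentry/node; the actual setup in this codebase (including any Fastify-specific hooks) may differ.

import * as Sentry from '@sentry/node'

const { SENTRY_ENABLED, SENTRY_DSN, SENTRY_RELEASE, NODE_ENV } = process.env

if (SENTRY_ENABLED === 'true' && SENTRY_DSN) {
  Sentry.init({
    dsn: SENTRY_DSN,
    release: SENTRY_RELEASE,
    environment: NODE_ENV,
    // Illustrative sampling rates: lower in production to protect quota
    tracesSampleRate: NODE_ENV === 'production' ? 0.1 : 1.0,
  })
}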

Features

  • Automatic Error Capture: All unhandled errors are automatically sent to Sentry
  • Performance Monitoring: Request tracing and performance metrics
  • Custom Error Context: Request body, query parameters, and user context are included
  • Environment-Aware: Different sampling rates for development vs production
  • Security: PII data is only sent in non-production environments

Testing Sentry Integration

In development mode, you can test Sentry error tracking by visiting:

GET /test-sentry

This endpoint will throw a test error that should appear in your Sentry dashboard.

Production Considerations

  • Set SENTRY_ENABLED=true in production
  • Configure appropriate sampling rates for your traffic volume
  • Set up Sentry release tracking for better error correlation
  • Monitor Sentry quotas and adjust sampling rates as needed

Testing

This project includes a comprehensive testing suite built with Vitest:

Test Structure

  • Unit Tests (tests/unit/) - Test individual functions with mocks
  • Integration Tests (tests/integration/) - Test API endpoints and database operations
  • Test Utilities (tests/utils/) - Database helpers, mocks, and fixtures
  • Test Fixtures (tests/fixtures/) - Consistent test data

Key Features

  • Database Testing - Real PostgreSQL with automatic cleanup
  • Authentication Testing - Complete auth flow testing
  • CRUD Testing - Database operations with proper cleanup
  • Coverage Reporting - 70% minimum coverage requirement
  • CI/CD Integration - Automated testing on GitHub Actions

Running Tests

# Unit tests (no database required)
npm run test:unit

# Integration tests (requires PostgreSQL)
npm run test:integration

# All tests with coverage
npm run test:coverage
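
A unit test in this suite would look roughly like the sketch below; the module under test and its import path are illustrative assumptions.

import { describe, it, expect } from 'vitest'
import { createUser } from '../../src/models/users' // hypothetical path

describe('createUser', () => {
  it('rejects an invalid email', async () => {
    await expect(
      createUser({ email: 'not-an-email', firstName: 'Test', lastName: 'User' })
    ).rejects.toThrow()
  })
})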

File Upload & Duplicate Handling

The file upload system includes intelligent duplicate detection and handling to optimize storage and processing.

Duplicate Detection

The system uses a three-tier duplicate detection approach, checking in this order:

  1. Exact Path + Filename Match: Checks if a file with the same path and filename already exists

    • If found, also checks if the content hash matches
    • If content matches: Returns success immediately (file already uploaded)
    • If content doesn't match: Treated as filename duplicate
  2. Organization-Wide Content Hash Match: If no exact match, checks for same content anywhere in the organization

    • Compares SHA-256 content hash across all documents in the organization
    • Detects if duplicate exists in a different location
  3. Organization-Wide Filename Match: If no content match, checks for same filename anywhere in the organization

    • Compares original filename across all documents in the organization
    • Detects if a file with the same name exists in a different location

Duplicate Types:

  • exact_match: Same path and filename (with optional content match indicator)
  • content: Same file content (hash match) anywhere in the organization
  • filename: Same filename anywhere in the organization (different content)

Handling Duplicates

When a duplicate is detected, the API returns a 409 Conflict response with duplicate information. The frontend presents the user with options based on the duplicate type:

Exact Match with Same Content

When a file with the same path, filename, and content is found:

  • No upload needed: Returns success immediately with existing file information
  • No user action required: Frontend shows a simple notice that the file is already uploaded
  • Processing status: completed (already processed)

Content Match (Different Location)

When a file with the same content exists in a different location:

  • Copy File (?copy=true): Creates a copy of the file in the new location

    • The file is copied in object storage with all metadata intact
    • The existing document record is updated to point to the new location (same document ID)
    • No text extraction is triggered - uses the same document record and chunks
    • The original file remains in its original location
  • Move File (?move=true): Moves the file to the new location

    • The file is moved in object storage (original location deleted)
    • The existing document record is updated to point to the new location (same document ID)
    • No text extraction is triggered - uses the same document record and chunks
    • The original file is removed from its original location

Filename Match (Different Location)

When a file with the same filename exists anywhere in the organization:

  • Upload as New (?upload_as_new=true): Uploads with a modified filename (e.g., filename-1.pdf)

    • Creates a new document record
    • Text extraction is triggered for the new file
    • Original file remains unchanged
  • Replace Original (?replace=true): Replaces the existing file

    • Old file version is preserved by object storage versioning (automatic)
    • The old document record is archived in the database
    • A new file is uploaded and processed
    • Text extraction is triggered for the new file
  • Duplicate & Replace (?replace_with_new_chunks=true): Uploads new file, processes it, then updates old document

    • New file is uploaded and processed (chunked)
    • After successful processing, old document record is updated to point to new file
    • Old chunks are deleted and replaced with new chunks
    • Old file version is preserved by object storage versioning (automatic)
    • Preserves old document ID - keeps existing references intact

API Endpoints

Project Upload:

POST /projects/:projectId/upload?copy=true|move=true|replace=true|upload_as_new=true|replace_with_new_chunks=true&subfolders=folder1/folder2

Organization Upload:

POST /organizations/:organizationId/upload?copy=true|move=true|replace=true|upload_as_new=true|replace_with_new_chunks=true&subfolders=folder1/folder2

Note: Files are stored using only the user-created folder structure (no automatic organization/ or projects/ prefixes). The subfolders query parameter accepts a path string (e.g., folder1/folder2) specifying the target folder structure.
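
A minimal sketch of a project upload that resolves a filename duplicate by replacing the original; the multipart field name ("file") is an assumption, and the subfolders value is the example path from the note above.

const uploadWithReplace = async ({ projectId, file, token }: {
  projectId: string
  file: File
  token: string
}) => {
  const form = new FormData()
  form.append('file', file)
  const res = await fetch(
    `/projects/${projectId}/upload?replace=true&subfolders=folder1/folder2`,
    { method: 'POST', headers: { Authorization: `Bearer ${token}` }, body: form },
  )
  if (res.status === 409) return res.json() // duplicate details, see Response Format below
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`)
  return res.json()
}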

Response Format

Duplicate Detected (409 Conflict):

{
  "success": false,
  "error": "duplicate_file",
  "duplicateType": "exact_match" | "content" | "filename",
  "contentMatches": true,
  "existingFile": {
    "documentId": "uuid",
    "storagePath": "bucket/path/to/file",
    "filename": "original-filename.pdf",
    "uploadedAt": "2025-01-15T10:30:00Z",
    "projectId": "uuid" | null
  },
  "isDifferentLocation": true,
  "targetLocation": {
    "projectId": "uuid" | null,
    "storagePath": "gs://bucket/target/path"
  }
}

Successful Exact Match (Same Content):

{
  "success": true,
  "file": {
    "id": "document-uuid",
    "filename": "original-filename.pdf",
    "fileUrl": "bucket/path/to/file",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "completed"
  }
}

Successful Copy:

{
  "success": true,
  "file": {
    "id": "document-uuid",
    "filename": "unique-filename.pdf",
    "fileUrl": "gs://bucket/new/path",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "queued"
  }
}

Successful Move:

{
  "success": true,
  "file": {
    "id": "document-uuid",
    "filename": "unique-filename.pdf",
    "fileUrl": "gs://bucket/new/path",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "queued"
  }
}

Successful Replace:

{
  "success": true,
  "file": {
    "id": "new-document-uuid",
    "filename": "unique-filename.pdf",
    "fileUrl": "gs://bucket/new/path",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "queued"
  }
}

Successful Replace with New Chunks:

{
  "success": true,
  "file": {
    "id": "old-document-uuid",
    "filename": "unique-filename.pdf",
    "fileUrl": "gs://bucket/new/path",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "completed"
  }
}

Implementation Details

  • Hash Algorithm: SHA-256 content hashing for duplicate detection
  • Storage: Files stored in object storage with unique filenames to prevent conflicts
  • Database: Document records in text_extractor_documents table
  • Archiving: Replaced document records are archived in text_extractor_documents_archive table for audit trail. Old file versions are automatically preserved by object storage versioning.
  • Text Extraction:
    • Only triggered for new uploads, not for copies/moves (which reuse existing document records)
    • Triggered for replace operations (new file needs processing)
    • Triggered for replace_with_new_chunks (new file is processed, then old document is updated)
  • Chunk Management:
    • Copy/move operations preserve existing chunks (same document ID)
    • Replace operations create new chunks (new document ID)
    • Replace with new chunks updates old document to use new chunks (preserves document ID)

Contributing

  1. Follow the established coding standards
  2. Use TypeScript for all new code
  3. Maintain API versioning structure
  4. Add appropriate error handling and validation
  5. ALWAYS add tests for new functionality - See .cursor/rules/testing.mdc
  6. Update documentation for new features

License

ISC