Citemark API

A Node.js/TypeScript API server for managing AI workflows in the Citemark application. Built with Fastify and PostgreSQL, featuring JWT authentication and role-based access control.

Features

  • Authentication & Authorization: JWT-based authentication with role-based permissions
  • User Management: User registration, login, and profile management
  • Organization Management: Multi-tenant organization support
  • Project Management: Project creation and management within organizations
  • Role-Based Access Control: Granular permissions system with custom roles
  • AI Workflow Management: Core functionality for managing AI workflows
  • Skills Framework: AI agent execution system with VoltAgent integration, workflow orchestration, MCP support, and shadow mode execution

Technology Stack

  • Runtime: Node.js with TypeScript
  • Framework: Fastify
  • Database: PostgreSQL
  • Authentication: JWT (JSON Web Tokens)
  • Security: Helmet, rate limiting, secure sessions
  • Error Tracking: Sentry integration for production monitoring
  • Testing: Vitest with comprehensive unit and integration tests

API Endpoints

Authentication (/auth)

  • User registration and login
  • JWT token management
  • Password reset functionality

Users (/users)

  • User profile management
  • User listing and search
  • User role assignments

Organizations (/organizations)

  • Organization creation and management
  • Multi-tenant organization support
  • Organization member management

Projects (/projects)

  • Project creation within organizations
  • Project management and configuration
  • Project member assignments

Roles & Permissions (/roles, /permissions)

  • Role-based access control
  • Permission management
  • User role assignments

Files (/files)

  • File upload to projects and organizations
  • File download and management
  • Duplicate Detection: Three-tier detection (exact path+filename, content hash, filename) to prevent duplicate file processing
  • Duplicate Handling Options: When duplicates are detected, users can copy, move, upload as new, replace, or replace with new chunks
  • Text Extraction Integration: Automatically triggers text extraction via message queue when new files are uploaded (not triggered for copies/moves)
  • Signed URLs:
    • Project scope: POST /file-management/projects/:projectId/files/signed-url (checks project access)
    • Organization scope: POST /file-management/organizations/:organizationId/files/signed-url (checks org membership; no project required). Used by the PDF viewer when no project is selected or for org-level files. Returns 401 (no/invalid token), 403 (not an org member), 404 (file or bucket missing), 200 with signedUrl/expiresAt on success.
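
The snippet below is a minimal sketch of requesting an org-scoped signed URL from a frontend client. It only assumes what is documented above; the request body field (fileId) and the response parsing are illustrative assumptions, not the exact schema.

// Hypothetical helper; the fileId body field is an assumption
const getOrgSignedUrl = async ({ organizationId, fileId, token }: {
  organizationId: string
  fileId: string
  token: string
}) => {
  const res = await fetch(`/file-management/organizations/${organizationId}/files/signed-url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ fileId }),
  })
  // 401/403/404 map to the error cases listed above
  if (!res.ok) throw new Error(`Signed URL request failed: ${res.status}`)
  return (await res.json()) as { signedUrl: string; expiresAt: string }
}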

Documents (/documents)

  • Document Status Streaming: GET /documents/:documentId/status/stream - Server-Sent Events (SSE) stream for real-time document extraction status updates
  • Document Status Polling: GET /documents/:documentId/status - REST endpoint to fetch current document extraction status (used as fallback when SSE fails)
  • Document Retry: POST /documents/:documentId/retry - Re-queue a failed document for full extraction processing (both fast and Docling phases)
  • Docling Retry: POST /documents/:documentId/retry-docling - Re-queue a document for Docling enhanced extraction only (Phase 2). Only available when docling_extraction_status = 'docling_failed'
  • Status Updates: Real-time progress updates published every 10% during extraction (10%, 20%, 30%, etc.)
  • Progress Metadata: Includes percent_complete, pages_processed, total_pages, chunks_stored, and stage information
  • Two-Phase Status: Supports fast_extraction_status and docling_extraction_status fields for tracking separate extraction phases
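
As a rough sketch of how a client might consume the streaming and polling endpoints above: open the SSE stream and fall back to polling if it errors. Auth handling is an assumption here (a browser EventSource cannot set an Authorization header), and the 5-second poll interval is illustrative.

const watchDocumentStatus = ({ documentId, onStatus }: {
  documentId: string
  onStatus: (status: unknown) => void
}) => {
  const source = new EventSource(`/documents/${documentId}/status/stream`)
  source.onmessage = (event) => onStatus(JSON.parse(event.data))
  source.onerror = () => {
    // SSE failed: close the stream and fall back to the REST status endpoint
    source.close()
    const poll = async () => {
      const res = await fetch(`/documents/${documentId}/status`)
      if (res.ok) onStatus(await res.json())
    }
    setInterval(poll, 5000) // caller should clear this once extraction reaches a terminal status
  }
  return source
}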

Capabilities (/capabilities)

  • GET /capabilities — returns environment-level feature availability for the frontend (auth required)

Skills (/skills)

  • Skills Framework: AI agent execution system using VoltAgent executor
  • Skill Creation: Create skills via POST /skills/author API endpoint (Expert users). Skills are created directly in the Citemark app, not just in VoltAgent Console.
  • Skill Versions: Create draft versions via POST /skills/:skillId/versions, update via PATCH /skills/:skillId/versions/:versionId, and publish via POST /skills/:skillId/versions/:versionId/publish
  • Skill Execution: POST /skills/:skillId/run — Execute skills with SSE streaming using VoltAgent executor
  • Tool Steps: Tool steps use LLM-driven invocation where the LLM decides whether to call tools based on constructed prompts
  • Workflow Management: Suspend/resume workflows with checkpoint support for human-in-the-loop review
  • MCP Integration: Model Context Protocol support for external tool access
  • VoltAgent Docker Service: Separate Docker container (voltagent service in docker-compose.yml) running standalone Hono server for VoltAgent Console integration
  • See docs/planning/voltagent-integration-plan.md for detailed architecture
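
Because the run endpoint streams SSE over a POST response, a browser client would read the response body directly rather than use EventSource. A minimal sketch, assuming a JSON request body of { input } (the actual schema may differ):

const runSkill = async ({ skillId, input, token }: {
  skillId: string
  input: Record<string, unknown>
  token: string
}) => {
  const res = await fetch(`/skills/${skillId}/run`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'text/event-stream',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ input }),
  })
  if (!res.ok || !res.body) throw new Error(`Skill run failed: ${res.status}`)
  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // Each chunk carries one or more "data: ..." SSE lines
    console.log(decoder.decode(value, { stream: true }))
  }
}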

Conversations (/conversations)

  • Per-user, per-organization conversation persistence (PostgreSQL + localStorage hybrid on the frontend)
  • Endpoints:
    • GET /conversations — list conversations (paginated)
    • POST /conversations — create (enforces per-user/org cap)
    • GET /conversations/:id — fetch with messages (cursor-based pagination)
    • PATCH /conversations/:id — rename
    • DELETE /conversations/:id — soft delete
    • POST /conversations/:id/messages — add message
    • POST /conversations/sync — bulk sync (idempotent upsert, returns maxAcceptedTimestamp)
    • GET /conversations/count — returns current count and cap
  • Security: JWT auth + org ownership checks + RLS (app.current_user_id, app.current_org_id) + per-route rate limits keyed by verified user/org.
  • Sync guardrails: max 100 messages per request; conflict-safe upserts; updated_at only bumps on actual inserts/updates or deletes.
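
A minimal sketch of the bulk sync call, assuming a { conversations } request body; the exact payload schema is not documented here, so the shapes below are illustrative.

const syncConversations = async ({ conversations, token }: {
  conversations: Array<{ id: string; title: string; messages: unknown[] }>
  token: string
}) => {
  // Guardrail from the list above: keep each request under 100 messages total
  const res = await fetch('/conversations/sync', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ conversations }),
  })
  if (!res.ok) throw new Error(`Sync failed: ${res.status}`)
  return (await res.json()) as { maxAcceptedTimestamp: string }
}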

Getting Started

Prerequisites

  • Docker Desktop (no Node.js required on host)
  • PostgreSQL database (can be containerized or external)

Development Setup

This service runs in a Docker container with bind mounts for hot reloading.

  1. Start containers from the root directory:

    docker compose up -d
  2. Access the API at http://localhost:3000 (Swagger UI at /docs when enabled)

Hot Reloading

Code changes on the host are immediately reflected in the container. TypeScript files are automatically recompiled using tsx watch:

  • Edit files in workspace-api/ on your host machine
  • Changes are automatically detected by tsx watch
  • Server restarts automatically with new code

Environment Variables

Configured in root .env file (unified for all services). See root example.env for all available variables.

Database Authentication

The application uses standard PostgreSQL username/password authentication with SSL:

  • Environment Variables Required:

    • POSTGRES_HOST: Database host
    • POSTGRES_PORT: Database port (default: 25060)
    • POSTGRES_USER: Database username (e.g., doadmin)
    • POSTGRES_PASSWORD: Database password
    • POSTGRES_DATABASE: Database name (default: coda_app)
    • POSTGRES_SSLMODE: SSL mode (default: require)
    • POSTGRES_CA_CERT: Optional path to CA certificate for SSL verification
  • Usage: Set these variables in your .env file for both development and production
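
A minimal sketch of a pg connection pool built from these variables (assuming the node-postgres client; the repo's actual connection helper may differ):

import { readFileSync } from 'node:fs'
import { Pool } from 'pg'

const {
  POSTGRES_HOST,
  POSTGRES_PORT = '25060',
  POSTGRES_USER,
  POSTGRES_PASSWORD,
  POSTGRES_DATABASE = 'coda_app',
  POSTGRES_SSLMODE = 'require',
  POSTGRES_CA_CERT,
} = process.env

export const pool = new Pool({
  host: POSTGRES_HOST,
  port: Number(POSTGRES_PORT),
  user: POSTGRES_USER,
  password: POSTGRES_PASSWORD,
  database: POSTGRES_DATABASE,
  ssl: POSTGRES_SSLMODE === 'disable'
    ? false
    : POSTGRES_CA_CERT
      ? { ca: readFileSync(POSTGRES_CA_CERT, 'utf8') } // verify against the provided CA
      : { rejectUnauthorized: false }, // no CA provided: accept the server certificate (assumption)
})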

API Documentation (Swagger/OpenAPI)

  • The API exposes OpenAPI docs via Swagger when enabled.
  • Default behavior: enabled in non-production or when WORKSPACE_API_DOCS_ENABLED=true.
  • UI route prefix is configurable with WORKSPACE_API_DOCS_ROUTE (default /docs).

Environment variables:

WORKSPACE_API_DOCS_ENABLED=true   # Force enable in any environment (default true in non-prod)
WORKSPACE_API_DOCS_ROUTE=/docs # Route for Swagger UI

Once the server is running:

  • Swagger UI: http://localhost:3000/docs
  • Raw OpenAPI JSON: http://localhost:3000/docs/json
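
The docs wiring likely looks roughly like the sketch below, using @fastify/swagger and @fastify/swagger-ui; the actual registration in this codebase may differ in options and placement.

import Fastify from 'fastify'
import swagger from '@fastify/swagger'
import swaggerUi from '@fastify/swagger-ui'

const { WORKSPACE_API_DOCS_ENABLED, WORKSPACE_API_DOCS_ROUTE = '/docs', NODE_ENV } = process.env
const docsEnabled = NODE_ENV !== 'production' || WORKSPACE_API_DOCS_ENABLED === 'true'

const app = Fastify()
if (docsEnabled) {
  await app.register(swagger, { openapi: { info: { title: 'Citemark API', version: '1.0.0' } } })
  await app.register(swaggerUi, { routePrefix: WORKSPACE_API_DOCS_ROUTE })
}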

Database Setup

The database schema is managed via database/postgres_run.ts script:

Full Setup (Schema + Seed Data):

. database/init.sh

Or using npm:

npm run db:init

Schema Only:

npm run db:schema

Seed Data Only:

npm run db:seed

Schema Features:

  • Idempotent Execution: Can be run multiple times safely - existing constraints and objects are skipped
  • Dual Schema Support: Creates both staging (app_data_*) and test (test_app_data_*) schemas
  • Extracted Constraints: All constraints are defined separately with existence checks for safe re-runs
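
The existence-check pattern looks roughly like the sketch below, expressed as a TypeScript helper around node-postgres; the table and constraint names are hypothetical, and the real script may structure this differently.

// Hypothetical example of an idempotent constraint: safe to run repeatedly
import { pool } from './db' // assumed shared pg Pool

export const addUsersEmailUnique = async () => {
  await pool.query(`
    DO $$
    BEGIN
      IF NOT EXISTS (
        SELECT 1 FROM pg_constraint WHERE conname = 'app_data_users_email_key'
      ) THEN
        ALTER TABLE app_data_users
          ADD CONSTRAINT app_data_users_email_key UNIQUE (email);
      END IF;
    END $$;
  `)
}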

Required Environment Variables

Set the database variables described under Database Authentication above, then:

  1. Set up the database:

. database/init.sh

  2. Build and start the application:

npm run build
npm start

Development Guidelines

Code Style

  • Use arrow functions wherever possible
  • Extract environment variables at the top of scripts
  • Pass objects to functions and destructure parameters instead of multiple arguments
  • Export constants using export const
  • Use single-line syntax when it improves readability

API Structure

  • API endpoints are organized in versioned folders (e.g., /routes/)
  • New environment variables should be added to /.env with placeholder values
  • Follow the established patterns for authentication and error handling

Example Function Structure

// Good: Arrow function with object destructuring
export const createUser = async ({ email, firstName, lastName, ...additionalData }) => {
  // Implementation
}

// Good: Single-line conditional
if (userExists) return existingUser

// Good: Environment variable extraction
const { NODE_ENV, DATABASE_URL } = process.env

Storage Integration

The application uses S3-compatible object storage for file storage. Storage operations use environment variables for consistent configuration.

Storage Configuration

Set the following environment variables in your .env file:

SPACES_ENDPOINT=your-spaces-endpoint
SPACES_REGION=nyc3
SPACES_KEY_ID=your-access-key-id
SPACES_SECRET_KEY=your-secret-access-key
SPACES_BUCKET_PREFIX=coda-org-
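
A minimal sketch of an S3-compatible client driven by these variables, assuming @aws-sdk/client-s3 and per-organization buckets named with SPACES_BUCKET_PREFIX; the repo's real storage helper may differ.

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const { SPACES_ENDPOINT, SPACES_REGION, SPACES_KEY_ID, SPACES_SECRET_KEY, SPACES_BUCKET_PREFIX } = process.env

export const storageClient = new S3Client({
  endpoint: SPACES_ENDPOINT, // assumed to be a full URL, e.g. https://nyc3.digitaloceanspaces.com
  region: SPACES_REGION,
  credentials: {
    accessKeyId: SPACES_KEY_ID ?? '',
    secretAccessKey: SPACES_SECRET_KEY ?? '',
  },
})

// Illustrative upload into an organization's bucket
export const putObject = async ({ organizationId, key, body }: {
  organizationId: string
  key: string
  body: Buffer
}) =>
  storageClient.send(new PutObjectCommand({
    Bucket: `${SPACES_BUCKET_PREFIX}${organizationId}`,
    Key: key,
    Body: body,
  }))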

Available Scripts

Testing and Validation

  • npm run test:storage - Test object storage operations
  • npm run test:db - Test database connection

Scripts

  • npm run build - Compile TypeScript to JavaScript
  • npm start - Start the production server
  • npm run db:init or . database/init.sh - Set up database schema and initial data (full setup)
  • npm run db:schema - Set up database schema only
  • npm run db:seed - Insert seed data only

Testing Scripts

  • npm test - Run all tests in watch mode
  • npm run test:run - Run all tests once
  • npm run test:unit - Run unit tests only (no database required)
  • npm run test:integration - Run integration tests (requires PostgreSQL)
  • npm run test:coverage - Run tests with coverage report
  • npm run test:auth - Run authentication tests
  • npm run test:users - Run user model tests

Database

The application uses PostgreSQL with a comprehensive schema for:

  • User management and authentication
  • Organization and project structure
  • Role-based permissions system
  • AI workflow data

Schema Management

The database schema is defined in database/postgres.sql and deployed via database/postgres_run.ts:

Schema Structure:

  • Staging Schema: app_data_* tables (production data)
  • Test Schema: test_app_data_* tables (test data)
  • Idempotent Constraints: All constraints are extracted into separate ALTER TABLE statements with existence checks

Deployment:

# Full setup (schema + seed data)
npm run db:init

# Schema only
npm run db:schema

# Seed data only
npm run db:seed

Connection Configuration:

  • Development: Uses password authentication when POSTGRES_USER and POSTGRES_PASSWORD are set
  • Production: Uses IAM authentication with service account credentials
  • Automatic Selection: Authentication method is automatically selected based on environment variables

See the Database Authentication section above for detailed connection setup.

Error Tracking with Sentry

The application includes Sentry integration for production error monitoring and performance tracking.

Configuration

Sentry is configured via environment variables in your .env file:

# Enable/disable Sentry
SENTRY_ENABLED=true

# Your Sentry DSN (get this from your Sentry project settings)
SENTRY_DSN=https://your-dsn-here@o4510131393986560.ingest.us.sentry.io/4510131396280320

# Optional: Release version for tracking
SENTRY_RELEASE=v1.0.0
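
A minimal sketch of how these variables could drive initialization with @sentry/node; the actual setup in this codebase (including any Fastify-specific hooks) may differ.

import * as Sentry from '@sentry/node'

const { SENTRY_ENABLED, SENTRY_DSN, SENTRY_RELEASE, NODE_ENV } = process.env

if (SENTRY_ENABLED === 'true' && SENTRY_DSN) {
  Sentry.init({
    dsn: SENTRY_DSN,
    release: SENTRY_RELEASE,
    environment: NODE_ENV,
    // Illustrative sampling rates: lower in production to protect quota
    tracesSampleRate: NODE_ENV === 'production' ? 0.1 : 1.0,
  })
}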

Features

  • Automatic Error Capture: All unhandled errors are automatically sent to Sentry
  • Performance Monitoring: Request tracing and performance metrics
  • Custom Error Context: Request body, query parameters, and user context are included
  • Environment-Aware: Different sampling rates for development vs production
  • Security: PII data is only sent in non-production environments

Testing Sentry Integration

In development mode, you can test Sentry error tracking by visiting:

GET /test-sentry

This endpoint will throw a test error that should appear in your Sentry dashboard.

Production Considerations

  • Set SENTRY_ENABLED=true in production
  • Configure appropriate sampling rates for your traffic volume
  • Set up Sentry release tracking for better error correlation
  • Monitor Sentry quotas and adjust sampling rates as needed

Testing

This project includes a comprehensive testing suite built with Vitest:

Test Structure

  • Unit Tests (tests/unit/) - Test individual functions with mocks
  • Integration Tests (tests/integration/) - Test API endpoints and database operations
  • Test Utilities (tests/utils/) - Database helpers, mocks, and fixtures
  • Test Fixtures (tests/fixtures/) - Consistent test data

Key Features

  • Database Testing - Real PostgreSQL with automatic cleanup
  • Authentication Testing - Complete auth flow testing
  • CRUD Testing - Database operations with proper cleanup
  • Coverage Reporting - 70% minimum coverage requirement
  • CI/CD Integration - Automated testing on GitHub Actions

Running Tests

# Unit tests (no database required)
npm run test:unit

# Integration tests (requires PostgreSQL)
npm run test:integration

# All tests with coverage
npm run test:coverage
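
A unit test in this suite would look roughly like the sketch below; the module under test and its import path are illustrative assumptions.

import { describe, it, expect } from 'vitest'
import { createUser } from '../../src/models/users' // hypothetical path

describe('createUser', () => {
  it('rejects an invalid email', async () => {
    await expect(
      createUser({ email: 'not-an-email', firstName: 'Test', lastName: 'User' })
    ).rejects.toThrow()
  })
})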

File Upload & Duplicate Handling

The file upload system includes intelligent duplicate detection and handling to optimize storage and processing.

Duplicate Detection

The system uses a three-tier duplicate detection approach, checking in this order:

  1. Exact Path + Filename Match: Checks if a file with the same path and filename already exists

    • If found, also checks if the content hash matches
    • If content matches: Returns success immediately (file already uploaded)
    • If content doesn't match: Treated as filename duplicate
  2. Organization-Wide Content Hash Match: If no exact match, checks for same content anywhere in the organization

    • Compares SHA-256 content hash across all documents in the organization
    • Detects if duplicate exists in a different location
  3. Organization-Wide Filename Match: If no content match, checks for same filename anywhere in the organization

    • Compares original filename across all documents in the organization
    • Detects if a file with the same name exists in a different location

Duplicate Types:

  • exact_match: Same path and filename (with optional content match indicator)
  • content: Same file content (hash match) anywhere in the organization
  • filename: Same filename anywhere in the organization (different content)

Handling Duplicates

When a duplicate is detected, the API returns a 409 Conflict response with duplicate information. The frontend presents the user with options based on the duplicate type:

Exact Match with Same Content

When a file with the same path, filename, and content is found:

  • No upload needed: Returns success immediately with existing file information
  • No user action required: Frontend shows a simple notice that the file is already uploaded
  • Processing status: completed (already processed)

Content Match (Different Location)

When a file with the same content exists in a different location:

  • Copy File (?copy=true): Creates a copy of the file in the new location

    • The file is copied in object storage with all metadata intact
    • The existing document record is updated to point to the new location (same document ID)
    • No text extraction is triggered - uses the same document record and chunks
    • The original file remains in its original location
  • Move File (?move=true): Moves the file to the new location

    • The file is moved in object storage (original location deleted)
    • The existing document record is updated to point to the new location (same document ID)
    • No text extraction is triggered - uses the same document record and chunks
    • The original file is removed from its original location

Filename Match (Different Location)

When a file with the same filename exists anywhere in the organization:

  • Upload as New (?upload_as_new=true): Uploads with a modified filename (e.g., filename-1.pdf)

    • Creates a new document record
    • Text extraction is triggered for the new file
    • Original file remains unchanged
  • Replace Original (?replace=true): Replaces the existing file

    • Old file version is preserved by object storage versioning (automatic)
    • The old document record is archived in the database
    • A new file is uploaded and processed
    • Text extraction is triggered for the new file
  • Duplicate & Replace (?replace_with_new_chunks=true): Uploads new file, processes it, then updates old document

    • New file is uploaded and processed (chunked)
    • After successful processing, old document record is updated to point to new file
    • Old chunks are deleted and replaced with new chunks
    • Old file version is preserved by object storage versioning (automatic)
    • Preserves old document ID - keeps existing references intact

API Endpoints

Project Upload:

POST /projects/:projectId/upload?copy=true|move=true|replace=true|upload_as_new=true|replace_with_new_chunks=true&subfolders=folder1/folder2

Organization Upload:

POST /organizations/:organizationId/upload?copy=true|move=true|replace=true|upload_as_new=true|replace_with_new_chunks=true&subfolders=folder1/folder2

Note: Files are stored using only the user-created folder structure (no automatic organization/ or projects/ prefixes). The subfolders query parameter accepts a path string (e.g., folder1/folder2) specifying the target folder structure.
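
A minimal sketch of a project upload that resolves a filename duplicate by replacing the original; the multipart field name ("file") is an assumption, and the subfolders value is the example path from the note above.

const uploadWithReplace = async ({ projectId, file, token }: {
  projectId: string
  file: File
  token: string
}) => {
  const form = new FormData()
  form.append('file', file)
  const res = await fetch(
    `/projects/${projectId}/upload?replace=true&subfolders=folder1/folder2`,
    { method: 'POST', headers: { Authorization: `Bearer ${token}` }, body: form },
  )
  if (res.status === 409) return res.json() // duplicate details, see Response Format below
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`)
  return res.json()
}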

Response Format

Duplicate Detected (409 Conflict):

{
  "success": false,
  "error": "duplicate_file",
  "duplicateType": "exact_match" | "content" | "filename",
  "contentMatches": true,
  "existingFile": {
    "documentId": "uuid",
    "storagePath": "bucket/path/to/file",
    "filename": "original-filename.pdf",
    "uploadedAt": "2025-01-15T10:30:00Z",
    "projectId": "uuid" | null
  },
  "isDifferentLocation": true,
  "targetLocation": {
    "projectId": "uuid" | null,
    "storagePath": "gs://bucket/target/path"
  }
}

Successful Exact Match (Same Content):

{
  "success": true,
  "file": {
    "id": "document-uuid",
    "filename": "original-filename.pdf",
    "fileUrl": "bucket/path/to/file",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "completed"
  }
}

Successful Copy:

{
  "success": true,
  "file": {
    "id": "document-uuid",
    "filename": "unique-filename.pdf",
    "fileUrl": "gs://bucket/new/path",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "queued"
  }
}

Successful Move:

{
  "success": true,
  "file": {
    "id": "document-uuid",
    "filename": "unique-filename.pdf",
    "fileUrl": "gs://bucket/new/path",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "queued"
  }
}

Successful Replace:

{
  "success": true,
  "file": {
    "id": "new-document-uuid",
    "filename": "unique-filename.pdf",
    "fileUrl": "gs://bucket/new/path",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "queued"
  }
}

Successful Replace with New Chunks:

{
  "success": true,
  "file": {
    "id": "old-document-uuid",
    "filename": "unique-filename.pdf",
    "fileUrl": "gs://bucket/new/path",
    "fileSize": 1024000,
    "contentType": "application/pdf",
    "processingStatus": "completed"
  }
}

Implementation Details

  • Hash Algorithm: SHA-256 content hashing for duplicate detection
  • Storage: Files stored in object storage with unique filenames to prevent conflicts
  • Database: Document records in text_extractor_documents table
  • Archiving: Replaced document records are archived in text_extractor_documents_archive table for audit trail. Old file versions are automatically preserved by object storage versioning.
  • Text Extraction:
    • Only triggered for new uploads, not for copies/moves (which reuse existing document records)
    • Triggered for replace operations (new file needs processing)
    • Triggered for replace_with_new_chunks (new file is processed, then old document is updated)
  • Chunk Management:
    • Copy/move operations preserve existing chunks (same document ID)
    • Replace operations create new chunks (new document ID)
    • Replace with new chunks updates old document to use new chunks (preserves document ID)

Contributing

  1. Follow the established coding standards
  2. Use TypeScript for all new code
  3. Maintain API versioning structure
  4. Add appropriate error handling and validation
  5. ALWAYS add tests for new functionality - See .cursor/rules/testing.mdc
  6. Update documentation for new features

License

ISC