Citemark API
A Node.js/TypeScript API server for managing AI workflows in the Citemark application. Built with Fastify and PostgreSQL, featuring JWT authentication and role-based access control.
Features
- Authentication & Authorization: JWT-based authentication with role-based permissions
- User Management: User registration, login, and profile management
- Organization Management: Multi-tenant organization support
- Project Management: Project creation and management within organizations
- Role-Based Access Control: Granular permissions system with custom roles
- AI Workflow Management: Core functionality for managing AI workflows
- Skills Framework: AI agent execution system with VoltAgent integration, workflow orchestration, MCP support, and shadow mode execution
Technology Stack
- Runtime: Node.js with TypeScript
- Framework: Fastify
- Database: PostgreSQL
- Authentication: JWT (JSON Web Tokens)
- Security: Helmet, rate limiting, secure sessions
- Error Tracking: Sentry integration for production monitoring
- Testing: Vitest with comprehensive unit and integration tests
API Endpoints
Authentication (/auth)
- User registration and login
- JWT token management
- Password reset functionality
Users (/users)
- User profile management
- User listing and search
- User role assignments
Organizations (/organizations)
- Organization creation and management
- Multi-tenant organization support
- Organization member management
Projects (/projects)
- Project creation within organizations
- Project management and configuration
- Project member assignments
Roles & Permissions (/roles, /permissions)
- Role-based access control
- Permission management
- User role assignments
Files (/files)
- File upload to projects and organizations
- File download and management
- Duplicate Detection: Three-tier detection (exact path+filename, content hash, filename) to prevent duplicate file processing
- Duplicate Handling Options: When duplicates are detected, users can copy, move, upload as new, replace, or replace with new chunks
- Text Extraction Integration: Automatically triggers text extraction via message queue when new files are uploaded (not triggered for copies/moves)
- Signed URLs (request sketch below):
  - Project scope: POST /file-management/projects/:projectId/files/signed-url (checks project access)
  - Organization scope: POST /file-management/organizations/:organizationId/files/signed-url (checks org membership; no project required). Used by the PDF viewer when no project is selected or for org-level files. Returns 401 (no/invalid token), 403 (not an org member), 404 (file or bucket missing), or 200 with signedUrl/expiresAt on success.
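For reference, a minimal sketch of requesting a project-scoped signed URL from a client. The signedUrl/expiresAt response fields come from the list above; the fileId request-body field name is an assumption, so check the route schema for the exact shape.
// Minimal sketch: request a signed URL for a project file (assumes a JWT is already available).
// The { fileId } body field is an assumption.
export const getProjectFileSignedUrl = async ({ projectId, fileId, token }: { projectId: string; fileId: string; token: string }) => {
  const res = await fetch(`/file-management/projects/${projectId}/files/signed-url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ fileId }),
  })
  if (!res.ok) throw new Error(`Signed URL request failed: ${res.status}`) // 401/403/404 per the list above
  const { signedUrl, expiresAt } = await res.json()
  return { signedUrl, expiresAt }
}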
Documents (/documents)
- Document Status Streaming: GET /documents/:documentId/status/stream - Server-Sent Events (SSE) stream for real-time document extraction status updates (consumer sketch below)
- Document Status Polling: GET /documents/:documentId/status - REST endpoint to fetch the current document extraction status (used as a fallback when SSE fails)
- Document Retry: POST /documents/:documentId/retry - Re-queue a failed document for full extraction processing (both fast and Docling phases)
- Docling Retry: POST /documents/:documentId/retry-docling - Re-queue a document for Docling enhanced extraction only (Phase 2). Only available when docling_extraction_status = 'docling_failed'
- Status Updates: Real-time progress updates published every 10% during extraction (10%, 20%, 30%, etc.)
- Progress Metadata: Includes percent_complete, pages_processed, total_pages, chunks_stored, and stage information
- Two-Phase Status: Supports fast_extraction_status and docling_extraction_status fields for tracking the two extraction phases separately
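For illustration, a minimal browser-side consumer of the status stream with a polling fallback. Auth handling (cookie vs. token) is omitted, and the payload shape beyond the progress fields listed above is an assumption.
// Minimal sketch: subscribe to document status via SSE, falling back to polling if the stream errors.
export const watchDocumentStatus = ({ documentId, onUpdate }: { documentId: string; onUpdate: (status: unknown) => void }) => {
  const source = new EventSource(`/documents/${documentId}/status/stream`)
  let pollTimer: ReturnType<typeof setInterval> | undefined
  source.onmessage = (event) => onUpdate(JSON.parse(event.data))
  source.onerror = () => {
    // Fallback: switch to polling the REST status endpoint when the stream fails
    source.close()
    pollTimer ??= setInterval(async () => {
      const res = await fetch(`/documents/${documentId}/status`)
      if (res.ok) onUpdate(await res.json())
    }, 5000)
  }
  // Cleanup function stops both the stream and any fallback polling
  return () => { source.close(); if (pollTimer) clearInterval(pollTimer) }
}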
Capabilities (/capabilities)
- GET /capabilities — returns environment-level feature availability for the frontend (auth required)
Skills (/skills)
- Skills Framework: AI agent execution system using VoltAgent executor
- Skill Creation: Create skills via the POST /skills/author endpoint (Expert users). Skills are created directly in the Citemark app, not just in VoltAgent Console.
- Skill Versions: Create draft versions via POST /skills/:skillId/versions, update via PATCH /skills/:skillId/versions/:versionId, and publish via POST /skills/:skillId/versions/:versionId/publish
- Skill Execution: POST /skills/:skillId/run — Execute skills with SSE streaming using the VoltAgent executor (streaming sketch below)
- Tool Steps: Tool steps use LLM-driven invocation where the LLM decides whether to call tools based on constructed prompts
- Workflow Management: Suspend/resume workflows with checkpoint support for human-in-the-loop review
- MCP Integration: Model Context Protocol support for external tool access
- VoltAgent Docker Service: Separate Docker container (the voltagent service in docker-compose.yml) running a standalone Hono server for VoltAgent Console integration
- See docs/planning/voltagent-integration-plan.md for detailed architecture
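A hedged sketch of invoking a skill run and reading the SSE response with fetch (EventSource cannot issue a POST). The { input } body field and the raw handling of SSE frames are assumptions for illustration only.
// Minimal sketch: execute a skill and stream its SSE output.
export const runSkill = async ({ skillId, input, token, onEvent }: { skillId: string; input: unknown; token: string; onEvent: (chunk: string) => void }) => {
  const res = await fetch(`/skills/${skillId}/run`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'text/event-stream', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ input }), // body shape is an assumption
  })
  if (!res.ok || !res.body) throw new Error(`Skill run failed: ${res.status}`)
  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    onEvent(decoder.decode(value, { stream: true })) // raw SSE frames ("data: ..." lines)
  }
}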
Conversations (/conversations)
- Per-user, per-organization conversation persistence (PostgreSQL + localStorage hybrid on FE)
- Endpoints:
  - GET /conversations — list conversations (paginated)
  - POST /conversations — create (enforces per-user/org cap)
  - GET /conversations/:id — fetch with messages (cursor-based pagination)
  - PATCH /conversations/:id — rename
  - DELETE /conversations/:id — soft delete
  - POST /conversations/:id/messages — add message
  - POST /conversations/sync — bulk sync (idempotent upsert, returns maxAcceptedTimestamp; sketch below)
  - GET /conversations/count — returns current count and cap
- Security: JWT auth + org ownership checks + RLS (app.current_user_id, app.current_org_id) + per-route rate limits keyed by verified user/org.
- Sync guardrails: max 100 messages per request; conflict-safe upserts; updated_at only bumps on actual inserts/updates or deletes.
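A minimal sketch of the bulk sync call. The 100-message cap and the maxAcceptedTimestamp return value come from the list above; the message object shape is an assumption.
// Minimal sketch: bulk-sync locally cached messages (max 100 per request, per the guardrails above).
type LocalMessage = { conversationId: string; role: string; content: string; createdAt: string } // assumed shape

export const syncConversations = async ({ messages, token }: { messages: LocalMessage[]; token: string }) => {
  const batch = messages.slice(0, 100) // the server rejects more than 100 messages per request
  const res = await fetch('/conversations/sync', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ messages: batch }),
  })
  if (!res.ok) throw new Error(`Sync failed: ${res.status}`)
  const { maxAcceptedTimestamp } = await res.json()
  return maxAcceptedTimestamp // callers can drop local entries at or before this timestamp
}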
Getting Started
Prerequisites
- Docker Desktop (no Node.js required on host)
- PostgreSQL database (can be containerized or external)
Development Setup
This service runs in a Docker container with bind mounts for hot reloading.
- Start containers from the root directory:
  docker compose up -d
- Access the API:
  - Via API Gateway (recommended): http://localhost:3333
  - Direct access:
    - API: http://localhost:3000
    - API Docs: http://localhost:3000/docs
Hot Reloading
Code changes on the host are immediately reflected in the container. TypeScript files are automatically recompiled using tsx watch:
- Edit files in workspace-api/ on your host machine
- Changes are automatically detected by tsx watch
- The server restarts automatically with the new code
Environment Variables
Configured in the root .env file (unified for all services). See the root example.env for all available variables.
Database Authentication
The application uses standard PostgreSQL username/password authentication with SSL:
- Environment Variables Required:
  - POSTGRES_HOST: Database host
  - POSTGRES_PORT: Database port (default: 25060)
  - POSTGRES_USER: Database username (e.g., doadmin)
  - POSTGRES_PASSWORD: Database password
  - POSTGRES_DATABASE: Database name (default: coda_app)
  - POSTGRES_SSLMODE: SSL mode (default: require)
  - POSTGRES_CA_CERT: Optional path to CA certificate for SSL verification
- Usage: Set these variables in your .env file for both development and production (connection sketch below)
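A minimal connection sketch built from these variables using the node-postgres Pool; treat it as an assumption about how the config is assembled, not the project's actual db module.
// Minimal sketch: build a PostgreSQL pool from the POSTGRES_* variables (assumes the pg package).
import { readFileSync } from 'node:fs'
import { Pool } from 'pg'

const { POSTGRES_HOST, POSTGRES_PORT, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DATABASE, POSTGRES_SSLMODE, POSTGRES_CA_CERT } = process.env

export const pool = new Pool({
  host: POSTGRES_HOST,
  port: Number(POSTGRES_PORT ?? 25060),
  user: POSTGRES_USER,
  password: POSTGRES_PASSWORD,
  database: POSTGRES_DATABASE,
  // Require TLS unless explicitly disabled; verify against the CA cert when one is provided
  ssl: POSTGRES_SSLMODE === 'disable'
    ? false
    : POSTGRES_CA_CERT
      ? { ca: readFileSync(POSTGRES_CA_CERT, 'utf8') }
      : { rejectUnauthorized: false },
})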
API Documentation (Swagger/OpenAPI)
- The API exposes OpenAPI docs via Swagger when enabled.
- Default behavior: enabled in non-production or when WORKSPACE_API_DOCS_ENABLED=true.
- UI route prefix is configurable with WORKSPACE_API_DOCS_ROUTE (default /docs).
Environment variables:
WORKSPACE_API_DOCS_ENABLED=true # Force enable in any environment (default true in non-prod)
WORKSPACE_API_DOCS_ROUTE=/docs # Route for Swagger UI
Once the server is running:
- Swagger UI: http://localhost:3000/docs
- Raw OpenAPI JSON: http://localhost:3000/docs/json
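For orientation, one possible way the conditional registration could look, assuming @fastify/swagger and @fastify/swagger-ui are used; this is a sketch, not the project's actual bootstrap code.
// Minimal sketch: conditionally register Swagger docs behind the env flags above (assumes ESM top-level await).
import Fastify from 'fastify'
import swagger from '@fastify/swagger'
import swaggerUi from '@fastify/swagger-ui'

const { NODE_ENV, WORKSPACE_API_DOCS_ENABLED, WORKSPACE_API_DOCS_ROUTE } = process.env
const docsEnabled = WORKSPACE_API_DOCS_ENABLED === 'true' || NODE_ENV !== 'production'

export const app = Fastify()

if (docsEnabled) {
  await app.register(swagger, { openapi: { info: { title: 'Citemark API', version: '1.0.0' } } })
  await app.register(swaggerUi, { routePrefix: WORKSPACE_API_DOCS_ROUTE ?? '/docs' })
}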
Database Setup
The database schema is managed via the database/postgres_run.ts script:
Full Setup (Schema + Seed Data):
. database/init.sh
Or using npm:
npm run db:init
Schema Only:
npm run db:schema
Seed Data Only:
npm run db:seed
Schema Features:
- Idempotent Execution: Can be run multiple times safely - existing constraints and objects are skipped
- Dual Schema Support: Creates both staging (app_data_*) and test (test_app_data_*) schemas
- Extracted Constraints: All constraints are defined separately with existence checks for safe re-runs
Required Environment Variables
- Set up the database:
. database/init.sh
- Build and start the application:
npm run build
npm start
Development Guidelines
Code Style
- Use arrow functions wherever possible
- Extract environment variables at the top of scripts
- Pass objects to functions and destructure parameters instead of multiple arguments
- Export constants using export const
- Use single-line syntax when it improves readability
API Structure
- API endpoints are organized in versioned folders (e.g., /routes/)
- New environment variables should be added to /.env with placeholder values
- Follow the established patterns for authentication and error handling
Example Function Structure
// Good: Arrow function with object destructuring
export const createUser = async ({ email, firstName, lastName, ...additionalData }) => {
// Implementation
}
// Good: Single-line conditional
if (userExists) return existingUser
// Good: Environment variable extraction
const { NODE_ENV, DATABASE_URL } = process.env
Storage Integration
The application uses S3-compatible object storage for file storage. Storage operations use environment variables for consistent configuration.
Storage Configuration
Set the following environment variables in your .env file:
SPACES_ENDPOINT=your-spaces-endpoint
SPACES_REGION=nyc3
SPACES_KEY_ID=your-access-key-id
SPACES_SECRET_KEY=your-secret-access-key
SPACES_BUCKET_PREFIX=coda-org-
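A minimal client sketch built from these variables, assuming the AWS SDK v3 S3 client is used against the S3-compatible endpoint; the per-organization bucket naming shown is an assumption based on the prefix variable.
// Minimal sketch: S3-compatible storage client from the SPACES_* variables (assumes @aws-sdk/client-s3).
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const { SPACES_ENDPOINT, SPACES_REGION, SPACES_KEY_ID, SPACES_SECRET_KEY, SPACES_BUCKET_PREFIX } = process.env

export const storage = new S3Client({
  endpoint: `https://${SPACES_ENDPOINT}`,
  region: SPACES_REGION,
  credentials: { accessKeyId: SPACES_KEY_ID ?? '', secretAccessKey: SPACES_SECRET_KEY ?? '' },
})

// Example upload into an organization bucket derived from the prefix (bucket naming is an assumption)
export const uploadObject = async ({ orgId, key, body }: { orgId: string; key: string; body: Buffer }) =>
  storage.send(new PutObjectCommand({ Bucket: `${SPACES_BUCKET_PREFIX}${orgId}`, Key: key, Body: body }))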
Available Scripts
Testing and Validation
- npm run test:storage - Test object storage operations
- npm run test:db - Test database connection
Scripts
- npm run build - Compile TypeScript to JavaScript
- npm start - Start the production server
- npm run db:init or . database/init.sh - Set up database schema and initial data (full setup)
- npm run db:schema - Set up database schema only
- npm run db:seed - Insert seed data only
Testing Scripts
- npm test - Run all tests in watch mode
- npm run test:run - Run all tests once
- npm run test:unit - Run unit tests only (no database required)
- npm run test:integration - Run integration tests (requires PostgreSQL)
- npm run test:coverage - Run tests with coverage report
- npm run test:auth - Run authentication tests
- npm run test:users - Run user model tests
Database
The application uses PostgreSQL with a comprehensive schema for:
- User management and authentication
- Organization and project structure
- Role-based permissions system
- AI workflow data
Schema Management
The database schema is defined in database/postgres.sql and deployed via database/postgres_run.ts:
Schema Structure:
- Staging Schema: app_data_* tables (production data)
- Test Schema: test_app_data_* tables (test data)
- Idempotent Constraints: All constraints are extracted into separate ALTER TABLE statements with existence checks
Deployment:
# Full setup (schema + seed data)
npm run db:init
# Schema only
npm run db:schema
# Seed data only
npm run db:seed
Connection Configuration:
- Development: Uses password authentication when POSTGRES_USER and POSTGRES_PASSWORD are set
- Production: Uses IAM authentication with service account credentials
- Automatic Selection: Authentication method is automatically selected based on environment variables
See the Database Authentication section above for detailed connection setup.
Error Tracking with Sentry
The application includes Sentry integration for production error monitoring and performance tracking.
Configuration
Sentry is configured via environment variables in your .env file:
# Enable/disable Sentry
SENTRY_ENABLED=true
# Your Sentry DSN (get this from your Sentry project settings)
SENTRY_DSN=https://your-dsn-here@o4510131393986560.ingest.us.sentry.io/4510131396280320
# Optional: Release version for tracking
SENTRY_RELEASE=v1.0.0
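A minimal initialization sketch built from the variables above, assuming @sentry/node; the sampling rates are placeholders, not the project's actual values.
// Minimal sketch: initialize Sentry from the SENTRY_* variables (assumes @sentry/node).
import * as Sentry from '@sentry/node'

const { SENTRY_ENABLED, SENTRY_DSN, SENTRY_RELEASE, NODE_ENV } = process.env

if (SENTRY_ENABLED === 'true') {
  Sentry.init({
    dsn: SENTRY_DSN,
    release: SENTRY_RELEASE,
    environment: NODE_ENV,
    tracesSampleRate: NODE_ENV === 'production' ? 0.1 : 1.0, // lower sampling in production (placeholder values)
  })
}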
Features
- Automatic Error Capture: All unhandled errors are automatically sent to Sentry
- Performance Monitoring: Request tracing and performance metrics
- Custom Error Context: Request body, query parameters, and user context are included
- Environment-Aware: Different sampling rates for development vs production
- Security: PII data is only sent in non-production environments
Testing Sentry Integration
In development mode, you can test Sentry error tracking by visiting:
GET /test-sentry
This endpoint will throw a test error that should appear in your Sentry dashboard.
Production Considerations
- Set SENTRY_ENABLED=true in production
- Configure appropriate sampling rates for your traffic volume
- Set up Sentry release tracking for better error correlation
- Monitor Sentry quotas and adjust sampling rates as needed
Testing
This project includes a comprehensive testing suite built with Vitest:
Test Structure
- Unit Tests (tests/unit/) - Test individual functions with mocks
- Integration Tests (tests/integration/) - Test API endpoints and database operations
- Test Utilities (tests/utils/) - Database helpers, mocks, and fixtures
- Test Fixtures (tests/fixtures/) - Consistent test data
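As a shape reference, a minimal Vitest unit test; the validateEmail helper is defined inline purely for illustration and is not part of the codebase.
// Minimal sketch of a unit test in tests/unit/ (helper is invented for illustration only).
import { describe, it, expect } from 'vitest'

const validateEmail = (email: string) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)

describe('validateEmail', () => {
  it('accepts a well-formed address', () => {
    expect(validateEmail('user@example.com')).toBe(true)
  })
  it('rejects a malformed address', () => {
    expect(validateEmail('not-an-email')).toBe(false)
  })
})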
Key Features
- Database Testing - Real PostgreSQL with automatic cleanup
- Authentication Testing - Complete auth flow testing
- CRUD Testing - Database operations with proper cleanup
- Coverage Reporting - 70% minimum coverage requirement
- CI/CD Integration - Automated testing on GitHub Actions
Running Tests
# Unit tests (no database required)
npm run test:unit
# Integration tests (requires PostgreSQL)
npm run test:integration
# All tests with coverage
npm run test:coverage
File Upload & Duplicate Handling
The file upload system includes intelligent duplicate detection and handling to optimize storage and processing.
Duplicate Detection
The system uses a three-tier duplicate detection approach, checking in this order:
- Exact Path + Filename Match: Checks if a file with the same path and filename already exists
  - If found, also checks if the content hash matches
  - If the content matches: Returns success immediately (file already uploaded)
  - If the content doesn't match: Treated as a filename duplicate
- Organization-Wide Content Hash Match: If there is no exact match, checks for the same content anywhere in the organization
  - Compares the SHA-256 content hash across all documents in the organization (hashing sketch below)
  - Detects if a duplicate exists in a different location
- Organization-Wide Filename Match: If there is no content match, checks for the same filename anywhere in the organization
  - Compares the original filename across all documents in the organization
  - Detects if a file with the same name exists in a different location
Duplicate Types:
- exact_match: Same path and filename (with optional content match indicator)
- content: Same file content (hash match) anywhere in the organization
- filename: Same filename anywhere in the organization (different content)
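For reference, computing the SHA-256 content hash used in the second detection tier with Node's crypto module; the content_hash column name in the comment is an assumption.
// Minimal sketch: compute the SHA-256 content hash used for content-based duplicate detection.
import { createHash } from 'node:crypto'

export const hashFileContent = ({ buffer }: { buffer: Buffer }) =>
  createHash('sha256').update(buffer).digest('hex')

// Two uploads are content duplicates when their hashes match, regardless of filename or path:
// const isDuplicate = hashFileContent({ buffer: incoming }) === existingDocument.content_hash // column name is an assumption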
Handling Duplicates
When a duplicate is detected, the API returns a 409 Conflict response with duplicate information. The frontend presents the user with options based on the duplicate type:
Exact Match with Same Content
When a file with the same path, filename, and content is found:
- No upload needed: Returns success immediately with existing file information
- No user action required: Frontend shows a simple notice that the file is already uploaded
- Processing status:
completed(already processed)
Content Match (Different Location)
When a file with the same content exists in a different location:
- Copy File (?copy=true): Creates a copy of the file in the new location
  - The file is copied in object storage with all metadata intact
  - The existing document record is updated to point to the new location (same document ID)
  - No text extraction is triggered - uses the same document record and chunks
  - The original file remains in its original location
- Move File (?move=true): Moves the file to the new location
  - The file is moved in object storage (original location deleted)
  - The existing document record is updated to point to the new location (same document ID)
  - No text extraction is triggered - uses the same document record and chunks
  - The original file is removed from its original location
Filename Match (Different Location)
When a file with the same filename exists anywhere in the organization:
- Upload as New (?upload_as_new=true): Uploads with a modified filename (e.g., filename-1.pdf)
  - Creates a new document record
  - Text extraction is triggered for the new file
  - The original file remains unchanged
- Replace Original (?replace=true): Replaces the existing file
  - The old file version is preserved by object storage versioning (automatic)
  - The old document record is archived in the database
  - A new file is uploaded and processed
  - Text extraction is triggered for the new file
- Duplicate & Replace (?replace_with_new_chunks=true): Uploads the new file, processes it, then updates the old document
  - The new file is uploaded and processed (chunked)
  - After successful processing, the old document record is updated to point to the new file
  - Old chunks are deleted and replaced with new chunks
  - The old file version is preserved by object storage versioning (automatic)
  - Preserves the old document ID - keeps existing references intact
API Endpoints
Project Upload:
POST /projects/:projectId/upload?copy=true|move=true|replace=true|upload_as_new=true|replace_with_new_chunks=true&subfolders=folder1/folder2
Organization Upload:
POST /organizations/:organizationId/upload?copy=true|move=true|replace=true|upload_as_new=true|replace_with_new_chunks=true&subfolders=folder1/folder2
Note: Files are stored using only the user-created folder structure (no automatic organization/ or projects/ prefixes). The subfolders query parameter accepts a path string (e.g., folder1/folder2) to specify the target folder structure.
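A hedged sketch of a project upload that resolves one duplicate case by retrying with a handling flag. The 'file' form-field name is an assumption; the 409 body shape matches the Response Format section below.
// Minimal sketch: upload a file to a project and resolve a filename duplicate by retrying with upload_as_new.
export const uploadProjectFile = async ({ projectId, file, token, subfolders }: { projectId: string; file: Blob; token: string; subfolders?: string }) => {
  const send = async (query = '') => {
    const form = new FormData()
    form.append('file', file) // form-field name is an assumption
    const params = new URLSearchParams(query)
    if (subfolders) params.set('subfolders', subfolders)
    return fetch(`/projects/${projectId}/upload?${params}`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${token}` },
      body: form,
    })
  }
  let res = await send()
  if (res.status === 409) {
    const duplicate = await res.json() // shape shown under "Duplicate Detected (409 Conflict)" below
    // Example resolution: keep both copies by uploading under a modified filename
    if (duplicate.duplicateType === 'filename') res = await send('upload_as_new=true')
  }
  return res.json()
}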
Response Format
Duplicate Detected (409 Conflict):
{
"success": false,
"error": "duplicate_file",
"duplicateType": "exact_match" | "content" | "filename",
"contentMatches": true,
"existingFile": {
"documentId": "uuid",
"storagePath": "bucket/path/to/file",
"filename": "original-filename.pdf",
"uploadedAt": "2025-01-15T10:30:00Z",
"projectId": "uuid" | null
},
"isDifferentLocation": true,
"targetLocation": {
"projectId": "uuid" | null,
"storagePath": "gs://bucket/target/path"
}
}
Successful Exact Match (Same Content):
{
"success": true,
"file": {
"id": "document-uuid",
"filename": "original-filename.pdf",
"fileUrl": "bucket/path/to/file",
"fileSize": 1024000,
"contentType": "application/pdf",
"processingStatus": "completed"
}
}
Successful Copy:
{
"success": true,
"file": {
"id": "document-uuid",
"filename": "unique-filename.pdf",
"fileUrl": "gs://bucket/new/path",
"fileSize": 1024000,
"contentType": "application/pdf",
"processingStatus": "queued"
}
}
Successful Move:
{
"success": true,
"file": {
"id": "document-uuid",
"filename": "unique-filename.pdf",
"fileUrl": "gs://bucket/new/path",
"fileSize": 1024000,
"contentType": "application/pdf",
"processingStatus": "queued"
}
}
Successful Replace:
{
"success": true,
"file": {
"id": "new-document-uuid",
"filename": "unique-filename.pdf",
"fileUrl": "gs://bucket/new/path",
"fileSize": 1024000,
"contentType": "application/pdf",
"processingStatus": "queued"
}
}
Successful Replace with New Chunks:
{
"success": true,
"file": {
"id": "old-document-uuid",
"filename": "unique-filename.pdf",
"fileUrl": "gs://bucket/new/path",
"fileSize": 1024000,
"contentType": "application/pdf",
"processingStatus": "completed"
}
}
Implementation Details
- Hash Algorithm: SHA-256 content hashing for duplicate detection
- Storage: Files stored in object storage with unique filenames to prevent conflicts
- Database: Document records live in the text_extractor_documents table
- Archiving: Replaced document records are archived in the text_extractor_documents_archive table for an audit trail. Old file versions are automatically preserved by object storage versioning.
- Text Extraction:
- Only triggered for new uploads, not for copies/moves (which reuse existing document records)
- Triggered for replace operations (new file needs processing)
- Triggered for replace_with_new_chunks (new file is processed, then old document is updated)
- Chunk Management:
- Copy/move operations preserve existing chunks (same document ID)
- Replace operations create new chunks (new document ID)
- Replace with new chunks updates old document to use new chunks (preserves document ID)
Contributing
- Follow the established coding standards
- Use TypeScript for all new code
- Maintain API versioning structure
- Add appropriate error handling and validation
- ALWAYS add tests for new functionality - See .cursor/rules/testing.mdc
- Update documentation for new features
License
ISC