A Next.js application that uses advanced AI technology to analyze, interpret, and extract insights from professional documents. Features employee/employer authentication, document upload and management, AI-powered chat, and comprehensive predictive document analysis that identifies missing documents, provides recommendations, and suggests related content.
- Missing Document Detection: AI automatically identifies critical documents that should be present but are missing
- Priority Assessment: Categorizes missing documents by priority (high, medium, low) for efficient workflow management
- Smart Recommendations: Provides actionable recommendations for document organization and compliance
- Related Document Suggestions: Suggests relevant external resources and related documents
- Page-Level Analysis: Pinpoints specific pages where missing documents are referenced
- Real-time Analysis: Instant analysis with caching for improved performance
- Comprehensive Reporting: Detailed breakdown of analysis results with actionable insights
- Document Analysis: Advanced AI algorithms analyze documents and extract key information
- OCR Processing: Optional advanced OCR using Datalab Marker API for scanned documents and images
- AI-Powered Chat: Interactive chat interface for document-specific questions and insights
- Web Search Agent: Searches the web for related documents and external resources during analysis
- Role-Based Authentication: Separate interfaces for employees and employers using Clerk
- Document Management: Upload, organize, and manage documents with category support
- Employee Management: Employer dashboard for managing employee access and approvals
- Real-time Chat History: Persistent chat sessions for each document
- Responsive Design: Modern UI with Tailwind CSS
The Predictive Document Analysis feature is the cornerstone of PDR AI, providing intelligent document management and compliance assistance:
- Document Upload: Upload your professional documents (PDFs, contracts, manuals, etc.)
- AI Analysis: Our advanced AI scans through the document content and structure
- Missing Document Detection: Identifies references to documents that should be present but aren't
- Priority Classification: Automatically categorizes findings by importance and urgency
- Smart Recommendations: Provides specific, actionable recommendations for document management
- Related Content: Suggests relevant external resources and related documents
- Compliance Assurance: Never miss critical documents required for compliance
- Workflow Optimization: Streamline document management with AI-powered insights
- Risk Mitigation: Identify potential gaps in documentation before they become issues
- Time Savings: Automated analysis saves hours of manual document review
- Proactive Management: Stay ahead of document requirements and deadlines
The system provides comprehensive analysis including:
- Missing Documents Count: Total number of missing documents identified
- High Priority Items: Critical documents requiring immediate attention
- Recommendations: Specific actions to improve document organization
- Suggested Related Documents: External resources and related content
- Page References: Exact page numbers where missing documents are mentioned
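For reference, here is a TypeScript sketch of the result shape these fields map to, mirroring the example API response shown later in this README; treat the field names as illustrative rather than a documented contract.

```typescript
// Sketch of the predictive-analysis result shape, mirrored from the
// example response later in this README; field names are illustrative.
interface PredictiveAnalysisSummary {
  totalMissingDocuments: number;   // Missing Documents Count
  highPriorityItems: number;       // High Priority Items
  totalRecommendations: number;    // Recommendations
  totalSuggestedRelated: number;   // Suggested Related Documents
  analysisTimestamp: string;       // ISO timestamp of the analysis run
}

interface MissingDocument {
  documentName: string;
  documentType: string;
  reason: string;
  page: number;                    // Page where the missing document is referenced
  priority: "high" | "medium" | "low";
  suggestedLinks: { title: string; link: string; snippet: string }[];
}
```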
PDR AI includes optional advanced OCR (Optical Character Recognition) capabilities for processing scanned documents, images, and PDFs with poor text extraction:
- Scanned Documents: Physical documents that have been scanned to PDF
- Image-based PDFs: PDFs that contain images of text rather than actual text
- Poor Quality Documents: Documents with low-quality text that standard extraction can't read
- Handwritten Content: Documents with handwritten notes or forms (with AI assistance)
- Mixed Content: Documents combining text, images, tables, and diagrams
Backend Infrastructure:
- Environment Configuration: Set `DATALAB_API_KEY` in your `.env` file (optional)
- Database Schema: Tracks OCR status with fields (sketched below):
  - `ocrEnabled`: Boolean flag indicating if OCR was requested
  - `ocrProcessed`: Boolean flag indicating if OCR completed successfully
  - `ocrMetadata`: JSON field storing OCR processing details (page count, processing time, etc.)
- OCR Service Module (`src/app/api/services/ocrService.ts`):
  - Complete Datalab Marker API integration
  - Asynchronous submission and polling architecture
  - Configurable processing options (force_ocr, use_llm, output_format)
  - Comprehensive error handling and retry logic
  - Timeout management (5 minutes default)
- Upload API Enhancement (`src/app/api/uploadDocument/route.ts`):
  - Dual-path processing:
    - OCR Path: Uses Datalab Marker API when `enableOCR=true`
    - Standard Path: Uses traditional PDFLoader for regular PDFs
  - Unified chunking and embedding pipeline
  - Stores OCR metadata with document records
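A minimal Drizzle ORM sketch of how the OCR tracking fields above could be modeled; the table name and surrounding columns are assumptions, not the project's actual schema.

```typescript
import { pgTable, serial, text, boolean, jsonb } from "drizzle-orm/pg-core";

// Hypothetical documents table carrying the OCR status fields described above;
// only ocrEnabled / ocrProcessed / ocrMetadata come from this README.
export const documentsExample = pgTable("documents_example", {
  id: serial("id").primaryKey(),
  title: text("title").notNull(),
  ocrEnabled: boolean("ocr_enabled").notNull().default(false),     // OCR requested at upload
  ocrProcessed: boolean("ocr_processed").notNull().default(false), // OCR completed successfully
  ocrMetadata: jsonb("ocr_metadata"),                              // page count, processing time, etc.
});
```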
Frontend Integration:
- Upload Form UI: OCR checkbox appears when `DATALAB_API_KEY` is configured
- Form Validation: Schema validates the `enableOCR` field (a hedged sketch follows this list)
- User Guidance: Help text explains when to use OCR
- Dark Theme Support: Custom checkbox styling for both light and dark modes
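As an illustration of the form validation above, here is a minimal sketch assuming a zod schema; the actual schema and field names in the project may differ.

```typescript
import { z } from "zod";

// Hypothetical upload-form schema; only the enableOCR flag is taken from this README.
const uploadFormSchema = z.object({
  documentName: z.string().min(1),
  categoryId: z.string().optional(),
  enableOCR: z.boolean().default(false), // OCR checkbox value
});

type UploadFormValues = z.infer<typeof uploadFormSchema>;
```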
// Standard PDF Upload (enableOCR: false or not set)
1. Download PDF from URL
2. Extract text using PDFLoader
3. Split into chunks
4. Generate embeddings
5. Store in database
// OCR-Enhanced Upload (enableOCR: true)
1. Download PDF from URL
2. Submit to Datalab Marker API
3. Poll for completion (up to 5 minutes)
4. Receive markdown/HTML/JSON output
5. Split into chunks
6. Generate embeddings
7. Store in database with OCR metadata

interface OCROptions {
    force_ocr?: boolean;           // Force OCR even if text exists
    use_llm?: boolean;             // Use AI for better accuracy
    output_format?: 'markdown' | 'json' | 'html'; // Output format
    strip_existing_ocr?: boolean;  // Remove existing OCR layer
}

- Configure API Key (one-time setup):
  DATALAB_API_KEY=your_datalab_api_key
- Upload Document with OCR:
  - Navigate to the employer upload page
  - Select your document
  - Check the "Enable OCR Processing" checkbox
  - Upload the document
  - System will process with OCR and notify when complete
- Monitor Processing (a hedged polling sketch follows these steps):
  - OCR processing typically takes 1-3 minutes
  - Progress is tracked in backend logs
  - Document becomes available once processing completes
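To make the asynchronous submit-and-poll behavior concrete, here is a minimal sketch of a polling loop with the 5-minute timeout described above. The check URL, header, and response field names are illustrative assumptions, not the real Datalab API contract.

```typescript
// Hypothetical polling helper; endpoint details and field names are placeholders.
const POLL_INTERVAL_MS = 5_000;
const TIMEOUT_MS = 5 * 60 * 1000; // 5-minute limit described above

interface MarkerJobStatus {
  status: "processing" | "complete" | "failed"; // assumed status values
  markdown?: string;                            // assumed output field
  error?: string;
}

async function pollOcrResult(checkUrl: string, apiKey: string): Promise<string> {
  const deadline = Date.now() + TIMEOUT_MS;

  while (Date.now() < deadline) {
    const res = await fetch(checkUrl, { headers: { "X-Api-Key": apiKey } });
    if (!res.ok) throw new Error(`OCR status check failed: ${res.status}`);

    const job = (await res.json()) as MarkerJobStatus;
    if (job.status === "complete") return job.markdown ?? "";
    if (job.status === "failed") throw new Error(job.error ?? "OCR processing failed");

    // Still processing: wait before the next check.
    await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
  }

  throw new Error("OCR processing timed out after 5 minutes");
}
```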
| Feature | Standard Processing | OCR Processing |
|---|---|---|
| Best For | Digital PDFs with embedded text | Scanned documents, images |
| Processing Time | < 10 seconds | 1-3 minutes |
| Accuracy | High for digital text | High for scanned/image text |
| Cost | Free (OpenAI embeddings only) | Requires Datalab API credits |
| Handwriting Support | No | Yes (with AI assistance) |
| Table Extraction | Basic | Advanced |
| Image Analysis | No | Yes |
The OCR system includes comprehensive error handling:
- API connection failures
- Timeout management (5-minute limit)
- Retry logic for transient errors
- Graceful fallback messages
- Detailed error logging
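For illustration, retry logic for transient errors typically looks like a small exponential-backoff wrapper; this is a generic sketch, not the project's actual implementation.

```typescript
// Generic retry helper with exponential backoff for transient failures.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      // Back off: 1s, 2s, 4s, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }

  throw lastError;
}
```

A submission or status check could then be wrapped as `withRetry(() => someOcrCall())`, where `someOcrCall` stands in for the real service function.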
The predictive analysis feature automatically scans uploaded documents and provides comprehensive insights:
{
"success": true,
"documentId": 123,
"analysisType": "predictive",
"summary": {
"totalMissingDocuments": 5,
"highPriorityItems": 2,
"totalRecommendations": 3,
"totalSuggestedRelated": 4,
"analysisTimestamp": "2024-01-15T10:30:00Z"
},
"analysis": {
"missingDocuments": [
{
"documentName": "Employee Handbook",
"documentType": "Policy Document",
"reason": "Referenced in section 2.1 but not found in uploaded documents",
"page": 15,
"priority": "high",
"suggestedLinks": [
{
"title": "Sample Employee Handbook Template",
"link": "https://example.com/handbook-template",
"snippet": "Comprehensive employee handbook template..."
}
]
}
],
"recommendations": [
"Consider implementing a document version control system",
"Review document retention policies for compliance",
"Establish regular document audit procedures"
],
"suggestedRelatedDocuments": [
{
"title": "Document Management Best Practices",
"link": "https://example.com/best-practices",
"snippet": "Industry standards for document organization..."
}
]
}
}

- Upload Documents: Use the employer dashboard to upload your documents
- Run Analysis: Click the "Predictive Analysis" tab in the document viewer
- Review Results: Examine missing documents, recommendations, and suggestions
- Take Action: Follow the provided recommendations and suggested links
- Track Progress: Re-run analysis to verify improvements
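For programmatic access, here is a hedged sketch of calling the analysis endpoint; the request body is an assumption inferred from the response example above, not a documented contract.

```typescript
// Hypothetical request; documentId mirrors the field in the example response above.
const response = await fetch("/api/predictive-document-analysis", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ documentId: 123 }),
});

const result = await response.json();
console.log(result.summary.totalMissingDocuments, "missing documents found");
```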
Ask questions about your documents and get AI-powered responses:
// Example API call for document Q&A
const response = await fetch('/api/LangChain', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
question: "What are the key compliance requirements mentioned?",
documentId: 123,
style: "professional" // or "casual", "technical", "summary"
})
});

- Contract Management: Identify missing clauses, attachments, and referenced documents
- Regulatory Compliance: Ensure all required documentation is present and up-to-date
- Due Diligence: Comprehensive document review for mergers and acquisitions
- Risk Assessment: Identify potential legal risks from missing documentation
- Employee Documentation: Ensure all required employee documents are collected
- Policy Compliance: Verify policy documents are complete and current
- Onboarding Process: Streamline new employee documentation requirements
- Audit Preparation: Prepare for HR audits with confidence
- Financial Reporting: Ensure all supporting documents are included
- Audit Trail: Maintain complete documentation for financial audits
- Compliance Reporting: Meet regulatory requirements for document retention
- Process Documentation: Streamline financial process documentation
- Patient Records: Ensure complete patient documentation
- Regulatory Compliance: Meet healthcare documentation requirements
- Quality Assurance: Maintain high standards for medical documentation
- Risk Management: Identify potential documentation gaps
- Automated Analysis: Reduce manual document review time by 80%
- Instant Insights: Get immediate feedback on document completeness
- Proactive Management: Address issues before they become problems
- Compliance Assurance: Never miss critical required documents
- Error Prevention: Catch documentation gaps before they cause issues
- Audit Readiness: Always be prepared for regulatory audits
- Standardized Workflows: Establish consistent document management processes
- Quality Control: Maintain high standards for document organization
- Continuous Improvement: Use AI insights to optimize processes
- Document Review Time: 80% reduction in manual review time
- Compliance Risk: 95% reduction in missing document incidents
- Audit Preparation: 90% faster audit preparation time
- Process Efficiency: 70% improvement in document management workflows
- Framework: Next.js 15 with TypeScript
- Authentication: Clerk
- Database: PostgreSQL with Drizzle ORM
- AI Integration: OpenAI + LangChain
- OCR Processing: Datalab Marker API (optional)
- File Upload: UploadThing
- Styling: Tailwind CSS
- Package Manager: pnpm
Before you begin, ensure you have the following installed:
- Node.js (version 18.0 or higher)
- pnpm (recommended) or npm
- Docker (for local database)
- Git
git clone <repository-url>
cd pdr_ai_v2-2

pnpm install

Create a .env file in the root directory with the following variables:
# Database Configuration
# Format: postgresql://[user]:[password]@[host]:[port]/[database]
# For local development using Docker: postgresql://postgres:password@localhost:5432/pdr_ai_v2
# For production: Use your production PostgreSQL connection string
DATABASE_URL="postgresql://postgres:password@localhost:5432/pdr_ai_v2"
# Clerk Authentication (get from https://clerk.com/)
# Required for user authentication and authorization
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key
CLERK_SECRET_KEY=your_clerk_secret_key
# Clerk Force Redirect URLs (Optional - for custom redirect after authentication)
# These URLs control where users are redirected after sign in/up/sign out
# If not set, Clerk will use default redirect behavior
NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL=https://your-domain.com/employer/home
NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL=https://your-domain.com/signup
NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL=https://your-domain.com/
# OpenAI API (get from https://platform.openai.com/)
# Required for AI features: document analysis, embeddings, chat functionality
OPENAI_API_KEY=your_openai_api_key
# LangChain (get from https://smith.langchain.com/)
# Optional: Required for LangSmith tracing and monitoring of LangChain operations
# LangSmith provides observability, debugging, and monitoring for LangChain applications
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langchain_api_key
# Tavily Search API (get from https://tavily.com/)
# Optional: Required for enhanced web search capabilities in document analysis
# Used for finding related documents and external resources
TAVILY_API_KEY=your_tavily_api_key
# Datalab Marker API (get from https://www.datalab.to/)
# Optional: Required for advanced OCR processing of scanned documents
# Enables OCR checkbox in document upload interface
DATALAB_API_KEY=your_datalab_api_key
# UploadThing (get from https://uploadthing.com/)
# Required for file uploads (PDF documents)
UPLOADTHING_SECRET=your_uploadthing_secret
UPLOADTHING_APP_ID=your_uploadthing_app_id
# Environment Configuration
# Options: development, test, production
NODE_ENV=development
# Optional: Skip environment validation (useful for Docker builds)
# Set to "true" to skip validation during build
# SKIP_ENV_VALIDATION=false

# Make the script executable
chmod +x start-database.sh
# Start the database container
./start-database.sh

This will:
- Create a Docker container with PostgreSQL
- Set up the database with proper credentials
- Generate a secure password if using default settings
# Generate migration files
pnpm db:generate
# Apply migrations to database
pnpm db:migrate
# Alternative: Push schema directly (for development)
pnpm db:push

- Create account at Clerk
- Create a new application
- Copy the publishable and secret keys to your `.env` file
- Configure sign-in/sign-up methods as needed
- Create account at OpenAI
- Generate an API key
- Add the key to your `.env` file
- Create account at LangSmith
- Generate an API key from your account settings
- Set `LANGCHAIN_TRACING_V2=true` and add `LANGCHAIN_API_KEY` to your `.env` file
- This enables tracing and monitoring of LangChain operations for debugging and observability
- Create account at Tavily
- Generate an API key from your dashboard
- Add `TAVILY_API_KEY` to your `.env` file
- Used for enhanced web search capabilities in document analysis features
- Create account at Datalab
- Navigate to the API section and generate an API key
- Add `DATALAB_API_KEY` to your `.env` file
- Enables advanced OCR processing for scanned documents and images in PDFs
- When configured, an OCR checkbox will appear in the document upload interface
- Create account at UploadThing
- Create a new app
- Copy the secret and app ID to your `.env` file
pnpm dev

The application will be available at http://localhost:3000
# Build the application
pnpm build
# Start production server
pnpm start

Before deploying, ensure you have:
- ✅ All environment variables configured
- ✅ Production database set up (PostgreSQL with pgvector extension)
- ✅ API keys for all external services
- ✅ Domain name configured (if using custom domain)
Vercel is the recommended platform for Next.js applications:
Steps:
- Push your code to GitHub
git push origin main
- Import repository on Vercel
- Go to vercel.com and sign in
- Click "Add New Project"
- Import your GitHub repository
- Set up Database and Environment Variables
Database Setup:
Option A: Using Vercel Postgres (Recommended)
- In Vercel dashboard, go to Storage → Create Database → Postgres
- Choose a region and create the database
- Vercel will automatically create the `DATABASE_URL` environment variable
- Enable pgvector extension: Connect to your database and run `CREATE EXTENSION IF NOT EXISTS vector;`
Option B: Using Neon Database (Recommended for pgvector support)
- Create a Neon account at neon.tech if you don't have one
- Create a new project in Neon dashboard
- Choose PostgreSQL version 14 or higher
- In Vercel dashboard, go to your project → Storage tab
- Click "Create Database" or "Browse Marketplace"
- Select "Neon" from the integrations
- Click "Connect" or "Add Integration"
- Authenticate with your Neon account
- Select your Neon project and branch
- Vercel will automatically create the `DATABASE_URL` environment variable from Neon
- You may also see additional Neon-related variables like `POSTGRES_URL`, `POSTGRES_PRISMA_URL`, and `POSTGRES_URL_NON_POOLING`
- Your application uses `DATABASE_URL`, so ensure this is set correctly
- Enable pgvector extension in Neon:
  - Go to Neon dashboard → SQL Editor
  - Run: `CREATE EXTENSION IF NOT EXISTS vector;`
  - Or use Neon's SQL editor to enable the extension
Option C: Using External Database (Manual Setup)
- In Vercel dashboard, go to Settings → Environment Variables
- Click "Add New"
- Key: `DATABASE_URL`
- Value: Your PostgreSQL connection string (e.g., `postgresql://user:password@host:port/database`)
- Select environments: Production, Preview, Development (as needed)
- Click "Save"
Add Other Environment Variables:
- In Vercel dashboard, go to Settings → Environment Variables
- Add all required environment variables:
  - `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`
  - `CLERK_SECRET_KEY`
  - `OPENAI_API_KEY`
  - `UPLOADTHING_SECRET`
  - `UPLOADTHING_APP_ID`
  - `NODE_ENV=production`
  - `LANGCHAIN_TRACING_V2=true` (optional, for LangSmith tracing)
  - `LANGCHAIN_API_KEY` (optional, required if `LANGCHAIN_TRACING_V2=true`)
  - `TAVILY_API_KEY` (optional, for enhanced web search)
  - `DATALAB_API_KEY` (optional, for OCR processing)
  - `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL` (optional)
  - `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL` (optional)
  - `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL` (optional)
- Configure build settings
  - Build Command: `pnpm build`
  - Output Directory: `.next` (default)
  - Install Command: `pnpm install`
- Deploy
- Click "Deploy"
- Vercel will automatically deploy on every push to your main branch
Post-Deployment:
- Enable pgvector Extension (Required)
- For Vercel Postgres: Connect to your database using Vercel's database connection tool or SQL editor in the Storage dashboard
- For Neon: Go to Neon dashboard → SQL Editor and run the command
- For External Database: Connect using your preferred PostgreSQL client
- Run: `CREATE EXTENSION IF NOT EXISTS vector;`
- Run Database Migrations
- After deployment, run migrations using one of these methods:
# Option 1: Using Vercel CLI locally
vercel env pull .env.local
pnpm db:migrate

# Option 2: Using direct connection (set DATABASE_URL locally)
DATABASE_URL="your_production_db_url" pnpm db:migrate

# Option 3: Using Drizzle Studio with production URL
DATABASE_URL="your_production_db_url" pnpm db:studio
- Set up Clerk webhooks (if needed)
- Configure webhook URL in Clerk dashboard
- URL format: `https://your-domain.com/api/webhooks/clerk`
- Configure UploadThing
- Add your production domain to UploadThing allowed origins
- Configure CORS settings in UploadThing dashboard
Prerequisites:
- VPS with Node.js 18+ installed
- PostgreSQL database (with pgvector extension)
- Nginx (for reverse proxy)
- PM2 or similar process manager
Steps:
- Clone and install dependencies
git clone <your-repo-url>
cd pdr_ai_v2-2
pnpm install
- Configure environment variables
# Create .env file
nano .env

# Add all production environment variables
- Build the application
pnpm build
- Set up PM2
# Install PM2 globally
npm install -g pm2

# Start the application
pm2 start pnpm --name "pdr-ai" -- start

# Save PM2 configuration
pm2 save
pm2 startup
- Configure Nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
- Set up SSL with Let's Encrypt
sudo apt-get install certbot python3-certbot-nginx
sudo certbot --nginx -d your-domain.com
- Run database migrations
pnpm db:migrate
Important: Your production database must have the pgvector extension enabled:
-- Connect to your PostgreSQL database
CREATE EXTENSION IF NOT EXISTS vector;

Database Connection:
For production, use a managed PostgreSQL service (recommended):
- Neon: Fully serverless PostgreSQL with pgvector support
- Supabase: PostgreSQL with pgvector extension
- AWS RDS: Managed PostgreSQL (requires manual pgvector installation)
- Railway: Simple PostgreSQL hosting
Example Neon connection string:
DATABASE_URL="postgresql://user:[email protected]/dbname?sslmode=require"
- Verify all environment variables are set correctly
- Database migrations have been run
- Clerk authentication is working
- File uploads are working (UploadThing)
- AI features are functioning (OpenAI API)
- Database has pgvector extension enabled
- SSL certificate is configured (if using custom domain)
- Monitoring and logging are set up
- Backup strategy is in place
- Error tracking is configured (e.g., Sentry)
Health Checks:
- Monitor application uptime
- Check database connection health
- Monitor API usage (OpenAI, UploadThing)
- Track error rates
Backup Strategy:
- Set up automated database backups
- Configure backup retention policy
- Test restore procedures regularly
Scaling Considerations:
- Database connection pooling (use PgBouncer or similar)
- CDN for static assets (Vercel handles this automatically)
- Rate limiting for API endpoints
- Caching strategy for frequently accessed data
# Database management
pnpm db:studio # Open Drizzle Studio (database GUI)
pnpm db:generate # Generate new migrations
pnpm db:migrate # Apply migrations
pnpm db:push # Push schema changes directly
# Code quality
pnpm lint # Run ESLint
pnpm lint:fix # Fix ESLint issues
pnpm typecheck # Run TypeScript type checking
pnpm format:write # Format code with Prettier
pnpm format:check # Check code formatting
# Development
pnpm check # Run linting and type checking
pnpm preview      # Build and start production preview

src/
├── app/                                   # Next.js App Router
│   ├── api/                               # API routes
│   │   ├── predictive-document-analysis/  # Predictive analysis endpoints
│   │   │   ├── route.ts                   # Main analysis API
│   │   │   └── agent.ts                   # AI analysis agent
│   │   ├── services/                      # Backend services
│   │   │   └── ocrService.ts              # OCR processing service
│   │   ├── uploadDocument/                # Document upload endpoint
│   │   ├── LangChain/                     # AI chat functionality
│   │   └── ...                            # Other API endpoints
│   ├── employee/                          # Employee dashboard pages
│   ├── employer/                          # Employer dashboard pages
│   │   ├── documents/                     # Document viewer with predictive analysis
│   │   └── upload/                        # Document upload with OCR option
│   ├── signup/                            # Authentication pages
│   └── _components/                       # Shared components
├── server/
│   └── db/                                # Database configuration and schema
├── styles/                                # CSS modules and global styles
└── env.js                                 # Environment validation
Key directories:
- `/employee` - Employee interface for document viewing and chat
- `/employer` - Employer interface for management and uploads
- `/api/predictive-document-analysis` - Core predictive analysis functionality
- `/api/services` - Reusable backend services (OCR, etc.)
- `/api/uploadDocument` - Document upload with OCR support
- `/api` - Backend API endpoints for all functionality
- `/server/db` - Database schema and configuration
- `POST /api/predictive-document-analysis` - Analyze documents for missing content and recommendations
- `GET /api/fetchDocument` - Retrieve document content for analysis
- `POST /api/uploadDocument` - Upload documents for processing (supports OCR via `enableOCR` parameter)
  - Standard path: Uses PDFLoader for digital PDFs
  - OCR path: Uses Datalab Marker API for scanned documents
  - Returns document metadata including OCR processing status
- `POST /api/LangChain` - AI-powered document Q&A
- `GET /api/Questions/fetch` - Retrieve Q&A history
- `POST /api/Questions/add` - Add new questions
- `GET /api/fetchCompany` - Get company documents
- `POST /api/deleteDocument` - Remove documents
- `GET /api/Categories/GetCategories` - Get document categories
- `GET /api/metrics` - Prometheus-compatible metrics stream (see `docs/observability.md` for dashboard ideas)
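A hedged sketch of calling the upload endpoint with OCR enabled; apart from the `enableOCR` parameter named above, the body fields are assumptions about how the frontend passes the UploadThing file URL.

```typescript
// Hypothetical request body; only enableOCR is documented in this README.
const response = await fetch("/api/uploadDocument", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    documentName: "Employee Handbook",
    documentUrl: "https://utfs.io/f/<uploadthing-file-key>", // assumed UploadThing URL field
    enableOCR: true, // route through the Datalab Marker OCR path
  }),
});

const uploaded = await response.json();
console.log(uploaded); // includes OCR processing status per the endpoint notes above
```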
- View assigned documents
- Chat with AI about documents
- Access document analysis and insights
- Pending approval flow for new employees
- Upload and manage documents
- Manage employee access and approvals
- View analytics and statistics
- Configure document categories
- Employee management dashboard
| Variable | Description | Required | Example |
|---|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string. Format: `postgresql://user:password@host:port/database` | ✅ | `postgresql://postgres:password@localhost:5432/pdr_ai_v2` |
| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | Clerk publishable key (client-side). Get from Clerk Dashboard | ✅ | `pk_test_...` |
| `CLERK_SECRET_KEY` | Clerk secret key (server-side). Get from Clerk Dashboard | ✅ | `sk_test_...` |
| `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL` | Force redirect URL after sign in. If not set, uses Clerk default. | ❌ | `https://your-domain.com/employer/home` |
| `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL` | Force redirect URL after sign up. If not set, uses Clerk default. | ❌ | `https://your-domain.com/signup` |
| `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL` | Force redirect URL after sign out. If not set, uses Clerk default. | ❌ | `https://your-domain.com/` |
| `OPENAI_API_KEY` | OpenAI API key for AI features (embeddings, chat, document analysis). Get from OpenAI Platform | ✅ | `sk-...` |
| `LANGCHAIN_TRACING_V2` | Enable LangSmith tracing for LangChain operations. Set to `true` to enable. Get API key from LangSmith | ❌ | `true` or `false` |
| `LANGCHAIN_API_KEY` | LangChain API key for LangSmith tracing and monitoring. Required if `LANGCHAIN_TRACING_V2=true`. Get from LangSmith | ❌ | `lsv2_...` |
| `TAVILY_API_KEY` | Tavily Search API key for enhanced web search in document analysis. Get from Tavily | ❌ | `tvly-...` |
| `DATALAB_API_KEY` | Datalab Marker API key for advanced OCR processing of scanned documents. Get from Datalab | ❌ | `your_datalab_key` |
| `UPLOADTHING_SECRET` | UploadThing secret key for file uploads. Get from UploadThing Dashboard | ✅ | `sk_live_...` |
| `UPLOADTHING_APP_ID` | UploadThing application ID. Get from UploadThing Dashboard | ✅ | `your_app_id` |
| `NODE_ENV` | Environment mode. Must be one of: `development`, `test`, `production` | ✅ | `development` |
| `SKIP_ENV_VALIDATION` | Skip environment validation during build (useful for Docker builds) | ❌ | `false` or `true` |
- Authentication: `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`, `CLERK_SECRET_KEY`
- Authentication Redirects: `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL`, `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL`, `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL`
- Database: `DATABASE_URL`
- AI Features: `OPENAI_API_KEY` (used for embeddings, chat, and document analysis)
- AI Observability: `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY` (for LangSmith tracing and monitoring)
- Search Features: `TAVILY_API_KEY` (for enhanced web search in document analysis)
- OCR Processing: `DATALAB_API_KEY` (for advanced OCR of scanned documents)
- File Uploads: `UPLOADTHING_SECRET`, `UPLOADTHING_APP_ID`
- Build Configuration: `NODE_ENV`, `SKIP_ENV_VALIDATION`
- Ensure Docker is running before starting the database
- Check if the database container is running: `docker ps`
- Restart the database: `docker restart pdr_ai_v2-postgres`
- Verify all required environment variables are set
- Check `.env` file formatting (no spaces around `=`)
- Ensure API keys are valid and have proper permissions
- Clear Next.js cache: `rm -rf .next`
- Reinstall dependencies: `rm -rf node_modules && pnpm install`
- Check TypeScript errors: `pnpm typecheck`
- OCR checkbox not appearing: Verify `DATALAB_API_KEY` is set in your `.env` file
- OCR processing timeout: Documents taking longer than 5 minutes will time out; try smaller documents first
- OCR processing failed: Check API key validity and Datalab service status
- Poor OCR quality: Enable the `use_llm: true` option in the OCR configuration for AI-enhanced accuracy (see the options sketch after this list)
- Cost concerns: OCR uses Datalab API credits; use it only for scanned or image-based documents
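For the `use_llm` tip above, a minimal sketch of the options object, matching the `OCROptions` interface shown earlier in this README:

```typescript
// Options passed to the OCR service; shape matches the OCROptions interface above.
const ocrOptions = {
  force_ocr: true,                  // re-run OCR even if a text layer exists
  use_llm: true,                    // AI-enhanced accuracy for poor-quality scans
  output_format: "markdown" as const,
};
```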
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes
- Run tests and linting: `pnpm check`
- Commit your changes: `git commit -m 'Add feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
This project is private and proprietary.
For support or questions, contact the development team or create an issue in the repository.