Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 41 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -240,4 +240,44 @@ litellm_uuid.txt
.aider*
file.txt
numbers.txt
poetry.lock
poetry.lock

# Additional ignores for virtual environments and cache files
open-interpreter-env/
**/__pycache__/
**/*.pyc
**/*.pyo
**/*.pyd
**/.pytest_cache/
**/.mypy_cache/
**/.coverage
**/coverage.xml
**/htmlcov/
**/venv/
**/.venv/
**/env/
**/.env
**/node_modules/
**/.DS_Store
**/Thumbs.db
**/.ipynb_checkpoints

# Additional ignores for virtual environments and cache files
open-interpreter-env/
**/__pycache__/
**/*.pyc
**/*.pyo
**/*.pyd
**/.pytest_cache/
**/.mypy_cache/
**/.coverage
**/coverage.xml
**/htmlcov/
**/venv/
**/.venv/
**/env/
**/.env
**/node_modules/
**/.DS_Store
**/Thumbs.db
**/.ipynb_checkpoints
51 changes: 51 additions & 0 deletions Dockerfile.ui-tars
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
###########################################################################################
# Dockerfile for Open Interpreter with UI-TARS support for enhanced browser control #
# This Dockerfile includes all dependencies needed for UI-TARS model integration #
###########################################################################################

FROM nvidia/cuda:11.8-devel-ubuntu22.04

# Set environment variables
ENV HOST 0.0.0.0
ENV DEBIAN_FRONTEND=noninteractive
# ^ Sets the server host to 0.0.0.0, Required for the server to be accessible outside the container

# Install system dependencies
RUN apt-get update && apt-get install -y \
python3.11 \
python3.11-pip \
python3.11-venv \
curl \
git \
wget \
&& rm -rf /var/lib/apt/lists/*

# Create symbolic links for python and pip
RUN ln -s /usr/bin/python3.11 /usr/bin/python
RUN ln -s /usr/bin/pip3 /usr/bin/pip

# Upgrade pip
RUN pip install --upgrade pip

# Set working directory
WORKDIR /app

# Copy required files into container
COPY . .

# Install all dependencies including UI-TARS requirements
# Note: This will require significant disk space (~10-15GB) for the ML dependencies
RUN pip install ".[ui-tars,server,local]"

# Download UI-TARS model during build time (optional, can be done at runtime)
# This will significantly increase the image size but reduce first-run time
# RUN python -c "from interpreter.core.computer.vision.ui_tars.ui_tars_vision import UiTarsVision; \
# computer = type('Computer', (), {'debug': True})(); \
# ui_tars = UiTarsVision(computer); \
# ui_tars.load()"

# Expose port 8000
EXPOSE 8000

# Start the server
ENTRYPOINT ["interpreter", "--server"]
83 changes: 83 additions & 0 deletions POML_INTEGRATION_SUMMARY.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
# POML Integration Summary

This document summarizes the successful integration of Microsoft's Prompt Orchestration Markup Language (POML) into Open Interpreter.

## Integration Overview

The POML Python SDK has been successfully integrated into Open Interpreter, providing enhanced agent capabilities through structured prompting.

## Changes Made

### 1. Dependency Management
- Added `poml = "^0.0.8"` to [tool.poetry.dependencies] in `pyproject.toml`
- POML is now a standard dependency included in all deployments

### 2. Core Integration
- Created `interpreter/core/computer/poml.py` - New POML module with:
- Template creation and rendering capabilities
- Agent prompt generation functions
- Availability checking and error handling
- Modified `interpreter/core/computer/computer.py` to include POML module
- Updated `interpreter/core/computer/__init__.py` to export POML

### 3. Documentation
- Created `docs/agents.md` - Comprehensive documentation on POML integration
- Updated `docs/README.md` to reference the new agents documentation
- Updated `docs/mint.json` to include agents documentation in navigation
- Updated main `README.md` to highlight POML capabilities

### 4. Examples and Testing
- Created `examples/poml_agent_example.py` - Demonstrates POML usage patterns
- Created `test_poml_integration.py` - Verification script for integration

## Key Features

### POML Module API
The new POML module provides the following methods:
- `is_available()` - Check if POML is properly installed
- `create_template(name, content, **kwargs)` - Create POML templates
- `render_template(template, data)` - Render templates with data
- `create_agent_prompt(agent_type, objective, context, instructions)` - Generate structured agent prompts

### Usage Example
```python
from interpreter import interpreter

# Create structured prompts with POML
prompt = interpreter.computer.poml.create_agent_prompt(
agent_type="data_analyst",
objective="Analyze sales data",
context={"period": "Q1 2024", "region": "North America"},
instructions=["Load sales data", "Identify trends", "Generate insights"]
)

interpreter.chat(prompt)
```

## Benefits

1. **Structured Prompting**: Organize complex prompts with clear sections and hierarchy
2. **Data Handling**: Efficiently manage and process data within prompts
3. **Templating Engine**: Create reusable prompt templates with variables and logic
4. **Agent Orchestration**: Better coordination of complex multi-step agent workflows
5. **Maintainability**: Easier to maintain and update prompt structures

## Verification

The integration has been verified to work correctly:
- POML module loads successfully as part of the Computer API
- Template creation and rendering functions work as expected
- Agent prompt generation produces properly structured prompts
- All new files follow Open Interpreter coding standards

## Future Enhancements

Planned improvements include:
1. Agent-specific POML template libraries
2. Visual prompt builder tools
3. Template marketplace for community sharing
4. Advanced analytics for prompt performance optimization

## Conclusion

The POML integration successfully enhances Open Interpreter's agent capabilities by providing structured prompting mechanisms. This integration is now ready for use in all deployment scenarios and provides developers with powerful tools for creating sophisticated agent workflows.
Loading
Loading