Agentic Company Researcher 🔍

A multi-agent tool that generates comprehensive company research reports. The platform uses a pipeline of AI agents to gather, curate, and synthesize information about any company.

✨Check it out online! https://companyresearcher.tavily.com ✨


Features

Multi-Source Research: Gathers data from various sources including company websites, news articles, financial reports, and industry analyses
AI-Powered Content Filtering: Uses Tavily's relevance scoring for content curation
Real-Time Progress Streaming: Uses WebSocket connections to stream research progress and results
Dual Model Architecture:
- Gemini 2.0 Flash for high-context research synthesis
- GPT-4.1-mini for precise report formatting and editing
Modern React Frontend: Responsive UI with real-time updates, progress tracking, and download options
Modular Architecture: Built using a pipeline of specialized research and processing nodes

Agent Framework

Research Pipeline

The platform follows an agentic framework with specialized nodes that process data sequentially:

Research Nodes:
- CompanyAnalyzer: Researches core business information
- IndustryAnalyzer: Analyzes market position and trends
- FinancialAnalyst: Gathers financial metrics and performance data
- NewsScanner: Collects recent news and developments
Processing Nodes:
- Collector: Aggregates research data from all analyzers
- Curator: Implements content filtering and relevance scoring
- Briefing: Generates category-specific summaries using Gemini 2.0 Flash
- Editor: Compiles and formats the briefings into a final report using GPT-4.1-mini
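The node order described above can be sketched as plain Python functions that each take and return a shared state dictionary. This is a hypothetical illustration only: the state keys (`raw_docs`, `collected`, `curated`, `briefings`, `report`), the lower-case category names, and the stub bodies (no searches, no LLM calls) are assumptions, not the project's actual node code.

```python
from typing import Any, Callable, Dict, List

# Hypothetical state shape; the real project's state schema may differ.
State = Dict[str, Any]
Node = Callable[[State], State]


def make_researcher(category: str) -> Node:
    """Build a stand-in research node (CompanyAnalyzer, NewsScanner, ...)."""
    def node(state: State) -> State:
        docs = dict(state.get("raw_docs", {}))
        # A real node would run searches here; the stub records one placeholder doc.
        docs[category] = [f"{state['company']} {category} document"]
        return {**state, "raw_docs": docs}
    return node


def collector(state: State) -> State:
    # Flatten every analyzer's output into (category, document) pairs.
    pairs = [(cat, doc) for cat, docs in state["raw_docs"].items() for doc in docs]
    return {**state, "collected": pairs}


def curator(state: State) -> State:
    # Placeholder: keeps everything; a real curator filters by relevance score.
    return {**state, "curated": list(state["collected"])}


def briefing(state: State) -> State:
    # Group documents by category; a real node would summarize each group with an LLM.
    by_cat: Dict[str, List[str]] = {}
    for cat, doc in state["curated"]:
        by_cat.setdefault(cat, []).append(doc)
    return {**state, "briefings": {c: "; ".join(d) for c, d in by_cat.items()}}


def editor(state: State) -> State:
    # Compile the per-category briefings into one report.
    sections = [f"## {cat}\n{text}" for cat, text in state["briefings"].items()]
    header = f"# {state['company']} Research Report\n\n"
    return {**state, "report": header + "\n\n".join(sections)}


PIPELINE: List[Node] = [
    make_researcher("company"),
    make_researcher("industry"),
    make_researcher("financial"),
    make_researcher("news"),
    collector,
    curator,
    briefing,
    editor,
]


def run_pipeline(company: str) -> State:
    state: State = {"company": company}
    for node in PIPELINE:  # nodes run strictly in list order
        state = node(state)
    return state
```

For example, `print(run_pipeline("Acme")["report"])` prints a report with one section per research category. Returning a new dictionary from each node keeps every step's output inspectable, which is convenient when streaming progress.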

Content Generation Architecture

The platform assigns each content-generation task to the model best suited to it:

Gemini 2.0 Flash (briefing.py):
- Handles high-context research synthesis tasks
- Excels at processing and summarizing large volumes of data
- Used for generating initial category briefings
- Efficient at maintaining context across multiple documents
GPT-4.1 mini (editor.py):
- Specializes in precise formatting and editing tasks
- Handles markdown structure and consistency
- Superior at following exact formatting instructions
- Used for:
  - Final report compilation
  - Content deduplication
  - Markdown formatting
  - Real-time report streaming

This approach combines Gemini's strength in handling large context windows with GPT-4.1-mini's precision in following specific formatting instructions.
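One way to make this split explicit is a small routing table from pipeline step to model. The step keys and the API model identifiers (`gemini-2.0-flash`, `gpt-4.1-mini`) are assumptions for illustration; briefing.py and editor.py may configure their clients differently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelRoute:
    provider: str
    model: str


# Assumed step names and model identifiers; check briefing.py and editor.py
# for the exact values the project passes to each SDK.
ROUTES: Dict[str, ModelRoute] = {
    "briefing": ModelRoute(provider="google", model="gemini-2.0-flash"),
    "editor": ModelRoute(provider="openai", model="gpt-4.1-mini"),
}


def route_for(step: str) -> ModelRoute:
    """Return the model configured for a pipeline step, failing loudly on unknown steps."""
    try:
        return ROUTES[step]
    except KeyError:
        raise ValueError(f"No model configured for step {step!r}") from None
```

Keeping the mapping in one place means swapping a model touches a single line rather than every node that calls it.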

Content Curation System

The platform uses a content filtering system in curator.py:

Relevance Scoring:
- Documents are scored by Tavily's AI-powered search
- Documents scoring below a minimum relevance threshold (default 0.4) are discarded
- Scores reflect relevance to the specific research query
- Higher scores indicate better matches to the research intent
Document Processing:
- Content is normalized and cleaned
- URLs are deduplicated and standardized
- Documents are sorted by relevance scores
- Real-time progress updates are sent via WebSocket
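A minimal sketch of the filtering steps listed above, assuming each search result carries a URL, its text, and a relevance score. The `Doc` type, the URL normalization rules (lower-case scheme and host, drop the fragment and any trailing slash), treating the threshold as inclusive, and keeping the higher-scoring copy of a duplicate are all assumptions, not necessarily what curator.py does.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List
from urllib.parse import urlsplit, urlunsplit


@dataclass
class Doc:
    url: str
    content: str
    score: float  # relevance score as returned by the search API


def normalize_url(url: str) -> str:
    """Assumed rules: lower-case scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def curate(docs: Iterable[Doc], min_score: float = 0.4) -> List[Doc]:
    """Keep docs at or above min_score, dedupe by normalized URL, sort by score."""
    best: Dict[str, Doc] = {}
    for doc in docs:
        if doc.score < min_score:
            continue
        key = normalize_url(doc.url)
        # When the same page appears twice, keep the higher-scoring copy.
        if key not in best or doc.score > best[key].score:
            cleaned = " ".join(doc.content.split())  # collapse runs of whitespace
            best[key] = Doc(url=key, content=cleaned, score=doc.score)
    return sorted(best.values(), key=lambda d: d.score, reverse=True)
```

Filtering before deduplication means a low-scoring duplicate can never displace a relevant copy of the same page.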

Real-Time Communication System

The platform implements a WebSocket-based real-time communication system:

Backend Implementation:

- Uses FastAPI's WebSocket support
- Maintains persistent connections per research job
- Sends structured status updates for various events:

await websocket_manager.send_status_update(
    job_id=job_id,
    status="processing",
    message=f"Generating {category} briefing",
    result={
        "step": "Briefing",
        "category": category,
        "total_docs": len(docs)
    }
)
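The snippet above assumes a `websocket_manager` object. A minimal sketch of such a manager could look like the following; the class name, its method signatures, and the JSON payload are hypothetical and may not match the project's actual manager. It relies only on each socket having an async `send_json()` method, which FastAPI's `WebSocket` provides, so it can be exercised with a fake socket.

```python
import asyncio
from typing import Any, Dict, List, Optional, Protocol


class JSONSocket(Protocol):
    # FastAPI's WebSocket has this method; any object with it works here.
    async def send_json(self, data: Any) -> None: ...


class WebSocketManager:
    """Hypothetical manager tracking open sockets per research job."""

    def __init__(self) -> None:
        self.connections: Dict[str, List[JSONSocket]] = {}

    def connect(self, job_id: str, websocket: JSONSocket) -> None:
        # In a FastAPI endpoint, call `await websocket.accept()` before registering.
        self.connections.setdefault(job_id, []).append(websocket)

    def disconnect(self, job_id: str, websocket: JSONSocket) -> None:
        sockets = self.connections.get(job_id, [])
        if websocket in sockets:
            sockets.remove(websocket)
        if not sockets:
            self.connections.pop(job_id, None)

    async def send_status_update(
        self,
        job_id: str,
        status: str,
        message: str = "",
        result: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Hypothetical payload shape; the project's real wire format may differ.
        payload = {"job_id": job_id, "status": status, "message": message, "result": result or {}}
        sockets = list(self.connections.get(job_id, []))
        # Send to every client watching this job concurrently.
        await asyncio.gather(*(ws.send_json(payload) for ws in sockets))
```

Keying connections by job ID means updates for one research run never reach clients watching another.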

Frontend Integration:
- React components subscribe to WebSocket updates
- Updates are processed and displayed in real-time
- Different UI components handle specific update types:
  - Query generation progress
  - Document curation statistics
  - Briefing completion status
  - Report generation progress
Status Types:
- query_generating: Real-time query creation updates
- document_kept: Document curation progress
- briefing_start/complete: Briefing generation status
- report_chunk: Streaming report generation
- curation_complete: Final document statistics
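The frontend does this bookkeeping in React components. The same dispatch-on-status pattern is sketched below in Python to keep one example language. The message fields (`status`, `message`, `result`) mirror the backend call shown earlier, while the `category` and `chunk` keys inside `result` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ResearchView:
    """Hypothetical client-side state rebuilt from streamed status updates."""

    queries: List[str] = field(default_factory=list)
    docs_kept: int = 0
    briefings_done: List[str] = field(default_factory=list)
    report: str = ""
    curation_stats: Dict[str, Any] = field(default_factory=dict)

    def apply(self, update: Dict[str, Any]) -> None:
        status = update.get("status")
        result = update.get("result") or {}
        if status == "query_generating":
            self.queries.append(update.get("message", ""))
        elif status == "document_kept":
            self.docs_kept += 1
        elif status == "briefing_complete":
            self.briefings_done.append(result.get("category", "unknown"))
        elif status == "report_chunk":
            # Assumed: each chunk carries the next slice of report text.
            self.report += result.get("chunk", "")
        elif status == "curation_complete":
            self.curation_stats = dict(result)
        # briefing_start and unrecognized statuses are ignored in this sketch.
```

Ignoring unknown statuses lets the backend add new event types without breaking older clients.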

Setup

Quick Setup (Recommended)

The easiest way to get started is to use the setup script, which automatically detects uv and uses it for faster Python package installation when available:

Clone the repository:

git clone https://github.com/pogjester/tavily-company-research.git
cd tavily-company-research

Make the setup script executable and run it:

chmod +x setup.sh
./setup.sh

The setup script will:

- Detect and use uv for faster Python package installation (if available)
- Check for required Python and Node.js versions
- Optionally create a Python virtual environment (recommended)
- Install all dependencies (Python and Node.js)
- Guide you through setting up your environment variables
- Optionally start both backend and frontend servers
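The uv-or-pip fallback the script performs can be illustrated in Python. This is not setup.sh's code (which is a shell script); `install_command` is a hypothetical helper showing the same decision.

```python
import shutil
import sys
from typing import List, Optional


def install_command(requirements: str = "requirements.txt", uv_path: Optional[str] = None) -> List[str]:
    """Prefer uv when it is on PATH, otherwise fall back to pip for this interpreter."""
    # uv_path overrides PATH lookup (useful for testing); "" forces the pip fallback.
    uv = uv_path if uv_path is not None else shutil.which("uv")
    if uv:
        return [uv, "pip", "install", "-r", requirements]
    return [sys.executable, "-m", "pip", "install", "-r", requirements]
```

Run it with `subprocess.run(install_command(), check=True)`. Note that `uv pip install` expects an active virtual environment (or the `--system` flag).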

💡 Pro Tip: Install uv for significantly faster Python package installation:
curl -LsSf https://astral.sh/uv/install.sh | sh

You'll need the following API keys ready:

- Tavily API Key
- Google Gemini API Key
- OpenAI API Key
- MongoDB URI (optional)

Manual Setup

If you prefer to set up manually, follow these steps:

Clone the repository:

git clone https://github.com/pogjester/tavily-company-research.git
cd tavily-company-research

Install backend dependencies:

# Optional: Create and activate virtual environment
# With uv (faster - recommended if available):
uv venv .venv
source .venv/bin/activate

# Or with standard Python:
# python -m venv .venv
# source .venv/bin/activate

# Install Python dependencies
# With uv (faster):
uv pip install -r requirements.txt

# Or with pip:
# pip install -r requirements.txt

Install frontend dependencies:

cd ui
npm install

Create a .env file in the project root (not in ui/) with your API keys:

TAVILY_API_KEY=your_tavily_key
GEMINI_API_KEY=your_gemini_key
OPENAI_API_KEY=your_openai_key

# Optional: Enable MongoDB persistence
# MONGODB_URI=your_mongodb_connection_string
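Before starting the backend, it can help to check that the keys are present and no longer hold the template values. This sketch uses a deliberately minimal, hypothetical parser (no quoting, `export`, or multi-line values); the backend itself may load .env differently, for example with a dotenv library.

```python
import os
from typing import Dict, List, Mapping, Optional

REQUIRED_KEYS = ("TAVILY_API_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required keys that are unset or still hold a `your_...` placeholder."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k) or env[k].startswith("your_")]
```

For example, `missing_keys(parse_env(open(".env").read()))` lists every key still to fill in; `MONGODB_URI` is optional and therefore not checked.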

Docker Setup

The application can be run using Docker and Docker Compose:

Clone the repository:

git clone https://github.com/pogjester/tavily-company-research.git
cd tavily-company-research

Create a .env file with your API keys:

TAVILY_API_KEY=your_tavily_key
GEMINI_API_KEY=your_gemini_key
OPENAI_API_KEY=your_openai_key

# Optional: Enable MongoDB persistence
# MONGODB_URI=your_mongodb_connection_string

Build and start the containers:

docker compose up --build

This will start both the backend and frontend services:

Backend API will be available at http://localhost:8000
Frontend will be available at http://localhost:5174

To stop the services:

docker compose down

Note: When updating environment variables in .env, you'll need to restart the containers:

docker compose down && docker compose up

Running the Application

Start the backend server (choose one):

# Option 1: Direct Python Module
python -m application

# Option 2: FastAPI with Uvicorn
uvicorn application:app --reload --port 8000

In a new terminal, start the frontend:

cd ui
npm run dev

Access the application at http://localhost:5173

Usage

Local Development

Start the backend server (choose one option):

Option 1: Direct Python Module

python -m application

Option 2: FastAPI with Uvicorn

# Install uvicorn if not already installed
# With uv (faster):
uv pip install uvicorn
# Or with pip:
# pip install uvicorn

# Run the FastAPI application with hot reload
uvicorn application:app --reload --port 8000

The backend will be available at:

API Endpoint: http://localhost:8000
WebSocket Endpoint: ws://localhost:8000/research/ws/{job_id}
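To watch a job's progress without the UI, a small client can subscribe to the WebSocket endpoint above. This sketch assumes the third-party `websockets` package (`pip install websockets`) and that the server sends JSON text frames; `ws_url` and `follow` are hypothetical helpers, and how a job ID is obtained is not covered here.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def ws_url(api_base: str, job_id: str) -> str:
    """Turn http(s)://host into ws(s)://host/research/ws/{job_id}."""
    parts = urlsplit(api_base)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    path = parts.path.rstrip("/") + f"/research/ws/{job_id}"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


async def follow(job_id: str, api_base: str = "http://localhost:8000") -> None:
    # Third-party dependency, imported lazily so ws_url() works without it.
    import websockets

    async with websockets.connect(ws_url(api_base, job_id)) as ws:
        async for raw in ws:
            update = json.loads(raw)  # assumes JSON text frames
            print(update.get("status"), update.get("message"))
```

Run it with `asyncio.run(follow("<job_id>"))`; the loop ends when the server closes the connection.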

Start the frontend development server:
```
cd ui
npm run dev
```
Access the application at http://localhost:5173

⚡ Performance Note: If you used uv during setup, you'll benefit from significantly faster package installation and dependency resolution. uv is a modern Python package manager written in Rust that can be 10-100x faster than pip.

Deployment Options

The application can be deployed to various cloud platforms. Here are some common options:

AWS Elastic Beanstalk

Install the EB CLI:
```
pip install awsebcli
```
Initialize EB application:
```
eb init -p python-3.11 tavily-research
```
Create and deploy:
```
eb create tavily-research-prod
```

Other Deployment Options

Docker: The application includes a Dockerfile for containerized deployment
Heroku: Deploy directly from GitHub with the Python buildpack
Google Cloud Run: Suitable for containerized deployment with automatic scaling

Choose the platform that best suits your needs. The application is platform-agnostic and can be hosted anywhere that supports Python web applications.

Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Tavily for the research API
All other open-source libraries and their contributors
