process multimodal with context
@@ -19,7 +19,7 @@
</p>
<p>
<a href="https://github.com/HKUDS/RAG-Anything/stargazers"><img src='https://img.shields.io/github/stars/HKUDS/RAG-Anything?color=00d9ff&style=for-the-badge&logo=star&logoColor=white&labelColor=1a1a2e' /></a>
<img src="https://img.shields.io/badge/🐍Python-3.9+-4ecdc4?style=for-the-badge&logo=python&logoColor=white&labelColor=1a1a2e">
<img src="https://img.shields.io/badge/🐍Python-3.10-4ecdc4?style=for-the-badge&logo=python&logoColor=white&labelColor=1a1a2e">
<a href="https://pypi.org/project/raganything/"><img src="https://img.shields.io/pypi/v/raganything.svg?style=for-the-badge&logo=pypi&logoColor=white&labelColor=1a1a2e&color=ff6b6b"></a>
</p>
<p>
docs/context_aware_processing.md (new file, 374 lines added)
@@ -0,0 +1,374 @@
# Context-Aware Multimodal Processing in RAGAnything

This document describes the context-aware multimodal processing feature in RAGAnything, which provides surrounding document content to LLMs when they analyze images, tables, equations, and other multimodal content, improving the accuracy and relevance of the analysis.

## Overview

The context-aware feature enables RAGAnything to automatically extract and provide surrounding text content as context when processing multimodal content. This leads to more accurate and contextually relevant analysis by giving AI models additional information about where the content appears in the document structure.

### Key Benefits

- **Enhanced Accuracy**: Context helps the AI understand the purpose and meaning of multimodal content
- **Semantic Coherence**: Generated descriptions align with document context and terminology
- **Automated Integration**: Context extraction is automatically enabled during document processing
- **Flexible Configuration**: Multiple extraction modes and filtering options

## Key Features

### 1. Configuration Support
- **Integrated Configuration**: Complete context options in `RAGAnythingConfig`
- **Environment Variables**: Configure all context parameters via environment variables
- **Dynamic Updates**: Runtime configuration updates supported
- **Content Format Control**: Configurable content source format detection

### 2. Automated Integration
- **Auto-Initialization**: Modal processors automatically receive the tokenizer and context configuration
- **Content Source Setup**: Document processing automatically sets content sources for context extraction
- **Position Information**: Position info (`page_idx`, `index`) is automatically passed to processors
- **Batch Processing**: Context-aware batch processing for efficient document handling

### 3. Advanced Token Management
- **Accurate Token Counting**: Uses LightRAG's tokenizer for precise token calculation
- **Smart Boundary Preservation**: Truncates at sentence/paragraph boundaries
- **Backward Compatibility**: Falls back to character-based truncation when no tokenizer is available

### 4. Universal Context Extraction
- **Multiple Formats**: Support for MinerU, plain text, and custom formats
- **Flexible Modes**: Page-based and chunk-based context extraction
- **Content Filtering**: Configurable content type filtering
- **Header Support**: Optional inclusion of document headers and structure

## Configuration

### RAGAnythingConfig Parameters

```python
# Context Extraction Configuration
context_window: int = 1                              # Context window size (pages/chunks)
context_mode: str = "page"                           # Context mode ("page" or "chunk")
max_context_tokens: int = 2000                       # Maximum context tokens
include_headers: bool = True                         # Include document headers
include_captions: bool = True                        # Include image/table captions
context_filter_content_types: List[str] = ["text"]   # Content types to include
content_format: str = "minerU"                       # Default content format for context extraction
```

### Environment Variables

```bash
# Context extraction settings
CONTEXT_WINDOW=2
CONTEXT_MODE=page
MAX_CONTEXT_TOKENS=3000
INCLUDE_HEADERS=true
INCLUDE_CAPTIONS=true
CONTEXT_FILTER_CONTENT_TYPES=text,image
CONTENT_FORMAT=minerU
```
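Most of these defaults are captured from the environment when the configuration module is loaded, so export the variables (or load your `.env`) before importing `raganything`. A minimal sketch; the values shown are examples, not required settings:

```python
import os

# Export the settings before importing raganything, because the configuration
# defaults are read from the environment when the module is loaded.
os.environ["CONTEXT_WINDOW"] = "2"
os.environ["CONTEXT_MODE"] = "page"
os.environ["MAX_CONTEXT_TOKENS"] = "3000"

from raganything import RAGAnythingConfig

config = RAGAnythingConfig()  # picks up the values exported above
print(config.context_window, config.context_mode, config.max_context_tokens)
```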
## Usage Guide

### 1. Basic Configuration

```python
from raganything import RAGAnything, RAGAnythingConfig

# Create configuration with context settings
config = RAGAnythingConfig(
    context_window=2,
    context_mode="page",
    max_context_tokens=3000,
    include_headers=True,
    include_captions=True,
    context_filter_content_types=["text", "image"],
    content_format="minerU"
)

# Create RAGAnything instance
rag_anything = RAGAnything(
    config=config,
    llm_model_func=your_llm_function,
    embedding_func=your_embedding_function
)
```

### 2. Automatic Document Processing

```python
# Context is automatically enabled during document processing
await rag_anything.process_document_complete("document.pdf")
```

### 3. Manual Content Source Configuration

```python
# Set content source for specific content lists
rag_anything.set_content_source_for_context(content_list, "minerU")

# Update context configuration at runtime
rag_anything.update_context_config(
    context_window=1,
    max_context_tokens=1500,
    include_captions=False
)
```

### 4. Direct Modal Processor Usage

```python
from raganything.modalprocessors import (
    ContextExtractor,
    ContextConfig,
    ImageModalProcessor
)

# Configure context extraction
config = ContextConfig(
    context_window=1,
    context_mode="page",
    max_context_tokens=2000,
    include_headers=True,
    include_captions=True,
    filter_content_types=["text"]
)

# Initialize context extractor
context_extractor = ContextExtractor(config)

# Initialize modal processor with context support
processor = ImageModalProcessor(lightrag, caption_func, context_extractor)

# Set content source
processor.set_content_source(content_list, "minerU")

# Process with context
item_info = {
    "page_idx": 2,
    "index": 5,
    "type": "image"
}

result = await processor.process_multimodal_content(
    modal_content=image_data,
    content_type="image",
    file_path="document.pdf",
    entity_name="Architecture Diagram",
    item_info=item_info
)
```

## Context Modes

### Page-Based Context (`context_mode="page"`)
- Extracts context based on page boundaries
- Uses the `page_idx` field from content items
- Suitable for document-structured content
- Example: include text from the 2 pages before and after the current image

### Chunk-Based Context (`context_mode="chunk"`)
- Extracts context based on content item positions
- Uses the sequential position in the content list
- Suitable for fine-grained control
- Example: include the 5 content items before and after the current table (see the selection sketch below)
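To make the difference concrete, the sketch below shows which neighboring items each mode would pick for a given item. `select_context_items` is a hypothetical helper for illustration only, not part of the library API:

```python
from typing import Any, Dict, List

def select_context_items(
    content_list: List[Dict[str, Any]],
    current: Dict[str, Any],
    context_mode: str = "page",
    context_window: int = 1,
) -> List[Dict[str, Any]]:
    """Illustrative only: list the neighboring items a given window would cover."""
    if context_mode == "page":
        # Page-based: keep items whose page_idx falls inside the window.
        page = current.get("page_idx", 0)
        return [
            item
            for item in content_list
            if item is not current
            and abs(item.get("page_idx", 0) - page) <= context_window
        ]
    # Chunk-based: keep items by sequential position in the content list.
    idx = content_list.index(current)
    lo, hi = max(0, idx - context_window), idx + context_window + 1
    return [item for item in content_list[lo:hi] if item is not current]
```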
## Processing Workflow

### 1. Document Parsing
```
Document Input → MinerU Parsing → content_list Generation
```

### 2. Context Setup
```
content_list → Set as Context Source → All Modal Processors Gain Context Capability
```

### 3. Multimodal Processing
```
Multimodal Content → Extract Surrounding Context → Enhanced LLM Analysis → More Accurate Results
```

## Content Source Formats

### MinerU Format
```json
[
  {
    "type": "text",
    "text": "Document content here...",
    "text_level": 1,
    "page_idx": 0
  },
  {
    "type": "image",
    "img_path": "images/figure1.jpg",
    "img_caption": ["Figure 1: Architecture"],
    "page_idx": 1
  }
]
```

### Custom Text Chunks
```python
text_chunks = [
    "First chunk of text content...",
    "Second chunk of text content...",
    "Third chunk of text content..."
]
```

### Plain Text
```python
full_document = "Complete document text with all content..."
```
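Whichever source you use, it is registered the same way. The format names below follow the `set_content_source_for_context` signature introduced in this commit (`"minerU"`, `"text_chunks"`, `"auto"`); passing `"auto"` to cover a plain-text string is an assumption here:

```python
# MinerU content list (dicts with "type", "text"/"img_path", "page_idx", ...)
rag_anything.set_content_source_for_context(content_list, "minerU")

# Pre-split text chunks
rag_anything.set_content_source_for_context(text_chunks, "text_chunks")

# Single plain-text document; "auto" asks the extractor to detect the format
rag_anything.set_content_source_for_context(full_document, "auto")
```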
## Configuration Examples

### High-Precision Context
For focused analysis with minimal context:
```python
config = RAGAnythingConfig(
    context_window=1,
    context_mode="page",
    max_context_tokens=1000,
    include_headers=True,
    include_captions=False,
    context_filter_content_types=["text"]
)
```

### Comprehensive Context
For broad analysis with rich context:
```python
config = RAGAnythingConfig(
    context_window=2,
    context_mode="page",
    max_context_tokens=3000,
    include_headers=True,
    include_captions=True,
    context_filter_content_types=["text", "image", "table"]
)
```

### Chunk-Based Analysis
For fine-grained sequential context:
```python
config = RAGAnythingConfig(
    context_window=5,
    context_mode="chunk",
    max_context_tokens=2000,
    include_headers=False,
    include_captions=False,
    context_filter_content_types=["text"]
)
```

## Performance Optimization

### 1. Accurate Token Control
- Uses a real tokenizer for precise token counting
- Avoids exceeding LLM token limits
- Provides consistent performance

### 2. Smart Truncation
- Truncates at sentence boundaries
- Maintains semantic integrity
- Adds truncation indicators

### 3. Caching Optimization
- Context extraction results can be reused
- Reduces redundant computation overhead

## Advanced Features

### Context Truncation
The system automatically truncates context to fit within token limits (a simplified sketch follows this list):
- Uses the actual tokenizer for accurate token counting
- Attempts to end at sentence boundaries (periods)
- Falls back to line boundaries if needed
- Adds "..." indicator for truncated content
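The real truncation logic lives in `raganything/modalprocessors.py`; the sketch below only illustrates the strategy and assumes a tokenizer object that exposes `encode`/`decode`:

```python
def truncate_context(text: str, max_tokens: int, tokenizer=None) -> str:
    """Simplified sketch: trim text to a token budget at a natural boundary."""
    if tokenizer is None:
        # Fallback: rough character-based truncation (~4 characters per token).
        limit = max_tokens * 4
        if len(text) <= limit:
            return text
        truncated = text[:limit]
    else:
        tokens = tokenizer.encode(text)
        if len(tokens) <= max_tokens:
            return text
        truncated = tokenizer.decode(tokens[:max_tokens])

    # Prefer to stop at a sentence boundary, then fall back to a line boundary.
    for boundary in (". ", "\n"):
        cut = truncated.rfind(boundary)
        if cut > len(truncated) // 2:
            truncated = truncated[: cut + 1]
            break
    return truncated.rstrip() + "..."
```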
### Header Formatting
When `include_headers=True`, headers are formatted with markdown-style prefixes:
```
# Level 1 Header
## Level 2 Header
### Level 3 Header
```

### Caption Integration
When `include_captions=True`, image and table captions are included as:
```
[Image: Figure 1 caption text]
[Table: Table 1 caption text]
```
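Putting both options together, here is a simplified sketch of how a single MinerU item could be rendered into the context string. The `table_caption` key is an assumption; only `img_caption` appears in the sample format above:

```python
def format_context_item(item: dict) -> str:
    """Illustrative only: render one MinerU item for the context string."""
    item_type = item.get("type")
    if item_type == "text":
        level = item.get("text_level", 0)
        prefix = "#" * level + " " if level else ""
        return prefix + item.get("text", "")
    if item_type == "image":
        caption = " ".join(item.get("img_caption", []))
        return f"[Image: {caption}]" if caption else ""
    if item_type == "table":
        caption = " ".join(item.get("table_caption", []))  # assumed key name
        return f"[Table: {caption}]" if caption else ""
    return ""
```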
## Integration with RAGAnything

The context-aware feature is seamlessly integrated into RAGAnything's workflow:

1. **Automatic Setup**: Context extractors are automatically created and configured
2. **Content Source Management**: Document processing automatically sets content sources
3. **Processor Integration**: All modal processors receive context capabilities
4. **Configuration Consistency**: A single configuration system covers all context settings

## Error Handling

The system includes robust error handling (a sketch of the pattern follows this list):
- Gracefully handles missing or invalid content sources
- Returns empty context for unsupported formats
- Logs warnings for configuration issues
- Continues processing even if context extraction fails
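In practice a failed extraction degrades to an empty context string rather than aborting the document. A sketch of that pattern; `extract_context` is a stand-in name, not necessarily the real method signature:

```python
import logging

logger = logging.getLogger(__name__)

def safe_extract_context(context_extractor, content_source, item_info) -> str:
    """Sketch of the degradation pattern: any failure yields an empty context."""
    try:
        # Hypothetical call; the real extraction API lives in modalprocessors.py.
        return context_extractor.extract_context(content_source, item_info)
    except Exception as exc:
        logger.warning("Context extraction failed, continuing without context: %s", exc)
        return ""
```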
## Compatibility

- **Backward Compatible**: Existing code works without modification
- **Optional Feature**: Context can be selectively enabled/disabled
- **Flexible Configuration**: Supports multiple configuration combinations

## Best Practices

1. **Token Limits**: Ensure `max_context_tokens` doesn't exceed LLM context limits
2. **Performance Impact**: Larger context windows increase processing time
3. **Content Quality**: Context quality directly affects analysis accuracy
4. **Window Size**: Match the window size to the content structure (documents vs. articles)
5. **Content Filtering**: Use `context_filter_content_types` to reduce noise

## Troubleshooting

### Common Issues

**Context Not Extracted**
- Check if `set_content_source_for_context()` was called
- Verify `item_info` contains the required fields (`page_idx`, `index`)
- Confirm the content source format is correct

**Context Too Long/Short**
- Adjust the `max_context_tokens` setting
- Modify the `context_window` size
- Check the `context_filter_content_types` configuration

**Irrelevant Context**
- Refine `context_filter_content_types` to exclude noise
- Reduce the `context_window` size
- Set `include_captions=False` if captions are not helpful

**Configuration Issues**
- Verify environment variables are set correctly
- Check `RAGAnythingConfig` parameter names
- Ensure `content_format` matches your data source

## Examples

Check out the example files for complete usage demonstrations:

- **Configuration Examples**: See how to set up different context configurations
- **Integration Examples**: Learn how to integrate context-aware processing into your workflow
- **Custom Processors**: Examples of creating custom modal processors with context support

## API Reference

For detailed API documentation, see the docstrings in:
- `raganything/modalprocessors.py` - Context extraction and modal processors
- `raganything/config.py` - Configuration options
- `raganything/raganything.py` - Main RAGAnything class integration
@@ -48,6 +48,15 @@ OLLAMA_EMULATING_MODEL_TAG=latest
# SUPPORTED_FILE_EXTENSIONS=.pdf,.jpg,.jpeg,.png,.bmp,.tiff,.tif,.gif,.webp,.doc,.docx,.ppt,.pptx,.xls,.xlsx,.txt,.md
# RECURSIVE_FOLDER_PROCESSING=true

### Context Extraction Configuration
# CONTEXT_WINDOW=1
# CONTEXT_MODE=page
# MAX_CONTEXT_TOKENS=2000
# INCLUDE_HEADERS=true
# INCLUDE_CAPTIONS=true
# CONTEXT_FILTER_CONTENT_TYPES=text
# CONTENT_FORMAT=minerU

### Max nodes returned from graph retrieval
# MAX_GRAPH_NODES=1000

@@ -1,7 +1,7 @@
from .raganything import RAGAnything as RAGAnything
from .config import RAGAnythingConfig as RAGAnythingConfig

__version__ = "1.2.0"
__version__ = "1.2.1"
__author__ = "Zirui Guo"
__url__ = "https://github.com/HKUDS/RAG-Anything"

@@ -72,3 +72,34 @@ class RAGAnythingConfig:
        default=get_env_value("RECURSIVE_FOLDER_PROCESSING", True, bool)
    )
    """Whether to recursively process subfolders in batch mode."""

    # Context Extraction Configuration
    # ---
    context_window: int = field(default=get_env_value("CONTEXT_WINDOW", 1, int))
    """Number of pages/chunks to include before and after current item for context."""

    context_mode: str = field(default=get_env_value("CONTEXT_MODE", "page", str))
    """Context extraction mode: 'page' for page-based, 'chunk' for chunk-based."""

    max_context_tokens: int = field(
        default=get_env_value("MAX_CONTEXT_TOKENS", 2000, int)
    )
    """Maximum number of tokens in extracted context."""

    include_headers: bool = field(default=get_env_value("INCLUDE_HEADERS", True, bool))
    """Whether to include document headers and titles in context."""

    include_captions: bool = field(
        default=get_env_value("INCLUDE_CAPTIONS", True, bool)
    )
    """Whether to include image/table captions in context."""

    context_filter_content_types: List[str] = field(
        default_factory=lambda: get_env_value(
            "CONTEXT_FILTER_CONTENT_TYPES", "text", str
        ).split(",")
    )
    """Content types to include in context extraction (e.g., 'text', 'image', 'table')."""

    content_format: str = field(default=get_env_value("CONTENT_FORMAT", "minerU", str))
    """Default content format for context extraction when processing documents."""

File diff suppressed because it is too large
@@ -155,15 +155,24 @@ class ProcessorMixin:
            processor = get_processor_for_type(self.modal_processors, content_type)

            if processor:
                # Prepare item info for context extraction
                item_info = {
                    "page_idx": item.get("page_idx", 0),
                    "index": i,
                    "type": content_type,
                }

                # Process content and get chunk results instead of immediately merging
                (
                    enhanced_caption,
                    entity_info,
                    chunk_results,
                ) = await processor.process_multimodal_content_batch(
                ) = await processor.process_multimodal_content(
                    modal_content=item,
                    content_type=content_type,
                    file_path=file_name,
                    item_info=item_info,  # Pass item info for context extraction
                    batch_mode=True,
                )

                # Collect chunk results for batch processing
@@ -208,6 +217,8 @@ class ProcessorMixin:
            file_path=file_name,
        )

        await self.lightrag._insert_done()

        self.logger.info("Multimodal content processing complete")

    async def process_document_complete(
@@ -253,6 +264,15 @@ class ProcessorMixin:
        # Step 2: Separate text and multimodal content
        text_content, multimodal_items = separate_content(content_list)

        # Step 2.5: Set content source for context extraction in multimodal processing
        if hasattr(self, "set_content_source_for_context") and multimodal_items:
            self.logger.info(
                "Setting content source for context-aware multimodal processing..."
            )
            self.set_content_source_for_context(
                content_list, self.config.content_format
            )

        # Step 3: Insert pure text content with all parameters
        if text_content.strip():
            file_name = os.path.basename(file_path)

@@ -56,6 +56,38 @@ Additional context:

Focus on providing accurate, detailed visual analysis that would be useful for knowledge retrieval."""

# Image analysis prompt with context support
PROMPTS[
    "vision_prompt_with_context"
] = """Please analyze this image in detail, considering the surrounding context. Provide a JSON response with the following structure:

{{
"detailed_description": "A comprehensive and detailed visual description of the image following these guidelines:
- Describe the overall composition and layout
- Identify all objects, people, text, and visual elements
- Explain relationships between elements and how they relate to the surrounding context
- Note colors, lighting, and visual style
- Describe any actions or activities shown
- Include technical details if relevant (charts, diagrams, etc.)
- Reference connections to the surrounding content when relevant
- Always use specific names instead of pronouns",
"entity_info": {{
"entity_name": "{entity_name}",
"entity_type": "image",
"summary": "concise summary of the image content, its significance, and relationship to surrounding content (max 100 words)"
}}
}}

Context from surrounding content:
{context}

Image details:
- Image Path: {image_path}
- Captions: {captions}
- Footnotes: {footnotes}

Focus on providing accurate, detailed visual analysis that incorporates the context and would be useful for knowledge retrieval."""

# Image analysis prompt with text fallback
PROMPTS["text_prompt"] = """Based on the following image information, provide analysis:

@@ -94,6 +126,39 @@ Footnotes: {table_footnote}

Focus on extracting meaningful insights and relationships from the tabular data."""

# Table analysis prompt with context support
PROMPTS[
    "table_prompt_with_context"
] = """Please analyze this table content considering the surrounding context, and provide a JSON response with the following structure:

{{
"detailed_description": "A comprehensive analysis of the table including:
- Table structure and organization
- Column headers and their meanings
- Key data points and patterns
- Statistical insights and trends
- Relationships between data elements
- Significance of the data presented in relation to surrounding context
- How the table supports or illustrates concepts from the surrounding content
Always use specific names and values instead of general references.",
"entity_info": {{
"entity_name": "{entity_name}",
"entity_type": "table",
"summary": "concise summary of the table's purpose, key findings, and relationship to surrounding content (max 100 words)"
}}
}}

Context from surrounding content:
{context}

Table Information:
Image Path: {table_img_path}
Caption: {table_caption}
Body: {table_body}
Footnotes: {table_footnote}

Focus on extracting meaningful insights and relationships from the tabular data in the context of the surrounding content."""

# Equation analysis prompt template
PROMPTS[
    "equation_prompt"
@@ -122,6 +187,38 @@ Format: {equation_format}

Focus on providing mathematical insights and explaining the equation's significance."""

# Equation analysis prompt with context support
PROMPTS[
    "equation_prompt_with_context"
] = """Please analyze this mathematical equation considering the surrounding context, and provide a JSON response with the following structure:

{{
"detailed_description": "A comprehensive analysis of the equation including:
- Mathematical meaning and interpretation
- Variables and their definitions in the context of surrounding content
- Mathematical operations and functions used
- Application domain and context based on surrounding material
- Physical or theoretical significance
- Relationship to other mathematical concepts mentioned in the context
- Practical applications or use cases
- How the equation relates to the broader discussion or framework
Always use specific mathematical terminology.",
"entity_info": {{
"entity_name": "{entity_name}",
"entity_type": "equation",
"summary": "concise summary of the equation's purpose, significance, and role in the surrounding context (max 100 words)"
}}
}}

Context from surrounding content:
{context}

Equation Information:
Equation: {equation_text}
Format: {equation_format}

Focus on providing mathematical insights and explaining the equation's significance within the broader context."""

# Generic content analysis prompt template
PROMPTS[
    "generic_prompt"
@@ -146,6 +243,34 @@ Content: {content}

Focus on extracting meaningful information that would be useful for knowledge retrieval."""

# Generic content analysis prompt with context support
PROMPTS[
    "generic_prompt_with_context"
] = """Please analyze this {content_type} content considering the surrounding context, and provide a JSON response with the following structure:

{{
"detailed_description": "A comprehensive analysis of the content including:
- Content structure and organization
- Key information and elements
- Relationships between components
- Context and significance in relation to surrounding content
- How this content connects to or supports the broader discussion
- Relevant details for knowledge retrieval
Always use specific terminology appropriate for {content_type} content.",
"entity_info": {{
"entity_name": "{entity_name}",
"entity_type": "{content_type}",
"summary": "concise summary of the content's purpose, key points, and relationship to surrounding context (max 100 words)"
}}
}}

Context from surrounding content:
{context}

Content: {content}

Focus on extracting meaningful information that would be useful for knowledge retrieval and understanding the content's role in the broader context."""

# Modal chunk templates
PROMPTS["image_chunk"] = """
Image Content Analysis:

@@ -38,6 +38,8 @@ from raganything.modalprocessors import (
    TableModalProcessor,
    EquationModalProcessor,
    GenericModalProcessor,
    ContextExtractor,
    ContextConfig,
)

@@ -67,6 +69,9 @@ class RAGAnything(QueryMixin, ProcessorMixin, BatchMixin):
    modal_processors: Dict[str, Any] = field(default_factory=dict, init=False)
    """Dictionary of multimodal processors."""

    context_extractor: Optional[ContextExtractor] = field(default=None, init=False)
    """Context extractor for providing surrounding content to modal processors."""

    def __post_init__(self):
        """Post-initialization setup following LightRAG pattern"""
        # Initialize configuration if not provided
@@ -99,6 +104,29 @@ class RAGAnything(QueryMixin, ProcessorMixin, BatchMixin):
        )
        self.logger.info(f" Max concurrent files: {self.config.max_concurrent_files}")

    def _create_context_config(self) -> ContextConfig:
        """Create context configuration from RAGAnything config"""
        return ContextConfig(
            context_window=self.config.context_window,
            context_mode=self.config.context_mode,
            max_context_tokens=self.config.max_context_tokens,
            include_headers=self.config.include_headers,
            include_captions=self.config.include_captions,
            filter_content_types=self.config.context_filter_content_types,
        )

    def _create_context_extractor(self) -> ContextExtractor:
        """Create context extractor with tokenizer from LightRAG"""
        if self.lightrag is None:
            raise ValueError(
                "LightRAG must be initialized before creating context extractor"
            )

        context_config = self._create_context_config()
        return ContextExtractor(
            config=context_config, tokenizer=self.lightrag.tokenizer
        )

    def _initialize_processors(self):
        """Initialize multimodal processors with appropriate model functions"""
        if self.lightrag is None:
@@ -106,6 +134,9 @@ class RAGAnything(QueryMixin, ProcessorMixin, BatchMixin):
                "LightRAG instance must be initialized before creating processors"
            )

        # Create context extractor
        self.context_extractor = self._create_context_extractor()

        # Create different multimodal processors based on configuration
        self.modal_processors = {}

@@ -113,25 +144,33 @@
            self.modal_processors["image"] = ImageModalProcessor(
                lightrag=self.lightrag,
                modal_caption_func=self.vision_model_func or self.llm_model_func,
                context_extractor=self.context_extractor,
            )

        if self.config.enable_table_processing:
            self.modal_processors["table"] = TableModalProcessor(
                lightrag=self.lightrag, modal_caption_func=self.llm_model_func
                lightrag=self.lightrag,
                modal_caption_func=self.llm_model_func,
                context_extractor=self.context_extractor,
            )

        if self.config.enable_equation_processing:
            self.modal_processors["equation"] = EquationModalProcessor(
                lightrag=self.lightrag, modal_caption_func=self.llm_model_func
                lightrag=self.lightrag,
                modal_caption_func=self.llm_model_func,
                context_extractor=self.context_extractor,
            )

        # Always include generic processor as fallback
        self.modal_processors["generic"] = GenericModalProcessor(
            lightrag=self.lightrag, modal_caption_func=self.llm_model_func
            lightrag=self.lightrag,
            modal_caption_func=self.llm_model_func,
            context_extractor=self.context_extractor,
        )

        self.logger.info("Multimodal processors initialized")
        self.logger.info("Multimodal processors initialized with context support")
        self.logger.info(f"Available processors: {list(self.modal_processors.keys())}")
        self.logger.info(f"Context configuration: {self._create_context_config()}")

    def update_config(self, **kwargs):
        """Update configuration with new values"""
@@ -207,6 +246,14 @@
                "enable_table_processing": self.config.enable_table_processing,
                "enable_equation_processing": self.config.enable_equation_processing,
            },
            "context_extraction": {
                "context_window": self.config.context_window,
                "context_mode": self.config.context_mode,
                "max_context_tokens": self.config.max_context_tokens,
                "include_headers": self.config.include_headers,
                "include_captions": self.config.include_captions,
                "filter_content_types": self.config.context_filter_content_types,
            },
            "batch_processing": {
                "max_concurrent_files": self.config.max_concurrent_files,
                "supported_file_extensions": self.config.supported_file_extensions,
@@ -217,6 +264,66 @@
            },
        }

    def set_content_source_for_context(
        self, content_source, content_format: str = "auto"
    ):
        """Set content source for context extraction in all modal processors

        Args:
            content_source: Source content for context extraction (e.g., MinerU content list)
            content_format: Format of content source ("minerU", "text_chunks", "auto")
        """
        if not self.modal_processors:
            self.logger.warning(
                "Modal processors not initialized. Content source will be set when processors are created."
            )
            return

        for processor_name, processor in self.modal_processors.items():
            try:
                processor.set_content_source(content_source, content_format)
                self.logger.debug(f"Set content source for {processor_name} processor")
            except Exception as e:
                self.logger.error(
                    f"Failed to set content source for {processor_name}: {e}"
                )

        self.logger.info(
            f"Content source set for context extraction (format: {content_format})"
        )

    def update_context_config(self, **context_kwargs):
        """Update context extraction configuration

        Args:
            **context_kwargs: Context configuration parameters to update
                (context_window, context_mode, max_context_tokens, etc.)
        """
        # Update the main config
        for key, value in context_kwargs.items():
            if hasattr(self.config, key):
                setattr(self.config, key, value)
                self.logger.debug(f"Updated context config: {key} = {value}")
            else:
                self.logger.warning(f"Unknown context config parameter: {key}")

        # Recreate context extractor with new config if processors are initialized
        if self.lightrag and self.modal_processors:
            try:
                self.context_extractor = self._create_context_extractor()
                # Update all processors with new context extractor
                for processor_name, processor in self.modal_processors.items():
                    processor.context_extractor = self.context_extractor

                self.logger.info(
                    "Context configuration updated and applied to all processors"
                )
                self.logger.info(
                    f"New context configuration: {self._create_context_config()}"
                )
            except Exception as e:
                self.logger.error(f"Failed to update context configuration: {e}")

    def get_processor_info(self) -> Dict[str, Any]:
        """Get processor information"""
        base_info = {