fix: use DocStatus.PROCESSED enum instead of hardcoded uppercase string

## Problem

Status comparisons used the hardcoded uppercase string "PROCESSED", which
never matched the lowercase "processed" value defined by LightRAG's
DocStatus enum. As a result, text_processed always evaluated to False,
even for documents that had been processed successfully.

**Evidence:**
- LightRAG's DocStatus enum (lightrag/base.py): PROCESSED = "processed"
- RAGAnything's DocStatus enum (raganything/base.py:11): PROCESSED = "processed"
- Code before this commit checked: doc_status == "PROCESSED" (uppercase)
- Actual value from LightRAG: "processed" (lowercase) ✓

**Impact:**
- is_document_fully_processed() always returned False
- get_document_processing_status() showed text_processed as False
- The status checks in the multimodal processing logic never matched, so
  neither "PROCESSED" branch was taken
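
A minimal reproduction of the mismatch, as a sketch rather than the project's actual code. The `DocStatus` class below is a stand-in that assumes the real enum mixes in `str`; the two helper functions and the sample record are hypothetical and only mirror the comparison used in `is_document_fully_processed()`.

```python
from enum import Enum


class DocStatus(str, Enum):
    """Stand-in for the real enum (assumption: it mixes in str)."""

    PROCESSED = "processed"


def fully_processed_before(record: dict) -> bool:
    # Old check: compares against an uppercase literal that is never stored.
    text_processed = record.get("status") == "PROCESSED"
    return text_processed and record.get("multimodal_processed", False)


def fully_processed_after(record: dict) -> bool:
    # New check: compares against the enum member, whose value is lowercase.
    text_processed = record.get("status") == DocStatus.PROCESSED
    return text_processed and record.get("multimodal_processed", False)


# Hypothetical status record shaped like the dicts read in the diff.
record = {"status": "processed", "multimodal_processed": True}

print(fully_processed_before(record))  # False
print(fully_processed_after(record))   # True
```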

## Solution

Replace hardcoded string literals with DocStatus.PROCESSED enum constant
(already imported at line 14).

**Changes:**
- Line 481: doc_status == "PROCESSED" → DocStatus.PROCESSED
- Line 486: doc_status == "PROCESSED" → DocStatus.PROCESSED
- Line 1355: doc_status.get("status") == "PROCESSED" → DocStatus.PROCESSED
- Line 1387: doc_status.get("status") == "PROCESSED" → DocStatus.PROCESSED
- Updated comments (lines 463, 478) for consistency

**Benefits:**
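1. Fixes the case mismatch bug - the enum member's value is the lowercase
   "processed" that LightRAG stores (no case conversion is involved)
2. Safer - a misspelled member name fails loudly and can be flagged by an
   IDE/linter, instead of silently never matching
3. Maintainable - single source of truth (no magic strings)
4. Future-proof - if the enum value changes, the comparisons follow automatically
5. Follows Python best practices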
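
The comparison semantics behind these points can be sketched as follows. This assumes `DocStatus` is declared with a `str` mixin; the class here is a stand-in with a single member, not the library's definition, and the variable names are illustrative only.

```python
from enum import Enum


class DocStatus(str, Enum):
    """Stand-in enum (assumption: the real one also subclasses str)."""

    PROCESSED = "processed"


# A str-mixin member compares equal to its own value and nothing else.
matches_stored = DocStatus.PROCESSED == "processed"
matches_upper = DocStatus.PROCESSED == "PROCESSED"  # no case folding happens

# Looking a value up returns the same member object.
round_trip = DocStatus("processed") is DocStatus.PROCESSED

# A typo in the member name raises instead of silently evaluating to False.
try:
    DocStatus.PROCESED  # deliberately misspelled
    typo_detected = False
except AttributeError:
    typo_detected = True

print(matches_stored, matches_upper, round_trip, typo_detected)
# True False True True
```

The second comparison is the important one: the enum does not normalise case, so the fix works because the member's value already equals what is stored.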

**Compatibility:**
- Works with LightRAG v1.4.9.2+
- Compatible with LightRAG v1.4.9.3 (which added PREPROCESSED status)
- No breaking changes

**References:**
- LightRAG DocStatus: lightrag/base.py
- RAGAnything DocStatus: raganything/base.py:11
- Related: LightRAG v1.4.9.3 added PREPROCESSED = "multimodal_processed"

Author: Yasiru Rangana
Date: 2025-10-19 23:36:54 +11:00
Parent: 8079053506
Commit: e70cf8d38a

@@ -460,7 +460,7 @@ class ProcessorMixin:
             self.logger.debug("No multimodal content to process")
             return

-        # Check multimodal processing status - handle LightRAG's early "PROCESSED" marking
+        # Check multimodal processing status - handle LightRAG's early DocStatus.PROCESSED marking
         try:
             existing_doc_status = await self.lightrag.doc_status.get_by_id(doc_id)
             if existing_doc_status:
@@ -475,15 +475,15 @@ class ProcessorMixin:
                     )
                     return

-                # Even if status is "PROCESSED" (text processing done),
+                # Even if status is DocStatus.PROCESSED (text processing done),
                 # we still need to process multimodal content if not yet done
                 doc_status = existing_doc_status.get("status", "")
-                if doc_status == "PROCESSED" and not multimodal_processed:
+                if doc_status == DocStatus.PROCESSED and not multimodal_processed:
                     self.logger.info(
                         f"Document {doc_id} text processing is complete, but multimodal content still needs processing"
                     )
                     # Continue with multimodal processing
-                elif doc_status == "PROCESSED" and multimodal_processed:
+                elif doc_status == DocStatus.PROCESSED and multimodal_processed:
                     self.logger.info(
                         f"Document {doc_id} is fully processed (text + multimodal)"
                     )
@@ -1352,7 +1352,7 @@ class ProcessorMixin:
             if not doc_status:
                 return False

-            text_processed = doc_status.get("status") == "PROCESSED"
+            text_processed = doc_status.get("status") == DocStatus.PROCESSED
             multimodal_processed = doc_status.get("multimodal_processed", False)

             return text_processed and multimodal_processed
@@ -1384,7 +1384,7 @@ class ProcessorMixin:
                     "chunks_count": 0,
                 }

-            text_processed = doc_status.get("status") == "PROCESSED"
+            text_processed = doc_status.get("status") == DocStatus.PROCESSED
             multimodal_processed = doc_status.get("multimodal_processed", False)
             fully_processed = text_processed and multimodal_processed