docs: update decision records

laansdole
2025-10-04 21:25:53 +07:00
parent 0ac3dc6bf7
commit 10ee99952a

@@ -18,34 +18,61 @@ This dependency is indirect. The `RAG-Anything` codebase itself does not directl
## The Solution: Using a Local `tiktoken` Cache
To resolve this issue and enable fully offline operation, you must provide a local cache for the `tiktoken` models. This is achieved by setting the `TIKTOKEN_CACHE_DIR` environment variable **before** the application starts.
When this environment variable is set, `tiktoken` will look for its model files in the specified local directory instead of attempting to download them from the internet.
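To check that a cache directory actually holds what `tiktoken` will look for, it helps to know how cached files are named. The sketch below assumes the behaviour of recent `tiktoken` releases, where each cached file is named after the SHA-1 hex digest of the URL it was downloaded from. This is an implementation detail rather than a documented API and may change between versions. The URL shown for `cl100k_base` is likewise an assumption, and `expected_cache_path` is a hypothetical helper.

```python
import hashlib
import os

# Assumption: source URL that tiktoken uses for the cl100k_base encoding.
CL100K_BASE_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def expected_cache_path(cache_dir: str, source_url: str) -> str:
    """Return the path where a cached copy of source_url is expected (hypothetical helper)."""
    # Assumption: cached files are named after the SHA-1 hex digest of the URL.
    cache_key = hashlib.sha1(source_url.encode()).hexdigest()
    return os.path.join(cache_dir, cache_key)


if __name__ == "__main__":
    cache_dir = os.environ.get("TIKTOKEN_CACHE_DIR", "./tiktoken_cache")
    path = expected_cache_path(cache_dir, CL100K_BASE_URL)
    status = "found" if os.path.isfile(path) else "missing"
    print(f"{path}: {status}")
```

Running this after the cache has been created should report the file as found if the naming assumption holds for the installed `tiktoken` version.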
### Steps to Implement the Solution:
1. **Create a Model Cache:** In an environment *with* internet access, run the provided script to download and cache the necessary `tiktoken` models.
```bash
# Run the cache creation script
uv run scripts/create_tiktoken_cache.py
```
This will create a `tiktoken_cache` directory in your project root containing the required model files. If the application runs on a different machine, copy this directory to that machine.
The previous revision of this step inlined the download logic as a short Python script instead:
```python
import os

import tiktoken

# Directory where the cache is stored (an existing TIKTOKEN_CACHE_DIR takes precedence)
cache_dir = os.environ.setdefault("TIKTOKEN_CACHE_DIR", "./tiktoken_cache")

# Create the directory if it doesn't exist
os.makedirs(cache_dir, exist_ok=True)

print("Downloading and caching tiktoken models...")
tiktoken.get_encoding("cl100k_base")
# tiktoken.get_encoding("p50k_base")

print(f"tiktoken models have been cached in '{cache_dir}'")
```
2. **Configure the Environment Variable:** Add the following line to your `.env` file:
```bash
TIKTOKEN_CACHE_DIR=./tiktoken_cache
```
**Important:** Make sure the `.env` file is loaded **before** `LightRAG` imports `tiktoken`, so that the variable is already set when `tiktoken` looks for its cache.
```python
import os
from typing import Dict, Any, Optional, Callable
import sys
import asyncio
import atexit
from dataclasses import dataclass, field
from pathlib import Path

from dotenv import load_dotenv

# Add project root directory to Python path
sys.path.insert(0, str(Path(__file__).parent.parent))

# Load environment variables FIRST - before any imports that use tiktoken
load_dotenv(dotenv_path=".env", override=False)

# Now import LightRAG (which will import tiktoken with the correct env var set)
from lightrag import LightRAG
from lightrag.utils import logger

# Rest of the code...
```
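A missing or empty cache directory tends to surface only later as a download attempt, so a small preflight check placed between `load_dotenv` and the `lightrag` import can fail early with a clearer message. This is a minimal sketch and not part of `RAG-Anything` or `LightRAG`; the function name `check_tiktoken_cache` is hypothetical, and the check only confirms that the directory exists and contains at least one file.

```python
import os
from pathlib import Path


def check_tiktoken_cache() -> Path:
    """Return the configured cache directory, or raise if it is unusable (hypothetical helper)."""
    cache_dir = os.environ.get("TIKTOKEN_CACHE_DIR")
    if not cache_dir:
        raise RuntimeError("TIKTOKEN_CACHE_DIR is not set; was the .env file loaded first?")

    path = Path(cache_dir)
    if not path.is_dir():
        raise RuntimeError(f"TIKTOKEN_CACHE_DIR points to a missing directory: {path}")

    # An empty directory gives tiktoken nothing to load from disk.
    if not any(entry.is_file() for entry in path.iterdir()):
        raise RuntimeError(f"TIKTOKEN_CACHE_DIR contains no files: {path}")

    return path
```

Calling `check_tiktoken_cache()` immediately after `load_dotenv(...)` keeps the failure close to its cause. It does not verify that the cached files match the encodings the application needs.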
### Testing the Offline Setup
1. **Create a `tiktoken_cache` directory:** If you don't have one already, create a directory named `tiktoken_cache` in the project root.
2. **Populate the cache:** Run the `scripts/create_tiktoken_cache.py` script to download the necessary `tiktoken` models into the `tiktoken_cache` directory.
3. **Set the `TIKTOKEN_CACHE_DIR` environment variable:** Add the line `TIKTOKEN_CACHE_DIR=./tiktoken_cache` to your `.env` file.
4. **Disconnect from the internet:** Disable your internet connection or put your machine in airplane mode.
5. **Run the application:** Start the `RAG-Anything` application. For example:
```bash
uv run examples/raganything_example.py requirements.txt
```
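If disabling the network for the whole machine is inconvenient, a rough in-process approximation of step 4 is to make outgoing socket connections fail while the code under test runs. The sketch below uses only the standard library. `block_network` is a hypothetical helper: it patches `socket.socket.connect` for the current Python process only and is not a substitute for a real offline test, since subprocesses and libraries that bypass Python's `socket` module are unaffected.

```python
import socket
from contextlib import contextmanager


@contextmanager
def block_network():
    """Make every socket connect() call in this process fail (hypothetical helper)."""
    original_connect = socket.socket.connect

    def refuse(self, address):
        raise ConnectionError(f"network access blocked during offline test: {address!r}")

    socket.socket.connect = refuse
    try:
        yield
    finally:
        # Always restore the real method, even if the body raised.
        socket.socket.connect = original_connect
```

Wrapping the code that loads the tokenizer in `with block_network():` should then succeed only when the local cache is actually being used.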
By following these steps, you can eliminate the network dependency and run the `RAG-Anything` project successfully in a fully offline environment.