docs: update decision records
## The Solution: Using a Local `tiktoken` Cache

To resolve this issue and enable fully offline operation, you must provide a local cache for the `tiktoken` models. This is achieved by setting the `TIKTOKEN_CACHE_DIR` environment variable **before** the application starts.

When this environment variable is set, `tiktoken` will look for its model files in the specified local directory instead of attempting to download them from the internet.
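
The lookup can be sketched in Python. This is a minimal approximation modeled on `read_file_cached` in `tiktoken/load.py` from recent `tiktoken` releases, not the library's public API. The precedence shown (`TIKTOKEN_CACHE_DIR`, then `DATA_GYM_CACHE_DIR`, then a `data-gym-cache` folder in the system temp directory), the empty-string behavior, and the SHA-1-of-URL file naming are assumptions to check against the version you have installed. `tiktoken_cache_path` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def tiktoken_cache_path(blobpath, env=None):
    """Approximate the cache file tiktoken checks for `blobpath` (modeled on tiktoken/load.py)."""
    env = os.environ if env is None else env
    if "TIKTOKEN_CACHE_DIR" in env:
        cache_dir = env["TIKTOKEN_CACHE_DIR"]
    elif "DATA_GYM_CACHE_DIR" in env:
        cache_dir = env["DATA_GYM_CACHE_DIR"]
    else:
        # Fallback: a shared cache folder under the system temp directory.
        cache_dir = os.path.join(tempfile.gettempdir(), "data-gym-cache")
    if cache_dir == "":
        return None  # An empty value disables caching, so every load goes to the network.
    # Each cached file is named after the SHA-1 hex digest of its source URL.
    return os.path.join(cache_dir, hashlib.sha1(blobpath.encode()).hexdigest())


# Assumed public download URL for the cl100k_base encoding.
CL100K_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
print(tiktoken_cache_path(CL100K_URL, {"TIKTOKEN_CACHE_DIR": "./tiktoken_cache"}))
```

Because the directory value is joined as-is, a relative setting such as `./tiktoken_cache` is interpreted relative to the process's current working directory, so start the application from the project root or use an absolute path.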

### Steps to Implement the Solution:

1. **Create a Model Cache:** In an environment *with* internet access, run the provided script to download and cache the necessary `tiktoken` models.

```bash
# Run the cache creation script
uv run scripts/create_tiktoken_cache.py
```

This will create a `tiktoken_cache` directory in your project root containing the required model files.

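
Before relying on the cache offline (for example, after copying it to another machine), you can check that it contains the files you expect. This sketch assumes recent `tiktoken` releases, which store each downloaded file under the SHA-1 hex digest of its URL, and the public `openaipublic.blob.core.windows.net` download URLs; `ENCODING_URLS` and `missing_encodings` are hypothetical names. Which encodings you actually need depends on the tokenizer LightRAG is configured to use, so adjust the list you pass in.

```python
import hashlib
from pathlib import Path

# Assumed download locations for two common encodings (hypothetical table; extend as needed).
ENCODING_URLS = {
    "cl100k_base": "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken",
    "o200k_base": "https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken",
}


def missing_encodings(cache_dir, names):
    """Return the names whose cached file is absent or empty in cache_dir."""
    missing = []
    for name in names:
        # Assumption: recent tiktoken versions store each download under sha1(url).hexdigest().
        key = hashlib.sha1(ENCODING_URLS[name].encode()).hexdigest()
        cached = Path(cache_dir) / key
        if not cached.is_file() or cached.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_encodings("./tiktoken_cache", ["cl100k_base", "o200k_base"])
    print("cache complete" if not gaps else f"missing: {', '.join(gaps)}")
```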

2. **Configure the Environment Variable:** Add the following line to your `.env` file:

```bash
TIKTOKEN_CACHE_DIR=./tiktoken_cache
```

**Important:** Make sure the `.env` file is loaded **before** `LightRAG` imports `tiktoken`; otherwise the cache setting may not take effect. For example:

```python
import os
from typing import Dict, Any, Optional, Callable
import sys
import asyncio
import atexit
from dataclasses import dataclass, field
from pathlib import Path
from dotenv import load_dotenv

# Add project root directory to Python path
sys.path.insert(0, str(Path(__file__).parent.parent))

# Load environment variables FIRST - before any imports that use tiktoken
load_dotenv(dotenv_path=".env", override=False)

# Now import LightRAG (which will import tiktoken with the correct env var set)
from lightrag import LightRAG
from lightrag.utils import logger

# Rest of the code...
```
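
A mistyped or missing `TIKTOKEN_CACHE_DIR` otherwise surfaces only later, as a failed download. A small guard run right after `load_dotenv()` can fail fast instead. `require_tiktoken_cache` is a hypothetical helper, not part of LightRAG or `tiktoken`; it checks only that the variable is set and names a non-empty directory, not that every required encoding is present.

```python
import os
from pathlib import Path


def require_tiktoken_cache(env=None):
    """Hypothetical fail-fast check to run after load_dotenv() and before importing LightRAG."""
    env = os.environ if env is None else env
    raw = env.get("TIKTOKEN_CACHE_DIR")
    if not raw:
        # Unset: tiktoken falls back to another cache location and may download.
        # Empty string: tiktoken disables caching entirely.
        raise RuntimeError("TIKTOKEN_CACHE_DIR is unset or empty; tiktoken may try to download files")
    # Relative values such as ./tiktoken_cache resolve against the current working directory.
    cache_dir = Path(raw).resolve()
    if not cache_dir.is_dir() or not any(cache_dir.iterdir()):
        raise RuntimeError(f"tiktoken cache directory {cache_dir} is missing or empty")
    return cache_dir
```

In a startup module like the one above, call `require_tiktoken_cache()` between `load_dotenv(...)` and `from lightrag import LightRAG`.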

### Testing the Offline Setup

1. **Create a `tiktoken_cache` directory:** If you don't have one already, create a directory named `tiktoken_cache` in the project root.
2. **Populate the cache:** Run the `scripts/create_tiktoken_cache.py` script to download the necessary tiktoken models into the `tiktoken_cache` directory.
3. **Set the `TIKTOKEN_CACHE_DIR` environment variable:** Add the line `TIKTOKEN_CACHE_DIR=./tiktoken_cache` to your `.env` file.
4. **Disconnect from the internet:** Disable your internet connection or put your machine in airplane mode.
5. **Run the application:** Start the `RAG-Anything` application. For example:

```bash
uv run examples/raganything_example.py requirements.txt
```
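
Turning off the network (step 4) is the most faithful test. For a quicker check inside a single Python process, you can make outbound connections fail and then load an encoding. This is a sketch using `unittest.mock` to patch `socket.getaddrinfo` and `socket.socket.connect`; `block_network` is a hypothetical name, and it only affects Python-level sockets in the current interpreter, not subprocesses or native extensions that open their own connections.

```python
import contextlib
import socket
from unittest import mock


def _refuse(*args, **kwargs):
    raise OSError("network access blocked for offline test")


@contextlib.contextmanager
def block_network():
    """Make DNS lookups and socket connects fail inside the `with` block (current process only)."""
    with mock.patch.object(socket, "getaddrinfo", _refuse), \
            mock.patch.object(socket.socket, "connect", _refuse):
        yield


if __name__ == "__main__":
    with block_network():
        try:
            socket.create_connection(("example.com", 443), timeout=2)
        except OSError as exc:
            print(f"blocked as expected: {exc}")
```

With the cache in place, calling `tiktoken.get_encoding(...)` inside `with block_network():` should succeed; a connection error there suggests `tiktoken` is still trying to download.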

By following these steps, you can eliminate the network dependency and run the `RAG-Anything` project successfully in a fully offline environment.