From 10ee99952a6075add7133bba927d570f20ec5b9d Mon Sep 17 00:00:00 2001
From: laansdole
Date: Sat, 4 Oct 2025 21:25:53 +0700
Subject: [PATCH] docs: update decision records

---
 docs/offline_setup.md | 69 ++++++++++++++++++++++++++++++-------------
 1 file changed, 48 insertions(+), 21 deletions(-)

diff --git a/docs/offline_setup.md b/docs/offline_setup.md
index 8331010..2e5c16f 100644
--- a/docs/offline_setup.md
+++ b/docs/offline_setup.md
@@ -18,34 +18,61 @@ This dependency is indirect. The `RAG-Anything` codebase itself does not directl
 
 ## The Solution: Using a Local `tiktoken` Cache
 
-To resolve this issue and enable fully offline operation, you must provide a local cache for the `tiktoken` models. This is achieved by setting the `TIKTOKEN_CACHE_DIR` environment variable.
+To resolve this issue and enable fully offline operation, you must provide a local cache for the `tiktoken` models. This is achieved by setting the `TIKTOKEN_CACHE_DIR` environment variable **before** the application starts.
 
 When this environment variable is set, `tiktoken` will look for its model files in the specified local directory instead of attempting to download them from the internet.
 
 ### Steps to Implement the Solution:
 
-1. **Create a Model Cache:** In an environment *with* internet access, run a simple Python script to download and cache the necessary `tiktoken` models.
+1. **Create a Model Cache:** In an environment *with* internet access, run the provided script to download and cache the necessary `tiktoken` models.
 
-    ```python
-    import tiktoken
-    import os
-
-    # Define the directory where you want to store the cache
-    cache_dir = "./tiktoken_cache"
-    if "TIKTOKEN_CACHE_DIR" not in os.environ:
-        os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir
-
-    # Create the directory if it doesn't exist
-    if not os.path.exists(cache_dir):
-        os.makedirs(cache_dir)
-
-    print("Downloading and caching tiktoken models...")
-    tiktoken.get_encoding("cl100k_base")
-    # tiktoken.get_encoding("p50k_base")
-
-    print(f"tiktoken models have been cached in '{cache_dir}'")
+    ```bash
+    # Run the cache creation script
+    uv run scripts/create_tiktoken_cache.py
     ```
 
-2. **Deploy the Cache:** Copy the created `tiktoken_cache` directory to the machine where you will be running the `RAG-Anything` application.
+    This will create a `tiktoken_cache` directory in your project root containing the required model files.
+
+2. **Configure the Environment Variable:** Add the following line to your `.env` file:
+
+    ```bash
+    TIKTOKEN_CACHE_DIR=./tiktoken_cache
+    ```
+
+    **Important:** Make sure the `.env` file is loaded **before** `LightRAG` imports `tiktoken`; otherwise this setting will not take effect.
+
+    ```python
+    import os
+    from typing import Dict, Any, Optional, Callable
+    import sys
+    import asyncio
+    import atexit
+    from dataclasses import dataclass, field
+    from pathlib import Path
+    from dotenv import load_dotenv
+
+    # Add project root directory to Python path
+    sys.path.insert(0, str(Path(__file__).parent.parent))
+
+    # Load environment variables FIRST - before any imports that use tiktoken
+    load_dotenv(dotenv_path=".env", override=False)
+
+    # Now import LightRAG (which will import tiktoken with the correct env var set)
+    from lightrag import LightRAG
+    from lightrag.utils import logger
+
+    # Rest of the code...
+    ```
+
+### Testing the Offline Setup
+
+1. **Create a `tiktoken_cache` directory:** If you don't have one already, create a directory named `tiktoken_cache` in the project root.
+2. **Populate the cache:** Run the `scripts/create_tiktoken_cache.py` script to download the necessary `tiktoken` models into the `tiktoken_cache` directory.
+3. **Set the `TIKTOKEN_CACHE_DIR` environment variable:** Add the line `TIKTOKEN_CACHE_DIR=./tiktoken_cache` to your `.env` file.
+4. **Disconnect from the internet:** Disable your internet connection or put your machine in airplane mode.
+5. **Run the application:** Start the `RAG-Anything` application. For example:
+    ```bash
+    uv run examples/raganything_example.py requirements.txt
+    ```
 
 By following these steps, you can eliminate the network dependency and run the `RAG-Anything` project successfully in a fully offline environment.
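Before disconnecting (step 4 of the offline test), it helps to confirm that the cache really contains the files `tiktoken` will look for. The following is a hypothetical, stdlib-only sketch. It is not part of `RAG-Anything`, `LightRAG`, or `tiktoken`, and the helper names `resolve_cache_dir`, `expected_cache_file`, and `missing_encodings` are illustrative. It relies on assumptions about current `tiktoken` internals, which may change between versions. It assumes that `tiktoken` looks up its cache directory from `TIKTOKEN_CACHE_DIR`, then `DATA_GYM_CACHE_DIR`, then a `data-gym-cache` folder in the system temp directory. It also assumes that an empty value disables caching and that each cached file is named by the SHA-1 hex digest of its download URL.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable, List, Mapping, Optional

# Assumption: tiktoken downloads these encodings from this URL pattern
# (cl100k_base, o200k_base, p50k_base, r50k_base at the time of writing).
BLOB_URL = "https://openaipublic.blob.core.windows.net/encodings/{name}.tiktoken"


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the cache directory tiktoken is assumed to use, or None if disabled.

    Assumed lookup order: TIKTOKEN_CACHE_DIR, then DATA_GYM_CACHE_DIR,
    then <system temp dir>/data-gym-cache.
    """
    env = os.environ if env is None else env
    if "TIKTOKEN_CACHE_DIR" in env:
        value = env["TIKTOKEN_CACHE_DIR"]
    elif "DATA_GYM_CACHE_DIR" in env:
        value = env["DATA_GYM_CACHE_DIR"]
    else:
        return Path(tempfile.gettempdir()) / "data-gym-cache"
    # An empty value is assumed to disable tiktoken's file cache entirely.
    return Path(value) if value else None


def expected_cache_file(cache_dir: Path, encoding_name: str) -> Path:
    """Cached files are assumed to be named by the SHA-1 of the download URL."""
    url = BLOB_URL.format(name=encoding_name)
    return Path(cache_dir) / hashlib.sha1(url.encode()).hexdigest()


def missing_encodings(cache_dir: Path, encoding_names: Iterable[str]) -> List[str]:
    """Return the encodings whose expected cache file is not present."""
    return [
        name
        for name in encoding_names
        if not expected_cache_file(Path(cache_dir), name).is_file()
    ]


if __name__ == "__main__":
    cache_dir = resolve_cache_dir()
    if cache_dir is None:
        print("The cache directory variable is empty, so tiktoken caching is disabled.")
    else:
        missing = missing_encodings(cache_dir, ["cl100k_base"])
        if missing:
            print(f"Missing from {cache_dir.resolve()}: {', '.join(missing)}")
        else:
            print(f"All requested encodings are cached in {cache_dir.resolve()}")
```

Because `./tiktoken_cache` is a relative path, it is resolved against the working directory of the running process. Run the check, and the application, from the project root, with `TIKTOKEN_CACHE_DIR` exported in the shell (this sketch does not read `.env` itself).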