docs: update decision records

2025-10-04 21:25:53 +07:00
parent 0ac3dc6bf7
commit 10ee99952a
1 changed files with 48 additions and 21 deletions
--- a/docs/offline_setup.md
+++ b/docs/offline_setup.md
@@ -18,34 +18,61 @@ This dependency is indirect. The `RAG-Anything` codebase itself does not directl
 ## The Solution: Using a Local `tiktoken` Cache
-To resolve this issue and enable fully offline operation, you must provide a local cache for the `tiktoken` models. This is achieved by setting the `TIKTOKEN_CACHE_DIR` environment variable.
+To resolve this issue and enable fully offline operation, you must provide a local cache for the `tiktoken` models. This is achieved by setting the `TIKTOKEN_CACHE_DIR` environment variable **before** the application starts.
 When this environment variable is set, `tiktoken` will look for its model files in the specified local directory instead of attempting to download them from the internet.
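To make the lookup concrete: based on the `tiktoken` loader source at the time of writing, each cached file appears to be named after the SHA-1 hex digest of the URL it was downloaded from, not after the encoding name. The sketch below computes where `cl100k_base` would be expected under that assumption. The URL constant and the naming scheme are implementation details of `tiktoken`, not documented API, so treat both as assumptions that may change between versions. `expected_cache_path` is a hypothetical helper, not part of `tiktoken` or `RAG-Anything`.

```python
import hashlib
import os
from pathlib import Path

# Assumption: the source URL tiktoken uses for the cl100k_base vocabulary.
CL100K_BASE_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def expected_cache_path(source_url, cache_dir=None):
    """Return the path where a cached copy of source_url is expected (hypothetical helper)."""
    if cache_dir is None:
        cache_dir = os.environ.get("TIKTOKEN_CACHE_DIR", "./tiktoken_cache")
    # Assumption: the cache file name is the SHA-1 hex digest of the URL.
    cache_key = hashlib.sha1(source_url.encode()).hexdigest()
    return Path(cache_dir) / cache_key


if __name__ == "__main__":
    path = expected_cache_path(CL100K_BASE_URL)
    print(f"{path} exists: {path.is_file()}")
```

If the printed path does not exist after the cache has been populated, the naming assumption may not hold for the installed `tiktoken` version, so inspect the cache directory directly.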
 ### Steps to Implement the Solution:
-1.  **Create a Model Cache:** In an environment *with* internet access, run a simple Python script to download and cache the necessary `tiktoken` models.
+1.  **Create a Model Cache:** In an environment *with* internet access, run the provided script to download and cache the necessary `tiktoken` models.
-    ```python
-    import tiktoken
-    import os
-    # Define the directory where you want to store the cache
-    cache_dir = "./tiktoken_cache"
-    if "TIKTOKEN_CACHE_DIR" not in os.environ:
-        os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir
-    # Create the directory if it doesn't exist
-    if not os.path.exists(cache_dir):
-        os.makedirs(cache_dir)
-    print("Downloading and caching tiktoken models...")
-    tiktoken.get_encoding("cl100k_base")
-    # tiktoken.get_encoding("p50k_base")
-    print(f"tiktoken models have been cached in '{cache_dir}'")
+    ```bash
+    # Run the cache creation script
+    uv run scripts/create_tiktoken_cache.py
     ```
-2.  **Deploy the Cache:** Copy the created `tiktoken_cache` directory to the machine where you will be running the `RAG-Anything` application.
+    This will create a `tiktoken_cache` directory in your project root containing the required model files.
 2.  **Configure the Environment Variable:** Add the following line to your `.env` file:
    ```bash
    TIKTOKEN_CACHE_DIR=./tiktoken_cache
    ```
    **Important:** Make sure the `.env` file is loaded **before** `LightRAG` imports `tiktoken`; otherwise this setting does not take effect.
    ```python
    import os
    from typing import Dict, Any, Optional, Callable
    import sys
    import asyncio
    import atexit
    from dataclasses import dataclass, field
    from pathlib import Path
    from dotenv import load_dotenv
    # Add project root directory to Python path
    sys.path.insert(0, str(Path(__file__).parent.parent))
    # Load environment variables FIRST - before any imports that use tiktoken
    load_dotenv(dotenv_path=".env", override=False)
    # Now import LightRAG (which will import tiktoken with the correct env var set)
    from lightrag import LightRAG
    from lightrag.utils import logger
    # Rest of the code...
    ```
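A quick preflight check can catch a missing or empty cache before the application starts. The following is a minimal sketch using only the standard library. `check_tiktoken_cache` is a hypothetical helper, not part of `RAG-Anything` or `LightRAG`, and it only confirms that the configured directory exists and contains at least one file, not that the files are the right ones.

```python
import os
from pathlib import Path


def check_tiktoken_cache(env=None):
    """Return a list of problems with the TIKTOKEN_CACHE_DIR setting (empty if none found)."""
    env = os.environ if env is None else env
    problems = []
    cache_dir = env.get("TIKTOKEN_CACHE_DIR")
    if not cache_dir:
        problems.append("TIKTOKEN_CACHE_DIR is not set")
        return problems
    path = Path(cache_dir)
    if not path.is_dir():
        problems.append(f"cache directory does not exist: {path}")
    elif not any(p.is_file() for p in path.iterdir()):
        problems.append(f"cache directory is empty: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_tiktoken_cache():
        print(f"warning: {problem}")
```

Calling it right after `load_dotenv(...)` and before the `lightrag` import keeps the check consistent with the ordering described above.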
 ### Testing the Offline Setup
 1.  **Create a `tiktoken_cache` directory:** If you don't have one already, create a directory named `tiktoken_cache` in the project root.
 2.  **Populate the cache:** Run the `scripts/create_tiktoken_cache.py` script to download the necessary `tiktoken` models into the `tiktoken_cache` directory.
 3.  **Set the `TIKTOKEN_CACHE_DIR` environment variable:** Add the line `TIKTOKEN_CACHE_DIR=./tiktoken_cache` to your `.env` file.
 4.  **Disconnect from the internet:** Disable your internet connection or put your machine in airplane mode.
 5.  **Run the application:** Start the `RAG-Anything` application. For example:
    ```bash
    uv run examples/raganything_example.py requirements.txt
    ```
 By following these steps, you can eliminate the network dependency and run the `RAG-Anything` project successfully in a fully offline environment.