Red String is an application that generates interactive knowledge graphs in real-time from provided text. It utilizes a fine-tuned, quantized LLaMA 3.1 8B model to perform entity relationship extraction, processing text through sliding semantic windows to build a comprehensive node-link visualization.
Model Weights: The fine-tuned and quantized GGUF model is publicly hosted on Hugging Face: tspec2/redstring-8gb.
- Real-Time Graph Generation: Extracts relationships from input text and dynamically constructs the graph.
- Interactive Visualization: Users can hover over nodes and edges to view specific connection details, and click and drag nodes or isolated graph components to reorganize the layout.
- Deep Context Parsing: To overcome the model's limitation of extracting a small number of relationships per prompt, the application chunks input text into 2-sentence sliding windows. While this increases processing time, it ensures fine-grained, deep contextual relationship extraction across documents of any length.
- Entity Resolution: To avoid duplicate connections to the same entity (e.g., `Jensen Huang -> founder of -> Nvidia` and `Jensen Huang -> founder of -> Nvidia Inc.`), entity names are normalized by stripping common corporate/organizational suffixes. Entity resolution also uses token subset matching to connect partial names (e.g., "Trump" to "Donald Trump"), acronym detection, and Levenshtein-distance fuzzy matching via `cmpstr` to resolve minor variations. To maintain graph accuracy, the system uses capitalization as a proxy to distinguish proper from common nouns, strict word-count limits to filter out sentence-length LLM hallucinations, and an exclusion dictionary to prevent merging generic nouns (e.g., "President", "government"). When entities are merged, the graph upgrades the node label to the most descriptive version of the term.
- Semantic Graph Deduplication: Because the LLM may extract the same factual relationship using different phrasing across different context windows (e.g., "Pope Leo" vs. "The Pope"), the application performs a global semantic deduplication pass. A background Web Worker running `@huggingface/transformers` generates dense vector embeddings for every extracted triplet using the `all-MiniLM-L6-v2` model. A cosine similarity threshold is then applied to merge semantically identical threads, producing a clean, uncluttered final graph without freezing the main UI thread.
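The entity-resolution heuristics above can be illustrated with a small sketch. It is written in Python for brevity, whereas the application performs this step in the React frontend with `cmpstr`. The suffix list, the generic-noun set, the edit-distance threshold, and all function names are illustrative assumptions, not the project's actual values. Acronym detection and the capitalization check are omitted.

```python
import re

# Illustrative values only (assumptions); the real lists live in the frontend.
CORPORATE_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}
GENERIC_NOUNS = {"president", "government"}
MAX_EDIT_DISTANCE = 2


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing corporate suffixes."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    while len(tokens) > 1 and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def same_entity(a: str, b: str) -> bool:
    """Decide whether two extracted names should share one graph node."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return False
    if na == nb:
        return True
    # Never merge through a generic noun such as "president".
    if na in GENERIC_NOUNS or nb in GENERIC_NOUNS:
        return False
    # Token subset matching: "trump" is contained in "donald trump".
    ta, tb = set(na.split()), set(nb.split())
    if ta <= tb or tb <= ta:
        return True
    # Fuzzy matching for minor spelling variations.
    return levenshtein(na, nb) <= MAX_EDIT_DISTANCE
```

With these assumed settings, `same_entity("Nvidia", "Nvidia Inc.")` and `same_entity("Trump", "Donald Trump")` both merge, while `same_entity("President", "President Biden")` does not. A fixed edit-distance threshold is too permissive for very short names, which is one reason additional guards such as the word-count and capitalization checks are useful.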
Training is not required to run the application. The project currently consists of a Python-based Colab server for model inference and a React frontend for visualization.
- Google Colab account (A standard T4 GPU instance is sufficient; higher RAM/GPU instances will yield faster inference).
- Node.js and npm installed locally.
- Open `notebooks/server.ipynb` in Google Colab.
- Connect to a runtime equipped with a GPU.
- Run all cells in the notebook.
- The final cell execution will generate a secure Cloudflare URL (e.g., `https://<random-string>.trycloudflare.com`). Copy this URL.
- Open `app/src/App.jsx` and locate the configuration section at the top of the file.
- Paste the copied Cloudflare URL into the `API_URL` variable.
- Navigate to the frontend directory: `cd app`
- Start the development server: `npm run dev`
- Open your browser to the provided localhost address. Paste your text into the input area and click "Start Investigation" to begin generating the graph.
- Base Model: Meta LLaMA 3.1 8B.
- PEFT / LoRA Configuration: The model was fine-tuned using Unsloth with a LoRA rank of 32 and an alpha of 16. An alpha of 16 was chosen to prevent overfitting to the training data and to enable more robust relation identification. The adapters were applied to all linear projection modules (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`). This comprehensive targeting was necessary because the model was trained on a dual objective: to execute accurate semantic relationship extraction while strictly adhering to a JSON output structure.
- Optimization: Training used the `adamw_8bit` optimizer, which provided stable convergence without the need for extensive hyperparameter sweeps.
- Quantization Strategy: The final model is serialized in the GGUF format using 8-bit (Q8_0) quantization. During development, 4-bit quantization (Q4_K_M) was evaluated and yielded ~50% faster inference times. However, the 4-bit degradation severely affected the model's ability to output valid JSON (requiring manual prompt pre-filling) and led to highly inaccurate relation extractions. Q8_0 was selected as the optimal deployment format, as the speed trade-off is negligible on GPU hardware while fully preserving output fidelity.
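The hyperparameters above can be collected into a small configuration sketch. The key names (`r`, `lora_alpha`, `target_modules`, `optim`) mirror the argument names commonly used by PEFT and the Transformers trainer. How `train_utils/config.py` actually stores them is not shown in this README, so the layout below is an assumption.

```python
# Hyperparameters stated in this README, gathered as plain Python data.
# The dictionary layout is illustrative, not the project's actual config.py.
LORA_CONFIG = {
    "r": 32,            # LoRA rank
    "lora_alpha": 16,   # LoRA alpha
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}
TRAINING_CONFIG = {"optim": "adamw_8bit"}

# In standard LoRA, the low-rank update is scaled by lora_alpha / r.
scaling = LORA_CONFIG["lora_alpha"] / LORA_CONFIG["r"]
print(f"LoRA scaling factor: {scaling}")  # prints "LoRA scaling factor: 0.5"
```

With rank 32 and alpha 16, the adapter update is scaled by 0.5, which damps the adapter's contribution compared with setting alpha equal to the rank.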
The model was fine-tuned using a refined subset of the REBEL dataset for relation extraction.
To optimize the data for LLM instruction-tuning, the following preprocessing steps were applied:
- Parquet Integration: Loaded from an auto-converted Parquet branch to bypass broken dataset loading scripts.
- Tag Parsing & Formatting: Native XML-style tags (`<triplet>`, `<subj>`, `<obj>`) were parsed and converted into a structured JSON dictionary format.
- Heuristic Filtering: Entries with malformed text or more than 5 distinct relationships were filtered out. The dataset was capped at 20,000 examples before filtering, which left approximately 16,000 examples.
- Instruction Tuning: The processed JSON triplets and context windows were mapped to a standard Alpaca instruction-tuning prompt format.
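The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the code in `train_utils/data_utils.py`. The function names and the instruction wording are hypothetical. Treating an example with zero parsed triplets as "malformed" is an assumption, as is handling several `<subj>`/`<obj>` pairs after a single `<triplet>` tag. The prompt template is the standard Alpaca format with an input field.

```python
import json
from typing import Optional

MAX_TRIPLETS = 5  # filter threshold stated above

# Hypothetical instruction wording; the project's actual prompt is not shown.
INSTRUCTION = "Extract the relationships from the text as JSON triplets."

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)


def parse_triplets(linearized: str) -> list[dict]:
    """Convert '<triplet> head <subj> tail <obj> relation' into dicts."""
    triplets = []
    for segment in linearized.split("<triplet>"):
        parts = segment.split("<subj>")
        head = parts[0].strip()
        if not head:
            continue
        for pair in parts[1:]:
            if "<obj>" not in pair:
                continue  # malformed pair, skip it
            tail, relation = (s.strip() for s in pair.split("<obj>", 1))
            if tail and relation:
                triplets.append({"head": head, "type": relation, "tail": tail})
    return triplets


def to_alpaca(context: str, linearized: str) -> Optional[str]:
    """Return an Alpaca-style training prompt, or None if filtered out."""
    triplets = parse_triplets(linearized)
    if not 0 < len(triplets) <= MAX_TRIPLETS:
        return None
    return ALPACA_TEMPLATE.format(
        instruction=INSTRUCTION, input=context, output=json.dumps(triplets)
    )
```

Applied to the target string `<triplet> Manuel A. Roxas <subj> Philippine president <obj> position held`, `parse_triplets` returns a single dictionary with head "Manuel A. Roxas", type "position held", and tail "Philippine president".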
Context
Philippine president Manuel A . Roxas is currently featured on the front side of the bill , while the Mayon Volcano and the whale shark ( locally known as "butanding" ) are featured on the reverse side .
Unformatted Target
<triplet> Manuel A. Roxas <subj> Philippine president <obj> position held
JSON Formatted Target
{
"head": "Manuel A. Roxas",
"type": "position held",
"tail": "Philippine president"
}

The model was evaluated against an isolated 500-sample slice of the REBEL test split. Ground truth examples were filtered to contain no more than 5 triplets to match the constraints of the instruction-tuning phase. The extraction quality was measured across three distinct tiers of string and semantic alignment to account for valid variations in generation:
- Exact Match: Requires a perfect, case-insensitive string match for the head, relation type, and tail simultaneously.
- Partial Match: Allows a match if the generated relation string overlaps with the ground truth, and the predicted entities are valid substrings of the target entities (resolving false penalties for generations like "Eiffel Tower" vs. "The Eiffel Tower").
- Semantic Match: Converts triples to dense vector embeddings using `all-MiniLM-L6-v2` and calculates cosine similarity. Triples with a similarity score above the 0.80 (80%) threshold are matched. This captures factually similar relationships articulated with different vocabulary.
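The three tiers can be sketched as simple predicates. This is an illustration under stated assumptions, not the project's evaluation code. "Relation overlap" is interpreted here as sharing at least one token. Whitespace normalization is added for convenience. The semantic tier takes precomputed embedding vectors, so the `all-MiniLM-L6-v2` encoding step is left to the caller.

```python
import math

SEMANTIC_THRESHOLD = 0.80  # threshold stated above


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace (normalization is an assumption)."""
    return " ".join(text.lower().split())


def exact_match(pred: dict, gold: dict) -> bool:
    """Head, relation type, and tail must all match, ignoring case."""
    return all(_norm(pred[k]) == _norm(gold[k]) for k in ("head", "type", "tail"))


def partial_match(pred: dict, gold: dict) -> bool:
    """Relations share a token and predicted entities are substrings of gold."""
    pred_rel = set(_norm(pred["type"]).split())
    gold_rel = set(_norm(gold["type"]).split())
    entities_ok = all(
        bool(_norm(pred[k])) and _norm(pred[k]) in _norm(gold[k])
        for k in ("head", "tail")
    )
    return bool(pred_rel & gold_rel) and entities_ok


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_match(pred_vec: list, gold_vec: list) -> bool:
    """Match when the embedding similarity exceeds the threshold."""
    return cosine_similarity(pred_vec, gold_vec) > SEMANTIC_THRESHOLD
```

For example, a prediction with head "Eiffel Tower" fails `exact_match` against a gold head of "The Eiffel Tower" but passes `partial_match`, provided the relation strings share a token and the tail is also a substring of the gold tail.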
Results:
| Evaluation Tier | Precision | Recall | F1-Score |
|---|---|---|---|
| Exact Match | 50.61 | 46.43 | 48.43 |
| Partial Match | 55.39 | 50.81 | 53.00 |
| Semantic Match | 60.57 | 55.57 | 57.96 |
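As a consistency check, the F1 column can be recomputed from the precision and recall columns using the harmonic mean, F1 = 2PR / (P + R). The snippet below is only a verification aid; the variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results table.
results = {
    "Exact Match": (50.61, 46.43),
    "Partial Match": (55.39, 50.81),
    "Semantic Match": (60.57, 55.57),
}

for tier, (p, r) in results.items():
    print(f"{tier}: F1 = {f1_score(p, r):.2f}")
# Exact Match: F1 = 48.43
# Partial Match: F1 = 53.00
# Semantic Match: F1 = 57.96
```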
To quantify failure modes regarding fabricated information, the model's generated entities were checked against the source text.
- Hallucination Rate: 6.21%
This indicates that approximately 94% of all generated entities are exact lexical substrings of the original input context.
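A check of this kind can be sketched in a few lines. The function name and data layout are hypothetical, and the case-insensitive comparison is an assumption. The metric counts a generated head or tail as hallucinated when it does not appear as a substring of the source context.

```python
def hallucination_rate(examples) -> float:
    """Fraction of generated entities that are not substrings of the context.

    `examples` is an iterable of (context, triplets) pairs, where each
    triplet is a dict with "head" and "tail" keys.
    """
    total = 0
    hallucinated = 0
    for context, triplets in examples:
        haystack = context.lower()
        for triplet in triplets:
            for key in ("head", "tail"):
                total += 1
                if triplet[key].lower() not in haystack:
                    hallucinated += 1
    return hallucinated / total if total else 0.0


examples = [
    (
        "Jensen Huang founded Nvidia.",
        [
            {"head": "Jensen Huang", "type": "founder of", "tail": "Nvidia"},
            {"head": "Jensen Huang", "type": "CEO of", "tail": "Intel"},
        ],
    )
]
print(hallucination_rate(examples))  # 0.25 (1 of 4 entities is not in the text)
```

A strict substring test is sensitive to tokenization. In the REBEL example shown earlier, the context spells the name "Manuel A . Roxas" while the target is "Manuel A. Roxas", which a check like this one would flag. Whether the actual evaluation normalizes spacing is not stated here.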
The pipeline's extraction speed was benchmarked across various document lengths using the quantized Q8_0 model hosted via Cloudflare Tunnels. The sliding window approach ensures deep contextual extraction with the following performance observed on a standard GPU instance:
- 467 words: 22 seconds (~21.2 words/sec)
- 599 words: 45 seconds (~13.3 words/sec)
- 716 words: 61 seconds (~11.7 words/sec)
- 1216 words: 81 seconds (~15.0 words/sec)
Average Processing Speed: ~14.3 words per second (total words divided by total processing time across the four runs).
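The per-document rates and the overall figure can be reproduced from the measurements listed above. The ~14.3 words/sec value corresponds to total words divided by total seconds. The unweighted mean of the four per-document rates is slightly higher, as the snippet shows.

```python
# (word count, seconds) pairs copied from the benchmark list.
runs = [(467, 22), (599, 45), (716, 61), (1216, 81)]

for words, seconds in runs:
    print(f"{words} words: {words / seconds:.1f} words/sec")

total_words = sum(words for words, _ in runs)
total_seconds = sum(seconds for _, seconds in runs)
aggregate = total_words / total_seconds
mean_of_rates = sum(words / seconds for words, seconds in runs) / len(runs)

print(f"Aggregate: {aggregate:.1f} words/sec")          # 14.3
print(f"Mean of rates: {mean_of_rates:.1f} words/sec")  # 15.3
```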
- Hosting the API: Initial deployment attempts using ngrok resulted in frequent pipeline errors. The server architecture was migrated to Cloudflare Tunnels, which provided a more stable and consistent hosting environment, albeit with a slight speed reduction.
- Contextual Limits: The fine-tuned model exhibited a ceiling of extracting ~5 relations per prompt, mirroring the distribution of the training data. This was resolved on the frontend by implementing a sliding-window text parser, which trades off overall processing speed for extraction depth.
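The sliding-window parser can be sketched as follows. The real implementation lives in the frontend, so this Python version is only an illustration. The regex-based sentence splitter, the stride of one sentence (overlapping windows), and the function names are assumptions.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def sliding_windows(text: str, size: int = 2, stride: int = 1) -> list[str]:
    """Group sentences into windows of `size`, advancing by `stride`."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    if len(sentences) <= size:
        return [" ".join(sentences)]
    last_start = len(sentences) - size
    starts = list(range(0, last_start + 1, stride))
    # Make sure the final sentences are always covered by a window.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [" ".join(sentences[i:i + size]) for i in starts]


text = "Alice founded Acme. Bob joined Acme in 2020. Carol later replaced Bob."
for window in sliding_windows(text):
    print(window)
# Alice founded Acme. Bob joined Acme in 2020.
# Bob joined Acme in 2020. Carol later replaced Bob.
```

With a stride of one, a document of n sentences produces n - 1 windows, and each window is a separate model call. That is why extraction depth is gained at the cost of overall processing time.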
The fine-tuned 8-bit quantized model proved effective for the entity relationship extraction task. During inference, the model maintained a low hallucination rate (6.21% of generated entities) and achieved near 100% reliability in generating correctly formatted JSON outputs. On the semantic match tier of the quantitative evaluation, the model achieved an F1 score of 57.96.
Future iterations of this project will focus on optimizing the underlying language model and the fine-tuning pipeline. Planned experiments include:
- Model Scaling: Testing the extraction capabilities using a smaller base model to potentially reduce overhead.
- Dataset Expansion: Training on a larger subset of the available data to improve the model's generalization capabilities.
- Hyperparameter Optimization: Experimenting with alternative LoRA (Low-Rank Adaptation) configurations to identify more efficient training regimes.
The primary limitation on implementing these future improvements is the high computational cost of model training.
red-string/
├── app/ # React/Vite frontend application
├── notebooks/
│ ├── server.ipynb # Inference server initialization and hosting
│ └── train.ipynb # Model fine-tuning notebook
├── train_utils/ # Utilities for model training
│ ├── config.py
│ ├── data_utils.py
│ └── model_utils.py
└── requirements.txt # Python dependencies
The Python environment requires the following packages, detailed in requirements.txt:
- `unsloth`
- `torch`
- `transformers`
- `datasets`
- `llama-cpp-python`
- `openai`
- `trl`
- `tqdm`
- `sentence-transformers`
The React/Vite frontend relies on the following core libraries for visualization and resolution:
- `react-force-graph-2d`
- `lucide-react`
- `d3-force`
- `cmpstr`
