Generative AI: Unlocking Knowledge with LlamaIndex and Hugging Face in an On-Premise Setup

Rajat Nigam
4 min read · Dec 16, 2023


As a General Knowledge (GK) teacher, you understand the importance of engaging students with interactive and dynamic learning experiences, and technology can enhance that process. In this blog post, we’ll explore the implementation of a locally hosted Large Language Model (LLM) using LlamaIndex and Hugging Face to create a chatbot-like application capable of handling multiple queries and answers, mimicking a conversational experience with students.

Credit: CleoFilter

Problem Statement

As a GK teacher, you aim to develop an application to answer questions from the 9th-grade GK books in PDF format. However, instead of a traditional question-answer format, the goal is to create a chatbot-like experience that can seamlessly handle follow-up questions and clarifications.

Business Requirements

Given budget constraints, the school authorities prefer to provision the LLM and vector index in-house. They want to avoid SaaS tools such as OpenAI or the Pinecone vector store in favour of a one-time infrastructure investment. The LLM should be hosted on-premises and support vector indexing. The initial focus is the GK subject, but scalability is crucial for future expansion into other subjects like Mathematics, Chemistry, and Physics.

Technology Drivers

LlamaIndex

LlamaIndex comes to the rescue by connecting to data sources and augmenting LLMs with Retrieval-Augmented Generation (RAG). This enables querying data, transforming it, and generating new insights. LlamaIndex facilitates asking questions about data, building chatbots, and creating semi-autonomous agents.
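The RAG loop that LlamaIndex automates can be sketched in plain Python: chunk the source text, embed each chunk, retrieve the chunks most similar to the question, and prepend them to the LLM prompt. The bag-of-words "embedding" below is a toy stand-in for a real embedding model, and the chunk texts are made up for illustration:

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    cleaned = text.lower().replace("?", " ").replace(".", " ").replace(",", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, question, top_k=1):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:top_k]

chunks = [
    "The capital of France is Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
]

# Retrieval step: pick the most relevant chunk for the question.
context = retrieve(chunks, "Which city is the capital of France?")[0]

# Augmentation step: the retrieved chunk is prepended to the LLM prompt.
prompt = (
    "Answer using only this context:\n"
    f"{context}\n\n"
    "Question: Which city is the capital of France?"
)
```

LlamaIndex performs the same chunk/embed/retrieve/augment cycle, just with real embedding models and persistent vector indexes.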

Hugging Face

Hugging Face, an open-source machine learning and data science platform, provides a hub for AI experts and enthusiasts. It aids in building, deploying, and training machine learning models, and the ability to create and share custom AI models is one of its key features.

Torch and Python v3.11

The Torch package, with its multi-dimensional tensors and mathematical operations, is what loads and runs the LLM. Python v3.11, a recent major release with numerous new features and optimizations, is the programming language of choice.

Poetry

Poetry, a tool for dependency management and packaging in Python, simplifies library declaration and management, ensuring a smooth workflow.
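A minimal pyproject.toml for this project might look like the following sketch. The project name, the `gk_qna.main:run` entry point, and the loose version pins are all assumptions; the script entry is what makes `poetry run llm` (used at the end of this post) work:

```toml
[tool.poetry]
name = "gk-qna"                      # hypothetical project name
version = "0.1.0"
description = "On-premise QnA over 9th-class GK PDFs"
authors = ["Rajat Nigam"]

[tool.poetry.dependencies]
python = "^3.11"
llama-index = "*"
torch = "*"
transformers = "*"
pypdf = "*"                          # PDF text extraction used by LlamaIndex

[tool.poetry.scripts]
llm = "gk_qna.main:run"              # hypothetical module; enables `poetry run llm`

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```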

Low-Level Requirements

LLM Model: Writer/camel-5b-hf

The Camel-5b model from Writer, trained on a dataset of roughly 70,000 instruction-response records curated by the Writer Linguist team, excels at understanding and executing language-based instructions.

BAAI/bge-small-en-v1.5

The FlagEmbedding model, part of BAAI general embedding, maps text to a low-dimensional dense vector. This vector proves valuable for tasks like retrieval, classification, clustering, and semantic search.
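These dense vectors are typically used L2-normalized, in which case ranking by dot product is equivalent to ranking by cosine similarity. The two-dimensional vectors below are made-up stand-ins for the model's real output (which has a few hundred dimensions), just to illustrate the ranking idea:

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q  = normalize([3.0, 4.0])    # toy "query" embedding
d1 = normalize([4.0, 3.0])    # chunk pointing in a similar direction
d2 = normalize([-4.0, 3.0])   # chunk pointing in a dissimilar direction

# For unit vectors, dot product equals cosine similarity,
# so d1 ranks above d2 for this query.
assert dot(q, d1) > dot(q, d2)
```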

Implementation Steps

Step 1: Download the LLM Model from the Hugging Face AI Registry

Start the journey by downloading the LLM model from the Hugging Face AI registry. The weights are fetched once and cached locally, so subsequent runs work fully on-premises.
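A sketch of this step using the llama_index 0.9-era `HuggingFaceLLM` wrapper is shown below. On first run it downloads and caches the Writer/camel-5b-hf weights from the Hugging Face hub; the prompt template and generation parameters here are illustrative assumptions, not the article's exact values, and running it requires a machine with enough memory for a 5B-parameter model:

```python
import torch
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate

# Camel-5b expects an instruction-style prompt.
query_wrapper_prompt = PromptTemplate(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)

llm = HuggingFaceLLM(
    model_name="Writer/camel-5b-hf",      # fetched from the HF registry on first use
    tokenizer_name="Writer/camel-5b-hf",
    query_wrapper_prompt=query_wrapper_prompt,
    context_window=2048,                  # illustrative values
    max_new_tokens=256,
    model_kwargs={"torch_dtype": torch.float16},
    device_map="auto",
)
```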

Step 2: Index the 9th-Class GK PDF Files

Place the 9th-class GK book in PDF format under the ./data location. LlamaIndex uses a PDF reader package to extract text from the .pdf files, splits the text into chunks, and converts each chunk to a vector with the embedding model; this is the retrieval side of Retrieval-Augmented Generation (RAG). Note that the LLM itself is not retrained: Torch simply loads the pretrained weights (the .bin files downloaded in step 1), while the StorageContext acts as a bridge between the index and local storage.
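In the 0.9-era llama_index API, this step might look like the sketch below. The directory paths are assumptions, and in a full setup the ServiceContext built in step 3 would be passed to `from_documents` so that the local embedding model is used instead of the default:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Parse the PDF(s) under ./data into Document objects
# (a PDF reader package is used internally for .pdf files).
documents = SimpleDirectoryReader("./data").load_data()

# Build the vector index: each text chunk is embedded and stored.
index = VectorStoreIndex.from_documents(documents)

# Persist the index to local disk so it can be reloaded without re-embedding.
index.storage_context.persist(persist_dir="./storage")
```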

Step 3: Create ServiceContext Container

Create a ServiceContext container, a utility container for LlamaIndex index and query classes. This container includes objects commonly used for configuring every index and query, such as the LLM, the PromptHelper (for configuring input size/chunk size), the BaseEmbedding (for configuring the embedding model), and more.

Initialize the LlamaIndex query engine, a generic interface allowing you to ask questions about your data. The query engine takes in a natural language query and returns a rich response, often built on one or many indexes via retrievers. You can compose multiple query engines for advanced capabilities.
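These two pieces, the ServiceContext and the query engine, might be wired together as in the sketch below (0.9-era API). The chunk size and the sample question are illustrative, and `llm` refers to the HuggingFaceLLM instance from step 1:

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding

# Local embedding model: no SaaS dependency.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# llm is the HuggingFaceLLM instance created in step 1.
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=512,   # illustrative chunk size
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# The query engine retrieves relevant chunks and asks the LLM to answer.
query_engine = index.as_query_engine()
response = query_engine.query("Who wrote the national anthem of India?")
print(response)
```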

Execute QnA

Finally, execute the QnA system using the following commands:

```shell
poetry install
poetry run llm
```

Conclusion

In conclusion, this blog post outlines the steps to implement a powerful LLM using LlamaIndex and Hugging Face in an on-premise setup. This solution not only caters to the immediate requirement of answering questions from 9th-grade GK books but also lays the foundation for scalability in other subjects in the future. By combining the strengths of LlamaIndex and Hugging Face, you’re not just creating an educational tool but also unlocking a world of possibilities for interactive and engaging learning experiences.

Written by Rajat Nigam

I'm a lifetime student of software engineering. Professionally I work in the capacity of Individual Contributor, Trainer, Lead Engineer and an Architect.
