Welcome & Goals
Welcome! If you are reading this, you are probably navigating the vast and fascinating world of Big Data.
In recent years, the explosion of Large Language Models (LLMs), such as the models behind ChatGPT, has completely changed how we interact with information. However, LLMs have a significant limitation: they only know what they were trained on, so on their own they cannot access your private data or anything more recent than their training data.
This is where Vector Databases and the RAG (Retrieval-Augmented Generation) architecture come into play. They are the missing link that allows AI models to dynamically read, understand, and use your specific data to generate accurate answers.
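To make the idea concrete, here is a minimal sketch of the retrieve-then-generate flow in plain Python. It is only an illustration under stated assumptions: the documents, their three-dimensional "embeddings", and the helper names (`retrieve`, `build_prompt`) are all made up for this example. A real pipeline would compute embeddings with a model and store them in a vector database such as Weaviate rather than in a Python list.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical private documents paired with hand-made toy vectors.
# In practice these vectors would come from an embedding model.
DOCUMENTS = [
    ("The exam is scheduled for June 12.", [0.9, 0.1, 0.0]),
    ("Weaviate stores objects together with their vectors.", [0.1, 0.9, 0.1]),
    ("The cafeteria closes at 3 pm.", [0.0, 0.1, 0.9]),
]


def retrieve(query_vector, k=1):
    """Return the texts of the k documents most similar to the query vector."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: cosine_similarity(query_vector, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context):
    """Combine the retrieved texts and the question into one prompt string."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


question = "When is the exam?"
query_vector = [0.8, 0.2, 0.1]  # assumed embedding of the question
prompt = build_prompt(question, retrieve(query_vector))
print(prompt)
```

The prompt produced at the end is what would be sent to the LLM: the model answers from the retrieved text instead of relying only on what it memorized during training.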
The Goal of This Guide
This tutorial was created as a practical, hands-on resource for our Big Data course. The main objective is to move from theory to practice without getting bogged down in unnecessary complexity.
By the end of this guide, you will be able to:
- Understand what a Vector Database is and why traditional relational databases (SQL) aren’t enough for AI applications.
- Set up and interact with Weaviate, a leading open-source Vector Database.
- Build a foundational RAG pipeline from scratch using Python.
Prerequisites
To get the most out of this tutorial, you don’t need to be a machine learning expert. You only need:
- A basic understanding of Python.
- A terminal and a code editor (like VS Code).
- Curiosity about how AI actually searches and retrieves information under the hood.
Ready to build your first AI-powered search engine? Let’s dive in!
Repository
In the GitHub repository you can find a fully working implementation of the code shown in this guide.
Credits
This guide was written by Andrea Moschetto as an academic course project under the supervision of Professor Alfredo Pulvirenti.