Maximizing Relevance in Qdrant Multi-Vector RAG Environments: Hybrid Search, Scoring Strategies, and Adaptive Embedding Optimization

Learn how to improve retrieval relevance with Qdrant in a multi-vector RAG (Retrieval-Augmented Generation) system. This guide presents practical methods for raising search quality through hybrid search, customized scoring strategies, and adaptive embedding optimization.

1. The Challenge / Context

The core of a RAG system is its ability to retrieve the most relevant documents for a user's query. A single embedding per document struggles to capture the document's different aspects, which has led to the adoption of multi-vector approaches. Multi-vector RAG, however, needs a deliberate strategy for relating the vectors to one another and for ranking the results they produce. With a vector database such as Qdrant, three techniques matter most: hybrid search, scoring strategies, and embedding optimization. Users expect to find what they need accurately and quickly, and irrelevant results can drive them away. This article examines how to address these challenges and improve the performance of multi-vector RAG systems built on Qdrant.

2. Deep Dive: Qdrant and Multi-Vector RAG

Qdrant is a fast and scalable vector similarity search engine. In a RAG system, an embedding model converts the user query into a vector, and Qdrant compares that vector with the document vectors stored in the database to retrieve the most similar documents. Multi-vector RAG represents a single document with multiple vectors, each capturing a different aspect of the document (e.g., topic, style, keywords). This allows richer and more accurate searches than single-vector methods. Qdrant supports several distance metrics (cosine similarity, Euclidean distance, dot product, and others), and the metric is chosen per vector when the collection is created. Hybrid search is built by running several searches, for example over different vectors or over dense and sparse representations, and merging their results. Scoring strategies then determine the final ordering, and adaptive embedding optimization can improve system performance over time.
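
To make the distance metrics and the per-aspect idea concrete, here is a minimal pure-Python sketch. It does not use Qdrant; the three-dimensional vectors and the aspect names are made-up values chosen for illustration, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy query vector and one document represented by three aspect vectors
# (hypothetical values, for illustration only).
query = [1.0, 0.0, 0.0]
doc_aspects = {
    "title": [2.0, 0.0, 0.0],     # same direction as the query, longer
    "summary": [0.0, 1.0, 0.0],   # orthogonal to the query
    "keywords": [1.0, 1.0, 0.0],  # 45 degrees from the query
}

# Score the query against every aspect and pick the closest one.
scores = {name: cosine_similarity(query, vec) for name, vec in doc_aspects.items()}
best_aspect = max(scores, key=scores.get)
```

Note that cosine similarity ignores vector length while Euclidean distance does not: the title vector has a cosine similarity of 1.0 to the query even though it lies a distance of 1.0 away from it.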

3. Step-by-Step Guide / Implementation

This section provides a step-by-step guide to maximizing relevance in a Qdrant multi-vector RAG environment.

Step 1: Data Structure Design and Vector Definition

Define how documents will be divided and how vectors will be generated for each part. For example, for a news article, you can create separate vectors for the title, body summary, and main keywords.


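from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

client = QdrantClient(":memory:")  # run Qdrant in local memory (for testing only)

# Create the collection. The in-memory instance starts empty, so a plain
# create_collection call is enough here.
client.create_collection(
    collection_name="multi_vector_rag",
    # 384 is the output dimension of all-MiniLM-L6-v2
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
)

# Initialize the embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

def create_payload(title, summary, keywords):
    """Bundle the raw text fields as metadata stored next to each vector."""
    return {
        "title": title,
        "summary": summary,
        "keywords": keywords
    }

def create_vectors(title, summary, keywords):
    """Embed each aspect separately; returns [title, summary, keywords] vectors."""
    return [
        model.encode(title).tolist(),
        model.encode(summary).tolist(),
        model.encode(keywords).tolist()
    ]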

Step 2: Data Insertion

Insert the defined vectors and metadata into Qdrant. Each vector is stored as its own point with a unique ID, and the payload (metadata) attached to a point can be used for filtering and sorting search results.


# Sample data
title = "AI Is Changing the Future"
summary = "Artificial intelligence technology is advancing rapidly and driving innovation across society."
keywords = "artificial intelligence, AI, machine learning, deep learning"

# Build the payload
payload = create_payload(title, summary, keywords)

# Build the vectors
vectors = create_vectors(title, summary, keywords)

# Insert the data into Qdrant (one point per aspect vector)
client.upsert(
    collection_name="multi_vector_rag",
    points=[
        models.PointStruct(
            id=1,
            vector=vectors[0],  # title vector
            payload=payload
        ),
        models.PointStruct(
            id=2,
            vector=vectors[1],  # summary vector
            payload=payload
        ),
        models.PointStruct(
            id=3,
            vector=vectors[2],  # keyword vector
            payload=payload
        ),
    ],
    wait=True  # block until the operation completes
)
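
The example above hard-codes the IDs 1 to 3 and gives all three points the same payload, so a search hit does not say which document or which aspect it came from. One possible way to handle this, shown as a sketch rather than the article's method, is to derive a deterministic UUID per (document, aspect) pair and tag each payload accordingly. Qdrant accepts unsigned integers and UUIDs as point IDs. The names `NAMESPACE`, `point_id`, `aspect_payload`, the `doc_id` and `aspect` payload keys, and the example.com URL are all assumptions introduced for illustration.

```python
import uuid

# Hypothetical namespace for this collection; any fixed UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/multi_vector_rag")

ASPECTS = ("title", "summary", "keywords")

def point_id(doc_id: str, aspect: str) -> str:
    """Deterministic UUID string for one (document, aspect) pair."""
    return str(uuid.uuid5(NAMESPACE, f"{doc_id}:{aspect}"))

def aspect_payload(doc_id: str, aspect: str, base_payload: dict) -> dict:
    """Copy the shared payload and tag it with its source document and aspect."""
    return {**base_payload, "doc_id": doc_id, "aspect": aspect}

# Build one record per aspect for a single (hypothetical) document.
base = {"title": "AI Is Changing the Future"}
records = [
    {"id": point_id("article-001", a), "payload": aspect_payload("article-001", a, base)}
    for a in ASPECTS
]
```

Because the IDs are derived from the document ID and aspect, re-ingesting the same document overwrites its existing points instead of creating duplicates, and the `doc_id` tag lets hits be grouped back into documents after a search.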

Step 3: Implement Hybrid Search

Convert the user query into a vector and run a similarity search in Qdrant. Because the title, summary, and keyword vectors are stored as separate points in one collection, a single search scores the query against all of them. A scoring strategy then orders the results, and per-aspect weights can give the title, summary, and keywords different importance.
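
Before the Qdrant call itself, the weighting idea can be sketched in isolation. The following is a minimal example that assumes each hit can be reduced to a `(doc_id, aspect, score)` tuple, which in turn assumes the payload records which document and aspect a point belongs to. The weight values and the function name `rank_documents` are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-aspect weights; tune them against your own relevance judgments.
ASPECT_WEIGHTS = {"title": 0.5, "summary": 0.3, "keywords": 0.2}

def rank_documents(hits, weights=ASPECT_WEIGHTS):
    """Combine per-aspect similarity scores into one weighted score per document.

    `hits` is an iterable of (doc_id, aspect, score) tuples. Aspects without
    a configured weight contribute nothing to the total.
    """
    totals = defaultdict(float)
    for doc_id, aspect, score in hits:
        totals[doc_id] += weights.get(aspect, 0.0) * score
    # Highest combined score first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up hits for two documents
hits = [
    ("doc-a", "title", 0.9),
    ("doc-a", "summary", 0.4),
    ("doc-b", "summary", 0.8),
    ("doc-b", "keywords", 0.7),
    ("doc-b", "title", 0.5),
]
ranking = rank_documents(hits)
```

In this toy data, doc-a has the single best hit (its title), but doc-b ranks first overall because it matches on all three aspects. A weighted sum is only one option; taking the maximum per document or using rank-based fusion are alternatives worth comparing.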


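# Build the query
query = "artificial intelligence technology trends"

# Embed the query
query_vector = model.encode(query).tolist()

# Hybrid search: one query over the title, summary, and keyword vectors
search_result = client.search(
    collection_name="multi_vector_rag",
    query_vector=query_vector,
    limit=5,  # return the top 5 results
    with_payload=True  # include each point's payload in the results
)

# Print the search results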
for result in search_result:
    print(f"Score: {result.score},