Most production AI systems in 2026 — anything with retrieval, semantic search, recommendations, or memory — are built on top of two things: embeddings and vector databases. Understanding them is non-negotiable for anyone building AI applications. Yet the foundational concepts get glossed over in most tutorials.
This is the working practitioner's view: what these things actually are, how to choose between options, and the implementation choices that actually matter.
1. What an embedding actually is
An embedding is a list of numbers — typically 384, 768, or 1,536 of them — that represents a piece of text in a way that captures its meaning. Texts with similar meanings produce embeddings that are close together in the high-dimensional space those numbers describe.
Two crucial properties:
- Semantic similarity becomes geometric distance. "A cat sat on the mat" and "The feline was on the rug" produce embeddings that are close together — even though they share almost no words.
- The mapping is consistent within a model. Two embeddings produced by the same model can be compared. Embeddings from different models cannot be compared meaningfully.
Embeddings are produced by an embedding model. The popular options in 2026 include OpenAI's text-embedding-3 family, Cohere's embed-v4, Voyage AI's models (the provider Anthropic recommends; Anthropic does not ship its own embedding model), and a long tail of open-source models (Nomic, BGE, E5, multilingual variants). For most production systems, pick one and stick with it — switching embedding models requires re-embedding all your data.
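To make the geometry concrete, here is a minimal sketch using the open-source sentence-transformers library with a BGE model (one of the open-source families above); the model name and texts are illustrative, and any model the library supports works the same way.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# BGE is one of the open-source families mentioned above.
model = SentenceTransformer("BAAI/bge-base-en-v1.5")  # 768-dimensional embeddings

texts = [
    "A cat sat on the mat",
    "The feline was on the rug",
    "Quarterly revenue grew 4%",
]
# normalize_embeddings=True returns unit vectors, so a dot product of two
# embeddings is exactly their cosine similarity.
embeddings = model.encode(texts, normalize_embeddings=True)

similarities = embeddings @ embeddings.T
print(similarities[0, 1])  # cat vs feline: high, despite sharing almost no words
print(similarities[0, 2])  # cat vs revenue: low, unrelated meaning
```

Note that all three embeddings come from the same model, per the consistency property above; comparing them against vectors from a different model would be meaningless.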
2. How vector databases work
The embedding & retrieval lifecycle
A vector database is a specialised store for embeddings. Its core operation is nearest-neighbour search: given a query embedding, return the K stored embeddings closest to it. This is the operation behind every semantic search, every "find similar documents," and every RAG retrieval step.
The naive implementation — compute the distance to every stored embedding — is O(n) per query, and becomes the bottleneck somewhere in the hundreds of thousands of vectors, depending on your latency budget. Vector databases use approximate nearest-neighbour (ANN) algorithms — HNSW, IVF, ScaNN — that trade a small amount of recall for orders-of-magnitude faster query time. In practice the approximation is invisible: you query and get results in milliseconds.
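A sketch of both approaches, with random unit vectors standing in for real embeddings: exact brute-force search in NumPy next to the same query against an HNSW index built with the hnswlib library.

```python
import numpy as np
import hnswlib  # pip install hnswlib

dim, n = 768, 100_000
rng = np.random.default_rng(0)
vectors = rng.standard_normal((n, dim)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-length vectors
query = vectors[42]  # stand-in for a query embedding

# Naive exact search: one dot product per stored vector, O(n) per query.
scores = vectors @ query
exact_top5 = np.argsort(-scores)[:5]

# Approximate search: build an HNSW graph once, then query it.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(vectors, np.arange(n))
index.set_ef(50)  # higher ef: better recall, slower queries
labels, distances = index.knn_query(query, k=5)

print(exact_top5)  # ground truth
print(labels[0])   # usually identical; the occasional miss is the recall trade-off
```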
3. Choosing a vector database in 2026
[Figure: vector database decision tree (2026)]
Three categories, depending on your situation.
Use pgvector if you can
The Postgres extension. If your team already runs Postgres — and most do — adding pgvector keeps your vector data alongside your transactional data, with one set of credentials, one backup strategy, and one operational story. Performance is good enough for the vast majority of real-world workloads (under 10M vectors). The fewer specialised systems in your stack, the better.
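A minimal pgvector sketch using the psycopg driver, assuming a Postgres instance with the extension available; the connection string, table, and column names are illustrative.

```python
# pip install psycopg pgvector
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=app", autocommit=True)  # placeholder connection string
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets psycopg pass numpy arrays as pgvector values

conn.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(768)
    )
""")
# ANN index on cosine distance (HNSW support landed in pgvector 0.5).
conn.execute(
    "CREATE INDEX IF NOT EXISTS chunks_embedding_idx "
    "ON chunks USING hnsw (embedding vector_cosine_ops)"
)

query_embedding = np.random.rand(768).astype(np.float32)  # stand-in for a real one
# <=> is pgvector's cosine-distance operator; smaller means closer.
rows = conn.execute(
    "SELECT id, body FROM chunks ORDER BY embedding <=> %s LIMIT 5",
    (query_embedding,),
).fetchall()
```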
Use a managed vector DB for scale or simplicity
Pinecone, Weaviate Cloud, Qdrant Cloud, Turbopuffer. Reasons to pick these: tens of millions of vectors and counting; you do not have a Postgres team; you want hosted simplicity over operational control; you need specific features (multi-tenancy, hybrid search, namespaces) that pgvector does not handle naturally.
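The managed clients all have roughly the same shape: create a collection, upsert vectors with payloads, search. A sketch against Qdrant's Python client, with placeholder credentials and zero vectors standing in for real embeddings:

```python
# pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="...")  # placeholders

client.create_collection(
    collection_name="chunks",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)
client.upsert(
    collection_name="chunks",
    points=[PointStruct(id=1, vector=[0.0] * 768, payload={"tenant_id": "acme"})],
)
hits = client.search(collection_name="chunks", query_vector=[0.0] * 768, limit=5)
```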
Use a self-hosted specialised vector DB rarely
Self-hosting Weaviate, Qdrant, or Milvus. Reasons: extreme scale (hundreds of millions to billions of vectors); strict data residency requirements that forbid managed services; or specialised features required for research workloads. For most enterprise applications, this is the wrong default.
4. The implementation decisions that actually matter
Embedding model choice
Larger embeddings (1536+ dimensions) are not always better — they cost more to store and query. For most applications, a strong 768- or 1024-dimensional embedding model is the sweet spot. Test on your actual data with a holdout evaluation set; benchmark numbers from generic datasets often do not predict performance on your specific corpus.
Chunking strategy
You do not embed entire documents — you embed chunks. The decision of how to chunk is the single biggest determinant of retrieval quality. A 2000-character chunk with 200-character overlap is a reasonable default. For technical content, semantic chunking (splitting at topic boundaries detected by embedding similarity) typically outperforms fixed-size splitting. We dive into this further in our production RAG article.
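The fixed-size default is only a few lines. A sketch of the 2000/200 split; real pipelines usually also snap chunk boundaries to sentences or paragraphs rather than cutting mid-word:

```python
def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Fixed-size chunks with overlap, so a sentence cut at one chunk's
    boundary still appears intact at the start of the next chunk."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```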
Hybrid search
Pure vector search (semantic similarity) misses queries that hinge on specific terminology or proper nouns. Hybrid search combines vector search with traditional keyword search (BM25 or TF-IDF) and merges results using reciprocal rank fusion. The recall improvement is consistently meaningful and worth the implementation cost.
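Reciprocal rank fusion itself is tiny. A sketch that merges ranked lists of document IDs, using the conventional constant k=60:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs. Each list contributes
    1 / (k + rank) per document, so documents ranked highly by several
    retrievers float to the top."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse vector-search hits with BM25 hits:
merged = reciprocal_rank_fusion([["d3", "d1", "d7"], ["d1", "d9", "d3"]])
```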
Metadata filtering
Real applications need to filter results — by user, tenant, date range, document category — alongside semantic search. Vector databases handle this very differently from one another: pgvector lets you use any SQL WHERE clause, while managed vector DBs typically have their own metadata schemas. Decide your filtering needs upfront; retrofitting is painful.
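In pgvector, for example, the filter is an ordinary WHERE clause in the same query as the vector ordering (continuing the earlier sketch; the tenant_id and created_at columns are illustrative):

```python
rows = conn.execute(
    """
    SELECT id, body
    FROM chunks
    WHERE tenant_id = %s
      AND created_at >= now() - interval '90 days'
    ORDER BY embedding <=> %s
    LIMIT 5
    """,
    ("acme", query_embedding),
).fetchall()
```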
5. Operational gotchas
- Re-embedding is expensive. If you switch embedding models, you must re-embed everything. Build for this from day one — keep ingestion idempotent and versioned (see the sketch after this list).
- Stale data is invisible. Unlike a database query that fails when data is missing, a vector search returns whatever is closest — even if your index is months out of date. Build refresh pipelines and monitor recency.
- Cost scales with vectors stored, not just queries. Plan capacity. A million vectors at 1536 dimensions in float32 is about 6 GB (1,000,000 × 1536 × 4 bytes ≈ 6.1 GB) before any indexing overhead.
- Quality is not a single number. Recall, precision, latency, and cost trade off against each other. Pick the optimisation target your application actually cares about, then measure it.
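On the first bullet, one way to keep ingestion idempotent and versioned is to key every stored vector on a hash of the chunk content plus an embedding model identifier: re-running ingestion on unchanged chunks is then a no-op, and switching models produces new keys rather than silently overwriting old vectors. A sketch (names illustrative):

```python
import hashlib

EMBEDDING_MODEL = "bge-base-en-v1.5"  # bump this identifier when switching models

def vector_key(chunk: str, model: str = EMBEDDING_MODEL) -> str:
    """Deterministic ID: the same chunk under the same model always maps
    to the same key, so re-ingesting unchanged data is a no-op, and a
    model switch yields new keys instead of a silent overwrite."""
    digest = hashlib.sha256(f"{model}\x00{chunk}".encode()).hexdigest()
    return f"{model}:{digest[:16]}"
```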
6. The minimum viable evaluation
Before you ship any retrieval-based system, build a small evaluation set: 50–100 queries, each paired with the document(s) a correct retrieval should return. Run your retrieval pipeline on it. Measure recall@5 and recall@10. This is the difference between "the demo works" and "the system works." The discipline costs a day and saves quarters of debugging.
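Recall@k here just means "at least one correct document appeared in the top k". A sketch, assuming a retrieve(query, k) function that returns document IDs and an evaluation set of (query, relevant IDs) pairs:

```python
def recall_at_k(eval_set, retrieve, k: int) -> float:
    """Fraction of queries whose top-k retrieved IDs include at least
    one relevant document."""
    hits = sum(
        1 for query, relevant in eval_set
        if relevant & set(retrieve(query, k))
    )
    return hits / len(eval_set)

# eval_set = [("how do I reset my password?", {"doc_481"}), ...]
# print(recall_at_k(eval_set, retrieve, k=5))
# print(recall_at_k(eval_set, retrieve, k=10))
```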
For Malaysian teams building retrieval systems with proper evaluation discipline, our AI Engineering programme covers embeddings, vector databases, RAG, and evaluation hands-on, and is HRDC SBL-KHAS claimable for eligible employers.