You probably built your first RAG in n8n because it was fast. Then the wheels came off when you tried to embed 5,000 documents. At that point code wins.
1. Where n8n Wins and Why It Breaks for 5,000-Document RAG Pipelines
n8n shines for quick demos and low-risk workflows. It struggles when volume, retries, and observability matter.
Visual speed and prototyping strengths
You can sketch a workflow in minutes and show value fast. That momentum matters.
- Drag-and-drop nodes reduce cognitive load for early ideation.
- Built-in auth and connectors eliminate boilerplate.
- "Split in Batches" and simple loops handle dozens or hundreds of items.
For small experiments n8n feels like magic.
If your pipeline fits on one canvas and fails loudly you are fine.
| Capability | n8n strengths | Hidden limits at scale |
|---|---|---|
| Time to first demo | Minutes with prebuilt nodes | Demo logic becomes brittle under volume |
| Connectors | Rich catalog | Vendor API quirks multiply and are hard to normalize |
| Iteration | Visual edits are quick | Versioning and repeatability lag behind code |
The moment you scale to thousands the cracks show.
The embedding-5,000-documents problem (performance, memory, throughput)
Embedding looks simple until chunking multiplies your workload. The totals spike fast.
- 5,000 PDFs with ~5 chunks each → ~25,000 embedding calls.
- 200 ms per call → about 1.4 hours of pure API time run serially, before retries and rate limits.
- Node memory spikes when upstream nodes buffer large payloads.
A rough back-of-the-envelope helps you plan.
# 25k embeddings, 200 ms each, concurrency = 8
ops = 25_000
latency = 0.200 # seconds per call
conc = 8.0
eta_hours = (ops * latency) / conc / 3600.0
puts format("ETA ≈ %.2f hours", eta_hours) # ≈ 0.17 hours at conc 8; ≈ 1.39 hours serially
You can bump concurrency, but retries and backoff then become the real bottleneck.
Silent failures in the data loader and weak error/retry semantics
At scale you hit edge cases that the canvas hides.
- File pickers succeed yet downstream text extractors return empty strings.
- API nodes retry per node, not per unit of work, which breaks idempotency.
- Partial runs produce mixed states with little auditability.
Symptoms appear late which hurts trust.
The worst failure is the one you do not see until retrieval returns nonsense.
| Concern | n8n default | What goes wrong at 5k+ |
|---|---|---|
| Retries | Per-node, fixed attempts | No per-document backoff and no dead-letter queue |
| Idempotency | Manual | Duplicate embeddings and skewed vectors |
| Logging | Node-level | Hard to correlate document-level events |
| Checkpointing | Ad hoc | Reruns reprocess good items or skip bad ones |
You can hack around these issues yet the cost rises fast.
Why these issues matter specifically for RAG reliability
RAG breaks quietly when embeddings go missing or corrupted. You need guarantees not vibes.
- Retrieval quality depends on consistent chunking and vector fidelity.
- Evaluation becomes noisy when batches fail mid-stream.
- Incident response needs per-document lineage and deterministic reruns.
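Stable identity is the cheapest of those guarantees. A minimal sketch, with a hypothetical `stable_chunk_id` helper: the same document and chunk index always yield the same key, so reruns overwrite rather than duplicate.
require "digest"

# Hypothetical helper: a stable chunk key derived from document ID,
# chunk index, and a content hash. Same input, same key, every run.
def stable_chunk_id(doc_id, index, text)
  Digest::SHA256.hexdigest("#{doc_id}:#{index}:#{text}")[0, 16]
end

stable_chunk_id("handbook.pdf", 0, "Welcome...") # identical across reruns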
For durable RAG you need code-grade control.
2. Building a Simple, Reliable RAG Pipeline with a Ruby Script and RubyLLM
Start small with a single Ruby file. Add robust logging, retries, and idempotency first.
High-level architecture of a Ruby-based RAG pipeline
Keep components explicit and observable.
- Loader: enumerate files and extract clean text.
- Chunker: split by tokens with overlap and stable IDs.
- Embedder: call RubyLLM with retries and backoff.
- Store: write to Postgres + pgvector with upserts.
- Audit: log per document and emit metrics.
This shape scales because each box is testable.
[Files] -> [Loader] -> [Chunker] -> [Embedder] -> [pgvector]
           \____________ [Logger/Metrics] ____________/
Example Ruby script for loading, chunking, and embedding documents
You do not need a framework to get reliability. You need discipline.
# Gemfile (conceptual)
# gem "rubyllm"
# gem "pg"
# gem "pgvector"
# gem "concurrent-ruby"
# gem "jsonl"
require "logger"
require "concurrent-ruby"
require "pg"
require "rubyllm"
LOGGER = Logger.new($stdout, level: :info)
DB = PG.connect(ENV.fetch("DATABASE_URL"))
def upsert_embedding(doc_id:, chunk_id:, vector:, text:)
  # pgvector accepts its text format, e.g. "[0.1,0.2,...]"
  DB.exec_params(<<~SQL, [doc_id, chunk_id, "[#{vector.join(',')}]", text])
    INSERT INTO embeddings (document_id, chunk_id, embedding, content)
    VALUES ($1, $2, $3, $4)
    ON CONFLICT (document_id, chunk_id)
    DO UPDATE SET embedding = EXCLUDED.embedding, content = EXCLUDED.content;
  SQL
end
def backoff(retries)
  sleep([2**retries * 0.25, 8].min) # 0.25s, 0.5s, 1s, ... capped at 8s
end
def embed_with_retry(text, model: "text-embedding-3-large")
  retries = 0
  begin
    RubyLLM.embed(text, model:).vectors # .vectors is the raw float array
  rescue => e
    raise if retries >= 5
    LOGGER.warn("embed failed: #{e.class} #{e.message}; retry=#{retries}")
    backoff(retries)
    retries += 1
    retry
  end
end
files = Dir.glob("./docs/**/*.pdf")
pool = Concurrent::FixedThreadPool.new(Integer(ENV.fetch("CONCURRENCY", 8)))
files.each_with_index do |path, i|
  pool.post do
    doc_id = File.basename(path)
    chunks = Chunker.from_pdf(path, size: 800, overlap: 120) # your implementation
    chunks.each_with_index do |chunk, j|
      vec = embed_with_retry(chunk.text)
      upsert_embedding(doc_id: doc_id, chunk_id: j, vector: vec, text: chunk.text)
    end
    LOGGER.info("indexed #{doc_id} (#{chunks.size} chunks) [#{i + 1}/#{files.size}]")
  rescue => e
    LOGGER.error("failed #{path}: #{e.class} #{e.message}") # thread pools swallow errors silently
  end
end
pool.shutdown
pool.wait_for_termination
LOGGER.info("done")
This script is boring on purpose which is exactly what you want.
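`Chunker.from_pdf` is left as your implementation above. A character-based sketch shows the shape (a real one would extract PDF text first, e.g. with the pdf-reader gem, and split by tokens rather than characters):
# Minimal character-window chunker (sketch; token-aware splitting is a drop-in upgrade)
Chunk = Struct.new(:text, keyword_init: true)

module Chunker
  def self.from_text(text, size: 800, overlap: 120)
    step = size - overlap
    (0...text.length).step(step).map { |start| Chunk.new(text: text[start, size]) }
  end
end

Chunker.from_text("a" * 2_000, size: 800, overlap: 120).size # => 3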
Robust error handling, logging, and retries in Ruby
Own your failure modes. Make them visible and recoverable.
- Per-document begin/rescue with exponential backoff.
- Upserts for idempotency on reruns.
- Structured logs with document and chunk context.
Now you can replay only what failed and prove it worked.
begin
  # process one document
rescue SpecificLoaderError => e
  LOGGER.error("loader failed doc=#{doc_id} err=#{e.message}")
  DeadLetters.write(doc_id, reason: e.message)
rescue => e
  LOGGER.error("unknown failure doc=#{doc_id} err=#{e.full_message}")
  raise # fail fast in batch mode
end
You elevate reliability without adding complexity.
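The `DeadLetters` store above needs no infrastructure. A sketch backed by a JSONL file is enough to make targeted replay real:
require "json"

# Append-only failure log; one JSON object per line.
module DeadLetters
  PATH = "dead_letters.jsonl"

  def self.write(doc_id, reason:)
    File.open(PATH, "a") { |f| f.puts({ doc_id: doc_id, reason: reason, at: Time.now }.to_json) }
  end

  def self.each_doc_id
    return unless File.exist?(PATH)
    File.foreach(PATH) { |line| yield JSON.parse(line).fetch("doc_id") }
  end
end

# Replay only what failed:
# DeadLetters.each_doc_id { |doc_id| reprocess(doc_id) } # reprocess is yours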
Comparing operational visibility vs n8n
The differences jump off the page once you watch a large batch run.
| Area | n8n canvas | Ruby script + RubyLLM |
|---|---|---|
| Log granularity | Node level | Per doc and per chunk with fields |
| Retries | Fixed attempts per node | Exponential backoff per item |
| Idempotency | Manual | Upserts keyed by document + chunk |
| Reprocessing | Rerun whole workflow | Targeted replay from dead-letter store |
Ruby makes the pipeline boring and debuggable which saves your weekend.
3. Hosting and Scaling the RAG Pipeline in Rails
Rails turns the script into a service with jobs, APIs, and dashboards. You keep control and gain comfort.
Modeling documents and embeddings in Rails (pgvector/Neighbor)
Represent vectors as first-class data. Keep schema simple.
# db/migrate/XXXX_add_pgvector.rb
class AddPgvector < ActiveRecord::Migration[7.1]
  def change
    enable_extension "vector"

    create_table :documents do |t|
      t.string :external_id, null: false
      t.string :title
      t.jsonb :metadata, default: {}
      t.timestamps
    end
    add_index :documents, :external_id, unique: true

    create_table :embeddings do |t|
      t.references :document, null: false, foreign_key: true
      t.integer :chunk_id, null: false
      t.vector :embedding, limit: 1536 # match your embedding model's dimensions
      t.text :content
      t.timestamps
    end
    add_index :embeddings, [:document_id, :chunk_id], unique: true
    # pgvector 0.6+; the opclass must match the distance function you query with
    add_index :embeddings, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end
A clear schema reduces confusion and speeds up incident work.
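The matching models stay small. A sketch assuming the Neighbor gem, whose `has_neighbors` enables the similarity query used in the search service below:
# app/models/document.rb
class Document < ApplicationRecord
  has_many :embeddings, dependent: :destroy
end

# app/models/embedding.rb
class Embedding < ApplicationRecord
  belongs_to :document
  has_neighbors :embedding # Neighbor: enables nearest_neighbors(:embedding, vec, distance: "cosine")
end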
Background jobs (Sidekiq/ActiveJob) for processing 5,000+ documents
Batch work belongs in jobs with bounded concurrency and backoff.
# app/jobs/index_document_job.rb
class IndexDocumentJob < ApplicationJob
  queue_as :embeddings
  retry_on StandardError, wait: :exponentially_longer, attempts: 10

  def perform(external_id)
    doc = Document.find_or_create_by!(external_id:)
    chunks = Chunker.for(doc).chunks
    chunks.each.with_index do |chunk, j|
      vec = RubyLLM.embed(chunk.text, model: ENV.fetch("EMBED_MODEL")).vectors
      Embedding.upsert(
        { document_id: doc.id, chunk_id: j, embedding: vec, content: chunk.text },
        unique_by: %i[document_id chunk_id]
      )
    end
  end
end
You pick the queue size and workers which sets clear guardrails.
# config/sidekiq.yml
:concurrency: 12
:queues:
  - [embeddings, 5]
  - [default, 3]
  - [low, 1]
This keeps your API responsive while large batches run.
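Kicking off the 5,000-document backfill is then a one-off task; the task name here is illustrative:
# lib/tasks/embeddings.rake (hypothetical task name)
namespace :embeddings do
  desc "Enqueue one indexing job per document"
  task backfill: :environment do
    Dir.glob("./docs/**/*.pdf").each do |path|
      IndexDocumentJob.perform_later(File.basename(path))
    end
  end
end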
Implementing the retrieval + generation flow (RAGSearchService-style)
Wrap retrieval and LLM calls behind a service object. Test it like any other class.
# app/services/rag_search_service.rb
class RAGSearchService
  TOP_K = 8

  def initialize(llm: RubyLLM)
    @llm = llm
  end

  def call(query)
    qvec = @llm.embed(query, model: ENV["EMBED_MODEL"]).vectors # one embedding call per query
    hits = Embedding.nearest_neighbors(:embedding, qvec, distance: "cosine").first(TOP_K)
    context = hits.map(&:content).join("\n---\n")
    @llm.chat
        .with_instructions("Answer with cited snippets.")
        .ask(prompt(query, context))
        .content
  end

  private

  def prompt(query, context)
    <<~TXT
      Use the context to answer the question.
      Context:
      #{context}
      Question: #{query}
    TXT
  end
end
A controller exposes it and a health check watches it.
# app/controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    render json: { answer: RAGSearchService.new.call(params.require(:q)) }
  end
end
Small pieces fit together without drama which is the whole point.
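The routing is equally plain. A sketch that also exposes the health check mentioned above, using the endpoint Rails 7.1 ships by default:
# config/routes.rb
Rails.application.routes.draw do
  get "/search", to: "search#index"
  get "/up", to: "rails/health#show" # built-in liveness check in Rails 7.1+
end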
Monitoring, observability, and deployment considerations for production
Treat this like any Rails app. That is a feature not a bug.
- Metrics: queue depth, job duration, embeddings/sec, error rate.
- Logs: document_id, chunk_id, attempt, latency.
- Alerts: dead-letter backlog and elevated 5xx on the search endpoint.
Ship with health checks and a clear rollback plan.
You cannot fix what you cannot see so wire up telemetry first.
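ActiveSupport::Notifications covers the logs and timings without new dependencies; the event name below is illustrative:
# Inside IndexDocumentJob#perform, wrap the embedding call in a timed event.
ActiveSupport::Notifications.instrument("rag.embed_chunk", document_id: doc.id, chunk_id: j) do
  RubyLLM.embed(chunk.text, model: ENV.fetch("EMBED_MODEL"))
end

# config/initializers/rag_metrics.rb: forward durations to logs or your metrics backend.
ActiveSupport::Notifications.subscribe("rag.embed_chunk") do |event|
  Rails.logger.info(
    "embed_chunk doc=#{event.payload[:document_id]} chunk=#{event.payload[:chunk_id]} ms=#{event.duration.round(1)}"
  )
end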
4. Choosing Between n8n and Ruby/Rails for Your Next RAG Project
Use n8n when speed beats certainty. Use Rails when certainty beats speed.
When n8n is "good enough"
Keep the canvas if the stakes are low or the scale is tiny.
- Prototypes, internal demos, and one-off research.
- Under 100 documents or fewer than 1,000 chunks end to end.
- Latency is not critical and manual babysitting is acceptable.
If the outcome is reversible you can stay visual for a while.
Clear signals you've outgrown no-code for RAG
Certain smells tell you it is time to switch.
- You cannot answer "which documents failed and why."
- Reruns duplicate vectors or corrupt state.
- Batch runs exceed your change window.
If you feel dread before every run you already know the answer.
Migration path: from n8n prototype to Rails-based RAG system
Move in three tight loops. Keep risk small and progress visible.
- Extract embedding into a Ruby worker with pgvector writes.
- Replace n8n chunking and loaders with tested Ruby modules.
- Swap retrieval to Rails and leave the rest behind the API.
You end with a boring pipeline and a calmer team.
| Step | What you keep | What you replace |
|---|---|---|
| 1. Embedding worker | n8n triggers | Ruby job + upserts + metrics |
| 2. Document ingest | n8n file nodes | Ruby loaders + deterministic chunker |
| 3. Search API | n8n HTTP node | Rails controller + RAGSearchService |
A checklist helps keep you honest.
- pgvector enabled and indexed
- Dead-letter queue wired
- Retry policy documented
- Dashboards for QPS, errors, and queue depth
Momentum feels good when signal replaces guesswork.
Conclusion: Use n8n for Speed, Ruby/Rails for Safety and Scale
n8n gets you moving fast which is valuable. Ruby/Rails with RubyLLM gets you to reliable scale which is priceless.
- Use n8n for demos and small internal tools.
- Use Ruby/Rails for 5,000+ documents, strict retries, and audit trails.
- Ship the pipeline as a Rails service with jobs and metrics.
Choose the tool that fits the blast radius not just the first hour of work.
Quick recap: prototype in n8n, embed and store in Ruby, then graduate to Rails for jobs, APIs, and observability. Your future self will thank you.