You probably built your first RAG in n8n because it was fast. Then the wheels came off when you tried to embed 5,000 documents. At that point code wins.
1. Where n8n Wins and Why It Breaks for 5,000-Document RAG Pipelines
n8n shines for quick demos and low-risk workflows. It struggles when volume, retries, and observability matter.
Visual speed and prototyping strengths
You can sketch a workflow in minutes and show value fast. That momentum matters.
- Drag-and-drop nodes reduce cognitive load for early ideation.
- Built-in auth and connectors eliminate boilerplate.
- "Split in Batches" and simple loops handle dozens or hundreds of items.
For small experiments n8n feels like magic.
If your pipeline fits on one canvas and fails loudly you are fine.
| Capability | n8n strengths | Hidden limits at scale |
|---|---|---|
| Time to first demo | Minutes with prebuilt nodes | Demo logic becomes brittle under volume |
| Connectors | Rich catalog | Vendor API quirks multiply and are hard to normalize |
| Iteration | Visual edits are quick | Versioning and repeatability lag behind code |
The moment you scale to thousands the cracks show.
The embedding-5,000-documents problem (performance, memory, throughput)
Embedding looks simple until chunking multiplies your workload. The totals spike fast.
- 5,000 PDFs with ~5 chunks each → ~25,000 embedding calls.
- 200 ms per call → about 1.4 hours of pure API time run serially, before retries and rate limits.
- Node memory spikes when upstream nodes buffer large payloads.
A rough back-of-the-envelope helps you plan.
# 25k embeddings, 200 ms each, concurrency = 8
ops = 25_000
latency = 0.200 # seconds per call
conc = 8.0
eta_hours = (ops * latency) / conc / 3600.0
puts format("ETA ≈ %.2f hours", eta_hours) # ≈ 0.17 hours at conc 8; ≈ 1.39 hours serially
You can bump concurrency, but retries and backoff then become the real bottleneck.
Silent failures in the data loader and weak error/retry semantics
At scale you hit edge cases that the canvas hides.
- File pickers succeed yet downstream text extractors return empty strings.
- API nodes retry per node, not per unit of work, which breaks idempotency.
- Partial runs produce mixed states with little auditability.
Symptoms appear late which hurts trust.
The worst failure is the one you do not see until retrieval returns nonsense.
| Concern | n8n default | What goes wrong at 5k+ |
|---|---|---|
| Retries | Per-node, fixed attempts | No per-document backoff and no dead-letter queue |
| Idempotency | Manual | Duplicate embeddings and skewed vectors |
| Logging | Node-level | Hard to correlate document-level events |
| Checkpointing | Ad hoc | Reruns reprocess good items or skip bad ones |
You can hack around these issues yet the cost rises fast.
Why these issues matter specifically for RAG reliability
RAG breaks quietly when embeddings go missing or corrupted. You need guarantees not vibes.
- Retrieval quality depends on consistent chunking and vector fidelity.
- Evaluation becomes noisy when batches fail mid-stream.
- Incident response needs per-document lineage and deterministic reruns.
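Stable identity is the cheapest of those guarantees. A minimal sketch, with a hypothetical `stable_chunk_id` helper: the same document and chunk index always yield the same key, so reruns overwrite rather than duplicate.
require "digest"

# Hypothetical helper: a stable chunk key derived from document ID,
# chunk index, and a content hash. Same input, same key, every run.
def stable_chunk_id(doc_id, index, text)
  Digest::SHA256.hexdigest("#{doc_id}:#{index}:#{text}")[0, 16]
end

stable_chunk_id("handbook.pdf", 0, "Welcome...") # identical across reruns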
For durable RAG you need code-grade control.
2. Building a Simple, Reliable RAG Pipeline with a Ruby Script and RubyLLM
Start small with a single Ruby file. Add robust logging, retries, and idempotency first.
High-level architecture of a Ruby-based RAG pipeline
Keep components explicit and observable.
- Loader: enumerate files and extract clean text.
- Chunker: split by tokens with overlap and stable IDs.
- Embedder: call RubyLLM with retries and backoff.
- Store: write to Postgres + pgvector with upserts.
- Audit: log per document and emit metrics.
This shape scales because each box is testable.
[Files] -> [Loader] -> [Chunker] -> [Embedder] -> [pgvector]
           \____________ [Logger/Metrics] ____________/
Example Ruby script for loading, chunking, and embedding documents
You do not need a framework to get reliability. You need discipline.
# Gemfile (conceptual)
# gem "rubyllm"
# gem "pg"
# gem "pgvector"
# gem "concurrent-ruby"
# gem "jsonl"
require "logger"
require "concurrent-ruby"
require "pg"
require "rubyllm"
LOGGER = Logger.new($stdout, level: :info)
DB = PG.connect(ENV.fetch("DATABASE_URL"))
def upsert_embedding(doc_id:, chunk_id:, vector:, text:)
  # pgvector accepts its text format, e.g. "[0.1,0.2,...]"
  DB.exec_params(<<~SQL, [doc_id, chunk_id, "[#{vector.join(',')}]", text])
    INSERT INTO embeddings (document_id, chunk_id, embedding, content)
    VALUES ($1, $2, $3, $4)
    ON CONFLICT (document_id, chunk_id)
    DO UPDATE SET embedding = EXCLUDED.embedding, content = EXCLUDED.content;
  SQL
end
def backoff(retries)
  sleep([2**retries * 0.25, 8].min) # 0.25s, 0.5s, 1s, ... capped at 8s
end
def embed_with_retry(text, model: "text-embedding-3-large")
  retries = 0
  begin
    RubyLLM.embed(text, model:).vectors # .vectors is the raw float array
  rescue => e
    raise if retries >= 5
    LOGGER.warn("embed failed: #{e.class} #{e.message}; retry=#{retries}")
    backoff(retries)
    retries += 1
    retry
  end
end
files = Dir.glob("./docs/**/*.pdf")
pool = Concurrent::FixedThreadPool.new(Integer(ENV.fetch("CONCURRENCY", 8)))
files.each_with_index do |path, i|
  pool.post do
    doc_id = File.basename(path)
    chunks = Chunker.from_pdf(path, size: 800, overlap: 120) # your implementation
    chunks.each_with_index do |chunk, j|
      vec = embed_with_retry(chunk.text)
      upsert_embedding(doc_id: doc_id, chunk_id: j, vector: vec, text: chunk.text)
    end
    LOGGER.info("indexed #{doc_id} (#{chunks.size} chunks) [#{i + 1}/#{files.size}]")
  rescue => e
    LOGGER.error("failed #{path}: #{e.class} #{e.message}") # thread pools swallow errors silently
  end
end
pool.shutdown
pool.wait_for_termination
LOGGER.info("done")
This script is boring on purpose which is exactly what you want.
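`Chunker.from_pdf` is left as your implementation above. A character-based sketch shows the shape (a real one would extract PDF text first, e.g. with the pdf-reader gem, and split by tokens rather than characters):
# Minimal character-window chunker (sketch; token-aware splitting is a drop-in upgrade)
Chunk = Struct.new(:text, keyword_init: true)

module Chunker
  def self.from_text(text, size: 800, overlap: 120)
    step = size - overlap
    (0...text.length).step(step).map { |start| Chunk.new(text: text[start, size]) }
  end
end

Chunker.from_text("a" * 2_000, size: 800, overlap: 120).size # => 3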
Robust error handling, logging, and retries in Ruby
Own your failure modes. Make them visible and recoverable.
- Per-document begin/rescue with exponential backoff.
- Upserts for idempotency on reruns.
- Structured logs with document and chunk context.
Now you can replay only what failed and prove it worked.
begin
  # process one document
rescue SpecificLoaderError => e
  LOGGER.error("loader failed doc=#{doc_id} err=#{e.message}")
  DeadLetters.write(doc_id, reason: e.message)
rescue => e
  LOGGER.error("unknown failure doc=#{doc_id} err=#{e.full_message}")
  raise # fail fast in batch mode
end
You elevate reliability without adding complexity.
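The `DeadLetters` store above needs no infrastructure. A sketch backed by a JSONL file is enough to make targeted replay real:
require "json"

# Append-only failure log; one JSON object per line.
module DeadLetters
  PATH = "dead_letters.jsonl"

  def self.write(doc_id, reason:)
    File.open(PATH, "a") { |f| f.puts({ doc_id: doc_id, reason: reason, at: Time.now }.to_json) }
  end

  def self.each_doc_id
    return unless File.exist?(PATH)
    File.foreach(PATH) { |line| yield JSON.parse(line).fetch("doc_id") }
  end
end

# Replay only what failed:
# DeadLetters.each_doc_id { |doc_id| reprocess(doc_id) } # reprocess is yours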
Comparing operational visibility vs n8n
The differences jump off the page once you watch a large batch run.
| Area | n8n canvas | Ruby script + RubyLLM |
|---|---|---|
| Log granularity | Node level | Per doc and per chunk with fields |
| Retries | Fixed attempts per node | Exponential backoff per item |
| Idempotency | Manual | Upserts keyed by document + chunk |
| Reprocessing | Rerun whole workflow | Targeted replay from dead-letter store |
Ruby makes the pipeline boring and debuggable which saves your weekend.
3. Hosting and Scaling the RAG Pipeline in Rails
Rails turns the script into a service with jobs, APIs, and dashboards. You keep control and gain comfort.
Modeling documents and embeddings in Rails (pgvector/Neighbor)
Represent vectors as first-class data. Keep schema simple.
# db/migrate/XXXX_add_pgvector.rb
class AddPgvector < ActiveRecord::Migration[7.1]
  def change
    enable_extension "vector"

    create_table :documents do |t|
      t.string :external_id, null: false
      t.string :title
      t.jsonb :metadata, default: {}
      t.timestamps
    end
    add_index :documents, :external_id, unique: true

    create_table :embeddings do |t|
      t.references :document, null: false, foreign_key: true
      t.integer :chunk_id, null: false
      t.vector :embedding, limit: 1536 # match your embedding model's dimensions
      t.text :content
      t.timestamps
    end
    add_index :embeddings, [:document_id, :chunk_id], unique: true
    # pgvector 0.6+; the opclass must match the distance function you query with
    add_index :embeddings, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end
A clear schema reduces confusion and speeds up incident work.
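The matching models stay small. A sketch assuming the Neighbor gem, whose `has_neighbors` enables the similarity query used in the search service below:
# app/models/document.rb
class Document < ApplicationRecord
  has_many :embeddings, dependent: :destroy
end

# app/models/embedding.rb
class Embedding < ApplicationRecord
  belongs_to :document
  has_neighbors :embedding # Neighbor: enables nearest_neighbors(:embedding, vec, distance: "cosine")
end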
Background jobs (Sidekiq/ActiveJob) for processing 5,000+ documents
Batch work belongs in jobs with bounded concurrency and backoff.
# app/jobs/index_document_job.rb
class IndexDocumentJob < ApplicationJob
  queue_as :embeddings
  retry_on StandardError, wait: :exponentially_longer, attempts: 10

  def perform(external_id)
    doc = Document.find_or_create_by!(external_id:)
    chunks = Chunker.for(doc).chunks
    chunks.each.with_index do |chunk, j|
      vec = RubyLLM.embed(chunk.text, model: ENV.fetch("EMBED_MODEL")).vectors
      Embedding.upsert(
        { document_id: doc.id, chunk_id: j, embedding: vec, content: chunk.text },
        unique_by: %i[document_id chunk_id]
      )
    end
  end
end
You pick the queue size and workers which sets clear guardrails.
# config/sidekiq.yml
:concurrency: 12
:queues:
  - [embeddings, 5]
  - [default, 3]
  - [low, 1]
This keeps your API responsive while large batches run.
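Kicking off the 5,000-document backfill is then a one-off task; the task name here is illustrative:
# lib/tasks/embeddings.rake (hypothetical task name)
namespace :embeddings do
  desc "Enqueue one indexing job per document"
  task backfill: :environment do
    Dir.glob("./docs/**/*.pdf").each do |path|
      IndexDocumentJob.perform_later(File.basename(path))
    end
  end
end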
Implementing the retrieval + generation flow (RAGSearchService-style)
Wrap retrieval and LLM calls behind a service object. Test it like any other class.
# app/services/rag_search_service.rb
class RAGSearchService
  TOP_K = 8

  def initialize(llm: RubyLLM)
    @llm = llm
  end

  def call(query)
    qvec = @llm.embed(query, model: ENV["EMBED_MODEL"]).vectors # one embedding call per query
    hits = Embedding.nearest_neighbors(:embedding, qvec, distance: "cosine").first(TOP_K)
    context = hits.map(&:content).join("\n---\n")
    @llm.chat
        .with_instructions("Answer with cited snippets.")
        .ask(prompt(query, context))
        .content
  end

  private

  def prompt(query, context)
    <<~TXT
      Use the context to answer the question.
      Context:
      #{context}
      Question: #{query}
    TXT
  end
end
A controller exposes it and a health check watches it.
# app/controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    render json: { answer: RAGSearchService.new.call(params.require(:q)) }
  end
end
Small pieces fit together without drama which is the whole point.
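The routing is equally plain. A sketch that also exposes the health check mentioned above, using the endpoint Rails 7.1 ships by default:
# config/routes.rb
Rails.application.routes.draw do
  get "/search", to: "search#index"
  get "/up", to: "rails/health#show" # built-in liveness check in Rails 7.1+
end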
Monitoring, observability, and deployment considerations for production
Treat this like any Rails app. That is a feature not a bug.
- Metrics: queue depth, job duration, embeddings/sec, error rate.
- Logs: document_id, chunk_id, attempt, latency.
- Alerts: dead-letter backlog and elevated 5xx on the search endpoint.
Ship with health checks and a clear rollback plan.
You cannot fix what you cannot see so wire up telemetry first.
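ActiveSupport::Notifications covers the logs and timings without new dependencies; the event name below is illustrative:
# Inside IndexDocumentJob#perform, wrap the embedding call in a timed event.
ActiveSupport::Notifications.instrument("rag.embed_chunk", document_id: doc.id, chunk_id: j) do
  RubyLLM.embed(chunk.text, model: ENV.fetch("EMBED_MODEL"))
end

# config/initializers/rag_metrics.rb: forward durations to logs or your metrics backend.
ActiveSupport::Notifications.subscribe("rag.embed_chunk") do |event|
  Rails.logger.info(
    "embed_chunk doc=#{event.payload[:document_id]} chunk=#{event.payload[:chunk_id]} ms=#{event.duration.round(1)}"
  )
end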
4. Choosing Between n8n and Ruby/Rails for Your Next RAG Project
Use n8n when speed beats certainty. Use Rails when certainty beats speed.
When n8n is "good enough"
Keep the canvas if the stakes are low or the scale is tiny.
- Prototypes, internal demos, and one-off research.
- Under 100 documents or fewer than 1,000 chunks end to end.
- Latency is not critical and manual babysitting is acceptable.
If the outcome is reversible you can stay visual for a while.
Clear signals you've outgrown no-code for RAG
Certain smells tell you it is time to switch.
- You cannot answer "which documents failed and why."
- Reruns duplicate vectors or corrupt state.
- Batch runs exceed your change window.
If you feel dread before every run you already know the answer.
Migration path: from n8n prototype to Rails-based RAG system
Move in three tight loops. Keep risk small and progress visible.
- Extract embedding into a Ruby worker with pgvector writes.
- Replace n8n chunking and loaders with tested Ruby modules.
- Swap retrieval to Rails and leave the rest behind the API.
You end with a boring pipeline and a calmer team.
| Step | What you keep | What you replace |
|---|---|---|
| 1. Embedding worker | n8n triggers | Ruby job + upserts + metrics |
| 2. Document ingest | n8n file nodes | Ruby loaders + deterministic chunker |
| 3. Search API | n8n HTTP node | Rails controller + RAGSearchService |
A checklist helps keep you honest.
- pgvector enabled and indexed
- Dead-letter queue wired
- Retry policy documented
- Dashboards for QPS, errors, and queue depth
Momentum feels good when signal replaces guesswork.
Conclusion: Use n8n for Speed, Ruby/Rails for Safety and Scale
n8n gets you moving fast which is valuable. Ruby/Rails with RubyLLM gets you to reliable scale which is priceless.
- Use n8n for demos and small internal tools.
- Use Ruby/Rails for 5,000+ documents, strict retries, and audit trails.
- Ship the pipeline as a Rails service with jobs and metrics.
Choose the tool that fits the blast radius not just the first hour of work.
Quick recap: prototype in n8n, embed and store in Ruby, then graduate to Rails for jobs, APIs, and observability. Your future self will thank you.