Friday, August 29, 2025

From LLMs to RAG, leveraging the best available tools tailored for isolated enterprise environments



We are kicking off this blog series with the ambition of designing the perfect on-prem AI architecture for businesses.

Enterprises everywhere face the same challenge: how to harness the power of LLMs while keeping sensitive business data fully under control, without compromising security.

Organizations are asking for solutions that combine powerful large language models with a controlled, trustworthy flow of high-quality data securely managed in a fully isolated environment to avoid any risk of leakage. At the same time, there is a growing preference for open-source technologies, trusted for their transparency, flexibility, and strong security track record.
It is crucial that this architecture integrates seamlessly with the company’s existing information systems, ensuring compatibility with current identity and authorization providers. Beyond the technical solution, clients also seek an extended model that covers the deployment and evolution of the AI system, integrated with their existing quality pipelines, along with the ability to debug and audit how context information is incorporated and used within the AI system.

Our proposal is an enterprise-ready AI architecture that starts small as a prototype focused on specific business processes but is designed to grow. Each component can be replaced or upgraded over time, ensuring long-term flexibility and performance improvements without vendor lock-in.

Enterprise RAG Architecture (On-Prem, Isolated)

The architecture is modular, open-source friendly, and fully isolated from external networks, and it can mature from a prototype into a production-grade system.

Together, the three layers below give enterprises a scalable, auditable, and secure RAG environment: capable of powering digital assistants, integrating with business systems, and evolving over time.


Layer 1: AI & Retrieval

  • LLM Serving (LLaMA, Mistral, etc. via vLLM/TGI/Ollama) → Natural language understanding & generation.

  • Retrieval Layer (LlamaIndex / LangChain) → Orchestrates RAG workflows.

  • Vector Database + Re-ranking (FAISS/Qdrant + BGE/ColBERT) → Semantic search with high accuracy.
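To make the retrieval step concrete, here is a self-contained sketch of semantic search with cosine similarity. The `embed` and `search` functions are toy stand-ins (a hashed bag-of-words instead of a real BGE embedding model, brute-force scoring instead of a FAISS/Qdrant index); the names, chunk texts, and dimensions are illustrative, not part of the stack above.

```python
import hashlib
import math
import re
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy stand-in for a real embedding model such as BGE:
    # a hashed bag-of-words vector, L2-normalized.
    vec = [0.0] * dim
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def search(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Brute-force cosine search: the job FAISS/Qdrant does at scale with
    # approximate indexes. A cross-encoder re-ranker (BGE/ColBERT) would
    # then re-score this shortlist for precision.
    q = embed(query)
    scored = sorted(
        chunks,
        key=lambda c: sum(a * b for a, b in zip(q, embed(c))),
        reverse=True,
    )
    return scored[:top_k]

chunks = [
    "Supplier contracts expire at the end of each quarter.",
    "The cafeteria menu changes every Monday.",
    "Purchase orders above 10k EUR require director approval.",
]
print(search("When do supplier contracts expire?", chunks)[0])
```

Swapping the toy `embed` for a real model and the list scan for a vector database changes nothing about the call shape, which is exactly the loose coupling the architecture relies on.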

Layer 2: Data & Storage

  • PostgreSQL → Metadata, context, audit logs.

  • MinIO (S3) → Raw documents, versions, derived chunks.

  • Ingestion/ETL Pipeline (Airflow/Prefect) → Parsing, chunking, embedding, indexing.
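The chunking step of the ingestion pipeline can be sketched in a few lines; in practice this would run as an Airflow DAG or Prefect flow task between parsing and embedding. The window size and overlap below are illustrative defaults, not recommendations.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Fixed-size character windows with overlap, so a sentence cut at one
    # chunk boundary still appears whole in the neighbouring chunk.
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

doc = " ".join(f"Sentence number {n}." for n in range(1, 60))
pieces = chunk_text(doc)
```

Each chunk would then be embedded and written to the vector index, with its offsets and source document recorded in PostgreSQL for auditability.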

Layer 3: Security & Operations

  • Auth & Access Control (Keycloak / SSO) → Role-based security.

  • Observability (Prometheus, Grafana, ELK) → Monitor performance & quality.

  • Secrets & Encryption (Vault/HSM) → Protect data & credentials.

  • Caching (Redis) → Faster responses, lower cost.
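As a minimal illustration of the caching layer, the sketch below derives a deterministic key for exact-match response caching; the `cache_key` helper and the key prefix are hypothetical, and a semantic cache would hash a query embedding instead so paraphrased questions can hit the same entry.

```python
import hashlib
import json

def cache_key(model: str, prompt: str, params: dict) -> str:
    # Deterministic key for exact-match response caching: the same
    # (model, normalized prompt, params) always maps to the same key,
    # which would index a Redis/Valkey entry holding the generated answer.
    payload = json.dumps(
        {"model": model, "prompt": prompt.strip().lower(), "params": params},
        sort_keys=True,
    )
    return "llm:" + hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("mistral-7b", "What is our PO approval limit?", {"temperature": 0})
k2 = cache_key("mistral-7b", "  what is our PO approval limit?", {"temperature": 0})
print(k1 == k2)  # normalization makes the two prompts share a key
```

Serving repeated questions from the cache skips a full LLM generation, which is where the "faster responses, lower cost" benefit comes from.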

 



Technology Overview and Interoperability


Why this stack “clicks”: shared standards (S3 API, SQL, OIDC/OAuth2, REST/gRPC, OpenTelemetry, Prometheus metrics), rich SDKs/connectors in LlamaIndex/LangChain, and loose coupling (object store as source of truth; vector DB as an index; Postgres for control/audit). This keeps every component replaceable without breaking the whole.
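That replaceability can be made explicit with a narrow interface between the orchestration layer and the retrieval backend. The sketch below is illustrative: `Retriever`, `KeywordRetriever`, and `answer` are assumed names, and the keyword backend is a stub standing in for FAISS or Qdrant.

```python
from typing import Protocol

class Retriever(Protocol):
    # Minimal retrieval contract: any backend (FAISS, Qdrant, or a stub)
    # is interchangeable as long as it honours this interface.
    def retrieve(self, query: str, top_k: int) -> list[str]: ...

class KeywordRetriever:
    # Trivial stand-in backend so the sketch runs without a vector DB.
    def __init__(self, docs: list[str]):
        self.docs = docs

    def retrieve(self, query: str, top_k: int) -> list[str]:
        q = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(q & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

def answer(question: str, retriever: Retriever) -> str:
    # The orchestration layer (LlamaIndex/LangChain in this stack) depends
    # only on the interface, never on a concrete vector database.
    context = "\n".join(retriever.retrieve(question, top_k=2))
    return f"Context:\n{context}\n\nQuestion: {question}"

docs = ["invoices are paid net 30", "the office closes at 6pm"]
prompt = answer("when are invoices paid", KeywordRetriever(docs))
```

Because `answer` sees only the `Retriever` contract, migrating from FAISS to Qdrant, or adding a re-ranking stage behind the same interface, leaves the rest of the system untouched.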

| Category | Component | Function | Open Source? | Interfaces & Integration |
| --- | --- | --- | --- | --- |
| AI & Retrieval | LLaMA / Mistral | LLMs for NLU/NLG | LLaMA (community license), Mistral (Apache-2.0) | Served via vLLM/TGI/Ollama, OpenAI-style HTTP APIs |
| AI & Retrieval | vLLM / TGI / Ollama | High-throughput model serving | Yes (Apache-2.0 / MIT) | REST, WebSocket, OpenAI-compatible APIs |
| AI & Retrieval | LlamaIndex / LangChain | RAG orchestration & pipelines | Yes (MIT) | Python/JS SDKs, connectors, REST |
| AI & Retrieval | FAISS / Qdrant | Vector search & retrieval | Yes (MIT / Apache-2.0) | C++/Python APIs, REST/gRPC |
| AI & Retrieval | Re-rankers (BGE / ColBERT) | Improves retrieval precision | Yes (Apache-2.0 / MIT) | Python models, REST wrappers |
| Data & Storage | PostgreSQL + JSONB | Metadata, context, audit logs | Yes (PostgreSQL license) | SQL, JDBC/ODBC, logical replication |
| Data & Storage | MinIO (S3) | Object storage for documents | Yes (AGPL-3.0) | S3 API (HTTP), SDKs |
| Data & Storage | Airflow / Prefect | ETL, ingestion, scheduling | Yes (Apache-2.0) | Python DAGs/flows, REST, CLI |
| Security & Operations | Keycloak | Auth, SSO, RBAC | Yes (Apache-2.0) | OIDC, OAuth2, SAML |
| Security & Operations | Prometheus + Grafana | Metrics & dashboards | Yes (Apache-2.0 / AGPL-3.0 core) | Prometheus scrape, Grafana UI/API |
| Security & Operations | ELK / OpenSearch | Logs & search | ELK (SSPL/Elastic), OpenSearch (Apache-2.0) | REST/JSON, Dashboards |
| Security & Operations | OpenTelemetry | Standard for traces/metrics/logs | Yes (Apache-2.0) | OTLP (gRPC/HTTP), SDKs |
| Security & Operations | Vault / HSM | Secrets & encryption | Vault (BSL), HSM (proprietary) | REST API, PKCS#11, KMIP |
| Security & Operations | Redis / Valkey | Caching & semantic keys | Redis (RSAL), Valkey (Apache-2.0) | RESP/TCP, TLS, client SDKs |

 With this foundation in place, the real question becomes: where can AI assistants deliver the most immediate value?

From Architecture to Impact: Who Benefits First


The goal of this post is to introduce our journey toward implementing a secure, enterprise-ready RAG system. It is a starting point: in the coming posts we will move from architecture to practice, showing how AI assistants can be trained, deployed, and applied to specific business domains, turning architectural vision into measurable operational impact.

Future posts will focus on building specialized agents for areas such as:

  • Procurement Assistant
    Helps teams draft, review, and manage purchase orders and supplier contracts. Can answer questions like “What are the terms of supplier X?” or “Show me all contracts expiring this quarter.”

  • Inventory & Supply Chain Assistant
    Provides quick insights on stock levels, reorder points, and supply chain risks. Can suggest replenishment actions or flag unusual consumption patterns.

  • Contract Compliance Assistant
    Monitors agreements and alerts users when obligations, deadlines, or renewal dates are approaching. Helps ensure compliance without manual tracking.

  • Operations Dashboard Assistant
    A conversational layer over KPIs (orders processed, delivery times, costs, SLAs). Lets managers ask, “What’s the backlog in order processing today?”

  • Customer Support Knowledge Assistant
    Provides employees with instant access to resolution steps for common customer or user issues, reducing response time and improving consistency.

  • Training & Onboarding Assistant
    Guides new employees through internal processes and documentation, answering “how-to” questions about operational workflows.

  • Financial Operations Assistant
    Supports teams by retrieving contract values, invoice statuses, or forecasting budget impacts from changes in orders or suppliers.