Friday, August 29, 2025

From LLMs to RAG, leveraging the best available tools tailored for isolated enterprise environments



We are kicking off this blog series with the ambition of designing the perfect on-prem AI architecture for businesses.

Enterprises everywhere face the same challenge: how to harness the power of LLMs while keeping sensitive business data fully under control, without compromising security.

Organizations are asking for solutions that combine powerful large language models with a controlled, trustworthy flow of high-quality data securely managed in a fully isolated environment to avoid any risk of leakage. At the same time, there is a growing preference for open-source technologies, trusted for their transparency, flexibility, and strong security track record.
It is crucial that this architecture integrates seamlessly with the company’s existing information systems, ensuring compatibility with current identity and authorization providers. Beyond the technical solution, clients also seek an extended model that covers the deployment and evolution of the AI system, integrated with their existing quality pipelines, along with the ability to debug and audit how context information is incorporated and used within the AI system.

Our proposal is an enterprise-ready AI architecture that starts small as a prototype focused on specific business processes but is designed to grow. Each component can be replaced or upgraded over time, ensuring long-term flexibility and performance improvements without vendor lock-in.

Enterprise RAG Architecture (On-Prem, Isolated)

The architecture is modular, open-source friendly, and fully isolated from external networks, and it can mature from a prototype into a production-grade system.

Together, the three layers below give enterprises a scalable, auditable, and secure RAG environment: capable of powering digital assistants, integrating with business systems, and evolving over time.


Layer 1: AI & Retrieval

  • LLM Serving (LLaMA, Mistral, etc. via vLLM/TGI/Ollama) → Natural language understanding & generation.

  • Retrieval Layer (LlamaIndex / LangChain) → Orchestrates RAG workflows.

  • Vector Database + Re-ranking (FAISS/Qdrant + BGE/ColBERT) → Semantic search with high accuracy.
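To make the retrieval step concrete, here is a self-contained sketch of semantic search with cosine similarity. The `embed` and `search` functions are toy stand-ins (a hashed bag-of-words instead of a real BGE embedding model, brute-force scoring instead of a FAISS/Qdrant index); the names, chunk texts, and dimensions are illustrative, not part of the stack above.

```python
import hashlib
import math
import re
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy stand-in for a real embedding model such as BGE:
    # a hashed bag-of-words vector, L2-normalized.
    vec = [0.0] * dim
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def search(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Brute-force cosine search: the job FAISS/Qdrant does at scale with
    # approximate indexes. A cross-encoder re-ranker (BGE/ColBERT) would
    # then re-score this shortlist for precision.
    q = embed(query)
    scored = sorted(
        chunks,
        key=lambda c: sum(a * b for a, b in zip(q, embed(c))),
        reverse=True,
    )
    return scored[:top_k]

chunks = [
    "Supplier contracts expire at the end of each quarter.",
    "The cafeteria menu changes every Monday.",
    "Purchase orders above 10k EUR require director approval.",
]
print(search("When do supplier contracts expire?", chunks)[0])
```

Swapping the toy `embed` for a real model and the list scan for a vector database changes nothing about the call shape, which is exactly the loose coupling the architecture relies on.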

Layer 2: Data & Storage

  • PostgreSQL → Metadata, context, audit logs.

  • MinIO (S3) → Raw documents, versions, derived chunks.

  • Ingestion/ETL Pipeline (Airflow/Prefect) → Parsing, chunking, embedding, indexing.
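The chunking step of the ingestion pipeline can be sketched in a few lines; in practice this would run as an Airflow DAG or Prefect flow task between parsing and embedding. The window size and overlap below are illustrative defaults, not recommendations.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Fixed-size character windows with overlap, so a sentence cut at one
    # chunk boundary still appears whole in the neighbouring chunk.
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

doc = " ".join(f"Sentence number {n}." for n in range(1, 60))
pieces = chunk_text(doc)
```

Each chunk would then be embedded and written to the vector index, with its offsets and source document recorded in PostgreSQL for auditability.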

Layer 3: Security & Operations

  • Auth & Access Control (Keycloak / SSO) → Role-based security.

  • Observability (Prometheus, Grafana, ELK) → Monitor performance & quality.

  • Secrets & Encryption (Vault/HSM) → Protect data & credentials.

  • Caching (Redis) → Faster responses, lower cost.
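As a minimal illustration of the caching layer, the sketch below derives a deterministic key for exact-match response caching; the `cache_key` helper and the key prefix are hypothetical, and a semantic cache would hash a query embedding instead so paraphrased questions can hit the same entry.

```python
import hashlib
import json

def cache_key(model: str, prompt: str, params: dict) -> str:
    # Deterministic key for exact-match response caching: the same
    # (model, normalized prompt, params) always maps to the same key,
    # which would index a Redis/Valkey entry holding the generated answer.
    payload = json.dumps(
        {"model": model, "prompt": prompt.strip().lower(), "params": params},
        sort_keys=True,
    )
    return "llm:" + hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("mistral-7b", "What is our PO approval limit?", {"temperature": 0})
k2 = cache_key("mistral-7b", "  what is our PO approval limit?", {"temperature": 0})
print(k1 == k2)  # normalization makes the two prompts share a key
```

Serving repeated questions from the cache skips a full LLM generation, which is where the "faster responses, lower cost" benefit comes from.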

 



Technology Overview and Interoperability


Why this stack “clicks”: shared standards (S3 API, SQL, OIDC/OAuth2, REST/gRPC, OpenTelemetry, Prometheus metrics), rich SDKs/connectors in LlamaIndex/LangChain, and loose coupling (object store as source of truth; vector DB as an index; Postgres for control/audit). This keeps every component replaceable without breaking the whole.
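That replaceability can be made explicit with a narrow interface between the orchestration layer and the retrieval backend. The sketch below is illustrative: `Retriever`, `KeywordRetriever`, and `answer` are assumed names, and the keyword backend is a stub standing in for FAISS or Qdrant.

```python
from typing import Protocol

class Retriever(Protocol):
    # Minimal retrieval contract: any backend (FAISS, Qdrant, or a stub)
    # is interchangeable as long as it honours this interface.
    def retrieve(self, query: str, top_k: int) -> list[str]: ...

class KeywordRetriever:
    # Trivial stand-in backend so the sketch runs without a vector DB.
    def __init__(self, docs: list[str]):
        self.docs = docs

    def retrieve(self, query: str, top_k: int) -> list[str]:
        q = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(q & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

def answer(question: str, retriever: Retriever) -> str:
    # The orchestration layer (LlamaIndex/LangChain in this stack) depends
    # only on the interface, never on a concrete vector database.
    context = "\n".join(retriever.retrieve(question, top_k=2))
    return f"Context:\n{context}\n\nQuestion: {question}"

docs = ["invoices are paid net 30", "the office closes at 6pm"]
prompt = answer("when are invoices paid", KeywordRetriever(docs))
```

Because `answer` sees only the `Retriever` contract, migrating from FAISS to Qdrant, or adding a re-ranking stage behind the same interface, leaves the rest of the system untouched.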

| Category | Component | Function | Open Source? | Interfaces & Integration |
| --- | --- | --- | --- | --- |
| AI & Retrieval | LLaMA / Mistral | LLMs for NLU/NLG | LLaMA (community license), Mistral (Apache-2.0) | Served via vLLM/TGI/Ollama, OpenAI-style HTTP APIs |
| AI & Retrieval | vLLM / TGI / Ollama | High-throughput model serving | Yes (Apache-2.0 / MIT) | REST, WebSocket, OpenAI-compatible APIs |
| AI & Retrieval | LlamaIndex / LangChain | RAG orchestration & pipelines | Yes (MIT) | Python/JS SDKs, connectors, REST |
| AI & Retrieval | FAISS / Qdrant | Vector search & retrieval | Yes (MIT / Apache-2.0) | C++/Python APIs, REST/gRPC |
| AI & Retrieval | Re-rankers (BGE / ColBERT) | Improves retrieval precision | Yes (Apache-2.0 / MIT) | Python models, REST wrappers |
| Data & Storage | PostgreSQL + JSONB | Metadata, context, audit logs | Yes (PostgreSQL license) | SQL, JDBC/ODBC, logical replication |
| Data & Storage | MinIO (S3) | Object storage for documents | Yes (AGPL-3.0) | S3 API (HTTP), SDKs |
| Data & Storage | Airflow / Prefect | ETL, ingestion, scheduling | Yes (Apache-2.0) | Python DAGs/flows, REST, CLI |
| Security & Operations | Keycloak | Auth, SSO, RBAC | Yes (Apache-2.0) | OIDC, OAuth2, SAML |
| Security & Operations | Prometheus + Grafana | Metrics & dashboards | Yes (Apache-2.0 / AGPL-3.0 core) | Prometheus scrape, Grafana UI/API |
| Security & Operations | ELK / OpenSearch | Logs & search | ELK (SSPL/Elastic), OpenSearch (Apache-2.0) | REST/JSON, Dashboards |
| Security & Operations | OpenTelemetry | Standard for traces/metrics/logs | Yes (Apache-2.0) | OTLP (gRPC/HTTP), SDKs |
| Security & Operations | Vault / HSM | Secrets & encryption | Vault (BSL), HSM (proprietary) | REST API, PKCS#11, KMIP |
| Security & Operations | Redis / Valkey | Caching & semantic keys | Redis (RSAL), Valkey (Apache-2.0) | RESP/TCP, TLS, client SDKs |

 With this foundation in place, the real question becomes: where can AI assistants deliver the most immediate value?

From Architecture to Impact: Who Benefits First


The goal of this post is to introduce our journey toward implementing a secure, enterprise-ready RAG system. It is a starting point: in the coming posts we will move from architecture to practice, showing how AI assistants can be trained, deployed, and applied to specific business domains, turning architectural vision into measurable operational impact.

Future posts will focus on building specialized agents for areas such as:

  • Procurement Assistant
    Helps teams draft, review, and manage purchase orders and supplier contracts. Can answer questions like “What are the terms of supplier X?” or “Show me all contracts expiring this quarter.”

  • Inventory & Supply Chain Assistant
    Provides quick insights on stock levels, reorder points, and supply chain risks. Can suggest replenishment actions or flag unusual consumption patterns.

  • Contract Compliance Assistant
    Monitors agreements and alerts users when obligations, deadlines, or renewal dates are approaching. Helps ensure compliance without manual tracking.

  • Operations Dashboard Assistant
    A conversational layer over KPIs (orders processed, delivery times, costs, SLAs). Lets managers ask, “What’s the backlog in order processing today?”

  • Customer Support Knowledge Assistant
    Provides employees with instant access to resolution steps for common customer or user issues, reducing response time and improving consistency.

  • Training & Onboarding Assistant
    Guides new employees through internal processes and documentation, answering “how-to” questions about operational workflows.

  • Financial Operations Assistant
    Supports teams by retrieving contract values, invoice statuses, or forecasting budget impacts from changes in orders or suppliers.