Monday, June 29, 2026

Using AI to Preserve Knowledge and Accelerate Maintenance of Enterprise Java Applications

Introduction

Enterprise Java applications often remain in production for decades, supporting critical business processes such as inventory management, order processing, logistics, finance, manufacturing and customer services. Although these systems continue to deliver considerable business value, organizations frequently encounter a common problem: the gradual loss of technical knowledge.

Developers move to other projects, documentation becomes outdated, and maintenance increasingly depends on a small number of specialists who understand the application's internal behavior. Eventually, the greatest risk is no longer the technology itself, but the disappearance of the knowledge required to maintain and evolve it.

Artificial Intelligence offers an opportunity to fundamentally change this situation.

The objective is not to replace the software engineer responsible for maintaining the application. Enterprise software maintenance still requires experienced professionals capable of understanding software architecture, making design decisions, validating business rules and ensuring software quality.

Instead, the goal is to give an experienced Java architect or software engineer an intelligent assistant that can learn the application in a fraction of the time a new maintainer would need, approaching the understanding held by its original development team. Once the initial learning phase is complete, the assistant supports the engineer continuously: it accelerates incident resolution, improves the quality and safety of software changes, generates technical documentation, identifies architectural dependencies, and helps explain the business processes implemented by the application.

Rather than replacing engineers, the proposed solution creates a permanent knowledge platform that captures years of technical expertise and transforms it into an intelligent maintenance assistant available throughout the application's entire lifecycle.

Proposed Solution

The proposed solution is based on an on-premises AI platform combining modern Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), semantic search, automated reverse engineering and continuous operational monitoring.

Rather than training a custom model specifically for the application, the platform continuously builds a structured knowledge repository by analysing every available source of information. The LLM consults this repository through a RAG architecture, allowing it to answer questions using accurate, up-to-date and verifiable project knowledge.

The implementation naturally evolves through four project phases:

Architecture Preparation
Knowledge Acquisition
RAG Construction
Automated Reverse Engineering

Once these phases have been completed, the platform becomes an AI-powered maintenance assistant that continuously evolves alongside the application.

Recommended Architecture

The proposed architecture consists of several loosely coupled but closely integrated layers.

Knowledge Sources

The platform continuously ingests information from multiple technical and functional sources:

Java source code
Relational Database metadata
Configuration files
User Interface descriptions
Functional documentation
Technical documentation
Source code repositories
Application logs
Monitoring platforms
Issue management systems
Architecture diagrams
Deployment pipelines

Ingestion Layer

The ingestion layer is responsible for collecting and normalising information.

Typical responsibilities include:

Repository connectors
Metadata extraction
Source code parsing
Document parsing
Content chunking
Metadata enrichment
Continuous synchronization

Knowledge Base

The processed information is transformed into semantic embeddings and stored inside the knowledge repository.

The knowledge base typically contains:

Vector Database
Semantic Index
Metadata Index
Knowledge Graph
Hybrid Search Index

This repository becomes the technical memory of the application.

AI Orchestration Layer

The orchestration layer represents the intelligence of the platform.

Instead of simply forwarding user questions to the LLM, this layer builds the complete execution context.

Its responsibilities include:

Query Understanding
Semantic Search
Context Assembly
Prompt Orchestration
Tool Calling
Conversation Memory
Context Compression
Security Policies
LLM Interaction
Response Generation
Source Citation

Prompt Orchestration is one of the most important components of the solution. It dynamically constructs the final prompt sent to the LLM by combining retrieved documents, source code, database metadata, operational information, previous conversation context and task-specific instructions.
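As a rough illustration of how such a prompt could be assembled, the following sketch combines task instructions, ranked snippets, conversation history and the user's question. The class `PromptOrchestrator`, the `Snippet` record and the character-based budget are assumptions made for this example, not part of any specific framework; a real implementation would count tokens rather than characters.

```java
import java.util.List;

/** Hypothetical sketch of prompt assembly; all names are illustrative. */
final class PromptOrchestrator {

    /** A retrieved piece of knowledge with its origin, so answers can cite it. */
    record Snippet(String source, String text) {}

    /**
     * Builds the final prompt. Snippets are added in ranking order until the
     * character budget is exhausted, then history and the question follow.
     */
    static String build(String instructions, List<Snippet> ranked,
                        List<String> history, String question, int maxContextChars) {
        StringBuilder sb = new StringBuilder();
        sb.append("INSTRUCTIONS:\n").append(instructions).append("\n\n");
        sb.append("CONTEXT:\n");
        int used = 0;
        int index = 1;
        for (Snippet s : ranked) {
            if (used + s.text().length() > maxContextChars) {
                break; // crude context budget (assumption: characters, not tokens)
            }
            sb.append('[').append(index++).append("] ")
              .append(s.source()).append('\n')
              .append(s.text()).append("\n\n");
            used += s.text().length();
        }
        if (!history.isEmpty()) {
            sb.append("CONVERSATION SO FAR:\n");
            for (String turn : history) {
                sb.append(turn).append('\n');
            }
            sb.append('\n');
        }
        sb.append("QUESTION:\n").append(question).append('\n');
        sb.append("Cite sources by their [n] marker.");
        return sb.toString();
    }

    public static void main(String[] args) {
        String prompt = build(
            "Answer only from the context below.",
            List.of(new Snippet("OrderService.java",
                                "void confirm(Order o) { stock.reserve(o); }")),
            List.of("user: Which module owns stock?"),
            "What happens when an order is confirmed?",
            2_000);
        System.out.println(prompt);
    }
}
```

Numbering each snippet with its source is one simple way to support the Source Citation responsibility listed above, since the model can refer back to `[1]`, `[2]` and so on.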

Applications

Once deployed, the platform provides several capabilities:

AI Technical Assistant
Incident Analyzer
Code Explorer
Dependency Explorer
Business Process Explorer
Documentation Generator
Change Impact Analyzer
Refactoring Assistant

Live Operational Connections

To keep the knowledge continuously updated, the platform should maintain live connections with operational systems such as:

Source Code Repository
Relational Database metadata
Logging Platform
Monitoring Platform
CI/CD Pipeline
Issue Tracking System
Documentation Repository

These connectors ensure that the assistant continuously learns from new deployments, incidents and software evolution.

Information Required by the AI Platform

The effectiveness of the assistant depends directly on the quality and completeness of the information supplied during the knowledge acquisition process.

Source Code

The complete Java application should be indexed, including:

Controllers
Services
Repositories
Entities
DTOs
Mappers
Validation rules
Batch jobs
Scheduled tasks
Security configuration
Integration services
Unit and integration tests

The objective is to understand not only individual classes but also the relationships between components.

Relational Database

The AI should analyse the complete logical database model:

Tables
Columns
Relationships
Primary Keys
Foreign Keys
Constraints
Views
Functions
Stored Procedures
Triggers
Indexes

This information allows the assistant to reconstruct the application's data model.

User Interface

Business knowledge is often better represented by the application's user interface than by the source code itself.

Useful information includes:

Screen captures
Navigation flows
Field descriptions
User manuals
Functional specifications

This enables the AI to relate technical implementation with actual business operations.

Documentation

Every available document should be incorporated:

Functional Specifications
Technical Specifications
Architecture Documents
Deployment Guides
API Documentation
Integration Specifications
Existing Diagrams

Operational Knowledge

Production experience represents one of the most valuable knowledge sources.

The platform should ingest:

Historical incidents
Root Cause Analysis reports
Previous fixes
Frequently executed SQL queries
Production logs
Stack traces
Monitoring alerts

Over time, operational knowledge becomes part of the assistant's expertise.

Phase 1: Architecture Preparation

The first phase focuses on designing the AI ecosystem.

Typical activities include:

Selecting the on-premises LLM
Selecting the Vector Database
Designing the RAG architecture
Designing the AI Orchestration Layer
Defining metadata models
Defining security policies
Identifying information sources
Designing update mechanisms
Defining governance policies

At this stage no application knowledge has yet been generated. The objective is to prepare the platform.

Phase 2: Knowledge Acquisition

Once the infrastructure is available, the platform begins collecting information from every available source.

Inputs include:

Source Code
Relational Database metadata
Documentation
User Interfaces
Configuration Files
Architecture Diagrams
Logs
Monitoring Information
Incident History

Each source is parsed, divided into semantic chunks, enriched with metadata and stored inside the knowledge repository.
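The parse, chunk and enrich steps can be sketched as follows. This is a minimal example that treats blank-line-separated blocks as the "semantic" unit and packs them into chunks up to a size limit. The `Chunker` class, the metadata keys (`source`, `type`, `chunkIndex`) and the character limit are illustrative assumptions; a production pipeline would more likely split Java code along class and method boundaries using a parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical chunker: packs blank-line-separated blocks into chunks. */
final class Chunker {

    /** A chunk of text plus the metadata stored alongside its embedding. */
    record Chunk(String text, Map<String, String> metadata) {}

    static List<Chunk> chunk(String sourcePath, String sourceType,
                             String content, int maxChars) {
        List<Chunk> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Split on blank lines; \R matches any line break sequence.
        for (String block : content.split("\\R\\s*\\R")) {
            String trimmed = block.strip();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Flush the current chunk if adding this block would exceed the limit.
            if (current.length() > 0
                    && current.length() + trimmed.length() + 2 > maxChars) {
                chunks.add(toChunk(sourcePath, sourceType, chunks.size(), current.toString()));
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append("\n\n");
            }
            // Note: a single block longer than maxChars becomes one oversized chunk.
            current.append(trimmed);
        }
        if (current.length() > 0) {
            chunks.add(toChunk(sourcePath, sourceType, chunks.size(), current.toString()));
        }
        return chunks;
    }

    private static Chunk toChunk(String path, String type, int index, String text) {
        return new Chunk(text, Map.of(
            "source", path,
            "type", type,
            "chunkIndex", String.valueOf(index)));
    }

    public static void main(String[] args) {
        String doc = "Orders are validated first.\n\nStock is then reserved.\n\nFinally, the shipment is created.";
        for (Chunk c : chunk("docs/order-flow.txt", "documentation", doc, 60)) {
            System.out.println(c.metadata() + " -> " + c.text().replace("\n", " "));
        }
    }
}
```

Keeping the source path and chunk index in the metadata is what later allows an answer to point back to the exact file and fragment it came from.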

This phase creates the knowledge foundation upon which the entire system will operate.

Phase 3: RAG Construction

After acquiring the available knowledge, the Retrieval-Augmented Generation platform is built.

Activities include:

Embedding generation
Vector indexing
Metadata indexing
Knowledge graph construction
Hybrid search configuration
Semantic retrieval optimisation
Prompt template design
Prompt orchestration workflows
Tool integration
Context ranking
Response validation

The resulting RAG platform allows the AI to retrieve accurate and relevant technical information before generating any response.

Unlike answers from a standalone LLM, which draw only on its training data, every answer is grounded in the organization's own technical knowledge.
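To make the retrieval step more concrete, here is a small in-memory sketch of hybrid search: each document receives a weighted blend of cosine similarity between embeddings and a naive keyword-overlap score. The `HybridRetriever` class, the `alpha` weight and the keyword scoring are assumptions chosen for illustration. A real deployment would delegate both parts to the vector database and the search index, and embeddings would come from an embedding model rather than hand-written arrays.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory hybrid retriever (vector + keyword). */
final class HybridRetriever {

    record Doc(String id, double[] embedding, String text) {}
    record Scored(String id, double score) {}

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Fraction of query terms that occur in the text (case-insensitive). */
    static double keywordScore(String query, String text) {
        String[] terms = query.toLowerCase().split("\\s+");
        String lower = text.toLowerCase();
        int hits = 0;
        for (String term : terms) {
            if (!term.isEmpty() && lower.contains(term)) {
                hits++;
            }
        }
        return (double) hits / terms.length;
    }

    /** Ranks documents by alpha * vector score + (1 - alpha) * keyword score. */
    static List<Scored> search(List<Doc> docs, double[] queryEmbedding,
                               String query, double alpha, int k) {
        return docs.stream()
            .map(d -> new Scored(d.id(),
                alpha * cosine(queryEmbedding, d.embedding())
                    + (1 - alpha) * keywordScore(query, d.text())))
            .sorted(Comparator.comparingDouble(Scored::score).reversed())
            .limit(k)
            .toList();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("StockService", new double[] {0.9, 0.1}, "reserves stock for confirmed orders"),
            new Doc("InvoiceJob", new double[] {0.1, 0.9}, "prints invoices every night"));
        System.out.println(search(docs, new double[] {1.0, 0.0}, "stock reservation", 0.7, 1));
    }
}
```

Blending the two scores helps with queries that contain exact identifiers, such as class or table names, which pure embedding similarity can miss.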

Phase 4: Automated Reverse Engineering

Once sufficient knowledge has been collected, the AI begins reconstructing the application's architecture automatically.

This is where the platform starts generating new technical knowledge rather than simply indexing existing information.

Technical Models

The AI can automatically generate:

Application Architecture
Layer Dependencies
Component Relationships
Service Catalogue
API Catalogue
Package Dependencies
Deployment Architecture

Data Models

The assistant reconstructs:

Logical Data Models
Entity Relationships
Data Flows
Database Dependencies

Business Models

Business knowledge extracted from code and documentation includes:

Business Entities
Business Rules
Validation Rules
Decision Logic
Domain Concepts

Workflow Reconstruction

The AI can automatically identify workflows such as:

Order Creation
Order Validation
Inventory Allocation
Stock Reservation
Inventory Updates
Shipment
Order Cancellation
Inventory Adjustments

Dependency Analysis

The assistant identifies:

Cross-module dependencies
Component interactions
Service dependencies
Data dependencies
External integrations

Change Impact Analysis

The platform can estimate:

Components affected by a modification
Potential regressions
Downstream impacts
Risk areas
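One way to estimate the components affected by a modification is to walk the dependency graph in reverse. The sketch below assumes the reverse engineering phase has already produced a map from each component to the components it depends on; the `ImpactAnalyzer` class and the component names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical impact analysis over a component dependency graph. */
final class ImpactAnalyzer {

    /**
     * @param dependsOn component -> components it directly depends on
     * @param changed   the component being modified
     * @return every component that directly or transitively depends on it
     */
    static Set<String> impactedBy(Map<String, Set<String>> dependsOn, String changed) {
        // Invert the edges: dependency -> components that use it.
        Map<String, Set<String>> dependents = new HashMap<>();
        dependsOn.forEach((from, targets) -> {
            for (String to : targets) {
                dependents.computeIfAbsent(to, key -> new TreeSet<>()).add(from);
            }
        });

        // Breadth-first traversal; the visited set also guards against cycles.
        Set<String> impacted = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(changed));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String dependent : dependents.getOrDefault(current, Set.of())) {
                if (!dependent.equals(changed) && impacted.add(dependent)) {
                    queue.add(dependent);
                }
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dependsOn = Map.of(
            "OrderController", Set.of("OrderService"),
            "OrderService", Set.of("StockRepository"),
            "ReportJob", Set.of("StockRepository"));
        System.out.println(impactedBy(dependsOn, "StockRepository"));
    }
}
```

The result is only as good as the extracted graph: dependencies created through reflection, configuration or shared database tables need to be captured as edges as well, otherwise the estimate will be too optimistic.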

Automatic Documentation

The reverse engineering process continuously generates documentation such as:

Architecture Documentation
Technical Documentation
API Documentation
Data Model Documentation
Business Process Documentation
Sequence Diagrams
Component Diagrams
Workflow Diagrams
Dependency Diagrams

At this point, the organization has effectively rebuilt the technical knowledge of the application, even if much of the original documentation has been lost.

Operational AI-Assisted Maintenance

Once the platform has completed the reverse engineering process, it becomes an intelligent assistant for day-to-day software maintenance.

Incident Analysis

The assistant can:

Analyse production incidents
Explain stack traces
Suggest root causes
Recommend diagnostic SQL queries
Identify affected components
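Before a stack trace is handed to the LLM, a deterministic pre-processing step can already identify which application classes appear in it, so that the matching source code can be retrieved as context. The following sketch assumes standard JVM frame lines of the form `at package.Class.method(File.java:line)`; the `StackTraceAnalyzer` class and the `com.acme` base package are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts application classes from a stack trace. */
final class StackTraceAnalyzer {

    // Captures the fully qualified class name and the method name of a frame,
    // e.g. "at com.acme.orders.OrderService.confirm(OrderService.java:42)".
    private static final Pattern FRAME =
        Pattern.compile("^\\s*at\\s+([\\w.$]+)\\.([\\w$<>]+)\\(", Pattern.MULTILINE);

    /** Returns the distinct classes under basePackage, in order of appearance. */
    static Set<String> applicationClasses(String stackTrace, String basePackage) {
        Set<String> classes = new LinkedHashSet<>();
        Matcher matcher = FRAME.matcher(stackTrace);
        while (matcher.find()) {
            String className = matcher.group(1);
            if (className.startsWith(basePackage + ".")) {
                classes.add(className);
            }
        }
        return classes;
    }

    public static void main(String[] args) {
        String trace = String.join("\n",
            "java.lang.IllegalStateException: negative stock",
            "\tat com.acme.stock.StockService.reserve(StockService.java:88)",
            "\tat com.acme.orders.OrderService.confirm(OrderService.java:42)",
            "\tat org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:897)");
        System.out.println(applicationClasses(trace, "com.acme"));
    }
}
```

The extracted class names can then serve as retrieval keys, which keeps framework frames out of the prompt and leaves more of the context budget for application code.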

Business Understanding

Engineers can ask questions such as:

How is inventory updated?
Which validations occur before an order is confirmed?
What happens when an order is cancelled?
Which services update stock levels?

Code Understanding

The assistant explains:

Business logic
Algorithms
Class responsibilities
Method interactions
Design patterns
Technical decisions

Change Impact Analysis

Before modifying the application, the AI can identify:

Impacted services
Impacted database objects
Affected APIs
Dependencies
Potential regressions

Safe Change Assistance

Rather than changing the software automatically, the assistant proposes improvements for human review.

Typical outputs include:

Implementation suggestions
Refactoring opportunities
SQL improvements
Performance recommendations
Security improvements
Regression test suggestions

Human validation remains mandatory before any deployment.

Documentation Assistance

The platform continuously generates and updates:

Technical documentation
Business documentation
Architecture diagrams
API descriptions
Operational guides

Continuous Knowledge Evolution

A key strength of the platform is that it never stops learning.

Every software release enriches the knowledge repository through:

New source code
Database schema evolution
New documentation
Production incidents
Monitoring information
User feedback
Deployment history

The RAG repository continuously evolves, making the assistant increasingly accurate and valuable over time.
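Continuous evolution does not have to mean re-indexing everything after each release. A common technique is to keep a content hash per source and re-embed only what has changed. The sketch below is a minimal in-memory version of that idea; the `ChangeDetector` class is hypothetical, and a real platform would persist the hashes next to the indexed chunks.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

/** Hypothetical change detector that decides which sources need re-indexing. */
final class ChangeDetector {

    // sourceId -> SHA-256 of the content that was last indexed (in memory only).
    private final Map<String, String> indexedHashes = new HashMap<>();

    static String sha256(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true when the source is new or changed, and records its hash. */
    boolean needsReindex(String sourceId, String content) {
        String hash = sha256(content);
        String previous = indexedHashes.put(sourceId, hash);
        return !hash.equals(previous);
    }

    public static void main(String[] args) {
        ChangeDetector detector = new ChangeDetector();
        // First sighting of the file: prints true.
        System.out.println(detector.needsReindex("OrderService.java", "class OrderService {}"));
        // Same content again: prints false.
        System.out.println(detector.needsReindex("OrderService.java", "class OrderService {}"));
    }
}
```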

Instead of becoming obsolete, the knowledge platform grows alongside the application.

Conclusion

The proposed solution should not be viewed as an attempt to automate software maintenance or replace experienced software engineers.

Its real purpose is to preserve the technical knowledge accumulated over many years and provide software architects and maintenance engineers with an intelligent assistant capable of understanding both the software architecture and the underlying business processes in a fraction of the time traditionally required.

By combining modern Large Language Models, Retrieval-Augmented Generation, automated reverse engineering, semantic search and continuously evolving operational knowledge, organizations can transform legacy enterprise applications into self-documented systems supported by AI.

The result is not autonomous maintenance, but AI-Augmented Software Engineering: significantly faster onboarding of new maintainers, quicker incident resolution, safer software evolution, continuously updated documentation, deeper understanding of business processes, and ultimately, a substantial reduction in the long-term maintenance cost and risk of enterprise applications.

It is worth noting that several commercial solutions already pursue a similar vision of AI-assisted software engineering. Among them, Sourcegraph Cody is probably one of the closest, providing semantic code search, repository-wide understanding, Retrieval-Augmented Generation (RAG), and AI-assisted development over large codebases.

However, the approach proposed in this article aims to go beyond source code analysis. It envisions a unified software knowledge platform that combines source code, relational database metadata, user interface descriptions, technical and functional documentation, operational logs, monitoring data, incident history, and deployment information into a continuously evolving knowledge repository. The objective is not only to assist developers while writing code, but also to reconstruct the application's technical architecture, business processes, workflows, dependencies, and operational knowledge, creating a long-term AI companion for software maintenance, onboarding, architectural understanding, and safer system evolution.

Monday, June 1, 2026

Why the Future of Document Management Is Not Another ECM

For more than two decades, Enterprise Content Management (ECM) systems have been the foundation of corporate document management.

They have proven their value by providing secure storage, version control, workflows, permissions, audit trails, records management and compliance capabilities. Platforms such as SharePoint, Oracle WebCenter Content (WCC), OpenText, Alfresco and many others continue to manage millions of business-critical documents every day.

The problem is not that these systems have failed.

The problem is that the expectations of users have changed.

The New Challenge: Knowledge Discovery

Traditionally, document management was focused on storing and retrieving documents.

Users knew what they were looking for:

Find a document.
Locate the latest version.
Check who approved it.
Review its history.

Artificial Intelligence introduces a completely different expectation.

Users now want answers rather than documents.

They want to ask questions such as:

Which systems are impacted by this change?
What requirements are affected by this decision?
Which documents contain conflicting information?
What risks are associated with this component?
What knowledge already exists about this topic?

Answering these questions requires much more than document storage and keyword search.

It requires understanding relationships, context, dependencies and knowledge hidden inside thousands of documents.

This is something that traditional ECM platforms were never designed to do.

Why Simply Adding AI Is Not Enough

Many current initiatives attempt to connect existing ECM repositories to chatbots or generic AI assistants.

While this approach can produce impressive demonstrations, it often struggles in real-world environments.

The reason is simple.

Documents are distributed across multiple repositories, each with its own structure, permissions, metadata models and lifecycle rules.

A typical organization may store information in:

SharePoint
Oracle WebCenter Content
OpenText
Alfresco
File shares
PLM systems
Email archives

Each repository contains valuable knowledge, but none of them provides a complete picture.

Adding an AI assistant on top of a single repository does not solve the fragmentation problem.

The Reality: Nobody Wants to Replace Their ECM

This is where many proposed solutions become unrealistic.

Large organizations have invested years, sometimes decades, building their document management ecosystems.

They have:

Millions of documents.
Complex workflows.
Regulatory requirements.
Existing integrations.
Thousands of users.

No organization wants to hear:

"Replace your entire ECM infrastructure."

The cost, risk and disruption would be enormous.

In practice, most organizations will continue using their existing ECM platforms for many years.

And that is perfectly reasonable.

A Different Approach

Instead of replacing existing ECM systems, a more realistic strategy is to preserve them as the official systems of record.

Their role remains essential:

Document storage.
Version control.
Security.
Compliance.
Auditability.
Records management.

What changes is what sits above them.

Rather than building yet another ECM, organizations should introduce a new layer capable of connecting all existing repositories and transforming the information they contain into usable knowledge.

This new layer becomes the bridge between traditional document management and modern AI capabilities.

Preserving Existing ECM Investments

One of the key advantages of this approach is that it does not require organizations to replace their existing ECM platforms.

Systems such as SharePoint, Oracle WebCenter Content, OpenText or other repositories can continue operating exactly as they do today. Documents remain in their current locations, managed by the same permissions, workflows and governance processes already in place.

The new platform simply connects to these repositories through their existing APIs and content services.

Rather than moving documents, the platform discovers and indexes knowledge from multiple sources while leaving the original content untouched.

This significantly reduces risk, cost and implementation effort.

Starting Small

Perhaps the most important aspect of this architecture is that it does not require a massive transformation project from day one.

A first version could start with a very simple user interface:

A unified search screen.
An AI-powered chat interface.
Basic document discovery across repositories.
Simple knowledge exploration capabilities.

Behind the scenes, the platform would connect to existing ECMs and gradually build a unified knowledge layer.

Additional capabilities such as impact analysis, knowledge graphs, intelligent agents and advanced workflows could then be introduced incrementally.

This allows organizations to begin generating value immediately while evolving towards a much more powerful Document Intelligence Platform over time.

Instead of replacing existing systems, the new platform enhances them, bringing AI-powered knowledge discovery to repositories that organizations already trust and depend on.

From Multiple Repositories to a Unified Knowledge Layer

The objective is not to migrate documents.

The objective is to unify access to knowledge.

In this model, existing repositories remain untouched:

SharePoint continues managing SharePoint documents.
Oracle WCC continues managing WCC content.
Other repositories continue performing their current role.

Above them, a new intelligence layer is introduced.

This layer is responsible for:

Discovering information across repositories.
Extracting knowledge from documents.
Understanding relationships between information.
Building semantic indexes.
Applying security and permission rules.
Providing AI-powered search and analysis capabilities.

Users no longer need to know where information is stored.

They interact with a unified knowledge platform capable of accessing multiple repositories behind the scenes.
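Applying security and permission rules is the responsibility that is easiest to get wrong in such a layer, because search hits from several repositories are merged before they reach the user or the LLM. The following sketch shows the basic idea of security trimming with a deny-by-default rule. The `SecurityTrimmer` class, the `Hit` record and the group names are assumptions for illustration; in practice the allowed groups would be synchronized from each repository's own access control lists.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical security trimming of federated search results. */
final class SecurityTrimmer {

    /** A search hit with the groups allowed to read the underlying document. */
    record Hit(String repository, String documentId, Set<String> allowedGroups) {}

    /**
     * Keeps only hits the user may see. A hit with no allowed groups is
     * hidden, so missing permission data never exposes a document.
     */
    static List<Hit> visibleTo(Set<String> userGroups, List<Hit> hits) {
        return hits.stream()
            .filter(hit -> hit.allowedGroups().stream().anyMatch(userGroups::contains))
            .toList();
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("sharepoint", "doc-1", Set.of("engineering")),
            new Hit("wcc", "doc-2", Set.of("finance")),
            new Hit("fileshare", "doc-3", Set.of()));
        System.out.println(visibleTo(Set.of("engineering"), hits));
    }
}
```

Trimming has to happen before context is assembled for the LLM, not only before results are displayed; otherwise a generated answer could summarize content the user is not allowed to open.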

The Next Generation of Document Management

The future is unlikely to be another standalone ECM platform.

Instead, it is likely to be a new architecture where existing ECMs continue acting as trusted repositories while a new intelligence layer provides AI-driven discovery, analysis and knowledge management capabilities.

This approach protects previous investments, reduces migration risks and enables organizations to benefit from Artificial Intelligence without disrupting their existing document management landscape.

The challenge is no longer managing documents.

The challenge is understanding and exploiting the knowledge contained within them.

And that requires a new architectural foundation.

Proposed Technology Stack

At this stage, the objective is not to build every component from scratch. Instead, the platform should leverage proven technologies for each layer while focusing development efforts on the areas that create real business value.

For the user interface and application layer, a rapid development platform such as Jmix provides an excellent starting point. It enables the fast creation of enterprise-grade user interfaces, administration screens, workflows, dashboards and security models, allowing the project to deliver working functionality in a relatively short timeframe.

For the intelligence layer, modern open-source AI technologies provide the foundation for knowledge discovery and semantic search. Large Language Models (LLMs) can be deployed locally to ensure full control over data, while vector databases can be used to support semantic retrieval and knowledge exploration.

The platform can therefore be structured around several complementary layers:

Application Layer: User interfaces, administration, workflows and dashboards.
Knowledge Layer: Unified access to information coming from multiple repositories.
AI Layer: Semantic search, document understanding, summarization and knowledge discovery.
Security and Governance Layer: Permissions, auditability, classification and compliance.
Repository Layer: Existing ECMs and other systems of record.

A possible implementation could combine:

Jmix for rapid enterprise application development.
Spring Boot for backend services and integration.
Local LLMs for secure AI processing.
Vector databases for semantic search and retrieval.
Knowledge graph technologies for relationship discovery and impact analysis.
Existing ECM platforms as trusted systems of record.

The key point is that none of these technologies replace the current repositories. Instead, they work together to create a new layer of intelligence capable of discovering, connecting and exploiting the knowledge already stored across the organization.

This approach allows the project to start with a simple and practical first release while providing a clear path towards a much more advanced Document Intelligence Platform in future iterations.

Recommended Technologies

AI Layer

The AI Layer should be built using proven open-source technologies rather than developed from scratch. The objective is to leverage mature components and focus development efforts on the capabilities that create real business value.

Capability	Recommended Technologies	Purpose
Large Language Models (LLMs)	Llama, Mistral, Mixtral, Qwen	Document understanding, summarization, question answering and reasoning
LLM Runtime / Inference Engine	vLLM, Ollama, Text Generation Inference (TGI)	Efficient execution of AI models, either for production or development environments
Vector Database	Qdrant, pgvector, Milvus	Semantic search, embeddings storage and Retrieval-Augmented Generation (RAG)
Knowledge Graph	Neo4j, ArangoDB	Relationship discovery, dependency mapping and impact analysis
Agent Framework	Dify, LangGraph	AI workflows, intelligent assistants and agent orchestration

For an initial release, a combination of Jmix, Spring Boot, Dify, vLLM and Qdrant could provide a fast path towards a working platform with AI-powered search, document chat and semantic retrieval capabilities.

As the platform evolves, more advanced technologies such as LangGraph and Neo4j can be introduced to support sophisticated agent workflows, relationship analysis and knowledge discovery scenarios.

The key point is that these technologies are not the product itself. They are building blocks. The real value lies in the intelligence layer built on top of them, including knowledge federation, document classification, metadata extraction, relationship discovery, impact analysis and security-aware access to information across multiple repositories.

Knowledge Layer

The Knowledge Layer is the core of the platform. Its role is to connect existing repositories, normalize their information, apply security rules and transform distributed documents into usable knowledge.

Capability	Recommended Technologies	Purpose
Repository Connectors	REST APIs, CMIS, Microsoft Graph API, Oracle WCC APIs	Connect to SharePoint, WCC, ECMs and other repositories without replacing them
Integration and Synchronization	Spring Boot, Apache Camel, Kafka, RabbitMQ	Move metadata, events and document updates between repositories and the intelligence platform
Metadata Federation	PostgreSQL, Oracle, Elasticsearch / OpenSearch	Normalize metadata from different systems into a common searchable model
Security Federation	LDAP, Active Directory, Keycloak, OAuth2 / OpenID Connect	Preserve permissions, roles and identity rules across repositories
Knowledge Processing	Apache Tika, OCR engines, custom extraction services	Extract text, structure and relevant information from documents
Search Indexing	OpenSearch, Elasticsearch	Support fast keyword search, filtering and faceted navigation

This layer is especially important because it prevents the platform from becoming just another isolated repository. Instead, it acts as a bridge between existing ECMs and the new AI capabilities.

The key idea is that documents can remain in their current systems of record, while the Knowledge Layer creates a unified view of their metadata, content, permissions and relationships.

In other words, the Knowledge Layer is what allows the platform to connect SharePoint, WCC, other ECMs, PLM systems, databases and file shares under a common intelligence model.
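Metadata federation can be pictured as a mapping from each repository's field names to a small common model. The sketch below is deliberately simplified: the `MetadataNormalizer` class, the common field names (`title`, `owner`) and the per-repository mappings are assumptions for illustration, and the source field names shown should be verified against each repository's actual schema before use.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical normalizer that maps repository metadata to a common model. */
final class MetadataNormalizer {

    // repository -> (source field name -> common field name).
    // These mappings are illustrative assumptions, not a verified schema.
    private static final Map<String, Map<String, String>> FIELD_MAPPINGS = Map.of(
        "sharepoint", Map.of("Title", "title", "Author", "owner"),
        "wcc", Map.of("dDocTitle", "title", "dDocAuthor", "owner"));

    /** Returns the common-model view of one document's raw metadata. */
    static Map<String, String> normalize(String repository, Map<String, String> raw) {
        Map<String, String> mapping = FIELD_MAPPINGS.getOrDefault(repository, Map.of());
        Map<String, String> common = new HashMap<>();
        common.put("repository", repository); // always keep the system of record
        raw.forEach((field, value) -> {
            String target = mapping.get(field);
            if (target != null) {
                common.put(target, value); // unmapped fields are ignored here
            }
        });
        return common;
    }

    public static void main(String[] args) {
        System.out.println(normalize("sharepoint",
            Map.of("Title", "Landing gear test report", "Author", "j.smith")));
        System.out.println(normalize("wcc",
            Map.of("dDocTitle", "Supplier contract", "dDocAuthor", "a.lopez")));
    }
}
```

Because the original documents stay in their repositories, the common model only needs the fields required for search, filtering and linking back to the source.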

Presentation Layer

The Presentation Layer is responsible for providing simple, powerful and user-friendly access to the platform. It should allow users to search, explore, analyse and interact with knowledge without needing to know where documents are physically stored.

Capability	Recommended Technologies	Purpose
Enterprise UI Development	Jmix, Vaadin	Rapid development of enterprise screens, administration panels, dashboards and workflows
Advanced Web Interfaces	React, Angular, Vue	Build richer user experiences such as AI search, document exploration and visual analysis
AI Chat Interface	Dify UI, custom React UI, Vaadin components	Provide conversational access to documents and knowledge
Dashboards and Analytics	Jmix dashboards, Apache Superset, Grafana	Display document metrics, usage, quality indicators and knowledge insights
Graph Visualization	Neo4j Bloom, Cytoscape.js, React Flow, D3.js	Visualize relationships between documents, systems, requirements and risks
Document Viewer	PDF.js, OnlyOffice, Collabora	Preview documents, compare versions and display extracted knowledge next to the original content

For the first version, Jmix remains a very appropriate option because it allows the team to build useful enterprise interfaces quickly, including search screens, metadata views, administration panels and basic workflows.

Later, more advanced interfaces can be introduced using React or specialized visualization libraries for AI chat, relationship graphs, impact analysis and document intelligence dashboards.

The main goal of this layer is to make the platform feel simple for the user, even if the underlying architecture connects many repositories, AI services and knowledge sources behind the scenes.

Example of a Unified Search and Knowledge Discovery Experience

The Unified Search and Knowledge Discovery screen is the primary entry point to the platform. It combines traditional document search with AI-powered knowledge discovery, allowing users to search across multiple repositories through a single interface.

Users can perform natural language queries, apply advanced filters and explore documents stored in different systems such as SharePoint, WCC, engineering repositories and other ECM platforms.

Search results are enriched with AI-generated summaries, metadata, relationships and impact information, helping users understand not only which documents exist, but also how they are connected to other systems, requirements, reports and business processes.

The interface is organized into three main areas:

Advanced Filters Panel: Allows users to refine searches by repository, document type, classification, status, program, owner, date range and tags.
Results Panel: Displays matching documents together with summaries, metadata and related knowledge.
Knowledge Panel: Provides additional context, including relationships, dependencies, impact analysis and AI-generated insights.

This approach transforms document retrieval into knowledge discovery, enabling users to find information faster, understand its context and assess its potential impact across the organization.