Monday, June 29, 2026

 

Using AI to Preserve Knowledge and Accelerate Maintenance of Enterprise Java Applications

Introduction

Enterprise Java applications often remain in production for decades, supporting critical business processes such as inventory management, order processing, logistics, finance, manufacturing and customer services. Although these systems continue to deliver considerable business value, organizations frequently encounter a common problem: the gradual loss of technical knowledge.

Developers move to other projects, documentation becomes outdated, and maintenance increasingly depends on a small number of specialists who understand the application's internal behavior. Eventually, the greatest risk is no longer the technology itself, but the disappearance of the knowledge required to maintain and evolve it.

Artificial Intelligence offers an opportunity to fundamentally change this situation.

The objective is not to replace the software engineer responsible for maintaining the application. Enterprise software maintenance still requires experienced professionals capable of understanding software architecture, making design decisions, validating business rules and ensuring software quality.

Instead, the goal is to provide an experienced Java architect or software engineer with an intelligent assistant capable of understanding the application almost as quickly as its original development team. Once the initial learning phase is complete, the AI continuously assists the engineer by accelerating incident resolution, improving the quality and safety of software changes, generating technical documentation, identifying architectural dependencies, and even helping understand the business processes implemented by the application.

Rather than replacing engineers, the proposed solution creates a permanent knowledge platform that captures years of technical expertise and transforms it into an intelligent maintenance assistant available throughout the application's entire lifecycle.

 

Proposed Solution

The proposed solution is based on an on-premise AI platform combining modern Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), semantic search, automated reverse engineering and continuous operational monitoring.

Rather than training a custom model specifically for the application, the platform continuously builds a structured knowledge repository by analysing every available source of information. The LLM consults this repository through a RAG architecture, allowing it to answer questions using accurate, up-to-date and verifiable project knowledge.

The implementation naturally evolves through four project phases:

  1. Architecture Preparation
  2. Knowledge Acquisition
  3. RAG Construction
  4. Automated Reverse Engineering

Once these phases have been completed, the platform becomes an AI-powered maintenance assistant that continuously evolves alongside the application.

 

Recommended Architecture

The proposed architecture is composed of several independent but highly integrated layers.

Knowledge Sources

The platform continuously ingests information from multiple technical and functional sources:

  • Java source code
  • Relational Database metadata
  • Configuration files
  • User Interface descriptions
  • Functional documentation
  • Technical documentation
  • Source code repositories
  • Application logs
  • Monitoring platforms
  • Issue management systems
  • Architecture diagrams
  • Deployment pipelines

 

Ingestion Layer

The ingestion layer is responsible for collecting and normalising information.

Typical responsibilities include:

  • Repository connectors
  • Metadata extraction
  • Source code parsing
  • Document parsing
  • Content chunking
  • Metadata enrichment
  • Continuous synchronization

 

Knowledge Base

The processed information is transformed into semantic embeddings and stored inside the knowledge repository.

The knowledge base typically contains:

  • Vector Database
  • Semantic Index
  • Metadata Index
  • Knowledge Graph
  • Hybrid Search Index

This repository becomes the technical memory of the application.
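To make the role of the vector database concrete, the following minimal sketch ranks stored chunks by cosine similarity against a query embedding. It assumes Java 17 or later; the class name, the chunk identifiers and the three-dimensional vectors are purely illustrative, since real embeddings come from an embedding model and a real deployment would delegate this search to the vector database itself.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class VectorSearchSketch {

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // a zero vector has no direction, treat it as unrelated
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k stored chunks most similar to the query vector. */
    static List<String> topK(Map<String, double[]> index, double[] query, int k) {
        return index.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(e.getValue(), query)))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Hypothetical chunk ids and toy embeddings.
        Map<String, double[]> index = Map.of(
                "OrderService.confirm", new double[] {0.9, 0.1, 0.0},
                "StockRepository.reserve", new double[] {0.1, 0.9, 0.2},
                "orders table", new double[] {0.8, 0.3, 0.1});
        System.out.println(topK(index, new double[] {1.0, 0.2, 0.0}, 2));
    }
}
```

A linear scan like this is only practical for small collections; dedicated vector stores use approximate nearest-neighbour indexes for the same purpose.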

 

AI Orchestration Layer

The orchestration layer represents the intelligence of the platform.

Instead of simply forwarding user questions to the LLM, this layer builds the complete execution context.

Its responsibilities include:

  • Query Understanding
  • Semantic Search
  • Context Assembly
  • Prompt Orchestration
  • Tool Calling
  • Conversation Memory
  • Context Compression
  • Security Policies
  • LLM Interaction
  • Response Generation
  • Source Citation

Prompt Orchestration is one of the most important components of the solution. It dynamically constructs the final prompt sent to the LLM by combining retrieved documents, source code, database metadata, operational information, previous conversation context and task-specific instructions.
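As an illustration of the prompt construction described above, the sketch below assembles task instructions, retrieved snippets, conversation history and the user question into a single prompt. It assumes Java 17 or later; the section headings, the `Snippet` record and the numbering scheme used for citations are assumptions made for this example, not a prescribed format.

```java
import java.util.List;

final class PromptAssemblySketch {

    /** A retrieved piece of knowledge plus the source it came from. */
    record Snippet(String source, String text) {}

    /** Builds one prompt from instructions, retrieved context, history and the question. */
    static String buildPrompt(String instructions, List<Snippet> context,
                              List<String> history, String question) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("### Instructions\n").append(instructions).append("\n\n");
        prompt.append("### Context\n");
        for (int i = 0; i < context.size(); i++) {
            Snippet s = context.get(i);
            // Number each snippet so the answer can cite it as [1], [2], ...
            prompt.append('[').append(i + 1).append("] ")
                  .append(s.source()).append('\n')
                  .append(s.text()).append("\n\n");
        }
        if (!history.isEmpty()) {
            prompt.append("### Conversation so far\n");
            history.forEach(turn -> prompt.append(turn).append('\n'));
            prompt.append('\n');
        }
        prompt.append("### Question\n").append(question).append('\n');
        return prompt.toString();
    }

    public static void main(String[] args) {
        String prompt = buildPrompt(
                "Answer using only the context and cite sources by number.",
                List.of(new Snippet("OrderService.java", "void confirm(Order order) { ... }"),
                        new Snippet("ORDERS table", "STATUS VARCHAR2(20) NOT NULL")),
                List.of("User: Which statuses can an order have?"),
                "Which validations occur before an order is confirmed?");
        System.out.println(prompt);
    }
}
```

Keeping the source next to each snippet is what later allows the response to carry source citations.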

 

Applications

Once deployed, the platform provides several capabilities:

  • AI Technical Assistant
  • Incident Analyzer
  • Code Explorer
  • Dependency Explorer
  • Business Process Explorer
  • Documentation Generator
  • Change Impact Analyzer
  • Refactoring Assistant

 

Live Operational Connections

To keep the knowledge continuously updated, the platform should maintain live connections with operational systems such as:

  • Source Code Repository
  • Relational Database metadata
  • Logging Platform
  • Monitoring Platform
  • CI/CD Pipeline
  • Issue Tracking System
  • Documentation Repository

These connectors keep the knowledge repository, and therefore the assistant's answers, aligned with new deployments, incidents and software evolution.

 

Information Required by the AI Platform

The effectiveness of the assistant depends directly on the quality and completeness of the information supplied during the knowledge acquisition process.

Source Code

The complete Java application should be indexed, including:

  • Controllers
  • Services
  • Repositories
  • Entities
  • DTOs
  • Mappers
  • Validation rules
  • Batch jobs
  • Scheduled tasks
  • Security configuration
  • Integration services
  • Unit and integration tests

The objective is to understand not only individual classes but also the relationships between components.

 

Relational Database

The AI should analyse the complete logical database model:

  • Tables
  • Columns
  • Relationships
  • Primary Keys
  • Foreign Keys
  • Constraints
  • Views
  • Functions
  • Stored Procedures
  • Triggers
  • Indexes

This information allows the assistant to reconstruct the application's data model.
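Much of this metadata can be read through the standard JDBC `DatabaseMetaData` interface. The sketch below is one possible approach: `getColumns` and `getImportedKeys` are standard JDBC calls, while the class name, the `describe` text layout and the sample table are assumptions for illustration. Obtaining the `DatabaseMetaData` instance requires a live connection (`connection.getMetaData()`), which is deliberately left out here.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class SchemaExtractionSketch {

    /** Reads column names and types of one table through standard JDBC metadata. */
    static List<String> columnsOf(DatabaseMetaData meta, String schema, String table)
            throws SQLException {
        List<String> columns = new ArrayList<>();
        try (ResultSet rs = meta.getColumns(null, schema, table, "%")) {
            while (rs.next()) {
                columns.add(rs.getString("COLUMN_NAME") + " " + rs.getString("TYPE_NAME"));
            }
        }
        return columns;
    }

    /** Reads foreign keys as "column -> referencedTable.referencedColumn". */
    static List<String> foreignKeysOf(DatabaseMetaData meta, String schema, String table)
            throws SQLException {
        List<String> keys = new ArrayList<>();
        try (ResultSet rs = meta.getImportedKeys(null, schema, table)) {
            while (rs.next()) {
                keys.add(rs.getString("FKCOLUMN_NAME") + " -> "
                        + rs.getString("PKTABLE_NAME") + "." + rs.getString("PKCOLUMN_NAME"));
            }
        }
        return keys;
    }

    /** Renders extracted metadata as plain text, ready to be chunked and embedded. */
    static String describe(String table, List<String> columns, List<String> foreignKeys) {
        return "Table " + table + "\n"
                + "Columns: " + String.join(", ", columns) + "\n"
                + "Foreign keys: "
                + (foreignKeys.isEmpty() ? "none" : String.join(", ", foreignKeys));
    }

    public static void main(String[] args) {
        // Sample values standing in for what columnsOf / foreignKeysOf would return.
        System.out.println(describe("ORDERS",
                List.of("ID NUMBER", "CUSTOMER_ID NUMBER", "STATUS VARCHAR2"),
                List.of("CUSTOMER_ID -> CUSTOMERS.ID")));
    }
}
```

Whether schema and table names must be passed in upper or lower case depends on the database product, so a real extractor would need to account for that.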

 

User Interface

Business knowledge is often better represented by the application's user interface than by the source code itself.

Useful information includes:

  • Screen captures
  • Navigation flows
  • Field descriptions
  • User manuals
  • Functional specifications

This enables the AI to relate the technical implementation to actual business operations.

 

Documentation

Every available document should be incorporated:

  • Functional Specifications
  • Technical Specifications
  • Architecture Documents
  • Deployment Guides
  • API Documentation
  • Integration Specifications
  • Existing Diagrams

 

Operational Knowledge

Production experience represents one of the most valuable knowledge sources.

The platform should ingest:

  • Historical incidents
  • Root Cause Analysis reports
  • Previous fixes
  • Frequently executed SQL queries
  • Production logs
  • Stack traces
  • Monitoring alerts

Over time, operational knowledge becomes part of the assistant's expertise.


Phase 1: Architecture Preparation

The first phase focuses on designing the AI ecosystem.

Typical activities include:

  • Selecting the on-premise LLM
  • Selecting the Vector Database
  • Designing the RAG architecture
  • Designing the AI Orchestration Layer
  • Defining metadata models
  • Defining security policies
  • Identifying information sources
  • Designing update mechanisms
  • Defining governance policies

At this stage no application knowledge has yet been generated. The objective is to prepare the platform.

 

Phase 2: Knowledge Acquisition

Once the infrastructure is available, the platform begins collecting information from every available source.

Inputs include:

  • Source Code
  • Relational Database metadata
  • Documentation
  • User Interfaces
  • Configuration Files
  • Architecture Diagrams
  • Logs
  • Monitoring Information
  • Incident History

Each source is parsed, divided into semantic chunks, enriched with metadata and stored inside the knowledge repository.
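The chunking and metadata enrichment step can be sketched as follows. This is a deliberate simplification, assuming Java 17 or later: it splits a file into fixed-size windows of lines with a small overlap, whereas semantic chunking of Java code would more likely follow class and method boundaries. The class name, the metadata keys and the window sizes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ChunkingSketch {

    /** A chunk of text plus the metadata used later for filtering and citation. */
    record Chunk(String text, Map<String, String> metadata) {}

    /** Splits lines into windows of maxLines, repeating `overlap` lines between windows. */
    static List<Chunk> chunk(String path, String sourceType, List<String> lines,
                             int maxLines, int overlap) {
        if (maxLines <= 0 || overlap < 0 || overlap >= maxLines) {
            throw new IllegalArgumentException("need 0 <= overlap < maxLines");
        }
        List<Chunk> chunks = new ArrayList<>();
        int step = maxLines - overlap;
        for (int start = 0; start < lines.size(); start += step) {
            int end = Math.min(start + maxLines, lines.size());
            chunks.add(new Chunk(
                    String.join("\n", lines.subList(start, end)),
                    Map.of("path", path,
                           "sourceType", sourceType,
                           "startLine", String.valueOf(start + 1), // 1-based, inclusive
                           "endLine", String.valueOf(end))));
            if (end == lines.size()) {
                break; // the last window reached the end of the file
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("package com.acme;", "", "class OrderService {",
                "    void confirm() {}", "}");
        chunk("src/OrderService.java", "java", lines, 3, 1).forEach(System.out::println);
    }
}
```

Storing the path and line range with every chunk is what makes it possible to point an engineer back to the exact location an answer was drawn from.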

This phase creates the knowledge foundation upon which the entire system will operate.

 

Phase 3: RAG Construction

After acquiring the available knowledge, the Retrieval-Augmented Generation platform is built.

Activities include:

  • Embedding generation
  • Vector indexing
  • Metadata indexing
  • Knowledge graph construction
  • Hybrid search configuration
  • Semantic retrieval optimisation
  • Prompt template design
  • Prompt orchestration workflows
  • Tool integration
  • Context ranking
  • Response validation

The resulting RAG platform allows the AI to retrieve accurate and relevant technical information before generating any response.

Unlike direct use of a general-purpose LLM, answers are grounded in the organization's own technical knowledge and can cite the sources they were drawn from.
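One common way to combine keyword and vector results in a hybrid search is reciprocal rank fusion (RRF). The article does not prescribe a fusion method, so the sketch below is only one option, assuming Java 17 or later: each ranked list contributes 1 / (k + rank) to a document's score, with k = 60 being a frequently used constant. The class name and document ids are hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HybridRankingSketch {

    /** Fuses several ranked id lists; each list contributes 1 / (k + rank) per id. */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                // i is zero-based, so the rank of the first hit is i + 1.
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        List<String> keywordHits = List.of("OrderService", "OrderValidator", "ShipmentJob");
        List<String> vectorHits = List.of("OrderValidator", "StockService", "OrderService");
        System.out.println(fuse(List.of(keywordHits, vectorHits), 60));
    }
}
```

Because RRF uses only ranks, it avoids having to normalise keyword scores and similarity scores onto a common scale.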

 

Phase 4: Automated Reverse Engineering

Once sufficient knowledge has been collected, the AI begins reconstructing the application's architecture automatically.

This is where the platform starts generating new technical knowledge rather than simply indexing existing information.

Technical Models

The AI can automatically generate:

  • Application Architecture
  • Layer Dependencies
  • Component Relationships
  • Service Catalogue
  • API Catalogue
  • Package Dependencies
  • Deployment Architecture

 

Data Models

The assistant reconstructs:

  • Logical Data Models
  • Entity Relationships
  • Data Flows
  • Database Dependencies

 

Business Models

Business knowledge extracted from code and documentation includes:

  • Business Entities
  • Business Rules
  • Validation Rules
  • Decision Logic
  • Domain Concepts

 

Workflow Reconstruction

The AI can automatically identify workflows such as:

  • Order Creation
  • Order Validation
  • Inventory Allocation
  • Stock Reservation
  • Inventory Updates
  • Shipment
  • Order Cancellation
  • Inventory Adjustments

 

Dependency Analysis

The assistant identifies:

  • Cross-module dependencies
  • Component interactions
  • Service dependencies
  • Data dependencies
  • External integrations

 

Change Impact Analysis

The platform can estimate:

  • Components affected by a modification
  • Potential regressions
  • Downstream impacts
  • Risk areas
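Once dependencies have been extracted, the set of components affected by a modification can be estimated with a simple graph traversal. The sketch below is a minimal illustration: it assumes a precomputed map from each component to the components that use it directly, and the component names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ImpactAnalysisSketch {

    /**
     * dependents maps a component to the components that use it directly.
     * Returns every component that depends on `changed`, directly or transitively,
     * in breadth-first order.
     */
    static Set<String> impactedBy(String changed, Map<String, List<String>> dependents) {
        Set<String> impacted = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(changed));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String user : dependents.getOrDefault(current, List.of())) {
                // add() returns false for already visited nodes, which also stops cycles.
                if (!user.equals(changed) && impacted.add(user)) {
                    queue.add(user);
                }
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependents = Map.of(
                "StockRepository", List.of("InventoryService"),
                "InventoryService", List.of("OrderService", "ShipmentJob"),
                "OrderService", List.of("OrderController"));
        System.out.println(impactedBy("StockRepository", dependents));
    }
}
```

The result is an upper bound on what might be affected; deciding which of those components actually carry regression risk still requires review by an engineer.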

 

Automatic Documentation

The reverse engineering process continuously generates documentation such as:

  • Architecture Documentation
  • Technical Documentation
  • API Documentation
  • Data Model Documentation
  • Business Process Documentation
  • Sequence Diagrams
  • Component Diagrams
  • Workflow Diagrams
  • Dependency Diagrams

At this point, the organization has effectively rebuilt the technical knowledge of the application, even if much of the original documentation has been lost.

 

Operational AI-Assisted Maintenance

Once the platform has completed the reverse engineering process, it becomes an intelligent assistant for day-to-day software maintenance.

Incident Analysis

The assistant can:

  • Analyse production incidents
  • Explain stack traces
  • Suggest root causes
  • Recommend diagnostic SQL queries
  • Identify affected components
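Before a stack trace is handed to the LLM, it helps to isolate the frames that belong to the application so that the matching source code can be retrieved. The following sketch shows one way to do this, assuming Java 17 or later; the regular expression covers the common `at package.Class.method(File.java:line)` form only, and the package prefix `com.acme` is a hypothetical example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StackTraceSketch {

    /** One frame of a Java stack trace. */
    record Frame(String className, String method, String file, int line) {}

    // Matches lines such as "at com.acme.OrderService.confirm(OrderService.java:42)".
    private static final Pattern FRAME = Pattern.compile(
            "^\\s*at\\s+([\\w.$]+)\\.([\\w$<>]+)\\(([^:()]+):(\\d+)\\)\\s*$");

    /** Returns the frames whose class starts with the given application package prefix. */
    static List<Frame> applicationFrames(String stackTrace, String packagePrefix) {
        List<Frame> frames = new ArrayList<>();
        for (String line : stackTrace.split("\\R")) {
            Matcher m = FRAME.matcher(line);
            if (m.matches() && m.group(1).startsWith(packagePrefix)) {
                frames.add(new Frame(m.group(1), m.group(2), m.group(3),
                        Integer.parseInt(m.group(4))));
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        String trace = """
                java.lang.IllegalStateException: stock is negative
                \tat com.acme.stock.StockService.reserve(StockService.java:88)
                \tat com.acme.orders.OrderService.confirm(OrderService.java:42)
                \tat org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:897)
                """;
        applicationFrames(trace, "com.acme").forEach(System.out::println);
    }
}
```

The extracted class names and line numbers can then serve as retrieval keys into the indexed source code.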

 

Business Understanding

Engineers can ask questions such as:

  • How is inventory updated?
  • Which validations occur before an order is confirmed?
  • What happens when an order is cancelled?
  • Which services update stock levels?

 

Code Understanding

The assistant explains:

  • Business logic
  • Algorithms
  • Class responsibilities
  • Method interactions
  • Design patterns
  • Technical decisions

 

Change Impact Analysis

Before modifying the application, the AI can identify:

  • Impacted services
  • Impacted database objects
  • Affected APIs
  • Dependencies
  • Potential regressions

 

Safe Change Assistance

Rather than changing the software automatically, the assistant proposes improvements for human review.

Typical outputs include:

  • Implementation suggestions
  • Refactoring opportunities
  • SQL improvements
  • Performance recommendations
  • Security improvements
  • Regression test suggestions

Human validation remains mandatory before any deployment.

 

Documentation Assistance

The platform continuously generates and updates:

  • Technical documentation
  • Business documentation
  • Architecture diagrams
  • API descriptions
  • Operational guides

 

Continuous Knowledge Evolution

A key property of the platform is that it never stops learning.

Every software release enriches the knowledge repository through:

  • New source code
  • Database schema evolution
  • New documentation
  • Production incidents
  • Monitoring information
  • User feedback
  • Deployment history

The RAG repository continuously evolves, making the assistant increasingly accurate and valuable over time.
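Keeping the repository current does not require re-indexing everything on each release. One possible approach, sketched below under the assumption of Java 17 or later, is to store a content hash per source and re-embed only what has changed. The class name and file paths are hypothetical, the hashes are kept in memory for brevity, and deleted files are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;

final class IncrementalIndexSketch {

    // Last indexed content hash per source path; a real index would persist this.
    private final Map<String, String> indexedHashes = new HashMap<>();

    /** Hex-encoded SHA-256 of the given text. */
    static String sha256(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(
                    digest.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
    }

    /** Returns the paths that are new or changed since the last call and records them. */
    List<String> pathsToReindex(Map<String, String> currentContentByPath) {
        List<String> changed = new ArrayList<>();
        currentContentByPath.forEach((path, content) -> {
            String hash = sha256(content);
            // put() returns the previous hash, or null for a path never seen before.
            if (!hash.equals(indexedHashes.put(path, hash))) {
                changed.add(path);
            }
        });
        return changed;
    }

    public static void main(String[] args) {
        IncrementalIndexSketch index = new IncrementalIndexSketch();
        System.out.println(index.pathsToReindex(Map.of("OrderService.java", "class OrderService {}")));
        System.out.println(index.pathsToReindex(Map.of("OrderService.java", "class OrderService {}")));
    }
}
```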

Instead of becoming obsolete, the knowledge platform grows alongside the application.


Conclusion

The proposed solution should not be viewed as an attempt to automate software maintenance or replace experienced software engineers.

Its real purpose is to preserve the technical knowledge accumulated over many years and provide software architects and maintenance engineers with an intelligent assistant capable of understanding both the software architecture and the underlying business processes in a fraction of the time traditionally required.

By combining modern Large Language Models, Retrieval-Augmented Generation, automated reverse engineering, semantic search and continuously evolving operational knowledge, organizations can transform legacy enterprise applications into self-documented systems supported by AI.

The result is not autonomous maintenance, but AI-Augmented Software Engineering: significantly faster onboarding of new maintainers, quicker incident resolution, safer software evolution, continuously updated documentation, deeper understanding of business processes, and ultimately, a substantial reduction in the long-term maintenance cost and risk of enterprise applications.

It is worth noting that several commercial solutions already pursue a similar vision of AI-assisted software engineering. Among them, Sourcegraph Cody is probably one of the closest, providing semantic code search, repository-wide understanding, Retrieval-Augmented Generation (RAG), and AI-assisted development over large codebases. However, the approach proposed in this article aims to go beyond source code analysis. It envisions a unified software knowledge platform that combines source code, relational database metadata, user interface descriptions, technical and functional documentation, operational logs, monitoring data, incident history, and deployment information into a continuously evolving knowledge repository. The objective is not only to assist developers while writing code, but to reconstruct the application's technical architecture, business processes, workflows, dependencies, and operational knowledge, creating a long-term AI companion for software maintenance, onboarding, architectural understanding, and safer system evolution.

Monday, June 1, 2026

Why the Future of Document Management Is Not Another ECM

For more than two decades, Enterprise Content Management (ECM) systems have been the foundation of corporate document management.

They have proven their value by providing secure storage, version control, workflows, permissions, audit trails, records management and compliance capabilities. Platforms such as SharePoint, Oracle WebCenter Content (WCC), OpenText, Alfresco and many others continue to manage millions of business-critical documents every day.

The problem is not that these systems have failed.

The problem is that the expectations of users have changed.

The New Challenge: Knowledge Discovery

Traditionally, document management was focused on storing and retrieving documents.

Users knew what they were looking for:

  • Find a document.

  • Locate the latest version.

  • Check who approved it.

  • Review its history.

Artificial Intelligence introduces a completely different expectation.

Users now want answers rather than documents.

They want to ask questions such as:

  • Which systems are impacted by this change?

  • What requirements are affected by this decision?

  • Which documents contain conflicting information?

  • What risks are associated with this component?

  • What knowledge already exists about this topic?

Answering these questions requires much more than document storage and keyword search.

It requires understanding relationships, context, dependencies and knowledge hidden inside thousands of documents.

This is something that traditional ECM platforms were never designed to do.

Why Simply Adding AI Is Not Enough

Many current initiatives attempt to connect existing ECM repositories to chatbots or generic AI assistants.

While this approach can produce impressive demonstrations, it often struggles in real-world environments.

The reason is simple.

Documents are distributed across multiple repositories, each with its own structure, permissions, metadata models and lifecycle rules.

A typical organization may store information in:

  • SharePoint

  • Oracle WebCenter Content

  • OpenText

  • Alfresco

  • File shares

  • PLM systems

  • Email archives

Each repository contains valuable knowledge, but none of them provides a complete picture.

Adding an AI assistant on top of a single repository does not solve the fragmentation problem.

The Reality: Nobody Wants to Replace Their ECM

This is where many proposed solutions become unrealistic.

Large organizations have invested years, sometimes decades, building their document management ecosystems.

They have:

  • Millions of documents.

  • Complex workflows.

  • Regulatory requirements.

  • Existing integrations.

  • Thousands of users.

No organization wants to hear:

"Replace your entire ECM infrastructure."

The cost, risk and disruption would be enormous.

In practice, most organizations will continue using their existing ECM platforms for many years.

And that is perfectly reasonable.

A Different Approach

Instead of replacing existing ECM systems, a more realistic strategy is to preserve them as the official systems of record.

Their role remains essential:

  • Document storage.

  • Version control.

  • Security.

  • Compliance.

  • Auditability.

  • Records management.

What changes is what sits above them.

Rather than building yet another ECM, organizations should introduce a new layer capable of connecting all existing repositories and transforming the information they contain into usable knowledge.

This new layer becomes the bridge between traditional document management and modern AI capabilities.


Preserving Existing ECM Investments

One of the key advantages of this approach is that it does not require organizations to replace their existing ECM platforms.

Systems such as SharePoint, Oracle WebCenter Content, OpenText or other repositories can continue operating exactly as they do today. Documents remain in their current locations, managed by the same permissions, workflows and governance processes already in place.

The new platform simply connects to these repositories through their existing APIs and content services.

Rather than moving documents, the platform discovers and indexes knowledge from multiple sources while leaving the original content untouched.
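The connector idea can be sketched with a small common interface that every repository adapter implements. Everything in this example is an assumption made for illustration: the `RepositoryConnector` contract, the `DocumentRef` record and the in-memory stand-in, which replaces real calls to the repositories' APIs so that the sketch runs on its own (Java 17 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class FederatedSearchSketch {

    /** A pointer to a document that stays in its system of record. */
    record DocumentRef(String repository, String id, String title) {}

    /** Hypothetical connector contract; real ones would wrap each repository's API. */
    interface RepositoryConnector {
        String name();
        List<DocumentRef> search(String query);
    }

    /** In-memory stand-in used only to make the sketch runnable. */
    record InMemoryConnector(String name, List<DocumentRef> documents)
            implements RepositoryConnector {
        @Override
        public List<DocumentRef> search(String query) {
            String needle = query.toLowerCase(Locale.ROOT);
            return documents.stream()
                    .filter(d -> d.title().toLowerCase(Locale.ROOT).contains(needle))
                    .toList();
        }
    }

    /** Queries every connector and merges the references; content is never copied. */
    static List<DocumentRef> searchAll(List<RepositoryConnector> connectors, String query) {
        List<DocumentRef> results = new ArrayList<>();
        for (RepositoryConnector connector : connectors) {
            results.addAll(connector.search(query));
        }
        return results;
    }

    public static void main(String[] args) {
        List<RepositoryConnector> connectors = List.of(
                new InMemoryConnector("SharePoint", List.of(
                        new DocumentRef("SharePoint", "1", "Pump maintenance guide"))),
                new InMemoryConnector("WCC", List.of(
                        new DocumentRef("WCC", "7", "Pump risk assessment"))));
        searchAll(connectors, "pump").forEach(System.out::println);
    }
}
```

Returning references rather than content reflects the principle that documents remain in their original repositories.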

This significantly reduces risk, cost and implementation effort.

Starting Small

Perhaps the most important aspect of this architecture is that it does not require a massive transformation project from day one.

A first version could start with a very simple user interface:

  • A unified search screen.
  • An AI-powered chat interface.
  • Basic document discovery across repositories.
  • Simple knowledge exploration capabilities.

Behind the scenes, the platform would connect to existing ECMs and gradually build a unified knowledge layer.

Additional capabilities such as impact analysis, knowledge graphs, intelligent agents and advanced workflows could then be introduced incrementally.

This allows organizations to begin generating value immediately while evolving towards a much more powerful Document Intelligence Platform over time.

Instead of replacing existing systems, the new platform enhances them, bringing AI-powered knowledge discovery to repositories that organizations already trust and depend on.

From Multiple Repositories to a Unified Knowledge Layer

The objective is not to migrate documents.

The objective is to unify access to knowledge.

In this model, existing repositories remain untouched:

  • SharePoint continues managing SharePoint documents.

  • Oracle WCC continues managing WCC content.

  • Other repositories continue performing their current role.

Above them, a new intelligence layer is introduced.

This layer is responsible for:

  • Discovering information across repositories.

  • Extracting knowledge from documents.

  • Understanding relationships between information.

  • Building semantic indexes.

  • Applying security and permission rules.

  • Providing AI-powered search and analysis capabilities.

Users no longer need to know where information is stored.

They interact with a unified knowledge platform capable of accessing multiple repositories behind the scenes.
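Applying security and permission rules in the intelligence layer can be illustrated with a simple filter that removes search hits the current user is not allowed to read. This is a minimal sketch assuming Java 17 or later and a group-based permission model; the `Hit` record and the group names are hypothetical, and real repositories may expose richer access control lists.

```java
import java.util.List;
import java.util.Set;

final class PermissionFilterSketch {

    /** A search hit carrying the groups allowed to read it, as reported by its repository. */
    record Hit(String title, Set<String> allowedGroups) {}

    /** Keeps only the hits readable by at least one of the user's groups. */
    static List<Hit> visibleTo(Set<String> userGroups, List<Hit> hits) {
        return hits.stream()
                .filter(hit -> hit.allowedGroups().stream().anyMatch(userGroups::contains))
                .toList();
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("Design specification", Set.of("engineering")),
                new Hit("Salary review", Set.of("hr")));
        System.out.println(visibleTo(Set.of("engineering", "quality"), hits));
    }
}
```

Filtering before any content is passed to the LLM matters, because a generated answer could otherwise reveal information from documents the user cannot open.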

The Next Generation of Document Management

The future is unlikely to be another standalone ECM platform.

Instead, it is likely to be a new architecture where existing ECMs continue acting as trusted repositories while a new intelligence layer provides AI-driven discovery, analysis and knowledge management capabilities.

This approach protects previous investments, reduces migration risks and enables organizations to benefit from Artificial Intelligence without disrupting their existing document management landscape.

The challenge is no longer managing documents.

The challenge is understanding and exploiting the knowledge contained within them.

And that requires a new architectural foundation.

Proposed Technology Stack

At this stage, the objective is not to build every component from scratch. Instead, the platform should leverage proven technologies for each layer while focusing development efforts on the areas that create real business value.

For the user interface and application layer, a rapid development platform such as Jmix provides an excellent starting point. It enables the fast creation of enterprise-grade user interfaces, administration screens, workflows, dashboards and security models, allowing the project to deliver working functionality in a relatively short timeframe.

For the intelligence layer, modern open-source AI technologies provide the foundation for knowledge discovery and semantic search. Large Language Models (LLMs) can be deployed locally to ensure full control over data, while vector databases can be used to support semantic retrieval and knowledge exploration.

The platform can therefore be structured around several complementary layers:

  • Application Layer: User interfaces, administration, workflows and dashboards.
  • Knowledge Layer: Unified access to information coming from multiple repositories.
  • AI Layer: Semantic search, document understanding, summarization and knowledge discovery.
  • Security and Governance Layer: Permissions, auditability, classification and compliance.
  • Repository Layer: Existing ECMs and other systems of record.

A possible implementation could combine:

  • Jmix for rapid enterprise application development.
  • Spring Boot for backend services and integration.
  • Local LLMs for secure AI processing.
  • Vector databases for semantic search and retrieval.
  • Knowledge graph technologies for relationship discovery and impact analysis.
  • Existing ECM platforms as trusted systems of record.

The key point is that none of these technologies replace the current repositories. Instead, they work together to create a new layer of intelligence capable of discovering, connecting and exploiting the knowledge already stored across the organization.

This approach allows the project to start with a simple and practical first release while providing a clear path towards a much more advanced Document Intelligence Platform in future iterations.

 


Recommended Technologies

AI Layer

The AI Layer should be built using proven open-source technologies rather than developed from scratch. The objective is to leverage mature components and focus development efforts on the capabilities that create real business value.

| Capability | Recommended Technologies | Purpose |
|---|---|---|
| Large Language Models (LLMs) | Llama, Mistral, Mixtral, Qwen | Document understanding, summarization, question answering and reasoning |
| LLM Runtime / Inference Engine | vLLM, Ollama, Text Generation Inference (TGI) | Efficient execution of AI models, either for production or development environments |
| Vector Database | Qdrant, pgvector, Milvus | Semantic search, embeddings storage and Retrieval-Augmented Generation (RAG) |
| Knowledge Graph | Neo4j, ArangoDB | Relationship discovery, dependency mapping and impact analysis |
| Agent Framework | Dify, LangGraph | AI workflows, intelligent assistants and agent orchestration |

For an initial release, a combination of Jmix, Spring Boot, Dify, vLLM and Qdrant could provide a fast path towards a working platform with AI-powered search, document chat and semantic retrieval capabilities.

As the platform evolves, more advanced technologies such as LangGraph and Neo4j can be introduced to support sophisticated agent workflows, relationship analysis and knowledge discovery scenarios.

The key point is that these technologies are not the product itself. They are building blocks. The real value lies in the intelligence layer built on top of them, including knowledge federation, document classification, metadata extraction, relationship discovery, impact analysis and security-aware access to information across multiple repositories.

Knowledge Layer

The Knowledge Layer is the core of the platform. Its role is to connect existing repositories, normalize their information, apply security rules and transform distributed documents into usable knowledge.

| Capability | Recommended Technologies | Purpose |
|---|---|---|
| Repository Connectors | REST APIs, CMIS, Microsoft Graph API, Oracle WCC APIs | Connect to SharePoint, WCC, ECMs and other repositories without replacing them |
| Integration and Synchronization | Spring Boot, Apache Camel, Kafka, RabbitMQ | Move metadata, events and document updates between repositories and the intelligence platform |
| Metadata Federation | PostgreSQL, Oracle, Elasticsearch / OpenSearch | Normalize metadata from different systems into a common searchable model |
| Security Federation | LDAP, Active Directory, Keycloak, OAuth2 / OpenID Connect | Preserve permissions, roles and identity rules across repositories |
| Knowledge Processing | Apache Tika, OCR engines, custom extraction services | Extract text, structure and relevant information from documents |
| Search Indexing | OpenSearch, Elasticsearch | Support fast keyword search, filtering and faceted navigation |

This layer is especially important because it prevents the platform from becoming just another isolated repository. Instead, it acts as a bridge between existing ECMs and the new AI capabilities.

The key idea is that documents can remain in their current systems of record, while the Knowledge Layer creates a unified view of their metadata, content, permissions and relationships.

In other words, the Knowledge Layer is what allows the platform to connect SharePoint, WCC, other ECMs, PLM systems, databases and file shares under a common intelligence model.
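Metadata federation largely comes down to mapping each repository's field names onto a common model. The sketch below shows the idea with two tiny mapping tables; the field names, the common model keys and the choice to drop unmapped fields are illustrative assumptions, since real mappings depend on how each repository has been configured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MetadataFederationSketch {

    // Illustrative field mappings per repository; actual names vary per installation.
    static final Map<String, Map<String, String>> FIELD_MAPPINGS = Map.of(
            "sharepoint", Map.of("Title", "title", "Author", "owner"),
            "wcc", Map.of("dDocTitle", "title", "dDocAuthor", "owner"));

    /** Renames known repository fields to the common model and records the origin. */
    static Map<String, String> normalize(String repository, Map<String, String> raw) {
        Map<String, String> mapping = FIELD_MAPPINGS.getOrDefault(repository, Map.of());
        Map<String, String> common = new LinkedHashMap<>();
        common.put("repository", repository);
        raw.forEach((field, value) -> {
            String target = mapping.get(field);
            if (target != null) {
                common.put(target, value); // unmapped fields are dropped in this sketch
            }
        });
        return common;
    }

    public static void main(String[] args) {
        System.out.println(normalize("sharepoint",
                Map.of("Title", "Pump maintenance guide", "Author", "j.smith")));
        System.out.println(normalize("wcc",
                Map.of("dDocTitle", "Pump risk assessment", "dDocAuthor", "a.jones")));
    }
}
```

Once normalised, documents from different repositories can be filtered and searched through the same fields.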

Presentation Layer

The Presentation Layer is responsible for providing simple, powerful and user-friendly access to the platform. It should allow users to search, explore, analyse and interact with knowledge without needing to know where documents are physically stored.

| Capability | Recommended Technologies | Purpose |
|---|---|---|
| Enterprise UI Development | Jmix, Vaadin | Rapid development of enterprise screens, administration panels, dashboards and workflows |
| Advanced Web Interfaces | React, Angular, Vue | Build richer user experiences such as AI search, document exploration and visual analysis |
| AI Chat Interface | Dify UI, custom React UI, Vaadin components | Provide conversational access to documents and knowledge |
| Dashboards and Analytics | Jmix dashboards, Apache Superset, Grafana | Display document metrics, usage, quality indicators and knowledge insights |
| Graph Visualization | Neo4j Bloom, Cytoscape.js, React Flow, D3.js | Visualize relationships between documents, systems, requirements and risks |
| Document Viewer | PDF.js, OnlyOffice, Collabora | Preview documents, compare versions and display extracted knowledge next to the original content |


For the first version, Jmix remains a very appropriate option because it allows the team to build useful enterprise interfaces quickly, including search screens, metadata views, administration panels and basic workflows.

Later, more advanced interfaces can be introduced using React or specialized visualization libraries for AI chat, relationship graphs, impact analysis and document intelligence dashboards.

The main goal of this layer is to make the platform feel simple for the user, even if the underlying architecture connects many repositories, AI services and knowledge sources behind the scenes.

Example of a Unified Search and Knowledge Discovery Experience

The Unified Search and Knowledge Discovery screen is the primary entry point to the platform. It combines traditional document search with AI-powered knowledge discovery, allowing users to search across multiple repositories through a single interface.

Users can perform natural language queries, apply advanced filters and explore documents stored in different systems such as SharePoint, WCC, engineering repositories and other ECM platforms.

Search results are enriched with AI-generated summaries, metadata, relationships and impact information, helping users understand not only which documents exist, but also how they are connected to other systems, requirements, reports and business processes.

The interface is organized into three main areas:

  • Advanced Filters Panel: Allows users to refine searches by repository, document type, classification, status, program, owner, date range and tags.
  • Results Panel: Displays matching documents together with summaries, metadata and related knowledge.
  • Knowledge Panel: Provides additional context, including relationships, dependencies, impact analysis and AI-generated insights.

This approach transforms document retrieval into knowledge discovery, enabling users to find information faster, understand its context and assess its potential impact across the organization.