Governance and AI for Development and Maintenance Environments
Today, many organizations rely on specialized companies to provide full support for the Development and Maintenance (D+M) of their Information Systems. In this context, ensuring quality, availability and continuous evolution of IT systems is an ongoing challenge.
Our proposal is to apply the AI architecture introduced in our first post to support this outsourcing model, enhancing both operational efficiency and strategic governance.
Management first: TOGAF and ITIL as reference frameworks
Any digital transformation project must be aligned with the business and operational management methodologies already established in the organization. The goal is not just to adopt new technologies, but to do so in a way that strengthens the existing management framework.
Two reference frameworks stand out in this regard:
- TOGAF, as the Enterprise Architecture framework, which structures business vision, data and technology architectures.
- ITIL, as the IT Service Management framework, which defines operational best practices for handling incidents, problems, changes and continual improvement.
In our case, the approach should be top-down: starting with TOGAF’s vision and architecture phases, and landing on ITIL’s operational processes that ensure value delivery to the customer.
Where to focus in TOGAF and ITIL
Although both frameworks are broad, we can identify the most relevant aspects for controlling an AI project applied to D+M of information systems:
- TOGAF
  - Phase D: Technology Architecture, where monitoring, observability and automation platforms are defined.
  - Phase C: Information Systems Architecture, concerning operational data and logs as inputs for AI.
- ITIL
  - Incident Management, to ensure fast response to service interruptions.
  - Problem Management, to analyze root causes and prevent recurrence.
  - Event Management, to monitor systems and detect anomalies in real time.
  - Capacity and Availability Management, to anticipate needs and meet SLAs.
  - Continual Improvement, to measure and optimize outcomes.
These are the processes where the integration of operational AI can make a tangible difference.
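To make the capacity side of this concrete, the following is a minimal sketch of the kind of forecast that Capacity and Availability Management relies on: fitting a straight line to daily usage samples and estimating when a limit would be reached. The function names, the sample values and the 90% limit are illustrative assumptions, not part of any framework or tool mentioned here, and a linear trend is only the simplest possible model.

```python
def fit_linear_trend(values):
    """Least-squares slope and intercept for equally spaced samples (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(values, limit):
    """Days after the last sample until the fitted trend reaches `limit`.

    Returns None when usage is flat or falling, because the trend never hits the limit.
    """
    slope, intercept = fit_linear_trend(values)
    if slope <= 0:
        return None
    day_reached = (limit - intercept) / slope
    return max(0.0, day_reached - (len(values) - 1))


# Hypothetical daily disk usage, in percent, for one server.
DISK_USAGE = [60.0, 62.0, 64.0, 66.0, 68.0]

if __name__ == "__main__":
    print(days_until_limit(DISK_USAGE, limit=90.0))  # prints 11.0
```

In practice the samples would come from the monitoring platform rather than a hard-coded list, and seasonal workloads would need a richer model than a straight line.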
AI architecture applied to operations: Prometheus, Grafana and ELK
The following diagram illustrates how Prometheus, Grafana and ELK act as the operational backbone of our AI architecture, linking the governance layers of TOGAF and ITIL with the advanced automation capabilities of AIOps.
Once the management framework is established, we can map it to the elements of our AI architecture that provide operational support. We have selected three well-established open-source components:
- Prometheus:
  - Real-time monitoring of metrics.
  - Collects performance data from servers, applications and databases.
  - Enables threshold-based alerts and anomaly detection.
- Grafana:
  - Visualization platform that integrates metrics and logs into unified dashboards.
  - Ideal for SLA tracking, capacity KPIs and continual improvement reporting.
  - Bridges communication between IT teams and business stakeholders.
- ELK Stack (Elasticsearch, Logstash, Kibana):
  - Centralizes and structures logs from applications, databases and infrastructure.
  - Allows fast search and historical pattern analysis.
  - Facilitates incident investigation and problem management with full traceability.
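As an illustration of how these components can feed other parts of the architecture, here is a small sketch that applies a threshold check to the JSON returned by the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`). The host name, the `cpu_usage_percent` metric, the 85% threshold and the sample payload are all assumptions made up for this example. The code only builds the query URL and parses a response body, so it runs without a live Prometheus server.

```python
import json
from urllib.parse import urlencode

# Hypothetical Prometheus address; replace with your own server.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def instances_over_threshold(response_body, threshold):
    """Return {instance: value} for every series whose value exceeds `threshold`."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    breaches = {}
    for series in payload["data"]["result"]:
        _timestamp, raw_value = series["value"]  # the sample value arrives as a string
        value = float(raw_value)
        if value > threshold:
            breaches[series["metric"].get("instance", "unknown")] = value
    return breaches


# Made-up response in the shape of an instant-vector result.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "cpu_usage_percent", "instance": "app-01:9100"},
             "value": [1700000000.0, "93.5"]},
            {"metric": {"__name__": "cpu_usage_percent", "instance": "db-01:9100"},
             "value": [1700000000.0, "41.2"]},
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url("cpu_usage_percent"))
    print(instances_over_threshold(SAMPLE_RESPONSE, threshold=85.0))
```

For production alerting, Prometheus alerting rules and Alertmanager are the more natural place for this kind of check. A script like this is mainly useful when another component, such as an AI assistant, needs to read the current state on demand.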
Decision automation and support to D+M
The combination of these tools not only provides visibility but also automates IT operations:
- Immediate detection of anomalies in logs and metrics.
- Automatic alert generation in case of incidents.
- Event correlation to identify root causes.
- Dashboards to evaluate the impact of infrastructure changes.
- Historical data for capacity planning and forecasting.
Together, they create an environment where operational decisions are driven by data and intelligent automation, aligned with ITIL and TOGAF governance.
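Event correlation is the least self-explanatory item in the list above, so here is a deliberately simple sketch of one way to approach it: grouping metric alerts and log errors from the same host when they occur close together in time. The `Event` type, the five-minute window and the sample events are assumptions for illustration. Time proximity only suggests a candidate root cause and does not prove one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    host: str
    source: str   # e.g. "metric-alert" or "log"
    message: str


def correlate(events, window=timedelta(minutes=5)):
    """Group events per host into bursts whose consecutive events are at most `window` apart."""
    groups = []
    last_group_by_host = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        group = last_group_by_host.get(event.host)
        if group and event.timestamp - group[-1].timestamp <= window:
            group.append(event)
        else:
            group = [event]
            groups.append(group)
            last_group_by_host[event.host] = group
    return groups


def cross_source_groups(groups):
    """Keep only the groups that contain events from more than one source."""
    return [g for g in groups if len({e.source for e in g}) > 1]


# Hypothetical events, as they might arrive from metric alerts and centralized logs.
SAMPLE_EVENTS = [
    Event(datetime(2024, 1, 1, 10, 2), "db-01", "metric-alert", "query latency above threshold"),
    Event(datetime(2024, 1, 1, 10, 0), "db-01", "log", "connection pool exhausted"),
    Event(datetime(2024, 1, 1, 10, 30), "app-01", "log", "request timeout"),
]

if __name__ == "__main__":
    for group in cross_source_groups(correlate(SAMPLE_EVENTS)):
        first = group[0]  # earliest event in the burst: a starting point for investigation
        print(first.host, first.message, len(group))
```

With the sample data, the two `db-01` events fall into one group and the earliest one, the exhausted connection pool, becomes the first thing to investigate. The isolated `app-01` timeout is left out because only one source reported it.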
Moving towards AIOps
This approach naturally leads us to the concept of AIOps (Artificial Intelligence for IT Operations), where AI not only collects information but also analyzes it, explains it and automatically suggests actions.
Prometheus, Grafana and ELK provide the technical foundation on which more advanced AI components (such as LLMs and RAG) can be integrated, so that systems not only detect problems but also interpret them and recommend solutions.
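To give a feel for how a RAG-style component could sit on top of this foundation, the sketch below implements only the retrieval step: given an alert text, it finds the most similar past incidents and assembles a prompt that could then be sent to an LLM. Everything here is an assumption for illustration, including the knowledge-base entries and the prompt wording. The word-overlap (Jaccard) similarity stands in for the embedding-based search a real system would more likely use, and no LLM is actually called.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, knowledge_base, top_k=2):
    """Return up to `top_k` past incidents whose symptom text overlaps most with `query`."""
    query_tokens = tokenize(query)
    scored = [(jaccard(query_tokens, tokenize(doc["symptom"])), doc) for doc in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(alert, documents):
    """Assemble the text that would be sent to an LLM (the call itself is out of scope)."""
    context = "\n".join(f"- {d['symptom']} -> {d['resolution']}" for d in documents)
    return f"Alert: {alert}\nSimilar past incidents:\n{context}\nSuggest a next action."


# Hypothetical past incidents, e.g. exported from problem-management records.
KNOWLEDGE_BASE = [
    {"symptom": "database connection pool exhausted",
     "resolution": "increase pool size and restart the service"},
    {"symptom": "disk usage above 90 percent on log volume",
     "resolution": "rotate and archive old logs"},
    {"symptom": "certificate expired on api gateway",
     "resolution": "renew certificate"},
]

if __name__ == "__main__":
    alert = "connection pool exhausted on db-01"
    print(build_prompt(alert, retrieve(alert, KNOWLEDGE_BASE, top_k=1)))
```

The design point is that the recommendation is grounded in the organization's own incident history rather than in the model's general knowledge, which is what makes the suggestion traceable for problem management.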
Conclusion
In Development and Maintenance of Information Systems, the key is not only having the best technology, but aligning it with management methodologies that ensure order, quality and value to the customer.
By integrating our AI architecture with TOGAF and ITIL, and supporting it with open-source tools like Prometheus, Grafana and ELK, we achieve a proactive, automated and continuously improving system.
This approach turns AI into a natural ally of D+M of information systems, strengthening enterprise and operational governance and paving the way for full adoption of AIOps in the future.