In March 2023, the influential online conference DevOps fwdays’23 focused on DevOps practices and tools. The speakers included developers and engineers from SoftServe, Spotify, Luxoft, Snyk, Xebia | Xpirit, Solidify, Zencore, Mondoo, and other companies.
Mykyta Savin, DevOps Infrastructure Architect at P2H, delivered a presentation on how a small mistake can block production and what to do in such cases (“How we block production. Triangulate issue, fix and postmortem”).
We are sharing this case because it may be useful to anyone facing similar issues.
Brief product description
P2H has developed an e-government platform for a client from Saudi Arabia. The platform facilitates interaction with the labor market, and its target audience is the country’s citizens and businesses.
Development has been ongoing for several years, and the platform is constantly changing and expanding with new services. It is based on an asynchronous architecture that accounts for the idiosyncrasies of the Saudi Arabian government’s integration points.
Tech stack and processes
- Microservice architecture
- Front end: Vue.js, React.js
- Back end: Ruby, Ruby on Rails, Java, PHP
- Message broker: RabbitMQ
- Global cache: Elasticsearch
- Infrastructure: Docker
- Monitoring, observability, and tracing: Grafana, Grafana Loki, Grafana Tempo, Prometheus, OpenTelemetry, Vector
- Integrations: IBM App Connect, IBM API Connect, Absher, Unifonic, Mada, SADAD, and more
The project is based on a microservice architecture. Over a hundred microservices are currently in production, most of them written in Ruby, while new microservices are being launched in Java. The Enterprise Service Bus (ESB) pattern and the RabbitMQ message broker were chosen to implement the project’s asynchronous communication between services.
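To illustrate what asynchronous messaging through a bus means in practice, here is a minimal in-memory sketch in Python. It is not the project's code and does not use RabbitMQ: the class `InMemoryBus`, the helper `run_consumer`, and the routing key `permits.created` are all hypothetical names chosen for this example. The point it shows is that a publisher hands a message to the bus and returns immediately, while a consumer picks it up later on its own schedule.

```python
import queue
import threading


class InMemoryBus:
    """Toy stand-in for a message broker: routes messages to queues by routing key."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def queue_for(self, routing_key):
        # Create the queue on first use so publishers and consumers can start in any order.
        with self._lock:
            return self._queues.setdefault(routing_key, queue.Queue())

    def publish(self, routing_key, message):
        # Returns immediately; the publisher never waits for a consumer.
        self.queue_for(routing_key).put(message)


def run_consumer(bus, routing_key, handler, stop_marker=None):
    """Consume messages in a background thread until stop_marker arrives."""

    def loop():
        q = bus.queue_for(routing_key)
        while True:
            message = q.get()
            if message is stop_marker:
                break
            handler(message)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    bus = InMemoryBus()
    handled = []
    # Hypothetical routing key; the message is queued before any consumer exists.
    bus.publish("permits.created", {"permit_id": 1})
    worker = run_consumer(bus, "permits.created", handled.append)
    bus.publish("permits.created", None)  # stop marker ends the consumer loop
    worker.join(timeout=2)
    print(handled)
```

A real broker adds what this sketch leaves out: persistence, acknowledgements, and delivery across processes and hosts. Those are the reasons a dedicated component such as RabbitMQ sits between the services.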
The storage layer is built on Elasticsearch and PostgreSQL. The infrastructure uses Docker and Docker Compose and is hosted with a provider located in Saudi Arabia to meet the government regulator's data locality requirements. The Grafana stack is used for monitoring. The platform also has numerous integration points with various ministries and private institutions. RabbitMQ runs as a four-node cluster accessible through the prod-rabbit-new-lb load balancer.
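Because clients reach the cluster through the load balancer rather than through individual nodes, each service needs only one broker address. The sketch below builds an AMQP connection URI for that address. Only the host name `prod-rabbit-new-lb` comes from the setup described above; the function `amqp_url`, the credentials, the default virtual host, and the port 5672 (the standard AMQP port) are assumptions made for illustration.

```python
from urllib.parse import quote

# Host name taken from the article; everything else below is an illustrative assumption.
LOAD_BALANCER_HOST = "prod-rabbit-new-lb"


def amqp_url(user, password, host=LOAD_BALANCER_HOST, port=5672, vhost="/"):
    """Build an AMQP URI, percent-encoding reserved characters in each part."""
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),  # the default vhost "/" must be written as %2F
    )


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(amqp_url("app_user", "s3cr/et"))
```

The trade-off of a single address is that every service depends on the load balancer being configured correctly, so a mistake there affects all clients at once.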