Z-ENG: Kubernetes application monitoring and pinging
An application running in a Kubernetes cluster usually consists of several components and services. The Kubernetes health check feature is able to periodically ping internal components and restart them in the event of a failure. However, it does not notify us of the error. There are several external services available that ping our application from the outside and notify us in case of an error (e.g. Freshping). The problem with this, however, is that in order to ping a system with many services, each service must be made available from outside the cluster, which raises security concerns.
The task is to create a Go or ASP.NET Core application that runs within the cluster, pings our internal services from inside, and publishes the status to the outside. Through the configurability of the application, it also allows us to group our internal services and publish aggregate availability information.
Basic knowledge of Kubernetes is essential for this task. You do not need extensive experience and can learn more during the work, but the topic is not suitable as a first introduction to Kubernetes.
If you are unsure about this, please contact me before applying.
The application can be written in Go or using ASP.NET Core. It runs containerized within the cluster. Based on its configuration, it pings the specified internal services at a given frequency and makes their status available on a simple web interface. Internal pinging can be configured similarly to Kubernetes health checks (e.g. interval, timeout, failure rate, ...).
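The probe settings described above could be modelled roughly as follows. This is a minimal sketch, not a prescribed design: the names ProbeConfig, Ping and Tracker are hypothetical, the "failure rate" is interpreted here as a count of consecutive failures (as in Kubernetes probes), and a target is assumed to start out healthy.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// ProbeConfig mirrors the knobs of a Kubernetes probe. The field names are
// hypothetical and not prescribed by the task description.
type ProbeConfig struct {
	URL              string
	Interval         time.Duration // how often to ping
	Timeout          time.Duration // how long to wait for one answer
	FailureThreshold int           // consecutive failures before "down"
	SuccessThreshold int           // consecutive successes before "up" again
}

// Ping performs one HTTP GET and treats any 2xx or 3xx answer as success.
func Ping(cfg ProbeConfig) bool {
	client := http.Client{Timeout: cfg.Timeout}
	resp, err := client.Get(cfg.URL)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400
}

// Tracker turns a stream of single ping results into a stable status.
type Tracker struct {
	cfg       ProbeConfig
	healthy   bool
	successes int
	failures  int
}

// NewTracker assumes the target is healthy until proven otherwise.
func NewTracker(cfg ProbeConfig) *Tracker {
	return &Tracker{cfg: cfg, healthy: true}
}

// Observe records one ping result and returns the resulting status.
func (t *Tracker) Observe(ok bool) bool {
	if ok {
		t.successes++
		t.failures = 0
		if !t.healthy && t.successes >= t.cfg.SuccessThreshold {
			t.healthy = true
		}
	} else {
		t.failures++
		t.successes = 0
		if t.healthy && t.failures >= t.cfg.FailureThreshold {
			t.healthy = false
		}
	}
	return t.healthy
}

func main() {
	// Feed simulated results instead of real pings to keep the demo offline.
	t := NewTracker(ProbeConfig{FailureThreshold: 3, SuccessThreshold: 1})
	for _, ok := range []bool{true, false, false, false, true} {
		fmt.Println(t.Observe(ok))
	}
}
```

In a real service, a goroutine per target would call Ping on a time.Ticker set to Interval and pass each result to Observe.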
The application also has a public HTTP endpoint where the status of each internal service can be queried from the outside. The external, independent ping service calls this endpoint and identifies in the URL the internal service whose status it is interested in.
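One possible shape for such an endpoint is sketched below. The URL layout /status/<name>, the StatusStore type, and the choice of 200, 503 and 404 as answers are all assumptions made for illustration; external ping services generally only distinguish success from failure by status code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// StatusStore holds the latest known status per service or group name.
// The type is hypothetical; the pinging goroutines would call Set.
type StatusStore struct {
	mu sync.RWMutex
	up map[string]bool
}

func NewStatusStore() *StatusStore {
	return &StatusStore{up: make(map[string]bool)}
}

// Set records the current status of one name.
func (s *StatusStore) Set(name string, healthy bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.up[name] = healthy
}

// Code maps a name to the HTTP status code reported to the outside.
func (s *StatusStore) Code(name string) int {
	s.mu.RLock()
	healthy, known := s.up[name]
	s.mu.RUnlock()
	switch {
	case !known:
		return http.StatusNotFound
	case healthy:
		return http.StatusOK
	default:
		return http.StatusServiceUnavailable
	}
}

// ServeHTTP answers requests of the assumed form GET /status/<name>.
func (s *StatusStore) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	name := strings.TrimPrefix(r.URL.Path, "/status/")
	code := s.Code(name)
	w.WriteHeader(code)
	fmt.Fprintln(w, http.StatusText(code))
}

func main() {
	store := NewStatusStore()
	store.Set("backend", true)

	// In-process request, so the demo needs no open port. A deployment
	// would instead register the store with http.Handle("/status/", store)
	// and call http.ListenAndServe.
	rec := httptest.NewRecorder()
	store.ServeHTTP(rec, httptest.NewRequest("GET", "/status/backend", nil))
	fmt.Println(rec.Code)
}
```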
Internal endpoints can be grouped by configuration: for example, you can specify that internal services A, B, and C all fall into the "backend" category. When the status of "backend" is queried, it is reported as available only if A, B, and C are all available. This way, only aggregated information is returned to the external, independent ping service (and thus, for example, the internal structure of the system is not published).
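The grouping rule can be sketched in a few lines. The function names and the map-based representation are hypothetical; the sketch also assumes that a member with no recorded status counts as unavailable and that an empty group is never reported as available.

```go
package main

import "fmt"

// GroupHealthy reports whether every member of a group is available.
// A member missing from the status map counts as unavailable.
func GroupHealthy(members []string, status map[string]bool) bool {
	if len(members) == 0 {
		return false // assumption: an empty group is not "available"
	}
	for _, m := range members {
		if !status[m] {
			return false
		}
	}
	return true
}

// Aggregate computes the status of every configured group, so that only
// group names, not individual services, need to be exposed externally.
func Aggregate(groups map[string][]string, status map[string]bool) map[string]bool {
	out := make(map[string]bool, len(groups))
	for name, members := range groups {
		out[name] = GroupHealthy(members, status)
	}
	return out
}

func main() {
	groups := map[string][]string{
		"backend":  {"A", "B", "C"},
		"frontend": {"web"},
	}
	status := map[string]bool{"A": true, "B": true, "C": false, "web": true}
	fmt.Println(Aggregate(groups, status))
}
```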
A further aggregation feature is deciding how many instances of a horizontally scaled component must be available. For example, if service X is running 3 instances, the application is considered healthy as long as at least 2 instances are available and respond to the internal ping. In this case, although one instance has a transient error, no error is reported to the external, independent ping service because the system as a whole is still operational. These thresholds are also controlled by configuration.
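A threshold of this kind could be expressed either as an absolute count or as a percentage. The sketch below assumes a rule string such as "2" or "60%" (a notation borrowed from how Kubernetes writes minAvailable, not something the task prescribes) and assumes that percentages are rounded up. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Required converts a rule such as "2" or "60%" into the number of
// instances that must answer the ping, given the total instance count.
func Required(rule string, total int) (int, error) {
	rule = strings.TrimSpace(rule)
	if strings.HasSuffix(rule, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(rule, "%"))
		if err != nil || pct < 0 || pct > 100 {
			return 0, fmt.Errorf("invalid percentage %q", rule)
		}
		// Integer ceiling of total*pct/100, so "50%" of 3 requires 2.
		return (total*pct + 99) / 100, nil
	}
	n, err := strconv.Atoi(rule)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid count %q", rule)
	}
	return n, nil
}

// Healthy reports whether enough instances responded to satisfy the rule.
func Healthy(rule string, available, total int) (bool, error) {
	need, err := Required(rule, total)
	if err != nil {
		return false, err
	}
	return available >= need, nil
}

func main() {
	// Service X from the text: 3 instances, at least 2 must respond.
	ok, err := Healthy("2", 2, 3)
	fmt.Println(ok, err) // true <nil>
}
```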
Goals to reach by the end of the semester
Depending on the course level (project topic / BSc thesis / MSc diploma project), we will determine the objectives by considering the following items and planning for one or two semesters of work.
Minimum requirements (i.e., necessary for a passing grade)
- Go / ASP.NET Core application works and pings internal services
- The status of all internal services can be queried via the app
- Configuration is file-based
- The service runs in a Docker container
- Periodicity is adjustable
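As an illustration of the file-based configuration with an adjustable period, here is a minimal sketch. The JSON layout, the field names, and the 10-second default interval are assumptions; the task does not fix a file format, and YAML would work equally well with a third-party parser.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Target describes one internal service to ping (hypothetical schema).
type Target struct {
	Name            string `json:"name"`
	URL             string `json:"url"`
	IntervalSeconds int    `json:"intervalSeconds"`
}

// Config is the root of the assumed configuration file.
type Config struct {
	Targets []Target `json:"targets"`
}

// ParseConfig decodes and validates the configuration.
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	for i := range cfg.Targets {
		if cfg.Targets[i].URL == "" {
			return Config{}, fmt.Errorf("target %d: missing url", i)
		}
		if cfg.Targets[i].IntervalSeconds <= 0 {
			cfg.Targets[i].IntervalSeconds = 10 // assumed default period
		}
	}
	return cfg, nil
}

// LoadConfig reads the file, e.g. one mounted into the container.
func LoadConfig(path string) (Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return Config{}, err
	}
	return ParseConfig(data)
}

func main() {
	raw := `{"targets":[{"name":"api","url":"http://api:8080/healthz","intervalSeconds":5}]}`
	cfg, err := ParseConfig([]byte(raw))
	fmt.Println(cfg, err)
}
```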
Expected requirements (for grade 4)
- In addition to the periodicity, timeout, success rate, etc. are also configurable
- The availability of services can be aggregated
- The endpoints to be pinged are automatically detected by the system using the Kubernetes API at startup; the configuration is taken from annotations
- The status of all internal services is displayed in a simple web interface
- The application is publicly available on GitHub
- GitHub Actions CI pipeline builds the Docker container and publishes it (Docker Hub or GitHub Container Registry)
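For the annotation-based configuration in the list above, the following sketch shows how a Service's annotations might be turned into probe settings. The annotation prefix pinger.example.com/ and its keys are invented for this example, as are the defaults. Fetching the annotations through the Kubernetes API (for example with the client-go library) is left out; the function only takes the annotation map.

```go
package main

import (
	"fmt"
	"strconv"
)

// prefix is a hypothetical annotation namespace, not an existing convention.
const prefix = "pinger.example.com/"

// Probe holds the settings read from annotations (hypothetical schema).
type Probe struct {
	Path            string
	IntervalSeconds int
	TimeoutSeconds  int
}

// FromAnnotations returns the probe settings and true when the object
// opted in via the "enabled" annotation, or false when it should be skipped.
func FromAnnotations(ann map[string]string) (Probe, bool, error) {
	if ann[prefix+"enabled"] != "true" {
		return Probe{}, false, nil
	}
	// Assumed defaults for settings that are not annotated.
	p := Probe{Path: "/", IntervalSeconds: 10, TimeoutSeconds: 1}
	if v, ok := ann[prefix+"path"]; ok {
		p.Path = v
	}
	ints := map[string]*int{
		"interval-seconds": &p.IntervalSeconds,
		"timeout-seconds":  &p.TimeoutSeconds,
	}
	for key, dst := range ints {
		v, ok := ann[prefix+key]
		if !ok {
			continue
		}
		n, err := strconv.Atoi(v)
		if err != nil || n <= 0 {
			return Probe{}, false, fmt.Errorf("annotation %s%s: invalid value %q", prefix, key, v)
		}
		*dst = n
	}
	return p, true, nil
}

func main() {
	ann := map[string]string{
		prefix + "enabled":          "true",
		prefix + "path":             "/healthz",
		prefix + "interval-seconds": "30",
	}
	fmt.Println(FromAnnotations(ann))
}
```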
Requirements for an excellent grade
- Pings not only the individual services, but also the pods behind them (this is required for the next point)
- Support for a "minimum number of N or M%" rule for horizontally scaled applications
- The endpoints to be pinged are constantly monitored via the Kubernetes API, and when a new service / pod is added or changed, it is also included in the pinged list.
- Helm chart is created for the application
Please reach out to me before applying for the topic. Find me on Teams or via email and please explain why you are interested in this topic.
Basic knowledge of Kubernetes