.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0

.. _healthcheck_and_monitoring:

Healthcheck and Monitoring
==========================

Healthcheck
-----------
The HV-VES Docker container runs a small HTTP service for healthchecks. The healthcheck port can be configured
at deployment time using the ``--health-check-api-port`` command-line option or the ``VESHV_HEALTHCHECK_API_PORT`` environment variable (for details see :ref:`deployment`).

This service exposes the endpoint **GET /health/ready**, which returns **HTTP 200 OK** when HV-VES is healthy
and ready for connections. Otherwise it returns **HTTP 503 Service Unavailable** with a short description of why HV-VES is unhealthy.
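
A readiness probe against this endpoint can be sketched in a few lines of Python. This is a minimal illustration, not part of HV-VES: the ``check_ready`` helper and the base URL are hypothetical, and the host and port must match the values configured for your deployment.

```python
import urllib.error
import urllib.request


def check_ready(base_url, timeout=2.0):
    """Return (is_ready, detail) for an HV-VES healthcheck service.

    base_url is a hypothetical value such as "http://localhost:6060";
    use the host and healthcheck port of your own deployment.
    """
    url = base_url.rstrip("/") + "/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # 200 OK means HV-VES is healthy and ready for connections.
            body = response.read().decode("utf-8", "replace")
            return response.status == 200, body
    except urllib.error.HTTPError as error:
        # Error statuses such as 503 raise HTTPError; the body holds the reason.
        return False, error.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError) as error:
        # Connection refused, DNS failure, timeout and similar transport errors.
        return False, str(error)
```

A caller would typically invoke ``check_ready("http://<host>:<port>")`` in a loop with a delay until the first element of the result is ``True``.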


Monitoring
----------
The HV-VES collector exposes metrics data at runtime. For this purpose the HV-VES application provides the endpoint **GET /monitoring/prometheus**,
which returns **HTTP 200 OK** with the metrics in the response body, in a format readable by the Prometheus service.
The Prometheus endpoint shares a port with the healthcheck service.
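
The response body uses the Prometheus text exposition format, so it can also be inspected without a Prometheus server. The following Python sketch parses such a body into a dictionary. It is a simplified illustration: ``parse_prometheus_text`` is a hypothetical helper, the sample values and the ``topic`` label value are made up, and label values containing ``}`` are not handled.

```python
def parse_prometheus_text(text):
    """Parse Prometheus text exposition format into {series: value}.

    Simplified sketch: HELP/TYPE comment lines are skipped, optional
    timestamps are ignored, and the label set is kept verbatim in the key.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[series] = float(fields[0])
    return samples


# Illustrative input only; real values come from GET /monitoring/prometheus.
SAMPLE = """\
# TYPE hvves_connections_active gauge
hvves_connections_active 2.0
hvves_messages_sent_topic_total{topic="example_topic",} 5.0
"""

for series, value in parse_prometheus_text(SAMPLE).items():
    print(series, value)
```

For anything beyond quick inspection, an existing Prometheus client library parser is likely a better choice than hand-written parsing.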

Metrics provided by HV-VES:

+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
|           Name of metric                      |     Unit     |              Description                                                                 |
+===============================================+==============+==========================================================================================+
| hvves_clients_rejected_cause_total            |  cause/piece | number of rejected clients grouped by cause                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_clients_rejected_total                  |     piece    | total number of rejected clients                                                         |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_connections_active                      |     piece    | number of currently active connections                                                   |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_connections_total                       |     piece    | total number of connections                                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_data_received_bytes_total               |     bytes    | total number of received bytes                                                           |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_disconnections_total                    |     piece    | total number of disconnections                                                           |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_dropped_cause_total            |  cause/piece | number of dropped messages grouped by cause                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_dropped_total                  |     piece    | total number of dropped messages                                                         |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_latency_seconds_count          |     piece    | latency is the time between       |  counter for number of latency occurrences           |
+-----------------------------------------------+--------------+ message.header.lastEpochMicrosec  +------------------------------------------------------+
| hvves_messages_latency_seconds_max            |    seconds   | and time when data has been sent  |  maximal observed latency                            |
+-----------------------------------------------+--------------+ from HV-VES to Kafka              +------------------------------------------------------+
| hvves_messages_latency_seconds_sum            |    seconds   |                                   |  sum of latency parameter from each message          |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_processing_time_seconds_count  |     piece    | processing time is time measured  |  counter for number of processing time occurrences   |
+-----------------------------------------------+--------------+ between decoding of WTP message   +------------------------------------------------------+
| hvves_messages_processing_time_seconds_max    |    seconds   | and time when data has been sent  |  maximal processing time                             |
+-----------------------------------------------+--------------+ from HV-VES to Kafka              +------------------------------------------------------+
| hvves_messages_processing_time_seconds_sum    |    seconds   |                                   |  sum of processing time from each message            |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_received_payload_bytes_total   |     bytes    | total number of received payload bytes                                                   |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_received_total                 |     piece    | total number of received messages                                                        |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_sent_topic_total               |  topic/piece | number of sent messages grouped by topic                                                 |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_sent_total                     |     piece    | number of sent messages                                                                  |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
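
The ``_sum`` and ``_count`` series in the table can be combined into a mean value, for example the mean message latency since HV-VES started. The Python sketch below shows the arithmetic. The ``mean_seconds`` helper and the numbers are hypothetical, and the input is assumed to be a plain mapping from metric name to value.

```python
def mean_seconds(samples, prefix):
    """Return the mean of a timing metric, or None if nothing was observed.

    samples maps metric names to values; prefix is the metric name without
    the _sum/_count suffix, e.g. "hvves_messages_latency_seconds".
    """
    count = samples.get(prefix + "_count", 0.0)
    total = samples.get(prefix + "_sum", 0.0)
    if count == 0:
        # No messages observed yet, so a mean is undefined.
        return None
    return total / count


# Made-up values for illustration only.
samples = {
    "hvves_messages_latency_seconds_count": 3.0,
    "hvves_messages_latency_seconds_sum": 0.75,
}
print(mean_seconds(samples, "hvves_messages_latency_seconds"))  # 0.25
```

Because both series are cumulative, this yields a lifetime average. For a recent-window average in Prometheus itself, the usual approach is to divide the ``rate()`` of the ``_sum`` series by the ``rate()`` of the ``_count`` series.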

JVM metrics:

- jvm_buffer_memory_used_bytes
- jvm_classes_unloaded_total
- jvm_gc_memory_promoted_bytes_total
- jvm_buffer_total_capacity_bytes
- jvm_threads_live
- jvm_classes_loaded
- jvm_gc_memory_allocated_bytes_total
- jvm_threads_daemon
- jvm_buffer_count
- jvm_gc_pause_seconds_count
- jvm_gc_pause_seconds_sum
- jvm_gc_pause_seconds_max
- jvm_gc_max_data_size_bytes
- jvm_memory_committed_bytes
- jvm_gc_live_data_size_bytes
- jvm_memory_max_bytes
- jvm_memory_used_bytes
- jvm_threads_peak

Sample response for **GET /monitoring/prometheus**:

.. literalinclude:: metrics_sample_response.txt