.. This work is licensed under a Creative Commons Attribution 4.0
   International License. http://creativecommons.org/licenses/by/4.0

.. _docs_Datalake_Handler_MS:

Architecture
------------

Background
~~~~~~~~~~

A large amount of data flows among ONAP components, mostly via DMaaP and Web Services.
For example, all events/feeds collected by DCAE collectors go through DMaaP.
DMaaP is backed by Kafka, a Publish-Subscribe system
in which data is not meant to be permanent and is deleted after a certain retention period.
Kafka is not a database, which means the data in it is not meant to be queried.
Though some components may store processed results in their local databases, most of the raw data will eventually be lost.
We should provide a systematic way to store this raw data, and even the processed results,
to serve as the source for data analytics and machine learning, providing insight into the network operation.
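
To make the retention point concrete, here is a minimal sketch of how a component
consumes events from a DMaaP topic through the underlying Kafka. The broker address
and topic name are assumptions for illustration only; anything not consumed and
persisted before the retention period expires is lost.

.. code-block:: java

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TopicPeek {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed address of the DMaaP message-router Kafka broker.
            props.put("bootstrap.servers", "message-router-kafka:9092");
            props.put("group.id", "datalake-demo");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Example topic name; actual DCAE topics vary per deployment.
                consumer.subscribe(Collections.singletonList("unauthenticated.SEC_FAULT_OUTPUT"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
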
Relations with Other ONAP Components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The architecture below depicts the DataLake MS as a part of ONAP. Only the relevant interactions and components are shown.

.. image:: ./arch.PNG

Note that not all data storage systems in the picture are supported. In R6, the following storage systems are supported:

- MongoDB
- Couchbase
- Elasticsearch and Kibana
- HDFS

Depending on demand, new systems may be added to the supported list. In the following we use the term database for the storage,
even though HDFS is a file system (with simple settings, however, it can be treated as a database, e.g. via Hive).

Note that once the data is stored in the databases, other ONAP components and systems will query it directly from the databases,
without interacting with DataLake Handler.
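
As a sketch of this direct access, the following assumes events were persisted to
MongoDB with one collection per monitored Topic; the host, database, and collection
names are illustrative only.

.. code-block:: java

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class DirectQuery {
        public static void main(String[] args) {
            // Host, database and collection names are assumptions for illustration.
            try (MongoClient client = MongoClients.create("mongodb://datalake-mongodb:27017")) {
                MongoCollection<Document> events = client
                        .getDatabase("datalake")
                        .getCollection("unauthenticated.SEC_FAULT_OUTPUT");
                // A downstream analytics component reads persisted events directly,
                // without involving DataLake Handler.
                for (Document event : events.find().limit(10)) {
                    System.out.println(event.toJson());
                }
            }
        }
    }
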
Description
~~~~~~~~~~~

DataLake Handler's main function is to monitor and persist the data flowing through DMaaP. The databases are outside of the ONAP scope,
since the data is expected to be huge, and a database may be a complicated cluster consisting of thousands of nodes.

Admin UI
~~~~~~~~

A system administrator uses the DataLake Admin UI to:

- Configure external database connections, such as host, port and login (see the example after the schema below).
- Configure which Topics to monitor, and in which databases to store the data for each Topic.
- Pre-configure 3rd-party tool dashboards and templates.

This UI tool is used to manage all the DataLake settings stored in MariaDB. Here is the database schema:

.. image:: ./dbschema.PNG
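
As an example of configuring a database connection, the sketch below POSTs
connection settings to the Feeder's REST API, which the Admin UI uses under the
hood. The endpoint path, port, and JSON fields here are assumptions for
illustration and should be checked against the actual API.

.. code-block:: java

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class AddDbConnection {
        public static void main(String[] args) throws Exception {
            // Endpoint path, port and JSON fields are assumptions for illustration;
            // consult the Feeder REST API for the actual contract.
            String body = "{\"name\": \"Elasticsearch\", \"host\": \"dl-es\", \"port\": 9200, \"login\": \"admin\"}";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://datalake-feeder:1680/datalake/v1/dbs"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
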
Feeder
~~~~~~

Architecture

.. image:: ./feeder-arch.PNG

Features

- Read data directly from Kafka for performance.
- Support pluggable databases. To add a new database, we only need to implement its corresponding service.
- Support a REST API for inter-component communication. Besides managing DataLake settings in MariaDB,
  the Admin UI also uses this API to start/stop the Feeder and to query the Feeder's status and statistics.
- Use MariaDB to store settings.
- Support data-processing features: before being persisted, data can be massaged in the Feeder.
  Two such features are currently implemented: Correlate Cleared Message (in org.onap.datalake.feeder.service.db.ElasticsearchService)
  and Flatten JSON Array (in org.onap.datalake.feeder.service.StoreService); a sketch of the latter follows this list.
- Connections to Kafka and the databases are secured.
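
To picture the Flatten JSON Array feature, here is a minimal sketch (not the actual
StoreService code): when a message body is a JSON array, it is split into one
document per element before persisting, so each element can be stored and queried
independently.

.. code-block:: java

    import java.util.ArrayList;
    import java.util.List;

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class FlattenDemo {

        // If the message body is a JSON array, split it into one document per
        // element; otherwise wrap the single object as-is.
        static List<JSONObject> flatten(String message) {
            List<JSONObject> docs = new ArrayList<>();
            String trimmed = message.trim();
            if (trimmed.startsWith("[")) {
                JSONArray array = new JSONArray(trimmed);
                for (int i = 0; i < array.length(); i++) {
                    docs.add(array.getJSONObject(i));
                }
            } else {
                docs.add(new JSONObject(trimmed));
            }
            return docs;
        }

        public static void main(String[] args) {
            // Two events arriving as one array become two separate documents.
            String message = "[{\"event\":1},{\"event\":2}]";
            flatten(message).forEach(doc -> System.out.println(doc));
        }
    }
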
Links
~~~~~

- DataLake Development Environment Setup: https://wiki.onap.org/display/DW/DataLake+Development+Environment+Setup
- Source Code: https://gerrit.onap.org/r/gitweb?p=dcaegen2/services.git;a=tree;f=components/datalake-handler;hb=HEAD