# Distributed Analytics Framework


## Pre-requisites
| Required   | Version |
|------------|---------|
| Kubernetes | 1.12.3+ |
| Docker CE  | 18.09+  |
| Helm       | >=2.12.1 and <=2.13.1 |
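
To check an installed version against the ranges in the table, a small helper can compare dotted version strings. This is a minimal sketch and not part of the framework: `version_in_range` is a hypothetical name, and it assumes a `sort` that supports `-V` (as in GNU coreutils).

```bash
# version_in_range VERSION MIN MAX: succeeds if MIN <= VERSION <= MAX.
# Runs in a subshell so the helper variables do not leak into the caller.
version_in_range() (
  lowest=$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)
  highest=$(printf '%s\n' "$1" "$3" | sort -V | tail -n 1)
  [ "$lowest" = "$2" ] && [ "$highest" = "$3" ]
)

# Example with the Helm range from the table above:
# version_in_range 2.13.0 2.12.1 2.13.1 && echo "Helm version is supported"
```
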
## Download Framework
```bash
git clone https://github.com/onap/demo.git
DA_WORKING_DIR=$PWD/demo/vnfs/DAaaS/deploy
```

## Install Rook-Ceph for Persistent Storage
Note: The flexvolume path can occasionally differ from the default value. values.yaml has the most common flexvolume path configured. In case of flexvolume-related errors, refer to https://rook.io/docs/rook/v0.9/flexvolume.html#configuring-the-flexvolume-path to find the appropriate flexvolume path and set it in values.yaml.
```bash
cd $DA_WORKING_DIR/00-init/rook-ceph
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```
Check the status of the pods in the rook-ceph-system and rook-ceph namespaces. Once all pods are in the Ready state, move on to the next section.

```bash
$ kubectl get pods -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-9wszf                1/1     Running   0          121s
rook-ceph-agent-xnbt8                1/1     Running   0          121s
rook-ceph-operator-bc77d6d75-ltwww   1/1     Running   0          158s
rook-discover-bvj65                  1/1     Running   0          133s
rook-discover-nbfrp                  1/1     Running   0          133s
```
```bash
$ kubectl -n rook-ceph get pod
NAME                                   READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-d9dcf5748-5s9ft        1/1     Running     0          77s
rook-ceph-mon-a-7d8f675889-nw5pl       1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk       1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h       1/1     Running     0          85s
rook-ceph-osd-0-7cbbbf749f-j8fsd       1/1     Running     0          25s
rook-ceph-osd-1-7f67f9646d-44p7v       1/1     Running     0          25s
rook-ceph-osd-2-6cd4b776ff-v4d68       1/1     Running     0          25s
rook-ceph-osd-prepare-vx2rz            0/2     Completed   0          60s
rook-ceph-tools-5bd5cdb949-j68kk       1/1     Running     0          53s
```
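
The readiness check above can also be scripted. The sketch below is illustrative: `all_pods_settled` is a hypothetical helper that reads `kubectl get pods --no-headers` output on standard input and succeeds only when every pod is either `Running` with all containers ready or `Completed` (as the `osd-prepare` pod is).

```bash
# all_pods_settled: reads `kubectl get pods --no-headers` output on stdin.
# Succeeds only if at least one pod is listed and every pod is either
# Completed, or Running with the READY column showing n/n.
all_pods_settled() {
  awk '
    $3 == "Completed" { next }
    $3 == "Running" { split($2, ready, "/"); if (ready[1] == ready[2]) next }
    { bad = 1 }
    END { exit (bad || NR == 0) }
  '
}

# Example: poll the rook-ceph namespace every 5 seconds until it settles.
# until kubectl get pods -n rook-ceph --no-headers | all_pods_settled; do sleep 5; done
```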

#### Troubleshooting Rook-Ceph installation

If Rook was previously installed on your machine, successfully or not,
and you are attempting a fresh installation of the Rook operator, you may run into issues.
The steps below help resolve them.

* First check whether any Rook CRDs exist:
```
kubectl get crds | grep rook
```
If this returns results like:
```
$ kubectl get crds | grep rook
cephblockpools.ceph.rook.io         2019-07-19T18:19:05Z
cephclusters.ceph.rook.io           2019-07-19T18:19:05Z
cephfilesystems.ceph.rook.io        2019-07-19T18:19:05Z
cephobjectstores.ceph.rook.io       2019-07-19T18:19:05Z
cephobjectstoreusers.ceph.rook.io   2019-07-19T18:19:05Z
volumes.rook.io                     2019-07-19T18:19:05Z
```
then delete the existing Rook CRDs by generating a manifest from the chart
and deleting the resources it defines:
```
helm template -n rook . -f values.yaml > ~/delete.yaml
kubectl delete -f ~/delete.yaml
```

After this, delete the following directory on all nodes.
```
sudo rm -rf /var/lib/rook/
```
Now attempt the installation again:
```
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```

## Install Operator package
### Build docker images
#### collectd-operator
```bash
cd $DA_WORKING_DIR/../microservices

## Note: The image tag and repository in the Collectd-operator helm charts need to match the IMAGE_NAME
IMAGE_NAME=dcr.cluster.local:32644/collectd-operator:latest
./build_image.sh collectd-operator $IMAGE_NAME
```
#### visualization-operator
```bash
cd $DA_WORKING_DIR/../microservices/visualization-operator

## Note: The image tag and repository in the Visualization-operator helm charts need to match the IMAGE_NAME
IMAGE_NAME=dcr.cluster.local:32644/visualization-operator:latest
./build/build_image.sh $IMAGE_NAME
```
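
Because the image repository and tag in the charts must match `IMAGE_NAME`, it can help to split the name into its two parts before editing the chart values. The sketch below is illustrative only: `split_image` is a hypothetical helper, and it assumes the tag is whatever follows the last `:` in the final path component, so a registry port such as `:32644` is not mistaken for a tag.

```bash
# split_image IMAGE: prints the repository and tag parts of an image name.
# If the final path component has no tag, Docker's default tag "latest" is used.
split_image() (
  image="$1"
  last="${image##*/}"          # e.g. collectd-operator:latest
  case "$last" in
    *:*) echo "repository=${image%:*}"; echo "tag=${last##*:}" ;;
    *)   echo "repository=${image}";    echo "tag=latest" ;;
  esac
)

split_image dcr.cluster.local:32644/collectd-operator:latest
# repository=dcr.cluster.local:32644/collectd-operator
# tag=latest
```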

### Install the Operator Package
```bash
cd $DA_WORKING_DIR/operator
helm install -n operator . -f values.yaml --namespace=operator
```
Check the status of the pods in the operator namespace and confirm that the Prometheus operator pods are in the Ready state.
```bash
$ kubectl get pods -n operator
NAME                                                      READY   STATUS    RESTARTS
m3db-operator-0                                           1/1     Running   0
op-etcd-operator-etcd-backup-operator-6cdc577f7d-ltgsr    1/1     Running   0
op-etcd-operator-etcd-operator-79fd99f8b7-fdc7p           1/1     Running   0
op-etcd-operator-etcd-restore-operator-855f7478bf-r7qxp   1/1     Running   0
op-prometheus-operator-operator-5c9b87965b-wjtw5          1/1     Running   1
op-sparkoperator-6cb4db884c-75rcd                         1/1     Running   0
strimzi-cluster-operator-5bffdd7b85-rlrvj                 1/1     Running   0
```
#### Troubleshooting Operator installation
Sometimes deleting a previously installed Operator package fails to remove all operator pods. To troubleshoot this, follow these steps.

1. Make sure that all other deployments and helm releases are deleted (purged). The Operator package is a baseline package for the applications, so deleting it while the applications are still running might leave the cluster in an unwanted state.

2. Delete all the resources and CRDs associated with operator package.
```bash
#NOTE: Use the same release name and namespace as in installation of operator package in the previous section
cd $DA_WORKING_DIR/operator
helm template -n operator . -f values.yaml --namespace=operator > ../delete_operator.yaml
cd ../
kubectl delete -f delete_operator.yaml
```
## Install Collection package
Note: collectd.conf is available in the $DA_WORKING_DIR/collection/charts/collectd/resources/config directory. Any valid collectd.conf can be placed here.
```bash
# Default (for custom collectd, skip this section)
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1

# Custom collectd
# 1. Build the custom collectd image.
# 2. Set COLLECTD_IMAGE_NAME to the appropriate image_repository:tag.
# 3. Push the image to the docker registry:
docker push ${COLLECTD_IMAGE_NAME}
# 4. Edit values.yaml and change the image repository and tag to match
#    COLLECTD_IMAGE_NAME.
# 5. Place the collectd.conf in
#    $DA_WORKING_DIR/collection/charts/collectd/resources
# 6. Install the chart:
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1
```

#### Verify Collection package
* Check if all pods are up in edge1 namespace
* Check the prometheus UI using port-forwarding port 9090 (default for prometheus service)
```
$ kubectl get pods -n edge1
NAME                                      READY   STATUS    RESTARTS   AGE
cp-cadvisor-8rk2b                       1/1     Running   0          15s
cp-cadvisor-nsjr6                       1/1     Running   0          15s
cp-collectd-h5krd                       1/1     Running   0          23s
cp-collectd-jc9m2                       1/1     Running   0          23s
cp-prometheus-node-exporter-blc6p       1/1     Running   0          17s
cp-prometheus-node-exporter-qbvdx       1/1     Running   0          17s
prometheus-cp-prometheus-prometheus-0   4/4     Running   1          33s

$ kubectl get svc -n edge1
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)  
cadvisor                        NodePort    10.43.53.122   <none>        80:30091/TCP
collectd                        ClusterIP   10.43.222.34   <none>        9103/TCP
cp13-prometheus-node-exporter   ClusterIP   10.43.17.242   <none>        9100/TCP
cp13-prometheus-prometheus      NodePort    10.43.26.155   <none>        9090:30090/TCP
prometheus-operated             ClusterIP   None           <none>        9090/TCP
```
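
For the port-forwarding check above, a sketch like the following may help. It assumes the `prometheus-operated` service shown in the `kubectl get svc` output and the `edge1` namespace; `prometheus_port_forward` is a hypothetical helper name.

```bash
# prometheus_port_forward [NAMESPACE] [LOCAL_PORT]
# Forwards LOCAL_PORT (default 9090) on this machine to port 9090 of the
# prometheus-operated service. Runs until interrupted with Ctrl-C.
prometheus_port_forward() {
  kubectl port-forward -n "${1:-edge1}" svc/prometheus-operated "${2:-9090}:9090"
}

# Example: prometheus_port_forward edge1 9090
# Then open http://localhost:9090 in a browser.
```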
#### Configure Collectd Plugins
1. Using the sample [collectdglobal.yaml](microservices/collectd-operator/examples/collectd/collectdglobal.yaml), configure the CollectdGlobal CR.
2. If there are additional types.db files to update, copy them to the resources folder.
3. Create a ConfigMap to load the types.db and update the configMap with the name of the ConfigMap created.
4. Create and configure the required CollectdPlugin CRs. Use these samples as a reference: [cpu_collectdplugin_cr.yaml](microservices/collectd-operator/examples/collectd/cpu_collectdplugin_cr.yaml), [prometheus_collectdplugin_cr.yaml](microservices/collectd-operator/examples/collectd/prometheus_collectdplugin_cr.yaml).
5. Use the same namespace where the collection package was installed.
6. Assuming it is edge1, create the config resources that are applicable. Apply the following commands in the same order.
```bash
# Note:
## 1. Creating the ConfigMap is optional and required only if an additional types.db file needs to be mounted.
## 2. Add/remove --from-file accordingly. Use the correct file name based on the context.
kubectl create configmap typesdb-configmap -n edge1 --from-file ./resource/[FILE_NAME1] --from-file ./resource/[FILE_NAME2]
kubectl create -n edge1 -f collectdglobal.yaml
kubectl create -n edge1 -f [PLUGIN_NAME1]_collectdplugin_cr.yaml
kubectl create -n edge1 -f [PLUGIN_NAME2]_collectdplugin_cr.yaml
kubectl create -n edge1 -f [PLUGIN_NAME3]_collectdplugin_cr.yaml
...
```
#### Configure Grafana Datasources
Using the sample [prometheus_grafanadatasource_cr.yaml](microservices/visualization-operator/examples/grafana/prometheus_grafanadatasource_cr.yaml), configure the GrafanaDataSource CR by running the commands below.
```bash
kubectl create -f [DATASOURCE_NAME1]_grafanadatasource_cr.yaml
kubectl create -f [DATASOURCE_NAME2]_grafanadatasource_cr.yaml
...
```

## Install Minio Model repository
* Prerequisite: A dynamic storage provisioner needs to be enabled, either rook-ceph ($DA_WORKING_DIR/00-init) or an alternative provisioner.
```bash
cd $DA_WORKING_DIR/minio

# Edit values.yaml to set the credentials used to access the minio UI.
# Default values are:
#   accessKey: "onapdaas"
#   secretKey: "onapsecretdaas"

helm install -n minio . -f values.yaml --namespace=edge1
```

## Install Messaging platform

Currently, the strimzi-based kafka operator is supported.
Navigate to the ```$DA_WORKING_DIR/messaging/charts/strimzi-kafka-operator``` directory.
Use the command below:
```
helm install . -f values.yaml  --name sko --namespace=test
```

NOTE: Make changes in the values.yaml if required.

Once the strimzi operator is ready, you should see a pod like:

```
strimzi-cluster-operator-5cf7648b8c-zgxv7       1/1     Running   0          53m
```

Once this is done, install the kafka package like any other helm chart.
Navigate to the ```$DA_WORKING_DIR/messaging``` directory and use the command:
```
helm install --name kafka-cluster charts/kafka/
```

Once this is done, you should have the following pods up and running.

```
kafka-cluster-entity-operator-b6557fc6c-hlnkm   3/3     Running   0          47m
kafka-cluster-kafka-0                           2/2     Running   0          48m
kafka-cluster-kafka-1                           2/2     Running   0          48m
kafka-cluster-kafka-2                           2/2     Running   0          48m
kafka-cluster-zookeeper-0                       2/2     Running   0          49m
kafka-cluster-zookeeper-1                       2/2     Running   0          49m
kafka-cluster-zookeeper-2                       2/2     Running   0          49m
```

You should see the following services when you run ```kubectl get svc```

```
kafka-cluster-kafka-bootstrap    ClusterIP   10.XX.YY.ZZ   <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-kafka-brokers      ClusterIP   None           <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-zookeeper-client   ClusterIP   10.XX.YY.ZZ   <none>        2181/TCP                     55m
kafka-cluster-zookeeper-nodes    ClusterIP   None           <none>        2181/TCP,2888/TCP,3888/TCP   55m
```
#### Testing messaging 

You can test your kafka brokers by creating a simple producer and consumer.

Producer:
```
kubectl run kafka-producer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list kafka-cluster-kafka-bootstrap:9092 --topic my-topic
```
Consumer:
```
kubectl run kafka-consumer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
```

## Install Training Package

#### Install M3DB (Time series Data lake)
##### Pre-requisites
1.  Kubernetes cluster with at least 3 nodes
2.  Etcd operator, M3DB operator
3.  Nodes labelled with zone and region.

```bash
## Default region is us-west1. Default zone labels are us-west1-a, us-west1-b, us-west1-c
## If this is changed, then isolationGroups in training-core/charts/m3db/values.yaml needs to be updated.
NODES=($(kubectl get nodes --output=jsonpath='{.items..metadata.name}'))

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/region=us-west1

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/zone=us-west1-a --overwrite=true
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/zone=us-west1-b --overwrite=true
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/zone=us-west1-c --overwrite=true
```
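
After labelling, the region and zone assignment can be checked before installing M3DB. This is a small sketch using `kubectl get nodes` with label columns (`-L`); `show_node_topology` is a hypothetical helper name, not part of the framework.

```bash
# show_node_topology: lists all nodes with their region and zone labels
# shown as extra columns, so missing or wrong labels are easy to spot.
show_node_topology() {
  kubectl get nodes \
    -L failure-domain.beta.kubernetes.io/region \
    -L failure-domain.beta.kubernetes.io/zone
}

# Example: show_node_topology
```
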
```bash
cd $DA_WORKING_DIR/training-core/charts/m3db
helm install -n m3db . -f values.yaml --namespace training
```
```
$ kubectl get pods -n training
NAME                   READY   STATUS    RESTARTS   AGE
m3db-cluster-rep0-0    1/1     Running   0          103s
m3db-cluster-rep1-0    1/1     Running   0          83s
m3db-cluster-rep2-0    1/1     Running   0          62s
m3db-etcd-sjhgl4xfgc   1/1     Running   0          83s
m3db-etcd-lfs96hngz6   1/1     Running   0          67s
m3db-etcd-rmgdkkx4bq   1/1     Running   0          51s
```

##### Configure remote write from Prometheus to M3DB
```bash
cd $DA_WORKING_DIR/day2_configs/prometheus/
```
```bash
cat << EOF > add_m3db_remote.yaml
spec:
  remoteWrite:
  - url: "http://m3coordinator-m3db.training.svc.cluster.local:7201/api/v1/prom/remote/write"
    writeRelabelConfigs:
      - targetLabel: metrics_storage
        replacement: m3db_remote
EOF
```
```bash
kubectl patch --namespace=edge1 prometheus cp-prometheus-prometheus -p "$(cat add_m3db_remote.yaml)" --type=merge
```
Check the Prometheus GUI to verify that the m3db remote write is enabled.
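
The patch can also be checked from the command line by reading the remote write URLs back from the Prometheus custom resource. This is a minimal sketch assuming the `edge1` namespace and the `cp-prometheus-prometheus` resource used in the patch command; `remote_write_urls` is a hypothetical helper name.

```bash
# remote_write_urls [NAMESPACE] [PROMETHEUS_NAME]
# Prints the remoteWrite URLs configured on the Prometheus custom resource.
remote_write_urls() {
  kubectl get prometheus "${2:-cp-prometheus-prometheus}" -n "${1:-edge1}" \
    -o jsonpath='{.spec.remoteWrite[*].url}'
}

# Example: remote_write_urls edge1 cp-prometheus-prometheus
```

If the patch was applied, the output should include the m3coordinator URL from add_m3db_remote.yaml.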