aboutsummaryrefslogtreecommitdiffstats
path: root/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
blob: fcef4fa17b4a811f79b5529f1b304b69a3c05fbc (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
---
layout: global
title: HDFS on Kubernetes
---
# HDFS on Kubernetes
Repository holding helm charts for running Hadoop Distributed File System (HDFS)
on Kubernetes.

See [charts/README.md](charts/README.md) for how to run the charts.

See [tests/README.md](tests/README.md) for how to run integration tests for
HDFS on Kubernetes.


# Troubleshooting

In case some pods are in pending state, check by using kubectl describe command.
If describe shows :
```
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  7s (x20 over 66s)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
```

Then make sure you have the storage provisioner up and running.
In our case, its rook that we support.
So, rook should be up and be the default storage proviosner.

```
NAME                        PROVISIONER          AGE
rook-ceph-block (default)   ceph.rook.io/block   132m
```

Delete all the previous unbound PVCs like below :
```
NAME                           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-hdfs1-zookeeper-0         Pending                                                     108m
editdir-hdfs1-journalnode-0    Pending                                                     108m
metadatadir-hdfs1-namenode-0   Pending                                                     108m
```

```
kubectl delete pvc/data-hdfs1-zookeeper-0
kubectl delete pvc/editdir-hdfs1-journalnode-0
kubectl delete pvc/metadatadir-hdfs1-namenode-0 
```

#### If the dataNode restarts with the error:
```
19/07/19 21:22:55 FATAL datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to hdfs1-namenode-1.hdfs1-namenode.hdfs1.svc.cluster.local/XXX.YY.ZZ.KK:8020. Exiting. 
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
```

* SOLUTION:  Make sure that whatever host path you set for the dataNode is deleted and doesnt exist before you run the hdfs helm chart.
```
    - name: hdfs-data-0
      hostPath:
        path: /hdfs-data
```
In case you are reinstalling the HDFS, delete the host path : /hdfs-data 
before you proceed or else the above error shall come.