author     Rajamohan Raj <rajamohan.raj@intel.com>    2019-07-19 22:32:58 +0000
committer  Rajamohan Raj <rajamohan.raj@intel.com>    2019-07-19 22:36:59 +0000
commit     18909c10d00e4591f931475a1082edb209b66071 (patch)
tree       b37273cf9aa20e0ad4d32b55b465cd3757cb6d54 /vnfs/DAaaS/deploy/training-core
parent     779fd6928d724de3731d36f92af04e904ed25693 (diff)
Adding few troubleshooting guidelines for HDFS.
Issue-ID: ONAPARC-391
Change-Id: Ic4cd3376aa668b46da1890a09aaf3f461a04e254
Signed-off-by: Rajamohan Raj <rajamohan.raj@intel.com>
Diffstat (limited to 'vnfs/DAaaS/deploy/training-core')
-rw-r--r--  vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md  55
1 file changed, 55 insertions(+), 0 deletions(-)
diff --git a/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md b/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
index ca694a19..fcef4fa1 100644
--- a/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
+++ b/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
@@ -10,3 +10,58 @@ See [charts/README.md](charts/README.md) for how to run the charts.
See [tests/README.md](tests/README.md) for how to run integration tests for
HDFS on Kubernetes.
+
+
+# Troubleshooting
+
+If some pods are stuck in the Pending state, inspect them with the `kubectl describe` command.
+If the describe output shows:
+```
+ Type Reason Age From Message
+ ---- ------ ---- ---- -------
+ Warning FailedScheduling 7s (x20 over 66s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
+```
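+
+For reference, a minimal way to locate and inspect the pending pods (substitute your own pod name and namespace):
+```
+kubectl get pods --all-namespaces | grep Pending
+kubectl describe pod <pending-pod-name> -n <namespace>
+```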
+
+then make sure the storage provisioner is up and running.
+In our case, Rook is the provisioner we support,
+so Rook should be running and set as the default storage provisioner.
+
+```
+NAME PROVISIONER AGE
+rook-ceph-block (default) ceph.rook.io/block 132m
+```
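+
+The listing above comes from `kubectl get storageclass`. If Rook is installed but not marked as default, one way to set it (assuming the storage class is named `rook-ceph-block`, as in the output above) is:
+```
+kubectl get storageclass
+kubectl patch storageclass rook-ceph-block \
+  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
+```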
+
+Delete all of the previously created unbound PVCs, such as the ones shown below:
+```
+NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
+data-hdfs1-zookeeper-0 Pending 108m
+editdir-hdfs1-journalnode-0 Pending 108m
+metadatadir-hdfs1-namenode-0 Pending 108m
+```
+
+```
+kubectl delete pvc/data-hdfs1-zookeeper-0
+kubectl delete pvc/editdir-hdfs1-journalnode-0
+kubectl delete pvc/metadatadir-hdfs1-namenode-0
+```
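+
+Equivalently, a small shell sketch that removes every PVC currently stuck in Pending (review the list first, since it deletes whatever matches):
+```
+kubectl get pvc --no-headers \
+  | awk '$2 == "Pending" {print $1}' \
+  | xargs -r kubectl delete pvc
+```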
+
+#### If the DataNode keeps restarting with the following error:
+```
+19/07/19 21:22:55 FATAL datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to hdfs1-namenode-1.hdfs1-namenode.hdfs1.svc.cluster.local/XXX.YY.ZZ.KK:8020. Exiting.
+java.io.IOException: All specified directories are failed to load.
+ at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
+ at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358)
+ at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
+ at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
+ at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
+ at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
+```
+
+* SOLUTION: Make sure that the host path you configured for the DataNode is deleted and does not exist before you install the HDFS Helm chart.
+```
+ - name: hdfs-data-0
+ hostPath:
+ path: /hdfs-data
+```
+If you are reinstalling HDFS, delete the host path /hdfs-data
+before you proceed; otherwise the error above will reappear.
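+
+For example, on every node that hosted a DataNode pod (the path matches the `hostPath` snippet above; adjust it if your chart values differ):
+```
+sudo rm -rf /hdfs-data
+```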