-rw-r--r-- vnfs/DAaaS/README.md 36
-rw-r--r-- vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md 55
2 files changed, 91 insertions, 0 deletions
diff --git a/vnfs/DAaaS/README.md b/vnfs/DAaaS/README.md
index 4b6fcf50..de701fd4 100644
--- a/vnfs/DAaaS/README.md
+++ b/vnfs/DAaaS/README.md
@@ -44,6 +44,42 @@ rook-ceph-osd-prepare-vx2rz 0/2 Completed 0 60s
rook-ceph-tools-5bd5cdb949-j68kk 1/1 Running 0 53s
```
+#### Troubleshooting Rook-Ceph installation
+
+If rook was previously installed on your machine, successfully or not,
+and you are attempting a fresh installation of the rook operator, you may run into issues.
+The steps below should resolve them.
+
+* First, check whether any rook CRDs already exist:
+```
+kubectl get crds | grep rook
+```
+If this returns results like:
+```
+otc@otconap7 /var/lib/rook $ kubectl get crds | grep rook
+cephblockpools.ceph.rook.io 2019-07-19T18:19:05Z
+cephclusters.ceph.rook.io 2019-07-19T18:19:05Z
+cephfilesystems.ceph.rook.io 2019-07-19T18:19:05Z
+cephobjectstores.ceph.rook.io 2019-07-19T18:19:05Z
+cephobjectstoreusers.ceph.rook.io 2019-07-19T18:19:05Z
+volumes.rook.io 2019-07-19T18:19:05Z
+```
+then delete these leftover rook CRDs: render the chart into a manifest file
+with the commands below, then delete the resources listed in that manifest:
+```
+helm template -n rook . -f values.yaml > ~/delete.yaml
+kubectl delete -f ~/delete.yaml
+```
+
+After this, delete the following directory on all nodes:
+```
+sudo rm -rf /var/lib/rook/
+```
+Now attempt the installation again:
+```
+helm install -n rook . -f values.yaml --namespace=rook-ceph-system
+```
+
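
Not part of this patch: a minimal sketch of the CRD check described in the troubleshooting steps above, written as small shell helpers so the check can be scripted before a reinstall. The function names `rook_crd_names` and `has_rook_crds` are hypothetical, and the sketch assumes the two-column `kubectl get crds` output shown in the example listing.

```shell
# Print only the rook CRD names from `kubectl get crds` output read on stdin.
# Assumption: the CRD name is the first column and ends in "rook.io".
rook_crd_names() {
  awk '$1 ~ /rook\.io$/ { print $1 }'
}

# Succeed (exit 0) if any rook CRD is present in the output read on stdin.
has_rook_crds() {
  [ -n "$(rook_crd_names)" ]
}

# Example usage (commented out so that sourcing this file has no side effects):
#   if kubectl get crds | has_rook_crds; then
#     echo "Leftover rook CRDs found; clean up before reinstalling."
#   fi
```

The helpers only read text from standard input, so they can be tried against saved command output without touching a cluster.
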
#### Install Operator package
```bash
cd $DA_WORKING_DIR/operator
diff --git a/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md b/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
index ca694a19..fcef4fa1 100644
--- a/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
+++ b/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
@@ -10,3 +10,58 @@ See [charts/README.md](charts/README.md) for how to run the charts.
See [tests/README.md](tests/README.md) for how to run integration tests for
HDFS on Kubernetes.
+
+
+# Troubleshooting
+
+If some pods are stuck in the Pending state, inspect them with the `kubectl describe` command.
+If the output shows:
+```
+ Type Reason Age From Message
+ ---- ------ ---- ---- -------
+ Warning FailedScheduling 7s (x20 over 66s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
+```
+
+then make sure that the storage provisioner is up and running.
+In our case, rook is the provisioner that we support,
+so rook should be up and set as the default storage provisioner:
+
+```
+NAME PROVISIONER AGE
+rook-ceph-block (default) ceph.rook.io/block 132m
+```
+
+Delete all previously created PVCs that are still unbound (Pending), such as:
+```
+NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
+data-hdfs1-zookeeper-0 Pending 108m
+editdir-hdfs1-journalnode-0 Pending 108m
+metadatadir-hdfs1-namenode-0 Pending 108m
+```
+
+```
+kubectl delete pvc/data-hdfs1-zookeeper-0
+kubectl delete pvc/editdir-hdfs1-journalnode-0
+kubectl delete pvc/metadatadir-hdfs1-namenode-0
+```
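
Not part of this patch: a minimal sketch of how the Pending PVC names could be picked out of `kubectl get pvc` output instead of being typed by hand. The function name `pending_pvc_names` is hypothetical, and the sketch assumes the column layout shown in the listing above (name first, status second, after one header line).

```shell
# Print the names of PVCs whose STATUS column is "Pending".
# Reads `kubectl get pvc` output on stdin and skips the header line.
pending_pvc_names() {
  awk 'NR > 1 && $2 == "Pending" { print $1 }'
}

# Example usage (commented out so that sourcing this file deletes nothing):
#   for pvc in $(kubectl get pvc | pending_pvc_names); do
#     kubectl delete "pvc/$pvc"
#   done
```

Reviewing the printed names before running the delete loop is a cheap safeguard, since the filter matches every Pending claim in the current namespace.
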
+
+#### If the dataNode restarts with the error:
+```
+19/07/19 21:22:55 FATAL datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to hdfs1-namenode-1.hdfs1-namenode.hdfs1.svc.cluster.local/XXX.YY.ZZ.KK:8020. Exiting.
+java.io.IOException: All specified directories are failed to load.
+ at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
+ at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358)
+ at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
+ at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
+ at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
+ at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
+```
+
+* SOLUTION: Make sure that the host path you set for the dataNode has been deleted and does not exist before you run the hdfs helm chart.
+```
+  - name: hdfs-data-0
+    hostPath:
+      path: /hdfs-data
+```
+If you are reinstalling HDFS, delete the host path /hdfs-data
+before you proceed; otherwise the above error will occur.
\ No newline at end of file