author     mrichomme <morgan.richomme@orange.com>        2021-11-09 17:26:29 +0100
committer  Morgan Richomme <morgan.richomme@orange.com>  2021-11-16 12:01:33 +0000
commit     7ff70854842a9c67e567d471dc05984fa6a22af8 (patch)
tree       b88f6ba5a16bd3f03c27c2449a8d2df87279655e
parent     b7abebe52794170cbc591616832e4cdbd3503701 (diff)
[S3P] Add S3P Istanbul documentation
This section includes the results of:

- CI tests
- stability tests
- resiliency tests

Issue-ID: INT-1988
Signed-off-by: mrichomme <morgan.richomme@orange.com>
Change-Id: I1ccaf28014e6e68022caf723796ffd166363f02b
(cherry picked from commit 8bb34f0622352d0aaaa17e832a05af17eae79bed)
-rw-r--r--  docs/files/csv/s3p-instantiation.csv                          |    6
-rw-r--r--  docs/files/csv/s3p-sdc.csv                                    |    6
-rw-r--r--  docs/files/s3p/istanbul-dashboard.png                         |  bin 0 -> 60652 bytes
-rw-r--r--  docs/files/s3p/istanbul_daily_healthcheck.png                 |  bin 0 -> 21941 bytes
-rw-r--r--  docs/files/s3p/istanbul_daily_infrastructure_healthcheck.png  |  bin 0 -> 21499 bytes
-rw-r--r--  docs/files/s3p/istanbul_daily_security.png                    |  bin 0 -> 16609 bytes
-rw-r--r--  docs/files/s3p/istanbul_daily_smoke.png                       |  bin 0 -> 21629 bytes
-rw-r--r--  docs/files/s3p/istanbul_instantiation_stability_10.png        |  bin 0 -> 90935 bytes
-rw-r--r--  docs/files/s3p/istanbul_resiliency.png                        |  bin 0 -> 15880 bytes
-rw-r--r--  docs/files/s3p/istanbul_sdc_stability.png                     |  bin 0 -> 75166 bytes
-rw-r--r--  docs/integration-s3p.rst                                      |  504
11 files changed, 172 insertions, 344 deletions
diff --git a/docs/files/csv/s3p-instantiation.csv b/docs/files/csv/s3p-instantiation.csv
new file mode 100644
index 000000000..6b3febd3d
--- /dev/null
+++ b/docs/files/csv/s3p-instantiation.csv
@@ -0,0 +1,6 @@
+Parameters;Istanbul;Honolulu
+Number of tests;1310;1410
+Global success rate;97%;96%
+Min duration;193s;81s
+Max duration;2128s;2000s
+Mean duration;564s;530s
\ No newline at end of file
diff --git a/docs/files/csv/s3p-sdc.csv b/docs/files/csv/s3p-sdc.csv
new file mode 100644
index 000000000..f89fef24a
--- /dev/null
+++ b/docs/files/csv/s3p-sdc.csv
@@ -0,0 +1,6 @@
+Parameters;Istanbul;Honolulu
+Number of tests;1085;715
+Global success rate;92%;93%
+Min duration;111s;80s
+Max duration;799s;1128s
+Mean duration;366s;565s
\ No newline at end of file
diff --git a/docs/files/s3p/istanbul-dashboard.png b/docs/files/s3p/istanbul-dashboard.png
new file mode 100644
index 000000000..f8bad42ad
--- /dev/null
+++ b/docs/files/s3p/istanbul-dashboard.png
Binary files differ
diff --git a/docs/files/s3p/istanbul_daily_healthcheck.png b/docs/files/s3p/istanbul_daily_healthcheck.png
new file mode 100644
index 000000000..e1cf16ae6
--- /dev/null
+++ b/docs/files/s3p/istanbul_daily_healthcheck.png
Binary files differ
diff --git a/docs/files/s3p/istanbul_daily_infrastructure_healthcheck.png b/docs/files/s3p/istanbul_daily_infrastructure_healthcheck.png
new file mode 100644
index 000000000..1e8877d0e
--- /dev/null
+++ b/docs/files/s3p/istanbul_daily_infrastructure_healthcheck.png
Binary files differ
diff --git a/docs/files/s3p/istanbul_daily_security.png b/docs/files/s3p/istanbul_daily_security.png
new file mode 100644
index 000000000..605edb140
--- /dev/null
+++ b/docs/files/s3p/istanbul_daily_security.png
Binary files differ
diff --git a/docs/files/s3p/istanbul_daily_smoke.png b/docs/files/s3p/istanbul_daily_smoke.png
new file mode 100644
index 000000000..cdeb999da
--- /dev/null
+++ b/docs/files/s3p/istanbul_daily_smoke.png
Binary files differ
diff --git a/docs/files/s3p/istanbul_instantiation_stability_10.png b/docs/files/s3p/istanbul_instantiation_stability_10.png
new file mode 100644
index 000000000..73749572a
--- /dev/null
+++ b/docs/files/s3p/istanbul_instantiation_stability_10.png
Binary files differ
diff --git a/docs/files/s3p/istanbul_resiliency.png b/docs/files/s3p/istanbul_resiliency.png
new file mode 100644
index 000000000..567a98c5c
--- /dev/null
+++ b/docs/files/s3p/istanbul_resiliency.png
Binary files differ
diff --git a/docs/files/s3p/istanbul_sdc_stability.png b/docs/files/s3p/istanbul_sdc_stability.png
new file mode 100644
index 000000000..67346cb0d
--- /dev/null
+++ b/docs/files/s3p/istanbul_sdc_stability.png
Binary files differ
diff --git a/docs/integration-s3p.rst b/docs/integration-s3p.rst
index b73a49318..b41a37323 100644
--- a/docs/integration-s3p.rst
+++ b/docs/integration-s3p.rst
@@ -20,10 +20,14 @@ CI results
----------
As usual, a daily CI chain dedicated to the release is created after RC0.
-A Honolulu chain has been created on the 6th of April 2021.
+An Istanbul chain was created on the 5th of November 2021.
The daily results can be found on the `LF daily results web site
-<https://logs.onap.org/onap-integration/daily/onap_daily_pod4_honolulu/2021-04/>`_.
+<https://logs.onap.org/onap-integration/daily/onap_daily_pod4_istanbul/>`_.
+
+.. image:: files/s3p/istanbul-dashboard.png
+ :align: center
+
Infrastructure Healthcheck Tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -31,8 +35,9 @@ Infrastructure Healthcheck Tests
These tests deal with the Kubernetes/Helm tests on the ONAP cluster.
The global expected criterion is **75%**.
-The onap-k8s and onap-k8s-teardown providing a snapshop of the onap namespace in
-Kubernetes as well as the onap-helm tests are expected to be PASS.
+
+The onap-k8s and onap-k8s-teardown tests, providing a snapshot of the onap
+namespace in Kubernetes, as well as the onap-helm tests are expected to be PASS.
nodeport_check_certs test is expected to fail. Even though tremendous progress
has been made in this area, some certificates (unmaintained, upstream or integration
@@ -40,7 +45,7 @@ robot pods) are still not correct due to bad certificate issuers (Root CA
certificate not valid) or extra-long validity. Most of the certificates have
been installed using cert-manager and will be easily renewable.
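
A quick way to review the state of those certificates is to query the
cert-manager resources directly; a minimal sketch, assuming the cert-manager
CRDs are deployed and the certificates live in the onap namespace:

.. code-block:: shell

   # List the cert-manager certificates and their READY status
   kubectl get certificates -n onap
   # Inspect one certificate (name is an example) for issuer and renewal time
   kubectl describe certificate -n onap onap-example-cert
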
-.. image:: files/s3p/honolulu_daily_infrastructure_healthcheck.png
+.. image:: files/s3p/istanbul_daily_infrastructure_healthcheck.png
:align: center
Healthcheck Tests
@@ -49,16 +54,9 @@ Healthcheck Tests
These tests are the traditional robot healthcheck tests and additional tests
dealing with a single component.
-Some tests (basic_onboard, basic_cds) may fail episodically due to the fact that
-the startup of the SDC is sometimes not fully completed.
-
-The same test is run as first step of smoke tests and is usually PASS.
-The mechanism to detect that all the components are fully operational may be
-improved, timer based solutions are not robust enough.
-
The expectation is **100% OK**.
-.. image:: files/s3p/honolulu_daily_healthcheck.png
+.. image:: files/s3p/istanbul_daily_healthcheck.png
:align: center
Smoke Tests
@@ -69,13 +67,13 @@ See the :ref:`the Integration Test page <integration-tests>` for details.
The expectation is **100% OK**.
-.. figure:: files/s3p/honolulu_daily_smoke.png
+.. figure:: files/s3p/istanbul_daily_smoke.png
:align: center
-An error has been detected on the SDNC preventing the basic_vm_macro to work.
-See `SDNC-1529 <https://jira.onap.org/browse/SDNC-1529/>`_ for details.
-We may also notice that SO timeouts occured more frequently than in Guilin.
-See `SO-3584 <https://jira.onap.org/browse/SO-3584>`_ for details.
+An error has been reported since Guilin (https://jira.onap.org/browse/SDC-3508):
+a possible race condition in SDC prevents the completion of the certification
+and leads to onboarding errors.
+This error may occur in case of parallel processing.
Security Tests
~~~~~~~~~~~~~~
@@ -83,243 +81,126 @@ Security Tests
These tests deal with security.
See :ref:`the Integration Test page <integration-tests>` for details.
-The expectation is **66% OK**. The criteria is met.
+Waivers have been granted to several projects for different tests.
+The list of waivers can be found at
+https://git.onap.org/integration/seccom/tree/waivers?h=istanbul.
-It may even be above as 2 fail tests are almost correct:
+The expectation is **100% OK**. The criterion is met.
-- The unlimited pod test is still fail due testing pod (DCAE-tca).
-- The nonssl tests is FAIL due to so and so-etsi-sol003-adapter, which were
- supposed to be managed with the ingress (not possible for this release) and
- got a waiver in Frankfurt. The pods cds-blueprints-processor-http and aws-web
- are used for tests.
-
-.. figure:: files/s3p/honolulu_daily_security.png
+.. figure:: files/s3p/istanbul_daily_security.png
:align: center
Resiliency tests
----------------
The goal of the resiliency testing was to evaluate the capability of the
-Honolulu solution to survive a stop or restart of a Kubernetes control or
-worker node.
-
-Controller node resiliency
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-By default the ONAP solution is installed with 3 controllers for high
-availability. The test for controller resiliency can be described as follows:
-
-- Run tests: check that they are PASS
-- Stop a controller node: check that the node appears in NotReady state
-- Run tests: check that they are PASS
-
-2 tests were performed on the weekly honolulu lab. No problem was observed on
-controller shutdown, tests were still PASS with a stoped controller node.
-
-More details can be found in <https://jira.onap.org/browse/TEST-309>.
-
-Worker node resiliency
-~~~~~~~~~~~~~~~~~~~~~~
-
-In community weekly lab, the ONAP pods are distributed on 12 workers. The goal
-of the test was to evaluate the behavior of the pod on a worker restart
-(disaster scenario assuming that the node was moved accidentally from Ready to
-NotReady state).
-The original conditions of such tests may be different as the Kubernetes
-scheduler does not distribute the pods on the same worker from an installation
-to another.
-
-The test procedure can be described as follows:
-
-- Run tests: check that they are PASS (Healthcheck and basic_vm used)
-- Check that all the workers are in ready state
- ::
- $ kubectl get nodes
- NAME STATUS ROLES AGE VERSION
- compute01-onap-honolulu Ready <none> 18h v1.19.9
- compute02-onap-honolulu Ready <none> 18h v1.19.9
- compute03-onap-honolulu Ready <none> 18h v1.19.9
- compute04-onap-honolulu Ready <none> 18h v1.19.9
- compute05-onap-honolulu Ready <none> 18h v1.19.9
- compute06-onap-honolulu Ready <none> 18h v1.19.9
- compute07-onap-honolulu Ready <none> 18h v1.19.9
- compute08-onap-honolulu Ready <none> 18h v1.19.9
- compute09-onap-honolulu Ready <none> 18h v1.19.9
- compute10-onap-honolulu Ready <none> 18h v1.19.9
- compute11-onap-honolulu Ready <none> 18h v1.19.9
- compute12-onap-honolulu Ready <none> 18h v1.19.9
- control01-onap-honolulu Ready master 18h v1.19.9
- control02-onap-honolulu Ready master 18h v1.19.9
- control03-onap-honolulu Ready master 18h v1.19.9
-
-- Select a worker, list the impacted pods
- ::
- $ kubectl get pod -n onap --field-selector spec.nodeName=compute01-onap-honolulu
- NAME READY STATUS RESTARTS AGE
- onap-aaf-fs-7b6648db7f-shcn5 1/1 Running 1 22h
- onap-aaf-oauth-5896545fb7-x6grg 1/1 Running 1 22h
- onap-aaf-sms-quorumclient-2 1/1 Running 1 22h
- onap-aai-modelloader-86d95c994b-87tsh 2/2 Running 2 22h
- onap-aai-schema-service-75575cb488-7fxs4 2/2 Running 2 22h
- onap-appc-cdt-58cb4766b6-vl78q 1/1 Running 1 22h
- onap-appc-db-0 2/2 Running 4 22h
- onap-appc-dgbuilder-5bb94d46bd-h2gbs 1/1 Running 1 22h
- onap-awx-0 4/4 Running 4 22h
- onap-cassandra-1 1/1 Running 1 22h
- onap-cds-blueprints-processor-76f8b9b5c7-hb5bg 1/1 Running 1 22h
- onap-dmaap-dr-db-1 2/2 Running 5 22h
- onap-ejbca-6cbdb7d6dd-hmw6z 1/1 Running 1 22h
- onap-kube2msb-858f46f95c-jws4m 1/1 Running 1 22h
- onap-message-router-0 1/1 Running 1 22h
- onap-message-router-kafka-0 1/1 Running 1 22h
- onap-message-router-kafka-1 1/1 Running 1 22h
- onap-message-router-kafka-2 1/1 Running 1 22h
- onap-message-router-zookeeper-0 1/1 Running 1 22h
- onap-multicloud-794c6dffc8-bfwr8 2/2 Running 2 22h
- onap-multicloud-starlingx-58f6b86c55-mff89 3/3 Running 3 22h
- onap-multicloud-vio-584d556876-87lxn 2/2 Running 2 22h
- onap-music-cassandra-0 1/1 Running 1 22h
- onap-netbox-nginx-8667d6675d-vszhb 1/1 Running 2 22h
- onap-policy-api-6dbf8485d7-k7cpv 1/1 Running 1 22h
- onap-policy-clamp-be-6d77597477-4mffk 1/1 Running 1 22h
- onap-policy-pap-785bd79759-xxhvx 1/1 Running 1 22h
- onap-policy-xacml-pdp-7d8fd58d59-d4m7g 1/1 Running 6 22h
- onap-sdc-be-5f99c6c644-dcdz8 2/2 Running 2 22h
- onap-sdc-fe-7577d58fb5-kwxpj 2/2 Running 2 22h
- onap-sdc-wfd-fe-6997567759-gl9g6 2/2 Running 2 22h
- onap-sdnc-dgbuilder-564d6475fd-xwwrz 1/1 Running 1 22h
- onap-sdnrdb-master-0 1/1 Running 1 22h
- onap-so-admin-cockpit-6c5b44694-h4d2n 1/1 Running 1 21h
- onap-so-etsi-sol003-adapter-c9bf4464-pwn97 1/1 Running 1 21h
- onap-so-sdc-controller-6899b98b8b-hfgvc 2/2 Running 2 21h
- onap-vfc-mariadb-1 2/2 Running 4 21h
- onap-vfc-nslcm-6c67677546-xcvl2 2/2 Running 2 21h
- onap-vfc-vnflcm-78ff4d8778-sgtv6 2/2 Running 2 21h
- onap-vfc-vnfres-6c96f9ff5b-swq5z 2/2 Running 2 21h
-
-- Stop the worker (shutdown the machine for baremetal or the VM if you installed
- your Kubernetes on top of an OpenStack solution)
-- Wait for the pod eviction procedure completion (5 minutes)
- ::
- $ kubectl get nodes
- NAME STATUS ROLES AGE VERSION
- compute01-onap-honolulu NotReady <none> 18h v1.19.9
- compute02-onap-honolulu Ready <none> 18h v1.19.9
- compute03-onap-honolulu Ready <none> 18h v1.19.9
- compute04-onap-honolulu Ready <none> 18h v1.19.9
- compute05-onap-honolulu Ready <none> 18h v1.19.9
- compute06-onap-honolulu Ready <none> 18h v1.19.9
- compute07-onap-honolulu Ready <none> 18h v1.19.9
- compute08-onap-honolulu Ready <none> 18h v1.19.9
- compute09-onap-honolulu Ready <none> 18h v1.19.9
- compute10-onap-honolulu Ready <none> 18h v1.19.9
- compute11-onap-honolulu Ready <none> 18h v1.19.9
- compute12-onap-honolulu Ready <none> 18h v1.19.9
- control01-onap-honolulu Ready master 18h v1.19.9
- control02-onap-honolulu Ready master 18h v1.19.9
- control03-onap-honolulu Ready master 18h v1.19.9
-
-- Run the tests: check that they are PASS
-
-.. warning::
- In these conditions, **the tests will never be PASS**. In fact several components
- will remeain in INIT state.
- A procedure is required to ensure a clean restart.
-
-List the non running pods::
-
- $ kubectl get pods -n onap --field-selector status.phase!=Running | grep -v Completed
- NAME READY STATUS RESTARTS AGE
- onap-appc-dgbuilder-5bb94d46bd-sxmmc 0/1 Init:3/4 15 156m
- onap-cds-blueprints-processor-76f8b9b5c7-m7nmb 0/1 Init:1/3 0 156m
- onap-portal-app-595bd6cd95-bkswr 0/2 Init:0/4 84 23h
- onap-portal-db-config-6s75n 0/2 Error 0 23h
- onap-portal-db-config-7trzx 0/2 Error 0 23h
- onap-portal-db-config-jt2jl 0/2 Error 0 23h
- onap-portal-db-config-mjr5q 0/2 Error 0 23h
- onap-portal-db-config-qxvdt 0/2 Error 0 23h
- onap-portal-db-config-z8c5n 0/2 Error 0 23h
- onap-sdc-be-5f99c6c644-kplqx 0/2 Init:2/5 14 156
- onap-vfc-nslcm-6c67677546-86mmj 0/2 Init:0/1 15 156m
- onap-vfc-vnflcm-78ff4d8778-h968x 0/2 Init:0/1 15 156m
- onap-vfc-vnfres-6c96f9ff5b-kt9rz 0/2 Init:0/1 15 156m
-
-Some pods are not rescheduled (i.e. onap-awx-0 and onap-cassandra-1 above)
-because they are part of a statefulset. List the statefulset objects::
-
- $ kubectl get statefulsets.apps -n onap | grep -v "1/1" | grep -v "3/3"
- NAME READY AGE
- onap-aaf-sms-quorumclient 2/3 24h
- onap-appc-db 2/3 24h
- onap-awx 0/1 24h
- onap-cassandra 2/3 24h
- onap-dmaap-dr-db 2/3 24h
- onap-message-router 0/1 24h
- onap-message-router-kafka 0/3 24h
- onap-message-router-zookeeper 2/3 24h
- onap-music-cassandra 2/3 24h
- onap-sdnrdb-master 2/3 24h
- onap-vfc-mariadb 2/3 24h
-
-For the pods being part of the statefulset, a forced deleteion is required.
-As an example if we consider the statefulset onap-sdnrdb-master, we must follow
-the procedure::
-
- $ kubectl get pods -n onap -o wide |grep onap-sdnrdb-master
- onap-sdnrdb-master-0 1/1 Terminating 1 24h 10.42.3.92 node1
- onap-sdnrdb-master-1 1/1 Running 1 24h 10.42.1.122 node2
- onap-sdnrdb-master-2 1/1 Running 1 24h 10.42.2.134 node3
-
- $ kubectl delete -n onap pod onap-sdnrdb-master-0 --force
- warning: Immediate deletion does not wait for confirmation that the running
- resource has been terminated. The resource may continue to run on the cluster
- indefinitely.
- pod "onap-sdnrdb-master-0" force deleted
-
- $ kubectl get pods |grep onap-sdnrdb-master
- onap-sdnrdb-master-0 0/1 PodInitializing 0 11s
- onap-sdnrdb-master-1 1/1 Running 1 24h
- onap-sdnrdb-master-2 1/1 Running 1 24h
-
- $ kubectl get pods |grep onap-sdnrdb-master
- onap-sdnrdb-master-0 1/1 Running 0 43s
- onap-sdnrdb-master-1 1/1 Running 1 24h
- onap-sdnrdb-master-2 1/1 Running 1 24h
-
-Once all the statefulset are properly restarted, the other components shall
-continue their restart properly.
-Once the restart of the pods is completed, the tests are PASS.
+Istanbul solution to survive a stop or restart of a Kubernetes worker node.
+
+This test has been automated thanks to the
+Litmus chaos framework (https://litmuschaos.io/) and integrated into the CI
+weekly chains.
+
+Two additional tests based on Litmus chaos scenarios have been added but will
+be tuned in Jakarta:
+
+- node cpu hog (temporary CPU stress on 1 Kubernetes node)
+- node memory hog (temporary memory stress on 1 Kubernetes node)
+
+The main test for Istanbul is the node drain test, corresponding to the
+resiliency scenario previously performed manually.
+
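+The scenario is driven by a Litmus ChaosEngine resource. The manifest below is
+only an illustrative sketch: the experiment and environment variable names
+(TARGET_NODE, TOTAL_CHAOS_DURATION) as well as the service account may differ
+between Litmus releases and from the manifests actually used in the CI.
+
+.. code-block:: shell
+
+   # Illustrative ChaosEngine triggering the node-drain experiment
+   cat <<'EOF' | kubectl apply -f -
+   apiVersion: litmuschaos.io/v1alpha1
+   kind: ChaosEngine
+   metadata:
+     name: onap-node-drain
+     namespace: onap
+   spec:
+     engineState: "active"
+     chaosServiceAccount: litmus-admin   # example service account
+     experiments:
+       - name: node-drain
+         spec:
+           components:
+             env:
+               - name: TARGET_NODE      # node to drain (example value)
+                 value: "compute01-onap-istanbul"
+               - name: TOTAL_CHAOS_DURATION
+                 value: "300"
+   EOF
+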
+The system under test is defined in OOM.
+The resources are described in the table below:
+
+.. code-block:: shell
+
+   +-------------------------+-------+--------+--------+
+   | Name                    | vCPUs | Memory | Disk   |
+   +-------------------------+-------+--------+--------+
+   | compute12-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute11-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute10-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute09-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute08-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute07-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute06-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute05-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute04-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute03-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute02-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | compute01-onap-istanbul | 16    | 24 GB  | 10 GB  |
+   | etcd03-onap-istanbul    | 4     | 6 GB   | 10 GB  |
+   | etcd02-onap-istanbul    | 4     | 6 GB   | 10 GB  |
+   | etcd01-onap-istanbul    | 4     | 6 GB   | 10 GB  |
+   | control03-onap-istanbul | 4     | 6 GB   | 10 GB  |
+   | control02-onap-istanbul | 4     | 6 GB   | 10 GB  |
+   | control01-onap-istanbul | 4     | 6 GB   | 10 GB  |
+   +-------------------------+-------+--------+--------+
+
+
+The test sequence can be defined as follows:
+
+- Cordon a compute node (prevent any new scheduling)
+- Launch the node drain chaos scenario: all the pods on the given compute node
+  are evicted
+
+Once all the pods have been evicted:
+
+- Uncordon the compute node
+- Replay a basic_vm test
+
+This test has been successfully executed.
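+
+For reference, a manual equivalent of this sequence can be sketched with plain
+kubectl commands (the node name is taken from the inventory above; the drain
+flags may need to be adapted to the kubectl version):
+
+.. code-block:: shell
+
+   # Prevent any new pod from being scheduled on the target node
+   kubectl cordon compute01-onap-istanbul
+   # Evict the pods running on the node (DaemonSet pods are kept)
+   kubectl drain compute01-onap-istanbul --ignore-daemonsets
+   # Once the eviction is complete and the tests replayed, restore scheduling
+   kubectl uncordon compute01-onap-istanbul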
+
+.. image:: files/s3p/istanbul_resiliency.png
+ :align: center
.. important::
- K8s node reboots/shutdown is showing some deficiencies in ONAP components in
- regard of their availability measured with HC results. Some pods may
- still fail to initialize after reboot/shutdown(pod rescheduled).
+ Please note that the chaos framework selects one compute node (the first one
+ by default).
+ The distribution of the pods is random; on our target architecture about 15
+ pods are scheduled on each node. The chaos scenario therefore affects only a
+ limited number of pods.
+
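+The pods impacted by the scenario can be listed beforehand by filtering on the
+targeted node (same command as used in the former manual procedure; the node
+name is an example):
+
+.. code-block:: shell
+
+   # List the onap pods currently scheduled on the node selected for the drain
+   kubectl get pods -n onap --field-selector spec.nodeName=compute01-onap-istanbul
+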
+For the Istanbul tests, the evicted pods (compute01) were:
+
- However cluster as a whole behaves as expected, pods are rescheduled after
- node shutdown (except pods being part of statefulset which need to be deleted
- forcibly - normal Kubernetes behavior)
+.. code-block:: shell
- On rebooted node, should its downtime not exceed eviction timeout, pods are
- restarted back after it is again available.
+   NAME                                            READY   STATUS    RESTARTS   AGE
+   onap-aaf-service-dbd8fc76b-vnmqv                1/1     Running   0          2d19h
+   onap-aai-graphadmin-5799bfc5bb-psfvs            2/2     Running   0          2d19h
+   onap-cassandra-1                                1/1     Running   0          2d19h
+   onap-dcae-ves-collector-856fcb67bd-lb8sz        2/2     Running   0          2d19h
+   onap-dcaemod-distributor-api-85df84df49-zj9zn   1/1     Running   0          2d19h
+   onap-msb-consul-86975585d9-8nfs2                1/1     Running   0          2d19h
+   onap-multicloud-pike-88bb965f4-v2qc8            2/2     Running   0          2d19h
+   onap-netbox-nginx-5b9b57d885-hjv84              1/1     Running   0          2d19h
+   onap-portal-app-66d9f54446-sjhld                2/2     Running   0          2d19h
+   onap-sdnc-ueb-listener-5b6bb95c68-d24xr         1/1     Running   0          2d19h
+   onap-sdnc-web-8f5c9fbcc-2l8sp                   1/1     Running   0          2d19h
+   onap-so-779655cb6b-9tzq4                        2/2     Running   1          2d19h
+   onap-so-oof-adapter-54b5b99788-x7rlk            2/2     Running   0          2d19h
-Please see `Integration Resiliency page <https://jira.onap.org/browse/TEST-308>`_
-for details.
+In the future, it would be interesting to elaborate a resiliency testing
+strategy covering the eviction of all the critical components.
Stability tests
---------------
-Three stability tests have been performed in Honolulu:
+Stability tests have been performed on the Istanbul release:
- SDC stability test
-- Simple instantiation test (basic_vm)
- Parallel instantiation test
+The results can be found in the weekly backend logs
+https://logs.onap.org/onap-integration/weekly/onap_weekly_pod4_istanbul.
+
SDC stability test
~~~~~~~~~~~~~~~~~~
In this test, we consider the basic_onboard automated test and we run 5
-simultaneous onboarding procedures in parallel during 72h.
+simultaneous onboarding procedures in parallel for 24h.
The basic_onboard test consists of the following steps:
@@ -329,13 +210,13 @@ The basic_onboard test consists in the following steps:
- [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
in SDC.
-The test has been initiated on the honolulu weekly lab on the 19th of April.
+The test was initiated on the Istanbul weekly lab on the 14th of November.
As already observed in the daily|weekly|gating chains, we got race conditions on
some tests (https://jira.onap.org/browse/INT-1918).
-The success rate is above 95% on the 100 first model upload and above 80%
-until we onboard more than 500 models.
+The success rate is expected to be above 95% for the first 100 model uploads
+and above 80% until more than 500 models have been onboarded.
We may also notice that the function test_duration=f(time) increases
continuously. At the beginning the test takes about 200s, 24h later the same
@@ -345,31 +226,31 @@ explaining the linear decrease of the success rate.
The following graph provides a good view of the SDC stability test.
-.. image:: files/s3p/honolulu_sdc_stability.png
+.. image:: files/s3p/istanbul_sdc_stability.png
:align: center
-.. important::
- SDC can support up to 100s models onboarding.
- The onbaording duration increases linearly with the number of onboarded
- models
- After a while, the SDC is no more usable.
- No major Cluster resource issues have been detected during the test. The
- memory consumption is however relatively high regarding the load.
-
-.. image:: files/s3p/honolulu_sdc_stability_resources.png
- :align: center
-
+.. csv-table:: S3P Onboarding stability results
+ :file: ./files/csv/s3p-sdc.csv
+ :widths: 60,20,20
+ :delim: ;
+ :header-rows: 1
-Simple stability test
-~~~~~~~~~~~~~~~~~~~~~
-
-This test consists on running the test basic_vm continuously during 72h.
-
-We observe the cluster metrics as well as the evolution of the test duration.
+.. important::
+ The onboarding duration increases linearly with the number of onboarded
+ models. This has already been reported and may be due to the fact that models
+ cannot be deleted: the test client has to retrieve a list of models that keeps
+ growing. No limit tests have been performed.
+ However, 1085 onboarded models is already a very high figure regarding the
+ possible ONAP usage.
+ Moreover, the mean duration is much lower in Istanbul, which explains why it
+ was possible to run 35% more tests within the same time frame.
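+
+The growth of the model list can be observed directly on the SDC catalog API.
+The request below is illustrative only: the endpoint, port and credentials are
+assumptions and depend on the deployment (NodePort/Ingress) and on the SDC
+external API version.
+
+.. code-block:: shell
+
+   # Count the services known by the SDC catalogue (endpoint and credentials
+   # are placeholders to be adapted to the deployment)
+   curl -sk -u "<consumer>:<password>" \
+        -H "USER_ID: cs0008" \
+        -H "X-ECOMP-InstanceID: onap-doc" \
+        https://sdc-be.onap:8443/sdc/v1/catalog/services | grep -o '"uuid"' | wc -l
+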
-The test basic_vm is described in :ref:`the Integration Test page <integration-tests>`.
+Parallel instantiations stability test
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The basic_vm test consists in the different following steps:
+The test is based on the single basic_vm test, which can be described as follows:
- [SDC] VendorOnboardStep: Onboard vendor in SDC.
- [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
@@ -391,102 +272,37 @@ The basic_vm test consists in the different following steps:
- [SO] YamlTemplateVfModuleAlaCarteInstantiateStep: Instantiate VF module
described in YAML using SO a'la carte method.
-The test has been initiated on the Honolulu weekly lab on the 26th of April 2021.
-This test has been run after the test described in the next section.
-A first error occured after few hours (mariadbgalera), then the system
-automatically recovered for some hours before a full crash of the mariadb
-galera.
-
-::
-
- debian@control01-onap-honolulu:~$ kubectl get pod -n onap |grep mariadb-galera
- onap-mariadb-galera-0 1/2 CrashLoopBackOff 625 5d16h
- onap-mariadb-galera-1 1/2 CrashLoopBackOff 1134 5d16h
- onap-mariadb-galera-2 1/2 CrashLoopBackOff 407 5d16h
-
-
-It was unfortunately not possible to collect the root cause (logs of the first
-restart of onap-mariadb-galera-1).
-
-Community members reported that they already faced such issues and suggest to
-deploy a single maria instance instead of using MariaDB galera.
-Moreover, in Honolulu there were some changes in order to allign Camunda (SO)
-requirements for MariaDB galera..
-
-During the limited valid window, the success rate was about 78% (85% for the
-same test in Guilin).
-The duration of the test remain very variable as also already reported in Guilin
-(https://jira.onap.org/browse/SO-3419). The duration of the same test may vary
-from 500s to 2500s as illustrated in the following graph:
-
-.. image:: files/s3p/honolulu_so_stability_1_duration.png
- :align: center
-
-The changes in MariaDB galera seems to have introduced some issues leading to
-more unexpected timeouts.
-A troubleshooting campaign has been launched to evaluate possible evolutions in
-this area.
-
-Parallel instantiations stability test
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Still based on basic_vm, 5 instantiation attempts are done simultaneously on the
-ONAP solution during 48h.
+10 instantiation attempts are run simultaneously on the ONAP solution for 24h.
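+
+A possible way to drive such a campaign is sketched below; ``run_basic_vm`` is
+a hypothetical placeholder for the actual onaptests/xtesting entry point used
+in the CI:
+
+.. code-block:: shell
+
+   # Run 10 parallel workers, each replaying the basic_vm scenario for 24h
+   END=$(( $(date +%s) + 86400 ))
+   for i in $(seq 1 10); do
+     (
+       while [ "$(date +%s)" -lt "$END" ]; do
+         run_basic_vm   # hypothetical test entry point
+       done
+     ) &
+   done
+   wait
+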
The results can be described as follows:
-.. image:: files/s3p/honolulu_so_stability_5.png
+.. image:: files/s3p/istanbul_instantiation_stability_10.png
:align: center
-For this test, we have to restart the SDNC once. The last failures are due to
-a certificate infrastructure issue and are independent from ONAP.
-
-Cluster metrics
-~~~~~~~~~~~~~~~
+.. csv-table:: S3P Instantiation stability results
+ :file: ./files/csv/s3p-instantiation.csv
+ :widths: 60,20,20
+ :delim: ;
+ :header-rows: 1
+
+The results are good, with a success rate above 95%. After 24h, more than 1300
+VNFs have been created and deleted.
+
+As for SDC, we can observe a linear increase of the test duration. This issue
+has been reported since Guilin. For SDC, as it is not possible to delete the
+models, the duration increase can be explained by the continuously growing
+model database: the client has to retrieve an ever longer list of models.
+But for the instantiations this is not the case, as the references
+(module, VNF, service) are cleaned at the end of each test and all the tests
+use the same model. The duration of an instantiation test should therefore be
+almost constant, which is not the case. Further investigations are needed.
.. important::
- No major cluster resource issues have been detected in the cluster metrics
-
-The metrics of the ONAP cluster have been recorded over the full week of
-stability tests:
-
-.. csv-table:: CPU
- :file: ./files/csv/stability_cluster_metric_cpu.csv
- :widths: 20,20,20,20,20
- :delim: ;
- :header-rows: 1
-
-.. image:: files/s3p/honolulu_weekly_cpu.png
- :align: center
-
-.. image:: files/s3p/honolulu_weekly_memory.png
- :align: center
-
-The Top Ten for CPU consumption is given in the table below:
-
-.. csv-table:: CPU
- :file: ./files/csv/stability_top10_cpu.csv
- :widths: 20,15,15,20,15,15
- :delim: ;
- :header-rows: 1
-
-CPU consumption is negligeable and not dimensioning. It shall be reconsider for
-use cases including extensive computation (loops, optimization algorithms).
-
-The Top Ten for Memory consumption is given in the table below:
-
-.. csv-table:: Memory
- :file: ./files/csv/stability_top10_memory.csv
- :widths: 20,15,15,20,15,15
- :delim: ;
- :header-rows: 1
-
-Without surprise, the Cassandra databases are using most of the memory.
-
-The Top Ten for Network consumption is given in the table below:
-
-.. csv-table:: Network
- :file: ./files/csv/stability_top10_net.csv
- :widths: 10,15,15,15,15,15,15
- :delim: ;
- :header-rows: 1
+ The test has been executed with the mariadb-galera replica count set to 1
+ (3 by default). With this configuration the results over 24h are very
+ good. When set to 3, the error rate is higher and, after some hours,
+ most of the instantiations are failing.
+ However, even with the replica count set to 1, a test on the Master weekly
+ chain showed that the system hits another limit after about 35h
+ (https://jira.onap.org/browse/SO-3791).
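+
+A minimal sketch of how such a configuration can be applied through an OOM
+override file (the exact key names depend on the OOM release and on the charts
+actually deployed):
+
+.. code-block:: shell
+
+   # Override the default mariadb-galera replica count (3) with 1
+   printf 'mariadb-galera:\n  replicaCount: 1\n' > mariadb-override.yaml
+   # Redeploy/upgrade the onap chart with the override (OOM "deploy" helm plugin)
+   helm deploy dev local/onap --namespace onap -f mariadb-override.yaml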