author mrichomme <morgan.richomme@orange.com> 2020-12-08 15:46:33 +0100
committer mrichomme <morgan.richomme@orange.com> 2020-12-08 17:44:45 +0100
commit 7fee1429abc2927e3174e5bfc0bccda17a433822
tree 5584e51a14e4219a071476cc1463164adba4796e
parent 2c4c61213c9f2f8fba83a2244fa7afe0a6feb481
Update stability test page

Include results for
- Daily Guilin CI page
- 24 HC test
- 6 days basic_vm test

Issue-ID: INT-1776
Signed-off-by: mrichomme <morgan.richomme@orange.com>
Change-Id: I219b87f1275e2ff48a2d4ffeecd6dc3a4fbbae11
-rw-r--r-- docs/files/csv/stability_basic_vm.csv 11
-rw-r--r-- docs/files/csv/stability_cluster_metric_cpu.csv 2
-rw-r--r-- docs/files/csv/stability_cluster_metric_memory.csv 2
-rw-r--r-- docs/files/csv/stability_cluster_metric_network.csv 2
-rw-r--r-- docs/files/csv/stability_top10_memory.csv 11
-rw-r--r-- docs/files/s3p/basic_vm_duration.png bin 0 -> 36201 bytes
-rw-r--r-- docs/files/s3p/basic_vm_duration_histo.png bin 0 -> 29154 bytes
-rw-r--r-- docs/files/s3p/guilin_daily_healthcheck.png bin 0 -> 20733 bytes
-rw-r--r-- docs/files/s3p/guilin_daily_infrastructure_healthcheck.png bin 0 -> 19414 bytes
-rw-r--r-- docs/files/s3p/guilin_daily_security.png bin 0 -> 10143 bytes
-rw-r--r-- docs/files/s3p/guilin_daily_smoke.png bin 0 -> 17422 bytes
-rw-r--r-- docs/files/s3p/stability_sdnc_memory.png bin 0 -> 22416 bytes
-rw-r--r-- docs/integration-s3p.rst 260
-rw-r--r-- docs/onap-integration-ci.rst (renamed from docs/repo-onap-integration-ci.rst) 0
14 files changed, 282 insertions, 6 deletions
diff --git a/docs/files/csv/stability_basic_vm.csv b/docs/files/csv/stability_basic_vm.csv
new file mode 100644
index 000000000..5ff8d0807
--- /dev/null
+++ b/docs/files/csv/stability_basic_vm.csv
@@ -0,0 +1,11 @@
+Basic_vm metric;Value
+Number of PASS occurrences;557
+Number of Raw FAIL occurrences;174
+Raw Success rate; 76%
+Corrected success rate; 86%
+Average duration of the test;549s (9m9s)
+Min duration;188s (3m8s)
+Max duration;2161s (36m1s)
+Median duration;271s (4m34s)
+% of Duration < 282s; 50%
+% of duration > 660s; 29%
diff --git a/docs/files/csv/stability_cluster_metric_cpu.csv b/docs/files/csv/stability_cluster_metric_cpu.csv
new file mode 100644
index 000000000..9259086ef
--- /dev/null
+++ b/docs/files/csv/stability_cluster_metric_cpu.csv
@@ -0,0 +1,2 @@
+Namespace;Pods;Workloads;CPU Usage;CPU Requests;CPU Requests %;CPU Limits;CPU Limits %
+onap;242;181;10.31;79.93;13%;247.2;4%
diff --git a/docs/files/csv/stability_cluster_metric_memory.csv b/docs/files/csv/stability_cluster_metric_memory.csv
new file mode 100644
index 000000000..40c6fa566
--- /dev/null
+++ b/docs/files/csv/stability_cluster_metric_memory.csv
@@ -0,0 +1,2 @@
+Namespace;Pods;Workloads;Memory Usage;Memory Requests;Memory Requests %;Memory Limits;Memory Limits %
+onap;242;181;160.70 GiB;193.13 GiB;83.21%;493.09 GiB;32.59%
diff --git a/docs/files/csv/stability_cluster_metric_network.csv b/docs/files/csv/stability_cluster_metric_network.csv
new file mode 100644
index 000000000..46f02a7f7
--- /dev/null
+++ b/docs/files/csv/stability_cluster_metric_network.csv
@@ -0,0 +1,2 @@
+Namespace;Current Receive Bandwidth;Current Transmit Bandwidth;Rate of Received Packets;Rate of Transmitted Packets;Rate of Received Packets Dropped;Rate of Transmitted Packets Dropped
+onap;1.03 MB/s;1.07 MB/s;5.08 kpps;5.02 kpps;0 pps;0 pps
diff --git a/docs/files/csv/stability_top10_memory.csv b/docs/files/csv/stability_top10_memory.csv
new file mode 100644
index 000000000..127d717ae
--- /dev/null
+++ b/docs/files/csv/stability_top10_memory.csv
@@ -0,0 +1,11 @@
+Pod;Memory Usage;Memory Requests;Memory Requests %;Memory Limits;Memory Limits %
+onap-sdnc-0;5.56 GiB;2 GiB;278%;4 GiB;139%
+onap-portal-cassandra;5.5 GiB;2.8 GiB;160%;3.75 GiB;146%
+onap-appc;5.28 GiB;2 GiB;264%;4 GiB; 132%
+onap-cassandra-1;4.7 GiB;2.5 GiB;188%;4 GiB;117%
+onap-cassandra-2;4.7 GiB;2.5 GiB;188%;4 GiB;117%
+onap-cassandra-3;4.7 GiB;2.5 GiB;188%;4 GiB;117%
+onap-dcae-cloudify-manager;4.7 GiB;2 GiB;233%;4 GiB;115%
+onap-clamp-dash-es;3.57 GiB; 2.5 GiB;143%;4 GiB;89%
+onap-so-bpmn-infra;3.51 GiB;1 GiB; 351%;4 GiB;88%
+onap-awx;3.21 GiB;6 GiB;53%;;
diff --git a/docs/files/s3p/basic_vm_duration.png b/docs/files/s3p/basic_vm_duration.png
new file mode 100644
index 000000000..71e522681
--- /dev/null
+++ b/docs/files/s3p/basic_vm_duration.png
Binary files differ
diff --git a/docs/files/s3p/basic_vm_duration_histo.png b/docs/files/s3p/basic_vm_duration_histo.png
new file mode 100644
index 000000000..d201d3b81
--- /dev/null
+++ b/docs/files/s3p/basic_vm_duration_histo.png
Binary files differ
diff --git a/docs/files/s3p/guilin_daily_healthcheck.png b/docs/files/s3p/guilin_daily_healthcheck.png
new file mode 100644
index 000000000..34a58ebda
--- /dev/null
+++ b/docs/files/s3p/guilin_daily_healthcheck.png
Binary files differ
diff --git a/docs/files/s3p/guilin_daily_infrastructure_healthcheck.png b/docs/files/s3p/guilin_daily_infrastructure_healthcheck.png
new file mode 100644
index 000000000..be24c02ce
--- /dev/null
+++ b/docs/files/s3p/guilin_daily_infrastructure_healthcheck.png
Binary files differ
diff --git a/docs/files/s3p/guilin_daily_security.png b/docs/files/s3p/guilin_daily_security.png
new file mode 100644
index 000000000..1d3d518c0
--- /dev/null
+++ b/docs/files/s3p/guilin_daily_security.png
Binary files differ
diff --git a/docs/files/s3p/guilin_daily_smoke.png b/docs/files/s3p/guilin_daily_smoke.png
new file mode 100644
index 000000000..5200c575e
--- /dev/null
+++ b/docs/files/s3p/guilin_daily_smoke.png
Binary files differ
diff --git a/docs/files/s3p/stability_sdnc_memory.png b/docs/files/s3p/stability_sdnc_memory.png
new file mode 100644
index 000000000..c381077f5
--- /dev/null
+++ b/docs/files/s3p/stability_sdnc_memory.png
Binary files differ
diff --git a/docs/integration-s3p.rst b/docs/integration-s3p.rst
index e1220a002..70294f0d6 100644
--- a/docs/integration-s3p.rst
+++ b/docs/integration-s3p.rst
@@ -5,14 +5,262 @@
ONAP Maturity Testing Notes
---------------------------
-Stability
-=========
+.. important::
+ The Release stability has been evaluated by:
-TODO
-A stability test is planned on the final Guilin dockers.
+ - The Daily Guilin CI/CD chain
+ - A simple 24h healthcheck verification
+ - A 6-day stability test
+
+.. note::
+ The scope of these tests remains limited and does not provide a full set of
+ KPIs to determine the limits and the dimensioning of the ONAP solution.
CI results
==========
-A daily Guilin CI chain has been created after RC0.
-Due to policy changes in dockerhub (new quotas), the chain has been unstable.
+As usual, a daily CI chain dedicated to the release was created after RC0.
+The Daily Guilin chain was created on the 18th of November 2020.
+
+Unfortunately, several technical issues disturbed the chain:
+
+- Due to policy changes in DockerHub (new quotas), the installation chain was
+ not stable, as the quota limit was rapidly reached. As a consequence, the
+ installation was incomplete and most of the tests were failing. The problem
+ was fixed by subscribing to an unlimited account on DockerHub.
+- Due to an upgrade of the Git Jenkins plugin done by LF IT, the synchronization
+ of the mirror of the xtesting repository, used daily to generate the test suite
+ dockers, was corrupted. The dockers were built daily from Jenkins but with an
+ id from the 25th of September. As a consequence, the tests reported lots of
+ failures because they corresponded to Frankfurt tests without the
+ adaptations done for Guilin. The problem was fixed temporarily by moving to the
+ GitLab.com Docker registry, then by the downgrade of the plugin executed by LF
+ IT during the Thanksgiving break.
+
+The first week of Daily Guilin results is therefore not really usable.
+Most of the results from the `daily Guilin result portal
+<https://logs.onap.org/onap-integration/daily/onap_daily_pod4_guilin/>`_
+are not trustworthy and may be misleading.
+The results became more stable from the 6th of December.
+
+The graphs given hereafter are based on the data collected until the 8th of
+December. This Daily chain will be maintained during the Honolulu development
+cycle (Daily Master) and can be audited at any time. In case of reproducible
+errors, the integration team will open JIRA tickets on Guilin.
+
+Several public Daily Guilin chains have been put in place, one in Orange
+(Helm v2) and one in DT (Helm v3). DT results are pushed in the test DB and can
+be observed in
+`ONAP Testing DT lab result page <http://testresults.opnfv.org/onap-integration/dt/dt.html>`_.
+
+Infrastructure Healthcheck Tests
+................................
+
+These tests deal with the Kubernetes/Helm tests on the ONAP cluster.
+The global expected criterion is **50%** when installing with Helm 2.
+The onap-k8s and onap-k8s-teardown tests, which provide a snapshot of the onap
+namespace in Kubernetes, are expected to PASS, but two tests are expected to fail:
+
+- onap-helm (32/33 OK) due to the size of the SO helm chart (too big for Helm2).
+- nodeport_check_certs due to bad certificate issuers (Root CA certificate not
+ valid). In theory, all the certificates shall be generated during the installation
+ and be valid for the 364 days after the installation. It is still not the case.
+ However, for the first time, no certificate had expired. The next certificates to
+ renew are:
+ - Music (2021-02-03)
+ - VID (2021-03-17)
+ - Message-router-external (2021-03-25)
+ - CDS-UI (2021-02-18)
+ - AAI and AAI-SPARKY-BE (2021-03-17)
+
+.. image:: files/s3p/guilin_daily_infrastructure_healthcheck.png
+ :align: center
+
+Healthcheck Tests
+.................
+
+These tests are the traditional robot healthcheck tests and additional tests
+dealing with a single component.
+
+The expectation is **100% OK**.
+
+.. image:: files/s3p/guilin_daily_healthcheck.png
+ :align: center
+
+Smoke Tests
+...........
+
+These tests are end-to-end tests.
+See :ref:`the Integration Test page <integration-tests>` for details.
+
+The expectation is **100% OK**.
+
+.. figure:: files/s3p/guilin_daily_smoke.png
+ :align: center
+
+An error has been detected on the SDC when performing parallel tests.
+See `SDC-3366 <https://jira.onap.org/browse/SDC-3366>`_ for details.
+
+Security Tests
+..............
+
+These tests deal with security.
+See :ref:`the Integration Test page <integration-tests>` for details.
+
+The expectation is **66% OK**. The criterion is met.
+
+It may even be higher, as 2 failing tests are almost correct:
+
+- the unlimited pod test still fails due to only one pod: onap-ejbca.
+- the nonssl test fails due to so and os-vnfm adapter, which were supposed to
+ be managed with the ingress (not possible for this release) and got a waiver
+ in Frankfurt.
+
+.. figure:: files/s3p/guilin_daily_security.png
+ :align: center
+
+A simple 24h healthcheck verification
+=====================================
+
+This test consists in running the Healthcheck tests every 10 minutes for
+24h.
+
+The test was run from the 6th of December to the 7th of December.
+
+The success rate was 100%.
+
+The results are stored in the
+`test database <http://testresults.opnfv.org/onap/api/v1/results?pod_name=onap_daily_pod4_master-ONAP-oom&case_name=full>`_.
+
+A 6-day stability test
+======================
+
+This test consists in running the basic_vm test continuously for 1 week.
+
+We observe the cluster metrics as well as the evolution of the test duration.
+The basic_vm test is described in :ref:`the Integration Test page <integration-tests>`.
+
+Within a long duration test context, the test will onboard a service once then
+instantiate this service multiple times. Before instantiating, it will
+systematically contact the SDC and the AAI to verify that the resources already
+exist. In this context the most impacted component is SO, which was delivered
+relatively late compared to the other components.
+
+Basic_vm test
+.............
+
+The basic_vm test consists of the following steps:
+
+- [SDC] VendorOnboardStep: Onboard vendor in SDC.
+- [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
+- [SDC] YamlTemplateVfOnboardStep: Onboard vf described in YAML file in SDC.
+- [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
+ in SDC.
+- [AAI] RegisterCloudRegionStep: Register cloud region.
+- [AAI] ComplexCreateStep: Create complex.
+- [AAI] LinkCloudRegionToComplexStep: Connect cloud region with complex.
+- [AAI] CustomerCreateStep: Create customer.
+- [AAI] CustomerServiceSubscriptionCreateStep: Create customer's service
+ subscription.
+- [AAI] ConnectServiceSubToCloudRegionStep: Connect service subscription with
+ cloud region.
+- [SO] YamlTemplateServiceAlaCarteInstantiateStep: Instantiate service described
+ in YAML using SO à la carte method.
+- [SO] YamlTemplateVnfAlaCarteInstantiateStep: Instantiate vnf described in YAML
+ using SO à la carte method.
+- [SO] YamlTemplateVfModuleAlaCarteInstantiateStep: Instantiate VF module
+ described in YAML using SO à la carte method.
+
+The test was initiated on a weekly lab on the 2nd of December.
+The results provided hereafter correspond to the period from 2020-12-02 to
+2020-12-08.
+
+.. csv-table:: Basic_vm results
+ :file: ./files/csv/stability_basic_vm.csv
+ :widths: 70, 30
+ :delim: ;
+ :header-rows: 1
+
+.. note::
+
+ The corrected success rate excludes the FAIL results obtained during the SDNC
+ saturation phase.
+ The cause of the errors shall be analyzed in more detail. The vast majority of
+ errors (79%) occur on SO service creation, 18% on VNF creation and 3% on
+ module creation.
+
+.. important::
+ The test success rate is about 86%.
+ CPU consumption is low (see next section).
+ Memory consumption is high.
+
+ After ~ 24-48h, the test systematically fails. The trace shows that the SDNC
+ is no longer responding. This error required a manual restart of the SDNC.
+ It seems that the SDNC exceeds its limits set in OOM. A simple manual
+ restart (deleting the pod) was enough: the test after the restart passes,
+ and keeps passing most of the time for the next 24-48h.
+
+We can observe the consequences of the manual restart of the SDNC on its memory
+graph as well as the memory threshold.
+
+.. figure:: files/s3p/stability_sdnc_memory.png
+ :align: center
+
+The duration of the test is increasing slowly over the week and can be described
+as follows:
+
+.. figure:: files/s3p/basic_vm_duration.png
+ :align: center
+
+If we consider the histogram, we can see the distribution of the duration.
+
+.. figure:: files/s3p/basic_vm_duration_histo.png
+ :align: center
+
+As a conclusion, the solution seems stable.
+
+The memory issue detected in the SDNC may be due to a bad sizing of the limits
+and requests in OOM, but a light memory leak cannot be excluded.
+The workaround consisting in restarting the SDNC seems to fix the issue.
+The issue is tracked in `SDNC-1430 <https://jira.onap.org/browse/SDNC-1430>`_.
+Further study shall be done on this topic to consolidate the detection of the
+root cause.
+
+Cluster metrics
+...............
+
+The metrics of the ONAP cluster over this 6-day period are given by the
+following tables:
+
+.. csv-table:: CPU
+ :file: ./files/csv/stability_cluster_metric_cpu.csv
+ :widths: 20,10,10,10,10,10,10,10
+ :delim: ;
+ :header-rows: 1
+
+.. csv-table:: Memory
+ :file: ./files/csv/stability_cluster_metric_memory.csv
+ :widths: 20,10,10,10,10,10,10,10
+ :delim: ;
+ :header-rows: 1
+
+.. csv-table:: Network
+ :file: ./files/csv/stability_cluster_metric_network.csv
+ :widths: 10,15,15,15,15,15,15
+ :delim: ;
+ :header-rows: 1
+
+The Top Ten for Memory consumption is given in the table below:
+
+.. csv-table:: Memory
+ :file: ./files/csv/stability_top10_memory.csv
+ :widths: 20,15,15,20,15,15
+ :delim: ;
+ :header-rows: 1
+
+At least 9 components exceed their Memory Requests, and 7 are over the Memory
+limits set in OOM: the 2 OpenDaylight controllers, the Cassandra databases and the DCAE Cloudify manager.
+
+As indicated, CPU consumption is negligible and not a dimensioning factor.
+It shall be reconsidered for use cases including extensive computation (loops,
+optimization algorithms).
diff --git a/docs/repo-onap-integration-ci.rst b/docs/onap-integration-ci.rst
index 150c82b40..150c82b40 100644
--- a/docs/repo-onap-integration-ci.rst
+++ b/docs/onap-integration-ci.rst