author | mrichomme <morgan.richomme@orange.com> | 2020-12-08 15:46:33 +0100 |
---|---|---|
committer | Morgan Richomme <morgan.richomme@orange.com> | 2020-12-09 08:31:33 +0000 |
commit | 3a44fec5e4cfc9baff5f423b092be51d509f31e0 (patch) | |
tree | 7df41b9bd1a19735ff24e86bd5c1cf51315af2b3 /docs | |
parent | 94604b88ba9558bc502b6af63b342dcb621760ff (diff) |
Update stability test page
Include results for
- Daily Guilin CI page
- 24 HC test
- 6 days basic_vm test
Issue-ID: INT-1776
Signed-off-by: mrichomme <morgan.richomme@orange.com>
Change-Id: I219b87f1275e2ff48a2d4ffeecd6dc3a4fbbae11
(cherry picked from commit 7fee1429abc2927e3174e5bfc0bccda17a433822)
Diffstat (limited to 'docs')
-rw-r--r-- | docs/files/csv/stability_basic_vm.csv | 11 |
-rw-r--r-- | docs/files/csv/stability_cluster_metric_cpu.csv | 2 |
-rw-r--r-- | docs/files/csv/stability_cluster_metric_memory.csv | 2 |
-rw-r--r-- | docs/files/csv/stability_cluster_metric_network.csv | 2 |
-rw-r--r-- | docs/files/csv/stability_top10_memory.csv | 11 |
-rw-r--r-- | docs/files/s3p/basic_vm_duration.png | bin | 0 -> 36201 bytes |
-rw-r--r-- | docs/files/s3p/basic_vm_duration_histo.png | bin | 0 -> 29154 bytes |
-rw-r--r-- | docs/files/s3p/guilin_daily_healthcheck.png | bin | 0 -> 20733 bytes |
-rw-r--r-- | docs/files/s3p/guilin_daily_infrastructure_healthcheck.png | bin | 0 -> 19414 bytes |
-rw-r--r-- | docs/files/s3p/guilin_daily_security.png | bin | 0 -> 10143 bytes |
-rw-r--r-- | docs/files/s3p/guilin_daily_smoke.png | bin | 0 -> 17422 bytes |
-rw-r--r-- | docs/files/s3p/stability_sdnc_memory.png | bin | 0 -> 22416 bytes |
-rw-r--r-- | docs/integration-s3p.rst | 260 |
-rw-r--r-- | docs/onap-integration-ci.rst (renamed from docs/repo-onap-integration-ci.rst) | 0 |
14 files changed, 282 insertions, 6 deletions
diff --git a/docs/files/csv/stability_basic_vm.csv b/docs/files/csv/stability_basic_vm.csv
new file mode 100644
index 000000000..5ff8d0807
--- /dev/null
+++ b/docs/files/csv/stability_basic_vm.csv
@@ -0,0 +1,11 @@
+Basic_vm metric;Value
+Number of PASS occurences;557
+Number of Raw FAIL Occurences;174
+Raw Success rate; 76%
+Corrected success rate; 86%
+Average duration of the test;549s (9m9s)
+Min duration;188s (3m8s)
+Max duration;2161 (36m1s)
+Median duration;271s (4m34s)
+% of Duration < 282s; 50%
+% of duration > 660s; 29%
diff --git a/docs/files/csv/stability_cluster_metric_cpu.csv b/docs/files/csv/stability_cluster_metric_cpu.csv
new file mode 100644
index 000000000..9259086ef
--- /dev/null
+++ b/docs/files/csv/stability_cluster_metric_cpu.csv
@@ -0,0 +1,2 @@
+Namespace;Pods;Workloads;Memory Usage;CPU Requests;CPU Requests %;CPU Limits;CPU Limits %
+onap;242;181;10.31;79.93;13%;247.2;4%
diff --git a/docs/files/csv/stability_cluster_metric_memory.csv b/docs/files/csv/stability_cluster_metric_memory.csv
new file mode 100644
index 000000000..40c6fa566
--- /dev/null
+++ b/docs/files/csv/stability_cluster_metric_memory.csv
@@ -0,0 +1,2 @@
+Namespace;Pods;Workloads;Memory Usage;Memory Requests;Memory Requests %;Memory Limits;Memory Limits %
+onap;242;181;160.70 GiB;193.13 GiB;83.21%;493.09 GiB;32.59%
diff --git a/docs/files/csv/stability_cluster_metric_network.csv b/docs/files/csv/stability_cluster_metric_network.csv
new file mode 100644
index 000000000..46f02a7f7
--- /dev/null
+++ b/docs/files/csv/stability_cluster_metric_network.csv
@@ -0,0 +1,2 @@
+Namespace;Current Receive Bandwidth;Current Transmit Bandwidth;Rate of Received Packets;Rate of Transmitted Packets;Rate of Received Packets Dropped;Rate of Transmitted Packets Dropped
+onap; 1.03 MBs; 1.07 MBs;5.08 kpps;5.02 kpps;0 pps;0 pps
diff --git a/docs/files/csv/stability_top10_memory.csv b/docs/files/csv/stability_top10_memory.csv
new file mode 100644
index 000000000..127d717ae
--- /dev/null
+++ b/docs/files/csv/stability_top10_memory.csv
@@ -0,0 +1,11 @@
+Pod;Memory Usage;Memory Requests;Memory Requests %;Memory Limits;Memory Limits %
+onap-sdnc-0;5.56 GiB;2 Gi;278%;4 GiB;139%
+onap-portal-cassandra;5.5 GiB;2.8 GiB;160%;3.75 GiB;146%
+onap-appc;5.28 GiB;2 GiB;264%;4 GiB; 132%
+onap-cassandra-1;4.7 GiB;2.5 GiB;188%;4 GiB;117%
+onap-cassandra-2;4.7 GiB;2.5 GiB;188%;4 GiB;117%
+onap-cassandra-3;4.7 GiB;2.5 GiB;188%;4 GiB;117%
+onap-dcae-cloudify-manager;4.7 GiB;2 GiB;233%;4 GiB;115%
+onap-clamp-dash-es;3.57 GiB; 2.5 GiB;143%;4 GiB;89%
+onap-so-bpmn-infra;3.51 GiB;1 GiB; 351%;4 GiB;88%
+onap-awx;3.21 GiB;6 GiB;53%;;
diff --git a/docs/files/s3p/basic_vm_duration.png b/docs/files/s3p/basic_vm_duration.png
new file mode 100644
index 000000000..71e522681
--- /dev/null
+++ b/docs/files/s3p/basic_vm_duration.png
Binary files differ
diff --git a/docs/files/s3p/basic_vm_duration_histo.png b/docs/files/s3p/basic_vm_duration_histo.png
new file mode 100644
index 000000000..d201d3b81
--- /dev/null
+++ b/docs/files/s3p/basic_vm_duration_histo.png
Binary files differ
diff --git a/docs/files/s3p/guilin_daily_healthcheck.png b/docs/files/s3p/guilin_daily_healthcheck.png
new file mode 100644
index 000000000..34a58ebda
--- /dev/null
+++ b/docs/files/s3p/guilin_daily_healthcheck.png
Binary files differ
diff --git a/docs/files/s3p/guilin_daily_infrastructure_healthcheck.png b/docs/files/s3p/guilin_daily_infrastructure_healthcheck.png
new file mode 100644
index 000000000..be24c02ce
--- /dev/null
+++ b/docs/files/s3p/guilin_daily_infrastructure_healthcheck.png
Binary files differ
diff --git a/docs/files/s3p/guilin_daily_security.png b/docs/files/s3p/guilin_daily_security.png
new file mode 100644
index 000000000..1d3d518c0
--- /dev/null
+++ b/docs/files/s3p/guilin_daily_security.png
Binary files differ
diff --git a/docs/files/s3p/guilin_daily_smoke.png b/docs/files/s3p/guilin_daily_smoke.png
new file mode 100644
index 000000000..5200c575e
--- /dev/null
+++ b/docs/files/s3p/guilin_daily_smoke.png
Binary files differ
diff --git a/docs/files/s3p/stability_sdnc_memory.png b/docs/files/s3p/stability_sdnc_memory.png
new file mode 100644
index 000000000..c381077f5
--- /dev/null
+++ b/docs/files/s3p/stability_sdnc_memory.png
Binary files differ
diff --git a/docs/integration-s3p.rst b/docs/integration-s3p.rst
index e1220a002..70294f0d6 100644
--- a/docs/integration-s3p.rst
+++ b/docs/integration-s3p.rst
@@ -5,14 +5,262 @@
 ONAP Maturity Testing Notes
 ---------------------------
 
-Stability
-=========
+.. important::
+    The Release stability has been evaluated by:
 
-TODO
-A stability test is planned on the final Guilin dockers.
+    - The Daily Guilin CI/CD chain
+    - A simple 24h healthcheck verification
+    - A 7 days stability test
+
+.. note:
+    The scope of these tests remains limited and does not provide a full set of
+    KPIs to determinate the limits and the dimensioning of the ONAP solution.
 
 CI results
 ==========
 
-A daily Guilin CI chain has been created after RC0.
-Due to policy changes in dockerhub (new quotas), the chain has been unstable.
+As usual, a daily CI chain dedicated to the release is created after RC0.
+A Daily Guilin has been created on the 18th of November 2020.
+
+Unfortunately several technical issues disturbed the chain:
+
+- Due to policy changes in DockerHub (new quotas), the installation chain was
+  not stable as the quota limit was rapidly reached. As a consequence the
+  installation was incomplete and most of the tests were failing. The problem
+  was fixed by the subscription of unlimitted account on DockerHub.
+- Due to an upgrade of the Git Jenkins plugin done by LF IT, the synchronization
+  of the miror of the xtesting repository, used daily to generate the test suite
+  dockers was corrupted. The dockers were built daily from Jenkins but with an
+  id from the 25th of September. As a consequence the tests reported lots of
+  failure because they were corresponding to Frankfurt tests without the
+  adaptations done for Guilin. The problem was fixed temporarily by moving to
+  GitLab.com Docker registry then by the downgrade of the plugin executed by LF
+  IT during Thanksgiving break.
+
+The first week of the Daily Guilin results are therefore not really usable.
+Most of the results from the `daily Guilin result portal
+<https://logs.onap.org/onap-integration/daily/onap_daily_pod4_guilin/>`_
+are not trustable and may be misleading.
+The results became more stable from the the 6th of December.
+
+The graphs given hereafter are based on the data collected until the 8th of
+december. This Daily chain will be maintained during the Honolulu development
+cycle (Daily Master) and can be audited at any time. In case of reproducible
+errors, the integration team will open JIRA on Guilin.
+
+Several public Daily Guilin chains have been put in place, one in Orange
+(Helm v2) and one in DT (Helm v3). DT results are pushed in the test DB and can
+be observed in
+`ONAP Testing DT lab result page <http://testresults.opnfv.org/onap-integration/dt/dt.html>`_.
+
+Infrastructure Healthcheck Tests
+................................
+
+These tests deal with the Kubernetes/Helm tests on ONAP cluster.
+The global expected criteria is **50%** when installing with Helm 2.
+The onap-k8s and onap-k8s-teardown providing a snapshop of the onap namespace in
+kubernetes are expected to be PASS but two tests are expected to fail:
+
+- onap-helm (32/33 OK) due to the size of the SO helm chart (too big for Helm2).
+- nodeport_check_certs due to bad certificate issuers (Root CA certificate non
+  valid). In theory all the certificate shall be generated during the installation
+  and be valid for the 364 days after the installation. It is still not the case.
+  However, for the first time, no certificate was expired. Next certificates to
+  renew are:
+  - Music (2021-02-03)
+  - VID (2021-03-17)
+  - Message-router-external (2021-03-25)
+  - CDS-UI (2021-02-18)
+  - AAI and AAI-SPARKY-BE (2021-03-17)
+
+.. image:: files/s3p/guilin_daily_infrastructure_healthcheck.png
+   :align: center
+
+Healthcheck Tests
+.................
+
+These tests are the traditionnal robot healthcheck tests and additional tests
+dealing with a single component.
+
+The expectation is **100% OK**.
+
+.. image:: files/s3p/guilin_daily_healthcheck.png
+   :align: center
+
+Smoke Tests
+...........
+
+These tests are end to end tests.
+See the :ref:`the Integration Test page <integration-tests>` for details.
+
+The expectation is **100% OK**.
+
+.. figure:: files/s3p/guilin_daily_smoke.png
+   :align: center
+
+An error has been detected on the SDC when performing parallel tests.
+See `SDC-3366 <https://jira.onap.org/browse/SDC-3366>`_ for details.
+
+Security Tests
+..............
+
+These tests are tests dealing with security.
+See the :ref:`the Integration Test page <integration-tests>` for details.
+
+The expectation is **66% OK**. The criteria is met.
+
+It may even be above as 2 fail tests are almost correct:
+
+- the unlimited pod test is still fail due to only one pod: onap-ejbca.
+- the nonssl tests is FAIL due to so and os-vnfm adapter, which were supposed to
+  be managed with the ingress (not possible for this release) and got a waiver
+  in Frankfurt.
+
+.. figure:: files/s3p/guilin_daily_security.png
+   :align: center
+
+A simple 24h healthcheck verification
+=====================================
+
+This test consists in running the Healthcheck tests every 10 minutes during
+24h.
+
+The test was run from the 6th of december to the 7th of december.
+
+The success rate was 100%.
+
+The results are stored in the
+`test database <http://testresults.opnfv.org/onap/api/v1/results?pod_name=onap_daily_pod4_master-ONAP-oom&case_name=full>`_
+
+A 6 days stability test
+=======================
+
+This test consists on running the test basic_vm continuously during 1 week.
+
+We observe the cluster metrics as well as the evolution of the test duration.
+The test basic_vm is describe in :ref:`the Integration Test page <integration-tests>`.
+
+Within a long duration test context, the test will onboard a service once then
+instantiate this service multiple times. Before instantiating, it will
+systematically contact the SDC and the AAI to verify that the resources already
+exist. In this context the most impacted component is SO, which was delivered
+relatively late compared to the other components.
+
+Basic_vm test
+.............
+
+The basic_vm test consists in the different following steps:
+
+- [SDC] VendorOnboardStep: Onboard vendor in SDC.
+- [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
+- [SDC] YamlTemplateVfOnboardStep: Onboard vf described in YAML file in SDC.
+- [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
+  in SDC.
+- [AAI] RegisterCloudRegionStep: Register cloud region.
+- [AAI] ComplexCreateStep: Create complex.
+- [AAI] LinkCloudRegionToComplexStep: Connect cloud region with complex.
+- [AAI] CustomerCreateStep: Create customer.
+- [AAI] CustomerServiceSubscriptionCreateStep: Create customer's service
+  subscription.
+- [AAI] ConnectServiceSubToCloudRegionStep: Connect service subscription with
+  cloud region.
+- [SO] YamlTemplateServiceAlaCarteInstantiateStep: Instantiate service described
+  in YAML using SO a'la carte method.
+- [SO] YamlTemplateVnfAlaCarteInstantiateStep: Instantiate vnf described in YAML
+  using SO a'la carte method.
+- [SO] YamlTemplateVfModuleAlaCarteInstantiateStep: Instantiate VF module
+  described in YAML using SO a'la carte method.
+
+The test has been initiated on a weekly lab on the 2nd of december.
+The results provided hereafter correspond to the period from 2020-12-02 to
+2020-12-08.
+
+.. csv-table:: Basic_vm results
+   :file: ./files/csv/stability_basic_vm.csv
+   :widths: 70, 30
+   :delim: ;
+   :header-rows: 1
+
+.. note::
+
+   The corrected success rate excludes the FAIL results obtained during the SDNC
+   saturation phase.
+   The cause of the errors shall be analyzed more in details. The huge majority of
+   errors (79%) occurs on SO service creation, 18% on VNF creation and 3% on
+   module creation.
+
+.. important::
+   The test success rate is about 86%.
+   CPU consumption is low (see next section).
+   Memory consumption is high.
+
+   After ~ 24-48h, the test is systematically FAIL. The trace shows that the SDNC
+   is no more responding. This error required the manual restart of the SDNC.
+   It seems that the SDNC exceeds its limits set in OOM. The simple manual
+   restart (delete of the pod was enough, the test after the restart is PASS,
+   and keep most of the time PASS for the next 24-48h)
+
+We can observe the consequences of the manual restart of the SDNC on its memory
+graph as well as the memory threshold.
+
+.. figure:: files/s3p/stability_sdnc_memory.png
+   :align: center
+
+The duration of the test is increasing slowly over the week and can be described
+as follows:
+
+.. figure:: files/s3p/basic_vm_duration.png
+   :align: center
+
+If we consider the histogram, we can see the distribution of the duration.
+
+.. figure:: files/s3p/basic_vm_duration_histo.png
+   :align: center
+
+As a conclusion, the solution seems stable.
+
+The memory issue detected in the SDNC may be due to a bad sizing of the limits
+and requests in OOM but a problem of light memory leak cannot be exclude.
+The workaround consisting in restarting of the SDNC seems to fix the issue.
+The issue is tracked in `SDNC-1430 <https://jira.onap.org/browse/SDNC-1430>`_.
+Further study shall be done on this topic to consildate the detection of the
+root cause.
+
+Cluster metrics
+...............
+
+The Metrics of the ONAP cluster on this 6 days period are given by the
+following tables:
+
+.. csv-table:: CPU
+   :file: ./files/csv/stability_cluster_metric_cpu.csv
+   :widths: 20,10,10,10,10,10,10,10
+   :delim: ;
+   :header-rows: 1
+
+.. csv-table:: Memory
+   :file: ./files/csv/stability_cluster_metric_memory.csv
+   :widths: 20,10,10,10,10,10,10,10
+   :delim: ;
+   :header-rows: 1
+
+.. csv-table:: Network
+   :file: ./files/csv/stability_cluster_metric_network.csv
+   :widths: 10,15,15,15,15,15,15
+   :delim: ;
+   :header-rows: 1
+
+The Top Ten for Memory consumption is given in the table below:
+
+.. csv-table:: Memory
+   :file: ./files/csv/stability_top10_memory.csv
+   :widths: 20,15,15,20,15,15
+   :delim: ;
+   :header-rows: 1
+
+At least 9 components exceeds their Memory Requests. And 7 are over the Memory
+limits set in OOM: the 2 Opendaylight controllers and the cassandra Databases.
+
+As indicated CPU consumption is negligeable and not dimensioning.
+It shall be reconsider for use cases including extensive computation (loops,
+optimization algorithms).
diff --git a/docs/repo-onap-integration-ci.rst b/docs/onap-integration-ci.rst
index 150c82b40..150c82b40 100644
--- a/docs/repo-onap-integration-ci.rst
+++ b/docs/onap-integration-ci.rst
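Review note: the raw success rate recorded in `docs/files/csv/stability_basic_vm.csv` can be cross-checked from the PASS and FAIL counts in the same file. The sketch below is only an illustration and is not part of the commit. It assumes the raw rate is PASS / (PASS + FAIL); the helper name `raw_success_rate` and the inlined CSV excerpt are hypothetical conveniences, with the row values copied from the diff above.

```python
import csv
import io

# Excerpt of docs/files/csv/stability_basic_vm.csv as added by this commit
# (semicolon-delimited, same spelling as in the file).
BASIC_VM_CSV = """Basic_vm metric;Value
Number of PASS occurences;557
Number of Raw FAIL Occurences;174
Raw Success rate; 76%
"""


def raw_success_rate(csv_text):
    """Return PASS / (PASS + FAIL) as a percentage (assumed definition)."""
    rows = csv.reader(io.StringIO(csv_text), delimiter=";")
    next(rows)  # skip the header row
    values = {row[0].strip(): row[1].strip() for row in rows if len(row) == 2}
    passed = int(values["Number of PASS occurences"])
    failed = int(values["Number of Raw FAIL Occurences"])
    return 100.0 * passed / (passed + failed)


rate = raw_success_rate(BASIC_VM_CSV)
print(f"raw success rate: {rate:.1f}%")
```

Under that assumption the computed value rounds to the 76% stored in the CSV. The corrected rate of 86% cannot be recomputed this way, since the number of FAIL results excluded for the SDNC saturation phase is not given in the commit.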