author     morganrol <morgan.richomme@orange.com>                 2021-04-06 19:57:44 +0200
committer  Bartek Grzybowski <b.grzybowski@partner.samsung.com>   2021-05-06 09:06:34 +0000
commit     81dc70f5e575ae4a972fc50e038733a0dc8481aa
tree       324edb7b75f89248860ab473abaf350210657e86 /docs/integration-s3p.rst
parent     a10322497f3e122a0fbd22f171dba88d131b1ae4
[DOC] Honolulu documentation
Update documentation for Honolulu
Issue-ID: INT-1888
Signed-off-by: morganrol <morgan.richomme@orange.com>
Change-Id: I66e6f397bcda898445e256dd8e22dfcb20847408
Diffstat (limited to 'docs/integration-s3p.rst')
-rw-r--r--   docs/integration-s3p.rst   524
1 file changed, 375 insertions, 149 deletions

.. This work is licensed under a
   Creative Commons Attribution 4.0 International License.

.. _integration-s3p:

Stability/Resiliency
====================

.. important::
   The release stability has been evaluated by:

   - The daily Honolulu CI/CD chain
   - Stability tests
   - Resiliency tests

.. note::
   The scope of these tests remains limited and does not provide a full set of
   KPIs to determine the limits and the dimensioning of the ONAP solution.

CI results
----------

As usual, a daily CI chain dedicated to the release is created after RC0.
A Honolulu chain has been created on the 6th of April 2021.

The daily results can be found on the `LF daily results web site
<https://logs.onap.org/onap-integration/daily/onap_daily_pod4_honolulu/2021-04/>`_.

Infrastructure Healthcheck Tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These tests deal with the Kubernetes/Helm tests on the ONAP cluster.

The global expected criteria is **75%**.
The onap-k8s and onap-k8s-teardown tests, providing a snapshot of the onap
namespace in Kubernetes, as well as the onap-helm tests are expected to be PASS.

The nodeport_check_certs test is expected to fail. Even though tremendous
progress has been made in this area, some certificates (unmaintained, upstream
or integration robot pods) are still not correct due to bad certificate issuers
(Root CA certificate not valid) or extra long validity. Most of the certificates
have been installed using cert-manager and will be easily renewable.

.. image:: files/s3p/honolulu_daily_infrastructure_healthcheck.png
   :align: center
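
As a quick cross-check of the certificate findings, the expiry date and issuer
of a certificate exposed on a NodePort can be inspected manually with openssl.
The sketch below is illustrative only; the address and port are placeholders,
not values taken from the lab::

  # Inspect the certificate exposed on a given NodePort (placeholder values).
  NODE_IP=10.12.5.2      # hypothetical worker node address
  NODE_PORT=30204        # hypothetical service NodePort
  echo | openssl s_client -connect "${NODE_IP}:${NODE_PORT}" 2>/dev/null \
    | openssl x509 -noout -issuer -enddate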

Healthcheck Tests
~~~~~~~~~~~~~~~~~

These tests are the traditional robot healthcheck tests and additional tests
dealing with a single component.

Some tests (basic_onboard, basic_cds) may fail episodically because the startup
of the SDC is sometimes not fully completed.

The same test is run as the first step of the smoke tests and is usually PASS.
The mechanism used to detect that all the components are fully operational may
be improved; timer-based solutions are not robust enough.

The expectation is **100% OK**.

.. image:: files/s3p/honolulu_daily_healthcheck.png
   :align: center
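
A more robust alternative to timer-based startup detection could be to poll the
workload readiness before launching the healthchecks. A minimal sketch, assuming
the onap namespace and a cluster admin context::

  # Wait until all onap deployments report the Available condition
  # instead of relying on a fixed sleep.
  kubectl wait -n onap --for=condition=Available deployment --all --timeout=15m

  # StatefulSets do not expose this condition; check their ready replicas separately.
  kubectl get statefulsets -n onap \
    -o custom-columns=NAME:.metadata.name,READY:.status.readyReplicas,WANTED:.spec.replicas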

Smoke Tests
~~~~~~~~~~~

These tests are end-to-end, automated use case tests.
See :ref:`the Integration Test page <integration-tests>` for details.

The expectation is **100% OK**.

.. figure:: files/s3p/honolulu_daily_smoke.png
   :align: center

An error has been detected on the SDNC preventing the basic_vm_macro test from
working. See `SDNC-1529 <https://jira.onap.org/browse/SDNC-1529/>`_ for details.
We may also notice that SO timeouts occurred more frequently than in Guilin.
See `SO-3584 <https://jira.onap.org/browse/SO-3584>`_ for details.

Security Tests
~~~~~~~~~~~~~~

These tests deal with security.
See :ref:`the Integration Test page <integration-tests>` for details.

The expectation is **66% OK**. The criteria is met.
It may even be higher, as the 2 failed tests are almost correct:

- The unlimited pod test still fails due to a single testing pod (DCAE-tca).
- The nonssl test fails due to so and so-etsi-sol003-adapter, which were
  supposed to be managed with the ingress (not possible for this release) and
  got a waiver in Frankfurt. The pods cds-blueprints-processor-http and aws-web
  are used for tests.

.. figure:: files/s3p/honolulu_daily_security.png
   :align: center
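
For information, the pods reported by the unlimited pod test can be
cross-checked directly on the cluster. A minimal sketch, assuming jq is
available on the client machine::

  # List the containers of the onap namespace that declare no resource limits.
  kubectl get pods -n onap -o json \
    | jq -r '.items[]
             | .metadata.name as $pod
             | .spec.containers[]
             | select((.resources.limits // {}) | length == 0)
             | "\($pod)/\(.name)"'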

Resiliency tests
----------------

The goal of the resiliency testing was to evaluate the capability of the
Honolulu solution to survive a stop or restart of a Kubernetes control or
worker node.

Controller node resiliency
~~~~~~~~~~~~~~~~~~~~~~~~~~

By default the ONAP solution is installed with 3 controllers for high
availability. The test for controller resiliency can be described as follows:

- Run tests: check that they are PASS
- Stop a controller node: check that the node appears in NotReady state
- Run tests: check that they are PASS

2 tests were performed on the weekly Honolulu lab. No problem was observed on
controller shutdown; the tests were still PASS with a stopped controller node.

More details can be found in `TEST-309 <https://jira.onap.org/browse/TEST-309>`_.
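
On an OpenStack-hosted cluster such as the community labs, the controller test
can be scripted as sketched below. The server name and the use of the openstack
client are assumptions for illustration, not the exact tooling used for these
runs::

  # 1. Baseline: all control plane nodes must be Ready.
  kubectl get nodes -l node-role.kubernetes.io/master

  # 2. Stop one controller VM (hypothetical server name).
  openstack server stop control02-onap-honolulu

  # 3. Watch the node switch to NotReady, then re-run the tests.
  kubectl get nodes control02-onap-honolulu -w

  # 4. Restart the node once the verification is done.
  openstack server start control02-onap-honolulu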

Worker node resiliency
~~~~~~~~~~~~~~~~~~~~~~

In the community weekly lab, the ONAP pods are distributed over 12 workers. The
goal of the test was to evaluate the behavior of the pods on a worker restart
(disaster scenario assuming that the node was moved accidentally from Ready to
NotReady state).
The original conditions of such tests may differ, as the Kubernetes scheduler
does not distribute the pods on the same workers from one installation to
another.

The test procedure can be described as follows:

- Run tests: check that they are PASS (Healthcheck and basic_vm used)
- Check that all the workers are in Ready state::

    $ kubectl get nodes
    NAME                      STATUS   ROLES    AGE   VERSION
    compute01-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute02-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute03-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute04-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute05-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute06-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute07-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute08-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute09-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute10-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute11-onap-honolulu   Ready    <none>   18h   v1.19.9
    compute12-onap-honolulu   Ready    <none>   18h   v1.19.9
    control01-onap-honolulu   Ready    master   18h   v1.19.9
    control02-onap-honolulu   Ready    master   18h   v1.19.9
    control03-onap-honolulu   Ready    master   18h   v1.19.9

- Select a worker and list the impacted pods::

    $ kubectl get pod -n onap --field-selector spec.nodeName=compute01-onap-honolulu
    NAME                                             READY   STATUS    RESTARTS   AGE
    onap-aaf-fs-7b6648db7f-shcn5                     1/1     Running   1          22h
    onap-aaf-oauth-5896545fb7-x6grg                  1/1     Running   1          22h
    onap-aaf-sms-quorumclient-2                      1/1     Running   1          22h
    onap-aai-modelloader-86d95c994b-87tsh            2/2     Running   2          22h
    onap-aai-schema-service-75575cb488-7fxs4         2/2     Running   2          22h
    onap-appc-cdt-58cb4766b6-vl78q                   1/1     Running   1          22h
    onap-appc-db-0                                   2/2     Running   4          22h
    onap-appc-dgbuilder-5bb94d46bd-h2gbs             1/1     Running   1          22h
    onap-awx-0                                       4/4     Running   4          22h
    onap-cassandra-1                                 1/1     Running   1          22h
    onap-cds-blueprints-processor-76f8b9b5c7-hb5bg   1/1     Running   1          22h
    onap-dmaap-dr-db-1                               2/2     Running   5          22h
    onap-ejbca-6cbdb7d6dd-hmw6z                      1/1     Running   1          22h
    onap-kube2msb-858f46f95c-jws4m                   1/1     Running   1          22h
    onap-message-router-0                            1/1     Running   1          22h
    onap-message-router-kafka-0                      1/1     Running   1          22h
    onap-message-router-kafka-1                      1/1     Running   1          22h
    onap-message-router-kafka-2                      1/1     Running   1          22h
    onap-message-router-zookeeper-0                  1/1     Running   1          22h
    onap-multicloud-794c6dffc8-bfwr8                 2/2     Running   2          22h
    onap-multicloud-starlingx-58f6b86c55-mff89       3/3     Running   3          22h
    onap-multicloud-vio-584d556876-87lxn             2/2     Running   2          22h
    onap-music-cassandra-0                           1/1     Running   1          22h
    onap-netbox-nginx-8667d6675d-vszhb               1/1     Running   2          22h
    onap-policy-api-6dbf8485d7-k7cpv                 1/1     Running   1          22h
    onap-policy-clamp-be-6d77597477-4mffk            1/1     Running   1          22h
    onap-policy-pap-785bd79759-xxhvx                 1/1     Running   1          22h
    onap-policy-xacml-pdp-7d8fd58d59-d4m7g           1/1     Running   6          22h
    onap-sdc-be-5f99c6c644-dcdz8                     2/2     Running   2          22h
    onap-sdc-fe-7577d58fb5-kwxpj                     2/2     Running   2          22h
    onap-sdc-wfd-fe-6997567759-gl9g6                 2/2     Running   2          22h
    onap-sdnc-dgbuilder-564d6475fd-xwwrz             1/1     Running   1          22h
    onap-sdnrdb-master-0                             1/1     Running   1          22h
    onap-so-admin-cockpit-6c5b44694-h4d2n            1/1     Running   1          21h
    onap-so-etsi-sol003-adapter-c9bf4464-pwn97       1/1     Running   1          21h
    onap-so-sdc-controller-6899b98b8b-hfgvc          2/2     Running   2          21h
    onap-vfc-mariadb-1                               2/2     Running   4          21h
    onap-vfc-nslcm-6c67677546-xcvl2                  2/2     Running   2          21h
    onap-vfc-vnflcm-78ff4d8778-sgtv6                 2/2     Running   2          21h
    onap-vfc-vnfres-6c96f9ff5b-swq5z                 2/2     Running   2          21h

- Stop the worker (shut down the machine for bare metal, or the VM if you
  installed your Kubernetes on top of an OpenStack solution)
- Wait for the completion of the pod eviction procedure (5 minutes)::

    $ kubectl get nodes
    NAME                      STATUS     ROLES    AGE   VERSION
    compute01-onap-honolulu   NotReady   <none>   18h   v1.19.9
    compute02-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute03-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute04-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute05-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute06-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute07-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute08-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute09-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute10-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute11-onap-honolulu   Ready      <none>   18h   v1.19.9
    compute12-onap-honolulu   Ready      <none>   18h   v1.19.9
    control01-onap-honolulu   Ready      master   18h   v1.19.9
    control02-onap-honolulu   Ready      master   18h   v1.19.9
    control03-onap-honolulu   Ready      master   18h   v1.19.9

- Run the tests: check that they are PASS

.. warning::
   In these conditions, **the tests will never be PASS**. In fact several
   components will remain in INIT state.
   A procedure is required to ensure a clean restart.

List the non running pods::

  $ kubectl get pods -n onap --field-selector status.phase!=Running | grep -v Completed
  NAME                                             READY   STATUS     RESTARTS   AGE
  onap-appc-dgbuilder-5bb94d46bd-sxmmc             0/1     Init:3/4   15         156m
  onap-cds-blueprints-processor-76f8b9b5c7-m7nmb   0/1     Init:1/3   0          156m
  onap-portal-app-595bd6cd95-bkswr                 0/2     Init:0/4   84         23h
  onap-portal-db-config-6s75n                      0/2     Error      0          23h
  onap-portal-db-config-7trzx                      0/2     Error      0          23h
  onap-portal-db-config-jt2jl                      0/2     Error      0          23h
  onap-portal-db-config-mjr5q                      0/2     Error      0          23h
  onap-portal-db-config-qxvdt                      0/2     Error      0          23h
  onap-portal-db-config-z8c5n                      0/2     Error      0          23h
  onap-sdc-be-5f99c6c644-kplqx                     0/2     Init:2/5   14         156m
  onap-vfc-nslcm-6c67677546-86mmj                  0/2     Init:0/1   15         156m
  onap-vfc-vnflcm-78ff4d8778-h968x                 0/2     Init:0/1   15         156m
  onap-vfc-vnfres-6c96f9ff5b-kt9rz                 0/2     Init:0/1   15         156m

Some pods are not rescheduled (e.g. onap-awx-0 and onap-cassandra-1 above)
because they are part of a statefulset. List the statefulset objects::

  $ kubectl get statefulsets.apps -n onap | grep -v "1/1" | grep -v "3/3"
  NAME                            READY   AGE
  onap-aaf-sms-quorumclient       2/3     24h
  onap-appc-db                    2/3     24h
  onap-awx                        0/1     24h
  onap-cassandra                  2/3     24h
  onap-dmaap-dr-db                2/3     24h
  onap-message-router             0/1     24h
  onap-message-router-kafka       0/3     24h
  onap-message-router-zookeeper   2/3     24h
  onap-music-cassandra            2/3     24h
  onap-sdnrdb-master              2/3     24h
  onap-vfc-mariadb                2/3     24h

For the pods being part of a statefulset, a forced deletion is required.
As an example, if we consider the statefulset onap-sdnrdb-master, we must
follow this procedure::

  $ kubectl get pods -n onap -o wide | grep onap-sdnrdb-master
  onap-sdnrdb-master-0   1/1   Terminating   1   24h   10.42.3.92    node1
  onap-sdnrdb-master-1   1/1   Running       1   24h   10.42.1.122   node2
  onap-sdnrdb-master-2   1/1   Running       1   24h   10.42.2.134   node3

  $ kubectl delete -n onap pod onap-sdnrdb-master-0 --force
  warning: Immediate deletion does not wait for confirmation that the running
  resource has been terminated. The resource may continue to run on the cluster
  indefinitely.
  pod "onap-sdnrdb-master-0" force deleted

  $ kubectl get pods -n onap | grep onap-sdnrdb-master
  onap-sdnrdb-master-0   0/1   PodInitializing   0   11s
  onap-sdnrdb-master-1   1/1   Running           1   24h
  onap-sdnrdb-master-2   1/1   Running           1   24h

  $ kubectl get pods -n onap | grep onap-sdnrdb-master
  onap-sdnrdb-master-0   1/1   Running   0   43s
  onap-sdnrdb-master-1   1/1   Running   1   24h
  onap-sdnrdb-master-2   1/1   Running   1   24h
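
The same recovery can be generalized to every pod left in Terminating state
after the node outage; a minimal sketch (only safe here because the pods are
recreated by their statefulset controller)::

  # Force delete all onap pods stuck in Terminating state.
  for pod in $(kubectl get pods -n onap --no-headers \
               | awk '$3 == "Terminating" {print $1}'); do
    kubectl delete pod -n onap "${pod}" --force --grace-period=0
  done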

Once all the statefulsets are properly restarted, the other components shall
continue their restart properly.
Once the restart of the pods is completed, the tests are PASS.

.. important::

   K8s node reboots/shutdowns show some deficiencies in ONAP components with
   regard to their availability as measured by the healthcheck results. Some
   pods may still fail to initialize after a reboot/shutdown (pod rescheduled).

   However, the cluster as a whole behaves as expected: pods are rescheduled
   after a node shutdown (except the pods being part of a statefulset, which
   need to be deleted forcibly - normal Kubernetes behavior).

   On a rebooted node, provided its downtime does not exceed the eviction
   timeout, the pods are restarted once the node is available again.

Please see the `Integration Resiliency page <https://jira.onap.org/browse/TEST-308>`_
for details.

Stability tests
---------------

Three stability tests have been performed in Honolulu:

- SDC stability test
- Simple instantiation test (basic_vm)
- Parallel instantiation test

SDC stability test
~~~~~~~~~~~~~~~~~~

In this test, we consider the basic_onboard automated test and we run 5
simultaneous onboarding procedures in parallel during 72h.

The basic_onboard test consists in the following steps:

- [SDC] VendorOnboardStep: Onboard vendor in SDC.
- [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
- [SDC] YamlTemplateVfOnboardStep: Onboard vf described in YAML file in SDC.
- [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
  in SDC.

The test has been initiated on the Honolulu weekly lab on the 19th of April.

As already observed in the daily/weekly/gating chains, we got race conditions
on some tests (see `INT-1918 <https://jira.onap.org/browse/INT-1918>`_).

The success rate is above 95% on the first 100 model uploads and above 80%
until we onboard more than 500 models.

We may also notice that the function test_duration=f(time) increases
continuously. At the beginning the test takes about 200s; 24h later the same
test takes around 1000s.
Finally, after 36h the SDC systematically answers with an HTTP 500 error code,
which explains the linear decrease of the success rate.

The following graphs provide a good view of the SDC stability test.

.. image:: files/s3p/honolulu_sdc_stability.png
   :align: center

.. important::
   SDC can support the onboarding of some hundreds of models.
   The onboarding duration increases linearly with the number of onboarded
   models.
   After a while, the SDC is no longer usable.
   No major cluster resource issues have been detected during the test. The
   memory consumption is however relatively high regarding the load.

.. image:: files/s3p/honolulu_sdc_stability_resources.png
   :align: center
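
The exact launcher used to schedule the 5 parallel onboardings is not described
here; the principle can be sketched as follows, where run_basic_onboard.sh is a
hypothetical wrapper around the basic_onboard test case::

  # Run batches of 5 simultaneous onboardings for 72 hours (illustrative only).
  END=$(( $(date +%s) + 72*3600 ))
  while [ "$(date +%s)" -lt "${END}" ]; do
    for i in 1 2 3 4 5; do
      ./run_basic_onboard.sh > "basic_onboard_${i}_$(date +%s).log" 2>&1 &
    done
    wait  # let the 5 parallel onboardings finish before starting a new batch
  done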

Simple stability test
~~~~~~~~~~~~~~~~~~~~~

This test consists in running the basic_vm test continuously during 72h.

We observe the cluster metrics as well as the evolution of the test duration.

The basic_vm test is described in :ref:`the Integration Test page
<integration-tests>`. Its last step is:

- [SO] YamlTemplateVfModuleAlaCarteInstantiateStep: Instantiate VF module
  described in YAML using SO a'la carte method.

The test has been initiated on the Honolulu weekly lab on the 26th of April 2021.
This test has been run after the test described in the next section.
A first error occurred after a few hours (mariadb-galera), then the system
automatically recovered for some hours before a full crash of the MariaDB
Galera.

::

  debian@control01-onap-honolulu:~$ kubectl get pod -n onap | grep mariadb-galera
  onap-mariadb-galera-0   1/2   CrashLoopBackOff   625    5d16h
  onap-mariadb-galera-1   1/2   CrashLoopBackOff   1134   5d16h
  onap-mariadb-galera-2   1/2   CrashLoopBackOff   407    5d16h

It was unfortunately not possible to collect the root cause (the logs of the
first restart of onap-mariadb-galera-1).
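
For a future occurrence, the logs of the crashed container instance could be
captured before they rotate; a minimal sketch (the container name comes from
the chart defaults and may differ in your deployment)::

  # Keep the description and the logs of the previous (crashed) container run.
  kubectl describe pod -n onap onap-mariadb-galera-1 > mariadb-galera-1-describe.txt
  kubectl logs -n onap onap-mariadb-galera-1 -c mariadb-galera --previous \
    > mariadb-galera-1-previous.log
  # Namespace events often contain the restart or OOM reason as well.
  kubectl get events -n onap --sort-by=.metadata.creationTimestamp | grep -i galera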

Community members reported that they had already faced such issues and
suggested deploying a single MariaDB instance instead of using MariaDB Galera.
Moreover, in Honolulu some changes were made in order to align with the Camunda
(SO) requirements for MariaDB Galera.

During the limited valid window, the success rate was about 78% (85% for the
same test in Guilin).
The duration of the test remains very variable, as already reported in Guilin
(see `SO-3419 <https://jira.onap.org/browse/SO-3419>`_). The duration of the
same test may vary from 500s to 2500s as illustrated in the following graph:

.. image:: files/s3p/honolulu_so_stability_1_duration.png
   :align: center

The changes in MariaDB Galera seem to have introduced some issues leading to
more unexpected timeouts.
A troubleshooting campaign has been launched to evaluate possible evolutions in
this area.

Parallel instantiations stability test
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Still based on basic_vm, 5 instantiation attempts are done simultaneously on the
ONAP solution during 48h.

The results can be described as follows:

.. image:: files/s3p/honolulu_so_stability_5.png
   :align: center

For this test, we had to restart the SDNC once. The last failures are due to
a certificate infrastructure issue and are independent from ONAP.

Cluster metrics
~~~~~~~~~~~~~~~

.. important::
   No major cluster resource issues have been detected in the cluster metrics.

The metrics of the ONAP cluster have been recorded over the full week of
stability tests:

.. csv-table:: CPU
   :file: ./files/csv/stability_cluster_metric_cpu.csv
   :widths: 20,20,20,20,20
   :delim: ;
   :header-rows: 1

.. image:: files/s3p/honolulu_weekly_cpu.png
   :align: center

.. image:: files/s3p/honolulu_weekly_memory.png
   :align: center

The Top Ten for CPU consumption is given in the table below:

.. csv-table:: CPU top ten
   :file: ./files/csv/stability_top10_cpu.csv
   :widths: 20,15,15,20,15,15
   :delim: ;
   :header-rows: 1

CPU consumption is negligible and not dimensioning. It shall be reconsidered
for use cases including extensive computation (loops, optimization algorithms).

The Top Ten for Memory consumption is given in the table below:

.. csv-table:: Memory top ten
   :file: ./files/csv/stability_top10_memory.csv
   :widths: 20,15,15,20,15,15
   :delim: ;
   :header-rows: 1

Unsurprisingly, the Cassandra databases are using most of the memory.

The Top Ten for Network consumption is given in the table below:

.. csv-table:: Network top ten
   :file: ./files/csv/stability_top10_net.csv
   :widths: 10,15,15,15,15,15,15
   :delim: ;
   :header-rows: 1
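
The tables above are built from the lab monitoring data. A quick, less precise
view of the top consumers can also be obtained directly from the cluster,
assuming the metrics server is deployed::

  # Snapshot of the biggest CPU and memory consumers in the onap namespace.
  kubectl top pods -n onap --sort-by=cpu | head -n 11
  kubectl top pods -n onap --sort-by=memory | head -n 11
  kubectl top nodes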