Diffstat (limited to 'docs/integration-s3p.rst')
-rw-r--r-- | docs/integration-s3p.rst | 339 |
1 files changed, 171 insertions, 168 deletions
diff --git a/docs/integration-s3p.rst b/docs/integration-s3p.rst
index 49c67850f..13e36c17a 100644
--- a/docs/integration-s3p.rst
+++ b/docs/integration-s3p.rst
@@ -1,204 +1,207 @@
+.. This work is licensed under a
+   Creative Commons Attribution 4.0 International License.
 .. _integration-s3p:

-ONAP Maturity Testing Notes
----------------------------
-
-Historically integration team used to execute specific stability and resilience
-tests on target release. For frankfurt a stability test was executed.
-Openlab, based on Frankfurt RC0 dockers was also observed a long duration
-period to evaluate the overall stability.
-Finally the CI daily chain created at Frankfurt RC0 was also a precious indicator
-to estimate the solution stability.
-
-No resilience or stress tests have been executed due to a lack of resources
-and late availability of the release. The testing strategy shall be amended in
-Guilin, several requirements have been created to improve the S3P testing domain.
+:orphan:

 Stability
 =========

-ONAP stability was tested through a 72 hour test.
-The intent of the 72 hour stability test is not to exhaustively test all
-functions but to run a steady load against the system and look for issues like
-memory leaks that cannot be found in the short duration install and functional
-testing during the development cycle.
-
-Integration Stability Testing verifies that the ONAP platform remains fully
-functional after running for an extended amount of time.
-This is done by repeatedly running tests against an ONAP instance for a period of
-72 hours.
-
-::
-
-    **The 72 hour stability run result was PASS**
-
-The onboard and instantiate tests ran for over **115 hours** before environment
-issues stopped the test. There were errors due to both tooling and environment
-errors.
-
-The overall memory utilization only grew about **2%** on the work nodes despite
-the environment issues.
Interestingly the kubernetes orchestration node memory
-grew more which could mean we are over driving the API's in some fashion.
-
-We did not limit other tenant activities in Windriver during this test run and
-we saw the impact from things like the re-installation of SB00 in the tenant
-and general network latency impacts that caused openstack to be slower to
-instantiate.
-For future stability runs we should go back to the process of shutting down
-non-critical tenants in the test environment to free up host resources for
-the test run (or other ways to prevent other testing from affecting the stability
-run).
-
-The control loop tests were **100% successful** and the cycle time for the loop was
-fairly consistent despite the environment issues. Future control loop stability
-tests should consider doing more policy edit type activities and running more
-control loops if host resources are available. The 10 second VES telemetry event
-is quite aggressive so we are sending more load into the VES collector and TCA
-engine during onset events than would be typical so adding additional loops
-should factor that in. The jenkins jobs ran fairly well although the instantiate
-Demo vFWCL took longer than usual and should be factored into future test planning.
-
-
-Methodology
-~~~~~~~~~~~
+.. important::
+   The Release stability has been evaluated by:

-The Stability Test has two main components:
+   - The daily CI/CD chain
+   - Stability tests

-- Running "ete stability72hr" Robot suite periodically. This test suite
-  verifies that ONAP can instantiate vDNS, vFWCL, and VVG.
-- Set up vFW Closed Loop to remain running, then check periodically that the
-  closed loop functionality is still working.
+.. note::
+   The scope of these tests remains limited and does not provide a full set of
+   KPIs to determine the limits and the dimensioning of the ONAP solution.

-The integration-longevity tenant in Intel/Windriver environment was used for the
-72 hour tests.
+CI results
+----------

-The onap-ci job for "Project windriver-longevity-release-manual" was used for
-the deployment with the OOM set to frankfurt and Integration branches set to
-master. Integration master was used so we could catch the latest updates to
-integration scripts and vnf heat templates.
+As usual, a daily CI chain dedicated to the release is created after RC0.

-The jenkins job needs a couple of updates for each release:
+The daily results can be found in `LF DT lab daily results web site <https://logs.onap.org/onap-integration/daily/onap-daily-dt-oom-master/>`_.

-- Set the integration branch to 'origin/master'
-- Modify the parameters to deploy.sh to specify "-i master" and "-o frankfurt"
-  to get integration master and oom frankfurt clones onto the nfs server.
+.. image:: files/s3p/jakarta-dashboard.png
+   :align: center

-The path for robot logs on dockerdata-nfs changed in Frankfurt so the
-/dev-robot/ becomes /dev/robot

-.. note::
-   For Frankfurt release, the stability test has been executed on a
-   kubernetes infrastructure based on El Alto recommendations. The kubernetes
-   version was 1.15.3 (frankfurt 1.15.11) and the helm version was 2.14.2
-   (frankfurt 2.16.6). However the ONAP dockers were updated to Frankfurt RC2
-   candidate versions. The results are informative and can be compared with
-   previous campaigns. The stability tests used robot container image
-   **1.6.1-STAGING-20200519T201214Z**. Robot container was patched to use GRA_API
-   since VNF_API has been deprecated.
+Infrastructure Healthcheck Tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Shakedown consists of creating some temporary tags for stability72hrvLB,
-stability72hrvVG, stability72hrVFWCL to make sure each sub test ran successfully
-(including cleanup) in the environment before the jenkins job started with the
-higher level testsuite tag stability72hr that covers all three test types.
+These tests deal with the Kubernetes/Helm tests on ONAP cluster.
-Clean out the old build jobs using a jenkins console script (manage jenkins)
+The global expected criterion is **100%**.

-::
+The onap-k8s and onap-k8s-teardown, providing a snapshot of the onap namespace
+in Kubernetes, as well as the onap-helm tests are expected to be PASS.

-    def jobName = "windriver-longevity-stability72hr"
-    def job = Jenkins.instance.getItem(jobName)
-    job.getBuilds().each { it.delete() }
-    job.nextBuildNumber = 1
-    job.save()
+.. image:: files/s3p/istanbul_daily_infrastructure_healthcheck.png
+   :align: center

+Healthcheck Tests
+~~~~~~~~~~~~~~~~~

-appc.properties updated to apply the fix for DMaaP message processing to call
-http://localhost:8181 for the streams update.
+These tests are the traditional robot healthcheck tests and additional tests
+dealing with a single component.

-Results: 100% PASS
-~~~~~~~~~~~~~~~~~~

-=================== ======== ========== ======== ========= =========
-Test Case           Attempts Env Issues Failures Successes Pass Rate
-=================== ======== ========== ======== ========= =========
-Stability 72 hours  77       19         0        58        100%
-vFW Closed Loop     60       0          0        100       100%
-**Total**           137      19         0        158       **100%**
-=================== ======== ========== ======== ========= =========
-
-Detailed results can be found at https://wiki.onap.org/display/DW/Frankfurt+Stability+Run+Notes
-
-.. note::
-  - Overall results were good. All of the test failures were due to
-    issues with the unstable environment and tooling framework.
-  - JIRAs were created for readiness/liveness probe issues found while
-    testing under the unstable environment. Patches applied to oom and
-    testsuite during the testing helped reduce test failures due to
-    environment and tooling framework issues.
-  - The vFW Closed Loop test was very stable and self recovered from
-    environment issues.
-
-Resources overview
-~~~~~~~~~~~~~~~~~~

-============ ====================== =========== ========== ==========
-Date         #1 CPU                 #1 RAM      CPU*       RAM**
-============ ====================== =========== ========== ==========
-May 20 18:45 dcae-tca-anaytics:511m appc:2901Mi 1649       36092
-May 21 12:33 dcae-tca-anaytics:664m appc:2901Mi 1605       38221
-May 22 09:35 dcae-tca-anaytics:425m appc:2837Mi 1459       38488
-May 23 11:01 cassandra-1:371m       appc:2849Mi 1829       39431
-============ ====================== =========== ========== ==========
-
-.. note::
-  - Results are given from the command "kubectl -n onap top pods | sort -rn -k 3
-    | head -20"
-  - * sum of the top 20 CPU consumption
-  - ** sum of the top 20 RAM consumption
+The expectation is **100% OK**.

-CI results
-==========
+.. image:: files/s3p/istanbul_daily_healthcheck.png
+   :align: center
+
+Smoke Tests
+~~~~~~~~~~~

-A daily Frankfurt CI chain has been created after RC0.
+These tests are end to end and automated use case tests.
+See :ref:`the Integration Test page <integration-tests>` for details.

-The evolution of the full healthcheck test suite can be described as follows:
+The expectation is **100% OK**.

-|image1|
+.. figure:: files/s3p/istanbul_daily_smoke.png
+   :align: center

-Full healthcheck testsuite verifies the status of each component. It is
-composed of 47 tests. The success rate from the 9th to the 28th was never under
-95%.
+Security Tests
+~~~~~~~~~~~~~~

-4 test categories were defined:
+These tests deal with security.
+See :ref:`the Integration Test page <integration-tests>` for details.

-- infrastructure healthcheck: test of ONAP kubernetes cluster and helm chart status
-- healthcheck tests: verification of the components in the target deployment
-  environment
-- smoke tests: basic VM tests (including onboarding/distribution/instantiation),
-  and automated use cases (pnf-registrate, hvves, 5gbulkpm)
-- security tests
+Waivers have been granted on different projects for the different tests.
+The list of waivers can be found in
+https://git.onap.org/integration/seccom/tree/waivers?h=jakarta.

-The security target (66% for Frankfurt) was reached after the RC1. A regression
-due to the automation of the hvves use case (triggering the exposition of a
-public port in HTTP) was fixed on the 28th of May.
+nodeport_check_certs test is expected to fail. Even though tremendous progress
+has been made in this area, some certificates (unmaintained, upstream or integration
+robot pods) are still not correct due to bad certificate issuers (Root CA
+certificate not valid) or extra long validity. Most of the certificates have
+been installed using cert-manager and will be easily renewable.

-|image2|
+The expectation is **80% OK**. The criterion is met.

-Orange Openlab
-==============
+.. figure:: files/s3p/istanbul_daily_security.png
+   :align: center

-The Orange Openlab is a community lab targeting ONAP end user. It provides an
-ONAP and cloud resources to discover ONAP.
-A Frankfurt pre-RC0 version was installed beginning of May. The usual gating
-testing suite was run daily in addition of the traffic generated by the lab
-users. The VM instantiation has been working well without any reinstallation
-over the **27** last days.
+Stability tests
+---------------

-Resilience
-==========
+Stability tests have been performed on Istanbul release:

-The resilience test executed in El Alto was not realized in Frankfurt.
+- SDC stability test
+- Parallel instantiation test

-.. |image1| image:: files/s3p/daily_frankfurt1.png
-   :width: 6.5in
+The results can be found in the weekly backend logs
+https://logs.onap.org/onap-integration/weekly/onap_weekly_pod4_istanbul.
+
+SDC stability test
+~~~~~~~~~~~~~~~~~~

-.. |image2| image:: files/s3p/daily_frankfurt2.png
-   :width: 6.5in
+In this test, we consider the basic_onboard automated test and we run 5
+onboarding procedures in parallel for 24h.
+
+The basic_onboard test consists of the following steps:
+
+- [SDC] VendorOnboardStep: Onboard vendor in SDC.
+- [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
+- [SDC] YamlTemplateVfOnboardStep: Onboard vf described in YAML file in SDC.
+- [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
+  in SDC.
+
+The test has been initiated on the Istanbul weekly lab on the 14th of November.
+
+As already observed in the daily, weekly and gating chains, we got race
+conditions on some tests (https://jira.onap.org/browse/INT-1918).
+
+The success rate is expected to be above 95% on the first 100 model uploads
+and above 80% until we onboard more than 500 models.
+
+We may also notice that the function test_duration=f(time) increases
+continuously. At the beginning the test takes about 200s, 24h later the same
+test will take around 1000s.
+Finally after 36h, the SDC systematically answers with an HTTP 500 response
+code, which explains the linear decrease of the success rate.
+
+The following graph provides a good view of the SDC stability test.
+
+.. image:: files/s3p/istanbul_sdc_stability.png
+   :align: center
+
+.. csv-table:: S3P Onboarding stability results
+   :file: ./files/csv/s3p-sdc.csv
+   :widths: 60,20,20,20
+   :delim: ;
+   :header-rows: 1
+
+.. important::
+   The onboarding duration increases linearly with the number of on-boarded
+   models, which is already reported and may be due to the fact that models
+   cannot be deleted. In fact the test client has to retrieve the list of
+   models, which is continuously increasing. No limit tests have been
+   performed.
+   However 1085 on-boarded models is already a very high figure regarding the
+   possible ONAP usage.
+   Moreover the mean duration time is much lower in Istanbul.
+   It explains why it was possible to run 35% more tests within the same
+   time frame.
+
+Parallel instantiations stability test
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The test is based on the single test (basic_vm) that can be described as follows:
+
+- [SDC] VendorOnboardStep: Onboard vendor in SDC.
+- [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
+- [SDC] YamlTemplateVfOnboardStep: Onboard vf described in YAML file in SDC.
+- [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
+  in SDC.
+- [AAI] RegisterCloudRegionStep: Register cloud region.
+- [AAI] ComplexCreateStep: Create complex.
+- [AAI] LinkCloudRegionToComplexStep: Connect cloud region with complex.
+- [AAI] CustomerCreateStep: Create customer.
+- [AAI] CustomerServiceSubscriptionCreateStep: Create customer's service
+  subscription.
+- [AAI] ConnectServiceSubToCloudRegionStep: Connect service subscription with
+  cloud region.
+- [SO] YamlTemplateServiceAlaCarteInstantiateStep: Instantiate service described
+  in YAML using SO à la carte method.
+- [SO] YamlTemplateVnfAlaCarteInstantiateStep: Instantiate vnf described in YAML
+  using SO à la carte method.
+- [SO] YamlTemplateVfModuleAlaCarteInstantiateStep: Instantiate VF module
+  described in YAML using SO à la carte method.
+
+10 instantiation attempts are done simultaneously on the ONAP solution for 24h.
+
+The results can be described as follows:
+
+.. image:: files/s3p/istanbul_instantiation_stability_10.png
+   :align: center
+
+.. csv-table:: S3P Instantiation stability results
+   :file: ./files/csv/s3p-instantiation.csv
+   :widths: 60,20,20,20
+   :delim: ;
+   :header-rows: 1
+
+The results are good with a success rate above 95%. After 24h more than 1300
+VNFs have been created and deleted.
+
+As for SDC, we can observe a linear increase of the test duration. This issue
+has been reported since Guilin. For SDC as it is not possible to delete the
+models, it is possible to imagine that the duration increases due to the fact
+that the database of models continuously increases.
Therefore the client has
+to retrieve an ever-growing list of models.
+But for the instantiations, it is not the case as the references
+(module, VNF, service) are cleaned at the end of each test and all the tests
+use the same model. Then the duration of an instantiation test should be
+almost constant, which is not the case. Further investigations are needed.
+
+.. important::
+   The test has been executed with the mariadb-galera replicaset set to 1
+   (3 by default). With this configuration the results during 24h are very
+   good. When set to 3, the error rate is higher and after some hours
+   most of the instantiations are failing.
+   However, even with a replicaset set to 1, a test on Master weekly chain
+   showed that the system is hitting another limit after about 35h
+   (https://jira.onap.org/browse/SO-3791).
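
The stability tests described in the added text (5 parallel onboardings, 10 parallel instantiations, 24h each) follow the same pattern: a fixed number of workers repeat one test until a deadline, and the run is then summarised by a success rate and by the growth of the test duration. The following is a minimal Python sketch of that pattern, not the actual ONAP test tooling. The names `run_stability`, `success_rate`, `growth_per_hour` and `test_fn` are hypothetical and only serve the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_stability(test_fn, workers, duration_s):
    """Call test_fn in a loop from `workers` threads until duration_s elapses.

    test_fn is a hypothetical callable standing in for one test run
    (for example one onboarding); it signals failure by raising.
    Returns a list of (ok, elapsed_s) tuples, one per attempt.
    """
    deadline = time.monotonic() + duration_s

    def worker(_index):
        samples = []
        while time.monotonic() < deadline:
            start = time.monotonic()
            try:
                test_fn()
                ok = True
            except Exception:
                ok = False
            samples.append((ok, time.monotonic() - start))
        return samples

    # One thread per simultaneous attempt; each thread loops on its own.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_worker = list(pool.map(worker, range(workers)))
    return [sample for samples in per_worker for sample in samples]


def success_rate(samples):
    """Percentage of successful attempts (0.0 when there are none)."""
    if not samples:
        return 0.0
    return 100.0 * sum(1 for ok, _ in samples if ok) / len(samples)


def growth_per_hour(first_s, last_s, hours):
    """Average increase of the test duration, in seconds per hour."""
    return (last_s - first_s) / hours
```

Under these assumptions, `run_stability(test_fn, workers=10, duration_s=24 * 3600)` would mirror the parallel instantiation setup, and `success_rate` can be compared with the 95% expectation. With the SDC figures quoted in the diff (about 200s at the start, around 1000s after 24h), `growth_per_hour(200, 1000, 24)` gives roughly 33 seconds of additional duration per hour.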