Update documentation

Issue-ID: INT-1551
Signed-off-by: mrichomme <morgan.richomme@orange.com>
Change-Id: Iacc99e3504ae62e90bc3d963056d780ae13c402a
Signed-off-by: mrichomme <morgan.richomme@orange.com>
author: mrichomme <morgan.richomme@orange.com> 2020-05-28 22:38:30 +0200
committer: Marcin Przybysz <marcin.przybysz@nokia.com> 2020-05-29 14:04:39 +0000
commit: ac588d62a90e7badc32acf476e0ce391e8bdd31b (patch)
tree: 7b033673ec6f95085c4edd6b734e4d2dbfdbf1dc /docs/integration-s3p.rst
parent: c01b1fb2ff905d1f39437f7b1106b3e9bd6d7338 (diff)
1 files changed, 147 insertions, 49 deletions
diff --git a/docs/integration-s3p.rst b/docs/integration-s3p.rst
index f42b48911..835c08e1b 100644
--- a/docs/integration-s3p.rst
+++ b/docs/integration-s3p.rst
@@ -3,22 +3,60 @@
 ONAP Maturity Testing Notes
 ---------------------------
 
-For the El Alto release, ONAP continues to improve in multiple
-areas of Scalability, Security, Stability and Performance (S3P)
-metrics.
-
+Historically, the integration team executed specific stability and resilience
+tests on the target release. For Frankfurt, a stability test was executed.
+The Openlab, based on Frankfurt RC0 dockers, was also observed over a long
+period to evaluate the overall stability.
+Finally, the daily CI chain created at Frankfurt RC0 was also a valuable
+indicator of the solution stability.
 
+No resilience or stress tests were executed due to a lack of resources and
+the late availability of the release. The testing strategy shall be amended in
+Guilin; several requirements have been created to improve the S3P testing domain.
 
 Stability
 =========
 
-** TODO **
+ONAP stability was tested through a 72-hour test.
+The intent of the 72-hour stability test is not to exhaustively test all
+functions but to run a steady load against the system and look for issues like
+memory leaks that cannot be found in the short-duration install and functional
+testing during the development cycle.
 
 Integration Stability Testing verifies that the ONAP platform remains fully
 functional after running for an extended amount of time.
 This is done by repeatedly running tests against an ONAP instance for a period of
 72 hours.
 
+The 72-hour stability run result was **PASS**.
+
+The onboard and instantiate tests ran for over 115 hours before environment
+issues stopped the test. There were errors due to both tooling and environment
+issues.
+
+The overall memory utilization grew only about 2% on the worker nodes despite
+the environment issues. Interestingly, the Kubernetes orchestration node memory
+grew more, which could mean we are overdriving the APIs in some fashion.
+
+We did not limit other tenant activities in Windriver during this test run, and
+we saw the impact of things like the re-installation of SB00 in the tenant
+and general network latency that caused OpenStack to be slower to
+instantiate.
+For future stability runs we should go back to the process of shutting down
+non-critical tenants in the test environment to free up host resources for
+the test run (or find other ways to prevent other testing from affecting the
+stability run).
+
+The control loop tests were **100% successful** and the cycle time for the loop was
+fairly consistent despite the environment issues. Future control loop stability
+tests should consider doing more policy-edit activities and running more
+control loops if host resources are available. The 10-second VES telemetry event
+is quite aggressive, so we are sending more load into the VES collector and TCA
+engine during onset events than would be typical; adding additional loops
+should factor that in. The Jenkins jobs ran fairly well, although the instantiate
+Demo vFWCL took longer than usual, which should be factored into future test planning.
+
+
 Methodology
 ~~~~~~~~~~~
 
@@ -29,18 +67,58 @@ The Stability Test has two main components:
 - Set up vFW Closed Loop to remain running, then check periodically that the
   closed loop functionality is still working.
 
+The integration-longevity tenant in the Intel/Windriver environment was used for
+the 72-hour tests.
+
+The onap-ci job for "Project windriver-longevity-release-manual" was used for
+the deployment with the OOM branch set to frankfurt and the Integration branch
+set to master. Integration master was used so we could catch the latest updates
+to integration scripts and VNF heat templates.
+
+The Jenkins job needs a couple of updates for each release:
+
+- Set the integration branch to 'origin/master'
+- Modify the parameters to deploy.sh to specify "-i master" and "-o frankfurt"
+  to get the integration master and oom frankfurt clones onto the NFS server.
+
+The path for robot logs on dockerdata-nfs changed in Frankfurt:
+/dev-robot/ becomes /dev/robot
+
+The stability tests used robot container image **1.6.1-STAGING-20200519T201214Z**.
+
+Robot container updates: API_TYPE was set to GRA_API since we have deprecated
+VNF_API.
+
+Shakedown consists of creating some temporary tags for stability72hrvLB,
+stability72hrvVG, stability72hrVFWCL to make sure each sub-test ran successfully
+(including cleanup) in the environment before the Jenkins job started with the
+higher-level testsuite tag stability72hr that covers all three test types.
+
+Clean out the old build jobs using a Jenkins console script (Manage Jenkins)
+
+::
+
+  def jobName = "windriver-longevity-stability72hr"
+  def job = Jenkins.instance.getItem(jobName)
+  job.getBuilds().each { it.delete() }
+  job.nextBuildNumber = 1
+  job.save()
+
+
+appc.properties was updated to apply the fix for DMaaP message processing to call
+http://localhost:8181 for the streams update.
 
 Results: 100% PASS
 ~~~~~~~~~~~~~~~~~~
 =================== ======== ========== ======== ========= =========
 Test Case           Attempts Env Issues Failures Successes Pass Rate
 =================== ======== ========== ======== ========= =========
-Stability 72 hours  72       34         0        38        100%
-vFW Closed Loop     75       7          0        68        100%
-**Total**           147      41         0        106       **100%**
+Stability 72 hours  77       19         0        58        100%
+vFW Closed Loop     60       0          0        100       100%
+**Total**           137      19         0        158       **100%**
 =================== ======== ========== ======== ========= =========
 
-Detailed results can be found at https://wiki.onap.org/display/DW/Dublin+Release+Stability+Testing+Status .
+Detailed results can be found at https://wiki.onap.org/display/DW/Frankfurt+Stability+Run+Notes
 
 .. note::
  - Overall results were good. All of the test failures were due to
@@ -52,48 +130,68 @@ Detailed results can be found at https://wiki.onap.org/display/DW/Dublin+Release
  - The vFW Closed Loop test was very stable and self recovered from
    environment issues.
 
+Resources overview
+~~~~~~~~~~~~~~~~~~
+============ ====================== =========== ========== ==========
+Date          #1 CPU                #1 RAM      CPU*       RAM**
+============ ====================== =========== ========== ==========
+May 20 18:45 dcae-tca-anaytics:511m appc:2901Mi 1649       36092
+May 21 12:33 dcae-tca-anaytics:664m appc:2901Mi 1605       38221
+May 22 09:35 dcae-tca-anaytics:425m appc:2837Mi 1459       38488
+May 23 11:01 cassandra-1:371m       appc:2849Mi 1829       39431
+============ ====================== =========== ========== ==========
+
+.. note::
+  - Results are given by the command ``kubectl -n onap top pods | sort -rn -k 3
+    | head -20``
+  - CPU*: sum of the CPU consumption of the top 20 pods
+  - RAM**: sum of the RAM consumption of the top 20 pods
+
+CI results
+==========
+
+A daily Frankfurt CI chain was created after RC0.
+
+The evolution of the full healthcheck test suite can be described as follows:
+
+|image1|
+
+The full healthcheck test suite verifies the status of each component. It is
+composed of 47 tests. The success rate from the 9th to the 28th was never under
+95%.
+
+Four test categories were defined:
+
+- infrastructure healthcheck: test of the ONAP Kubernetes cluster and Helm chart status
+- healthcheck tests: verification of the components in the target deployment
+  environment
+- smoke tests: basic VM tests (including onboarding/distribution/instantiation),
+  and automated use cases (pnf-registrate, hvves, 5gbulkpm)
+- security tests
+
+The security target (66% for Frankfurt) was reached after RC1. A regression
+due to the automation of the hvves use case (triggering the exposure of a
+public port in HTTP) was fixed on the 28th of May.
+
+|image2|
+
+Orange Openlab
+==============
+
+The Orange Openlab is a community lab targeting ONAP end users. It provides an
+ONAP instance and cloud resources to discover ONAP.
+A Frankfurt pre-RC0 version was installed at the beginning of May. The usual gating
+test suite was run daily in addition to the traffic generated by the lab
+users. VM instantiation has been working well without any reinstallation
+over the last **27** days.
 
 Resilience
 ==========
 
-Integration Resilience Testing verifies that ONAP can automatically recover
-from failures of any of its components.
-This is done by deleting the ONAP pods that are involved in each particular Use
-Case flow and then checking that the Use Case flow can again be executed
-successfully after ONAP recovers.
+The resilience test executed in El Alto was not performed in Frankfurt.
 
-Methodology
-~~~~~~~~~~~
-For each Use Case, a list of the ONAP components involved is identified.
-The pods of each of those components are systematically deleted one-by-one;
-after each pod deletion, we wait for the pods to recover, then execute the Use
-Case again to verify successful ONAP platform recovery.
-
-
-Results: 99.4% PASS
-~~~~~~~~~~~~~~~~~~~
-=============================== ======== ========== ======== ========= =========
-Use Case                        Attempts Env Issues Failures Successes Pass Rate
-=============================== ======== ========== ======== ========= =========
-VNF Onboarding and Distribution 49       0          0        49        100%
-VNF Instantiation               64       19         1        44        97.8%
-vFW Closed Loop                 66       0          0        66        100%
-**Total**                       179      19         1        159       **99.4%**
-=============================== ======== ========== ======== ========= =========
-
-Detailed results can be found at https://wiki.onap.org/display/DW/Dublin+Release+Resilience+Testing+Status .
-
-
-Deployability
-=============
-
-Smaller ONAP container images footprint reduces resource consumption,
-time to deploy, time to heal, as well as scale out resources.
-
-Minimizing the footprint of ONAP container images reduces resource
-consumption, time to deploy, time and time to heal. It also reduces
-the resources needed to scale out and time to scale in. For those
-reasons footprint minimization postively impacts the scalability of
-the ONAP platform.  Smaller ONAP container images footprint reduces
-resource consumption, time to deploy, time to heal, as well as scale
-out resources.
+.. |image1| image:: files/s3p/daily_frankfurt1.png
+      :width: 6.5in
+
+.. |image2| image:: files/s3p/daily_frankfurt2.png
+      :width: 6.5in
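
The Pass Rate column in the results tables of this diff appears to exclude environment issues from the denominator. In the removed resilience table, VNF Instantiation lists 64 attempts, 19 environment issues, 1 failure and 44 successes for 97.8%, which matches 44 / (44 + 1). A minimal Python sketch of that reading follows; the function name `pass_rate` is hypothetical, and the formula is inferred from the table values rather than stated by the source.

```python
def pass_rate(successes: int, failures: int) -> float:
    """Return the pass rate in percent, rounded to one decimal.

    Assumption: runs lost to environment issues are excluded, so only
    successes and failures count toward the denominator.
    """
    counted = successes + failures
    if counted == 0:
        raise ValueError("no counted runs")
    return round(100.0 * successes / counted, 1)


# Rows copied from the tables in the diff: (successes, failures).
rows = {
    "VNF Instantiation (removed table)": (44, 1),
    "Resilience total (removed table)": (159, 1),
    "Stability 72 hours (added table)": (58, 0),
}

for name, (ok, failed) in rows.items():
    print(f"{name}: {pass_rate(ok, failed)}%")
```

Under this reading, a row with many environment issues can still report 100% as long as no run is classified as a failure.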
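
The Resources overview figures in the diff come from `kubectl -n onap top pods | sort -rn -k 3 | head -20`. As a sketch of how the CPU* and RAM** sums could be derived from that output, the Python below parses `kubectl top pods` style text, keeps the pods with the highest memory (as `sort -k 3` does), and sums both columns. The function name `top_totals` and the sample pod values are hypothetical, and whether the published CPU* figure was computed over the same 20 pods is an assumption.

```python
from typing import Tuple


def top_totals(top_output: str, limit: int = 20) -> Tuple[int, int]:
    """Sum CPU (millicores) and memory (Mi) over the top pods by memory."""
    pods = []
    for line in top_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "NAME CPU(cores) MEMORY(bytes)" header.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, cpu, mem = fields[:3]
        # Assumes CPU is reported as "<n>m" and memory as "<n>Mi".
        pods.append((name, int(cpu.rstrip("m")), int(mem.rstrip("Mi"))))
    # Mirror "sort -rn -k 3 | head -20": order by memory, descending.
    pods.sort(key=lambda pod: pod[2], reverse=True)
    top = pods[:limit]
    return sum(p[1] for p in top), sum(p[2] for p in top)


# Hypothetical sample in the shape of "kubectl top pods" output.
sample = """\
NAME                 CPU(cores)   MEMORY(bytes)
dev-appc-0           120m         2901Mi
dev-cassandra-1      371m         1500Mi
dev-robot-abc        5m           300Mi
"""

cpu_total, ram_total = top_totals(sample)
print(f"CPU: {cpu_total}m, RAM: {ram_total}Mi")
```

Capturing the raw `kubectl top pods` output at each checkpoint and post-processing it this way would make the table rows reproducible.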