diff options
15 files changed, 132 insertions, 110 deletions
diff --git a/docs/development/devtools/api-s3p-results/api-response-time-distribution_I.png b/docs/development/devtools/api-s3p-results/api-response-time-distribution_I.png Binary files differnew file mode 100644 index 00000000..b53da6ba --- /dev/null +++ b/docs/development/devtools/api-s3p-results/api-response-time-distribution_I.png diff --git a/docs/development/devtools/api-s3p-results/api-response-time-distribution_performance_I.png b/docs/development/devtools/api-s3p-results/api-response-time-distribution_performance_I.png Binary files differnew file mode 100644 index 00000000..e65cc151 --- /dev/null +++ b/docs/development/devtools/api-s3p-results/api-response-time-distribution_performance_I.png diff --git a/docs/development/devtools/api-s3p-results/api-response-time-overtime_I.png b/docs/development/devtools/api-s3p-results/api-response-time-overtime_I.png Binary files differnew file mode 100644 index 00000000..ad2f62e9 --- /dev/null +++ b/docs/development/devtools/api-s3p-results/api-response-time-overtime_I.png diff --git a/docs/development/devtools/api-s3p-results/api-response-time-overtime_performance_I.png b/docs/development/devtools/api-s3p-results/api-response-time-overtime_performance_I.png Binary files differnew file mode 100644 index 00000000..91d291bd --- /dev/null +++ b/docs/development/devtools/api-s3p-results/api-response-time-overtime_performance_I.png diff --git a/docs/development/devtools/api-s3p-results/api-s3p-jm-1_I.png b/docs/development/devtools/api-s3p-results/api-s3p-jm-1_I.png Binary files differnew file mode 100644 index 00000000..d38df8dc --- /dev/null +++ b/docs/development/devtools/api-s3p-results/api-s3p-jm-1_I.png diff --git a/docs/development/devtools/api-s3p-results/api-s3p-jm-2_I.png b/docs/development/devtools/api-s3p-results/api-s3p-jm-2_I.png Binary files differnew file mode 100644 index 00000000..1c85d949 --- /dev/null +++ b/docs/development/devtools/api-s3p-results/api-s3p-jm-2_I.png diff --git a/docs/development/devtools/api-s3p-results/api_top_after_72h.JPG b/docs/development/devtools/api-s3p-results/api_top_after_72h.JPG Binary files differnew file mode 100644 index 00000000..c69b2d0f --- /dev/null +++ b/docs/development/devtools/api-s3p-results/api_top_after_72h.JPG diff --git a/docs/development/devtools/api-s3p-results/api_top_before_72h.JPG b/docs/development/devtools/api-s3p-results/api_top_before_72h.JPG Binary files differnew file mode 100644 index 00000000..c2d17afa --- /dev/null +++ b/docs/development/devtools/api-s3p-results/api_top_before_72h.JPG diff --git a/docs/development/devtools/api-s3p.rst b/docs/development/devtools/api-s3p.rst index 55867b44..1c66297b 100644 --- a/docs/development/devtools/api-s3p.rst +++ b/docs/development/devtools/api-s3p.rst @@ -33,8 +33,9 @@ with the following command: .. code-block:: bash - jmeter.sh --nongui --testfile policy_api_stability.jmx --logfile result.jtl + nohup ./apache-jmeter-5.4.1/bin/jmeter.sh -n -t policy_api_stability.jmx -l stabilityTestResultsPolicyApi.jtl +The test was run in the background via “nohup”, to prevent it from being interrupted. Test Plan --------- @@ -54,6 +55,7 @@ of each entity is set to the running thread number. **API Test Flow (5 threads running the same steps in the same loop)** +- Get Policy Metrics - Create a new Monitoring Policy Type with Version 6.0.# - Create a new Monitoring Policy Type with Version 7.0.# - Create a new Optimization Policy Type with Version 6.0.# @@ -118,18 +120,31 @@ The load was performed against a non-tweaked ONAP OOM installation. ======================= ============= =========== =============================== =============================== =============================== **Total # of requests** **Success %** **TPS** **Avg. time taken per request** **Min. time taken per request** **Max. time taken per request** ======================= ============= =========== =============================== =============================== =============================== - 627746 100% 2.42 2058 ms 26 ms 72809 ms + 242277 100% 0.935 5340 ms 1 ms 736976 ms ======================= ============= =========== =============================== =============================== =============================== -.. image:: images/api-s3p-jm-1_H.png +.. image:: api-s3p-results/api-s3p-jm-1_I.png **JMeter Results** The following graphs show the response time distributions. The "Get Policy Types" API calls are the most expensive calls that -average a 10 seconds plus response time. +average a 7 seconds plus response time. -.. image:: images/api-response-time-distribution_H.png -.. image:: images/api-response-time-overtime_H.png +.. image:: api-s3p-results/api-response-time-distribution_I.png +.. image:: api-s3p-results/api-response-time-overtime_I.png + +**Memory and CPU usage** + +The memory and CPU usage can be monitored by running "top" command in the policy-api pod. +A snapshot is taken before and after test execution to monitor the changes in resource utilization. + +Memory and CPU usage before test execution: + +.. image:: api-s3p-results/api_top_before_72h.JPG + +Memory and CPU usage after test execution: + +.. image:: api-s3p-results/api_top_after_72h.JPG Performance Test of Policy API @@ -151,8 +166,9 @@ with the following command: .. code-block:: bash - jmeter.sh --nongui --testfile policy_api_performance.jmx --logfile result.jtl + nohup ./apache-jmeter-5.4.1/bin/jmeter.sh -n -t policy_api_performance.jmx -l performanceTestResultsPolicyApi.jtl +The test was run in the background via “nohup”, to prevent it from being interrupted. Test Plan --------- @@ -170,18 +186,18 @@ Running/Triggering performance test will be the same as stability test. That is, ======================= ============= =========== =============================== =============================== =============================== **Total # of requests** **Success %** **TPS** **Avg. time taken per request** **Min. time taken per request** **Max. time taken per request** ======================= ============= =========== =============================== =============================== =============================== - 4082 100% 0.45 1297 ms 4 ms 63612 ms + 2822 100% 0.31 63794 ms 2 ms 1183376 ms ======================= ============= =========== =============================== =============================== =============================== -.. image:: images/api-s3p-jm-2_H.png +.. image:: api-s3p-results/api-s3p-jm-2_I.png Test Results ------------ The following graphs show the response time distributions. -.. image:: images/api-response-time-distribution_performance_H.png -.. image:: images/api-response-time-overtime_performance_H.png +.. image:: api-s3p-results/api-response-time-distribution_performance_I.png +.. image:: api-s3p-results/api-response-time-overtime_performance_I.png diff --git a/docs/development/devtools/drools-s3p.rst b/docs/development/devtools/drools-s3p.rst index 303a6beb..1586379e 100644 --- a/docs/development/devtools/drools-s3p.rst +++ b/docs/development/devtools/drools-s3p.rst @@ -10,31 +10,34 @@ Policy Drools PDP component ~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Both the Performance and the Stability tests were executed against a default ONAP installation in the policy-k8s tenant in the windriver lab, from an independent VM running the jmeter tool to inject the load. +Both the Performance and the Stability tests were executed against an ONAP installation in the policy-k8s tenant +in the windriver lab, from an independent VM running the jmeter tool to inject the load. General Setup ************* -The kubernetes installation allocated all policy components in the same worker node VM and some additional ones. -The worker VM hosting the policy components has the following spec: +The installation runs the following components in a single VM: -- 16GB RAM -- 8 VCPU -- 160GB Ephemeral Disk +- AAF +- AAI +- DMAAP +- POLICY -The standalone VM designated to run jmeter has the same configuration. The jmeter JVM -was instantiated with a max heap configuration of 12G. +The VM has the following hardware spec: -The drools-pdp container uses the default JVM memory settings from a default OOM installation: +- 126GB RAM +- 12 VCPUs +- 155GB Ephemeral Disk -.. code-block:: bash +Jmeter is run from a different VM with the following configuration: - VM settings: - Max. Heap Size (Estimated): 989.88M - Using VM: OpenJDK 64-Bit Server VM +- 16GB RAM +- 8 VCPUs +- 155GB Ephemeral Disk +The drools-pdp container uses the JVM memory settings from a default OOM installation. -Other ONAP components used during the stability tests are: +Other ONAP components exercised during the stability tests were: - Policy XACML PDP to process guard queries for each transaction. - DMaaP to carry PDP-D and jmeter initiated traffic to complete transactions. @@ -48,19 +51,14 @@ The following components are simulated during the tests. - APPC responses for the vCPE and vFW use cases. - AAI to answer queries for the use cases under test. -In order to avoid interferences with the APPC component while running the tests, -the APPC component was disabled. - SO, and AAI actors were simulated within the PDP-D JVM by enabling the feature-controlloop-utils before running the tests. PDP-D Setup *********** -The kubernetes charts were modified previous to the installation with -the changes below. - -The feature-controlloop-utils was started by adding the following script: +The kubernetes charts were modified previous to the installation +to add the following script that enables the controlloop-utils feature: .. code-block:: bash @@ -75,7 +73,7 @@ Stability Test of Policy PDP-D PDP-D performance ================= -The test set focused on the following use cases: +The tests focused on the following use cases: - vCPE - vDNS @@ -86,23 +84,20 @@ For 72 hours the following 5 scenarios ran in parallel: - vCPE success scenario - vCPE failure scenario (failure returned by simulated APPC recipient through DMaaP). - vDNS success scenario. -- vDNS failure scenario. +- vDNS failure scenario (failure by introducing in the DCAE ONSET a non-existant vserver-name reference). - vFirewall success scenario. -Five threads ran in parallel, one for each scenario. The transactions were initiated +Five threads ran in parallel, one for each scenario, back to back with no pauses. The transactions were initiated by each jmeter thread group. Each thread initiated a transaction, monitored the transaction, and -as soon as the transaction ending was detected, it initiated the next one, so back to back with no -pauses. +as soon as the transaction ending was detected, it initiated the next one. -All transactions completed successfully as it was expected in each scenario, with no failures. - -The command executed was +JMeter was run in a docker container with the following command: .. code-block:: bash - ./jmeter -n -t /home/ubuntu/drools-applications/testsuites/stability/src/main/resources/s3p.jmx -l /home/ubuntu/jmeter_result/jmeter.jtl -e -o /home/ubuntu/jmeter_result > /dev/null 2>&1 + docker run --interactive --tty --name jmeter --rm --volume $PWD:/jmeter -e VERBOSE_GC="" egaillardon/jmeter-plugins --nongui --testfile s3p.jmx --loglevel WARN -The results were computed by monitoring the statistics REST endpoint accessible through the telemetry shell or APIs. +The results were accessed by using the telemetry API to gather statistics: vCPE Success scenario @@ -114,20 +109,11 @@ ControlLoop-vCPE-48f0c2c3-a172-4192-9ae3-052274181b6e: # Times are in milliseconds - # Previous to the run, there was 1 failure as a consequence of testing - # the flows before the stability load was initiated. There was - # an additional failure encountered during the execution. - - "ControlLoop-vCPE-48f0c2c3-a172-4192-9ae3-052274181b6e": { - "policyExecutedCount": 161328, - "policyExecutedSuccessCount": 161326, - "totalElapsedTime": 44932780, - "averageExecutionTime": 278.5181741545175, - "birthTime": 1616092087842, - "lastStart": 1616356511841, - "lastExecutionTime": 1616356541972, - "policyExecutedFailCount": 2 - } + Control Loop Name: ControlLoop-vCPE-48f0c2c3-a172-4192-9ae3-052274181b6e + Number of Transactions Executed: 114007 + Number of Successful Transactions: 112727 + Number of Failure Transactions: 1280 + Average Execution Time: 434.9942021103967 ms. vCPE Failure scenario @@ -139,16 +125,12 @@ ControlLoop-vCPE-Fail: # Times are in milliseconds - "ControlLoop-vCPE-Fail": { - "policyExecutedCount": 250172, - "policyExecutedSuccessCount": 0, - "totalElapsedTime": 63258856, - "averageExecutionTime": 252.8614553187407, - "birthTime": 1616092143137, - "lastStart": 1616440688824, - "lastExecutionTime": 1616440689010, - "policyExecutedFailCount": 250172 - } + Control Loop Name: ControlLoop-vCPE-Fail + Number of Transactions Executed: 114367 + Number of Successful Transactions: 114367 (failure transactions are expected) + Number of Failure Transactions: 0 (success transactions are not expected) + Average Execution Time: 433.61750330077734 ms. + vDNS Success scenario ===================== @@ -159,16 +141,12 @@ ControlLoop-vDNS-6f37f56d-a87d-4b85-b6a9-cc953cf779b3: # Times are in milliseconds - "ControlLoop-vDNS-6f37f56d-a87d-4b85-b6a9-cc953cf779b3": { - "policyExecutedCount": 235438, - "policyExecutedSuccessCount": 235438, - "totalElapsedTime": 37564263, - "averageExecutionTime": 159.550552587093, - "birthTime": 1616092578063, - "lastStart": 1616356511253, - "lastExecutionTime": 1616356511653, - "policyExecutedFailCount": 0 - } + Control Loop Name: ControlLoop-vDNS-6f37f56d-a87d-4b85-b6a9-cc953cf779b3 + Number of Transactions Executed: 237512 + Number of Successful Transactions: 229532 + Number of Failure Transactions: 7980 + Average Execution Time: 268.028794334602 ms. + vDNS Failure scenario ===================== @@ -179,16 +157,11 @@ ControlLoop-vDNS-Fail: # Times are in milliseconds - "ControlLoop-vDNS-Fail": { - "policyExecutedCount": 2754574, - "policyExecutedSuccessCount": 0, - "totalElapsedTime": 14396495, - "averageExecutionTime": 5.22639616869977, - "birthTime": 1616092659237, - "lastStart": 1616440696444, - "lastExecutionTime": 1616440696444, - "policyExecutedFailCount": 2754574 - } + Control Loop Name: ControlLoop-vDNS-Fail + Number of Transactions Executed: 1957987 + Number of Successful Transactions: 1957987 (failure transactions are expected) + Number of Failure Transactions: 0 (success transactions are not expected) + Average Execution Time: 39.369322166081794 vFirewall Success scenario @@ -200,17 +173,41 @@ ControlLoop-vFirewall-d0a1dfc6-94f5-4fd4-a5b5-4630b438850a: # Times are in milliseconds - # Previous to the run, there were 2 failures as a consequence of testing - # the flows before the stability load was initiated. There was - # an additional failure encountered during the execution. - - "ControlLoop-vFirewall-d0a1dfc6-94f5-4fd4-a5b5-4630b438850a": { - "policyExecutedCount": 145197, - "policyExecutedSuccessCount": 145194, - "totalElapsedTime": 33100249, - "averageExecutionTime": 227.96785746261975, - "birthTime": 1616092985229, - "lastStart": 1616356511732, - "lastExecutionTime": 1616356541972, - "policyExecutedFailCount": 3 - } + Control Loop Name: ControlLoop-vFirewall-d0a1dfc6-94f5-4fd4-a5b5-4630b438850a + Number of Transactions Executed: 120308 + Number of Successful Transactions: 118895 + Number of Failure Transactions: 1413 + Average Execution Time: 394.8609236293513 ms. + + +Commentary +========== + +There has been a degradation of performance observed in this release +when compared with the previous one. +Approximately 1% of transactions were not completed as expected for +some use cases. Average Execution Times are extended as well. +The unexpected results seem to point in the direction of the +interactions of the distributed locking feature with the database. +These areas as well as the conditions for the test need to be investigated +further. + +.. code-block:: bash + + # Common pattern in the audit.log for unexpected transaction completions + + a8d637fc-a2d5-49f9-868b-5b39f7befe25||ControlLoop-vFirewall-d0a1dfc6-94f5-4fd4-a5b5-4630b438850a| + policy:usecases:[org.onap.policy.drools-applications.controlloop.common:controller-usecases:1.9.0:usecases]| + 2021-10-12T19:48:02.052+00:00|2021-10-12T19:48:02.052+00:00|0| + null:operational.modifyconfig.EVENT.MANAGER.FINAL:1.0.0|dev-policy-drools-pdp-0| + ERROR|400|Target Lock was lost|||VNF.generic-vnf.vnf-name||dev-policy-drools-pdp-0|| + dev-policy-drools-pdp-0|microservice.stringmatcher| + {vserver.prov-status=ACTIVE, vserver.is-closed-loop-disabled=false, + generic-vnf.vnf-name=fw0002vm002fw002, vserver.vserver-name=OzVServer}|||| + INFO|Session org.onap.policy.drools-applications.controlloop.common:controller-usecases:1.9.0:usecases| + + # The "Target Lock was lost" is a common message error in the unexpected results. + + +END-OF-DOCUMENT + diff --git a/docs/development/devtools/images/pap-s3p-mem-at.png b/docs/development/devtools/images/pap-s3p-mem-at.png Binary files differnew file mode 100644 index 00000000..dd880227 --- /dev/null +++ b/docs/development/devtools/images/pap-s3p-mem-at.png diff --git a/docs/development/devtools/images/pap-s3p-mem-bt.png b/docs/development/devtools/images/pap-s3p-mem-bt.png Binary files differnew file mode 100644 index 00000000..7c909831 --- /dev/null +++ b/docs/development/devtools/images/pap-s3p-mem-bt.png diff --git a/docs/development/devtools/images/pap-s3p-performance-result-jmeter.png b/docs/development/devtools/images/pap-s3p-performance-result-jmeter.png Binary files differnew file mode 100644 index 00000000..be8bd99e --- /dev/null +++ b/docs/development/devtools/images/pap-s3p-performance-result-jmeter.png diff --git a/docs/development/devtools/images/pap-s3p-stability-result-jmeter.png b/docs/development/devtools/images/pap-s3p-stability-result-jmeter.png Binary files differnew file mode 100644 index 00000000..5ebc769f --- /dev/null +++ b/docs/development/devtools/images/pap-s3p-stability-result-jmeter.png diff --git a/docs/development/devtools/pap-s3p.rst b/docs/development/devtools/pap-s3p.rst index 1c031775..6666fe78 100644 --- a/docs/development/devtools/pap-s3p.rst +++ b/docs/development/devtools/pap-s3p.rst @@ -35,12 +35,15 @@ The 72 hours stability test ran the following steps sequentially in a single thr - **Check PdpGroup Query** - makes a PdpGroup query request and verifies that PdpGroup is in the ACTIVE state. - **Deploy defaultDomain Policy** - deploys the policy defaultDomain in the existing PdpGroup - **Check status of defaultGroup** - checks the status of defaultGroup PdpGroup with the defaultDomain policy 1.0.0. +- **Check PdpGroup Audit defaultGroup** - checks the audit information for the defaultGroup PdpGroup. +- **Check PdpGroup Audit Policy (defaultGroup)** - checks the audit information for the defaultGroup PdpGroup with the defaultDomain policy 1.0.0. - **Create/Update PDP Group** - creates a new PDPGroup named sampleGroup. - **Check PdpGroup Query** - makes a PdpGroup query request and verifies that 2 PdpGroups are in the ACTIVE state and defaultGroup has a policy deployed on it. - **Deployment Update sampleDomain** - deploys the policy sampleDomain in sampleGroup PdpGroup using pap api - **Check status of sampleGroup** - checks the status of the sampleGroup PdpGroup. - **Check status of PdpGroups** - checks the status of both PdpGroups. - **Check PdpGroup Query** - makes a PdpGroup query request and verifies that the defaultGroup has a policy defaultDomain deployed on it and sampleGroup has policy sampleDomain deployed on it. +- **Check Audit** - checks the audit information for all PdpGroups. - **Check Consolidated Health** - checks the consolidated health status of all policy components. - **Check Deployed Policies** - checks for all the deployed policies using pap api. - **Undeploy Policy sampleDomain** - undeploys the policy sampleDomain from sampleGroup PdpGroup using pap api @@ -71,14 +74,14 @@ The test was run in the background via "nohup", to prevent it from being interru .. code-block:: bash - nohup ./jMeter/apache-jmeter-5.3/bin/jmeter.sh -n -t stabil.jmx -l testresults.jtl + nohup ./jMeter/apache-jmeter-5.3/bin/jmeter.sh -n -t stability.jmx -l testresults.jtl Test Results ------------ **Summary** -Stability test plan was triggered for 24 hours. +Stability test plan was triggered for 72 hours. .. Note:: @@ -95,18 +98,24 @@ Stability test plan was triggered for 24 hours. ======================= ================= ================== ================================== **Total # of requests** **Success %** **Error %** **Average time taken per request** ======================= ================= ================== ================================== -11921 100.00 % 0.00 % 382 ms +34053 99.14 % 0.86 % 1051 ms ======================= ================= ================== ================================== .. Note:: .. container:: paragraph - There were no failures during the 24 hours test. + There were some failures during the 72 hour stability tests. These tests were caused by the apex-pdp pods restarting + intermitently due to limited resources in our testing environment. The second apex instance was configured as a + replica of the apex-pdp pod and therefore, when it restarted, registered to the "defaultGroup" as the configuration + was taken from the original apex-pdp pod. This meant a manual change whenever the pods restarted to make apex-pdp-"2" + register with the "sampleGroup". + When both pods were running as expected, no errors relating to the pap functionality were observed. These errors are + strictly caused by the environment setup and not by pap. **JMeter Screenshot** -.. image:: images/pap-s3p-stability-result-jmeter.PNG +.. image:: images/pap-s3p-stability-result-jmeter.png **Memory and CPU usage** @@ -114,11 +123,11 @@ The memory and CPU usage can be monitored by running "top" command on the PAP po Memory and CPU usage before test execution: -.. image:: images/pap-s3p-mem-bt.PNG +.. image:: images/pap-s3p-mem-bt.png Memory and CPU usage after test execution: -.. image:: images/pap-s3p-mem-at.PNG +.. image:: images/pap-s3p-mem-at.png Performance Test of PAP @@ -152,7 +161,7 @@ Running/Triggering the performance test will be the same as the stability test. .. code-block:: bash - nohup ./jMeter/apache-jmeter-5.3/bin/jmeter.sh -n -t perf.jmx -l perftestresults.jtl + nohup ./jMeter/apache-jmeter-5.3/bin/jmeter.sh -n -t performance.jmx -l perftestresults.jtl Once the test execution is completed, execute the below script to get the statistics: @@ -171,9 +180,9 @@ Test results are shown as below. ======================= ================= ================== ================================== **Total # of requests** **Success %** **Error %** **Average time taken per request** ======================= ================= ================== ================================== -46314 100 % 0.00 % 1084 ms +24092 100 % 0.00 % 2467 ms ======================= ================= ================== ================================== **JMeter Screenshot** -.. image:: images/pap-s3p-performance-result-jmeter.PNG
\ No newline at end of file +.. image:: images/pap-s3p-performance-result-jmeter.png
\ No newline at end of file |