-rw-r--r-- | docs/development/devtools/drools-s3p.rst | 168
-rw-r--r-- | docs/development/devtools/images/s3p-drools-1.png | bin | 0 -> 234824 bytes
-rw-r--r-- | docs/development/devtools/images/s3p-drools-2.png | bin | 0 -> 248426 bytes
-rw-r--r-- | docs/development/devtools/images/s3p-drools-3.png | bin | 0 -> 160364 bytes
-rw-r--r-- | docs/development/devtools/images/s3p-drools-4.png | bin | 0 -> 200544 bytes
-rw-r--r-- | docs/development/prometheus-metrics.rst | 21
6 files changed, 41 insertions, 148 deletions
diff --git a/docs/development/devtools/drools-s3p.rst b/docs/development/devtools/drools-s3p.rst
index 22c1b47d..571e09a3 100644
--- a/docs/development/devtools/drools-s3p.rst
+++ b/docs/development/devtools/drools-s3p.rst
@@ -10,32 +10,20 @@
 Policy Drools PDP component
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Both the Performance and the Stability tests were executed against an ONAP installation in the policy-k8s tenant
-in the windriver lab, from an independent VM running the jmeter tool to inject the load.
+Both the Performance and the Stability tests were executed against an ONAP installation in the Policy tenant
+in the UNH lab, from the admin VM running the jmeter tool to inject the load.
 
 General Setup
 *************
 
-The installation runs the following components in a single VM:
-
-- AAF
-- AAI
-- DMAAP
-- POLICY
-
-The VM has the following hardware spec:
-
-- 126GB RAM
-- 12 VCPUs
-- 155GB Ephemeral Disk
-
-Jmeter is run from a different VM with the following configuration:
+Agent VMs in this lab have the following configuration:
 
 - 16GB RAM
-- 8 VCPUs
-- 155GB Ephemeral Disk
+- 8 VCPU
 
-The drools-pdp container uses the JVM memory settings from a default OOM installation.
+Jmeter is run from the admin VM.
+
+The drools-pdp container uses the JVM memory and CPU settings from the default OOM installation.
 
 Other ONAP components exercised during the stability tests were:
 
@@ -51,22 +39,6 @@ The following components are simulated during the tests.
 - APPC responses for the vCPE and vFW use cases.
 - AAI to answer queries for the use cases under test.
 
-SO, and AAI actors were simulated within the PDP-D JVM by enabling the
-feature-controlloop-utils before running the tests.
-
-PDP-D Setup
-***********
-
-The kubernetes charts were modified previous to the installation
-to add the following script that enables the controlloop-utils feature:
-
-.. code-block:: bash
-
-    oom/kubernetes/policy/charts/drools/resources/configmaps/features.pre.sh:
-
-    #!/bin/sh
-    sh -c "features enable controlloop-utils"
-
 Stability Test of Policy PDP-D
 ******************************
 
@@ -82,132 +54,38 @@ The tests focused on the following use cases:
 For 72 hours the following 5 scenarios ran in parallel:
 
 - vCPE success scenario
-- vCPE failure scenario (failure returned by simulated APPC recipient through DMaaP).
 - vDNS success scenario.
-- vDNS failure scenario (failure by introducing in the DCAE ONSET a non-existent vserver-name reference).
 - vFirewall success scenario.
+- vCPE failure scenario (simulates a failure scenario returned by simulated APPC recipient through DMaaP).
+- vDNS failure scenario (simulates a failure by introducing in the DCAE ONSET a non-existent vserver-name reference).
 
 Five threads ran in parallel, one for each scenario, back to back with no pauses. The transactions were initiated by each jmeter thread group. Each thread initiated a transaction, monitored the transaction, and as soon as the transaction ending was detected, it initiated the next one.
 
-JMeter was run in a docker container with the following command:
-
-.. code-block:: bash
-
-    docker run --interactive --tty --name jmeter --rm --volume $PWD:/jmeter -e VERBOSE_GC="" egaillardon/jmeter-plugins --nongui --testfile s3p.jmx --loglevel WARN
-
-The results were accessed by using the telemetry API to gather statistics:
-
-
-vCPE Success scenario
-=====================
-
-ControlLoop-vCPE-48f0c2c3-a172-4192-9ae3-052274181b6e:
-
-.. code-block:: bash
-
-    # Times are in milliseconds
-
-    Control Loop Name: ControlLoop-vCPE-48f0c2c3-a172-4192-9ae3-052274181b6e
-    Number of Transactions Executed: 114007
-    Number of Successful Transactions: 112727
-    Number of Failure Transactions: 1280
-    Average Execution Time: 434.9942021103967 ms.
-
-
-vCPE Failure scenario
-=====================
-
-ControlLoop-vCPE-Fail:
+The results are illustrated on the following graphs:
 
-.. code-block:: bash
-
-    # Times are in milliseconds
-
-    Control Loop Name: ControlLoop-vCPE-Fail
-    Number of Transactions Executed: 114367
-    Number of Successful Transactions: 114367 (failure transactions are expected)
-    Number of Failure Transactions: 0 (success transactions are not expected)
-    Average Execution Time: 433.61750330077734 ms.
-
-
-vDNS Success scenario
-=====================
-
-ControlLoop-vDNS-6f37f56d-a87d-4b85-b6a9-cc953cf779b3:
-
-.. code-block:: bash
-
-    # Times are in milliseconds
-
-    Control Loop Name: ControlLoop-vDNS-6f37f56d-a87d-4b85-b6a9-cc953cf779b3
-    Number of Transactions Executed: 237512
-    Number of Successful Transactions: 229532
-    Number of Failure Transactions: 7980
-    Average Execution Time: 268.028794334602 ms.
-
-
-vDNS Failure scenario
-=====================
-
-ControlLoop-vDNS-Fail:
-
-.. code-block:: bash
-
-    # Times are in milliseconds
-
-    Control Loop Name: ControlLoop-vDNS-Fail
-    Number of Transactions Executed: 1957987
-    Number of Successful Transactions: 1957987 (failure transactions are expected)
-    Number of Failure Transactions: 0 (success transactions are not expected)
-    Average Execution Time: 39.369322166081794
-
-
-vFirewall Success scenario
-==========================
-
-ControlLoop-vFirewall-d0a1dfc6-94f5-4fd4-a5b5-4630b438850a:
-
-.. code-block:: bash
-
-    # Times are in milliseconds
-
-    Control Loop Name: ControlLoop-vFirewall-d0a1dfc6-94f5-4fd4-a5b5-4630b438850a
-    Number of Transactions Executed: 120308
-    Number of Successful Transactions: 118895
-    Number of Failure Transactions: 1413
-    Average Execution Time: 394.8609236293513 ms.
+.. image:: images/s3p-drools-1.png
+.. image:: images/s3p-drools-2.png
+.. image:: images/s3p-drools-3.png
+.. image:: images/s3p-drools-4.png
 
 Commentary
 ==========
 
-There has been a degradation of performance observed in this release
-when compared with the previous one.
-Approximately 1% of transactions were not completed as expected for
-some use cases. Average Execution Times are extended as well.
-The unexpected results seem to point in the direction of the
-interactions of the distributed locking feature with the database.
-These areas as well as the conditions for the test need to be investigated
-further.
+There is around 1% unexpected failures during the 72-hour run. This can also be seen in the
+final output of jmeter:
 
 .. code-block:: bash
 
-    # Common pattern in the audit.log for unexpected transaction completions
-
-    a8d637fc-a2d5-49f9-868b-5b39f7befe25||ControlLoop-vFirewall-d0a1dfc6-94f5-4fd4-a5b5-4630b438850a|
-    policy:usecases:[org.onap.policy.drools-applications.controlloop.common:controller-usecases:1.9.0:usecases]|
-    2021-10-12T19:48:02.052+00:00|2021-10-12T19:48:02.052+00:00|0|
-    null:operational.modifyconfig.EVENT.MANAGER.FINAL:1.0.0|dev-policy-drools-pdp-0|
-    ERROR|400|Target Lock was lost|||VNF.generic-vnf.vnf-name||dev-policy-drools-pdp-0||
-    dev-policy-drools-pdp-0|microservice.stringmatcher|
-    {vserver.prov-status=ACTIVE, vserver.is-closed-loop-disabled=false,
-    generic-vnf.vnf-name=fw0002vm002fw002, vserver.vserver-name=OzVServer}||||
-    INFO|Session org.onap.policy.drools-applications.controlloop.common:controller-usecases:1.9.0:usecases|
-
-    # The "Target Lock was lost" is a common message error in the unexpected results.
+    summary = 37705505 in 72:00:56 = 145.4/s Avg: 30 Min: 0 Max: 20345 Err: 360852 (0.96%)
 
+The 1% errors were found to be related to the nature of the run, where each one of the 5 use case
+threads run without pauses starting one after the other a new round of their assigned control loop.
+It has been found that at times, the release time of the lock (which requires DB operations) outruns
+the initiation of the next control loop (using the same resource), therefore the newly initiated control
+loop fails. In reality, this scenario with the same resource being used back to back in consecutive control
+loop rounds will be unlikely.
-END-OF-DOCUMENT
diff --git a/docs/development/devtools/images/s3p-drools-1.png b/docs/development/devtools/images/s3p-drools-1.png
Binary files differ
new file mode 100644
index 00000000..5dc70c57
--- /dev/null
+++ b/docs/development/devtools/images/s3p-drools-1.png
diff --git a/docs/development/devtools/images/s3p-drools-2.png b/docs/development/devtools/images/s3p-drools-2.png
Binary files differ
new file mode 100644
index 00000000..e985a712
--- /dev/null
+++ b/docs/development/devtools/images/s3p-drools-2.png
diff --git a/docs/development/devtools/images/s3p-drools-3.png b/docs/development/devtools/images/s3p-drools-3.png
Binary files differ
new file mode 100644
index 00000000..8f2a1d4c
--- /dev/null
+++ b/docs/development/devtools/images/s3p-drools-3.png
diff --git a/docs/development/devtools/images/s3p-drools-4.png b/docs/development/devtools/images/s3p-drools-4.png
Binary files differ
new file mode 100644
index 00000000..369d1f33
--- /dev/null
+++ b/docs/development/devtools/images/s3p-drools-4.png
diff --git a/docs/development/prometheus-metrics.rst b/docs/development/prometheus-metrics.rst
index 84699853..39d0a71c 100644
--- a/docs/development/prometheus-metrics.rst
+++ b/docs/development/prometheus-metrics.rst
@@ -131,9 +131,6 @@ Key metrics for APEX-PDP
 | pdpa_engine_average_execution_time_seconds | Average time taken to execute an APEX policy in seconds | "engine_instance_id": ID of the engine thread |
 +---------------------------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+
 
-Key metrics for Drools PDP
---------------------------
-
 Key metrics for XACML PDP
 -------------------------
 
@@ -146,7 +143,25 @@ Key metrics for XACML PDP
 +--------------------------------+---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | pdpx_policy_decisions_total | Counts the total number of decisions | permit: Counts the number of permit decisions; "deny": Counts the number of deny decisions; "indeterminant": Counts the number of indeterminant decisions; "not_applicable": Counts the number of not applicable decisions. |
 +--------------------------------+---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| logback_appender_total | Counts the log entries | level: Counts on a per log level basis. |
++--------------------------------+---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+
+Key metrics for Drools PDP
+--------------------------
 
++-----------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
+| Metric name | Metric description |Metric labels |
++===============================================+=======================================================+=======================================================+
+| process_start_time_seconds | Uptime of policy-drools-pdp component in seconds. | |
++-----------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
+| pdpd_policy_deployments_total | Count of policy deployments | operation: deploy|undeploy, status: SUCCESS|FAILURE |
++-----------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
+| pdpd_policy_executions_latency_seconds_count | Count of policy executions | controller, controlloop, policy |
++-----------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
+| pdpd_policy_executions_latency_seconds_sum | Count of policy execution latency in seconds | controller, controlloop, policy |
++-----------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
+| logback_appender_total | Count of log entries | level |
++-----------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
 
 Key metrics for Policy Distribution
 -----------------------------------