Diffstat (limited to 'docs/drools/feature_statemgmt.rst')
-rw-r--r-- docs/drools/feature_statemgmt.rst 93
1 files changed, 45 insertions, 48 deletions
diff --git a/docs/drools/feature_statemgmt.rst b/docs/drools/feature_statemgmt.rst
index 960ff396..29497003 100644
--- a/docs/drools/feature_statemgmt.rst
+++ b/docs/drools/feature_statemgmt.rst
@@ -2,15 +2,15 @@
.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0
+.. _feature-sm-label:
+
*************************
-Feature: State Management
+Feature: State Management
*************************
.. contents::
:depth: 2
-Summary
-^^^^^^^
The State Management Feature provides:
- Node-level health monitoring
@@ -24,11 +24,9 @@ The State Management Feature provides:
- Availability Status
- Standby Status
-Usage
-^^^^^
Enabling and Disabling Feature State Management
------------------------------------------------
+===============================================
After configuring the feature properties file (see the Description Details section), log in as the policy user and enable the State Management Feature from the command line:
@@ -56,34 +54,34 @@ The Drools PDP must be stopped prior to enabling/disabling features and then res
session-persistence 1.1.0-SNAPSHOT disabled
Description Details
-^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~
State Model
------------
+"""""""""""
The state model follows the ITU X.731 standard for state management. The supported state values are:
**Administrative State:**
- Locked - All application transaction processing is prohibited
- Unlocked - Application transaction processing is allowed
-
+
**Administrative State Transitions:**
- The transition from Unlocked to Locked is triggered with a Lock operation
- The transition from Locked to Unlocked is triggered with an Unlock operation
**Operational State:**
- Enabled - The node is healthy and able to process application transactions
- - Disabled - The node is not healthy and not able to process application transactions
+ - Disabled - The node is not healthy and not able to process application transactions
**Operational State Transitions:**
- The transition from Enabled to Disabled is triggered with a disableFailed or disableDependency operation
- The transition from Disabled to Enabled is triggered with an enableNotFailed or enableNoDependency operation, whichever clears the last remaining cause of the Disabled state
-
+
**Availability Status:**
- Null - The Operational State is Enabled
- Failed - The Operational State is Disabled because the node is no longer healthy
- Dependency - The Operational State is Disabled because all members of a dependency group are disabled
- Dependency.Failed - The Operational State is Disabled because the node is no longer healthy and all members of a dependency group are disabled
-
+
**Availability Status Transitions:**
- The transition from Null to Failed is triggered with a disableFailed operation
- The transition from Null to Dependency is triggered with a disableDependency operation
@@ -93,13 +91,13 @@ The state model follows the ITU X.731 standard for state management. The suppor
- The transition from Dependency.Failed to Dependency is triggered with an enableNotFailed operation
- The transition from Failed to Null is triggered with an enableNotFailed operation
- The transition from Dependency to Null is triggered with an enableNoDependency operation
-
+
**Standby Status:**
- Null - The node does not support active-standby behavior
- ProvidingService - The node is actively providing application transaction service
- HotStandby - The node is capable of providing application transaction service, but is currently waiting to be promoted
- ColdStandby - The node is not capable of providing application service because of a failure
-
+
**Standby Status Transitions:**
- The transition from Null to HotStandby is triggered by a demote operation when the Operational State is Enabled
- The transition from Null to ColdStandby is triggered by a demote operation when the Operational State is Disabled
@@ -110,7 +108,7 @@ The state model follows the ITU X.731 standard for state management. The suppor
- The transition from ProvidingService to HotStandby is triggered by a Demote operation
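
The Operational State and Availability Status transitions listed above can be read as two independent conditions, "failed" and "dependency", that are set and cleared by the four operations. The following is a minimal sketch of that reading, not the feature's actual implementation: the class name ``AvailabilityModel`` and its methods are hypothetical, and it assumes that any transition not spelled out in the list composes the two conditions in the same way.

```java
/**
 * Illustrative model only (hypothetical class, not part of the feature).
 * Derives the Operational State and Availability Status from two flags.
 */
final class AvailabilityModel {
    private boolean failed;      // set by disableFailed, cleared by enableNotFailed
    private boolean dependency;  // set by disableDependency, cleared by enableNoDependency

    void disableFailed()      { failed = true; }
    void enableNotFailed()    { failed = false; }
    void disableDependency()  { dependency = true; }
    void enableNoDependency() { dependency = false; }

    /** The node is enabled only when neither condition holds. */
    String operationalState() {
        return (failed || dependency) ? "disabled" : "enabled";
    }

    /** Maps the two flags onto the four Availability Status values. */
    String availabilityStatus() {
        if (failed && dependency) {
            return "dependency.failed";
        }
        if (failed) {
            return "failed";
        }
        if (dependency) {
            return "dependency";
        }
        return "null";
    }

    public static void main(String[] args) {
        AvailabilityModel node = new AvailabilityModel();
        node.disableFailed();
        node.disableDependency();
        System.out.println(node.operationalState() + " / " + node.availabilityStatus());
        node.enableNotFailed();
        System.out.println(node.operationalState() + " / " + node.availabilityStatus());
        node.enableNoDependency();
        System.out.println(node.operationalState() + " / " + node.availabilityStatus());
    }
}
```

Under this reading, a node in Dependency.Failed needs both an enableNotFailed and an enableNoDependency operation before its Operational State returns to Enabled.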
Database
---------
+~~~~~~~~
The State Management feature creates a StateManagement database having three tables:
@@ -130,7 +128,7 @@ The State Management feature creates a StateManagement database having three tab
- **fpc_count** - A forward progress counter which is periodically incremented if the node is healthy
- **created_date** - The timestamp the resource entry was created
- **last_updated** - The timestamp the resource entry was last updated
-
+
**ResourceRegistrationEntity** - This table has the following columns:
- **ResourceRegistrationId** - Automatically created unique identifier
- **resourceName** - The unique identifier for a node
@@ -141,16 +139,16 @@ The State Management feature creates a StateManagement database having three tab
- **last_updated** - The timestamp the resource entry was last updated
Node Health Monitoring
-----------------------
+~~~~~~~~~~~~~~~~~~~~~~
**Application Monitoring**
-
- Application monitoring can be implemented using the *startTransaction()* and *endTransaction()* methods. Whenever a transaction is started, the *startTransaction()* method is called. If the node is locked, disabled or in a hot/cold standby state, the method will throw an exception. Otherwise, it resets the timer which triggers the default *testTransaction()* method.
-
+
+ Application monitoring can be implemented using the *startTransaction()* and *endTransaction()* methods. Whenever a transaction is started, the *startTransaction()* method is called. If the node is locked, disabled or in a hot/cold standby state, the method will throw an exception. Otherwise, it resets the timer which triggers the default *testTransaction()* method.
+
When a transaction completes, calling *endTransaction()* increments the forward progress counter in the *ForwardProgressEntity* DB table. As long as this counter is updating, the integrity monitor will assume the node is healthy/sane.
-
+
If the *startTransaction()* method is not called within a provisioned period of time, a timer will expire which calls the *testTransaction()* method. The default implementation of this method simply increments the forward progress counter. The *testTransaction()* method may be overridden to perform a more meaningful test of system sanity, if desired.
-
+
If the forward progress counter stops incrementing, the integrity monitoring routine will assume the node application has lost sanity and it will trigger a *statechange* (disableFailed) to cause the operational state to become disabled and the availability status attribute to become failed. Once the forward progress counter again begins incrementing, the operational state will return to enabled.
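
The forward progress check described above can be sketched as follows. This is an in-memory illustration under stated assumptions, not the integrity monitor's code: the class ``ForwardProgressSketch`` and its methods are hypothetical, the counter lives in a field rather than in the *ForwardProgressEntity* table, and time is passed in explicitly (in seconds) so the behavior is easy to follow. The threshold plays the role of the ``max_fpc_update_interval`` property.

```java
/**
 * Illustrative sketch only (hypothetical class, single-threaded).
 * Flags a node as disabled when its forward progress counter (FPC)
 * has not changed for longer than a configured interval.
 */
final class ForwardProgressSketch {
    private final long maxUpdateIntervalSec; // cf. max_fpc_update_interval
    private long fpc;                        // stands in for fpc_count
    private long lastSeenFpc;                // counter value at the previous check
    private long lastChangeSec;              // time at which a change was last observed

    ForwardProgressSketch(long maxUpdateIntervalSec, long nowSec) {
        this.maxUpdateIntervalSec = maxUpdateIntervalSec;
        this.lastChangeSec = nowSec;
    }

    /** Called when a transaction completes, or by a default test transaction. */
    void endTransaction() {
        fpc++;
    }

    /** Periodic check: returns the operational state the monitor would assume. */
    String check(long nowSec) {
        if (fpc != lastSeenFpc) {
            // The counter moved since the last check: the node is making progress.
            lastSeenFpc = fpc;
            lastChangeSec = nowSec;
            return "enabled";
        }
        // The counter is stale: disabled only once the allowed interval is exceeded.
        return (nowSec - lastChangeSec > maxUpdateIntervalSec) ? "disabled" : "enabled";
    }

    public static void main(String[] args) {
        ForwardProgressSketch monitor = new ForwardProgressSketch(120, 0);
        monitor.endTransaction();
        System.out.println(monitor.check(10));
        System.out.println(monitor.check(200));
        monitor.endTransaction();
        System.out.println(monitor.check(210));
    }
}
```

In the ``main`` method, the second check finds the counter unchanged for longer than the interval, and the third check sees it incrementing again, mirroring the return to the enabled state described above.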
**Application Monitoring with AllSeemsWell**
@@ -177,9 +175,9 @@ Node Health Monitoring
Site Manager
-------------
+~~~~~~~~~~~~
-The Site Manager is not deployed with the Drools PDP, but it is available in the policy/common repository in the site-manager directory.
+The Site Manager is not deployed with the Drools PDP, but it is available in the policy/common repository in the site-manager directory.
The Site Manager provides a lock/unlock interface for nodes and a way to display node information and status.
The following is from the README file included with the Site Manager.
@@ -189,33 +187,33 @@ The following is from the README file included with the Site Manager.
Before using 'siteManager', the file 'siteManager.properties' needs to be
edited to configure the parameters used to access the database:
-
+
javax.persistence.jdbc.driver - typically 'org.mariadb.jdbc.Driver'
-
+
javax.persistence.jdbc.url - URL referring to the database,
which typically has the form: 'jdbc:mariadb://<host>:<port>/<db>'
('<db>' is probably 'xacml' in this case)
-
+
javax.persistence.jdbc.user - the user id for accessing the database
-
+
javax.persistence.jdbc.password - password for accessing the database
-
+
Once the properties file has been updated, the 'siteManager' script can be
invoked as follows:
-
+
siteManager show [ -s <site> | -r <resourceName> ] :
display node information (Site, NodeType, ResourceName, AdminState,
OpState, AvailStatus, StandbyStatus)
-
+
siteManager setAdminState { -s <site> | -r <resourceName> } <new-state> :
update admin state on selected nodes
-
+
siteManager lock { -s <site> | -r <resourceName> } :
lock selected nodes
-
+
siteManager unlock { -s <site> | -r <resourceName> } :
unlock selected nodes
-
+
Note that the 'siteManager' script assumes that the script,
'site-manager-${project.version}.jar' file and 'siteManager.properties' file
are all in the same directory. If the files are separated, the 'siteManager'
@@ -223,7 +221,7 @@ script will need to be modified so it can locate the jar and properties files.
Properties
-----------
+~~~~~~~~~~
The feature-state-management.properties file controls the function of the State Management Feature. In general, the properties are adequately described in the file itself. Parameters which must be replaced prior to usage are marked as follows: ${{parameter to be replaced}}.
@@ -235,7 +233,7 @@ The feature-state-mangement.properties file controls the function of the State M
javax.persistence.jdbc.url=jdbc:mariadb://${{SQL_HOST}}:3306/statemanagement
javax.persistence.jdbc.user=${{SQL_USER}}
javax.persistence.jdbc.password=${{SQL_PASSWORD}}
-
+
# DroolsPDPIntegrityMonitor Properties
# Test interface host and port defaults may be overwritten here
http.server.services.TEST.host=0.0.0.0
@@ -244,9 +242,9 @@ The feature-state-mangement.properties file controls the function of the State M
# http.server.services.TEST.restClasses=org.onap.policy.drools.statemanagement.IntegrityMonitorRestManager
# http.server.services.TEST.managed=false
# http.server.services.TEST.swagger=true
-
+
#IntegrityMonitor Properties
-
+
# Must be unique across the system
resource.name=pdp1
# Name of the site in which this node is hosted
@@ -259,9 +257,9 @@ The feature-state-mangement.properties file controls the function of the State M
test_trans_interval=10
# Interval between writes of the FPC to the DB, in seconds
write_fpc_interval=5
- # Node type Note: Make sure you don't leave any trailing spaces, or you'll get an 'invalid node type' error!
+ # Node type Note: Make sure you don't leave any trailing spaces, or you'll get an 'invalid node type' error!
node_type=pdp_drools
- # Dependency groups are groups of resources upon which a node operational state is dependent upon.
+ # Dependency groups are groups of resources upon which a node operational state is dependent upon.
# Each group is a comma-separated list of resource names and groups are separated by a semicolon. For example:
# dependency_groups=site_1.astra_1,site_1.astra_2;site_1.brms_1,site_1.brms_2;site_1.logparser_1;site_1.pypdp_1
dependency_groups=
@@ -270,18 +268,17 @@ The feature-state-mangement.properties file controls the function of the State M
test_via_jmx=true
# This is the max number of seconds beyond which a non incrementing FPC is considered a failure
max_fpc_update_interval=120
- # Run the state audit every 60 seconds (60000 ms). The state audit finds stale DB entries in the
- # forwardprogressentity table and marks the node as disabled/failed in the statemanagemententity
+ # Run the state audit every 60 seconds (60000 ms). The state audit finds stale DB entries in the
+ # forwardprogressentity table and marks the node as disabled/failed in the statemanagemententity
# table. NOTE! It will only run on nodes that have a standbystatus = providingservice.
# A value of <= 0 will turn off the state audit.
state_audit_interval_ms=60000
- # The refresh state audit is run every (default) 10 minutes (600000 ms) to clean up any state corruption in the
+ # The refresh state audit is run every (default) 10 minutes (600000 ms) to clean up any state corruption in the
# DB statemanagemententity table. It only refreshes the DB state entry for the local node. That is, it does not
- # refresh the state of any other nodes. A value <= 0 will turn the audit off. Any other value will override
+ # refresh the state of any other nodes. A value <= 0 will turn the audit off. Any other value will override
# the default of 600000 ms.
refresh_state_audit_interval_ms=600000
-
-
+
# Repository audit properties
# Assume it's the releaseRepository that needs to be audited,
# because that's the one BRMGW will publish to.
@@ -293,14 +290,14 @@ The feature-state-mangement.properties file controls the function of the State M
repository2.audit.url=${{releaseRepository2Url}}
repository2.audit.username=${{repositoryUsername2}}
repository2.audit.password=${{repositoryPassword2}}
-
+
# Repository Audit Properties
# Flag to control the execution of the subsystemTest for the Nexus Maven repository
repository.audit.is.active=false
repository.audit.ignore.errors=true
repository.audit.interval_sec=86400
repository.audit.failure.threshold=3
-
+
# DB Audit Properties
# Flag to control the execution of the subsystemTest for the Database
db.audit.is.active=false