summaryrefslogtreecommitdiffstats
path: root/docs/Chapter4/Resiliency.rst
diff options
context:
space:
mode:
authorstark, steven <ss820f@att.com>2018-06-26 13:34:59 -0700
committerstark, steven <ss820f@att.com>2018-06-27 12:18:17 -0700
commit2cbffc5b71c88fd858b654335731ea72fd80220c (patch)
tree695110bcb0d91fa22f607363b9803643dc67d233 /docs/Chapter4/Resiliency.rst
parent8274f893dd8e46a320a3a2b7a5c44430a8d4e765 (diff)
[VNFRQTS] break up larger rst files into toxtree
Broke all chapter files up so they follow the same patter Change-Id: I8a2152b92f0568cf4858615054bb66fabf0ea343 Issue-ID: VNFRQTS-253 Signed-off-by: stark, steven <ss820f@att.com>
Diffstat (limited to 'docs/Chapter4/Resiliency.rst')
-rw-r--r--docs/Chapter4/Resiliency.rst301
1 files changed, 301 insertions, 0 deletions
diff --git a/docs/Chapter4/Resiliency.rst b/docs/Chapter4/Resiliency.rst
new file mode 100644
index 0000000..bf3e2e8
--- /dev/null
+++ b/docs/Chapter4/Resiliency.rst
@@ -0,0 +1,301 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. Copyright 2017 AT&T Intellectual Property. All rights reserved.
+
+VNF Resiliency
+-------------------------
+
+The VNF is responsible for meeting its resiliency goals and must factor
+in expected availability of the targeted virtualization environment.
+This is likely to be much lower than found in a traditional data center.
+Resiliency is defined as the ability of the VNF to respond to error
+conditions and continue to provide the service intended. A number of
+software resiliency dimensions have been identified as areas that should
+be addressed to increase resiliency. As VNFs are deployed into the
+Network Cloud, resiliency must be designed into the VNF software to
+provide high availability versus relying on the Network Cloud to achieve
+that end.
+
+Section 5.a Resiliency in *VNF Guidelines* describes
+the overall guidelines for designing VNFs to meet resiliency goals.
+Below are more detailed resiliency requirements for VNFs.
+
+All Layer Redundancy
+^^^^^^^^^^^^^^^^^^^^^^
+
+Design the VNF to be resilient to the failures of the underlying
+virtualized infrastructure (Network Cloud). VNF design considerations
+would include techniques such as multiple vLANs, multiple local and
+geographic instances, multiple local and geographic data replication,
+and virtualized services such as Load Balancers.
+
+
+All Layer Redundancy Requirements
+
+* R-52499 The VNF **MUST** meet their own resiliency goals and not rely
+ on the Network Cloud.
+* R-42207 The VNF **MUST** design resiliency into a VNF such that the
+ resiliency deployment model (e.g., active-active) can be chosen at
+ run-time.
+* R-03954 The VNF **MUST** survive any single points of failure within
+ the Network Cloud (e.g., virtual NIC, VM, disk failure).
+* R-89010 The VNF **MUST** survive any single points of software failure
+ internal to the VNF (e.g., in memory structures, JMS message queues).
+* R-67709 The VNF **MUST** be designed, built and packaged to enable
+ deployment across multiple fault zones (e.g., VNFCs deployed in
+ different servers, racks, OpenStack regions, geographies) so that
+ in the event of a planned/unplanned downtime of a fault zone, the
+ overall operation/throughput of the VNF is maintained.
+* R-35291 The VNF **MUST** support the ability to failover a VNFC
+ automatically to other geographically redundant sites if not
+ deployed active-active to increase the overall resiliency of the VNF.
+* R-36843 The VNF **MUST** support the ability of the VNFC to be deployable
+ in multi-zoned cloud sites to allow for site support in the event of cloud
+ zone failure or upgrades.
+* R-00098 The VNF **MUST NOT** impact the ability of the VNF to provide
+ service/function due to a single container restart.
+* R-79952 The VNF **SHOULD** support container snapshots if not for rebuild
+ and evacuate for rollback or back out mechanism.
+
+Minimize Cross Data-Center Traffic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Avoid performance-sapping data center-to-data center replication delay
+by applying techniques such as caching and persistent transaction paths
+- Eliminate replication delay impact between data centers by using a
+concept of stickiness (i.e., once a client is routed to data center "A",
+the client will stay with Data center “A” until the entire session is
+completed).
+
+Minimize Cross Data-Center Traffic Requirements
+
+* R-92935 The VNF **SHOULD** minimize the propagation of state information
+ across multiple data centers to avoid cross data center traffic.
+
+Application Resilient Error Handling
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Ensure an application communicating with a downstream peer is equipped
+to intelligently handle all error conditions. Make sure code can handle
+exceptions seamlessly - implement smart retry logic and implement
+multi-point entry (multiple data centers) for back-end system
+applications.
+
+Application Resilient Error Handling Requirements
+
+* R-26371 The VNF **MUST** detect communication failure for inter VNFC
+ instance and intra/inter VNF and re-establish communication
+ automatically to maintain the VNF without manual intervention to
+ provide service continuity.
+* R-18725 The VNF **MUST** handle the restart of a single VNFC instance
+ without requiring all VNFC instances to be restarted.
+* R-06668 The VNF **MUST** handle the start or restart of VNFC instances
+ in any order with each VNFC instance establishing or re-establishing
+ required connections or relationships with other VNFC instances and/or
+ VNFs required to perform the VNF function/role without requiring VNFC
+ instance(s) to be started/restarted in a particular order.
+* R-80070 The VNF **MUST** handle errors and exceptions so that they do
+ not interrupt processing of incoming VNF requests to maintain service
+ continuity (where the error is not directly impacting the software
+ handling the incoming request).
+* R-32695 The VNF **MUST** provide the ability to modify the number of
+ retries, the time between retries and the behavior/action taken after
+ the retries have been exhausted for exception handling to allow the
+ NCSP to control that behavior, where the interface and/or functional
+ specification allows for altering behaviour.
+* R-48356 The VNF **MUST** fully exploit exception handling to the extent
+ that resources (e.g., threads and memory) are released when no longer
+ needed regardless of programming language.
+* R-67918 The VNF **MUST** handle replication race conditions both locally
+ and geo-located in the event of a data base instance failure to maintain
+ service continuity.
+* R-36792 The VNF **MUST** automatically retry/resubmit failed requests
+ made by the software to its downstream system to increase the success rate.
+* R-70013 The VNF **MUST NOT** require any manual steps to get it ready for
+ service after a container rebuild.
+* R-65515 The VNF **MUST** provide a mechanism and tool to start VNF
+ containers (VMs) without impacting service or service quality assuming
+ another VNF in same or other geographical location is processing service
+ requests.
+* R-94978 The VNF **MUST** provide a mechanism and tool to perform a graceful
+ shutdown of all the containers (VMs) in the VNF without impacting service
+ or service quality assuming another VNF in same or other geographical
+ location can take over traffic and process service requests.
+
+
+System Resource Optimization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Ensure an application is using appropriate system resources for the task
+at hand; for example, do not use network or IO operations inside
+critical sections, which could end up blocking other threads or
+processes or eating memory if they are unable to complete. Critical
+sections should only contain memory operation, and should not contain
+any network or IO operation.
+
+
+System Resource Optimization Requirements
+
+* R-22059 The VNF **MUST NOT** execute long running tasks (e.g., IO,
+ database, network operations, service calls) in a critical section
+ of code, so as to minimize blocking of other operations and increase
+ concurrent throughput.
+* R-63473 The VNF **MUST** automatically advertise newly scaled
+ components so there is no manual intervention required.
+* R-74712 The VNF **MUST** utilize FQDNs (and not IP address) for
+ both Service Chaining and scaling.
+* R-41159 The VNF **MUST** deliver any and all functionality from any
+ VNFC in the pool (where pooling is the most suitable solution). The
+ VNFC pool member should be transparent to the client. Upstream and
+ downstream clients should only recognize the function being performed,
+ not the member performing it.
+* R-85959 The VNF **SHOULD** automatically enable/disable added/removed
+ sub-components or component so there is no manual intervention required.
+* R-06885 The VNF **SHOULD** support the ability to scale down a VNFC pool
+ without jeopardizing active sessions. Ideally, an active session should
+ not be tied to any particular VNFC instance.
+* R-12538 The VNF **SHOULD** support load balancing and discovery
+ mechanisms in resource pools containing VNFC instances.
+* R-98989 The VNF **SHOULD** utilize resource pooling (threads,
+ connections, etc.) within the VNF application so that resources
+ are not being created and destroyed resulting in resource management
+ overhead.
+* R-55345 The VNF **SHOULD** use techniques such as “lazy loading” when
+ initialization includes loading catalogues and/or lists which can grow
+ over time, so that the VNF startup time does not grow at a rate
+ proportional to that of the list.
+* R-35532 The VNF **SHOULD** release and clear all shared assets (memory,
+ database operations, connections, locks, etc.) as soon as possible,
+ especially before long running sync and asynchronous operations, so as
+ to not prevent use of these assets by other entities.
+
+
+Application Configuration Management
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Leverage configuration management audit capability to drive conformity
+to develop gold configurations for technologies like Java, Python, etc.
+
+Application Configuration Management Requirements
+
+* R-77334 The VNF **MUST** allow configurations and configuration parameters
+ to be managed under version control to ensure consistent configuration
+ deployment, traceability and rollback.
+* R-99766 The VNF **MUST** allow configurations and configuration parameters
+ to be managed under version control to ensure the ability to rollback to
+ a known valid configuration.
+* R-73583 The VNF **MUST** allow changes of configuration parameters
+ to be consumed by the VNF without requiring the VNF or its sub-components
+ to be bounced so that the VNF availability is not effected.
+
+
+Intelligent Transaction Distribution & Management
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Leverage Intelligent Load Balancing and redundant components (hardware
+and modules) for all transactions, such that at any point in the
+transaction: front end, middleware, back end -- a failure in any one
+component does not result in a failure of the application or system;
+i.e., transactions will continue to flow, albeit at a possibly reduced
+capacity until the failed component restores itself. Create redundancy
+in all layers (software and hardware) at local and remote data centers;
+minimizing interdependencies of components (i.e. data replication,
+deploying non-related elements in the same container).
+
+Intelligent Transaction Distribution & Management Requirements
+
+* R-21558 The VNF **SHOULD** use intelligent routing by having knowledge
+ of multiple downstream/upstream endpoints that are exposed to it, to
+ ensure there is no dependency on external services (such as load balancers)
+ to switch to alternate endpoints.
+* R-08315 The VNF **SHOULD** use redundant connection pooling to connect
+ to any backend data source that can be switched between pools in an
+ automated/scripted fashion to ensure high availability of the connection
+ to the data source.
+* R-27995 The VNF **SHOULD** include control loop mechanisms to notify
+ the consumer of the VNF of their exceeding SLA thresholds so the consumer
+ is able to control its load against the VNF.
+
+Deployment Optimization
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Reduce opportunity for failure, by human or by machine, through smarter
+deployment practices and automation. This can include rolling code
+deployments, additional testing strategies, and smarter deployment
+automation (remove the human from the mix).
+
+Deployment Optimization Requirements
+
+* R-73364 The VNF **MUST** support at least two major versions of the
+ VNF software and/or sub-components to co-exist within production
+ environments at any time so that upgrades can be applied across
+ multiple systems in a staggered manner.
+* R-02454 The VNF **MUST** support the existence of multiple major/minor
+ versions of the VNF software and/or sub-components and interfaces that
+ support both forward and backward compatibility to be transparent to
+ the Service Provider usage.
+* R-57855 The VNF **MUST** support hitless staggered/rolling deployments
+ between its redundant instances to allow "soak-time/burn in/slow roll"
+ which can enable the support of low traffic loads to validate the
+ deployment prior to supporting full traffic loads.
+* R-64445 The VNF **MUST** support the ability of a requestor of the
+ service to determine the version (and therefore capabilities) of the
+ service so that Network Cloud Service Provider can understand the
+ capabilities of the service.
+* R-56793 The VNF **MUST** test for adherence to the defined performance
+ budgets at each layer, during each delivery cycle with delivered
+ results, so that the performance budget is measured and the code
+ is adjusted to meet performance budget.
+* R-77667 The VNF **MUST** test for adherence to the defined performance
+ budget at each layer, during each delivery cycle so that the performance
+ budget is measured and feedback is provided where the performance budget
+ is not met.
+* R-49308 The VNF **SHOULD** test for adherence to the defined resiliency
+ rating recommendation at each layer, during each delivery cycle with
+ delivered results, so that the resiliency rating is measured and the
+ code is adjusted to meet software resiliency requirements.
+* R-16039 The VNF **SHOULD** test for adherence to the defined
+ resiliency rating recommendation at each layer, during each
+ delivery cycle so that the resiliency rating is measured and
+ feedback is provided where software resiliency requirements are
+ not met.
+
+Monitoring & Dashboard
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Promote dashboarding as a tool to monitor and support the general
+operational health of a system. It is critical to the support of the
+implementation of many resiliency patterns essential to the maintenance
+of the system. It can help identify unusual conditions that might
+indicate failure or the potential for failure. This would contribute to
+improve Mean Time to Identify (MTTI), Mean Time to Repair (MTTR), and
+post-incident diagnostics.
+
+Monitoring & Dashboard Requirements
+
+* R-34957 The VNF **MUST** provide a method of metrics gathering for each
+ layer's performance to identify/document variances in the allocations so
+ they can be addressed.
+* R-49224 The VNF **MUST** provide unique traceability of a transaction
+ through its life cycle to ensure quick and efficient troubleshooting.
+* R-52870 The VNF **MUST** provide a method of metrics gathering
+ and analysis to evaluate the resiliency of the software from both
+ a granular as well as a holistic standpoint. This includes, but is
+ not limited to thread utilization, errors, timeouts, and retries.
+* R-92571 The VNF **MUST** provide operational instrumentation such as
+ logging, so as to facilitate quick resolution of issues with the VNF to
+ provide service continuity.
+* R-48917 The VNF **MUST** monitor for and alert on (both sender and
+ receiver) errant, running longer than expected and missing file transfers,
+ so as to minimize the impact due to file transfer errors.
+* R-28168 The VNF **SHOULD** use an appropriately configured logging
+ level that can be changed dynamically, so as to not cause performance
+ degradation of the VNF due to excessive logging.
+* R-87352 The VNF **SHOULD** utilize Cloud health checks, when available
+ from the Network Cloud, from inside the application through APIs to check
+ the network connectivity, dropped packets rate, injection, and auto failover
+ to alternate sites if needed.
+* R-16560 The VNF **SHOULD** conduct a resiliency impact assessment for all
+ inter/intra-connectivity points in the VNF to provide an overall resiliency
+ rating for the VNF to be incorporated into the software design and
+ development of the VNF.