Diffstat (limited to 'docs/Chapter4/Resiliency.rst')
-rw-r--r-- | docs/Chapter4/Resiliency.rst | 668 |
1 files changed, 491 insertions, 177 deletions
diff --git a/docs/Chapter4/Resiliency.rst b/docs/Chapter4/Resiliency.rst index bf3e2e8..7047962 100644 --- a/docs/Chapter4/Resiliency.rst +++ b/docs/Chapter4/Resiliency.rst @@ -29,33 +29,86 @@ would include techniques such as multiple vLANs, multiple local and geographic instances, multiple local and geographic data replication, and virtualized services such as Load Balancers. - All Layer Redundancy Requirements -* R-52499 The VNF **MUST** meet their own resiliency goals and not rely - on the Network Cloud. -* R-42207 The VNF **MUST** design resiliency into a VNF such that the - resiliency deployment model (e.g., active-active) can be chosen at - run-time. -* R-03954 The VNF **MUST** survive any single points of failure within - the Network Cloud (e.g., virtual NIC, VM, disk failure). -* R-89010 The VNF **MUST** survive any single points of software failure - internal to the VNF (e.g., in memory structures, JMS message queues). -* R-67709 The VNF **MUST** be designed, built and packaged to enable - deployment across multiple fault zones (e.g., VNFCs deployed in - different servers, racks, OpenStack regions, geographies) so that - in the event of a planned/unplanned downtime of a fault zone, the - overall operation/throughput of the VNF is maintained. -* R-35291 The VNF **MUST** support the ability to failover a VNFC - automatically to other geographically redundant sites if not - deployed active-active to increase the overall resiliency of the VNF. -* R-36843 The VNF **MUST** support the ability of the VNFC to be deployable - in multi-zoned cloud sites to allow for site support in the event of cloud - zone failure or upgrades. -* R-00098 The VNF **MUST NOT** impact the ability of the VNF to provide - service/function due to a single container restart. -* R-79952 The VNF **SHOULD** support container snapshots if not for rebuild - and evacuate for rollback or back out mechanism. + +.. 
req:: + :id: R-52499 + :target: VNF + :keyword: MUST + + The VNF **MUST** meet their own resiliency goals and not rely + on the Network Cloud. + +.. req:: + :id: R-42207 + :target: VNF + :keyword: MUST + + The VNF **MUST** design resiliency into a VNF such that the + resiliency deployment model (e.g., active-active) can be chosen at + run-time. + +.. req:: + :id: R-03954 + :target: VNF + :keyword: MUST + + The VNF **MUST** survive any single points of failure within + the Network Cloud (e.g., virtual NIC, VM, disk failure). + +.. req:: + :id: R-89010 + :target: VNF + :keyword: MUST + + The VNF **MUST** survive any single points of software failure + internal to the VNF (e.g., in memory structures, JMS message queues). + +.. req:: + :id: R-67709 + :target: VNF + :keyword: MUST + + The VNF **MUST** be designed, built and packaged to enable + deployment across multiple fault zones (e.g., VNFCs deployed in + different servers, racks, OpenStack regions, geographies) so that + in the event of a planned/unplanned downtime of a fault zone, the + overall operation/throughput of the VNF is maintained. + +.. req:: + :id: R-35291 + :target: VNF + :keyword: MUST + + The VNF **MUST** support the ability to failover a VNFC + automatically to other geographically redundant sites if not + deployed active-active to increase the overall resiliency of the VNF. + +.. req:: + :id: R-36843 + :target: VNF + :keyword: MUST + + The VNF **MUST** support the ability of the VNFC to be deployable + in multi-zoned cloud sites to allow for site support in the event of cloud + zone failure or upgrades. + +.. req:: + :id: R-00098 + :target: VNF + :keyword: MUST NOT + + The VNF **MUST NOT** impact the ability of the VNF to provide + service/function due to a single container restart. + +.. req:: + :id: R-79952 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** support container snapshots if not for rebuild + and evacuate for rollback or back out mechanism. 
Minimize Cross Data-Center Traffic ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -69,8 +122,14 @@ completed). Minimize Cross Data-Center Traffic Requirements -* R-92935 The VNF **SHOULD** minimize the propagation of state information - across multiple data centers to avoid cross data center traffic. + +.. req:: + :id: R-92935 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** minimize the propagation of state information + across multiple data centers to avoid cross data center traffic. Application Resilient Error Handling ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -83,45 +142,110 @@ applications. Application Resilient Error Handling Requirements -* R-26371 The VNF **MUST** detect communication failure for inter VNFC - instance and intra/inter VNF and re-establish communication - automatically to maintain the VNF without manual intervention to - provide service continuity. -* R-18725 The VNF **MUST** handle the restart of a single VNFC instance - without requiring all VNFC instances to be restarted. -* R-06668 The VNF **MUST** handle the start or restart of VNFC instances - in any order with each VNFC instance establishing or re-establishing - required connections or relationships with other VNFC instances and/or - VNFs required to perform the VNF function/role without requiring VNFC - instance(s) to be started/restarted in a particular order. -* R-80070 The VNF **MUST** handle errors and exceptions so that they do - not interrupt processing of incoming VNF requests to maintain service - continuity (where the error is not directly impacting the software - handling the incoming request). -* R-32695 The VNF **MUST** provide the ability to modify the number of - retries, the time between retries and the behavior/action taken after - the retries have been exhausted for exception handling to allow the - NCSP to control that behavior, where the interface and/or functional - specification allows for altering behaviour. 
-* R-48356 The VNF **MUST** fully exploit exception handling to the extent - that resources (e.g., threads and memory) are released when no longer - needed regardless of programming language. -* R-67918 The VNF **MUST** handle replication race conditions both locally - and geo-located in the event of a data base instance failure to maintain - service continuity. -* R-36792 The VNF **MUST** automatically retry/resubmit failed requests - made by the software to its downstream system to increase the success rate. -* R-70013 The VNF **MUST NOT** require any manual steps to get it ready for - service after a container rebuild. -* R-65515 The VNF **MUST** provide a mechanism and tool to start VNF - containers (VMs) without impacting service or service quality assuming - another VNF in same or other geographical location is processing service - requests. -* R-94978 The VNF **MUST** provide a mechanism and tool to perform a graceful - shutdown of all the containers (VMs) in the VNF without impacting service - or service quality assuming another VNF in same or other geographical - location can take over traffic and process service requests. +.. req:: + :id: R-26371 + :target: VNF + :keyword: MUST + + The VNF **MUST** detect communication failure for inter VNFC + instance and intra/inter VNF and re-establish communication + automatically to maintain the VNF without manual intervention to + provide service continuity. + +.. req:: + :id: R-18725 + :target: VNF + :keyword: MUST + + The VNF **MUST** handle the restart of a single VNFC instance + without requiring all VNFC instances to be restarted. + +.. 
req:: + :id: R-06668 + :target: VNF + :keyword: MUST + + The VNF **MUST** handle the start or restart of VNFC instances + in any order with each VNFC instance establishing or re-establishing + required connections or relationships with other VNFC instances and/or + VNFs required to perform the VNF function/role without requiring VNFC + instance(s) to be started/restarted in a particular order. + +.. req:: + :id: R-80070 + :target: VNF + :keyword: MUST + + The VNF **MUST** handle errors and exceptions so that they do + not interrupt processing of incoming VNF requests to maintain service + continuity (where the error is not directly impacting the software + handling the incoming request). + +.. req:: + :id: R-32695 + :target: VNF + :keyword: MUST + + The VNF **MUST** provide the ability to modify the number of + retries, the time between retries and the behavior/action taken after + the retries have been exhausted for exception handling to allow the + NCSP to control that behavior, where the interface and/or functional + specification allows for altering behaviour. + +.. req:: + :id: R-48356 + :target: VNF + :keyword: MUST + + The VNF **MUST** fully exploit exception handling to the extent + that resources (e.g., threads and memory) are released when no longer + needed regardless of programming language. + +.. req:: + :id: R-67918 + :target: VNF + :keyword: MUST + + The VNF **MUST** handle replication race conditions both locally + and geo-located in the event of a data base instance failure to maintain + service continuity. + +.. req:: + :id: R-36792 + :target: VNF + :keyword: MUST + + The VNF **MUST** automatically retry/resubmit failed requests + made by the software to its downstream system to increase the success rate. + +.. req:: + :id: R-70013 + :target: VNF + :keyword: MUST NOT + + The VNF **MUST NOT** require any manual steps to get it ready for + service after a container rebuild. + +.. 
req:: + :id: R-65515 + :target: VNF + :keyword: MUST + + The VNF **MUST** provide a mechanism and tool to start VNF + containers (VMs) without impacting service or service quality assuming + another VNF in same or other geographical location is processing service + requests. + +.. req:: + :id: R-94978 + :target: VNF + :keyword: MUST + + The VNF **MUST** provide a mechanism and tool to perform a graceful + shutdown of all the containers (VMs) in the VNF without impacting service + or service quality assuming another VNF in same or other geographical + location can take over traffic and process service requests. System Resource Optimization ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -133,42 +257,100 @@ processes or eating memory if they are unable to complete. Critical sections should only contain memory operation, and should not contain any network or IO operation. - System Resource Optimization Requirements -* R-22059 The VNF **MUST NOT** execute long running tasks (e.g., IO, - database, network operations, service calls) in a critical section - of code, so as to minimize blocking of other operations and increase - concurrent throughput. -* R-63473 The VNF **MUST** automatically advertise newly scaled - components so there is no manual intervention required. -* R-74712 The VNF **MUST** utilize FQDNs (and not IP address) for - both Service Chaining and scaling. -* R-41159 The VNF **MUST** deliver any and all functionality from any - VNFC in the pool (where pooling is the most suitable solution). The - VNFC pool member should be transparent to the client. Upstream and - downstream clients should only recognize the function being performed, - not the member performing it. -* R-85959 The VNF **SHOULD** automatically enable/disable added/removed - sub-components or component so there is no manual intervention required. -* R-06885 The VNF **SHOULD** support the ability to scale down a VNFC pool - without jeopardizing active sessions. 
Ideally, an active session should - not be tied to any particular VNFC instance. -* R-12538 The VNF **SHOULD** support load balancing and discovery - mechanisms in resource pools containing VNFC instances. -* R-98989 The VNF **SHOULD** utilize resource pooling (threads, - connections, etc.) within the VNF application so that resources - are not being created and destroyed resulting in resource management - overhead. -* R-55345 The VNF **SHOULD** use techniques such as “lazy loading” when - initialization includes loading catalogues and/or lists which can grow - over time, so that the VNF startup time does not grow at a rate - proportional to that of the list. -* R-35532 The VNF **SHOULD** release and clear all shared assets (memory, - database operations, connections, locks, etc.) as soon as possible, - especially before long running sync and asynchronous operations, so as - to not prevent use of these assets by other entities. +.. req:: + :id: R-22059 + :target: VNF + :keyword: MUST NOT + + The VNF **MUST NOT** execute long running tasks (e.g., IO, + database, network operations, service calls) in a critical section + of code, so as to minimize blocking of other operations and increase + concurrent throughput. + +.. req:: + :id: R-63473 + :target: VNF + :keyword: MUST + + The VNF **MUST** automatically advertise newly scaled + components so there is no manual intervention required. + +.. req:: + :id: R-74712 + :target: VNF + :keyword: MUST + + The VNF **MUST** utilize FQDNs (and not IP address) for + both Service Chaining and scaling. + +.. req:: + :id: R-41159 + :target: VNF + :keyword: MUST + + The VNF **MUST** deliver any and all functionality from any + VNFC in the pool (where pooling is the most suitable solution). The + VNFC pool member should be transparent to the client. Upstream and + downstream clients should only recognize the function being performed, + not the member performing it. + +.. 
req:: + :id: R-85959 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** automatically enable/disable added/removed + sub-components or component so there is no manual intervention required. + +.. req:: + :id: R-06885 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** support the ability to scale down a VNFC pool + without jeopardizing active sessions. Ideally, an active session should + not be tied to any particular VNFC instance. + +.. req:: + :id: R-12538 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** support load balancing and discovery + mechanisms in resource pools containing VNFC instances. + +.. req:: + :id: R-98989 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** utilize resource pooling (threads, + connections, etc.) within the VNF application so that resources + are not being created and destroyed resulting in resource management + overhead. + +.. req:: + :id: R-55345 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** use techniques such as "lazy loading" when + initialization includes loading catalogues and/or lists which can grow + over time, so that the VNF startup time does not grow at a rate + proportional to that of the list. + +.. req:: + :id: R-35532 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** release and clear all shared assets (memory, + database operations, connections, locks, etc.) as soon as possible, + especially before long running sync and asynchronous operations, so as + to not prevent use of these assets by other entities. Application Configuration Management ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -178,16 +360,33 @@ to develop gold configurations for technologies like Java, Python, etc. Application Configuration Management Requirements -* R-77334 The VNF **MUST** allow configurations and configuration parameters - to be managed under version control to ensure consistent configuration - deployment, traceability and rollback. 
-* R-99766 The VNF **MUST** allow configurations and configuration parameters - to be managed under version control to ensure the ability to rollback to - a known valid configuration. -* R-73583 The VNF **MUST** allow changes of configuration parameters - to be consumed by the VNF without requiring the VNF or its sub-components - to be bounced so that the VNF availability is not effected. +.. req:: + :id: R-77334 + :target: VNF + :keyword: MUST + + The VNF **MUST** allow configurations and configuration parameters + to be managed under version control to ensure consistent configuration + deployment, traceability and rollback. + +.. req:: + :id: R-99766 + :target: VNF + :keyword: MUST + + The VNF **MUST** allow configurations and configuration parameters + to be managed under version control to ensure the ability to rollback to + a known valid configuration. + +.. req:: + :id: R-73583 + :target: VNF + :keyword: MUST + + The VNF **MUST** allow changes of configuration parameters + to be consumed by the VNF without requiring the VNF or its sub-components + to be bounced so that the VNF availability is not effected. Intelligent Transaction Distribution & Management ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -204,17 +403,35 @@ deploying non-related elements in the same container). Intelligent Transaction Distribution & Management Requirements -* R-21558 The VNF **SHOULD** use intelligent routing by having knowledge - of multiple downstream/upstream endpoints that are exposed to it, to - ensure there is no dependency on external services (such as load balancers) - to switch to alternate endpoints. -* R-08315 The VNF **SHOULD** use redundant connection pooling to connect - to any backend data source that can be switched between pools in an - automated/scripted fashion to ensure high availability of the connection - to the data source. 
-* R-27995 The VNF **SHOULD** include control loop mechanisms to notify - the consumer of the VNF of their exceeding SLA thresholds so the consumer - is able to control its load against the VNF. + +.. req:: + :id: R-21558 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** use intelligent routing by having knowledge + of multiple downstream/upstream endpoints that are exposed to it, to + ensure there is no dependency on external services (such as load balancers) + to switch to alternate endpoints. + +.. req:: + :id: R-08315 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** use redundant connection pooling to connect + to any backend data source that can be switched between pools in an + automated/scripted fashion to ensure high availability of the connection + to the data source. + +.. req:: + :id: R-27995 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** include control loop mechanisms to notify + the consumer of the VNF of their exceeding SLA thresholds so the consumer + is able to control its load against the VNF. Deployment Optimization ^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -226,39 +443,87 @@ automation (remove the human from the mix). Deployment Optimization Requirements -* R-73364 The VNF **MUST** support at least two major versions of the - VNF software and/or sub-components to co-exist within production - environments at any time so that upgrades can be applied across - multiple systems in a staggered manner. -* R-02454 The VNF **MUST** support the existence of multiple major/minor - versions of the VNF software and/or sub-components and interfaces that - support both forward and backward compatibility to be transparent to - the Service Provider usage. -* R-57855 The VNF **MUST** support hitless staggered/rolling deployments - between its redundant instances to allow "soak-time/burn in/slow roll" - which can enable the support of low traffic loads to validate the - deployment prior to supporting full traffic loads. 
-* R-64445 The VNF **MUST** support the ability of a requestor of the - service to determine the version (and therefore capabilities) of the - service so that Network Cloud Service Provider can understand the - capabilities of the service. -* R-56793 The VNF **MUST** test for adherence to the defined performance - budgets at each layer, during each delivery cycle with delivered - results, so that the performance budget is measured and the code - is adjusted to meet performance budget. -* R-77667 The VNF **MUST** test for adherence to the defined performance - budget at each layer, during each delivery cycle so that the performance - budget is measured and feedback is provided where the performance budget - is not met. -* R-49308 The VNF **SHOULD** test for adherence to the defined resiliency - rating recommendation at each layer, during each delivery cycle with - delivered results, so that the resiliency rating is measured and the - code is adjusted to meet software resiliency requirements. -* R-16039 The VNF **SHOULD** test for adherence to the defined - resiliency rating recommendation at each layer, during each - delivery cycle so that the resiliency rating is measured and - feedback is provided where software resiliency requirements are - not met. + +.. req:: + :id: R-73364 + :target: VNF + :keyword: MUST + + The VNF **MUST** support at least two major versions of the + VNF software and/or sub-components to co-exist within production + environments at any time so that upgrades can be applied across + multiple systems in a staggered manner. + +.. req:: + :id: R-02454 + :target: VNF + :keyword: MUST + + The VNF **MUST** support the existence of multiple major/minor + versions of the VNF software and/or sub-components and interfaces that + support both forward and backward compatibility to be transparent to + the Service Provider usage. + +.. 
req:: + :id: R-57855 + :target: VNF + :keyword: MUST + + The VNF **MUST** support hitless staggered/rolling deployments + between its redundant instances to allow "soak-time/burn in/slow roll" + which can enable the support of low traffic loads to validate the + deployment prior to supporting full traffic loads. + +.. req:: + :id: R-64445 + :target: VNF + :keyword: MUST + + The VNF **MUST** support the ability of a requestor of the + service to determine the version (and therefore capabilities) of the + service so that Network Cloud Service Provider can understand the + capabilities of the service. + +.. req:: + :id: R-56793 + :target: VNF + :keyword: MUST + + The VNF **MUST** test for adherence to the defined performance + budgets at each layer, during each delivery cycle with delivered + results, so that the performance budget is measured and the code + is adjusted to meet performance budget. + +.. req:: + :id: R-77667 + :target: VNF + :keyword: MUST + + The VNF **MUST** test for adherence to the defined performance + budget at each layer, during each delivery cycle so that the performance + budget is measured and feedback is provided where the performance budget + is not met. + +.. req:: + :id: R-49308 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** test for adherence to the defined resiliency + rating recommendation at each layer, during each delivery cycle with + delivered results, so that the resiliency rating is measured and the + code is adjusted to meet software resiliency requirements. + +.. req:: + :id: R-16039 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** test for adherence to the defined + resiliency rating recommendation at each layer, during each + delivery cycle so that the resiliency rating is measured and + feedback is provided where software resiliency requirements are + not met. Monitoring & Dashboard ^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -273,29 +538,78 @@ post-incident diagnostics. 
Monitoring & Dashboard Requirements -* R-34957 The VNF **MUST** provide a method of metrics gathering for each - layer's performance to identify/document variances in the allocations so - they can be addressed. -* R-49224 The VNF **MUST** provide unique traceability of a transaction - through its life cycle to ensure quick and efficient troubleshooting. -* R-52870 The VNF **MUST** provide a method of metrics gathering - and analysis to evaluate the resiliency of the software from both - a granular as well as a holistic standpoint. This includes, but is - not limited to thread utilization, errors, timeouts, and retries. -* R-92571 The VNF **MUST** provide operational instrumentation such as - logging, so as to facilitate quick resolution of issues with the VNF to - provide service continuity. -* R-48917 The VNF **MUST** monitor for and alert on (both sender and - receiver) errant, running longer than expected and missing file transfers, - so as to minimize the impact due to file transfer errors. -* R-28168 The VNF **SHOULD** use an appropriately configured logging - level that can be changed dynamically, so as to not cause performance - degradation of the VNF due to excessive logging. -* R-87352 The VNF **SHOULD** utilize Cloud health checks, when available - from the Network Cloud, from inside the application through APIs to check - the network connectivity, dropped packets rate, injection, and auto failover - to alternate sites if needed. -* R-16560 The VNF **SHOULD** conduct a resiliency impact assessment for all - inter/intra-connectivity points in the VNF to provide an overall resiliency - rating for the VNF to be incorporated into the software design and - development of the VNF. + +.. req:: + :id: R-34957 + :target: VNF + :keyword: MUST + + The VNF **MUST** provide a method of metrics gathering for each + layer's performance to identify/document variances in the allocations so + they can be addressed. + +.. 
req:: + :id: R-49224 + :target: VNF + :keyword: MUST + + The VNF **MUST** provide unique traceability of a transaction + through its life cycle to ensure quick and efficient troubleshooting. + +.. req:: + :id: R-52870 + :target: VNF + :keyword: MUST + + The VNF **MUST** provide a method of metrics gathering + and analysis to evaluate the resiliency of the software from both + a granular as well as a holistic standpoint. This includes, but is + not limited to thread utilization, errors, timeouts, and retries. + +.. req:: + :id: R-92571 + :target: VNF + :keyword: MUST + + The VNF **MUST** provide operational instrumentation such as + logging, so as to facilitate quick resolution of issues with the VNF to + provide service continuity. + +.. req:: + :id: R-48917 + :target: VNF + :keyword: MUST + + The VNF **MUST** monitor for and alert on (both sender and + receiver) errant, running longer than expected and missing file transfers, + so as to minimize the impact due to file transfer errors. + +.. req:: + :id: R-28168 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** use an appropriately configured logging + level that can be changed dynamically, so as to not cause performance + degradation of the VNF due to excessive logging. + +.. req:: + :id: R-87352 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** utilize Cloud health checks, when available + from the Network Cloud, from inside the application through APIs to check + the network connectivity, dropped packets rate, injection, and auto failover + to alternate sites if needed. + +.. req:: + :id: R-16560 + :target: VNF + :keyword: SHOULD + + The VNF **SHOULD** conduct a resiliency impact assessment for all + inter/intra-connectivity points in the VNF to provide an overall resiliency + rating for the VNF to be incorporated into the software design and + development of the VNF. + |
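To make R-28168 above more concrete, here is a minimal Python sketch of a logging level that can be changed while the process keeps running. The logger name `vnf` and the helper `set_log_level` are hypothetical names chosen for illustration; the requirement text does not prescribe any particular language, API, or mechanism for triggering the change.

```python
import logging

# Hypothetical logger name; the requirements do not prescribe one.
LOGGER = logging.getLogger("vnf")

# Explicit mapping so that only well-known level names are accepted.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def set_log_level(level_name: str) -> int:
    """Change the logger's level at run time and return the numeric level.

    Raises ValueError for an unknown name, so a mistyped operator input
    leaves the current level untouched instead of silently changing it.
    """
    try:
        level = _LEVELS[level_name.upper()]
    except KeyError:
        raise ValueError(f"unknown log level: {level_name!r}") from None
    LOGGER.setLevel(level)
    return level


if __name__ == "__main__":
    logging.basicConfig()
    set_log_level("debug")    # verbose while troubleshooting
    LOGGER.debug("visible at DEBUG")
    set_log_level("warning")  # back to a quieter level, no restart needed
    LOGGER.info("suppressed at WARNING")
```

In a real VNF the call to `set_log_level` would presumably be driven by whatever configuration or management interface the VNF already exposes; that wiring is outside the scope of this sketch.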