From 2afffed5e9487fab47ddcb49063dd2cb55170841 Mon Sep 17 00:00:00 2001 From: Jack Lucas Date: Tue, 7 Dec 2021 17:13:49 -0500 Subject: [HEALTHCHECK] Add healthcheck for dynamically-deployed microservices Add health checks for DCAE microservices deployed after the initial installation of dcaegen2-services. Issue-ID: DCAEGEN2-2959 Signed-off-by: Jack Lucas Change-Id: I9da0216db695f65ed520b509db183170977fce51 --- healthcheck-container/Changelog.md | 3 ++ healthcheck-container/README.md | 63 ++++++++---------------------------- healthcheck-container/get-status.js | 4 ++- healthcheck-container/healthcheck.js | 36 +++++++++++++-------- healthcheck-container/log.js | 4 ++- healthcheck-container/package.json | 2 +- healthcheck-container/pom.xml | 4 +-- 7 files changed, 48 insertions(+), 68 deletions(-) diff --git a/healthcheck-container/Changelog.md b/healthcheck-container/Changelog.md index 7e8db8d..b64dc89 100644 --- a/healthcheck-container/Changelog.md +++ b/healthcheck-container/Changelog.md @@ -4,6 +4,9 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](http://keepachangelog.com/) and this project adheres to [Semantic Versioning](http://semver.org/). +## [2.4.0] - 2021-12-08 +* [DCAEGEN2-2959] Add healthchecking for microservices deployed after DCAE installation. + ## [2.3.0] - 2021-11-15 * [DCAEGEN2-2983] Update Docker base image to node.js 16.x (the latest LTS release). * [DCAEGEN2-2958] Make sure all logging is directed to stdout/stderr. Enhance logging: diff --git a/healthcheck-container/README.md b/healthcheck-container/README.md index 878ef42..fa08f64 100644 --- a/healthcheck-container/README.md +++ b/healthcheck-container/README.md @@ -2,16 +2,16 @@ The Healthcheck service provides a simple HTTP API to check the status of DCAE or DCAE MOD components running in the Kubernetes environment. 
When it receives any incoming HTTP request, the service makes queries to the Kubernetes API to determine the current status of the DCAE or DCAE MOD components, as seen by Kubernetes. Most components have defined a "readiness probe" (an HTTP healthcheck endpoint or a healthcheck script) that Kubernetes uses to determine readiness. -Three instances of the Healthcheck service are deployed in ONAP: one for DCAE (dcaegen2), one for DCAE Helm-deployed microservices (dcaegen2-services), and one for DCAE MOD (dcaemod). +Three instances of the Healthcheck service are deployed in ONAP: one for the DCAE platform (dcaegen2, to be eliminated during the R10 development cycle), one for DCAE Helm-deployed microservices (dcaegen2-services), and one for DCAE MOD (dcaemod). The Healthcheck service has two sources for identifying components that should be running: 1. A list of components that are expected to be deployed by Helm as part of the ONAP installation, specified in a JSON array stored in a file at `/opt/app/expected-components.json`. - dcaegen2, dcaegen2-services, and dcaemod have configurable deployments. By setting flags in the `values.yaml` file or in an override file, a user can select which components are deployed. The`/opt/app/expected-components.json` file is generated at deployment time based on which components have been selected for deployment. The file is stored in a Kubernetes ConfigMap that is mounted on the healthcheck container at `/opt/app/expected-components.json`. See the Helm charts for DCAE and DCAEMOD in the OOM repository for details on how the ConfigMap is created. + dcaegen2, dcaegen2-services, and dcaemod have configurable deployments. By setting flags in the `values.yaml` file or in an override file, a user can select which components are deployed. The `/opt/app/expected-components.json` file is generated at deployment time based on which components have been selected for deployment. 
The file is stored in a Kubernetes ConfigMap that is mounted on the healthcheck container at `/opt/app/expected-components.json`. See the Helm charts for dcaegen2, dcaegen2-services, and dcaemod in the OOM repository for details on how the ConfigMap is created. 2. Components whose Kubernetes deployments have been marked with the labeled specified by the environment variable `DEPLOY_LABEL`. These are identified by a query to the Kubernetes API requesting a list of all the deployments with the label. The query is made each time an incoming HTTP request is made, so that as new deployments are created, they will be detected and included in the health check. - For the dcaegen2 instance of the Healthcheck service, the `DEPLOY_LABEL` variable is set to `cfydeployment`. This is the label that the DCAE k8s Cloudify plugin uses to mark every deployment that it creates. The dcaegen2 Healthcheck instance therefore includes all components deployed by the DCAE k8s plugin in its health check. For the dcaemod and dcaegen2-services instances of the Healthcheck service, the `DEPLOY_LABEL` is not set, so the dcaemod and dcaegen2-services health checks do not make any checks based on a label. + For the dcaegen2-services instance of the Healthcheck service, the `DEPLOY_LABEL` variable is set to `dcaeMicroserviceName`. This is the label that the dcaegen2-services-common deployment template inserts into every deployment that uses the template. The dcaegen2-services Healthcheck instance therefore includes in its healthcheck all components deployed using the dcaegen2-services-common deployment template. For the dcaemod and dcaegen2 instances of the Healthcheck service, the `DEPLOY_LABEL` is not set, so the dcaemod and dcaegen2 health checks do not make any checks based on a label. The Healthcheck service returns an HTTP status code of 200 if Kubernetes reports that all of the components that should be running are in a ready state. 
It returns a status code of 500 if some of the components are not ready. It returns a status code of 503 if some kind of error prevented it from completing a query. @@ -22,83 +22,48 @@ For the 200 and 500 status codes, the Healthcheck service returns a body consist - `items`: a JSON list(array) of objects, one for each deployment. Each object has the form: `{"name": "k8s_deployment_name", "ready": number_of_ready_instances, "unavailable": number_number_of_unavailable_instances}` -Here's an example of the body, with one component in an unavailable state: +Here's an example of the body. It's showing the four components that are deployed automatically by default when dcaegen2-services is installed (dev-dcae-hv-ves-collector, dev-dcae-prh, dev-dcae-tcagen2, and dev-dcae-ves-collector) along with three components that were deployed (in separate Helm releases) after dcaegen2-services was installed (nginx-dcae-nginx, nginxinst2-dcae-nginx, and nginxinst3-dcae-nginx). Note that nginx-dcae-nginx is not ready. 
``` { "type": "summary", - "count": 14, - "ready": 13, + "count": 7, + "ready": 6, "items": [ { - "name": "dev-dcaegen2-dcae-cloudify-manager", + "name": "dev-dcae-hv-ves-collector", "ready": 1, "unavailable": 0 }, { - "name": "dep-config-binding-service", + "name": "dev-dcae-prh", "ready": 1, "unavailable": 0 }, { - "name": "dep-deployment-handler", + "name": "dev-dcae-tcagen2", "ready": 1, "unavailable": 0 }, { - "name": "dep-inventory", + "name": "dev-dcae-ves-collector", "ready": 1, "unavailable": 0 }, { - "name": "dep-service-change-handler", + "name": "nginx-dcae-nginx", "ready": 0, "unavailable": 1 }, { - "name": "dep-policy-handler", + "name": "nginxinst2-dcae-nginx", "ready": 1, "unavailable": 0 }, { - "name": "dep-dcae-ves-collector", - "ready": 1, - "unavailable": 0 - }, - { - "name": "dep-dcae-tca-analytics", - "ready": 1, - "unavailable": 0 - }, - { - "name": "dep-dcae-prh", - "ready": 1, - "unavailable": 0 - }, - { - "name": "dep-dcae-hv-ves-collector", - "ready": 1, - "unavailable": 0 - }, - { - "name": "dep-dcae-datafile-collector", - "ready": 1, - "unavailable": 0 - }, - { - "name": "dep-dcae-snmptrap-collector", - "ready": 1, - "unavailable": 0 - }, - { - "name": "dep-holmes-engine-mgmt", - "ready": 1, - "unavailable": 0 - }, - { - "name": "dep-holmes-rule-mgmt", + "name": "nginxinst3-dcae-nginx", "ready": 1, "unavailable": 0 } ] } -``` \ No newline at end of file +``` diff --git a/healthcheck-container/get-status.js b/healthcheck-container/get-status.js index 1480e21..0fa421e 100644 --- a/healthcheck-container/get-status.js +++ b/healthcheck-container/get-status.js @@ -1,8 +1,9 @@ /* +============LICENSE_START========================================================================= Copyright(c) 2018 AT&T Intellectual Property. All rights reserved. Copyright(c) 2020 Nokia Intellectual Property. All rights reserved. Copyright(c) 2021 J. F. Lucas. All rights reserved. 
- +================================================================================================== Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. @@ -14,6 +15,7 @@ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. +============LICENSE_END=========================================================================== */ /* diff --git a/healthcheck-container/healthcheck.js b/healthcheck-container/healthcheck.js index 9197e4d..a0670f3 100644 --- a/healthcheck-container/healthcheck.js +++ b/healthcheck-container/healthcheck.js @@ -1,6 +1,8 @@ /* +============LICENSE_START========================================================================= Copyright(c) 2018-2020 AT&T Intellectual Property. All rights reserved. - +Copyright(c) 2021 J. F. Lucas. All rights reserved. +================================================================================================== Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. @@ -12,6 +14,7 @@ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
+============LICENSE_END=========================================================================== */ // Expect ONAP and DCAE namespaces and Helm "release" name to be passed via environment variables @@ -31,16 +34,16 @@ const HEALTHY = 200; const UNHEALTHY = 500; const UNKNOWN = 503; -const EXPECTED_COMPONENTS = '/opt/app/expected-components.json' +const EXPECTED_COMPONENTS = '/opt/app/expected-components.json'; const LISTEN_PORT = 8080; const fs = require('fs'); -const log = require('./log') +const log = require('./log'); -// List of deployments expected to be created via Helm -let helmDeps = []; +// List of microservices expected to be deployed automatically at DCAE installation time +let expectedMicroservices = []; try { - helmDeps = JSON.parse(fs.readFileSync(EXPECTED_COMPONENTS, {encoding: 'utf8'})); + expectedMicroservices = JSON.parse(fs.readFileSync(EXPECTED_COMPONENTS, {encoding: 'utf8'})); } catch (error) { log.error(`Could not access ${EXPECTED_COMPONENTS}: ${error}`); @@ -51,10 +54,13 @@ const status = require('./get-status'); const http = require('http'); // Helm deployments are always in the ONAP namespace and prefixed by Helm release name -const helmList = helmDeps.map(function(name) { +const expectedList = expectedMicroservices.map(function(name) { return {namespace: ONAP_NS, deployment: HELM_REL.length > 0 ? 
HELM_REL + '-' + name : name}; }); +// List of deployment names for the microservices deployed automatically at DCAE installation time +const expectedDepNames = expectedList.map((d) => d.deployment); + const isHealthy = function(summary) { // Current healthiness criterion is simple--all deployments are ready return summary.hasOwnProperty('count') && summary.hasOwnProperty('ready') && summary.count === summary.ready; @@ -62,22 +68,24 @@ const isHealthy = function(summary) { const checkHealth = function (callback) { // Makes queries to Kubernetes and checks results - // If we encounter some kind of error contacting k8s (or other), health status is UNKNOWN (500) - // If we get responses from k8s but don't find all deployments ready, health status is UNHEALTHY (503) + // If we encounter some kind of error contacting k8s (or other), health status is UNKNOWN (503) + // If we get responses from k8s but don't find all deployments ready, health status is UNHEALTHY (500) // If we get responses from k8s and all deployments are ready, health status is HEALTHY (200) - // This could be a lot more nuanced, but what's here should be sufficient for R2 OOM healthchecking + // This could be a lot more nuanced, but what's here should be sufficient for OOM healthchecking // Query k8s to find all the deployments with specified DEPLOY_LABEL status.getLabeledDeploymentsPromise(DCAE_NS, DEPLOY_LABEL) .then(function(fullDCAEList) { - // Now get status for Helm deployments and CM deployments - return status.getStatusListPromise(helmList.concat(fullDCAEList)); + // Remove expected deployments from the labeled list so they are not checked twice + const dynamicDCAEDeps = fullDCAEList.filter( (n) => !(expectedDepNames.includes(n.deployment)) ); + // Get status for expected deployments and any dynamically deployed components + return status.getStatusListPromise(expectedList.concat(dynamicDCAEDeps)); }) .then(function(body) { callback({status: isHealthy(body) ? 
HEALTHY : UNHEALTHY, body: body}); }) .catch(function(error){ - callback({status: UNKNOWN, body: [error]}) + callback({status: UNKNOWN, body: [error]}); }); }; @@ -91,4 +99,4 @@ const server = http.createServer(function(req, res) { }); }); server.listen(LISTEN_PORT); -log.info(`Listening on port ${LISTEN_PORT} -- expected components: ${JSON.stringify(helmDeps)}`); +log.info(`Listening on port ${LISTEN_PORT} -- expected components: ${JSON.stringify(expectedMicroservices)}`); diff --git a/healthcheck-container/log.js b/healthcheck-container/log.js index 655ac6b..93bd591 100644 --- a/healthcheck-container/log.js +++ b/healthcheck-container/log.js @@ -1,6 +1,7 @@ /* +============LICENSE_START========================================================================= Copyright(c) 2021 J. F. Lucas. All rights reserved. - +================================================================================================== Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. @@ -12,6 +13,7 @@ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
+============LICENSE_END=========================================================================== */ exports.info = (message) => { console.log(`${(new Date()).toISOString()}: ${message}`); diff --git a/healthcheck-container/package.json b/healthcheck-container/package.json index e0dcd25..4fd9cbb 100644 --- a/healthcheck-container/package.json +++ b/healthcheck-container/package.json @@ -1,7 +1,7 @@ { "name": "k8s-healthcheck", "description": "DCAE healthcheck server", - "version": "2.3.0", + "version": "2.4.0", "main": "healthcheck.js", "author": "author", "license": "(Apache-2.0)" diff --git a/healthcheck-container/pom.xml b/healthcheck-container/pom.xml index f15a6ea..1f2b152 100644 --- a/healthcheck-container/pom.xml +++ b/healthcheck-container/pom.xml @@ -1,6 +1,6 @@