Data flow

The “end-to-end” data flow is described below (including the Notification Writer service part):

  1. A customer cluster with the Insights Operator installed sends new data containing information about the cluster to the Ingress service.
  2. The Ingress service consumes the data, writes it into an S3 bucket, and produces a new message into a Kafka topic named platform.upload.announce.
  3. The CCX Data pipeline service consumes the message from the platform.upload.announce Kafka topic.
  4. That message represents an event that contains (among other things) a URL to the S3 bucket.
  5. The Insights Operator data is read from the S3 bucket and insights rules are applied to that data in the ccx-data-pipeline service.
  6. Results (basically organization ID + cluster name + insights results JSON) are stored back into Kafka, but into a different topic named ccx.ocp.results.
  7. Those results are consumed by the ccx-notification-writer service.
  8. The ccx-notification-writer service stores the insights results into the new_reports table in an AWS RDS database (a minimal sketch of steps 7 and 8 is shown after this list).
  9. The content of that table is consumed periodically by the ccx-notification-service.
  10. The newest results from the new_reports table are compared with the results stored in the reported table. The records used for the comparison depend on the configured cool down time.
  11. If changes (new issues) have been found, a notification message is sent into the Kafka topic named platform.notifications.ingress. The expected format of the message can be found here.
  12. New issues are also sent to Service Log via REST API. Because Service Log accepts the description and content in Markdown, issues are “rendered” first by the Insights Content Template Renderer. To use the Service Log API, the ccx-notification-service uses the credentials stored in Vault.
  13. The newest result is stored into the reported table to be used in the next ccx-notification-service iteration.
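
The following Go sketch illustrates steps 7 and 8 only: consuming messages from the ccx.ocp.results topic and storing them in the new_reports table. It is a minimal illustration under stated assumptions, not the actual implementation; the Kafka client (segmentio/kafka-go), the message field names, and the new_reports column names are all assumptions.

```go
// Minimal sketch of steps 7 and 8: consume insights results from Kafka and
// store them in the new_reports table. Field and column names are assumptions,
// not the real ccx-notification-writer schema.
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
	_ "github.com/lib/pq"
)

// incomingReport models the relevant parts of a ccx.ocp.results message
// (assumed field names).
type incomingReport struct {
	OrgID       int             `json:"OrgID"`
	ClusterName string          `json:"ClusterName"`
	Report      json.RawMessage `json:"Report"`
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:password@localhost/notifications?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	reader := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "ccx.ocp.results",
		GroupID: "ccx-notification-writer",
	})
	defer reader.Close()

	for {
		// Step 7: consume one message with the insights results.
		msg, err := reader.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}

		var report incomingReport
		if err := json.Unmarshal(msg.Value, &report); err != nil {
			log.Printf("skipping malformed message: %v", err)
			continue
		}

		// Step 8: store the result into the new_reports table
		// (hypothetical column names).
		_, err = db.Exec(
			`INSERT INTO new_reports (org_id, cluster, report, updated_at, kafka_offset)
			 VALUES ($1, $2, $3, NOW(), $4)`,
			report.OrgID, report.ClusterName, string(report.Report), msg.Offset,
		)
		if err != nil {
			log.Printf("unable to store report: %v", err)
		}
	}
}
```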

Remarks

  1. Steps 1 to 5 are shared with the CCX Data pipeline.
  2. Steps 7 and 8 are performed by the ccx-notification-writer service.
  3. Steps 9 to 13 are performed by the ccx-notification-service with the help of the Insights Content Service and the Insights Content Template Renderer.

Cool down mechanism

The cool down mechanism is used to filter out previously reported issues so that they are not sent to customers repeatedly. It works by defining a minimum amount of time that must elapse between two notifications of the same issue. That cool down time is applied to all the issues processed during an iteration.

The value expected in the configuration is a string that respects the format of PostgreSQL’s interval type.
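
As a rough illustration, a query along the following lines could be used to build the cache of issues that are still inside the cool down window. This is a sketch only: the reported table layout (column names) is an assumption, not the real schema.

```go
// Sketch of selecting recently reported issues: only rows newer than NOW()
// minus the configured cool down interval (e.g. "24 hours") are returned.
// Table and column names are assumptions.
package example

import (
	"database/sql"
	"fmt"
)

func previouslyReportedIssues(db *sql.DB, cooldown string) (map[string]bool, error) {
	rows, err := db.Query(
		`SELECT org_id, cluster, rule_fqdn, error_key
		   FROM reported
		  WHERE updated_at > NOW() - $1::INTERVAL`,
		cooldown, // e.g. "24 hours"
	)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	previouslyReported := make(map[string]bool)
	for rows.Next() {
		var orgID int
		var cluster, ruleFQDN, errorKey string
		if err := rows.Scan(&orgID, &cluster, &ruleFQDN, &errorKey); err != nil {
			return nil, err
		}
		key := fmt.Sprintf("%d|%s|%s|%s", orgID, cluster, ruleFQDN, errorKey)
		previouslyReported[key] = true
	}
	return previouslyReported, rows.Err()
}
```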

Data flow of the notification service without cool down

See steps 9 to 12 of the data flow section.

Data flow of the notification service with cool down

  1. The latest entry for each distinct cluster in the new_reports table is consumed by the ccx-notification-service.
  2. Results stored in the reported table within the cool down time are retrieved. All reported issues that are not older than the configured cool down are therefore cached by the service in a previouslyReported map in each iteration.
  3. When checking for new issues in the report, the ccx-notification-service looks up each issue in the previouslyReported map; if it is found, that issue is considered to still be in cool down and is not processed further. If it is not found, processing of the issue continues (a minimal sketch of this check is shown after this list).
  4. If changes (new issues) have been found between the previous report and the new one, a notification message is sent into the Kafka topic named platform.notifications.ingress. The expected format of the message can be found here.
  5. New issues are also sent to Service Log via REST API. To use the Service Log API, the ccx-notification-service uses the credentials stored in Vault.
  6. The newest result is stored into the reported table to be used in the next ccx-notification-service iteration.
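
A minimal sketch of the cool down check from step 3, assuming the previouslyReported map is keyed by organization ID, cluster name, rule name and error key (the actual key layout used by the service may differ):

```go
// Sketch of the cool down check: an issue already present in the
// previouslyReported cache is skipped, otherwise its processing continues.
// The key layout (org ID | cluster | rule | error key) is an assumption.
package example

import "fmt"

func issueKey(orgID int, cluster, ruleFQDN, errorKey string) string {
	return fmt.Sprintf("%d|%s|%s|%s", orgID, cluster, ruleFQDN, errorKey)
}

// shouldNotify reports whether an issue found in the newest report is outside
// the cool down window and should therefore be processed further.
func shouldNotify(previouslyReported map[string]bool, orgID int, cluster, ruleFQDN, errorKey string) bool {
	if previouslyReported[issueKey(orgID, cluster, ruleFQDN, errorKey)] {
		// Already reported within the cool down time: skip it in this iteration.
		return false
	}
	// Not found: the issue is new, or its cool down has expired.
	return true
}
```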

Configuring the cool down mechanism

The cooldown mechanism can be configured by specifying the cooldown field under each integrated service’s configuration. Currently, the cooldown field can be configured under the kafka_broker and service_log configurations in the config.toml file, or by setting the CCX_NOTIFICATION_SERVICE__KAFKA_BROKER__COOLDOWN and CCX_NOTIFICATION_SERVICE__SERVICE_LOG__COOLDOWN environment variables, respectively.

The value set is used directly within an SQL query, so the expected format is an integer followed by a valid SQL epoch time unit (year[s], month[s], week[s], day[s], hour[s], minute[s], second[s]), for example “24 hours” or “5 days”.
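
The sketch below shows one way the cooldown value for the Kafka broker integration could be read, with the environment variable taking precedence over the config.toml value, and roughly validated against the “integer + unit” format before being used in an SQL query. The validation helper is purely illustrative and is not part of the actual service.

```go
// Illustrative sketch of reading and validating the Kafka broker cooldown
// value; the environment variable overrides the config.toml setting.
package example

import (
	"fmt"
	"os"
	"regexp"
)

// cooldownPattern roughly matches "<integer> <SQL epoch time unit>",
// e.g. "24 hours" or "5 days". This check is hypothetical.
var cooldownPattern = regexp.MustCompile(`^\d+ (year|month|week|day|hour|minute|second)s?$`)

func kafkaBrokerCooldown(configValue string) (string, error) {
	cooldown := configValue
	if env := os.Getenv("CCX_NOTIFICATION_SERVICE__KAFKA_BROKER__COOLDOWN"); env != "" {
		cooldown = env
	}
	if !cooldownPattern.MatchString(cooldown) {
		return "", fmt.Errorf("invalid cooldown value: %q", cooldown)
	}
	return cooldown, nil
}
```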

As of today, since we only keep 8 days of data in our database, the maximum cooldown that can be applied is 8 days.

If new services are integrated in the future, the corresponding code will need to be added for them to support the cooldown mechanism.

Sequence diagram

sequence_diagram.png

Sequence diagram, full scale

Sequence diagram for the whole pipeline - Service Log integration

sequence_diagram_service_log.png

Sequence diagram, full scale

Sequence diagram for instant reports

instant_reports.png

Full scale

Sequence diagram for weekly reports

weekly_reports.png

Full scale

Sequence diagram for CCX Notification Writer service

Sequence diagram for CCX Notification Writer service