Stub for all CCX services based on Insights Core Messaging framework
View the Project on GitHub RedHatInsights/insights-ccx-messaging
The config.yaml is a standard Insights Core Messaging configuration file.
To learn about its structure and how to configure the most common options,
read the Insights Core Messaging documentation.
Let’s use the ccx-data-pipeline2 configuration as an example to
understand the configuration points. The key part is the service section, where the specific
consumer, downloader, engine and publisher are configured.
- consumer: name refers to the class
  ccx_messaging.consumers.kafka_consumer.KafkaConsumer. The arguments passed
  to the initializer are defined in the kwargs dictionary.
  The most relevant are:
  - incoming_topic: the Kafka topic the consumer object subscribes to.
  - platform_service: used by KafkaConsumer to filter the received messages.
  - group.id: Kafka group identifier. Several instances of the same pipeline
    need to be in the same group so that they do not process the same
    messages.
  - bootstrap.servers: a list of “IP:PORT” strings where the Kafka server is
    listening.
  - max_record_age: an integer that defines the maximum age, in seconds, of
    the Kafka records to process. If a received record is older than this
    number of seconds, it is ignored. By default, messages older than 2 hours
    are ignored. To disable this check and process every record regardless of
    its age, use -1.
- downloader: name refers to the class
  ccx_messaging.downloaders.http_downloader.HTTPDownloader. Some of the accepted
  arguments are:
  - max_archive_size: an optional argument that specifies the
    maximum size of the archives that can be processed by the pipeline. If the
    downloaded archive is bigger, it is discarded. The value should be
    a string in a human-readable format (it accepts units like KB, KiB, GB,
    GiB…).
  - allow_unsafe_link: used mostly in test environments to skip
    the check of the archive URL, which allows downloads from unrecognized URLs.
- engine: name refers to the class
  ccx_messaging.engines.ocp_engine.OCPEngine, which applies the rules to the downloaded archive.
- publisher: name refers to the class
  ccx_messaging.publishers.rule_processing_publisher.RuleProcessingPublisher. It
  also allows defining the arguments passed to the initializer through the
  kwargs dictionary:
  - outgoing_topic: a string indicating the topic where the reported results
    should be sent.
  - bootstrap.servers: same as in consumer, a list of Kafka servers to
    connect to.
- watchers: a list of Watcher objects that will receive notifications
  of events during the pipeline processing steps. The one configured by default is
  ccx_messaging.watchers.stats_watcher.StatsWatcher, which serves some
  statistics for the Prometheus service. The port where
  the prometheus_client library listens for requests is configurable
  using the kwargs dictionary, in the same way as for consumer and publisher. The
  only recognized option is:
  - prometheus_port: an integer indicating the port where
    prometheus_client will listen for server requests. If not present,
    it defaults to 8000.

In addition to the configuration described above, some other behaviors can be configured through environment variables.
The whole YAML file is parsed by the Insights Core Messaging library, which supports using environment variables, optionally with default values, as the value of any variable in the configuration file.
As an example, given an environment variable named KAFKA_INCOMING_TOPIC that
contains the name of the Kafka topic the consumer should read from, you can put
${KAFKA_INCOMING_TOPIC} as the value for the consumer/incoming_topic
configuration.
Following the same example, if you want a default value to be used when
KAFKA_INCOMING_TOPIC is not defined, you can specify
${KAFKA_INCOMING_TOPIC:default_value}. In this case, the environment variable
takes precedence over the default value, and the default is used only when
the environment variable is not defined.
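
The precedence rule described above can be illustrated with a small Python sketch. This is not the parser used by Insights Core Messaging: the `expand` function and its regular expression are hypothetical, written only to show how `${NAME}` and `${NAME:default}` resolve. In particular, leaving an undefined variable without a default untouched is an assumption of this sketch, not documented library behavior.

```python
import re

# Matches ${NAME} or ${NAME:default}. This pattern is an assumption made for
# illustration; it is not taken from the Insights Core Messaging source.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::(?P<default>[^}]*))?\}"
)


def expand(value: str, environ: dict) -> str:
    """Resolve ${NAME} and ${NAME:default} placeholders in a config value."""

    def replace(match: re.Match) -> str:
        name = match.group("name")
        default = match.group("default")
        if name in environ:
            # The environment variable takes precedence over the default.
            return environ[name]
        if default is not None:
            # The variable is not defined, so fall back to the default.
            return default
        # No variable and no default: this sketch keeps the placeholder as is.
        return match.group(0)

    return _PLACEHOLDER.sub(replace, value)
```

For instance, `expand("${KAFKA_INCOMING_TOPIC:default_value}", {})` yields `default_value`, while the same call with `{"KAFKA_INCOMING_TOPIC": "my.topic"}` yields `my.topic`.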
Besides the YAML configuration, some environment variables are needed for logging.
To send log messages to a CloudWatch (AWS) instance, define all of the following environment variables:
- CW_AWS_ACCESS_KEY_ID: the AWS access key for creating the CloudWatch
  session.
- CW_AWS_SECRET_ACCESS_KEY: the AWS secret access key for creating the
  CloudWatch session.
- AWS_REGION_NAME: the AWS region name where the CloudWatch authentication
  should be done.
- CW_LOG_GROUP: the log group that will be used to
  publish the log messages.
- CW_STREAM_NAME: a name to distinguish this application's logs inside the log
  group.

If any of these environment variables is not defined, the CloudWatch service cannot be configured and won’t be used at all.
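
Before deploying, it can be handy to check that the whole set of variables is present. The following sketch only uses the variable names listed above; the helper names `missing_cloudwatch_vars` and `cloudwatch_enabled` are hypothetical and are not part of ccx_messaging.

```python
import os

# Variable names as listed in this document.
REQUIRED_CLOUDWATCH_VARS = (
    "CW_AWS_ACCESS_KEY_ID",
    "CW_AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_NAME",
    "CW_LOG_GROUP",
    "CW_STREAM_NAME",
)


def missing_cloudwatch_vars(environ=None):
    """Return the required variables that are not defined, in listed order."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_CLOUDWATCH_VARS if name not in env]


def cloudwatch_enabled(environ=None):
    """True only when every required variable is defined."""
    return not missing_cloudwatch_vars(environ)


if __name__ == "__main__":
    missing = missing_cloudwatch_vars()
    if missing:
        print("CloudWatch logging will not be used; missing:", ", ".join(missing))
    else:
        print("All CloudWatch variables are defined.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in tests.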