This service processes and stores media content in an optimized form so that it can be streamed to Hangout app users efficiently. It currently contains the video processing pipeline; the implementation of the image processing pipeline is underway.
In its final version this service will be able to process any image or video content and optimize it for storage and streaming.
To run this service you need:
- A Kafka broker URL and a topic to connect to
- FFmpeg installed locally on the machine/node on which this service will run
The main input to the service comes in the form of Kafka events. A single event maps to a single file uploaded by a user. Whenever an event arrives, the consumer consumes it and puts it in a channel. A pool of workers listens to that channel; whenever a new event lands in the channel, one of the available workers picks up the task and starts processing it. The whole architecture aims to be highly concurrent so that it can scale up and down as needed.
Let us dive deep. We will go from the basics to things of more complexity.
Let's start with configuration. The different settings of the application can be configured through a single yaml file, environment variables, or a mix of both.
- You will find a yaml file under the `resources` folder.
- Every entry in the yaml file can also be configured through environment variables. You will find the names of the environment variables inside the `config/config.go` file. A hypothetical example of such a yaml file is sketched below.
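To make this concrete, here is a hypothetical sketch of what such a yaml file could look like. The `log.backend`, `hangout.worker-pool.strength`, and `hangout.media.queue-length` keys are the ones discussed later in this document; the `kafka` entries are illustrative placeholders, so check `config/config.go` for the actual names.

```yaml
log:
  backend: zerolog        # or "slog"; see the logger section below

hangout:
  worker-pool:
    strength: 4           # number of concurrent workers
  media:
    queue-length: 16      # buffer size of the event channel

# Illustrative placeholders; the real key names live in config/config.go
kafka:
  broker-url: localhost:9092
  topic: media-uploads
```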
After the configuration is loaded, let's look at the loggers.
- Here we have 2 implementations of the logger, namely `zerolog` & `slog`.
- They are hidden behind a common interface (a rough sketch of the pattern follows this list).
- The design of this logger package is quite novel to me. I think we can publish it as a separate package.
- You can switch the logger backend used at runtime using the `log.backend` yaml config or the `LOG_BACKEND` env variable.
- There are 2 acceptable values for this config, namely `zerolog` and `slog`. If any other value is provided, `zerolog` will be used as the default logging backend.
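As a rough illustration of the pattern (not the repo's actual code), a common interface with two swappable backends could look like the sketch below; the interface name and method set are assumptions:

```go
package logging

import (
	"log/slog"
	"os"

	"github.com/rs/zerolog"
)

// Logger is a hypothetical version of the common interface;
// the real interface in this repo may expose more methods.
type Logger interface {
	Info(msg string)
	Error(msg string)
}

type zerologBackend struct{ l zerolog.Logger }

func (z zerologBackend) Info(msg string)  { z.l.Info().Msg(msg) }
func (z zerologBackend) Error(msg string) { z.l.Error().Msg(msg) }

type slogBackend struct{ l *slog.Logger }

func (s slogBackend) Info(msg string)  { s.l.Info(msg) }
func (s slogBackend) Error(msg string) { s.l.Error(msg) }

// New selects the backend at runtime; any unrecognized value falls
// back to zerolog, mirroring the default behavior described above.
func New(backend string) Logger {
	switch backend {
	case "slog":
		return slogBackend{l: slog.New(slog.NewJSONHandler(os.Stdout, nil))}
	default: // "zerolog" or anything else
		return zerologBackend{l: zerolog.New(os.Stdout).With().Timestamp().Logger()}
	}
}
```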
After that, let's come to the Kafka consumers.
- Here we are using the well-known IBM Sarama library.
- Every instance has a single Kafka consumer that runs in its own goroutine so that the main application flow is not blocked.
- We are using the `ConsumerGroup` API so that Kafka allocates and rebalances the topic partitions assigned to the consumer of the current instance.
- This also means that if we have multiple instances of this application, all the consumers will belong to the same consumer group.
- As the partitions are allocated by Kafka and the events are distributed across different partitions, this prevents a race condition where multiple instances of the application race to consume the same event and thereby process the same file twice.
- Topic names and the file upload path can be configured from the yaml file or env variables, whichever is preferred.
- The service expects individual events to have this structure (a consumer sketch follows this list):

  ```json
  {
    "key": "<HTTP contentType header value of the file>",
    "body": "<filename with extension>"
  }
  ```

  `key` is the event key that you can provide while producing Kafka events to a topic, and `body` is the event body.
Now we come to the workers. The number of workers in the worker pool can be configured using the `hangout.worker-pool.strength` yaml property or the `HANGOUT_WORKER_POOL_STRENGTH` env variable. If you have a beefier machine, you can increase the number of workers, which in turn means more files can be processed concurrently at any point in time.
There is a buffered channel that links the Kafka consumer and the workers.
- The length of the buffered channel can be configured using the `hangout.media.queue-length` yaml property or the `HANGOUT_MEDIA_QUEUE_LENGTH` env variable.
- If you have a low-spec machine or an unreliable Kafka connection, you can increase the channel length so that events are consumed as quickly as they arrive and held in your own buffer, to be processed as workers become idle, without any rush. A sketch of this worker-pool pattern follows this list.
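Here is a minimal sketch of the worker-pool pattern described above, reusing the illustrative `Event` type from the consumer sketch; the names and signatures are guesses, not the repo's:

```go
package pool

import (
	"context"
	"sync"
)

// Event mirrors the illustrative struct from the consumer sketch.
type Event struct {
	ContentType string
	Filename    string
}

// Start spawns `strength` workers that all receive from one buffered
// channel; the returned WaitGroup lets the caller wait for a graceful stop.
func Start(ctx context.Context, strength, queueLength int, process func(Event)) (chan Event, *sync.WaitGroup) {
	events := make(chan Event, queueLength) // the buffer absorbs bursts
	wg := &sync.WaitGroup{}
	for i := 0; i < strength; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return // idle worker: shuts down immediately
				case ev := <-events:
					process(ev) // a busy worker finishes this file first
				}
			}
		}()
	}
	return events, wg
}
```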
Let's get into the files package. There is a common interface called `File` that the workers call; either the `Image` or the `Video` pipeline is initiated depending on the value of the `key` field of the Kafka event, as sketched below.
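A rough sketch of what that dispatch could look like; the `Process` method and the constructor below are assumed names, not necessarily the package's real API:

```go
package files

import (
	"fmt"
	"strings"
)

// File is the common interface the workers call.
type File interface {
	Process() error
}

type Image struct{ Path string }
type Video struct{ Path string }

func (i Image) Process() error { return nil } // stand-in for the image pipeline
func (v Video) Process() error { return nil } // stand-in for the video (FFmpeg) pipeline

// New picks a pipeline based on the event key, i.e. the file's content type.
func New(contentType, filename string) (File, error) {
	switch {
	case strings.HasPrefix(contentType, "image/"):
		return Image{Path: filename}, nil
	case strings.HasPrefix(contentType, "video/"):
		return Video{Path: filename}, nil
	default:
		return nil, fmt.Errorf("unsupported content type %q", contentType)
	}
}
```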
Let's go over the overall execution flow of the service:
- First, all the values from the yaml file and the env variables are loaded.
- Then, depending on the config, the suitable logger is initiated.
- Then the worker pool spawns the required number of workers and makes them available.
- Then the connection to Kafka is set up and the consumer joins the consumer group.
- The consumer remains active throughout the lifetime of the service and runs in parallel with the main flow.
- Whenever there is an event in the topic, the consumer consumes it and sends it to the `eventChannel`.
- Every worker in the worker pool listens to this channel. Since receiving from a Go channel is blocking and each message is delivered to exactly one receiver, a single idle worker picks up the event and starts processing the file, while the other idle workers keep listening for the next event. This enables us to process multiple events concurrently.
- If an event arrives while all the workers are busy processing previous files, the buffered nature of the channel comes into play: events are held in the buffer, and whenever a worker becomes idle it picks up the next event from the buffer. This helps alleviate some back-pressure problems too.
- But if the channel becomes full as well, the Kafka consumer will stop consuming new events until there is space for a new event in the channel.
- The video and image processing pipelines are synchronous in nature, so the worker is blocked until the file it is processing finishes fully or errors out for some reason.
- When you shut down the service, it attempts a graceful shutdown: all the workers and consumers complete their ongoing work and then shut themselves down. So shutdown doesn't stop and crash the service immediately. Any worker or Kafka consumer that is idle shuts down immediately, but a busy worker completes its work first and then shuts down.
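To tie the flow together, here is a small self-contained sketch of the graceful-shutdown behavior described in the last step, using Go's `signal.NotifyContext`; the sleeping worker body is a stand-in for real file processing:

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

func main() {
	// ctx is cancelled when the process receives SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	events := make(chan string, 8) // stand-in for the buffered event channel
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return // an idle worker exits immediately
				case ev := <-events:
					time.Sleep(time.Second) // stand-in for processing a file
					fmt.Println("processed", ev)
				}
			}
		}()
	}

	<-ctx.Done() // block until a shutdown signal arrives
	wg.Wait()    // busy workers finish their current file before exiting
}
```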
