Processing S3 Files using Knative Eventing

Murugappan Sevugan Chetty
Apr 18, 2020

The Use Case:

Processing flat files is a common use case in enterprises. In these files, each line is treated as a record, and each record carries an identifier that determines how it is parsed. The files can be plain text, CSV, or similar formats. This solution targets files stored in S3-compatible object storage such as AWS S3, MinIO, or other on-premises S3-compatible object stores.
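To make the record-identifier idea concrete, here is a minimal Go sketch. It assumes comma-delimited lines whose first field is the identifier; the identifiers `HDR`, `DTL` and `TRL` and the `parseRecord` helper are hypothetical and are not taken from the author's files or repo.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical parsed flat-file line: the identifier (Kind)
// decides how the remaining fields should be interpreted.
type Record struct {
	Kind   string   // hypothetical identifiers: "HDR", "DTL", "TRL"
	Fields []string // fields that follow the identifier
}

// parseRecord splits a comma-delimited line and treats its first field as
// the record identifier. The delimiter and identifier set are assumptions.
func parseRecord(line string) (Record, error) {
	parts := strings.Split(strings.TrimSpace(line), ",")
	if parts[0] == "" {
		return Record{}, fmt.Errorf("empty record")
	}
	switch parts[0] {
	case "HDR", "DTL", "TRL":
		return Record{Kind: parts[0], Fields: parts[1:]}, nil
	default:
		return Record{}, fmt.Errorf("unknown record identifier %q", parts[0])
	}
}
```

Dispatching on the identifier keeps each record type's parsing rules in one place, so adding a new record type means adding one `case`.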

Approach:

Build a generic component that reads a file from an S3-based store and emits each line as an event, which can then be processed by a pipeline of services.

Knative Eventing:

Knative is a popular open-source serverless platform with two components: Serving and Eventing. Serving manages the lifecycle of serverless services, while Eventing provides various ways to deliver events to a service. Eventing is a rapidly evolving ecosystem; see the Knative Eventing documentation for details.

Sink Binding:

Sink Binding is an effective and easy way to create new event sources. It enables delivery of events to any addressable resource, such as a Knative Service or a Kubernetes Service. Best of all, the binding is lazy: when I write an event source, I don't need to know where its events will be delivered, because the destination is bound when the resource is created. For a better understanding, take a look at this demo by Matthew Moore and Scott Nichols. For this use case, I have created a file source that can read any file in S3 and emit its lines as CloudEvents.

S3 File Source:

The file source plugin is a reusable component consisting of two resources: a Knative Service and a Kubernetes Job. The Knative Service creates the Job, which reads the file and emits its content as CloudEvents.

This plugin uses version 2.0 of the CloudEvents Go SDK. All the details are provided in the Git repo.

Processing the Events:

Once the file source is set up, processing pipelines can be created to handle the incoming events. The processing services and the file source are bound via the SinkBinding. As an example, a drug list CSV file was processed this way.
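A sketch of a processing service for such a pipeline, assuming each event body is one CSV line with three columns (code, name, form). The `Drug` struct and its columns are hypothetical, since the drug list layout is not described here. `encoding/csv` is used instead of splitting on commas so that quoted fields containing commas parse correctly.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

// Drug is a hypothetical shape for one row of the drug list CSV.
type Drug struct {
	Code string
	Name string
	Form string
}

// parseDrugLine parses a single CSV line (one event body).
func parseDrugLine(line string) (Drug, error) {
	rec, err := csv.NewReader(strings.NewReader(line)).Read()
	if err != nil {
		return Drug{}, err
	}
	if len(rec) != 3 {
		return Drug{}, fmt.Errorf("expected 3 fields, got %d", len(rec))
	}
	return Drug{Code: rec[0], Name: rec[1], Form: rec[2]}, nil
}

// handleEvent receives a binary-mode CloudEvent: the line is the request
// body and the event type arrives in the ce-type header.
func handleEvent(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	d, err := parseDrugLine(string(body))
	if err != nil {
		// A non-2xx status tells the sender the event was not processed.
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	log.Printf("ce-type=%s drug=%+v", r.Header.Get("ce-type"), d)
	w.WriteHeader(http.StatusAccepted)
}
```

Registering `handleEvent` with `http.HandleFunc` in a Knative Service would make it a valid sink for the file source.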

Complex Event Delivery:

In this example I used a simple delivery approach (source to service). Any source that carries the label the SinkBinding selects on will deliver its events to the service (n to 1). Instead of delivering directly to a service, we could have the file source deliver to a Broker, which would enable n-to-n communication.
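With a Broker, consumers typically subscribe through Triggers that filter on CloudEvent attributes. The sketch below is a simplified, hypothetical model of exact-match attribute filtering, showing how one event can reach several subscribers. It is not Knative's implementation, and the trigger names and event type are made up.

```go
package main

import "sort"

// matchesFilter reports whether every attribute named in the filter is
// present on the event with an identical value. An empty filter matches
// every event. (Simplified model; real Trigger semantics may differ.)
func matchesFilter(filter, attrs map[string]string) bool {
	for k, want := range filter {
		if got, ok := attrs[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// route returns the names of all triggers whose filters match the event,
// illustrating fan-out from many sources to many consumers.
func route(triggers map[string]map[string]string, attrs map[string]string) []string {
	var matched []string
	for name, filter := range triggers {
		if matchesFilter(filter, attrs) {
			matched = append(matched, name)
		}
	}
	sort.Strings(matched) // map order is random; sort for stable output
	return matched
}
```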

Beyond this example, I also processed a 2 GB file; resource usage stayed optimal and processing was quick.
