Serverless FaaS

General Technical Approach 

In the EO4EU platform, the main concern is to provide processing capabilities to the user, capabilities that extend the user's perspective on the EO data. The Function as a Service (FaaS) functionality provides best-effort, predefined processing that a user can adapt to fit the needs of their EO data processing. It reserves the resources required by the workload and instantiates the needed functionality on demand. The outcome is a FaaS system that is dynamic and versatile, processing a vast load of information with minimal delays.

Figure 1 – FaaS architecture 

Process decomposition 

The nature of serverless functionality lies in lightweight, short-lived executions of a particular block of code (a process). To leverage the FaaS service, the EO4EU platform avoids driving the enormous amount of data that needs to be processed through a single instance of a FaaS function; instead, it distributes the data load accordingly, providing the capability to scale up or down and to discard resources as required.

Minimum intervention

When it comes to data processing, intervening in the data adds complexity that can ultimately lead to false outcomes. For this reason, the FaaS proxy does not intervene in any phase of the processing; it provides only an abstraction of the service.

Data distribution  

Based on the templated form of the provided serverless function, the FaaS proxy delivers only the address of the data to be processed within the FaaS function, as a reference cursor to the data storage. This templated form of the serverless function makes it possible to interact with the data in the data store without prior knowledge of that store.
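As an illustration, a templated Python handler might receive only such a reference and resolve it against the data store itself. The sketch below is hypothetical: the DataStoreClient helper, the handle entry point and the event field names are assumptions, not the platform's actual template.

```python
# Minimal sketch of a templated handler, assuming the FaaS proxy passes a
# reference cursor (e.g. an object address) rather than the data itself.
# DataStoreClient and the event field names are illustrative placeholders.

class DataStoreClient:
    """Placeholder for the platform's data-store abstraction."""

    def fetch(self, ref: str) -> bytes:
        raise NotImplementedError("resolve the reference against the real store")

    def store(self, ref: str, payload: bytes) -> None:
        raise NotImplementedError("write the result back to the real store")


def process(raw: bytes) -> bytes:
    """Placeholder for the user's processing logic, injected by the template."""
    return raw


def handle(event: dict) -> dict:
    store = DataStoreClient()
    data_ref = event["data_ref"]   # the address of the input, not the input itself
    raw = store.fetch(data_ref)    # the function pulls the data on demand
    result = process(raw)          # user-supplied code runs here
    out_ref = data_ref + ".out"
    store.store(out_ref, result)   # only the output reference is returned
    return {"status": "ok", "output_ref": out_ref}
```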

Provision Service 

After the Provision Service (PS) receives confirmation that the pipeline has finished successfully, it creates the Kubernetes deployment resource YAML files and dispatches them to the cluster for creation and execution. The PS generates an appropriate name for this deployment, which must be unique and follow the RFC 1123 naming requirements; otherwise the Kubernetes instance of the cluster will reject it. Additional Kubernetes resource files have to be created, such as a Kubernetes Secret object containing the authorisation information for the container registry in which the FaaS Docker image resides. Failing to provide this results in ImagePullBackOff errors inside the cluster.
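A minimal sketch of the naming step, assuming the PS is written in Python; the helper below only illustrates the RFC 1123 label constraints (lowercase alphanumerics and hyphens, starting and ending with an alphanumeric, at most 63 characters) and is not the platform's actual implementation.

```python
import re
import uuid

def rfc1123_name(base: str, max_len: int = 63) -> str:
    """Derive a unique, RFC 1123-compliant deployment name from a base string."""
    # Lowercase, replace anything outside [a-z0-9-] with a hyphen, trim edges.
    name = re.sub(r"[^a-z0-9-]+", "-", base.lower()).strip("-")
    # Append a short random suffix so repeated deployments do not collide.
    suffix = uuid.uuid4().hex[:8]
    return f"{name[:max_len - len(suffix) - 1]}-{suffix}".strip("-")

print(rfc1123_name("FaaS_user Function v2"))  # e.g. faas-user-function-v2-1a2b3c4d
```

The registry credentials themselves are typically supplied as a Secret of type kubernetes.io/dockerconfigjson and referenced from the deployment's imagePullSecrets field, as in the deployment sketch at the end of this section.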

GitLab pipelines

The use of FaaS is inextricably linked to the use of Linux containers. At the same time, the high-risk process of building and executing code that the user submits to the platform requires the use of automation tools for producing these Linux containerised applications.

Basic tools that offer the possibility of completely standardising the process are the pipeline tools provided by DevOps platforms. In the current implementation, well-known tools such as GitLab Runners, Jenkins Agents and CircleCI were evaluated. The GitLab platform was chosen precisely because it is already present in the EO4EU project, but also because it is provided as self-hosted: it can run on top of a k8s cluster and has the capacity for horizontal scaling.
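As an illustration, GitLab Runner can be installed on the cluster with its official Helm chart; the values below are an illustrative minimum (the URL and token are placeholders, and key names may vary between chart versions):

```yaml
# Illustrative values.yaml for the gitlab-runner Helm chart.
gitlabUrl: https://gitlab.example.com/
runnerRegistrationToken: "<registration-token>"
concurrent: 10               # upper bound on jobs running in parallel
runners:
  config: |
    [[runners]]
      executor = "kubernetes"
      [runners.kubernetes]
        namespace = "gitlab-runners"
        image = "alpine:3.19"   # default job image, overridable per job
```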

For the prototyping part, the FaaS CLI scaffolding tool was used, which provides fully functional templates both for script-based programming languages such as Python and JavaScript, and for type-safe, object-oriented programming languages such as Java and C#.

Pipeline description 

The pipeline process is described entirely in a YAML file that includes all the steps. The basic structure consists of stages and individual jobs. The jobs, which make up the basic structural unit, are executed on independent runners and are isolated from each other.
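A skeleton of such a pipeline definition, with purely illustrative job names, might look as follows (the actual EO4EU pipeline may differ):

```yaml
# .gitlab-ci.yml skeleton: two stages, three jobs (names are illustrative).
stages:
  - build
  - deploy

inject-template:
  stage: build
  script:
    - echo "wrap the user's code into the chosen language template"

build-image:
  stage: build
  needs: [inject-template]
  script:
    - echo "build the FaaS Docker image (see the Kaniko job below)"

push-image:
  stage: deploy
  script:
    - echo "upload the image to the platform's private registry"
```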

Choosing to install the runners on k8s provides an additional level of security and usability, given that each task runs in a separate, ephemeral deployment.

In addition, the containerised environment of the runner can use any image as its base image, with the possibility of selecting it both at the pipeline level and at the job level.

Jobs and Stages 

The process of creating the production images for FaaS consists of two stages and a total of three jobs. 

Build 

The first stage, named build, contains two jobs: the first encapsulates the user's input code in the appropriate template, and the second builds the Docker image.

The image-building job in this particular architecture presented a set of difficulties, related to the fact that the image builder required privileged access to the Docker engine.

This situation was remedied by using Google's Kaniko tool, which gives us the ability to produce Docker images inside a container without the privileged docker-in-docker setup and without exposing structural elements of the operating system.
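A typical GitLab CI job using the Kaniko executor is sketched below, following the pattern from the GitLab documentation; the image path is a placeholder, and the CI_* variables are GitLab's predefined ones:

```yaml
# Building the image with Kaniko: no privileged mode, no Docker daemon.
build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # Kaniko reads registry credentials from /kaniko/.docker/config.json.
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"${CI_REGISTRY}\":{\"auth\":\"$(printf '%s:%s' "${CI_REGISTRY_USER}" "${CI_REGISTRY_PASSWORD}" | base64 | tr -d '\n')\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}/faas-function:${CI_COMMIT_SHORT_SHA}"
```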

Deploy 

The next and last stage is called deploy; it performs a single job that uploads the image to the private Docker registry of the EO4EU platform.
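One arrangement consistent with this split, assumed here for illustration, is for the deploy job to copy the image built in the previous stage to the platform registry with a tool such as crane; the EO4EU registry address and credential variables are placeholders:

```yaml
# Copying the built image to the platform's private registry with crane.
push-image:
  stage: deploy
  image:
    name: gcr.io/go-containerregistry/crane:debug
    entrypoint: [""]
  script:
    - crane auth login "${CI_REGISTRY}" -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}"
    - crane auth login "${EO4EU_REGISTRY}" -u "${EO4EU_REGISTRY_USER}" -p "${EO4EU_REGISTRY_PASSWORD}"
    - crane copy "${CI_REGISTRY_IMAGE}/faas-function:${CI_COMMIT_SHORT_SHA}" "${EO4EU_REGISTRY}/faas/faas-function:${CI_COMMIT_SHORT_SHA}"
```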

Dynamic configuration and on demand execution 

Each pipeline supports the ability to define variables. These can either be defined by the environment (environment variables) or within the pipeline itself. In our scenario, the pipeline requires as inputs the template that the user has chosen from the graphical interface (Python, JavaScript or PHP) and the code that they wish to execute. The group of templates that integrate Python, along with Python editions for scientific purposes, was built from the ground up, an action of strategic importance for the EO4EU platform.
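In GitLab CI, such inputs can be declared with default values in the pipeline file and overridden at trigger time; the variable names TEMPLATE and USER_CODE below are illustrative assumptions:

```yaml
# Illustrative pipeline-level variables, overridable per run.
variables:
  TEMPLATE: "python3"   # selected in the GUI: python3 | node | php
  USER_CODE: ""         # the user's code, injected into the chosen template
```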

The PS forms the URL-encoded request, including all specifications given through the graphical user interface.
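Assuming the PS triggers the pipeline through GitLab's pipeline trigger API, the request could be formed as in the following sketch (the project ID, trigger token and variable names are placeholders):

```python
import requests

GITLAB = "https://gitlab.example.com"
PROJECT_ID = 42            # placeholder project ID
TRIGGER_TOKEN = "<token>"  # placeholder pipeline trigger token

def trigger_faas_build(template: str, user_code: str) -> int:
    """Trigger the image-build pipeline, passing the GUI selections as variables."""
    resp = requests.post(
        f"{GITLAB}/api/v4/projects/{PROJECT_ID}/trigger/pipeline",
        data={
            "token": TRIGGER_TOKEN,
            "ref": "main",
            "variables[TEMPLATE]": template,    # e.g. python3, node, php
            "variables[USER_CODE]": user_code,  # URL-encoded by requests
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]  # pipeline ID, usable for status polling
```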

On-the-fly build

After the successful creation of the FaaS Docker image, the PS prepares the required deployment resource and submits it to the cluster. This creates the required pods, which execute the FaaS function inside the cluster.
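Using the official Kubernetes Python client, the submission could look roughly like the sketch below; the namespace, image path and secret name are assumptions:

```python
from kubernetes import client, config

def deploy_faas(name: str, image: str, namespace: str = "faas") -> None:
    """Create a Deployment running the freshly built FaaS image."""
    config.load_incluster_config()  # assumes the PS itself runs inside the cluster

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name=name),  # must satisfy RFC 1123
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": name}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": name}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(name=name, image=image)],
                    # Secret holding registry credentials; without it the image
                    # pull fails with ImagePullBackOff, as noted above.
                    image_pull_secrets=[
                        client.V1LocalObjectReference(name="registry-credentials")
                    ],
                ),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(namespace=namespace, body=deployment)
```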