Genomics Annotation Service
Note: the source code is private, because source code for school projects cannot be made public.
Introduction
Genomics Annotation Service (GAS) was the capstone project for my cloud computing class at UChicago. The product of ten weeks of dedicated work, it is larger in scale and more sophisticated in design than a typical school project. GAS (fictitious, of course) is a cloud-native SaaS (software-as-a-service) product deployed on AWS that provides a genomics annotation service.
Project Architecture
A simplified illustration of the project architecture:
GAS consists of three servers:
- A frontend server: when a user visits GAS, all the pages are generated by this server. Yes, GAS is server-side-rendered.
- An annotator server: the "core business functionality" - genomics annotation - is run on this server.
- A util server: supporting utilities, including email notifications, file archiving, file thawing, and file restoration, are handled by this server.
These three servers coordinate with each other asynchronously through message queues (AWS SQS). As a cloud-native application, GAS also relies on many other AWS services.
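To give a flavor of how a server consumes its queue, here is a minimal sketch of a long-polling SQS consumer; the queue name, message fields, and region are placeholders rather than the project's actual configuration.

```python
import json
import boto3

sqs = boto3.resource("sqs", region_name="us-east-1")
queue = sqs.get_queue_by_name(QueueName="gas-annotator-jobs")  # hypothetical queue name

while True:
    # Long polling: wait up to 20 seconds for messages instead of busy-looping.
    for message in queue.receive_messages(WaitTimeSeconds=20, MaxNumberOfMessages=10):
        body = json.loads(message.body)
        data = json.loads(body["Message"])  # SNS wraps the original payload in "Message"
        print("Received job:", data.get("job_id"))
        # ... process the job here ...
        message.delete()  # remove the message from the queue once handled
```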
Technologies & Tools
- Backend framework: Flask. GAS is server-side-rendered, and we use Jinja2 to create all the HTML templates.
- Boto3 and AWS CLI to interact with AWS (we also use the console, of course).
- EC2 instances to host the web servers (in this project, we do not use containers, to keep things relatively simple).
- S3 for persistent storage of users' files. We also use S3 Glacier to archive non-premium users' files, mainly for cost effectiveness.
- Lambda for file restoration.
- DynamoDB for key-value storage of users' job requests (to keep track of job status).
- Simple Notification Service (SNS) and Simple Queue Service (SQS) for asynchronous inter-server communications (known as the publish-subscribe pattern).
- Simple Email Service (SES) for email notifications.
- Step Functions for scalable waiting (a small sketch follows this list).
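As an illustration of the "scalable waiting" idea, the sketch below starts a state machine whose first state is a Wait state, so the delay does not tie up a worker thread. The state machine ARN and input fields are made up for the example, not the project's real values.

```python
import json
import boto3

sfn = boto3.client("stepfunctions", region_name="us-east-1")

def schedule_archival(job_id: str, user_id: str) -> None:
    # Start an execution of a (hypothetical) state machine that waits for a
    # fixed period and then triggers the archival step.
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:gas-archive-wait",
        name=f"archive-{job_id}",  # execution names must be unique per state machine
        input=json.dumps({"job_id": job_id, "user_id": user_id}),
    )
```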
Features
Core business functionalities
File annotation
A user can upload a file through a pre-signed S3 POST request. The annotator server will then process the user's file.
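A rough sketch of how such a pre-signed POST could be generated with Boto3 is shown below; the bucket name, key scheme, and expiry are illustrative assumptions.

```python
import uuid
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

def presigned_upload(user_id: str, filename: str) -> dict:
    key = f"{user_id}/{uuid.uuid4()}~{filename}"  # unique object key per upload
    # Returns a dict with the form URL and fields the browser posts directly to S3.
    return s3.generate_presigned_post(
        Bucket="gas-inputs",            # hypothetical bucket name
        Key=key,
        Fields={"acl": "private"},
        Conditions=[{"acl": "private"}],
        ExpiresIn=300,                  # the form is valid for 5 minutes
    )
```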
Tracking job status
Users can see a list of their past job requests, and they can check the real-time status of an ongoing job. All job statuses are maintained in DynamoDB.
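For illustration, here is a hedged sketch of a conditional status update in DynamoDB; the table name, key, and attribute names are assumptions, not the project's actual schema.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("gas-annotations")  # hypothetical table name

def mark_running(job_id: str) -> None:
    # Only flip PENDING -> RUNNING; a retried or duplicate message that arrives
    # late fails the condition instead of regressing the status.
    table.update_item(
        Key={"job_id": job_id},
        UpdateExpression="SET job_status = :new",
        ConditionExpression="job_status = :old",
        ExpressionAttributeValues={":new": "RUNNING", ":old": "PENDING"},
    )
```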
Downloading results
After an annotation job finishes, the user can download the resulting file through a pre-signed S3 URL.
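A minimal sketch of generating such a pre-signed download URL with Boto3; the bucket name and expiry are placeholders.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

def result_download_url(result_key: str) -> str:
    # Time-limited link that lets the user fetch the result object directly from S3.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "gas-results", "Key": result_key},  # hypothetical bucket
        ExpiresIn=600,  # link expires after 10 minutes
    )
```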
Email notification
After an annotation job finishes, the system automatically sends an email to the user, which contains a link to the resulting file.
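A sketch of what the SES call might look like; the sender address and message body are illustrative only.

```python
import boto3

ses = boto3.client("ses", region_name="us-east-1")

def notify_completion(recipient: str, download_url: str) -> None:
    ses.send_email(
        Source="no-reply@gas.example.com",  # must be a verified SES identity
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": "Your annotation job is complete"},
            "Body": {"Text": {"Data": f"Download your results here: {download_url}"}},
        },
    )
```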
Tiered user features
There are two user tiers: free tier and premium tier.
- Free tier: files are archived to S3 Glacier after a certain period of time (one possible archival step is sketched after this list), and input files have a size limit.
- Premium tier: files are always available, with no size limit on input files.
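One possible way to implement the archival step (not necessarily the exact mechanism used in the project) is to copy the object onto itself with the GLACIER storage class; bucket and key here are placeholders.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

def archive_to_glacier(bucket: str, key: str) -> None:
    # Rewrites the object in place with the GLACIER storage class; subsequent
    # reads then require an explicit restore request.
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        StorageClass="GLACIER",
    )
```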
Upgrading
A free-tier user can upgrade to the premium tier by subscribing to a Stripe plan. Upon upgrading, all archived files are automatically thawed and restored.
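A hedged sketch of the thaw request; in this sketch, a Lambda function would later copy the restored object back to the STANDARD storage class once the restore completes. Bucket, key, and retrieval tier are illustrative.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

def request_thaw(bucket: str, key: str) -> None:
    # Ask S3 Glacier to make a temporary copy of the archived object available.
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={
            "Days": 1,  # how long the temporary copy stays available
            "GlacierJobParameters": {"Tier": "Expedited"},  # fall back to "Standard" if needed
        },
    )
```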
Downgrading
Users can, of course, unsubscribe and downgrade as well.
Engineering
Decoupling
The servers follow the high-cohesion, low-coupling principle. Each server is completely decoupled from the others, and their communication is asynchronous, relying on a scalable publish-subscribe pattern.
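A minimal sketch of the publish side of this pattern, assuming an SNS topic fanned out to per-server SQS queues; the topic and queue ARNs are placeholders.

```python
import json
import boto3

sns = boto3.client("sns", region_name="us-east-1")

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:gas-job-requests"  # hypothetical ARN

# One-time setup (often done in the console): subscribe each server's queue to the topic.
# sns.subscribe(TopicArn=TOPIC_ARN, Protocol="sqs",
#               Endpoint="arn:aws:sqs:us-east-1:123456789012:gas-annotator-jobs")

def publish_job_request(job: dict) -> None:
    # The frontend never talks to the annotator directly; it only publishes a message.
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(job))
```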
Error handling
Proper error handling played a critical role in grading the project, so I implemented extensive error handling throughout to make sure the system is robust.
Challenges & Learning Outcomes
Challenges
The most challenging part of this project was that, despite its large scale and cloud-native nature, very little specific instruction was given about how to implement it. Each week we were assigned some general goals, and we needed to read the documentation and do the research ourselves to figure out the implementation details. This is also why I like this project a lot: it is quite realistic, despite being a school project.
Learning Outcomes
Through this project, I
- gained practical experience with AWS,
- gained practical experience with Python and Flask,
- gained practical experience with cloud computing,
- learned how to read documentation and look for the relevant information,
- accumulated project design and development experience.
Extensions
Here I list some potential extensions to this project.
| Extension | Explanation |
| --- | --- |
| Distributed Servers | In this project, there is only one frontend server, one annotator server, and one util server. To make the project more scalable, we can define auto-scaling rules and put the servers in Auto Scaling groups (ASGs). We can set up an Elastic Load Balancer (ELB) as well. |
| Convert to Webhooks | Instead of (long) polling the SQS queues, create webhooks for the annotator server and util server and subscribe them to the SNS topics. |
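As a rough sketch of the webhook idea, assuming the annotator exposes an HTTPS endpoint subscribed to the SNS topic, the handler would need to confirm the subscription and then process notifications; the route and payload handling below are illustrative only.

```python
import json
import urllib.request

from flask import Flask, request

app = Flask(__name__)

@app.route("/annotations", methods=["POST"])
def handle_sns_message():
    # SNS posts JSON with Content-Type text/plain, so parse the raw body.
    payload = json.loads(request.data)
    if payload.get("Type") == "SubscriptionConfirmation":
        # SNS asks the endpoint to confirm the subscription by visiting SubscribeURL.
        urllib.request.urlopen(payload["SubscribeURL"])
    elif payload.get("Type") == "Notification":
        job = json.loads(payload["Message"])
        # ... start the annotation job here ...
    return "", 200
```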