How we develop software at Eighty20
Dealing with Data means dealing with Software. Sometimes it makes sense to use off-the-shelf software solutions that others have spent years developing, and sometimes it makes sense to build your own. At Eighty20, we like to live by our pragmatic name, and there are many instances when we simply do the former. However, there are plenty of cases where building our own has definitely been the right decision.
Open Source
When we do build our own, we’ve generally opted for Open Source Software (OSS): Python as our preferred programming language, and PostgreSQL and MySQL as our preferred transactional database technologies. As the exception that proves the rule, we use the proprietary Vertica database for our analytics workloads. There are a few reasons for this skew towards OSS:
1. We service multiple clients across multiple industries, each with their own preferred technology stacks.
    - They may operate a Microsoft ecosystem, or Oracle, SAP, Salesforce, or some combination of these.
    - Given the complexity and deep integration of these systems and technologies, it would be impractical for us to become experts in all of them.
    - Many of our clients have vendors who sell them these components and then manage them on their behalf for just this reason.
    - One of the guiding principles behind OSS is to do one thing, do it well, and make sure you can interface with the components that do other things.
    - By not buying completely into any one client’s proprietary tech stack, we keep the ability to interface with all of them.
    - That said, we still aim to understand those tech stacks well enough to interface with them effectively.
2. We don’t want to spend time solving problems that have been solved before.
    - Most technology problems fall into this category, and some helpful person in the open source community, who has encountered a similar problem before, has probably published their solution.
    - So, if we need to query a type of database we’ve not encountered before, somebody has probably built a Python library that does this and pushed it to GitHub (see the sketch after this list).
    - Or, if we need to run our own instance of that obscure database, we can pull an image off Docker Hub and run it, without having to fight to get all its dependencies installed before it will even start.
    - Of course, there are risks in just running random code found on the internet, so we follow established best practices for vetting the libraries we use, with tools such as Snyk.
    - In the worst case, we can fork an open source library and modify the code ourselves to make sure it’s safe to use.
3. There’s one other small matter: OSS is generally free to use.
    - Some things are definitely worth paying for, and it’s easy to fall into the trap of spending hours instead of dollars getting something to work. However, we’ve developed good patterns that give us confidence we can build solutions efficiently on top of our base tech stack.
    - For once-off projects and clients, or even for producing a proof of concept for an existing client, this low cost of getting started has clear advantages.
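To make that library-reuse point concrete, here’s a minimal sketch of what it usually looks like in practice, using the well-known psycopg2 driver against a PostgreSQL instance (perhaps one started from a Docker Hub image). The connection details, table and query are invented for illustration:

```python
import psycopg2

# Hypothetical connection details for a local PostgreSQL instance,
# e.g. one started from a Docker Hub image.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="scratch",
    user="analyst",
    password="changeme",
)

with conn.cursor() as cur:
    # A throwaway query against a hypothetical table.
    cur.execute("SELECT id, name FROM customers LIMIT %s", (10,))
    for row in cur.fetchall():
        print(row)

conn.close()
```

A few lines of pip-installable code, and a database technology we may never have touched before is queryable.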
Our Swiss Army Knife
The saying goes that Python is the second-best programming language for everything. We’ve certainly found this to be the case.
For any specific problem or domain, there is usually a language better suited to that arena. Good examples are R for statistical analysis (which we still use when it makes sense to), Go for distributed processing, and C or C++ for raw computational speed. However, those other languages either don’t work all that well outside of their specific domain, or they require a significant amount of time, training and experience to use properly without introducing memory leaks or other subtle bugs.
The reason that Python can be used for basically anything, is not that it was designed to be an all-purpose generalist language but rather that it was designed to be easy to write and, perhaps more importantly, easy to read. A famous quote by Robert C. Martin (Uncle Bob) from his book “Clean Code: A Handbook of Agile Software Craftsmanship” goes: “Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. [Therefore,] making it easy to read makes it easier to write.”
Python reads like pseudo-code, an almost plain English description of what the code is attempting to do. Because of this, many people across varied disciplines – without much coding experience but with incredible domain knowledge – have gravitated towards Python. Together with more experienced developers, they have created libraries and frameworks that make working with Python in these domains easy and possibly even joyful.
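To illustrate what “reads like pseudo-code” means in practice, here is a contrived snippet (the names and data are invented for this post) whose comment is almost a word-for-word English translation of the code:

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    total_spend: float

customers = [
    Customer("Thandi", 1200.0),
    Customer("Pieter", 300.0),
    Customer("Amina", 870.0),
]
threshold = 500.0

# "Keep the customers who spent more than the threshold,
#  sorted by total spend, highest first."
big_spenders = sorted(
    (c for c in customers if c.total_spend > threshold),
    key=lambda c: c.total_spend,
    reverse=True,
)
print([c.name for c in big_spenders])  # ['Thandi', 'Amina']
```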
Good examples of these are:
- Django and Flask for Web Development
- Pandas and scikit-learn for Data Analysis and Machine Learning (a small example follows this list)
- Jupyter notebooks for exploratory processing and analysis, weaving code, commentary and results into a single document
- Apache Airflow for ETL Pipelines
- Command Line Interface (CLI) tools and shell scripting (both the docker-compose and aws-cli tools are written in Python).
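As a taste of the Pandas example above, here’s a toy aggregation (the data is invented purely for illustration):

```python
import pandas as pd

# Invented sales data, purely for illustration.
sales = pd.DataFrame({
    "region": ["Gauteng", "Gauteng", "Western Cape", "KwaZulu-Natal"],
    "amount": [100.0, 250.0, 175.0, 90.0],
})

# Total sales per region, largest first, in one readable line of pandas.
totals = sales.groupby("region")["amount"].sum().sort_values(ascending=False)
print(totals)
```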
Although the way the language gets used in each of these applications can be vastly different, experience with it in one domain grants transferable skills that make it much easier to pick up in another.
This ability to apply the same base skill set to different problems and domains is a trait that our people share with our language choices. Although we do have subject matter experts in our teams, who have tremendous business, statistical and technology knowledge, our modus operandi is to recruit smart young people who are really good at problem solving and critical thinking, then train them up on the specific domain knowledge they might need. That training goes all the more easily when you’re using a language like Python that’s easy for them to get into, start reading what already exists, and start writing code and contributing to it.
Our standard stack for building a typical web application looks like this (a minimal sketch of how the backend pieces fit together follows the list):
- Python Flask web server for the backend API
- SQLAlchemy to communicate with the database
- Alembic to manage database migrations
- Swagger for API documentation
- Marshmallow for (de)serialisation of requests and responses
- All the above have companion libraries that wrap them nicely for use in Flask
- React.js for the frontend user interface
- PostgreSQL database for transactional workloads and Vertica database for analytics workloads
- Apache Airflow for building ETL pipelines and orchestrating automated processes
- PowerBI for dashboarding and ad hoc graphical data analytics
- Docker and docker-compose for building reproducible container images for deployment across environments.
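As promised above, here’s a minimal sketch of how some of the backend pieces fit together. This is an illustration rather than code from a real project: the endpoint, schema and record are hypothetical, and the database lookup is stubbed out where SQLAlchemy would normally sit:

```python
from flask import Flask, jsonify
from marshmallow import Schema, fields

app = Flask(__name__)

# A Marshmallow schema handles serialisation of responses.
class CustomerSchema(Schema):
    id = fields.Int()
    name = fields.Str()

@app.route("/customers/<int:customer_id>")
def get_customer(customer_id):
    # In a real app this record would come from PostgreSQL via SQLAlchemy;
    # here it's stubbed out with a dict for illustration.
    customer = {"id": customer_id, "name": "Example Customer"}
    return jsonify(CustomerSchema().dump(customer))

if __name__ == "__main__":
    app.run(debug=True)
```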
When we get to production, we generally use AWS as our default cloud environment (though we have decent experience on Azure and GCP as well).
Where possible, we use AWS cloud-native services to reduce our maintenance and management workload:
- Managed RDS Databases
- Lambda and API Gateway for managed scaling of high demand APIs
- AWS Systems Manager for patching and maintenance tasks on EC2 instances
- CloudWatch for logging, monitoring and alerts
- CodeBuild for automated builds of production images
- All configured as Infrastructure as Code (IaC) using the AWS Cloud Development Kit (CDK); a minimal sketch of this follows the list.
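To give a flavour of what IaC with the CDK looks like, here is a minimal sketch in Python (using CDK v2) of the Lambda-plus-API-Gateway pattern mentioned above. The stack and resource names are invented, and a real stack would carry far more configuration:

```python
import aws_cdk as cdk
from aws_cdk import aws_lambda as _lambda, aws_apigateway as apigw
from constructs import Construct

class ApiStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # A Lambda function whose handler code lives in a local "lambda/" folder.
        handler = _lambda.Function(
            self, "ApiHandler",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="app.handler",
            code=_lambda.Code.from_asset("lambda"),
        )

        # An API Gateway REST API that proxies all requests to the Lambda.
        apigw.LambdaRestApi(self, "Api", handler=handler)

app = cdk.App()
ApiStack(app, "DemoApiStack")
app.synth()
```

Because the infrastructure is plain Python, it gets the same code review, testing and version control treatment as the rest of our codebase.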
For our local dev environments, we generally still live on Windows laptops. This is mainly because, as a consulting business, we need to make regular use of Microsoft Office products, especially when exchanging documents with clients. Until recently this was at odds with our proclivity for Linux-based tooling, but Microsoft has made some really good strides in enabling Linux-friendly development on Windows, using the Windows Subsystem for Linux (WSL) as a launching point. Our current setup looks like this:
- WSL running the latest LTS version of Ubuntu (20.04 at time of writing)
- Pyenv for managing multiple Python versions
- Pipenv for managing multiple Python virtual environments
- PostgreSQL database running on WSL for local synthetic development data
- Pytest for writing and running unit tests (a small example follows this list)
- Flake8 for linting
- Black for autoformatting
- Jupytext for syncing Jupyter notebooks as raw Python files for better diffing of changes
- Windows Terminal for managing multiple shell sessions
- Visual Studio Code with the Remote Development extension for running code inside WSL
- Docker Desktop for running docker containers within WSL
- DBeaver as a desktop client for querying databases
- Postman as a desktop client for querying HTTP APIs
- Bitbucket as our remote git repo and using Bitbucket Pipelines for running automated testing and Continuous Integration (CI)
- JupyterHub and RStudio running on internal servers and accessible via web browser, for prototyping and ad hoc analytics
- ClickUp for managing our tasks and sprints.
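As an example of the Pytest item above, here’s what a typical small test file looks like. The function under test is invented for illustration; running pytest from the project root discovers and runs the tests automatically:

```python
# test_pricing.py: a small, self-contained pytest example.
import pytest

def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by the given percentage (invented helper)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)

def test_apply_discount():
    assert apply_discount(200.0, 25) == 150.0

def test_apply_discount_rejects_bad_percent():
    with pytest.raises(ValueError):
        apply_discount(200.0, 120)
```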
This local dev environment is a continuous work in progress. The fact that “try running wsl.exe --shutdown” is a common troubleshooting response on our team chat is testament to that. But we’re definitely a lot better off than we were, say, three years ago. We can onboard new team members relatively easily, and our more experienced devs are able to work productively both independently and collaboratively.
We’ve recently started playing around with dev environments inside Docker containers with integrated VS Code support. However, we’ve found that tooling to be less mature still, especially when it comes to managing user and file permissions. At the pace these technologies are progressing, though, it wouldn’t be surprising if we’re using that, or perhaps some other solution entirely, when we update this post three years from now.