Beyond Good and Error: Red Team

Serge Gershkovich
4 min read · Aug 19, 2020
Photo by Pixabay on Pexels

A Portuguese merchant fleet of forty-two ships sat securely moored in Brazil’s Bay of All Saints, laden with treasure pilfered from the New World and awaiting the arrival of two men-of-war to escort them back to Lisbon. The year was 1719 — the height of the period known as the Golden Age of Piracy — and just outside sighting distance, the notorious Bartholomew “Black Bart” Roberts was on the hunt for his next victim.

From port, even a pirate flag sighted on the horizon would not have caused much concern. The harbor not only gave the fleet strength in numbers, it also offered the security of coastal batteries and garrisoned ground troops. Some of the larger ships would even have been capable of engaging Roberts on their own, bringing to bear upwards of forty cannon against his thirty-two. The harbor was secure, and Black Bart was hopelessly outgunned.

So confident were the Portuguese in their ability to deal with any adversary foolhardy enough to attempt an attack that when a lone ship flying proper Portuguese colors drifted into the bay, they simply took no notice. Black Bart sailed right into the middle of the fleet and quickly identified the richest vessel among them, the Santa Maria. Using the element of surprise, he captured the Santa Maria and escaped before a coordinated defense could be mounted.

The capture earned a place in history as one of the top five most lucrative pirate prizes of all time, as well as one of the most audacious. The Portuguese learned a lesson that day: even the strongest brute-force defense won’t save you from a Trojan horse. They didn’t need men-of-war; they needed a “red team.”

The term red team comes from the realm of cybersecurity. Red team(ing) is the practice of rigorously challenging plans, policies, systems, and assumptions by adopting an adversarial approach. A red team looks for weaknesses and vulnerabilities in what is already presumed to be a strong design. In short, it’s taking your own code and trying to find ways to break it. Despite the name, you don’t need an entire team of hackers to implement this technique. You can be a red team army of one.

Whether working on a fix or a new feature, a developer approaches the change with a concrete goal in mind. Assuming that due diligence in testing is carried out once the change is made, we can rule out compilation errors and count on the output being correct for the purposes of this example. Nothing left to worry about, right?

Unfortunately, it is human nature to lose sight of our periphery when focused narrowly on a specific goal (don’t believe me?). Even during the course of our testing, we are focused almost exclusively on our task and can easily overlook its impact on related parts of the system. As the Portuguese learned: heavily guarded doesn’t mean pirate-proof.

I like to begin by visualizing the day of deployment. I then take the “adversarial approach”: I run through every possible type of failure I can imagine and I ask if it’s plausible that my code is to blame. As an example, let’s say I was making a change to the logic used to calculate the product_type field for product “XYZ,” and:

  • A syntax error broke the build: Did I test the code exactly as it appears in the version control repository (rather than assuming the repository matches what I tested)?
  • Product “ABC” has incorrect product_type: Since my change focused on product “XYZ,” so did my testing. Did I test the products outside the scope of my change to make sure they were not affected?
  • Downstream apps/processes have failed: Have I looked bottom-up at where the product_type field is being referenced and ensured that the expected values are consistent with my change?
  • The system has ground to a halt: Did I run my test on a data set of comparable volume to the one used in production?
  • Manual step not performed: Table alters or parameter changes tend to happen early in the development process and are often forgotten by the end. Am I forgetting something?
  • Company-specific problem X as a result of not doing Y: Deployments vary greatly depending on the protocols and tools that have been put in place to help store, deploy, and run the code you’re working on. There are many organization-specific idiosyncrasies to be aware of. Did I cover all the bases?
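The second check above — making sure products outside the scope of your change were not affected — can be automated as a simple before/after diff. Here is a minimal sketch in Python; the product data, the `expected_changes` set, and the function name are all hypothetical and stand in for whatever snapshot your own system can produce:

```python
def diff_product_types(before, after, expected_changes):
    """Compare two {product_id: product_type} snapshots.

    Returns the products whose product_type changed but were NOT in
    expected_changes -- i.e., unintended side effects of the change.
    """
    unexpected = []
    for product_id, old_type in before.items():
        new_type = after.get(product_id)
        if new_type != old_type and product_id not in expected_changes:
            unexpected.append((product_id, old_type, new_type))
    return unexpected


# Hypothetical snapshots taken before and after the change to "XYZ".
before = {"XYZ": "legacy", "ABC": "standard", "DEF": "premium"}
after = {"XYZ": "digital", "ABC": "digital", "DEF": "premium"}

# "XYZ" was supposed to change; "ABC" was not.
print(diff_product_types(before, after, expected_changes={"XYZ"}))
# -> [('ABC', 'standard', 'digital')]
```

An empty result means the change stayed within its intended scope; anything else is exactly the kind of collateral damage a red-team pass is meant to catch before deployment day.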

If the answer to any of the above is “yes,” then it’s time to go back and re-test. The validations I included in the example above are pretty rudimentary, but they are far from obvious after days or weeks spent in intense concentration on a single change. Red teaming is not a checklist; it’s a tool to help you shift your mindset. It helps you unfocus from the specific and refocus on the big picture.

I have lost count of how many times this method has proven effective in my work and I encourage everyone to try it. I can’t guarantee that you’ll spot an issue every time, but I promise you’ll sleep better on the night before “go-live.”

If the same technique can do everything from keeping your production system stable, to preventing cyber attacks, to making sure that pirates don’t make off with your treasure galleon, it can undoubtedly be used to avoid disaster in many areas of your life. If you have a good example of a time when this technique came in handy in your life (or when you wish it had), I’d love to hear about it in the comments. Otherwise, stay tuned for the next article, where we’ll learn to kick ass, write better code, and chew bubble gum (even when we run out of gum).

This article is part of a series.

Serge Gershkovich

I am the author of "Data Modeling with Snowflake" and Product Success Lead at SqlDBM. I write about data modeling and cloud cost optimization.