Firestore Automated Backups


Anything that can go wrong will go wrong

This is part 4 of a DevOps 101 series with Firebase Cloud Functions.

You can check the working repo here.

  1. Firebase Cloud Functions CI/CD with Cloud Build
  2. Firebase Cloud Functions and Slack notifications
  3. Firebase Cloud Functions Logging events
  4. Firestore Backups
  5. Firebase Disaster Recovery

Ok friends, it's me again… let's talk about what happens when bad stuff happens (?)… have you ever had that feeling in your stomach? You did something wrong 💨… your mom arrived, you didn't clean your room, and now she is mad af 😤… an intern just truncated the database 😳… Karen just went crazy and made a massive update on all your docs 🤳🏻… your dog ate your database 🐶 … you were just hacked and your data is not there anymore 🇻🇪… you name it. Shit happens and you have to be prepared to mitigate it, and for us, right now, mitigating means making backups, period.

I will keep it short and go straight to the point. Let's automate our backup process so it runs once a day. I did some research a while ago about the best way to do this and found the following:

  • Official samples: this is good, but I needed something easier 🏚
  • Some people talked about querying the collections and saving them to a JSON file. I have collections with 100k+ documents. This would be expensive af 💰
  • Run a cron job on my computer to export the data 🤯
  • Use the Realtime Database because it has a built-in feature for this (?)

A lot of people solve this problem in many different ways. I needed something easier, so I made a combination of almost all of them plus the beauty of cloud functions and the Firebase CLI.

Let's do this 

We are going to export all our collections and save them into a storage bucket. You will need:

  • Firebase Blaze plan
  • Set one IAM permission
  • Create a storage bucket (to save your backups)
  • Create a small cloud function
  • 10 minutes of your time (yeah baby, that's why I came here)
  • Follow me on twitter and subscribe to my newsletter (this part is important #not)

Set IAM permission

Go to the IAM section of your project and find the App Engine service account. You will need to add the role Cloud Datastore Import Export Admin.

  iam-foles-one
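If the console isn't your thing, the same role can be granted with gcloud. This is just a sketch: PROJECT_ID is a placeholder, and the member below assumes you are using the default App Engine service account.

# Grant the export/import role to the default App Engine service account
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_ID@appspot.gserviceaccount.com" \
  --role="roles/datastore.importExportAdmin"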

Create a bucket

Go to the Storage section and create a bucket, mine is called devops101-backup-backup 🧳

  iam-roles-two
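If you prefer the terminal over the console, gsutil can create the bucket too. The name and region below are just my examples:

# Create the backup bucket (use your own name and region)
gsutil mb -l us-central1 gs://devops101-backup-backup/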

Are old backups useless? 🗑

This will depend on your use case, so do your homework. In my case, 3 days is good enough: data changes a lot, so basically 72 hours means useless data. Maybe your case is different; this is up to you. Play it safe, storage is cheap… like very cheap… In my case, I want to delete everything after 3 days, so I keep only up-to-date data. On your bucket list, click the lifecycle options.

  storage

There you can set a policy like this:

  storage-two
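If you'd rather keep that rule in version control instead of clicking through the console, the same 3-day policy can be written as a lifecycle config and applied with gsutil. This is a sketch; lifecycle.json and the bucket name are just my examples.

# lifecycle.json: delete every object older than 3 days
cat > lifecycle.json <<'EOF'
{
  "rule": [
    { "action": { "type": "Delete" }, "condition": { "age": 3 } }
  ]
}
EOF

# Apply it to the backup bucket
gsutil lifecycle set lifecycle.json gs://devops101-backup-backup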

The function

Ok, so far we have done the standard chores; this is the part where the magic happens. What do we need?

  • A cron function
  • A couple of dependencies
  • Deploy the project

Cron functions?

Just a fancy way to name a cloud function that runs on a given schedule. If you come from the AWS world, this would be the equivalent of using CloudWatch and a Lambda, but cooler.

Dependencies

npm install --save dateformat google-auth-library
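The code below also imports firebase-functions and @slack/client (the Slack notification bits come from the earlier posts in this series); if your project doesn't already have them:

npm install --save firebase-functions @slack/client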

TypeScript

import * as functions from 'firebase-functions'
import * as dateformat from 'dateformat'
import { auth } from 'google-auth-library'
import { IncomingWebhook } from '@slack/client'
import { backupSlackNotification, logErrors } from '../helpers'

const environment = functions.config()
const webhook = new IncomingWebhook(environment.slack.deploymentWebhook)

export const generateBackup = async () => {
  const client = await auth.getClient({
    scopes: [
      'https://www.googleapis.com/auth/datastore',
      'https://www.googleapis.com/auth/cloud-platform' // We need these scopes
    ]
  })
  const timestamp = dateformat(Date.now(), 'yyyy-mm-dd') // a nice way to name your folder
  const path = `${timestamp}`
  const BUCKET_NAME = `YOUR_BUCKET_NAME_HERE`

  const projectId = await auth.getProjectId()
  const url = `https://firestore.googleapis.com/v1beta1/projects/${projectId}/databases/(default):exportDocuments`
  const backup_route = `gs://${BUCKET_NAME}/${path}`
  return client.request({
    url,
    method: 'POST',
    data: {
        outputUriPrefix: backup_route,
        // collectionIds: [] // if you want to specify which collections to export; leaving it out means all
    }
  }).then(async (res) => {
    console.log(`Backup saved to ${backup_route}`)
    // @ts-ignore
    await webhook.send(backupSlackNotification(`completed`)) // notify slack we are done
  })
  .catch(async (e) => {
    await logErrors(e, { message: e.message })
    // @ts-ignore
    await webhook.send(backupSlackNotification(`error`)) // notify slack something bad happened
    return Promise.reject({ message: e.message })
  })
}

google-auth-library is a really cool project. It allows you to identify your current environment and set permissions based on it; in this case, it will look for the IAM role we set before. Naming the folder with dates is a good practice because it allows you to quickly identify the latest backup later. Lastly, and optionally, you can notify your Slack channel each time you run this process and report to your error logs if something goes wrong. More about this in my previous post.
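By the way, backupSlackNotification and logErrors live in the helpers module from those previous posts. If you don't have them around, here is a minimal, hypothetical sketch of what they could look like (the real ones are in the repo; the message shape is up to you):

// helpers/index.ts, hypothetical minimal versions, not the exact helpers from the repo
export const backupSlackNotification = (status: string) => ({
  text: `Firestore backup ${status} 🗄`, // IncomingWebhook.send accepts an object with a `text` field
})

export const logErrors = async (error: Error, context: { message: string }) => {
  // The real helper reports to the error log from the logging post; console.error keeps it simple here
  console.error('Backup error:', context.message, error)
}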

Schedule the cloud function

Straightforward: we need to export our function in the index.ts file and pass in a cron expression. For this part, you need to do your homework again:

  • Time is based on the server time (aren't we serverless?), but see the time zone sketch after the snippet below
  • For me, it will run every day at 00:00
  • If you need help configuring your cron expression, this site is my jam
  • There are a few English expressions as well

import * as functions from 'firebase-functions'
import { generateBackup } from './backups' // adjust the path to wherever generateBackup lives in your project

export const automatedBackups = functions.pubsub
    .schedule('0 0 * * *')
    .onRun(generateBackup)
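About that server-time bullet: scheduled functions also accept an explicit time zone, and the English-style schedule expressions work here too. A variant of the export above could look like this (use it instead of the cron version, not in addition to it; the time zone string is just an example):

export const automatedBackups = functions.pubsub
    .schedule('every day 00:00')      // English schedule expressions work too
    .timeZone('America/New_York')     // example time zone, pick your own
    .onRun(generateBackup)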

Test

After deploying your function, you can trigger it using Cloud Scheduler like this:

  cron

Voila! If everything went well, you can go and check your bucket. Notice the backups are not in a readable format, so you will basically have to trust they are ok :).
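If you'd rather skip the UI, the deployed schedule also shows up as a Cloud Scheduler job that you can run on demand with gcloud. The job name below follows the pattern Firebase usually generates, so list the jobs first and double-check yours:

# Find the job Firebase created for the scheduled function and run it once
gcloud scheduler jobs list
gcloud scheduler jobs run firebase-schedule-automatedBackups-us-central1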

Conclusion

That's basically it. This is the easiest way I've found to automate my backups until the feature is supported with one click. In my next post, I will teach you how to actually use that backup and how to get back in the game as soon as possible after something bad happens. If you liked this content and want to be notified when the next part is released, please consider subscribing to my newsletter or following me on twitter.

Peace out! Check the repo here.

Enjoyed this post? Receive the next one in your inbox!

I hand pick all the best resources about Firebase and GCP around the web.


Not bullshit, not spam, just good content, promised 😘.

