Atomic Operations with Firestore

To write or not to write... it's not a question anymore ;-)

When talking about databases, atomic means one single thing: all or nothing. Sometimes we need to execute a few operations that create relations between each other, and we need a way to guarantee that the whole process either succeeded or failed. One of the best examples is a payment processor: there are no gray areas in that process, it happened or it did not. This is common practice in SQL databases, and in this case the Firebase team has got your back too.

Introducing Batched Writes

Batched writes are a set of write operations on one or more documents, committed as one single atomic operation. Let's see an example:

Imagine we are building an onboarding process for venue owners: we need information about their venues, their services, and whether they have any promotions.

We decided to split all this data into 4 collections, `onboard`, `venues`, `services`, and `promotions`, using the following interfaces:

Onboard {
   _id: string
   user_id: string
   status: oneOf<'draft', 'ready', 'fail', 'completed'>
   venue: Venue
   services: Array<Services>
   promotions: Array<Promotions>
   ... any other information
}
Venue {
   _id: string
   user_id: string
   onboard_id: string
   ... any other information
}
Services {
   _id: string
   user_id: string
   onboard_id: string
   venue_id: string
   ... any other information
}
Promotions {
   _id: string
   user_id: string
   onboard_id: string
   venue_id: string
}

This means that, at any given moment, we can query the items that belong to a venue using the following queries:

const db = firebase.firestore()
const services = db.collection(`services`).where(`venue_id`, `==`, venueID)
const promotions = db.collection(`promotions`).where(`venue_id`, `==`, venueID)
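
Those lines only build the queries; to actually read the documents you would call `get()` on them, something like this minimal sketch:

services.get().then(snapshot => {
  snapshot.forEach(doc => console.log(doc.id, doc.data()))
})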

Ok, assuming we are storing all the onboarding information in the `onboard` collection, once the status changes to `ready` we will trigger a cloud function which performs the onboarding, generates the documents, and places them into their respective collections.

Note: we don't know how many services or promotions a venue can have; for the sake of the example, let's assume there are more than 100. Our goal is to make all these copies at once, so if the operation fails we won't have any orphan data or duplicates when we need to execute the same operation again.

[diagram: onboarding flow]

const functions = require('firebase-functions')
const admin = require('firebase-admin')
const uuidv1 = require('uuid/v1')
const _ = require('lodash')

admin.initializeApp()
const db = admin.firestore()

exports.onboard = functions.firestore
  .document(`onboard/{doc}`)
  .onUpdate((change, context) => {
    const data = change.after.data()
    if (data.status === 'draft' || data.status === 'completed') {
      // ! Not ready to process yet or already processed
      return Promise.resolve()
    } else if (data.status === 'ready') {
      return execOnboard(data)
    } else if (data.status === 'fail') {
      // ! We need an exit clause in case the onboard keeps failing
      if (data.retry >= 3) {
        return Promise.resolve()
      } else {
        return execOnboard(data)
      }
    }
  })

const execOnboard = data => {
  const batch = db.batch()
  const venueId = uuidv1()
  const services = data.services
  const promotions = data.promotions

  // ! Create all services docs
  const servicesRef = db.collection(`services`)
  for (let service of services) {
    const _id = uuidv1()
    const serviceRef = servicesRef.doc(`${_id}`) // ! set the service unique id
    batch.set(serviceRef, Object.assign({}, service, {
      _id, // ! service unique id
      venue_id: venueId // ! create the relation with the venue
    }))
  }

  // ! Create all promotions docs
  const promotionsRef = db.collection(`promotions`)
  for (let promotion of promotions) {
    const _id = uuidv1()
    const promotionRef = promotionsRef.doc(`${_id}`) // ! set the promotion unique id
    batch.set(promotionRef, Object.assign({}, promotion, {
      _id, // ! promotion unique id
      venue_id: venueId // ! create the relation with the venue
    }))
  }

  // ! Create Venue
  const venuesRef = db.collection(`venues`)
  const venueRef = venuesRef.doc(`${venueId}`) // ! set the venue unique id
  const venue = _.omit(data, ['services', 'promotions'])
  batch.set(venueRef, Object.assign({}, venue, {
    _id: venueId
  }))

  // ! update the onboard status
  const onboardsRef = db.collection(`onboard`)
  const onboardRef = onboardsRef.doc(`${data._id}`)
  batch.update(onboardRef, {
    status: `completed`
  })

  // ! commit everything at once; on failure, flag the onboard doc for a retry
  return batch
    .commit()
    .catch(e => {
      const retry = _.isNil(data.retry) ? 0 : data.retry
      return db.collection(`onboard`)
        .doc(data._id)
        .update({
          status: `fail`,
          retry: retry + 1
        })
    })
}

This is a lot to digest, so let's go part by part:

The cloud function needs to handle the different statuses and ensure it does not retry forever in case the onboarding is not ready to process or keeps failing.

We call `db.batch()` to group all the operations we need to execute.

We assume the objects are going to be saved as they are; we create an id for each element and make sure we attach the venue id to each of them.

const _id = uuidv1()
const serviceRef = servicesRef.doc(`${_id}`)
batch.set(serviceRef, Object.assign({}, service, { _id, venue_id: venueId }))

Assuming there are more than 100 services and promotions, a batch operation is really useful here. Just imagine we were not using a batch but writing the docs one by one: if the operation failed in the middle of the process, there would be a lot of incomplete data in our database, and we would probably need to repeat the operation, so we could easily end up with a lot of duplication.

Also, notice we are writing to several collections in the same batch operation.

When we are done, we just commit the changes with `batch.commit()`.

In every case we return a promise, which is mandatory for cloud functions.

And that's how Batched Writes shine. A few notes:

  • Batched Writes are only for writes; if you need to read data, you will have to use Transactions.
  • This operation can be executed on the client (but it will perform better on the backend).
  • There is a maximum of 500 operations per batch; ideally, keep it around 300 (see the sketch right after this list for splitting bigger jobs).
  • Once again, it's all or nothing. Everything is copied or everything fails.
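
Since a single batch tops out at 500 operations, a venue with a really long list of services or promotions might not fit in one batch. Here is a minimal sketch of how you could split the writes into several batches, reusing the `db` and lodash `_` from the cloud function above (the `writeInChunks` helper is hypothetical, and keep in mind the all-or-nothing guarantee only holds within each individual batch):

// ! hypothetical helper: split a long list of docs into batches of at most 300 writes
// ! note: atomicity only holds within each individual batch
const writeInChunks = (docs, collectionName) => {
  const chunks = _.chunk(docs, 300)
  const commits = chunks.map(chunkOfDocs => {
    const batch = db.batch()
    for (let doc of chunkOfDocs) {
      const ref = db.collection(collectionName).doc(`${doc._id}`)
      batch.set(ref, doc)
    }
    return batch.commit()
  })
  return Promise.all(commits)
}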

When to use it?

  • When you need to create different docs linked to each other.
  • When you need to do several create/update/delete operations as a single operation.
  • Data migration (see the sketch below).
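
As a small illustration of the data migration case, here is a minimal sketch that copies an old field into a new one across a whole collection (the `title` and `name` fields are just placeholders for the example; with more than 500 documents you would need to split the work into chunks as shown above):

// ! example migration: copy the old `title` field into a new `name` field for every venue
const migrateVenues = async () => {
  const snapshot = await db.collection(`venues`).get()
  const batch = db.batch()
  snapshot.forEach(doc => {
    batch.update(doc.ref, {
      name: doc.data().title
    })
  })
  // ! either every document gets the new field or none of them do
  return batch.commit()
}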

Transactions

They're basically the same BUT you can also read documents (and a little more).

Transactions shine when you need to do operations while locking a document. In other words, if you execute them from the backend, you can make sure that the values you read at the beginning of the operation are still pristine, that they haven't changed. From the client this is not exactly the same: after the operation is complete, it will sanity check the initial doc and, if it changed, it will retry the operation with the new data.

It might sound abstract, so it's better with one example 🤷🏻‍♂️.

Imagine a user has a banking account with 10 coins and wants to buy two items which cost 10 coins each. The user could try to hack our system by performing the same purchase operation twice at the same time: if our buying operation is not fast enough, the user can successfully buy both items and we have a problem 🧐.

[diagram: two concurrent purchase attempts]

Before getting into the code, there are some rules for the transactions:

  • Reads must precede writes
  • They might run more than once (if a concurrent edit affects a document that the transaction reads)

They can fail if:

  • Read operations come after write operations
  • A document read by the transaction was modified outside of it
  • The request exceeded the maximum size of 10 MiB
  • The client is offline

Again, it's important to note that they will never apply writes partially, they are atomic, you should be familiar with this concept now ;-).

const functions = require('firebase-functions')
const admin = require('firebase-admin')
const uuidv1 = require('uuid/v1')

admin.initializeApp()
const db = admin.firestore()

exports.execPurchase = functions.https.onCall(async (data, context) => {
  const userId = context.auth && context.auth.uid
  if (!userId) {
    throw new Error(`Missing Authentication`)
  }

  const { itemId, quantity } = data

  const purchaseID = uuidv1()
  const userRef = db.collection(`users`).doc(userId)
  const itemRef = db.collection(`items`).doc(itemId)
  const purchaseRef = db.collection(`purchases`).doc(purchaseID)

  // ! run the whole purchase inside a transaction
  return db.runTransaction(async transaction => {
    // ! reads must come first: lock the user and the item data
    const userDoc = await transaction.get(userRef)
    const itemDoc = await transaction.get(itemRef)

    // ! check if the user can buy
    const { coins } = userDoc.data()
    const { price, name, quantity: availableQuantity } = itemDoc.data()
    const total = price * quantity
    const canBuy = total <= coins
    if (!canBuy) throw new Error(`Not enough coins`)

    // ! check if there are enough items in stock
    const enoughQuantity = availableQuantity >= quantity
    if (!enoughQuantity) throw new Error(`Not enough ${name}`)

    // ! update collections with the new values
    transaction.update(userRef, {
      coins: coins - total
    })

    transaction.update(itemRef, {
      quantity: availableQuantity - quantity
    })

    // ! register the purchase
    transaction.set(purchaseRef, {
      user_id: userId,
      item_id: itemId,
      quantity,
      total
    })

    // ! we are done! the commit will execute all the operations at once
  }).then(result => {
    console.log('Transaction success!')
  }).catch(err => {
    console.log('Transaction failure:', err)
    throw err // ! propagate the failure to the client
  })
})

Clean and elegant. Long story short: set all your references and operations inside the runTransaction closure and you are done. It might look a little weird at first, but after writing a couple of them (and with a basic understanding of closures) you will own it!

Also, notice I used a callable function here. If you are not familiar with them, I will write a post about them because they are the new big thing, but for now you can read about them here.
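
If you are curious how the client side would look, invoking the callable function from the web SDK is roughly this (the item id is just a placeholder):

const execPurchase = firebase.functions().httpsCallable('execPurchase')

execPurchase({ itemId: 'some-item-id', quantity: 2 })
  .then(result => console.log('Purchase completed', result.data))
  .catch(err => console.error('Purchase failed', err))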

When to use it?

  • You need to execute operations based on a document's current state (which should not change during the operation).
  • You need to do multiple reads and writes.
  • You want to impress your peers with new stuff.

Now you have a new tool under your belt to stop bugs from happening in production ;-)


