We're using Firestore as the intermediary storage layer for the API. The problem is that we can only import 500 rows at a time (Firestore's per-batch write limit), so the import takes a very long time and is causing issues with the initial backfill.
Investigate whether it's possible to import the entire table in one go, or at least in larger batches. This would speed up the backfill and the monthly import jobs, and also simplify the pipeline (a rough sketch of the current chunked approach is below).
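For context, here is a minimal sketch of the kind of chunked importer the 500-write limit forces on us, using the `google-cloud-firestore` client. The collection name and the assumption that each row carries an `id` field are illustrative, not part of the actual pipeline.

```python
from google.cloud import firestore

BATCH_LIMIT = 500  # Firestore caps each batched commit at 500 writes

def import_rows(rows, collection_name="api_records"):
    """Write rows to Firestore in chunks of at most 500 documents per commit."""
    client = firestore.Client()
    collection = client.collection(collection_name)

    batch = client.batch()
    pending = 0
    for row in rows:
        # Assumes each row dict has an "id" field usable as the document ID.
        batch.set(collection.document(str(row["id"])), row)
        pending += 1
        if pending == BATCH_LIMIT:
            batch.commit()  # each commit is a separate sequential round trip
            batch = client.batch()
            pending = 0
    if pending:
        batch.commit()
```

Because every 500-row commit is a sequential round trip, a large table turns into thousands of serial commits, which is where the backfill time goes.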
I think we can close this issue. With Giancarlo's help we were able to incorporate the process into a Dataflow pipeline. The full historical backfill takes several hours, and the last-month update takes under 25 minutes.
Dataflow does a great job of processing in parallel and inserting into Firestore.
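For anyone finding this later, a rough sketch of the approach (not the exact pipeline): an Apache Beam job that batches rows and lets each Dataflow worker commit its own 500-write batches in parallel. The BigQuery source table, collection name, and `id` field are assumptions for illustration.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class WriteBatchToFirestore(beam.DoFn):
    """Commits one batch of rows to Firestore; workers run these in parallel."""

    def __init__(self, collection_name):
        self.collection_name = collection_name

    def setup(self):
        # One Firestore client per worker, created on the worker VM.
        from google.cloud import firestore
        self.client = firestore.Client()

    def process(self, rows):
        batch = self.client.batch()
        collection = self.client.collection(self.collection_name)
        for row in rows:
            batch.set(collection.document(str(row["id"])), row)
        batch.commit()

def run():
    options = PipelineOptions()  # runner, project, region, etc. from the command line
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadTable" >> beam.io.ReadFromBigQuery(table="project:dataset.table")  # hypothetical source
            | "Batch" >> beam.BatchElements(min_batch_size=100, max_batch_size=500)
            | "WriteToFirestore" >> beam.ParDo(WriteBatchToFirestore("api_records"))
        )

if __name__ == "__main__":
    run()
```

The 500-write cap still applies per commit, but because Dataflow fans the batches out across many workers, the commits happen concurrently instead of one after another.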