Forum Documentation Showcase Pricing Learn more

Importing large number of records - 3.2M records in a table


#1

I have tried to enter a large number of records via CSV and the import stops at various points. I have 285,000 records and it stops at 35,000 or 45,000.

I have broken that down now into 20,000 chunks, but it is incredibly slow. I have added a data boost and this has not helped in the performance.

If I was just doing this in SQL it would be a matter of minutes, not hours or days. I understand that there must be more care taken when on a shared db storage system, but wondering if there is a tool or way to do this a bit more programatically.

Further to that, is there a way like I would do in SQL to run an update in the future if my master db list (stored elsewhere) has updates, inserts, deletions and changes via the Modify" tool in the backend? I searched for a plugin that might help with this but have not found anything.

I hope I am missing something.

THanks

Craig


#2

We continue to struggle with getting the data from another source into bubble, but we have now after a week successfully brought in 3.5M records in one table and 300k records in 4 other tables.

After many attempts, we realized that the API connection to a shared instance is not perfect, whether it is the load on the bubble server from others or if it is related to any latency across the path to connection. (I am on a 600mb cable modem that does 20mb upload, but seems to vary pretty dramatically).

We ended up writing a php/curl script to talk to a custom endpoint API created in bubble to receive the records and to return an error code if the records is not created.

We initiated 8 instances of the script and dialed up the CPU blocks to get to the magical 1K API/Minute. It ended up taking 11 CPU blocks to do this! It seems like a lot of money just for importing data, but none the less, we have most of the data in after 5 days of script running. :slight_smile:

I understand that since our 3.2M record table connects to two different data types, that this is probably the only way to get the data in, we could not use the backend upload or modify with any success.

Now, finally moving back from data processing to MVP development


#3

Hi Craig, I am glad you found a solution. Have you also considered, instead of using Bubble’s database and uploading all your records to rather use your SQL database instead. You can connect Bubble to your SQL tables using the SQL database connector plugin. Using stored procedures you’re not limited to the number of records you can create, read, update and delete directly within Bubble.

I am just curious, what is the reason you wanted to use the Bubble database instead of keeping it all in SQL?


#4

Hi Leonidas,

Great question. As we are familiar with AWS we had thought keeping this in an RDS db separately, but we had two reasons for exploring it in bubble.

  1. To see how it would handle the records and understand where the limitations of scale are. We are like many hoping to have an application that grows to a huge number of users. Our competitive has over 100m users, so we thought it prudent to look at this.

  2. This table is so integral to our system, even though the data is supplied by a third party, we have a large portion of our functionality tied to these tables.

Further, we don’t actually have the original source data in our own SQL, we get it via CSV and then i imported it into mysql so I could better manage cutting it up and putting it into bite size chunks multiple scripts could handle and track the segmentation of the data. I can not open a 3.2M record csv in excel, so I know MySQL well enough to manage it all. So each quarter when the data is updated, I get CSV files.

My thought is to either keep it in another SQL DB local and build an application that feeds it to bubble or keep it in adjoining cluster at AWS/RDS.

To be honest, we are not sure how we will handle this in the long haul. We are hopeful that bubble can scale with a dedicated cluster, and with the ability to dial up the elasticache and db CPUs independently, that it would not be that much different than a traditional AWS cluster running laravel or django or some other middleware/application server, on a “hand coded” application server.

We see the huge benefits on development time and turns on iteration. We had originally thought, “let’s just build the MVP in bubble and then use that as reference to build the production app”, but we are now hopeful it can reside in bubble. Tools for big DP will be required I think for Bubble to become the large IT shops tool of productivity, which it could be(I believe) , freeing up IT staff to focus on their job and business people who own the business intelligence to build apps for their internal customers.

Anxious to see how everything works once we build the full app, but I am concerned at this point as if I go to the data tab and search for a unique number in the 3.2M database table, bubble gives me an error, “come back later” response. So I am not sure how to get around that without building page that admin this table separately. I assume an “any field” query like the canned one in data tab is not going to hack it.

I dialed up to 11 CPU units to get this to be able to achieve the 900-1000 Api/calls a minute on a shared/public instance of bubble.

More than you asked for, but would love your thoughts? We have only been 45 days into bubble.

Craig


#5

How is the search speed on Bubble with large amount of records like this? 1. If you do a search based on ID
2. You add more filters

I’m just interested to know


#6

it seems very quick given the number of records.

The way my data is constructed, I have 260,000 items. each of those items has 1-25 releated items in another table, which ends up being 3,300,000 records. Rarely will I do a user search on the 3.2M records, but will often search the 260,000 and want to refer to, join or add the related table in the report or have a “show detail” group that shows the related records.

So far, so good. I am assuming once we get to the point of needing a dedicated plan, we will want to give the DB CPU and Cache some good resource! $$$