
How Much Does it Cost? 💸 😬 #98

Open
1 task done
nelsonic opened this issue Apr 5, 2024 · 29 comments
Labels: discuss (share your constructive thoughts on how to make progress with this issue), learn (learn this!), question (a question needs to be answered before progress can be made), technical (requires understanding of the code, infrastructure or dependencies)

Comments

@nelsonic
Member

nelsonic commented Apr 5, 2024

As noted by @LuchoTurtle in #97 (comment) 💬
This "hobby" app is costing us considerably more money than we originally expected. 📈
The most recent invoice on Fly.io was $48.61 for Mar 1 - Apr 1, 2024: https://fly.io/dashboard/dwyl-img-class/billing

[screenshot: Fly.io invoice for March 2024]

The current month (April 2024) Amount Due is already $14.34 and we're only on the 4th!!

[screenshot: projected Fly.io invoice for April 2024]

If we extrapolate (4 days elapsed, so 7.5 four-day periods in a 30-day month) the total will be 7.5 x ($14.34 - $5) + $5 ≈ $75 💸 🔥
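That back-of-the-envelope extrapolation can be sketched as follows (this treats $5 as a fixed plan component and scales only the variable part, an assumption on my side):

```python
# Project a monthly bill from a partial-month invoice.
# Assumption: $5 of the $14.34 is a fixed (plan) fee; the rest scales with usage.
fixed = 5.00
accrued = 14.34          # amount due after the first few days of the month
days_elapsed = 4
periods_per_month = 30 / days_elapsed   # 7.5

projected = periods_per_month * (accrued - fixed) + fixed
print(f"${projected:.2f}")              # ≈ $75
```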

This is already more than we spend on our Internet & Phone bill ... 🤯
If the cost could be kept to $10/month it would be fine. 👌

Todo

  • @LuchoTurtle please have a think about what can be done to reduce or cap the cost. ✂️

I'm keen to keep this app available for people to test without having to run it on localhost. 💻
But if the casual visitor is costing us this kind of cash, imagine if this got to the top of HN! 😬

@nelsonic added the labels bug (suspected or confirmed defect), help wanted, priority-1 (highest priority: this is costing us money every minute), T25m (time estimate: 25 minutes), discuss, and technical on Apr 5, 2024
@ndrean
Collaborator

ndrean commented Apr 5, 2024

I'm not sure I understand these costs (see Fly.io pricing):

  1. 19760s = 5h29min20s => is this the uptime?

  2. Every time you start the app, you need to upload 2 GB of model data (the volume is pruned with the VM; see the Fly.io docs). This means you recreate a 2 GB volume and load 2 GB into memory. Is that the meaning of the two highlighted lines?

@LuchoTurtle
Member

LuchoTurtle commented Apr 5, 2024

@ndrean the models are not pruned with the VM every time the app is started (it was a misconception that was fixed in #82). Currently, the models are being saved in the volume and they are not re-downloaded every time it restarts. You can see it in the logs, actually:

2024-04-05T05:04:33.414 app[080e325c904168] mad [info] 05:04:33.414 [info] ℹ️ No download needed: Salesforce/blip-image-captioning-base

2024-04-05T05:04:33.415 app[080e325c904168] mad [info] 05:04:33.414 [info] ℹ️ No download needed: openai/whisper-small

2024-04-05T05:04:33.415 app[080e325c904168] mad [info] 05:04:33.415 [info] ℹ️ No download needed: sentence-transformers/paraphrase-MiniLM-L6-v2

The problem is that the models take up a fair amount of space (Salesforce/blip-image-captioning-base, especially). So we're basically paying for all the additional space the models take beyond the free tier.
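For reference, persisting models across restarts on Fly.io boils down to mounting a volume and pointing the model cache at it. A hypothetical fly.toml fragment (the volume name and cache path here are illustrative, not this project's actual config; check the Fly.io and Bumblebee docs for the real option names):

```toml
# Mount a persistent volume so downloaded models survive VM restarts.
# "models_cache" would be created with: fly volumes create models_cache --size 3
[mounts]
  source = "models_cache"
  destination = "/app/.bumblebee"   # point the app's model cache dir here
```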

However, the bigger cost is definitely RAM usage. There's no way around this, as it's needed to run inference on the uploaded images. GPUs are much better at this, but their costs are far higher.

@nelsonic unfortunately there's no way around this. People have been using the application, which is great. But, as with any LLM/ML-based application, it's hard to run it on any cloud free tier without putting money in.

The app has already been optimized to reduce costs: inbound/outbound data reduced with persistent storage, uploaded images shrunk before being fed into the model, and the audio sample rate (Hz) reduced on the client side before being fed into the model.
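The audio-sample-rate reduction mentioned above can be illustrated with a naive sketch (the real app does this in the browser; this pure-Python version just shows the idea of resampling PCM data by linear interpolation before sending it to Whisper):

```python
# Naively downsample a list of PCM samples, e.g. 44.1 kHz -> 16 kHz,
# to cut bandwidth before sending audio to a speech-to-text model.
def downsample(samples, src_hz, dst_hz):
    ratio = src_hz / dst_hz
    n_out = int(len(samples) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # linear interpolation between the two nearest source samples
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

pcm = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5] * 100
smaller = downsample(pcm, 44100, 16000)
print(len(pcm), "->", len(smaller))   # roughly 2.75x fewer samples
```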

Unfortunately, we have to pull the plug. With increased activity, this "hobbyist" project eats money (even when the machines are stopped, as shown in your image).

[screenshot]

I'd love to have this online for any person to check it out. But it's not feasible :(

I'm going to shut down the machine now.
I'll keep the database, though. It has the images and index files saved, so we can still keep the uploaded data and have the app running normally by just spawning a new machine whenever we want. The machine will look for the index file in the database (since it has none on its own filesystem), download the models and the index file, and resume where it stopped gracefully :)

@LuchoTurtle added the labels question and learn, and removed bug, priority-1, and T25m on Apr 5, 2024
@ndrean
Collaborator

ndrean commented Apr 5, 2024

@LuchoTurtle
Many thanks for your explanations.

I admit I don't understand the Fly.io docs ....🤔

[screenshot: Fly.io docs]

@LuchoTurtle
Member

@ndrean the documentation is correct. The application's filesystem is wiped whenever the machine restarts. That's why they offer volumes for persistent data (data we want to keep between restarts). Currently, the models live inside one of these volumes, hence why they don't need to be re-downloaded.

What #82 did was fix the path of the volume inside Fly.io, which was previously incorrect.

@LuchoTurtle removed the help wanted label on Apr 5, 2024
@LuchoTurtle
Member

I've deleted the machine. We can spawn a new one whenever we want to.
I'm keeping this issue open for other people to see it as a reference of how much it costs to run this on fly.io (without GPU!)

@ndrean
Collaborator

ndrean commented Apr 5, 2024

You opted for the machine below, didn't you? Where is it in the bill?

```toml
# fly.toml
[[vm]]
  size = 'performance-4x'
```
[screenshot: Fly.io machine pricing]

@LuchoTurtle
Member

I don't think they differentiate it on the billing page, unfortunately.

According to https://fly.io/docs/about/billing/#machine-billing:

Started Machines are billed per second that they’re running (the time they spend in the started state), based on the price of a named CPU/RAM combination, plus the price of any additional RAM you specify.

For example, a Machine described in your dashboard as “shared-1x-cpu@1024MB” is the “shared-cpu-1x” Machine size preset, which comes with 256MB RAM, plus additional RAM (1024MB - 256MB = 768MB). For pricing and available CPU/RAM combinations, see Compute pricing.

So they bill based on the preset, per second it is running, plus any additional RAM we specify. Because the machine wasn't always running, we didn't pay the $124 shown in your picture.
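The billing model quoted above can be sketched as a toy formula. Note the rates below are placeholders I made up for illustration, not Fly.io's actual prices:

```python
# Toy model of per-second Machine billing: preset rate plus extra RAM.
# Both rates are HYPOTHETICAL placeholders, not real Fly.io pricing.
PRESET_RATE_PER_S = 0.0000080        # named CPU/RAM preset, $/second
EXTRA_RAM_RATE_PER_GB_S = 0.0000019  # additional RAM, $/(GB * second)

def machine_cost(seconds_started: float, extra_ram_gb: float) -> float:
    """Cost while the machine is in the 'started' state."""
    return seconds_started * (PRESET_RATE_PER_S + extra_ram_gb * EXTRA_RAM_RATE_PER_GB_S)

# e.g. the 19,760 s (5h29m20s) of uptime from the invoice, with 0.75 GB extra RAM:
print(round(machine_cost(19_760, 0.75), 2))
```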

@nelsonic
Member Author

@LuchoTurtle didn't want you to DELETE the machine ... 🙃
Just wanted it to run more efficiently ... 💭
But if that is going to take too much time, fair enough. 👌

@nelsonic
Member Author

nelsonic commented Apr 29, 2024

@LuchoTurtle quick question: (though probably a rabbit hole…)

  1. Do you think we could run the image Classifier App on a Server (custom-built on-prem machine) @home, connected to our main App via a Back Channel, such that:

A. Person uploads the image to AWS S3
B. This triggers a request to the AI BOX to classify it
C. AI BOX classifies the image and returns its guess

  2. How much would it cost to build a machine that can do basic inference?

This would mean that our only marginal cost would be electricity and no surprise bills when it gets to the top of HN.
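The A/B/C flow above can be sketched with stubs (everything here is hypothetical: the function names, the object key, and the caption are made up; a real version would use S3 event notifications and an HTTP call over the back channel):

```python
# Stubbed sketch of the proposed pipeline:
# A. person uploads the image to S3
# B. the upload triggers a request to the on-prem "AI BOX"
# C. the AI BOX classifies the image and returns its guess

def upload_to_s3(image_bytes: bytes) -> str:
    """Stub: pretend we stored the image and got back an object key."""
    return "uploads/img-0001.jpg"

def classify(object_key: str) -> dict:
    """Stub classifier: a real AI BOX would fetch the object and run inference."""
    return {"key": object_key, "caption": "a placeholder guess"}

def notify_ai_box(object_key: str) -> dict:
    """Stub: in reality an HTTP POST over the back channel to the AI BOX."""
    return classify(object_key)

result = notify_ai_box(upload_to_s3(b"...raw image bytes..."))
print(result["caption"])
```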

I'm asking because if we could put together a decent machine for ~€600,
including an NVIDIA GeForce RTX 4060 EAGLE with 8 GB GDDR6:
https://amzn.eu/d/5fr9N5J
[photo: proposed machine build]

This could serve our needs quite well and we could run other models on it without ever having to worry about boot times etc.

Thoughts? 💭

@nelsonic
Member Author

Though we'd probably have to spend a decent chunk on the GPU ...
https://www.reddit.com/r/MachineLearning/comments/17x8kup/d_best_value_gpu_for_running_ai_models/
[screenshot: Reddit thread]

We will use it for a few tasks so I think it's worth investigating. 💭

@LuchoTurtle
Member

Currently, this project targets the CPU (by default, since running on GPU entails having specific drivers according to the hardware).
To run on GPUs, I think we only need to change a few env variables (https://github.com/elixir-nx/xla#usage), but further testing may be necessary.
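Per the elixir-nx/xla README linked above, the switch is made via the XLA_TARGET env variable. A sketch (the exact accepted values are version-dependent, so verify against the current README before relying on this):

```shell
# Switch the XLA precompiled binary from CPU (the default) to a CUDA build.
# "cuda120" is one of the targets documented at the time; check the README.
export XLA_TARGET=cuda120

# then force a rebuild so the new target takes effect, e.g.:
#   mix deps.clean xla exla --build && mix deps.get

echo "$XLA_TARGET"
```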

Regarding which GPU to choose, I can't really provide an informed decision. I know vRAM is quite important.

[screenshot: GPU comparison table]

Of course, I'm not expecting you to get an H100; that's wayyy overkill. But the 3090 seems like a good compromise, with a solid performance-to-cost ratio.

I'd hold off on purchasing anything yet, though. We need to confirm that inference can run on the GPU with Bumblebee before making a purchase this costly :)

@nelsonic
Member Author

Ok. Thanks for your reply. Seems like this will require some further thought. What do we need to do next? 💭

@LuchoTurtle
Member

I'd need to check that running locally on the GPU works. Since I have a 1080, which uses CUDA, it should theoretically work with a 3090 too. I just need to confirm it actually uses the GPU first :)

@nelsonic
Member Author

https://blog.themvp.in/hardware-requirements-for-machine-learning/
[screenshot: hardware requirements table]

Used - like new - the 3090 with 24GB VRAM costs ~£650: https://www.ebay.co.uk/itm/176345660501
[screenshot: eBay listing]

This is certainly more than we were spending on Fly.io but if it means we can do more with Machine Learning with a baseline load I think it's worth it. 💭

@ndrean
Collaborator

ndrean commented May 2, 2024

My 2¢ input.

If Whisper (speech-to-text) is the sink or bottleneck, could a cloud service be considered?
https://elevenlabs.io/docs/introduction seems to offer a WebSocket connection to stream down the response.

[screenshot: ElevenLabs docs]

I did not see figures on pricing.

[screenshot: ElevenLabs pricing]

@nelsonic
Member Author

nelsonic commented May 4, 2024

@ndrean your insight is always welcome. ❤️
Yeah, the speech part shouldn't be the bottleneck. 🤔
And, importantly, the purpose of building our own project instead of using an API (or Google Lens) for image classification was to avoid sending personal data to a 3rd party. 💭

We want to be 100% certain that an image we classify is not being used for any other purpose. Same goes for voice recordings. Ref: dwyl/video#91
While I might be OK with making a recording of my voice public, I know people who wouldn't because they are far more privacy-conscious.

@ndrean
Collaborator

ndrean commented May 4, 2024

Fair point. If you open up your machine and offer such a service, how do you guarantee users' privacy? I mean, you store images on S3 (publicly available) and run a local database.
What is your architecture? Would HTTPS terminate at a reverse proxy, so that any app served by your machine is routed to as a subdomain?
Also, is a simple declaration of intent enough? Something like "we don't store your data nor transmit it to any external service of any kind"?

@nelsonic
Member Author

nelsonic commented May 4, 2024

It really depends on whether we want to make the service public or just for people using our App. People using our App know we aren't using their images for "training" and won't leak them. But we don't yet have advanced access controls on images beyond restricting access to the person who uploaded them.
Ideally once we have the “groups” feature, it will be easy to restrict access.
But if we were running the service as a general purpose privacy-first classifier, we’d just store the images in /temp and then delete them after classifying. 💭
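The "classify then delete" idea can be sketched in a few lines (the classifier here is a stub standing in for real inference; nothing about this is the app's actual code):

```python
# Keep an upload only as a temporary file and delete it as soon as the
# (stubbed) classifier has produced its result, so nothing is retained.
import os
import tempfile

def classify(path: str) -> str:
    """Stub: stand-in for running real inference on the image at `path`."""
    return "a placeholder caption"

def classify_then_delete(image_bytes: bytes) -> str:
    with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
        f.write(image_bytes)
        path = f.name
    try:
        return classify(path)
    finally:
        os.remove(path)   # the image never outlives the classification

caption = classify_then_delete(b"fake image bytes")
print(caption)
```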

@ndrean
Collaborator

ndrean commented May 4, 2024

If you use the app as such, all images are saved together in your public bucket, and the corresponding URL is saved in a database, meaning a semantic search can return any image approximating your query, yours or not.

A simple login and the addition of a user_id to the image schema can prevent images from becoming publicly available, at least through the semantic search. But when you receive a response, you receive a URL to display the image. Doesn't the URL expose the bucket origin name? Since the bucket is public, can't this be exploited?
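One generic mitigation for leaked links is to sign each image URL with an expiry, so a URL stops working after a short window. This is a stdlib sketch of the general scheme (not S3's actual presigned-URL API, which does something similar server-side):

```python
# Expiring, HMAC-signed URLs: a leaked link becomes useless after `expires_at`.
import hashlib
import hmac

SECRET = b"server-side secret"   # hypothetical key, kept off the client

def sign_url(path: str, expires_at: int) -> str:
    msg = f"{path}:{expires_at}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?exp={expires_at}&sig={sig}"

def verify_url(path: str, expires_at: int, sig: str, now: int) -> bool:
    expected = hmac.new(SECRET, f"{path}:{expires_at}".encode(),
                        hashlib.sha256).hexdigest()
    # constant-time comparison plus an expiry check
    return now < expires_at and hmac.compare_digest(sig, expected)

url = sign_url("/images/cat.jpg", expires_at=1_700_000_000)
print(url)
```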

But if we were running the service as a general purpose privacy-first classifier, we’d just store the images in /temp and then delete them after classifying.

Instead of an S3 URL, you use a path on the filesystem. If you erase the uploaded paths after a search, you can only run a search once.

@nelsonic
Member Author

nelsonic commented May 5, 2024

Predictably, someone has already set up a business around renting GPU time: https://www.gpudeploy.com/connect
Via: https://news.ycombinator.com/item?id=40260259

@nelsonic
Member Author

nelsonic commented Aug 1, 2024

@LuchoTurtle did you re-enable this App? We just got an invoice for $74.16 USD 😮

[screenshot: Fly.io invoice for $74.16]

💸 🔥

@nelsonic nelsonic moved this to ⏳Awaiting Review in dwyl app kanban Aug 1, 2024
@nelsonic
Member Author

nelsonic commented Aug 1, 2024

@LuchoTurtle If you're not applying for "AI Jobs" and needing to showcase this classifier App,
please scale it down and ensure that we do not get another surprise bill like this. 😢

@nelsonic
Member Author

nelsonic commented Aug 1, 2024

Why on earth are we using this many resources for a non-production DB?! 😮
https://fly.io/dashboard/dwyl-img-class
[screenshot: Fly.io dashboard usage]

@LuchoTurtle
Member

LuchoTurtle commented Aug 1, 2024

@nelsonic I have not made any modifications to the app (you can check the activity logs on Fly.io) except merging dependency updates.
The last time I touched the CI deploy to Fly.io was to comment it out, and I deleted all instances of any running app, leaving only the database to hold data for historical purposes.

Checking the logs of the db machine shows no activity spike that would justify the price increase in the last month 🤔

Usage tab says the same thing.

Should I stop these machines? https://fly.io/apps/imgai-db/machines
[screenshot: imgai-db machines]

Maybe they're the culprit. I'm stopping them, just in case.

[screenshot]

The DB is now stopped.
There are no active applications within the organization, so billing should stop from now on.

[screenshot]

@LuchoTurtle LuchoTurtle removed their assignment Aug 1, 2024
@nelsonic
Member Author

nelsonic commented Aug 1, 2024

But if we are just holding the data in the DB does it need to have 4GB RAM? 🤷‍♂️

@LuchoTurtle
Member

LuchoTurtle commented Aug 1, 2024

Probably not, but I haven't made any modifications to the db application. If you check https://github.com/dwyl/image-classifier/blob/main/deployment.md, I've only upgraded the Elixir applications; the Postgres applications were deployed with default settings.

My fear is that maybe we're incurring costs with the volumes?

https://fly.io/docs/about/billing/#volume-billing

But we're only using 7GB at most, across three different volumes?
[screenshot: volumes list]

So they should fall under the Free Allowance...?

[screenshot: free allowances]

@nelsonic
Member Author

nelsonic commented Aug 1, 2024

Indeed: "haven't made any modifications". To be clear, the June bill was the same!
[screenshot: June invoice]

May was $72:
[screenshot: May invoice]

So the change made in May simply wasn't enough for us to avoid these silly high bills!
It's just that in June I was so focussed on the work-work trip that I didn't even notice the Fly.io bill!

But it's there and we're spending more money on an App that nobody is using than all our other apps combined!

Please just scale down the RAM of the DB; we don't need to be spending $74/month for storing "historical data". 🙏

And in future please don't unassign yourself from an issue until it's resolved, that's not what responsible adults do! "Not My Problem" is a classic unaccountable child reaction and would 100% get you fired in most serious organisations because it's the attitude that matters! You scaled up the DB to make this work - that's fine! - but you're responsible for scaling it back down so I don't have this silly bill appearing on my credit card!

[screenshot]

[screenshot]

🤦‍♂️

Part of being a "Senior" engineer is taking responsibility for your actions. Like spending the company's money.
I don't have $900/year to spend on a "Demo" app that nobody is using.
Please just fix it by scaling down the RAM to the bare minimum so I don't see this again. 🙏
I didn't ask you to DELETE the VMs for this App; in fact that's the exact opposite of what you should have done.
You should have invested the time to write the Ops code to automate the scaling so that it still works!
Not having the App instances but keeping the DB is the worst of all worlds because we are still paying but have nothing to show for it! 😢
Ideally you should have proactively written a few lines of code to count how many times the App gets booted/invoked so that we can cap it at a LIMIT and show a "sorry we are overloaded" page with samples/history but no expensive instances so that we don't incur a bill of thousands for being top on HackerNews #22

Apologies if this comes across as "harsh", but most people in the world don't have $900/year sitting around that they can burn on hosting a "demo" AI app ... 🙃 (maybe VC-funded startups do, I certainly don't!)

@LuchoTurtle
Member

The amount of time it would take to respond to this properly is definitely not worth it given my backlog of tasks. I'll let the comments about optimizations on this topic (the issues, commits, READMEs, and related questions, e.g. https://community.fly.io/t/build-failed-with-elixir-bumblebee/15779/4) speak for themselves :)

  • All Bumblebee Elixir apps need to load models on startup. The list of optimizations that were made to reduce cold startups and processing times are thoroughly documented. This is why one needs more computing power to initially load the model.
  • It is impossible to run the desired image-classifier model without more processing power in VMs on Fly.io. Scaling was already automated to minimize costs, given that billing on these machines works on a time-usage basis: each machine was turned off after an hour of inactivity (the smallest value Fly.io allows). The startup was optimized with volumes to reduce bandwidth and avoid re-downloading models. Again, all of this is documented. The VMs were deleted because, even with these optimizations to reduce bandwidth (and, in consequence, costs), the costs were still too high. Since I didn't have time to dedicate to this topic, deleting the VMs (even if temporarily) was the quickest way to stop unnecessary costs.
  • "Ideally you should have proactively written a few lines of code to count how many times the App gets booted/invoked so that we can cap it at a LIMIT and show a 'sorry we are overloaded' page" - unfortunately, even though this was considered before, it is not possible given how billing works on Fly.io. Even if we implemented that process, you'd still need App VMs running to display that warning, and you'd be billed the same regardless of whether the app is making requests to the DB or is throttled. Ultimately, 99% of the cost stems from provisioned computing power, regardless of usage. Because it's impossible to run the classifying model on a free-tier machine, this becomes a non-starter for free apps.
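For reference, the scale-to-zero behaviour described in the bullets above is configured in fly.toml. A sketch (option names should be verified against the current Fly.io docs before relying on them):

```toml
# Let Fly.io stop Machines when idle and restart them on demand.
[http_service]
  auto_stop_machines = true    # stop Machines when there is no traffic
  auto_start_machines = true   # start them again on the next request
  min_machines_running = 0     # allow everything to scale to zero
```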

I digress.

To reduce costs:

  • I initially (and erroneously) thought that only the volume space actually used was priced. They price on provisioned capacity after all. So I've migrated the data from the 40 GB Postgres volume to a 3 GB one.
[screenshot: new 3 GB volume]

This volume has 2 GB but only around 180 MB of Postgres data. Most of the volume is used to persist the models, which are loaded when an instance boots up and finds none on its own filesystem.

[screenshot]

With all of the above in mind: I initially shut down the machines because that's where the higher cost was. The steps above address the extra computing power and the extra provisioned volume. We're now down to a single node with 256 MB of RAM and a 3 GB volume: https://fly.io/apps/imgai-db/machines.

Now there should be no charge per month.

@ndrean
Collaborator

ndrean commented Aug 2, 2024

And the winners are cloud service providers, and GPU companies more recently.
