Alberto Romero:

I had an answer ready until you said "offline and self-hosted" and then "run on a consumer PC." That really makes it much harder. As you surely know, models are getting much cheaper, but it's much harder to match, with a model you self-host, the efficiency that Anthropic, Google, and OpenAI are getting. I don't think you can make Llama run as efficiently.

First, the API companies compete with one another, so they're incentivized to go as low as possible, even *too low*, while they make up for the costs somewhere else or because they have someone else's money (OpenAI has Microsoft's, and Anthropic has Google's and Amazon's). Second, Meta isn't worried about making Llama inference efficient because they're merely training it for you to download and do whatever you want with. But the downloaded weights aren't fine-tuned or adapted to your use case; you have to do that yourself.

Anyway, if you're not willing to relax some of your requirements, I'd say Meta's models are the way to go. Mistral's too. The thing is, your use case (knowledge management on large documents) isn't well handled by models small enough to run well on a consumer PC (even a high-end one). I'm sorry I'm not able to give you a satisfying answer!
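For what it's worth, here's a minimal sketch of what running one of those models offline can look like, using llama-cpp-python with a quantized GGUF build (the file name and settings are placeholders, not a recommendation of a specific model):

```python
# Minimal sketch: run a quantized Llama/Mistral model fully offline with
# llama-cpp-python. The model path is a hypothetical local file; any GGUF
# build downloaded from Hugging Face would be loaded the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_ctx=4096,     # context window; large documents will exceed this quickly
    n_threads=8,    # tune to your CPU
)

out = llm(
    "Summarize the attached project notes in three bullet points.",
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```

The context-window line is the catch: a 7B-class model you can run at home won't hold a large document collection at once, which is why retrieval comes up later in this thread.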

Christian:

Thanks in any case!

If there were any safe options for using the online, i.e. API-based, versions that would ensure privacy and keep intellectual property from leaking out, that would surely be a much better solution. Although (to my knowledge) the APIs can be configured so that the data you send isn't used to further train the corresponding model, there's no guarantee that the AI providers won't get hold of whatever technically enters their domain...

Alberto Romero:

That's right. Officially you can configure it so that models aren't trained on your data, etc., but in practice you're left trusting the provider to honor that. As Mark suggests above, RAG (retrieval-augmented generation) can help with your specific use case, which is mostly retrieval.
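For illustration, a minimal sketch of the retrieval half of such a setup, assuming sentence-transformers is installed locally; the chunks, model name, and query are made up for the example:

```python
# Minimal sketch of local retrieval for RAG: embed document chunks, embed the
# query, and keep only the most similar chunks to pass to the local LLM.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small model, runs on CPU

# In a real setup these would come from splitting your own documents.
chunks = [
    "Q3 report: revenue grew 12% driven by the EU market.",
    "Design doc: the ingestion service retries failed uploads three times.",
    "Meeting notes: legal review of the new vendor contract is pending.",
]
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

query = "What is blocking the vendor contract?"
query_embedding = embedder.encode(query, convert_to_tensor=True)

# Only the top-scoring chunks go into the model's prompt, so the model never
# needs the whole document set in its context window.
hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=2)[0]
for hit in hits:
    print(chunks[hit["corpus_id"]], round(hit["score"], 3))
```

Everything here stays on your machine, which is the point: the documents never have to leave for a third-party API.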
