What Becomes Possible With 100x More Tokens?
Every time an LLM generates a response, it incurs a cost. For AI infrastructure, the total cost of ownership (TCO) is multifaceted, including, but not limited to, the capital expenditure (CapEx) for processing hardware, the cost of energy consumed per inference, and the costs of networking and cooling. Ultimately, the more tokens a system generates, the more it costs to run, and at current prices those costs add up fast.
Unfortunately, in the real world, TCO often draws a hard line between applications that are viable to pursue and those that remain out of reach. Even if an application is practical and could benefit society, economic forces will keep it from ever coming to fruition if the cost of processing tokens is too high.
But what if we could generate 100x more tokens for the same budget? What new possibilities and world-changing applications would open up if the cost of AI stopped being the deciding factor?
At ElastixAI, that is the world we are building toward. In this whitepaper, we look at what happens to two important areas of human life when the economics of AI generation no longer stand in the way.
AI for the Future of Drug Discovery
Imagine a child diagnosed with a rare genetic disorder that affects fewer than 50,000 people worldwide. The child's doctors explain that no approved treatment for the disorder exists yet. The underlying science is advanced enough to pursue a cure, and pharmaceutical companies could use AI to find one.
But in the real world, companies can afford only a limited number of tokens. For a condition this rare, the return on investment is simply too small to justify spending them here.
Now imagine a different future, one where companies can deploy AI against any condition without worrying about token limits. In this future, tokens are so abundant that companies can pursue more rare diseases and help more patients without taking resources away from other priorities.
That future is possible today. The technology and science already exist. What is missing is the economic infrastructure to run it at the scale it demands.
Why AI Hasn't Solved It Yet
AI-powered drug discovery works through an iterative process: the system generates a population of candidate molecules, evaluates each one against target properties, generates refined variants based on what it has learned, and repeats the process [1][2][3]. Every step in that cycle is an inference event, and every inference event costs money. At current token prices, a research team might afford three to five rounds of this process before the budget runs out.
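The generate, evaluate, refine cycle described above can be sketched as a simple elitist evolutionary loop. This is a toy illustration under stated assumptions, not a drug-discovery pipeline: candidates are plain vectors of numbers rather than molecules, `toy_score` is a hypothetical stand-in for predicting target properties, and each generation or evaluation of a single candidate is counted as one inference event. The names `discover` and `toy_score` and all parameter values are hypothetical.

```python
import random


def toy_score(x):
    # Hypothetical stand-in for "evaluate against target properties":
    # higher is better, with a maximum of 0.0 at x = (1.0, ..., 1.0).
    return -sum((xi - 1.0) ** 2 for xi in x)


def discover(rounds, population=20, dims=8, sigma=0.5, seed=0):
    """Generate -> evaluate -> refine loop; returns (best score, inference events)."""
    rng = random.Random(seed)
    n_parents = population // 4
    calls = 0

    # Generate an initial population (one event per candidate).
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dims)] for _ in range(population)]
    calls += len(pop)
    # Evaluate each candidate (one event per candidate).
    scores = [toy_score(x) for x in pop]
    calls += len(pop)
    best = max(scores)

    for _ in range(rounds):
        # Keep the highest-scoring candidates as parents (scores already known).
        ranked = sorted(zip(scores, pop), key=lambda t: t[0], reverse=True)[:n_parents]
        parents = [x for _, x in ranked]
        parent_scores = [s for s, _ in ranked]

        # Generate refined variants of randomly chosen parents (one event each).
        children = [
            [xi + rng.gauss(0.0, sigma) for xi in rng.choice(parents)]
            for _ in range(population - n_parents)
        ]
        calls += len(children)
        # Evaluate the new variants (one event each).
        child_scores = [toy_score(c) for c in children]
        calls += len(children)

        pop = parents + children
        scores = parent_scores + child_scores
        best = max(best, max(child_scores))

    return best, calls


shallow_best, shallow_calls = discover(rounds=5)
deep_best, deep_calls = discover(rounds=500)
print(shallow_calls, deep_calls)  # 190 15040
print(deep_best >= shallow_best)  # True
```

Because the deeper run replays the same random draws for its first five rounds and always keeps its best candidates, its best score can never be worse than the shallow run's; the price is 15,040 inference events instead of 190.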
That limit matters, because the quality of a discovered molecule is closely tied to how many rounds the AI gets to run: the deeper the exploration, the better the result. In certain settings, AI-discovered molecules have already achieved Phase I success rates of up to 90%, compared with historical industry averages of 40-65%.
The potential is clear, but the economics are the ceiling.
What the Future Could Look Like
If researchers could access 100x more tokens at the same cost, LLM-based drug discovery would no longer be constrained by economics.
In this world, a pipeline that runs five rounds today could run 500. At that iteration depth, the AI can genuinely learn the shape of the solution space across thousands of generated hypotheses before committing to a direction. That deeper exploration, in turn, yields better candidates, fewer dead ends in later-stage trials, and a faster path to market for new treatments.
And because the per-experiment cost drops so dramatically, it becomes viable to pursue treatments for patient populations the industry has historically passed over. There are thousands of diseases like the one described above where the science is ready and the patient population is real, but the investment is too high to justify. In a world of 100x more tokens, that barrier is gone.
More radically, such systems wouldn’t have to wait to be directed toward a target. Because they can continuously generate and evaluate hypotheses, they could explore the entire landscape of known disease mechanisms, flagging novel therapeutic opportunities that no human research team has considered. Rather than only executing a research agenda, these systems could expand it: generating leads in disease areas that have stalled, proposing mechanisms that cross disciplinary boundaries no single research team would span, and doing so around the clock.
With greater access to LLM-based drug discovery, researchers can bring new medicines to market sooner, reduce prescription costs, and save more lives.
A Future With Always-On AI
Consider the Chief of Staff at a 200-person startup. She’s expected to be the source of truth for her company, with eyes on every person and every project.
But every day, she processes hundreds of emails, sits through hours of meetings, and tries to remember the strategic context of dozens of ongoing projects. No human can hold that much organizational context in their head. In the real world, this kind of institutional memory loss is simply accepted as the cost of doing business at scale.
In a future with affordable, always-on AI, that narrative changes. With AI assistants running quietly in the background of every organization, every email, meeting, and commitment can be stored and retrieved at any time. With that breadth of information, the assistants can provide genuine oversight, flag conflicts, and help teams make better business decisions.
In this future, the cost of continuous AI inference is low enough that always-on organizational intelligence is as economically routine as any other piece of business software.
Why Always-On AI is Limited
What many don’t know is that such a solution is not technically out of reach. The LLMs capable of this performance already exist. What doesn't exist is an economic model that makes continuous generation affordable at the per-user level.
An AI that continuously watches, remembers, and reasons makes an enormous number of inference calls across a full workday. That volume alone is cost-prohibitive, and, to complicate matters, the cost also grows non-linearly as context accumulates [4]. Because each new request must re-process all of the context accumulated so far, an assistant that has been running since morning is significantly more expensive to operate by afternoon.
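A minimal model of that compounding effect, assuming a hypothetical 2,000 new tokens of email, chat, and meeting text arrive every 5 minutes and that each check re-sends everything gathered so far. It counts billed input tokens only and ignores prompt caching as well as the extra attention compute that long contexts incur. The function names and the 2,000-token rate are illustrative assumptions.

```python
def context_at_check(k, new_tokens_per_interval):
    """Input tokens re-sent at check k when nothing is trimmed or cached."""
    return k * new_tokens_per_interval


def daily_input_tokens(checks, new_tokens_per_interval, window=None):
    """Total input tokens billed over a day; `window` optionally caps the context size."""
    total = 0
    for k in range(1, checks + 1):
        ctx = context_at_check(k, new_tokens_per_interval)
        total += ctx if window is None else min(ctx, window)
    return total


CHECKS = 8 * 60 // 5  # one check every 5 minutes for 8 hours = 96
NEW = 2_000           # hypothetical new tokens arriving per 5-minute interval

print(context_at_check(12, NEW), context_at_check(84, NEW))  # 24000 168000 (after 1 h vs 7 h)
print(daily_input_tokens(CHECKS, NEW))                        # 9312000
print(daily_input_tokens(2 * CHECKS, NEW))                    # 37056000, about 4x for 2x the time
print(daily_input_tokens(CHECKS, NEW, window=100_000))        # 7150000
```

Per-check cost grows linearly through the day, so the daily total grows roughly with the square of the day's length: doubling the number of checks roughly quadruples the input tokens. Capping context at a rolling window (here 100,000 tokens, the per-check figure used in the next paragraph) brings growth back to linear, at the price of forgetting older context.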
To illustrate the scale of that cost, consider what a realistic deployment would require. A knowledge worker sends and receives roughly 200 emails and 150 messages and spends several hours in meetings each day. For an AI assistant to meaningfully catch conflicts, it needs to analyze each incoming communication against a substantial window of organizational context. Checking every 5 minutes over an 8-hour workday (96 checks) and processing around 100,000 context tokens per check, the assistant consumes roughly 9.6 million input tokens and 29,000 output tokens per day.
At current prices on OpenRouter, GPT-4o costs approximately $2.20 per million input tokens and $10 per million output tokens, putting that assistant at around $21.41 per employee per day [5]. For comparison, enterprise software tools like Microsoft Copilot cost around $0.60 per employee per day [6]. On that budget, an organization running GPT-4o can cover only about 2.8% of a workday, roughly 13 minutes out of eight hours, before the money runs out. The assistant is nowhere near being truly always-on.
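The arithmetic behind these figures, reproduced as a short script. Every input is one of the stated assumptions above (5-minute cadence, 100,000 context tokens per check, 29,000 output tokens per day, the quoted GPT-4o prices, and the $0.60 daily budget); only the variable names are illustrative.

```python
WORKDAY_MIN = 8 * 60
CHECK_INTERVAL_MIN = 5
INPUT_TOKENS_PER_CHECK = 100_000
OUTPUT_TOKENS_PER_DAY = 29_000
INPUT_PRICE_PER_M = 2.20    # USD per million input tokens (GPT-4o, as quoted)
OUTPUT_PRICE_PER_M = 10.00  # USD per million output tokens
BUDGET_PER_DAY = 0.60       # USD per employee per day (Copilot comparison)

checks = WORKDAY_MIN // CHECK_INTERVAL_MIN      # 96 checks per day
input_tokens = checks * INPUT_TOKENS_PER_CHECK  # 9,600,000 input tokens
daily_cost = (input_tokens / 1e6) * INPUT_PRICE_PER_M + (
    OUTPUT_TOKENS_PER_DAY / 1e6
) * OUTPUT_PRICE_PER_M

coverage = BUDGET_PER_DAY / daily_cost  # share of the day the budget pays for
minutes_covered = coverage * WORKDAY_MIN

print(f"{checks} checks, {input_tokens:,} input tokens")  # 96 checks, 9,600,000 input tokens
print(f"${daily_cost:.2f} per employee per day")           # $21.41 per employee per day
print(f"{coverage:.1%} of the workday, about {minutes_covered:.0f} minutes")  # 2.8% ..., about 13 minutes
```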
Today, that compounding cost makes always-on AI impractical for most organizations. The assistant has to run periodically rather than continuously, activating only when prompted instead of thinking alongside you.
A New Future for Work
With 100x more tokens at the same cost, that changes entirely. In this future, the AI can run all day and build context as it goes, unlocking a set of powerful new features for the organization.
Organizational memory: AI assistants could create a living map of who made what commitment, when, and why, so that context is never lost as teams grow and change.
Relationship monitoring: With visibility into all sales and client-facing conversations, AI tools could automatically flag when a client relationship has gone quiet for too long, before it cools entirely.
Decision oversight: AI could provide real-time alerts when the team is about to repeat a past mistake, enabling better, more informed decisions moving forward.
Competitive intelligence: With an understanding of the specific challenges and decisions the company is facing, AI assistants could provide topical, context-specific daily briefings on industry news, patent filings, and competitor activity.
Institutional knowledge retention: When someone leaves the company, their knowledge doesn't have to leave with them. Where AI assistants have been quietly documenting correspondence and project status, institutional knowledge stays institutional, and onboarding new employees becomes easier.
For the Chief of Staff, this future means she can actually think strategically, because the exhausting work of maintaining organizational memory is no longer hers alone to carry. For all workers, it means spending less time searching for context and more time doing the work that actually moves the business forward.
How ElastixAI is Making More Tokens a Reality
At ElastixAI, we are tackling the high cost of tokens through innovative software-ML-hardware co-design. Rather than relying on general-purpose, fixed architectures like GPUs, our solution uses reconfigurable hardware that we optimize for the specific model and application at hand.
The process begins with a standard, off-the-shelf LLM provided by the user. The ElastixAI stack then performs the following automated steps:
Proprietary optimization: The system applies advanced post-training optimizations, which are often poorly supported by standard GPUs.
Processor generation: Instead of compiling code to run on a fixed chip, the ElastixAI solution automatically generates a processor design optimized for the specific algorithms and bit widths of that model.
FPGA deployment: The custom design is loaded onto an off-the-shelf FPGA in seconds.
Drop-in execution: The system serves as a backend replacement for the NVIDIA plugin, so users can keep their existing OpenAI API or PyTorch workflows unchanged.
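ElastixAI's post-training optimizations are proprietary and are not described here. As a generic illustration of why bit width matters, the sketch below applies plain symmetric per-tensor quantization, a standard textbook technique and not necessarily one ElastixAI uses, and estimates weight storage for a hypothetical 7-billion-parameter model. Fewer bits per weight means less memory and bandwidth per token, at the cost of a rounding error bounded by half the quantization step; a processor generated for a model's exact bit widths can, in principle, use widths that a fixed chip does not handle natively. The function names, sample weights, and model size are illustrative assumptions.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric range")
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 3 for 3-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


def weight_memory_gb(n_params, bits):
    """Storage for the weights alone, ignoring scales and packing overhead."""
    return n_params * bits / 8 / 1e9


weights = [0.12, -0.50, 0.33, 0.90, -0.07, 0.41]  # illustrative values
for bits in (8, 4, 3):
    q, scale = quantize_symmetric(weights, bits)
    err = max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))
    print(f"{bits}-bit: codes={q}, max error={err:.4f} (bound {scale / 2:.4f})")

N_PARAMS = 7e9  # hypothetical model size
for bits in (16, 8, 4, 3):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(N_PARAMS, bits):.3f} GB")
```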
By combining state-of-the-art FPGAs with proprietary ML optimizations, ElastixAI's solution achieves a 5–50x cost-per-token advantage over standard GPU-based solutions, stemming from significantly lower hardware costs and 80% lower power consumption. For the same power budget, ElastixAI can unlock up to 50x more tokens for the end user, moving the world to a future where the applications described in this paper are finally feasible.
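To connect the claimed advantage back to the always-on assistant example, the sketch below divides the $21.41 daily GPT-4o estimate by a given cost-per-token advantage and checks how much of the workday fits in the $0.60 per-employee budget. It assumes, loosely, that an advantage over GPU-based serving carries over one-for-one to the API prices quoted earlier, which is a simplification; the function name is illustrative.

```python
BASELINE_DAILY_COST = 21.41  # USD per employee per day, GPT-4o estimate from the text
BUDGET_PER_DAY = 0.60        # USD per employee per day, the Copilot comparison
WORKDAY_MIN = 8 * 60


def assistant_coverage(advantage):
    """Daily cost and fraction of the workday affordable at a given cost-per-token advantage."""
    daily_cost = BASELINE_DAILY_COST / advantage
    fraction = min(1.0, BUDGET_PER_DAY / daily_cost)  # cannot cover more than the full day
    return daily_cost, fraction


for advantage in (5, 50, 100):
    cost, frac = assistant_coverage(advantage)
    print(f"{advantage:>3}x: ${cost:.2f}/day, ~{frac * WORKDAY_MIN:.0f} of {WORKDAY_MIN} minutes")
```

Under these assumptions, the low end of the range (5x) stretches coverage from about 13 minutes to about an hour, while 50x or more covers the full workday within budget.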
Unlocking a New World of Possibilities
Where the cost of generating tokens is prohibitive, potentially world-changing applications are either dismissed as unaffordable at scale or restricted to organizations with large infrastructure budgets. But if users had 100x more tokens for the same budget, the question around AI deployment would shift from how many tokens an organization can afford to what AI can do. With ElastixAI’s unique approach to LLM inference, we are actively making this vision a reality.
References
[1] https://academic.oup.com/bib/article/26/1/bbae693/7942355
[2] https://www.biorxiv.org/content/10.1101/2025.07.02.662875v1
[3] https://arxiv.org/abs/2508.03444
[4] https://arxiv.org/abs/2603.20224
[5] https://openrouter.ai/openai/gpt-4o/pricing
[6] https://www.microsoft.com/en-us/microsoft-365-copilot/pricing