Practically tractably intractable
This week: Braintrust evals, inference scaling, tractable agreement protocols, and an overlooked GenAI use case
Custom LLM As A Judge
Braintrust evals are composed of three components: a prompt, scorers, and examples
“In all cases, you should strive to evaluate your results, so you can rigorously assess the impact of each change.”
https://cookbook.openai.com/examples/custom-llm-as-a-judge
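For a flavor of what the cookbook is pointing at, here is a minimal sketch of a custom LLM-as-a-judge scorer: a grading prompt, a scorer that maps the model's grade onto a 0–1 scale, and a couple of hand-written examples. The model name, rubric, and parsing below are placeholders of mine, not the cookbook's code.

```python
from openai import OpenAI  # assumes the openai Python SDK (>= 1.0) and an API key

client = OpenAI()

JUDGE_PROMPT = """You are grading an answer to a question.
Question: {question}
Answer: {answer}
Reply with a single integer from 1 (wrong) to 5 (excellent)."""

def llm_judge_score(question: str, answer: str) -> float:
    """A custom LLM-as-a-judge scorer that returns a score in [0, 1]."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model; use whatever judge you trust
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    grade = int(response.choices[0].message.content.strip())  # assumes a clean reply
    return (grade - 1) / 4  # map the 1-5 grade onto 0-1

# Examples: a few hand-labeled cases to sanity-check the judge itself.
for question, answer in [("What is 2 + 2?", "4"), ("What is 2 + 2?", "5")]:
    print(f"{answer!r} -> {llm_judge_score(question, answer):.2f}")
```

In a Braintrust eval, a function like this would slot in alongside the built-in scorers, and the hand-labeled examples let you rigorously assess whether the judge itself drifts as you change the prompt.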
Inference Scaling for Long-Context Retrieval Augmented Generation
More than 4000 tokens
“By systematically studying the performance with different inference configurations, we demonstrate that RAG performance improves almost linearly with the increasing order of magnitude of the test-time compute under optimal inference parameters. Based on our observations, we derive inference scaling laws for RAG and the corresponding computation allocation model, designed to predict RAG performance on varying hyperparameters. Through extensive experiments, we show that optimal configurations can be accurately estimated and align closely with the experimental results. These insights provide a strong foundation for future research in optimizing inference strategies for long-context RAG.”
Yue, Z., Zhuang, H., Bai, A., Hui, K., Jagerman, R., Zeng, H., ... & Bendersky, M. (2024). Inference Scaling for Long-Context Retrieval Augmented Generation. arXiv preprint arXiv:2410.04343.
https://arxiv.org/pdf/2410.04343
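The headline finding, performance roughly linear in the order of magnitude of test-time compute, is easy to picture with a toy fit. The numbers below are invented for illustration; only the functional form comes from the paper.

```python
import numpy as np

# Invented numbers, for illustration only: accuracy at increasing
# effective test-time compute (e.g., total retrieved-context tokens).
compute = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
performance = np.array([0.42, 0.51, 0.60, 0.68, 0.77])

# The paper's observation: performance is roughly linear in
# log10(compute) under optimal inference parameters.
slope, intercept = np.polyfit(np.log10(compute), performance, 1)
print(f"fit: performance ~ {slope:.3f} * log10(compute) + {intercept:.3f}")

# A computation allocation model would use a fit like this to choose a
# budget; here we just extrapolate to an untried one.
budget = 1e7
print(f"predicted performance at 1e7 tokens: "
      f"{slope * np.log10(budget) + intercept:.3f}")
```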
Tractable Agreement Protocols
If we decide to hire agents as negotiators, they may end up getting better at changing their own minds before they try to change ours.
“Our work suggests a third approach: We make computationally and statistically tractable calibration assumptions that are strict relaxations of Bayesian rationality, and hence are satisfied by perfect learners, but do not require implausible assumptions. In the case of agreement theorems, we have shown that these tractable calibration conditions were all that was needed from Bayesian rationality, in that we are able to prove (and generalize) agreement theorems that recover the same quantitative bounds that were known under full Bayesian rationality under our weaker assumptions. Is this a more general phenomenon? Perhaps in many other settings in which Bayesian rationality was previously thought to be a necessary modeling assumption, the same results can be obtained under significantly weaker calibration-based assumptions that can be guaranteed by efficient online calibration algorithms of various flavors.”
Collina, N., Goel, S., Gupta, V., & Roth, A. (2024). Tractable Agreement Protocols. arXiv preprint arXiv:2411.19791.
https://arxiv.org/abs/2411.19791
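The agreement theorems themselves are beyond a newsletter snippet, but the underlying dynamic, agents converging once they exchange honest estimates, can be shown with a toy Gaussian example. This is my own illustration of agreement, not the paper's calibration-based protocol.

```python
import numpy as np

# Toy setup (hypothetical, not from the paper): two agents observe noisy
# private signals about a common quantity theta and exchange estimates.
# With Gaussian noise of known variance, one round of exchange suffices.
rng = np.random.default_rng(0)
theta = 0.7                    # the unknown common quantity
var_a, var_b = 0.04, 0.09      # both agents know both noise variances

x_a = theta + rng.normal(0, np.sqrt(var_a))  # Alice's private signal
x_b = theta + rng.normal(0, np.sqrt(var_b))  # Bob's private signal

# Before talking, each agent's estimate is just its own signal.
print(f"Alice alone: {x_a:.3f}, Bob alone: {x_b:.3f}")

# Each announced estimate reveals the underlying signal, so both agents
# can compute the same precision-weighted mean and agree exactly.
w_a, w_b = 1 / var_a, 1 / var_b
consensus = (w_a * x_a + w_b * x_b) / (w_a + w_b)
print(f"After one exchange, both hold: {consensus:.3f}")
```

The paper's contribution is that you do not need this kind of full Bayesian machinery: tractable calibration conditions are enough to recover the same quantitative agreement bounds.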
The Overlooked GenAI Use Case
It’s data analytics
“The results are revealing: relative to the current interest level in the enterprise, GenAI for data analytics and decision support is receiving limited attention. In this category, I include cleaning, processing and analyzing data to answer business questions.”
https://blog.sumble.com/the-overlooked-genai-use-case/
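What does "GenAI for data analytics" look like in practice? One mundane but representative example is using a model to clean messy categorical data before analysis. The sketch below is my own illustration, not from the post; the model name and categories are placeholders.

```python
from openai import OpenAI  # assumes the openai Python SDK and an API key

client = OpenAI()

# Placeholder categories and model, not from the post.
CATEGORIES = ["Data Engineer", "Data Analyst", "Data Scientist", "Other"]

def canonicalize(title: str) -> str:
    """Collapse a messy free-text job title into one canonical category."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{
            "role": "user",
            "content": (f"Map the job title {title!r} to exactly one of "
                        f"{CATEGORIES}. Reply with the category only."),
        }],
    )
    return response.choices[0].message.content.strip()

for raw in ["Sr. DE (platform)", "BI analyst II", "ML person"]:
    print(raw, "->", canonicalize(raw))
```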
Reader Feedback
“What’s the difference between an algorithm envelope and a data silo? No, really though.”
Footnotes
For some, digital transformation meant decision automation. Others appeared surprised by the automation that came with digital. That variance might have allowed for deeper disruption than would otherwise have been possible.
The latest wave of AI is different in part because decision automation is better understood. For some, decisions are promises: personal commitments, categorical pledges to take ownership of a course of action. For others, decisions are symbolic. Symbols have symbolic value. For some, the stakes have never been higher. For others, the value of symbols is merely symbolic: what's the outcome? And for others still, some decisions simply should not be automated. This might force some organizations to confront questions they've been successful in avoiding: which decisions should be automated, and which should not?
Decide to Subscribe
Do my views intrigue you, and do you wish to subscribe to my newsletter? Subscribe here.