By Christopher Berry in newsletter — Nov 26, 2024

Agents, Agents, Agents

This week: LangChain, Taxonomy of Agents, GAI, GUI Agents

LangChain State of Agents Report

Humans in the loop

“Despite the excitement that folks have projected onto agents, most are taking a more conservative approach when it comes to how far we’ll let agents go off the leash. Very few respondents allow their agent to read, write, and delete freely. Instead, most teams allow either read-only tool permissions or require human approval for more significant actions, such as writing or deleting.”
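To make the permission pattern in that quote concrete, here is a minimal sketch of a human-approval gate. It is my own illustration, not anything from the LangChain report: the names `Action`, `NEEDS_APPROVAL`, and `run_tool` are hypothetical, and the policy (reads run freely, writes and deletes wait for a human) is an assumption drawn from the quote.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


# Hypothetical policy: reads run freely, writes and deletes need a human.
NEEDS_APPROVAL = {Action.WRITE, Action.DELETE}


def run_tool(
    action: Action,
    tool: Callable[[], str],
    approve: Callable[[Action], bool],
) -> str:
    """Run tool() only if the policy allows it or a human approves."""
    if action in NEEDS_APPROVAL and not approve(action):
        return f"blocked: {action.value} requires human approval"
    return tool()
```

In practice `approve` would be a prompt to a person (a chat message, a ticket, a button); here it is just a callable so the gate can be exercised without one.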

“The inherent unpredictability of agents using LLMs to control workflows introduces more room for error, making it tough for teams to ensure that their agent consistently provides accurate, contextually-appropriate responses.”

https://www.langchain.com/stateofaiagents

A Taxonomy of AgentOps for Enabling Observability of Foundation Model based Agents

Introducing... AgentOps

“LLM-based agents are receiving increasing attention across various fields, but the reliability of their outputs poses significant challenges for developers. The introduction of the AgentOps platform is designed to streamline the agent development process, making it more efficient, observable, and reliable.”

Dong, L., Lu, Q., & Zhu, L. (2024). A Taxonomy of AgentOps for Enabling Observability of Foundation Model based Agents. arXiv preprint arXiv:2411.05285.

https://arxiv.org/abs/2411.05285v1

Generative AI Is Still Just a Prediction Machine

Good judgement

“The proliferation of generative AI tools poses serious questions for managers, such as: What tasks can be done by AI, what will humans still need to do, and what are the sustainable sources of competitive advantage as AI continues to improve?”

“While AI excels at making predictions and generating content, it cannot replace human judgment.”

“Organizations with good judgment will thrive as AI diffuses.”

“While generative AI tools may appear different than predictions of fraud, failures of generative AI initiatives are almost always failures of judgment.”

“Strategy for the effective use of generative AI requires understanding that the words and images generated are predicted from underlying data. It also requires judgment: understanding the relative cost of different types of prediction errors (“the stakes”). In considering an off-the-shelf generative AI model, it is essential to consider whether the data and judgment underlying the AI system are appropriate for your business.”
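One way to read "the stakes" is as a comparison of expected costs. The sketch below is my own illustration, assuming a fraud-style decision where the model supplies a probability and a human supplies the cost of each kind of error. The function name `should_flag` and its parameters are hypothetical, not something from the HBR article.

```python
def should_flag(
    p: float,
    cost_false_positive: float,
    cost_false_negative: float,
) -> bool:
    """Flag a case when flagging has the lower expected cost.

    p is the predicted probability that the case is bad (the prediction).
    The two costs are supplied by a person (the judgment).
    """
    # If we ignore the case and it is bad, we pay the false-negative cost.
    expected_cost_if_ignored = p * cost_false_negative
    # If we flag the case and it is fine, we pay the false-positive cost.
    expected_cost_if_flagged = (1 - p) * cost_false_positive
    return expected_cost_if_flagged < expected_cost_if_ignored
```

The same 10% prediction leads to opposite decisions depending on the costs: with a false positive costing 5 and a false negative costing 100, flagging wins (4.5 versus 10); swap the costs and ignoring wins (0.5 versus 90).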

https://hbr.org/2024/11/generative-ai-is-still-just-a-prediction-machine

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Planning, Action, Critic

“To isolate specific aspects of the model’s capability, we evaluate the performance of API-based GUI automation models rigorously across three dimensions:

• Planning: Assessing the model’s ability to generate an executable plan from the user’s query. The plan should have a correct flow, allowing the overall successful operations of the software, with each step being clear and executable.

• Action: Evaluating whether the model can accurately ground the interactable GUI elements and execute the action step-by-step from the derived plan.

• Critic: Measuring the model’s awareness of the changing environment, including its ability to adapt to the outcomes of its actions, such as retrying tasks if unsuccessful or terminating execution when the task is completed.”
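The three dimensions map naturally onto a control loop. Here is a minimal sketch of that loop, assuming the planner, the actor, and the critic are each a plain callable. This is my own illustration and not the paper's evaluation code: `run_agent`, its parameters, and the retry limit are all hypothetical.

```python
from typing import Callable, List


def run_agent(
    query: str,
    plan: Callable[[str], List[str]],
    act: Callable[[str], str],
    critic: Callable[[str, str], bool],
    max_retries: int = 2,
) -> List[str]:
    """Plan once, then act on each step, retrying when the critic objects."""
    log: List[str] = []
    for step in plan(query):  # Planning: turn the query into steps
        for _ in range(max_retries + 1):
            outcome = act(step)  # Action: execute one step
            if critic(step, outcome):  # Critic: did the step succeed?
                log.append(f"{step}: ok")
                break
        else:
            # Retries exhausted: stop instead of acting on a bad state.
            log.append(f"{step}: failed")
            return log
    log.append("done")  # Critic's other job: terminate when complete
    return log
```

A real GUI agent would replace `act` with screen grounding and clicks, and `critic` with a check on the new screenshot; the loop structure is the part worth noticing.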

Hu, S., Ouyang, M., Gao, D., & Shou, M. Z. (2024). The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use. arXiv preprint arXiv:2411.10323.

https://arxiv.org/pdf/2411.10323

Reader Feedback

“Are the foundation models getting better, slower?”

Footnotes

There are several challenges with agents, even in low-stakes contexts, that appear to reduce to classic Operations Research (OR) problems. When I listen to Agrawal, Gans, and Goldfarb, I hear a nudge back toward them.

One of the choices we’ll have to make is how much human to keep in the loop.

The co-creators will have a proportionate impact on how that balance turns out.

CTA

Was this forwarded to you? Do my views intrigue you, and do you wish to subscribe to my newsletter? Subscribe here.