From Agent Shopping to Chain-of-Thought Crisis

AI_Distilled #104: What's New in AI This Week

The #1 Newsletter to Master AI Agents — Human in the Loop
Human in the Loop is your weekly newsletter read by 12,000+ professionals. It breaks down the latest news on AI agents, real-world use cases, and enterprise adoption. 100% free. 100% insight.
→ Join 12,000+ AI professionals & stay ahead of the curve. JOIN NOW

This past week, the AI world accelerated at full speed. The tech giants struck major government deals, unveiled powerful new models, and revealed bold infrastructure plans, while researchers joined forces to steer the future of intelligent machines. It's a moment of both high-stakes ambition and fast-paced collaboration, and the momentum isn't slowing down. Let's unpack what happened.

LLM Expert Insights, Packt

In today's issue:

🧠 Expert Deep Dive: Walk through LLM model pruning—magnitude-based, structured, iterative, and post-training techniques—to build leaner, faster agents.
🔧 Function Calling, Simplified: Learn how to wire OpenAI's function calling into your agent systems, enabling contextual action and real-world task execution.
🧱 Agent Protocols in Action: Hands-on workshops this July dive into MCP and A2A—from beginner-friendly coding to orchestration and registry infrastructure.
📊 Sigma's Next-Gen BI: Discover how Sigma is reshaping BI with collaborative data apps and a top debut on Gartner's Magic Quadrant.
⚠️ Transparency Crisis Ahead?: OpenAI, Anthropic, and DeepMind warn that step-by-step AI reasoning ("chain-of-thought") may soon disappear in newer LLMs.
🛡 DoD's $200M Agentic Push: Google, Anthropic, xAI, and OpenAI win Pentagon contracts to advance frontier AI capabilities for national security.
🛍 AWS Agent Store Launches: Amazon introduces an AI Agent Marketplace with Anthropic as launch partner—bringing agent deployment to the cloud masses.
⚡ Meta's Supercluster Reveal: Zuckerberg unveils Prometheus and Hyperion—two giant-scale AI compute hubs powering Meta's AI future.
🧩 GenAI Processors from DeepMind: A new open-source toolkit for building real-time, multimodal agent pipelines—no more glue code.
🐉 China's Kimi K2 Hits 1 Trillion: Moonshot AI releases the world's largest open-source LLM, rivaling GPT-4 and Claude with Mixture-of-Experts and MuonClip training.

📈 UPCOMING EVENTS

Next up, we bring you a hand-picked line-up of workshops and developer meetups on agent protocols like MCP and A2A.

Let's Learn – MCP Events: A Beginner's Guide to MCP
Date: July 9–21, 2025
Location: Virtual
Cost: Free
Focus: Introductory MCP coding (C#, Java, Python, TypeScript)

Conversational & Deep Research Analytics Agents: MCP, A2A & Knowledge Graph
Date: July 18, 2025
Location: Virtual
Cost: Free
Focus: Deep-research LLM agents using MCP and A2A

AI Agent Learning Series with Google – Episode 3: Hierarchical Agents & Orchestration
Date: July 1 – August 21, 2025
Location: Virtual
Cost: Free
Focus: Agent orchestration and hierarchical structures using MCP/A2A
Website: AICamp

MCP Developers Summit
Date: October 2, 2025
Location: In-person – Venue TBA
Cost: TBA
Focus: MCP roadmap, security, observability, and agent registries
Website: MCPDevSummit.ai

Upskilling with MCP and A2A protocols is your gateway to building AI agents. Don't miss the chance to explore these events and get ahead.

DeepSeek is fast becoming the open-source LLM of choice for developers and engineers focused on speed, efficiency, and control.
Join "DeepSeek in Production" summit to see how Experts are fine-tuning DeepSeek for real-world use cases, building agentic workflows, and deploying at scale. Seats are filling fast. Limited slots left. Book now at 50% off. SECURE YOUR SPOT NOW! Apply codeDEEPSEEK50at checkout to avail your 50% off. EXPERT INSIGHTS A step-by-step guide to using OpenAI tools for function calling. A Quick Start Guide to Model Pruning for Large Language Models As large language models (LLMs) continue to grow in size and complexity, optimizing their efficiency without compromising performance has become a keyprimary challenge. One of the most effective methods to address this challenge is model pruning. Ken Huang presents an overview of model pruning techniques in his book LLM Design Patterns. So, let’s take a sneak peek. Understanding Model Pruning Model pruning involves systematically removing parameters from a neural network that contribute the least to its output. These are often weights with the smallest magnitude, low sensitivity, or minimal gradient impact. The primary goal is to reduce model size and computational demands while retaining acceptable accuracy. Here are some of the techniques you could try. You can use PyTorch version 1.7.0 to experiment with these examples. Magnitude-Based Pruning The most straightforward technique is magnitude-based pruning, where weights with the lowest absolute values are removed. This method assumes that smaller weights have less impact on the model's predictions. By pruning these, models are made more compact and faster. import torch import torch.nn.utils.prune as prune # Assume model is a pre-trained LLM model = ... # Prune 30% of the lowest magnitude weights in Linear layers for name, module in model.named_modules(): if isinstance(module, torch.nn.Linear): prune.l1_unstructured(module, name='weight', amount=0.3) prune.remove(module, 'weight') Structured vs. Unstructured PruningTwo pruning paradigms are typically used:Unstructured pruning removes individual weights, resulting in sparse matrices that may be harder to optimize on standard hardware.Structured pruning removes entire neurons, filters, or channels, making the pruned model more compatible with conventional hardware and often yielding better speedups.Structured pruning is more hardware-friendly but may lead to a larger drop in accuracy. # Structured pruning: Remove entire neurons for name, module in model.named_modules(): if isinstance(module, torch.nn.Linear): prune.ln_structured(module, name='weight', amount=0.3, n=2, dim=0) Iterative Pruning Techniques Rather than pruning large portions of the model in a single step, iterative pruning prunes small fractions across multiple training cycles. This gradual reduction enables the model to adapt to the reduced capacity, thus minimizing accuracy degradation. for epoch in range(1, num_epochs + 1): train(model, train_loader, optimizer) if epoch % 10 == 0: for name, module in model.named_modules(): if isinstance(module, torch.nn.Linear): prune.l1_unstructured(module, name='weight', amount=0.1) prune.remove(module, 'weight') validate(model, val_loader) Pruning During Training vs. Post-Training A key decision in pruning strategy is timing: - Pruning during training integrates pruning steps throughout the training process. - Post-training pruning applies pruning after the model is fully trained. 
Balancing Pruning with Performance

The art of pruning lies in striking the right balance. Excessive pruning can harm accuracy, while minimal pruning might offer negligible gains. Fine-tuning with lower learning rates post-pruning is commonly employed to recover lost performance.

# Fine-tune the pruned model at a low learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
for epoch in range(5):
    train(model, train_loader, optimizer)
    validate(model, val_loader)

Combining Pruning with Other Techniques

For enhanced efficiency, pruning is often paired with other compression methods:

- Quantization: After pruning, dynamic quantization can be applied to further reduce model size.

import torch.quantization as quant

quantized_model = quant.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

- Knowledge Distillation: A smaller, pruned student model is trained to replicate the behavior of a larger teacher model.

def distillation_loss(student_outputs, teacher_outputs, temperature):
    # KL divergence between softened teacher and student distributions
    return torch.nn.KLDivLoss(reduction='batchmean')(
        torch.nn.functional.log_softmax(student_outputs / temperature, dim=-1),
        torch.nn.functional.softmax(teacher_outputs / temperature, dim=-1)
    )

# Training loop for the student model
for batch in train_loader:
    inputs, _ = batch
    optimizer.zero_grad()
    with torch.no_grad():
        teacher_outputs = teacher_model(inputs)
    student_outputs = student_model(inputs)
    loss = distillation_loss(student_outputs, teacher_outputs, temperature=2.0)
    loss.backward()
    optimizer.step()

Model pruning offers a robust path to optimize LLMs, with techniques ranging from basic magnitude-based methods to advanced combinations with distillation and quantization. Each method presents trade-offs between performance, complexity, and hardware compatibility. For practitioners looking to design efficient LLMs, pruning provides a versatile toolkit that can be tailored to specific constraints. A quick sanity check for any of these recipes is to measure how sparse the model actually became, as sketched below.
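Before shipping a pruned (and possibly quantized) model, it is worth verifying the fraction of weights that were actually removed. A minimal sketch, assuming the model from the examples above; the global_sparsity helper is our own illustration, not from the book:

# Fraction of exactly-zero weights across all Linear layers
def global_sparsity(model):
    zeros, total = 0, 0
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            zeros += (module.weight == 0).sum().item()
            total += module.weight.numel()
    return zeros / total

print(f"Global sparsity: {global_sparsity(model):.1%}")

This works because after prune.remove, pruned entries are stored as literal zeros in module.weight, which is exactly what the check counts.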
Liked the Insights? Want to dig in deeper?

A Practical Guide to Building Robust and Efficient AI Systems
- Learn comprehensive LLM development, including data prep, training pipelines, and optimization
- Explore advanced prompting techniques, such as chain-of-thought, tree-of-thought, RAG, and AI agents
- Implement evaluation metrics, interpretability, and bias detection for fair, reliable models
BUY NOW

📈 LATEST DEVELOPMENT

Here is the news of the week.

AI labs unite to warn of lost "chain-of-thought" visibility

In an unusual collaboration, over 40 researchers from OpenAI, Google DeepMind, Anthropic, and other top AI labs published a joint warning about the fragility of AI chain-of-thought transparency. Modern advanced LLMs have shown an ability to think out loud by producing step-by-step reasoning in plain English before final answers. This interim reasoning can reveal a model's true intentions or potential mistakes, offering a valuable chance to monitor and intervene if the AI is heading down a harmful path. However, the researchers caution that as AI models evolve, this transparency could disappear; future models might learn to perform reasoning internally or in indecipherable ways, closing a critical window for safety oversight. The paper urges AI developers to prioritize methods for evaluating and preserving chain-of-thought visibility, calling it a brief and fragile opportunity to align AI behavior before more opaque systems arrive. Read the paper.

Pentagon taps Google, Anthropic, xAI in $200M AI push

Reuters reports that the U.S. Department of Defense (DoD) awarded contracts (with a $200 million ceiling each) to Anthropic, Google, OpenAI, and Elon Musk's xAI to accelerate "frontier" AI capabilities for national security. These partnerships will help the DoD develop agentic AI workflows across a range of missions. The Pentagon's initiative underscores the government's urgency in tapping top AI labs for defense innovation. Read more.

In a related development, Elon Musk's xAI has officially launched Grok for Government, a tailored suite of frontier AI tools (including Grok 4, Deep Search, and Tool Use). Now available via the GSA schedule, it supports federal, state, local, and national security agencies under a $200 million DoD ceiling contract. Read more here.

AWS debuts AI agent marketplace with Anthropic partnership

At AWS Summit New York, AWS unveiled a new AI Agent Marketplace in collaboration with Anthropic. This platform will serve as a one-stop shop where startups can sell AI agents directly to AWS's enterprise customers. Businesses will be able to browse and install third-party AI agents suited to their needs from a single catalog. Anthropic, already an Amazon-invested company, is a key launch partner, which could broaden its reach to more customers via AWS. Amazon will take a small revenue share while enabling an ecosystem of AI agents, much like an app store. Read more.

Meta building multi-gigawatt Prometheus and Hyperion AI superclusters

Meta CEO Mark Zuckerberg revealed plans for unprecedented AI infrastructure, announcing that Meta is constructing multiple multi-GW AI supercomputers. The Hyperion cluster in Louisiana will scale up to 5 gigawatts of power, with a footprint large enough to cover most of Manhattan. In addition, a 1 GW supercluster named Prometheus is slated to come online in 2026 in Ohio. Together, these AI centers will provide Meta with enormous computational capacity to train and serve advanced AI models, positioning it to better compete with peers like OpenAI and Google DeepMind. View the announcement.

Google DeepMind open-sources GenAI Processors for agent pipelines

Google DeepMind introduced GenAI Processors, a new open-source Python library to simplify building complex AI workflows for LLM-powered applications. The toolkit defines a standardized Processor interface for handling all stages of an AI pipeline, from input ingestion and pre-processing to model inference calls and output handling. Developers can chain or parallelize these modular processors to create asynchronous, composable AI pipelines. Notably, GenAI Processors integrates with Google's Gemini APIs and supports multimodal data streams (text, images, audio, PDFs) in a unified framework. By open-sourcing this library, Google aims to help developers build real-time AI agents and data-processing workflows more reliably and with less custom glue code. Read more.

Chinese 1-trillion-parameter Kimi K2 model challenges GPT-4, DeepSeek

Beijing-based startup Moonshot AI has unveiled Kimi K2, a large language model with a staggering 1 trillion parameters, released as open source.
Kimi K2 now ranks as one of the world's most powerful LLMs, reportedly matching the performance of top proprietary models like OpenAI's GPT-4 and Anthropic's Claude on complex tasks. It excels at coding benchmarks, essentially rivaling or outperforming Anthropic's best Claude model in that domain. The model was trained using a novel "MuonClip" optimization technique that prevented the training instabilities that often plague ultra-large models, potentially saving millions in compute costs. Observers have compared Kimi K2's architecture to DeepSeek V3 – the 671-billion-parameter model behind the famed DeepSeek-R1 assistant – noting that Kimi K2 similarly uses Mixture-of-Experts layers to boost capability. The launch of Kimi K2 highlights the rapid progress of China's open-source AI efforts. (Earlier this month, Baidu open-sourced its ERNIE 4.5 model (424B parameters), which reportedly beat DeepSeek V3 on 22 of 28 benchmarks despite being much smaller.) Read more here.

Built something cool? Tell us.

Whether it's a scrappy prototype or a production-grade agent, we want to hear how you're putting generative AI to work. Drop us your story at nimishad@packtpub.com or reply to this email, and you could get featured in an upcoming issue of AI_Distilled.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply back to this email. Thanks for reading and have a great day!

That's a wrap for this week's edition of AI_Distilled 🧠⚙️
We would love to know what you thought—your feedback helps us keep leveling up.
👉 Drop your rating here

Thanks for reading,
The AI_Distilled Team
(Curated by humans. Powered by curiosity.)