ZenML

Modernizing DevOps with Generative AI: Challenges and Best Practices in Production

Various (Bundesliga, Harness, Trice) 2025

A panel of experts from various organizations discusses the current state and challenges of integrating generative AI into DevOps workflows and production environments. The discussion covers how companies are balancing productivity gains with security concerns, the importance of having proper testing and evaluation frameworks, and strategies for successful adoption of AI tools in production DevOps processes while maintaining code quality and security.

Overview

This InfoQ Live roundtable brings together practitioners from diverse backgrounds to discuss the practical realities of integrating generative AI into DevOps workflows. The panelists include Christian Bonzelet (AWS Solutions Architect at Bundesliga), Jessica Andersson (Cloud Architect at Trice and CNCF Ambassador), Garima Bajpai (DevOps leader and author), and Shobhit Verma (Engineering Manager at Harness, leading AI agent development). The discussion provides a balanced, practitioner-focused perspective on where generative AI currently delivers value in production environments and where significant challenges remain.

Current State of GenAI in DevOps

The panelists emphasize that generative AI has evolved beyond simple code generation chatbots to become more deeply integrated into the software development lifecycle. Bonzelet observes that the transformation extends beyond code generation to areas where individual developers may lack expertise, such as documentation, unit tests, and architectural scaffolding. Andersson highlights that the integration of GenAI tools directly into IDEs has been a critical inflection point for adoption, enabling a more seamless back-and-forth workflow rather than context-switching to external tools.

From a production perspective, Verma notes that Harness is actively building AI agents that compete with GitHub Copilot, designed to help developers write higher-quality code faster within their preferred IDE. This represents a concrete example of LLMOps in practice, where the challenge is not just building AI capabilities but integrating them into existing developer workflows in a way that adds measurable value.

Practical Use Cases in Production

The discussion identifies several concrete use cases where generative AI is being deployed in production DevOps contexts, echoing the areas raised earlier: documentation generation, unit-test creation, architectural scaffolding, and IDE-integrated code assistance.

Challenges and Limitations

The panelists provide a refreshingly honest assessment of current limitations. Bonzelet identifies two primary challenges: organizational concerns around compliance and source code protection, and the fundamental tension between engineering’s preference for deterministic behavior and AI’s inherently non-deterministic outputs. The same prompt can yield different results on different days as model context and weights evolve.
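This non-determinism can be surfaced mechanically. The sketch below, with a hypothetical `generate` function standing in for a real model call (here it simulates run-to-run drift), hashes repeated outputs for the same prompt so a pipeline can flag artifacts that cannot be reproduced byte-for-byte:

```python
import hashlib

def generate(prompt: str, run: int) -> str:
    """Stand-in for a real model call; a production check would hit the
    actual inference endpoint. Here we simulate run-to-run drift."""
    return f"def add(a, b):\n    return a + b  # variant {run % 2}"

def is_deterministic(prompt: str, runs: int = 5) -> bool:
    """Flag prompts whose outputs differ across repeated calls, so
    reviewers know the artifact cannot be reproduced exactly."""
    digests = {hashlib.sha256(generate(prompt, i).encode()).hexdigest()
               for i in range(runs)}
    return len(digests) == 1

# With the simulated drift above, the same prompt yields two variants:
assert not is_deterministic("write an add function")
```

A real pipeline would replace the stub with the team's model endpoint and log divergent digests rather than asserting.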

Andersson raises the often-overlooked issue of confidentiality concerns—organizations remain hesitant about how much of their proprietary code they’re willing to send to external AI services. This creates friction in adoption, particularly for enterprises with strict data governance requirements.

Bajpai cites the 2023 DORA report's finding that heavy reliance on AI can actually hurt developer productivity and delivery performance as evidence that systematic thinking about onboarding is essential. This serves as an important counterweight to vendor marketing claims, highlighting that careless or unstructured AI adoption can be counterproductive.

The “Human in the Lead” Principle

A recurring theme is the importance of maintaining human oversight and judgment. Bonzelet describes his organization’s framing as “human in the lead” rather than “human in the loop,” emphasizing that developers should be making decisions rather than simply reviewing AI outputs. This distinction has practical implications for how AI tools are integrated into workflows.

Verma offers a memorable analogy comparing AI management to startup hiring: “You don’t hire if you cannot manage.” The same applies to AI—developers should only delegate tasks to AI that they can effectively evaluate and verify. If you’re asking AI to generate code, you need the expertise to test and validate that code.
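Verma's rule can be made concrete as an acceptance gate: the human writes the tests first, and an AI-generated candidate is merged only if it passes them. A minimal sketch, with hypothetical candidate functions standing in for AI-generated patches:

```python
def acceptance_tests(fn) -> bool:
    """The human-authored spec: the reviewer must be able to write these
    before delegating the implementation to an AI."""
    cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
    return all(fn(*args) == expected for args, expected in cases)

# Two candidates, standing in for AI-generated patches.
ai_candidate_good = lambda a, b: a + b
ai_candidate_bad = lambda a, b: a - b   # plausible-looking but wrong

def accept(candidate) -> bool:
    """Gate: merge only code the human can actually verify."""
    try:
        return acceptance_tests(candidate)
    except Exception:
        return False

assert accept(ai_candidate_good)
assert not accept(ai_candidate_bad)
```

The point is the ordering: if you cannot write `acceptance_tests` yourself, you are not in a position to delegate the task.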

Adoption Strategies and Best Practices

The panelists offer practical guidance for teams looking to adopt generative AI in their DevOps workflows:

Start with IDE-integrated tools: Andersson recommends beginning with chat and code assistants integrated into development environments. This represents the most mature category of tools and offers the clearest path to productivity gains.

Experiment with foundation models directly: Verma suggests that developers should spend time working directly with foundation models (OpenAI, Anthropic, etc.) rather than only through product interfaces. This builds intuition about what’s genuinely possible versus what’s marketing, making it easier to evaluate tools.

Accept the learning curve: Bonzelet emphasizes that organizations need to protect engineers during the initial adoption period, recognizing that productivity may decrease before it increases as teams learn effective prompting and workflow integration.

Focus on specific problems: Rather than broad adoption, pick specific challenges where AI can add value and iterate from there. Document both successes and failures to build organizational knowledge.

Create safe paths for experimentation: Organizations need to acknowledge that developers will use AI tools regardless of policy, so creating secure, sanctioned paths is preferable to prohibition.
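One concrete form such a sanctioned path can take is a guardrail that screens snippets for secret-like material before they leave the organization. The sketch below uses a small, hypothetical deny-list; a real proxy would reuse the org's existing secret scanner and pattern rules:

```python
import re

# Hypothetical deny-list for illustration; a real guardrail would use the
# organization's secret scanner, tuned to its providers and key formats.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                   # AWS access key id shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)(api[_-]?key|password)\s*=\s*\S+"),
]

def safe_to_send(snippet: str) -> bool:
    """Return True only if the snippet matches no known secret pattern.
    A sanctioned AI proxy would call this before forwarding code."""
    return not any(p.search(snippet) for p in SECRET_PATTERNS)

assert safe_to_send("def handler(event): return event['id']")
assert not safe_to_send("key = 'AKIAABCDEFGHIJKLMNOP'")
```

Blocking at the proxy keeps the tool usable while making the secure path the path of least resistance.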

Metrics and Measurement Challenges

The panelists express frustration with the current state of metrics around AI-assisted development. Bonzelet specifically calls for moving beyond “X% of companies use generative AI” statistics to understanding actual impact on metrics like lead time, mean time to repair, and deployment frequency.
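The metrics Bonzelet names are straightforward to compute once deployment events are captured. A minimal sketch over a toy log (hard-coded timestamps stand in for data from the VCS and CD system):

```python
from datetime import datetime, timedelta
from statistics import mean

# Toy deployment log: (commit_time, deploy_time) pairs. In practice these
# come from the version-control and delivery systems, not literals.
deploys = [
    (datetime(2025, 1, 6, 9), datetime(2025, 1, 6, 15)),
    (datetime(2025, 1, 7, 10), datetime(2025, 1, 8, 10)),
    (datetime(2025, 1, 9, 8), datetime(2025, 1, 9, 20)),
]

def lead_time_hours(log) -> float:
    """Mean commit-to-deploy latency, one of the DORA four keys."""
    return mean((d - c) / timedelta(hours=1) for c, d in log)

def deploy_frequency_per_week(log, weeks: float = 1.0) -> float:
    """Deployments per week over the observed window."""
    return len(log) / weeks

assert round(lead_time_hours(deploys), 1) == 14.0   # (6 + 24 + 12) / 3
assert deploy_frequency_per_week(deploys) == 3.0
```

Tracking whether these numbers actually move after AI adoption is the kind of evidence the panel finds missing from "X% of companies use generative AI" statistics.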

Verma notes that traditional metrics may become misleading in an AI-augmented world. If developers complete coding tasks faster but the proportion of time spent on “toil” activities appears to increase, that may simply reflect accelerated feature delivery rather than increased burden. New frameworks for understanding productivity in AI-augmented environments are needed.

Future Considerations

The discussion touches on emerging considerations for LLMOps in the medium term. Verma speculates that organizations may eventually need to optimize for “AI readability” and “AI maintainability” alongside human-readable code, creating scaffolding that helps AI understand and work with codebases effectively.

Bajpai notes that the space remains dominated by a few tech giants, creating competitive dynamics that organizations must navigate. The role of open source in AI development remains unsettled, with ongoing debates about open-source AI definitions and concepts like “fair source” and “ethical source.”

New roles are emerging to address AI governance, including AI moderators, machine learning engineers focused on developer tools, and compliance engineers specializing in AI systems. These roles represent organizational infrastructure needed for sustainable AI adoption.

Platform Engineering Implications

Andersson brings a platform engineering perspective to the discussion, noting that organizations should apply the same principles to AI enablement that they apply to other developer tools: make it easy to do the right thing. Rules and policies alone are insufficient; organizations need to build guardrails and enablement that guide developers toward secure, compliant AI usage without friction.

The discussion acknowledges that the platform engineering approach to generative AI is not yet a solved problem, representing an area of active innovation that will likely see significant development in the coming years.

Balanced Assessment

While the panelists are generally optimistic about generative AI’s potential, they maintain appropriate skepticism about vendor claims and acknowledge that adoption requires careful thought rather than wholesale embrace. The consensus is that generative AI tools are production-ready when used with appropriate human oversight and integrated into mature CI/CD practices, but that organizations still have significant work to do in understanding how to measure value and structure workflows around these new capabilities.
