Pinterest implemented GitHub Copilot for AI-assisted development across their engineering organization, focusing on balancing developer productivity with security and compliance concerns. Through a comprehensive trial with 200 developers and cross-functional collaboration, they successfully scaled the solution to general availability in less than 6 months, achieving 35% adoption among their developer population while maintaining robust security measures and positive developer sentiment.
Pinterest, the visual discovery platform with over 300 billion ideas indexed, embarked on a journey to enable AI-assisted development for their engineering teams. This case study documents their approach to evaluating, piloting, and rolling out GitHub Copilot across their organization while carefully managing the security, legal, and operational risks inherent in deploying LLM-powered tools in an enterprise development environment.
The initiative arose from organic developer demand—engineers were already using AI-assisted development tools for personal projects and were eager to bring these capabilities into their professional work. However, like many enterprises, Pinterest had initially prohibited LLM usage until they could properly assess the implications. This case study represents a methodical approach to enterprise LLM adoption that balances innovation with risk management.
One of the first strategic decisions Pinterest made was whether to build their own AI-assisted development solution or purchase a vendor solution. Despite possessing substantial in-house AI expertise (Pinterest builds many of their own developer tools and runs sophisticated ML systems for their core product), they determined that building from scratch was not essential to their core business. This is a noteworthy decision point for enterprises considering LLM adoption—the recognition that leveraging existing vendor solutions can accelerate time-to-value.
Pinterest chose GitHub Copilot specifically based on several criteria: its feature set, the robustness of the underlying LLM, and importantly, its fit with their existing tooling ecosystem. The breadth of IDE support (both VS Code and JetBrains IDEs were mentioned as being used by their developers) was cited as a factor that accelerated adoption.
Pinterest’s approach to the trial program demonstrates several LLMOps best practices for enterprise evaluation. Rather than running a small trial of fewer than 30 people over a few weeks (which they note many companies do), Pinterest opted for a larger trial of roughly 200 developers run over a longer period.
The rationale behind this design was multifaceted. The larger cohort allowed them to include developers across various “personas”—likely meaning different specializations, experience levels, or working contexts. The longer duration helped control for the “novelty effect” and other measurement issues, providing more reliable data about sustained productivity impact rather than just initial enthusiasm.
An important cultural aspect was also mentioned: even if the evaluation led them in a different direction, they wanted to give developers the opportunity to try something cutting edge and include them in the journey. This speaks to change management practices that can ease enterprise LLM adoption.
Pinterest leveraged their existing frameworks for measuring engineering productivity, applying them specifically to the Copilot trial. Their evaluation combined both qualitative and quantitative approaches:
Qualitative Measurement: They collected weekly sentiment feedback through a short Slack bot-based survey. The choice of Slack over email was deliberate—they had previously observed higher completion rates with Slack-based surveys and wanted to meet developers where they spend time while reducing friction. The NPS (Net Promoter Score) approach gave them a consistent metric to track over time. Early results showed an NPS of 75, which is considered excellent, and scores improved as the trial continued.
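The NPS metric mentioned above is straightforward to compute from raw survey responses. A minimal sketch, assuming the standard 0–10 scale where 9–10 are promoters and 0–6 are detractors (the example scores are hypothetical, not Pinterest's data):

```python
def nps(scores: list[int]) -> float:
    """Compute Net Promoter Score from 0-10 survey responses.

    NPS is the percentage of promoters (scores 9-10) minus the
    percentage of detractors (scores 0-6), ranging from -100 to 100.
    """
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# Example: one weekly batch of responses collected via a Slack survey bot
weekly_scores = [10, 9, 9, 8, 10, 7, 9, 10, 9, 6]
print(nps(weekly_scores))  # 7 promoters, 1 detractor, 2 passives -> 60.0
```

Tracking this single number week over week, as Pinterest did, makes it easy to spot whether sentiment is holding up once the novelty wears off.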
User feedback highlighted specific value propositions of AI-assisted development. Comments included observations that Copilot suggestions improved over time based on work context, and that the tool was particularly valuable when working with unfamiliar languages (Scala was mentioned as an example), allowing developers familiar with general programming concepts to let Copilot handle syntax details while still understanding the suggestions.
Quantitative Measurement: Their approach compared relative change over time for the trial cohort versus a control group from before the Copilot trial. Running the trial for longer than a few weeks helped isolate external temporal influences like holidays. However, the article does not specify what quantitative metrics were actually measured or share detailed results—a commenter on the original post asked about this, suggesting the specifics were not disclosed.
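The cohort-versus-control comparison described above amounts to a difference-in-differences estimate: compare each group's relative change over time, then subtract. A minimal sketch with hypothetical numbers (the metric and all figures are illustrative; the article does not disclose Pinterest's actual metrics):

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change of a metric between two periods."""
    return 100.0 * (after - before) / before

# Hypothetical weekly medians of some productivity metric (e.g. merged
# PRs per developer). The control figures come from a pre-trial period,
# so seasonal effects like holidays are shared across both cohorts.
trial_before, trial_after = 4.0, 4.6
control_before, control_after = 4.1, 4.2

trial_delta = relative_change(trial_before, trial_after)        # +15.0%
control_delta = relative_change(control_before, control_after)  # ~+2.4%

# The change attributable to the tool rather than to external
# temporal factors.
effect = trial_delta - control_delta
print(f"estimated effect: {effect:.1f} percentage points")
```

Running the comparison over a long window, as Pinterest did, reduces the risk that a single anomalous week dominates either delta.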
Pinterest’s approach to security and legal compliance demonstrates the cross-functional coordination required for enterprise LLM deployment:
Legal Review: They worked closely with their legal team to ensure usage adhered to all relevant licensing terms and regulations. While specific concerns aren’t detailed, common considerations in this space include intellectual property issues around code generated by LLMs trained on open-source repositories.
Security Assessment: The security team conducted a thorough assessment of the security implications, addressing two key concerns around the use of AI-generated code.
Vulnerability Scanning: A notable security practice was the continuous auditing of code using vulnerability scanning tools. Importantly, they scanned code from both Copilot participants and non-participants, allowing them to compare whether AI-assisted development introduced more vulnerabilities. This comprehensive approach enabled them to monitor for potential degradation of their security posture due to AI-generated code.
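The cross-cohort comparison described above boils down to normalizing scanner findings by the volume of code scanned, then comparing rates between participants and non-participants. A minimal sketch with hypothetical scanner totals (names and numbers are illustrative, not Pinterest's data):

```python
def vuln_rate(findings: int, loc_scanned: int) -> float:
    """Vulnerability findings per 1,000 lines of scanned code."""
    return 1000.0 * findings / loc_scanned

# Hypothetical aggregated scanner output per cohort.
copilot_cohort = {"findings": 18, "loc_scanned": 240_000}
control_cohort = {"findings": 15, "loc_scanned": 210_000}

copilot_rate = vuln_rate(**copilot_cohort)
control_rate = vuln_rate(**control_cohort)

# A materially higher rate in the Copilot cohort would flag a
# degradation of security posture worth investigating.
print(f"copilot: {copilot_rate:.3f}, control: {control_rate:.3f} per kLOC")
```

Normalizing by lines scanned matters because a cohort that ships more code will also accumulate more raw findings, even at an identical defect rate.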
Based on positive trial results, Pinterest made the decision to expand Copilot access to all of engineering. The timing was strategic—they did this in advance of their annual “Makeathon” (a hackathon-style event), which had an AI focus that year.
To drive adoption post-GA, Pinterest implemented several operational improvements.
The headline quantitative outcome reported was a 35% adoption rate across the developer population, reached within six months of the trial's start.
Pinterest framed the 35% adoption rate in terms of the Technology Adoption Lifecycle, noting they had moved well into the “early majority” phase. This provides useful context for understanding where they were in the adoption curve at the time of publishing.
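The Technology Adoption Lifecycle framing above maps cleanly onto the cumulative thresholds of Rogers' diffusion-of-innovations model: innovators (first 2.5%), early adopters (through 16%), early majority (through 50%), and so on. A small sketch of that mapping (the thresholds are from the standard model, not from the article):

```python
# Cumulative adoption thresholds from Rogers' diffusion-of-innovations model.
PHASES = [
    (2.5, "innovators"),
    (16.0, "early adopters"),
    (50.0, "early majority"),
    (84.0, "late majority"),
    (100.0, "laggards"),
]

def adoption_phase(adoption_pct: float) -> str:
    """Map a cumulative adoption percentage to its lifecycle phase."""
    for threshold, phase in PHASES:
        if adoption_pct <= threshold:
            return phase
    raise ValueError("adoption percentage must be <= 100")

print(adoption_phase(35.0))  # 35% cumulative adoption -> "early majority"
```

At 35%, Pinterest sat well past the 16% early-adopter boundary but short of the 50% midpoint, which is consistent with their "well into the early majority" characterization.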
Pinterest outlined plans for continued improvement and evolution of their AI-assisted development program.
While the case study presents a methodical and thoughtful approach to enterprise LLM adoption, there are some limitations to note, most significantly that the quantitative productivity metrics and their detailed results were not disclosed.
That said, the case study offers valuable insights into enterprise considerations for deploying LLM-powered developer tools, including the importance of cross-functional collaboration, extended trial periods, hybrid evaluation approaches, and ongoing security monitoring. The emphasis on meeting developers where they are (Slack surveys, IDE integration breadth) and including them in the journey reflects mature change management practices for technology adoption.