ZenML

Leveraging RAG and LLMs for ESG Data Intelligence Platform

ESGPedia 2024

ESGPedia faced challenges in managing complex ESG data across multiple platforms and pipelines. They implemented Databricks' Data Intelligence Platform to create a unified lakehouse architecture and leveraged Mosaic AI with RAG techniques to process sustainability data more effectively. The solution resulted in 4x cost savings in data pipeline management, improved time to insights, and an enhanced ability to provide context-aware ESG insights to clients across APAC.

Industry

Finance

Overview

ESGPedia is an environmental, social, and governance (ESG) data and technology platform that supports companies across the Asia-Pacific region in their sustainability journey toward net zero goals and ESG compliance. The company provides sustainability data and analytics to corporations and financial institutions, helping them make informed decisions around sustainable finance and green procurement. This case study demonstrates how ESGPedia leveraged Databricks’ platform capabilities, particularly around LLM-powered RAG solutions, to transform their data operations and deliver enhanced ESG insights to clients.

The Business Problem

Before their partnership with Databricks, ESGPedia faced substantial challenges in managing their complex data landscape. The company was dealing with approximately 300 different data pipelines, each requiring extensive pre-cleaning, processing, and relationship mapping. The fragmentation of data across multiple platforms created several operational bottlenecks.

According to Jin Ser, Director of Engineering at ESGPedia, the fragmented data hampered the organization’s efficiency and ability to provide timely, personalized insights. Internal teams struggled to quickly access necessary information, which led to slower response times and reduced ability to assist clients effectively. The complexity of managing and coordinating multiple models across various systems was identified as a significant obstacle that not only affected operational efficiency but also hindered the development of AI-driven initiatives.

The challenge was particularly acute given the nature of ESG data, which comes from diverse sources and requires careful curation, classification, and contextualization before it can be useful for sustainability assessments and decision-making.

Technical Architecture and Solution

ESGPedia’s solution centered on implementing the Databricks Data Intelligence Platform, with several key components working together to address their data management and AI challenges.

Lakehouse Architecture Foundation

The core of ESGPedia’s implementation was a lakehouse architecture that unified data storage and management, facilitating easier access and analysis. This approach allowed the company to consolidate their fragmented data estate into a single, coherent platform. The lakehouse architecture served as the foundational layer upon which AI capabilities could be built, addressing the prerequisite of having well-organized, accessible data before attempting to layer on LLM-powered features.

Streaming Data Capabilities

The Databricks Platform enabled continuous data ingestion from various ESG data sources through streaming capabilities. This is particularly relevant for ESG data, which can come from diverse sources including corporate disclosures, regulatory filings, news feeds, and third-party data providers. The ability to process streaming data ensures that ESGPedia’s platform can provide near-real-time updates on sustainability metrics and developments.

Data Governance with Unity Catalog

Unity Catalog played a critical role in data management and governance, supporting compliance requirements with stringent access controls and detailed data lineage. This unified approach to governance was essential for accelerating data and AI initiatives while maintaining regulatory compliance. Given that ESGPedia operates across distributed teams in Singapore, the Philippines, Indonesia, Vietnam, and Taiwan, Unity Catalog enabled secure cross-team collaboration while maintaining appropriate data access controls.

The data lineage capabilities are particularly important for ESG applications, where understanding the provenance and transformation history of data points is crucial for credibility and auditability of sustainability assessments.

LLM and RAG Implementation

The most directly relevant aspect for LLMOps is ESGPedia’s implementation of retrieval augmented generation (RAG) using Databricks Mosaic AI and the Mosaic AI Agent Framework.

RAG Architecture

ESGPedia developed a RAG solution specifically tailored to improve the efficiency and effectiveness of their internal teams. The RAG framework runs on Databricks and allows the company to leverage LLMs enhanced with their proprietary ESG data and documents. This approach enables the generation of context-aware responses that are grounded in ESGPedia’s curated sustainability data rather than relying solely on the general knowledge embedded in foundation models.

The use of RAG is particularly well-suited for ESG applications because sustainability assessments require highly specific, current, and verifiable information that may not be present in general-purpose LLM training data. By combining LLM capabilities with ESGPedia’s structured ESG data, the company can provide nuanced insights that would not be possible with either component alone.
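The retrieve-then-generate flow described above can be sketched in plain Python. This is a minimal illustration, not ESGPedia's implementation: keyword-overlap scoring stands in for Mosaic AI's vector search, and the sample documents are invented for demonstration.

```python
def score(query: str, doc: str) -> float:
    """Keyword-overlap relevance score (a stand-in for embedding similarity)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant documents for the query."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context_docs: list[str]) -> str:
    """Ground the LLM's answer in the retrieved ESG context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )

# Illustrative corpus; real deployments would index curated ESG documents.
corpus = [
    "Supplier A reported Scope 1 emissions of 12 ktCO2e in 2023.",
    "Supplier B holds an ISO 14001 environmental certification.",
    "Regional grid emission factors were updated in Q3 2023.",
]
query = "What are Supplier A's Scope 1 emissions?"
docs = retrieve(query, corpus)
prompt = build_prompt(query, docs)
```

The resulting prompt would then be sent to the LLM, which answers from the supplied context rather than from its general pretraining knowledge.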

Prompt Engineering with Few-Shot Prompting

ESGPedia employs few-shot prompting techniques to help with the classification of their datasets. This approach involves providing the LLM with a small number of examples demonstrating the desired classification behavior before asking it to classify new data points. Few-shot prompting is a pragmatic choice for data classification tasks, as it can achieve reasonable accuracy without the need for extensive fine-tuning of models.

The classification use case is particularly important for ESG data, which often arrives in unstructured or semi-structured formats and needs to be categorized according to various sustainability frameworks, industry sectors, and geographic regions. Using LLMs with few-shot prompting for this task can significantly reduce the manual effort required for data processing.
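A few-shot classification prompt of the kind described above might be assembled as follows. This is a hedged sketch: the sector labels and example records are hypothetical, not ESGPedia's actual taxonomy.

```python
def few_shot_prompt(examples: list[tuple[str, str]], new_record: str,
                    labels: list[str]) -> str:
    """Assemble a few-shot classification prompt: labeled examples first,
    then the unlabeled record the model should classify."""
    lines = [f"Classify each record into one of: {', '.join(labels)}.\n"]
    for text, label in examples:
        lines.append(f"Record: {text}\nCategory: {label}\n")
    lines.append(f"Record: {new_record}\nCategory:")
    return "\n".join(lines)

# Hypothetical labels and examples for illustration only.
labels = ["Energy", "Manufacturing", "Financial Services"]
examples = [
    ("Operates three coal-fired power plants in Vietnam.", "Energy"),
    ("Assembles consumer electronics in Penang.", "Manufacturing"),
]
prompt = few_shot_prompt(
    examples, "Provides SME lending across Indonesia.", labels
)
```

The trailing "Category:" cues the model to complete the prompt with a single label, which is easy to parse downstream.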

Customization for Industry, Country, and Sector

According to Jin Ser, ESGPedia aims to provide “highly customized and tailored sustainability data and analytics for our customers based on their industry, country and sector.” The RAG framework enables this level of customization by allowing the retrieval component to pull relevant context based on these dimensions, which then informs the LLM’s responses.
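One common way to implement this kind of dimension-aware retrieval is to filter the candidate documents on client metadata before ranking. The sketch below assumes a simple list-of-dicts corpus and keyword scoring; the field names and documents are illustrative, not ESGPedia's schema.

```python
def retrieve_for_client(query_terms: list[str], docs: list[dict],
                        industry=None, country=None, sector=None) -> list[dict]:
    """Filter the corpus on client dimensions before ranking, so the LLM
    only sees context matching the client's industry, country, and sector."""
    def matches(meta: dict) -> bool:
        return all(
            meta.get(key) == value
            for key, value in (("industry", industry),
                               ("country", country),
                               ("sector", sector))
            if value is not None
        )

    candidates = [d for d in docs if matches(d["meta"])]
    # Rank the remaining candidates by keyword overlap with the query.
    qs = {t.lower() for t in query_terms}
    return sorted(
        candidates,
        key=lambda d: len(qs & set(d["text"].lower().split())),
        reverse=True,
    )

# Illustrative corpus with per-document metadata.
docs = [
    {"text": "Palm oil supplier emissions disclosure",
     "meta": {"country": "Indonesia", "sector": "Agriculture"}},
    {"text": "Bank green bond framework",
     "meta": {"country": "Singapore", "sector": "Finance"}},
]
hits = retrieve_for_client(["green", "bond"], docs, country="Singapore")
```

In production systems this pre-filtering is typically pushed into the vector store's metadata filters rather than done in application code, but the principle is the same.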

Results and Outcomes

The implementation delivered several quantifiable benefits: a 4x cost saving in data pipeline management, faster time to insights, and an improved ability to deliver context-aware ESG insights to clients across APAC.

The RAG implementation has enhanced ESGPedia’s ability to provide nuanced, context-aware insights to corporate and bank clients. Rather than relying on opaque scoring systems, the company can now offer granular data points about the sustainability efforts of companies and their value chains, including SMEs, suppliers, and contractors.

Critical Assessment

While this case study presents compelling benefits, it's important to note some limitations in the information provided. The write-up does not specify which foundation models are used, how retrieval quality or classification accuracy is evaluated, or how the 4x cost-savings figure was measured. As a vendor-published case study, the reported outcomes are self-reported and should be read with that in mind.

Architectural Considerations for LLMOps

The case study illustrates several important principles for LLMOps in enterprise contexts: consolidating fragmented data into a unified lakehouse before layering on AI capabilities, enforcing governance and lineage (here via Unity Catalog) so that AI outputs remain auditable, using RAG to ground LLM responses in proprietary, current data, and preferring few-shot prompting over fine-tuning when it delivers sufficient accuracy at lower cost.

Future Directions

The case study indicates that ESGPedia continues to explore AI and machine learning to further enhance their operations. The company aims to democratize access to high-quality insights through their integrated data and AI architecture, which suggests ongoing investment in LLM-powered features and capabilities as part of their growth strategy across the Asia-Pacific region.
