Evaluating the Performance of Thanoy, the Thai Legal AI Assistant

March 19, 2025 · 5 min read

CEO @ iApp Technology

This evaluation report assesses Thanoy, the Thai Legal AI Assistant powered by OpenThaiGPT, which is designed to answer legal questions accurately and reliably across a range of Thai legal documents and query types. Trained on over 10,000 Thai legal articles and regulations, Thanoy serves both legal professionals and general users seeking legal guidance.

1. Introduction to Thanoy

Thanoy is an AI-powered assistant developed to improve access to Thai legal information and advice. It uses OpenThaiGPT to analyze and respond to user queries, offering insights into Thai laws and regulations. It is available through a LINE chatbot interface, so users can access legal advice at any time. Thanoy is designed to ground its responses in a comprehensive understanding of Thailand's legal landscape, making it a valuable tool for professionals and non-experts alike.

2. Evaluation Methodology

2.1 Evaluation Team and Approach

This comprehensive evaluation was conducted by iApp's LLM Team, led by @Por, using an automated assessment approach to ensure objectivity and scalability.

2.2 Technical Setup

Evaluation Model: OpenAI GPT-4o API
Temperature Setting: 0 (to keep the evaluator's scoring as consistent and repeatable as possible)
Sample Size: 1,000 samples from the first batch
Data Source: Over 100,000 chat logs in JSON Lines format
Future Batches: Each subsequent batch will randomly sample an additional 1,000 chat logs
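
To make the batch sampling concrete, here is a minimal sketch of drawing a random 1,000-record batch from a JSON Lines log. The post does not describe how sampling was actually implemented, so the function names, the file path, and the choice of reservoir sampling (which avoids loading all 100,000+ logs into memory) are assumptions for illustration only.

```python
import json
import random
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON Lines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def reservoir_sample(records: Iterable[dict], k: int, seed: int = 0) -> list[dict]:
    """Uniformly sample up to k records in a single pass (Algorithm R)."""
    rng = random.Random(seed)  # fixed seed so a batch can be reproduced
    sample: list[dict] = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record  # replace with probability k / (i + 1)
    return sample


# Hypothetical usage; "chat_logs.jsonl" is a placeholder path:
# with open("chat_logs.jsonl", encoding="utf-8") as f:
#     batch = reservoir_sample(iter_jsonl(f), k=1000, seed=42)
```

Using a different seed per batch would give each future batch its own random draw while keeping every batch reproducible.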

2.3 Evaluation Criteria

The evaluation assessed three key components for each interaction:

Query: User's legal question
Context: Retrieved legal documents and regulations
Response: Thanoy's AI-generated legal advice

For each sample, GPT-4o evaluated:

Relevance: Whether Thanoy's response relates to the user's query and retrieved context
Quality Score: Rating from 0-10 for the overall response quality
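
The per-sample judgment described above could be wired up roughly as follows. This is a sketch, not the team's actual prompt or code: the instruction wording, the JSON reply format, and every function name are assumptions. The LLM call itself is passed in as a `judge` callable (for example, a thin wrapper around a GPT-4o request at temperature 0), so the parsing and validation logic can be exercised without network access.

```python
import json
from typing import Callable

# Hypothetical judge instructions; the real evaluation prompt is not published.
JUDGE_INSTRUCTIONS = (
    "You are evaluating a Thai legal assistant. Given the user's query, the "
    "retrieved context, and the assistant's response, reply with JSON only: "
    '{"relevant": true or false, "score": integer from 0 to 10}.'
)


def build_judge_prompt(query: str, context: str, response: str) -> str:
    """Assemble the three evaluated components into one judge prompt."""
    return (
        f"{JUDGE_INSTRUCTIONS}\n\n"
        f"Query:\n{query}\n\n"
        f"Context:\n{context}\n\n"
        f"Response:\n{response}\n"
    )


def parse_verdict(raw: str) -> dict:
    """Validate the judge's JSON reply; raise ValueError on anything off-spec."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("judge reply must be a JSON object")
    relevant, score = data.get("relevant"), data.get("score")
    if not isinstance(relevant, bool):
        raise ValueError("'relevant' must be a boolean")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError("'score' must be an integer from 0 to 10")
    return {"relevant": relevant, "score": score}


def evaluate_sample(sample: dict, judge: Callable[[str], str]) -> dict:
    """Score one sample; `judge` maps a prompt string to the model's raw reply."""
    prompt = build_judge_prompt(sample["query"], sample["context"], sample["response"])
    return parse_verdict(judge(prompt))
```

Rejecting malformed replies outright, rather than coercing them, keeps bad judge output from silently skewing the aggregate scores.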

3. Detailed Experimental Results

3.1 Overall Performance Metrics

Total Samples Evaluated: 1,000
Mean Relevance Score: 4.325/10
Standard Deviation: 3.29
Reference Documentation: Internal LarkSuite Wiki

3.2 Relevance Distribution

Category	Count	Percentage
Not Relevant	659 requests	65.9%
Relevant	341 requests	34.1%

3.3 Top Score Distributions

Score	Count	Percentage
2 points	248 requests	24.8%
3 points	244 requests	24.4%
8 points	165 requests	16.5%
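
For readers who want to reproduce this kind of summary from their own judge output, the sketch below computes the same shape of metrics: mean and standard deviation of the score, the relevant / not relevant split, and the most frequent scores. The record fields (`relevant`, `score`) are assumed names, and the post does not say whether the reported standard deviation is the population or sample form; the population form is used here.

```python
from collections import Counter
from statistics import mean, pstdev


def summarize(verdicts: list[dict], top_n: int = 3) -> dict:
    """Aggregate per-sample verdicts into report-style metrics."""
    total = len(verdicts)
    scores = [v["score"] for v in verdicts]
    relevant = sum(1 for v in verdicts if v["relevant"])
    top = Counter(scores).most_common(top_n)  # (score, count) pairs, most frequent first
    return {
        "total": total,
        "mean_score": mean(scores),
        "std_score": pstdev(scores),  # population standard deviation (an assumption)
        "relevant_pct": 100 * relevant / total,
        "not_relevant_pct": 100 * (total - relevant) / total,
        "top_scores": [(s, c, 100 * c / total) for s, c in top],
    }
```

As a sanity check on the tables above, 341 relevant samples out of 1,000 gives 34.1%, and the three most common scores together account for 248 + 244 + 165 = 657 samples, or 65.7% of the batch.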

4. In-Depth Analysis and Key Findings

4.1 Response-Query Alignment

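Finding: The majority of Thanoy's responses demonstrate strong alignment with user queries, indicating effective natural language understanding and legal reasoning capabilities. The low relevance rate in Section 3.2 arises because the evaluator judges each response against both the query and the retrieved context, not the query alone.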

4.2 Context Retrieval Challenges

Critical Issue Identified: The primary performance bottleneck lies in the Retrieval-Augmented Generation (RAG) system:

Frequent Context Mismatches: The RAG system often retrieves irrelevant legal documents
Score Impact: Irrelevant context significantly reduces evaluation scores despite accurate responses

4.3 Context Dependency Analysis

Scenario 1 - Unnecessary Context with Retrieval:

Some queries don't require legal document context for accurate responses
When irrelevant context is provided, scores decrease even if responses are correct

Scenario 2 - Unnecessary Context without Retrieval:

Queries that don't need context and receive none often score lower
This occurs even when responses correctly address the user's question
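
One way to quantify the two scenarios is to average the judge score per combination of "did the query need context?" and "was any context retrieved?". The sketch below assumes each evaluated row carries a `needs_context` flag, which is a hypothetical annotation (for example, added by a reviewer or a second judge pass) and not something the post says exists in the chat logs.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_scenario(rows: list[dict]) -> dict[tuple[bool, bool], float]:
    """Average judge score per (needs_context, context_retrieved) bucket."""
    buckets: dict[tuple[bool, bool], list[int]] = defaultdict(list)
    for row in rows:
        # Treat empty or whitespace-only context as "nothing retrieved".
        key = (row["needs_context"], bool(row["context"].strip()))
        buckets[key].append(row["score"])
    return {key: mean(scores) for key, scores in buckets.items()}
```

In this layout, the bucket `(False, True)` corresponds to Scenario 1 and `(False, False)` to Scenario 2, so comparing them against `(True, True)` would show how much of the score gap comes from context handling instead of answer quality.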

4.4 Underlying Model Capabilities

Positive Finding: OpenThaiGPT demonstrates strong baseline legal knowledge:

Can provide accurate legal advice even with incorrect context retrieval
Shows robust understanding of Thai legal principles and concepts
Maintains response quality despite RAG system limitations

5. Technical Recommendations and Future Improvements

5.1 RAG System Enhancement Priority

Immediate Action Required: The current RAG system requires comprehensive redevelopment:

Current Challenge: Frequent retrieval of irrelevant legal documents
Proposed Solution: Implementation of GraphRAG technology
Team Status: Active research and review of GraphRAG methodologies

5.2 Evaluation Methodology Refinement

Future Batch Strategy:

Continue with 1,000-sample batches using random sampling
Implement A/B testing with improved RAG systems
Develop more granular evaluation metrics for legal accuracy
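
For the A/B testing item, a simple starting point is a two-proportion z-test on the relevance rate of two 1,000-sample batches, one per RAG system. This is only a sketch of one standard statistical test, not a method the team has committed to; the function name and the candidate system's numbers in the usage comment are made up for illustration.

```python
import math


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both systems share one relevance rate."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all hits or all misses in both arms: no evidence of a difference
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Baseline from this report: 341 relevant out of 1,000.
# Candidate figures below are hypothetical, not measured results:
# z, p = two_proportion_z_test(341, 1000, 400, 1000)
```

A positive z means system B has the higher relevance rate; with batches of 1,000, a gap of a few percentage points is roughly the smallest difference this test can separate from sampling noise.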

6. The Future of Thanoy and AI in Legal Services

Thanoy is not just a tool for immediate legal advice but also a foundation for future advancements in AI-driven legal services. As AI technology evolves, the capabilities of assistants like Thanoy are expected to improve, particularly in terms of understanding complex legal language and providing even more precise insights. The feedback and performance from this evaluation are essential in driving future improvements and ensuring Thanoy can meet the growing demand for accessible legal assistance in Thailand.

7. Conclusion and Impact Assessment

This comprehensive evaluation of 1,000 samples reveals important insights about Thanoy's performance as a Thai Legal AI Assistant:

7.1 Key Strengths

Strong Language Understanding: OpenThaiGPT demonstrates robust comprehension of Thai legal queries
Baseline Legal Knowledge: Capable of providing accurate advice even with suboptimal context retrieval
Response Consistency: Maintains quality across diverse legal topics and question types

7.2 Critical Areas for Improvement

RAG System Overhaul: Primary focus on improving context retrieval accuracy (65.9% of responses were rated not relevant)
GraphRAG Implementation: Active research toward next-generation retrieval technology
Evaluation Refinement: Enhanced metrics for legal-specific accuracy assessment

7.3 Strategic Significance

This evaluation, conducted by iApp's LLM Team, provides crucial data for Thanoy's evolution as a leading Thai legal AI assistant. The findings demonstrate both the potential and current limitations, establishing a clear roadmap for achieving higher performance standards in AI-driven legal services.

Reference: Complete evaluation methodology and detailed results are documented in iApp's internal research wiki for ongoing technical improvements.
