llm-evaluation-framework

Here are 7 public repositories matching this topic...

promptfoo / promptfoo

Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.

testing ci evaluation ci-cd cicd prompts evaluation-framework rag llm prompt-engineering llmops prompt-testing llm-eval llm-evaluation llm-evaluation-framework

Updated Apr 30, 2024
TypeScript

confident-ai / deepeval

The LLM Evaluation Framework

evaluation-metrics evaluation-framework llm-evaluation llm-evaluation-framework llm-evaluation-metrics

Updated Apr 30, 2024
Python

parea-ai / parea-sdk-py

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

metrics good-first-issue llm prompt-engineering generative-ai llmops llm-eval llm-tools llm-evaluation llm-evaluation-toolkit llms-benchmarking llm-evaluation-framework

Updated Apr 29, 2024
Python

aws-samples / fm-leaderboarder

FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, task, and prompts.

llm-evaluation llm-evaluation-framework llm-benchmarking

Updated Apr 11, 2024
Python

parea-ai / parea-sdk-ts

TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

llm prompt-engineering llms llm-eval llm-tools llm-evaluation llm-evaluation-toolkit llms-benchmarking llm-evaluation-framework

Updated Apr 29, 2024
TypeScript

Networks-Learning / prediction-powered-ranking

Code for the paper "Prediction-Powered Ranking of Large Language Models", arXiv 2024.

ranking-algorithm llm-eval llm-evaluation llm-evaluation-framework prediction-powered-inference rank-sets

Updated Mar 15, 2024
Python

stair-lab / villm-eval

Evaluation of Language Models in Non-English Languages

llms-benchmarking llm-evaluation-framework

Updated Apr 2, 2024
Python
