Preparing Evaluation Samples

Most large language model APIs are currently called with chat messages. In EvalsOne, each evaluation sample included in a run also takes the form of a chat message.
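A chat-message sample is essentially a list of role/content turns. The sketch below uses the widely adopted OpenAI-style schema for illustration; the exact fields EvalsOne stores internally are an assumption here.

```python
# A minimal chat-message evaluation sample in the common
# role/content form (OpenAI-style schema shown for illustration;
# the exact fields EvalsOne stores are an assumption).
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ]
}
```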

EvalsOne supports the following three methods for dynamically adding chat samples when creating a run:

  1. Synthesizing samples using templates and variable value lists
  2. Using pre-prepared sample sets
  3. Adding samples by inputting or copying/pasting code

Using Templates and Variable Value Lists

If only part of the content changes between evaluation samples (e.g., the user's latest question) while the rest stays the same (such as the system prompt or earlier rounds of the conversation), you can create a chat template. Insert variable names into the parts of the chat content that vary, then prepare a corresponding list of variable values. When the run is created, the variable names are dynamically replaced with the different values to generate evaluation samples in bulk.
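As a rough sketch of the idea, the code below expands a chat template against a variable value list. EvalsOne performs this substitution for you when the run is created; the `{{question}}` placeholder syntax and the `render` helper are hypothetical, for illustration only.

```python
# Hypothetical sketch of template + variable-list expansion.
# EvalsOne does this substitution itself; the {{question}}
# placeholder syntax is an assumption for illustration.
template = [
    {"role": "system", "content": "You are a customer-support assistant."},
    {"role": "user", "content": "{{question}}"},
]

variable_values = [
    {"question": "How do I reset my password?"},
    {"question": "Can I change my billing address?"},
]

def render(template, values):
    # Replace each {{name}} placeholder with its value,
    # producing one chat sample per row of variable values.
    samples = []
    for row in values:
        messages = []
        for msg in template:
            content = msg["content"]
            for name, value in row.items():
                content = content.replace("{{" + name + "}}", value)
            messages.append({"role": msg["role"], "content": content})
        samples.append({"messages": messages})
    return samples

for sample in render(template, variable_values):
    print(sample)
```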

Using Pre-Prepared Sample Sets

Pre-prepared sample sets are suitable when you already have ready-made test data that can be imported directly into EvalsOne for evaluation. Samples can be added to a sample set by importing JSONL files, through API calls, or by entering sample code manually.

This method is a better fit when the samples have little content in common with one another.
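To make the JSONL import concrete, each line of the file holds one complete chat sample as a JSON object. The sketch below writes two samples in the common `messages` form; the exact keys EvalsOne expects on import are an assumption here.

```python
import json

# Two pre-prepared chat samples, one JSON object per line (JSONL).
# The "messages" field name follows the common chat schema; the
# exact keys EvalsOne expects on import are an assumption.
samples = [
    {"messages": [
        {"role": "system", "content": "You are a math tutor."},
        {"role": "user", "content": "What is 12 * 8?"},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a math tutor."},
        {"role": "user", "content": "Simplify 3/9."},
    ]},
]

with open("samples.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```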

Adding Samples by Inputting or Copying/Pasting Code

You can also add chat message samples by typing JSON code directly into the editor, or by copying chat sample code from the Playgrounds offered by vendors such as OpenAI, Anthropic (Claude), or Google (Gemini) and pasting it into the EvalsOne editor. EvalsOne automatically converts it into evaluation samples for creating runs.

This method is suitable for ad-hoc runs on individual samples, and for users who are accustomed to testing chat prompts in a Playground.
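For example, chat code copied from the OpenAI Playground typically looks like the snippet below (it uses the real `openai` Python SDK; the model name and prompt content are illustrative). When pasted into the EvalsOne editor, the `messages` array is what becomes the evaluation sample.

```python
# Chat code as exported from a vendor Playground (OpenAI-style
# Python SDK shown; the model name and prompt are illustrative).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
    ],
)
print(response.choices[0].message.content)
```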