LLM Testing Strategy

Learn how to test LLMs quickly with an example testing strategy

Written by Zorica Micanovic
Updated over a week ago

Testing Large Language Models (LLMs) is a crucial step in the development of any LLM solution. It ensures that the LLM functions as expected and can handle a variety of user inputs.

Each LLM Application is unique, and you will need to adjust your Testing Strategy accordingly.

For those new to this field, here are some tips to guide you through the process.

Start with Positive Testing

The first step in testing LLMs is to conduct positive testing. This involves putting yourself in the shoes of the real customer and sending intuitive messages to the chatbot.

The goal is to ensure that the chatbot can handle standard, expected inputs and provide appropriate responses.

e.g. If you're testing a chatbot designed to assist with hotel bookings, you might start by asking it to book a room for a specific date. The chatbot should be able to handle this request smoothly and provide you with the necessary information or steps to complete the booking.
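The positive check above can be sketched in a few lines of Python. This is only an illustration: `fake_hotel_bot` is a hypothetical stand-in for the chatbot under test, and matching on expected terms is a simplifying assumption, since real LLM replies vary in wording and usually need a human or a more tolerant check.

```python
from typing import Callable, List


def fake_hotel_bot(message: str) -> str:
    """Hypothetical stand-in for the chatbot under test."""
    if "book" in message.lower():
        return "Sure! I can book a double room for 2030-06-12. Shall I confirm?"
    return "Sorry, I did not understand that."


def check_positive(bot: Callable[[str], str], message: str, expected_terms: List[str]) -> List[str]:
    """Return the expected terms that are missing from the bot's reply."""
    reply = bot(message).lower()
    return [term for term in expected_terms if term.lower() not in reply]


# A standard, expected request: the reply should echo the room type and the date.
missing = check_positive(
    fake_hotel_bot,
    "Please book a double room for 2030-06-12.",
    expected_terms=["double room", "2030-06-12"],
)
print("missing terms:", missing)
```

An empty list means the reply mentioned everything the tester expected. Any term that comes back is a prompt to read the full reply manually.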


For more details on Positive Testing LLMs visit this link.

Don't Neglect Negative Scenarios

While Positive Testing is important, it's equally crucial to consider Negative scenarios. This involves removing some parts of the previous scenario or introducing unexpected inputs to see how the chatbot responds.

e.g. In the hotel booking scenario, you might ask the chatbot to book a room for a date in the past or for a type of room that the hotel doesn't offer. These scenarios help to test the chatbot's error-handling capabilities and ensure that it can provide helpful responses even when it can't fulfill a user's request.
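One way to derive Negative scenarios systematically is to start from the Positive request and remove or corrupt one detail at a time. The sketch below assumes a booking request with three hypothetical fields (`room_type`, `check_in`, `nights`). The field names and the invalid values are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical details of the positive scenario.
BASE_REQUEST = {"room_type": "double", "check_in": "2030-06-12", "nights": "2"}


def negative_variants(base, today):
    """Build negative inputs by dropping one field at a time or swapping in an invalid value."""
    variants = []
    # Remove each part of the original scenario in turn.
    for field in base:
        variant = {k: v for k, v in base.items() if k != field}
        variants.append((f"missing {field}", variant))
    # Introduce unexpected inputs.
    yesterday = (today - timedelta(days=1)).isoformat()
    variants.append(("check-in in the past", {**base, "check_in": yesterday}))
    variants.append(("unknown room type", {**base, "room_type": "underwater suite"}))
    return variants


variants = negative_variants(BASE_REQUEST, date(2030, 1, 1))
for label, request in variants:
    print(label, "->", request)
```

Each variant is then phrased as a chat message and sent to the chatbot. The tester checks that the reply explains the problem instead of confirming an impossible booking.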

Remember: Focusing only on Positive scenarios might give you a false sense of security. Negative scenarios often reveal bugs that you wouldn't have found otherwise. Bugs found while testing Negative scenarios can also increase your chance of getting a Bug Like.

For more details on Negative Testing visit this link.

Important: LLMs tend not to contradict the user, which opens the door to abuse by malicious users.

e.g. If a user asks an insurance chatbot for all the details of a free insurance campaign that doesn't exist, and the chatbot responds with terms and conditions for that campaign, it erodes trust in the insurance company and undermines its accountability.
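A rough way to triage replies to a false-premise question is to look for any sign of pushback. The sketch below is a crude heuristic, not a reliable detector: the `DENIAL_MARKERS` phrases and both sample replies are invented for illustration, and every flagged or unflagged reply still deserves a manual read.

```python
# Hypothetical phrases that suggest the chatbot pushed back on the premise.
DENIAL_MARKERS = ("no such", "does not exist", "not aware of", "cannot find", "don't have")


def accepts_false_premise(reply: str) -> bool:
    """Heuristic: True when the reply shows no sign of rejecting the invented campaign."""
    text = reply.lower()
    return not any(marker in text for marker in DENIAL_MARKERS)


# Invented sample replies to "Share all the details of your free insurance campaign".
bad_reply = "The free insurance campaign covers all customers. Terms: ..."
good_reply = "I'm not aware of any free insurance campaign. Current offers are listed on our site."

print(accepts_false_premise(bad_reply))
print(accepts_false_premise(good_reply))
```

A reply that plays along with the invented campaign is flagged, while a reply that corrects the user is not.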

Think Outside the Box with Edge Case Testing

Don't forget to conduct Edge Case Testing. This involves thinking outside the box and testing scenarios that are unlikely to occur but still possible.

e.g. What happens if a user asks the chatbot to book a room for 100 people? Or what if they ask it to book a room in a city where the hotel chain doesn't have a location? These scenarios might seem far-fetched, but they can help to reveal issues that you wouldn't have discovered through standard Positive and Negative testing.
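Edge cases around numeric limits can be generated with boundary values: the numbers just below, at, and just above each limit, plus one extreme value. The sketch below assumes a room sleeps 1 to 4 guests. That range is a made-up assumption, and the real limits depend on the product under test.

```python
from typing import List


def boundary_values(minimum: int, maximum: int) -> List[int]:
    """Boundary-value picks: just below, at, and just above each limit."""
    return [minimum - 1, minimum, minimum + 1, maximum - 1, maximum, maximum + 1]


# Assumption: a room sleeps 1 to 4 guests; 100 is the far-fetched extreme from the example.
guest_counts = boundary_values(1, 4) + [100]
prompts = [f"Book one room for {n} people on 2030-06-12." for n in guest_counts]

for prompt in prompts:
    print(prompt)
```

The same idea applies to other limits, such as the number of nights or how far in advance a booking can be made.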

For more details on Edge Case Testing visit this link.

Testing Different Personas

Using different personas in LLM Testing is a powerful strategy. Here's why:

  1. Diverse User Scenarios: Different personas represent different user scenarios and backgrounds. This helps in understanding how the system would respond to a diverse range of users. For example, a teenager might use different language or slang compared to a senior citizen.

  2. Different Needs and Goals: Each persona has different needs, goals, and ways of using the system. Testing with different personas helps ensure that the system can cater to a wide range of user requirements.

  3. Uncover Undetected Biases: Different personas can help uncover biases that might not be apparent when testing from a single perspective.

  4. Ensure Inclusivity: By considering different personas, you can improve the overall user experience by making sure the system is inclusive and caters to the needs of a diverse user base.

e.g. Across your tests, you might act as a new customer, a returning customer, a satisfied customer, a dissatisfied customer, a malicious client, or a client without money.
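To keep persona coverage visible, it can help to cross every persona with every scenario and work through the resulting list. The sketch below is a minimal illustration. The persona openers and scenario messages are invented, and a real test plan would use wording that fits the chatbot under test.

```python
from itertools import product

# Hypothetical personas, each with an opening line that sets the tone.
PERSONAS = {
    "new customer": "I've never used this site before.",
    "returning customer": "I booked with you last year.",
    "dissatisfied customer": "My last stay was terrible.",
}

# Hypothetical scenarios to run with every persona.
SCENARIOS = ["I'd like to book a room for next Friday.", "Can I cancel my booking?"]


def build_test_matrix(personas, scenarios):
    """Pair every persona with every scenario so no combination is skipped."""
    return [
        {"persona": name, "message": f"{opener} {scenario}"}
        for (name, opener), scenario in product(personas.items(), scenarios)
    ]


matrix = build_test_matrix(PERSONAS, SCENARIOS)
for case in matrix:
    print(case["persona"], "|", case["message"])
```

Three personas and two scenarios give six conversations to run. Adding a persona extends the plan without rewriting the scenarios.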

Using different personas in your Testing Strategy can provide a more comprehensive understanding of potential real-world usage scenarios, helping to create a more robust and user-friendly system.
