Skip to main content
Chatbot Testing
Nikola Jonic avatar
Written by Nikola Jonic
Updated over a week ago

What is a chatbot and how does it work?

A chatbot is a feature designed to simulate conversations with human users, especially over the internet. These features are typically powered by Artificial Intelligence (AI) technologies, including natural language processing (NLP) and machine learning algorithms, which understand and respond to user queries conversationally.

These are the steps that follow after you submit a message to a chatbot:

  1. Understanding Your Message: When you send a message to the chatbot, it uses a technology called Natural Language Processing (NLP) to understand what you're asking or stating by analyzing the words you've typed.

  2. Thinking and Decision-Making: Once it understands your message, the chatbot's "brain" kicks in. It processes the message you send and decides what is the best response. This might involve looking up information, checking databases, or using pre-set rules to come up with an answer.

  3. Crafting a Response: After discovering the best answer, the chatbot puts together a message to send back to you. It aims to make the response sound as natural and friendly as possible, like in a real conversation.

  4. Sending the Reply: Finally, the chatbot sends a response back to you. The response will be visible on your screen, whether you're chatting on a website, messaging app, or another platform.

Testing the chatbot

Testing the chatbot is not as difficult as it sounds at first. The list below contains the basic info on what to focus on while testing it:

  1. Learn the Prompt: The prompt is like a guidebook for the chatbot. It tells the chatbot how to behave. You need to understand this guidebook to test the chatbot effectively. A good prompt for a chatbot or AI model should contain the following elements:

    1. Clarity: The prompt should be clear and concise, providing enough information for the model to understand the context and respond appropriately.

    2. Specificity: A good prompt is specific. It should clearly state what you want the model to do or the kind of response you're looking for.

    3. Context: The prompt should provide enough context for the model to generate a relevant response. This could include background information, the purpose of the request, or any specific details that could influence the response.

    4. Language: The language used in the prompt should be simple and straightforward. Avoid using jargon, complex sentences, or ambiguous phrases that could confuse the model.

    5. Politeness: While not necessary, using polite language can often result in more polite responses from the model.

    6. Open-ended: If you want the model to generate creative or diverse responses, make your prompt open-ended. This encourages the model to think outside the box and come up with unique answers.

  2. Find Weak Spots: Try to find weak spots in the system prompt. This is important because users might try to trick the chatbot into revealing its prompt.

  3. Inject Extra Context into the system prompt: Try to add extra information to the chatbot's system prompt. This might make the chatbot give the wrong/false information.

  4. Test with RAG: RAG (Retrieval Augmented Generation) is an optimizing chatbot response technique (with external data) that some chatbots use. This technique allows the chatbot to check the knowledge base outside of its training data sources before generating a response. As a tester, you should check if the chatbot is using this technique correctly and giving accurate responses.

  5. Find the Root Causes: Sometimes, the chatbot might give a wrong answer because it's using the wrong document to create the answer. Understanding this can help you find the root causes of issues.

Testing the chatbot using different scenarios

Here at Test IO, we divided the testing of the LLM into 3 scenarios: Positive, Negative, and Edge Case Testing.

Each scenario can be tested for Context, Performance, Focus during Long Conversations, Language accuracy, Cross-platform, and Data Privacy.

The outcome of each test can be a Pass or Fail.

We recommend reading our How To Test LLMs at Test IO article, before reading the tables below, for better understanding.

The tables below share examples of how to test the chatbot.

Positive Testing

Scenario

Example Prompt

Pass Outcome

Fail Outcome

Context Testing

"Can you show me some hybrid cars?"

The Chatbot correctly lists hybrid cars.

The Chatbot lists irrelevant cars or fails to understand the request.

Performance Testing

"What are the dealership hours?"

Response within 2 seconds.

Response takes longer than 2 seconds.

Focus during Long Conversations

A series of questions about financing over 10 minutes.

"Can you tell me about your financing options?"

"What is the interest rate for good credit?"

"How long are the loan terms?"

"Can I get pre-approved?"

The Chatbot provides coherent and contextually appropriate responses throughout.

The Chatbot loses context or provides irrelevant answers midway.

Language Accuracy Testing

"Can you show me the latest SUV models?"

Accurate and grammatically correct response.

Incorrect or poorly worded response.

Cross-Platform Compatibility

"Show me your latest offers."

The Chatbot functions well on the desktop browser.

The Chatbot has display or functional issues on the desktop.

Data Privacy Testing

"Delete my personal data."

The Chatbot processes the request and confirms data deletion.

The Chatbot fails to process the request or provide confirmation.

Negative Testing

Scenario

Example Prompt

Pass Outcome

Fail Outcome

Context Testing

"Can you show me some models?"*

*Without mentioning that you are interested in cars.

The Chatbot asks a follow-up question to clarify the request.

The Chatbot provides irrelevant information or fails to respond appropriately.

Performance Testing

Multiple users simultaneously ask for vehicle prices.

The Chatbot responds to all users promptly.

The Chatbot slows down or crashes.

Focus during Long Conversations

Starts asking about car features, then abruptly asks about service packages.

"What are the features of the new Toyota Camry?"

"How often should I service my car?"

The Chatbot adjusts to the new topic without confusion.

The Chatbot gets confused or continues talking about the previous topic.

Language Accuracy Testing

"You got SUV new models?"

The Chatbot accurately understands and responds.

The Chatbot fails to understand or responds incorrectly.

Cross-Platform Compatibility

"I want to schedule a test drive." asked from the Desktop and then the device is changed to a mobile.

The Chatbot works perfectly on the mobile interface.

The Chatbot is unresponsive or has interface issues on mobile.

Data Privacy Testing

User tries to access another user's information.

The Chatbot correctly denies the request and secures data.

The Chatbot provides unauthorized access to personal information.

Edge Case Testing

Scenario

Example Prompt

Pass Outcome

Fail Outcome

Context Testing

"Do you have the XYZ SuperFast?"

The Chatbot correctly informs the user that the model is not available.

The Chatbot gives incorrect information or crashes.

Performance Testing

"What are the dealership hours?" (repeated 5 times within a minute)

The Chatbot handles repeated questions smoothly.

The Chatbot becomes unresponsive or provides inconsistent answers.

Focus during Long Conversations

"I am interested in financing a car, but I need more details."

"What is the down payment required?"

"What if my credit isn't great?"

"Is there an early repayment fee?"

The Chatbot maintains context and provides detailed responses.

The Chatbot fails to ask clarifying questions or provides incomplete information.

Language Accuracy Testing

"What r ur hrs?"

The Chatbot provides the correct information about hours.

The Chatbot fails to understand or provides incorrect information.

Cross-Platform Compatibility

"Can you show me electric cars?" asked from an outdated browser.

The Chatbot responds correctly or provides a message about browser compatibility.

The Chatbot crashes or provides an incorrect response.

Data Privacy Testing

"Can you explain your data privacy policy?"

The Chatbot accurately provides the requested information.

The Chatbot provides incomplete or incorrect information.

Did this answer your question?