How to build an Azure OpenAI intent plugin in Semantic Kernel to help orchestrate which Azure AI Search index to use

My GitHub repo demonstrates a simple Python app that implements this pattern.

The Retrieval-Augmented Generation (RAG) pattern is an effective way to ground the Azure OpenAI service with relevant data so it can respond to user questions.

The RAG pattern is needed because we must limit the data passed to the OpenAI service to only the relevant information. We cannot send all of our data with each query, both because of token limits and because the model tends to hallucinate when given too much irrelevant information.

The RAG pattern lets us query an external data source (such as Azure AI Search) to get relevant information that we can append to the prompt when we call OpenAI.
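The "augment" step boils down to stitching the retrieved documents into the prompt. A minimal sketch (the function name and prompt wording are illustrative, not part of any SDK):

```python
def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Combine retrieved documents and the user's question into one grounded prompt."""
    sources = "\n".join(f"- {d}" for d in documents)
    return (
        "Answer the question using ONLY the sources below.\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

The resulting string is what gets sent to the chat completion model in place of the raw question.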

This pattern is easy to implement if there is only one external data source.

However, we might want to split our data across multiple Azure AI Search indexes. A common reason is to avoid a single index containing documents that share similar keywords but are used in different contexts; another is that different document sets have different authorization requirements.

We can use Semantic Kernel to help orchestrate this workflow.

We will have 3 building blocks in our solution.

  • Azure OpenAI
    • Provides both the chat completion & embedding models we will use as our Large Language Model (LLM)
  • Azure AI Search
    • Provides text, semantic & vector searches of unstructured documents (PDFs, PowerPoints, etc.)
    • Contains indexes of unstructured data to search
  • Azure App Service (or Container Apps, AKS, etc)
    • Runs our web app
    • Uses Semantic Kernel to orchestrate AI calls
    • Contains an Intent plugin to help determine domain of user request

Orchestration

The flow of the user interaction will look like this.

1. The user’s question comes in.

2. Our web app will receive the user’s request and (using Semantic Kernel) make a call to the Intent plugin to determine the user’s intent (or domain).

3. The Intent plugin examines the user’s question and, using our example questions & keywords, asks OpenAI to map it to one of the pre-defined intents.

4. The Web App receives the recommendation from the Intent plugin. It can now evaluate if the user is authorized to see the relevant data (via Entra ID groups).

5. The Web App loads the appropriate index & issues a query to retrieve relevant documents.

6. The Web App now can augment the prompt and issue a call to OpenAI to receive an answer.
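The six steps above can be sketched as a plain function. Everything here is an illustrative placeholder (detect_intent, user_in_group, search_index, ask_openai are hypothetical callables, not real SDK calls) to show the control flow, not a working implementation:

```python
def answer_question(question, user_groups, detect_intent, user_in_group,
                    search_index, ask_openai):
    # Steps 1-3: determine the user's intent (domain) via the Intent plugin
    intent = detect_intent(question)
    if intent == "not_found":
        return "Sorry, I can't help with that."
    # Step 4: authorization check via Entra ID group membership
    if not user_in_group(user_groups, intent):
        return "You are not authorized to ask about this topic."
    # Step 5: query the index that matches the intent
    documents = search_index(intent, question)
    # Step 6: augment the prompt with the retrieved documents and call OpenAI
    return ask_openai(question, documents)
```

Injecting the four operations as callables keeps the orchestration logic testable independently of the Azure services.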

Azure AI Search

Our Azure AI Search service will contain multiple indexes. Each index corresponds to a single domain of data. This way, we ensure that there won’t be any data included in the prompt to OpenAI that is not relevant to the question. We can also easily implement authorization by only searching an index if the user is a part of a specific Entra ID group.

NOTE: An alternative authorization approach, without an “intent” plugin, would be to use a single index of all files and apply security filters so that only the files the user has access to (specified via Entra ID group membership) are returned & passed to the OpenAI model.
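A sketch of that alternative: assuming each document in the index has a `group_ids` collection field listing the Entra ID groups allowed to see it (a common Azure AI Search security-trimming pattern), the OData filter can be built like this:

```python
def build_security_filter(user_group_ids: list[str]) -> str:
    """Build an OData filter matching documents whose group_ids field
    contains at least one of the user's Entra ID group IDs."""
    groups = ",".join(user_group_ids)
    return f"group_ids/any(g: search.in(g, '{groups}'))"

# Illustrative usage with the azure-search-documents SearchClient:
# results = search_client.search(
#     search_text=question,
#     filter=build_security_filter(user_groups),
# )
```

The field name `group_ids` is an assumption; it must match whatever field your indexer populates with group membership.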

Intent plugin

The key to making the workflow work is that we need to identify the “intent” of the user’s question to determine which index is appropriate.

Our Intent plugin is a “prompt” plugin, meaning it is based upon a prompt (as opposed to a “native” plugin, which would be implemented in code such as C# or Python).

We will craft our Intent plugin prompt to allow the OpenAI service to determine the user’s intent by providing example questions, keywords, etc. to help it decide which domain the question is specifically targeting.

We are going to provide a list of pre-defined intents (choices) to restrict what can be chosen. We will then switch on this value when the result comes back.

Here is an example of one way to define the Intent plugin’s prompt.

[Chat History]
{{$chat_history}}

User: {{$user_input}}

----------------------------------------------
Return the intent of the user. The intent must be one of the following strings:
- movies: Use this intent to answer questions about movies.
- songs: Use this intent to answer questions about songs.
- not_found: Use this intent if you can't find a suitable answer

Only return the intent. Do not return any other information.

Examples:
User question: Who directed Dune?
Intent: movies

User question: Which actors were in the movie Robocop?
Intent: movies

User question: What was the box office return of Predator?
Intent: movies

User question: What is the best song of all time?
Intent: songs

User question: How many Grammys did Michael Jackson win?
Intent: songs

User question: What are the current trending playlists?
Intent: songs

Intent:

A complete example of an intent plugin can be found on the Semantic Kernel GitHub repo.
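On disk, a prompt plugin is a directory containing the prompt text alongside a config.json that describes the function. A minimal config for this plugin might look like the following (the description, completion settings, and parameter values here are illustrative):

```json
{
  "schema": 1,
  "type": "completion",
  "description": "Determine the intent of the user's question",
  "completion": {
    "max_tokens": 500,
    "temperature": 0.0
  },
  "input": {
    "parameters": [
      {
        "name": "user_input",
        "description": "The user's question",
        "defaultValue": ""
      },
      {
        "name": "chat_history",
        "description": "The history of the conversation",
        "defaultValue": ""
      }
    ]
  }
}
```

A temperature of 0.0 is a sensible choice here, since we want a deterministic classification rather than creative output.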

Semantic Kernel Code (Python)

First, we need to initialize the kernel & load our chatbot.

kernel = sk.Kernel()
...
chat_function = kernel.register_semantic_function("ChatBot", "Chat", function_config)

Next, we will load in the Intent plugin from the file system.

intent_plugin = kernel.import_semantic_plugin_from_directory(plugins_directory + "/semantic", "IntentDetectionPlugin")

We can now run the chat loop and let the user enter their question.

try:
    user_input = input("User:> ")
    context.variables["user_input"] = user_input
except (KeyboardInterrupt, EOFError):
    print("\n\nExiting chat...")
    return False

if user_input == "exit":
    print("\n\nExiting chat...")
    return False

We will take the user_input and pass it to the Intent plugin to get a response on which intent (domain) the user’s question is related to.

context = await kernel.run(intent_plugin["AssistantIntent"], input_context=context)

We can switch on the returned intent to decide which index to load.

if context.result == "movies":
    chat_function = generate_new_chat_function_using_index(kernel=kernel, index_name="movies")
    return chat_function

if context.result == "songs":
    chat_function = generate_new_chat_function_using_index(kernel=kernel, index_name="songs")
    return chat_function

We should remove the old chat service and replace it with our new one, which has access to the selected index and uses Semantic Kernel’s built-in ability to query AI Search.

# replace the existing chat service (that wasn't pointed at a specific Azure AI Search index) with the new chat service
kernel.remove_chat_service("chat-gpt")
kernel.add_chat_service("chat-gpt", chat_service)

Finally, we can pass the user’s original question to the OpenAI service to allow it to query AI Search, get relevant documentation, then answer the question.

# get a new chat function that is pointed at a specific Azure AI Search index        
chat_function_for_index = setup_kernel_for_specific_index(context, kernel)

# answer the user's question using the specific index set up in the previous step
context = await kernel.run(chat_function_for_index, input_context=context)
