As discussed in the previous post about Azure OpenAI, using the Retrieval Augmented Generation (RAG) pattern is a simple & effective way to enable Azure OpenAI to “chat with your data”. This pattern lets you search your data with Azure Cognitive Search (a search engine), retrieve relevant snippets of information from your data, then add those snippets to your prompts so the Azure OpenAI service can respond in a human-readable fashion.
This pattern is simple, effective and should always be the first thing you deploy. There are several excellent reference implementations posted on GitHub to help you do this.
- Azure-Samples/azure-search-openai-demo: A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure Cognitive Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. (github.com)
- Azure-Samples/azure-search-openai-demo-csharp: A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure Cognitive Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. (github.com)
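To make the pattern concrete, here is a minimal sketch of the core flow those samples implement, written in Python. The index name, field name and deployment name are placeholders I made up for illustration; the real samples add chunking, citations, streaming and much more.

```python
# Minimal RAG sketch: search the index, paste the top snippets into the prompt,
# and ask Azure OpenAI to answer grounded in those snippets.
# Index, field and deployment names below are placeholders, not from the samples.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="product-docs",                       # placeholder index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)
openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AOAI_ENDPOINT"],
    api_key=os.environ["AOAI_KEY"],
    api_version="2024-02-01",
)

question = "What temperature rating does the TrailLite sleeping bag have?"

# 1. Retrieve: pull the top snippets from Cognitive Search.
results = search_client.search(search_text=question, top=3)
snippets = "\n".join(doc["content"] for doc in results)   # assumes a 'content' field

# 2. Augment + generate: stuff the snippets into the prompt and call the chat model.
response = openai_client.chat.completions.create(
    model="gpt-4o",                                   # your chat deployment name
    messages=[
        {"role": "system", "content": "Answer using only the provided sources."},
        {"role": "user", "content": f"Sources:\n{snippets}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```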
This isn’t always enough
However, if you look at the implementation of the RAG pattern, you will notice that it can be limiting. There is only a single data source (Cognitive Search). Technically, Cognitive Search can index many data sources (PDFs, Word docs, etc.), but it is best at searching unstructured data. In addition, I have to retrieve the results from Cognitive Search myself in code, parse them, and append them to the prompt.
This leads to several questions:
- What if I want to use multiple data sources to retrieve the data necessary to answer the user’s question?
- What if I don’t know or don’t want to hard-code the order of operation for calling multiple data sources?
- What if I don’t want to write the parsing code for calling multiple data sources with disparate data formats?
- How can I let AI “orchestrate” the API calls to answer questions & pull together data that I couldn’t predict beforehand?
To solve these more complex issues, I need more than the RAG pattern.
More complex example
Pretend that you are building a chatbot to help answer common customer support questions for an outdoor sports equipment manufacturer. Here are some common data sources that customer support uses to answer customer questions:
- Order History – what products did the customer order in the past?
- Product Catalog – what products does our company sell (and what are their specifications)?
- Weather – what is the weather likely to be if we use the company’s outdoor sporting equipment for a camping trip?
These data sources could be pulled together to answer complex user questions, but they could be implemented in multiple systems, each with different authorization requirements, different languages, different protocols, etc.
We need to be able to explain to the OpenAI service when it needs to call out to our external data sources, what kind of input data to pass in & what kind of data to expect back. These explanations need to be in human-understandable terms, since that is what the large language model is using.
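One concrete way to express this is the function (tool) definition format of the Azure OpenAI chat completions API: a name, a plain-English description and a JSON schema for the parameters. The function and fields below are hypothetical, but they show the kind of “explanation” the model reads when deciding whether to call our order history system.

```python
# Illustrative tool definition for the chat completions API. The description and
# parameter schema are the human-understandable explanation the model uses when
# deciding whether (and how) to call our hypothetical order history data source.
get_order_history_tool = {
    "type": "function",
    "function": {
        "name": "get_order_history",                 # hypothetical function name
        "description": (
            "Look up the products a customer has purchased in the past. "
            "Use this when the customer refers to 'my' gear, e.g. 'my sleeping bag'."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {
                    "type": "string",
                    "description": "Unique ID of the signed-in customer.",
                },
            },
            "required": ["customer_id"],
        },
    },
}
```

(Semantic Kernel, discussed below, can generate definitions like this from plugin metadata so you don’t have to write them by hand.)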

Example complex user query
Will my sleeping bag work for my trip to Patagonia next month?
Let’s break this question down.
We need to know which sleeping bag “my sleeping bag” refers to. Just looking in the product catalog (as a simple RAG implementation would do) can’t answer this question. We need to look in the user’s order history to find out which sleeping bag they purchased in the past.
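For illustration, that first lookup might be a small native function like the sketch below. The function name, parameters and data are all hypothetical; a real implementation would call the order management system.

```python
# Hypothetical order history lookup: resolves "my sleeping bag" to a concrete
# product ID by checking what this customer actually bought.
def get_order_history(customer_id: str, category: str | None = None) -> list[dict]:
    """Return past orders for the customer, optionally filtered by category."""
    # In a real system this would call the order management API or database.
    orders = [
        {"product_id": "SB-0042", "name": "TrailLite 20F Sleeping Bag", "category": "sleeping-bags"},
        {"product_id": "TN-0117", "name": "Ridgeline 2P Tent", "category": "tents"},
    ]
    if category:
        orders = [o for o in orders if o["category"] == category]
    return orders
```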

We need to know the product specifications for the specific sleeping bag the user purchased. Since we looked in that user’s order history, we know the product ID, so we can look up its specifications (using Cognitive Search).
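With the product ID in hand, the specification lookup could be a filtered query against a Cognitive Search index. The index and field names here are assumptions for the example.

```python
# Look up the specifications for one specific product by filtering the search
# index on its ID. Field names (product_id, name, temperature_rating_f) are
# assumptions for this example and must be marked filterable in the index.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="product-catalog",                 # placeholder index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)

def get_product_specs(product_id: str) -> dict:
    results = search_client.search(
        search_text="*",
        filter=f"product_id eq '{product_id}'",   # OData filter syntax
        top=1,
    )
    for doc in results:
        return {"name": doc["name"], "temperature_rating_f": doc["temperature_rating_f"]}
    return {}
```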

In this case, we want to know the lowest temperature the customer is likely to encounter during their trip. The user specified that they are going to “Patagonia” “next month”, so we need to figure out where in the world Patagonia is (what are its GPS coordinates). Of course, Patagonia is a huge area, so we are making an assumption about the exact location. A good chatbot would ask for more detail about where exactly the user was planning on camping.
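The weather step could geocode the destination and then ask a forecast service for the expected low. The sketch below uses the free Open-Meteo endpoints purely as a stand-in; any weather API (or an internal service) would do, and a real implementation would need climate data rather than a short-range forecast to answer “next month”.

```python
# Illustrative weather lookup: geocode the place name, then fetch the expected
# minimum temperature. Open-Meteo is used here only as a stand-in service.
import requests

def get_expected_low_f(place: str) -> float:
    geo = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": place, "count": 1},
        timeout=10,
    ).json()["results"][0]                         # naive: takes the first match
    forecast = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": geo["latitude"],
            "longitude": geo["longitude"],
            "daily": "temperature_2m_min",
            "temperature_unit": "fahrenheit",
            "forecast_days": 7,                    # a real app would need climate data for "next month"
        },
        timeout=10,
    ).json()
    return min(forecast["daily"]["temperature_2m_min"])
```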

Now that we know the general order of operations, we need to somehow get AI to generate a plan to call the various data sources listed above. We need it to call them in the right order, parse each output (since each API returns different data and doesn’t know about the other APIs) and finally compose a response.
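Hand-wired, the whole flow looks something like the sketch below (reusing the hypothetical functions and the Azure OpenAI client from the earlier sketches). This is exactly the ordering and glue code I don’t want to write by hand for every possible question.

```python
# What we want the AI to figure out on its own: call the data sources in the
# right order, glue their disparate outputs together, and compose an answer.
def answer_sleeping_bag_question(customer_id: str, destination: str) -> str:
    bags = get_order_history(customer_id, category="sleeping-bags")   # which bag is "my" bag?
    specs = get_product_specs(bags[0]["product_id"])                  # its temperature rating
    expected_low = get_expected_low_f(destination)                    # expected low at the destination

    facts = (
        f"Customer owns: {specs['name']} rated to {specs['temperature_rating_f']}F. "
        f"Expected low in {destination}: {expected_low}F."
    )
    # Finally, let the chat model turn the gathered facts into a friendly answer.
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful outdoor gear support agent."},
            {"role": "user", "content": f"{facts}\nWill this sleeping bag work for the trip?"},
        ],
    )
    return response.choices[0].message.content
```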

Enter Semantic Kernel
Semantic Kernel is an open-source SDK (C#, Java, Python, JavaScript) that simplifies embedding the Azure OpenAI service in your application. In the simplest case, you point it at your Azure OpenAI endpoints and it lets you call the “chat” and “embedding” endpoints without writing explicit REST API calls to the service yourself.
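For example, with the Python SDK the basic setup looks roughly like this. The exact API surface has shifted between Semantic Kernel releases, so treat it as a sketch against a recent 1.x-style version rather than gospel.

```python
# Minimal Semantic Kernel setup (Python SDK, recent 1.x-style API): register an
# Azure OpenAI chat service and send a prompt without hand-rolling REST calls.
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

kernel = Kernel()
kernel.add_service(
    AzureChatCompletion(
        service_id="chat",
        deployment_name="gpt-4o",                  # your chat deployment name
        endpoint=os.environ["AOAI_ENDPOINT"],
        api_key=os.environ["AOAI_KEY"],
    )
)

async def main() -> None:
    result = await kernel.invoke_prompt("Recommend a sleeping bag for cold weather.")
    print(result)

asyncio.run(main())
```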
Taking this to the next level, the Semantic Kernel SDK includes the concept of a “planner”. The planner orchestrates multiple calls to the Azure OpenAI service to build a plan for answering the user’s question. You can then execute the plan.
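A planner invocation might look roughly like the sketch below, assuming plugins like the ones described next have already been registered on the kernel. Planner class names and signatures have changed across releases (and newer releases steer you toward automatic function calling instead), so this is an assumption-heavy sketch, not the definitive API.

```python
# Rough planner sketch (API varies by Semantic Kernel release). The planner asks
# the model to compose the registered plugin functions into a step-by-step plan,
# executes the steps, and returns a final answer.
from semantic_kernel.planners import FunctionCallingStepwisePlanner

async def ask(question: str) -> str:
    planner = FunctionCallingStepwisePlanner(service_id="chat")
    result = await planner.invoke(kernel, question)
    return result.final_answer
```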
Semantic Kernel also lets you create “plugins”, which are calls out to native code. These native plugins can do mathematical calculations, read/write files or make API calls.
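A native plugin is just a class whose methods carry a name and a description the model can read. The sketch below wraps the hypothetical order history lookup from earlier and registers it on the kernel created above; again, this assumes a recent Python SDK.

```python
# Native plugin sketch: the decorator's name/description metadata is what the
# planner (or function calling) uses to decide when to call this code.
from typing import Annotated
from semantic_kernel.functions import kernel_function

class OrderHistoryPlugin:
    @kernel_function(
        name="get_order_history",
        description="Look up the products a customer has purchased in the past.",
    )
    def get_order_history(
        self,
        customer_id: Annotated[str, "Unique ID of the signed-in customer."],
    ) -> Annotated[str, "JSON list of the customer's past orders."]:
        # In a real system this would call the order management API.
        return '[{"product_id": "SB-0042", "name": "TrailLite 20F Sleeping Bag"}]'

kernel.add_plugin(OrderHistoryPlugin(), plugin_name="orders")
```

The description strings are doing the heavy lifting here, which is exactly the prompt engineering work described next.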
Much of the work of building this kind of application is in the “prompt engineering”: explaining to Semantic Kernel when it should call your plugin & what parameters to pass in.
Here is the link to the Semantic Kernel GitHub repo: github.com/microsoft/semantic-kernel.
In part 2, I will explain how I built a sample app that does this.