In data science and machine learning, Python has become the preferred language thanks to its extensive range of libraries and community support. One recent Python library, PandasAI, is gaining attention among data practitioners. This generative AI Python library enhances the popular pandas library by incorporating large language models (LLMs).
In this blog, we will explore PandasAI, discussing its use cases and technical specifics, and we’ll also go through a mini project to demonstrate how it actually works. Let’s get started with WeCloudData!
Why Generative AI with Pandas?
For years, pandas has been essential for data manipulation in Python. However, as datasets become more complex, there is a greater need for automation and intelligence in analysis. This is where Python libraries for generative AI come in. They allow for:
- Automatically generating code to answer data-related questions.
- Creating summaries of data in natural language.
- Automating exploratory data analysis (EDA).
- Improving decision-making through conversational interfaces.
By combining pandas with LLMs, PandasAI bridges the gap between generative AI libraries in Python and traditional data science tools.
What is PandasAI?
PandasAI is a Python generative AI library built on top of [pandas](<https://pandas.pydata.org/>). It lets users query data in plain English, relying on LLM backends such as OpenAI's GPT models, Hugging Face Transformers, and other platforms to interpret queries and provide responses.
Key Features:
- Natural Language Queries: Rather than writing complex Python functions, you can simply ask questions like “What is the average sales by region?”.
- Flexible LLM Integration: It supports various generative AI Python libraries for different backends.
- Code Transparency: It displays the underlying code created by the AI.
- Seamless Pandas Integration: It works directly with your existing DataFrames.

PandasAI Technical Walkthrough
Now let’s get technical and build a mini-project using PandasAI.
Installation
The first step is to install the required library packages. We’ll also need an LLM backend, such as OpenAI:
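Assuming the library is published on PyPI as `pandasai` (check the project documentation for the version and backend extras you need), installation typically looks like `pip install pandasai pandas`. As a quick sanity check, the sketch below reports which of the two packages are still missing from your environment. The helper name `missing_packages` is made up for this example.

```python
import importlib.util

# Packages this walkthrough relies on (import names assumed to match PyPI names).
REQUIRED = ["pandas", "pandasai"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```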

Environment Setup
The setup code breaks down as follows:
- import pandas as pd: Loads the pandas library for handling dataframes.
- from pandasai import SmartDataframe: Imports PandasAI’s smart dataframe wrapper.
- OpenAI: Specifies OpenAI’s LLM backend for natural language processing.
- data = {…}: Creates a simple dataset with regions, sales, and profit.
- pd.DataFrame(data): Converts the dictionary into a pandas DataFrame.
- SmartDataframe(df, config={"llm": llm}): Wraps the DataFrame with PandasAI, allowing you to query it using natural language.
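Put together, the steps above might look like the following sketch. It assumes the PandasAI 1.x/2.x-style API (`SmartDataframe` and `pandasai.llm.OpenAI`); newer releases may expose a different interface. The sample values, the helper `make_smart_dataframe`, and the placeholder key are made up for illustration.

```python
import pandas as pd

# Small illustrative dataset; the numbers are invented for this sketch.
data = {
    "Region": ["North", "South", "East", "West"],
    "Sales": [25000, 18000, 32000, 21000],
    "Profit": [5000, 3000, 8000, 4000],
}
df = pd.DataFrame(data)


def make_smart_dataframe(frame, api_token):
    """Wrap a DataFrame with PandasAI (assumes the 1.x/2.x SmartDataframe API)."""
    # Imported lazily so the rest of the snippet runs without pandasai installed.
    from pandasai import SmartDataframe
    from pandasai.llm import OpenAI

    llm = OpenAI(api_token=api_token)
    return SmartDataframe(frame, config={"llm": llm})


# Usage (needs a real API key, so it is left commented out):
# sdf = make_smart_dataframe(df, api_token="YOUR_OPENAI_API_KEY")
```

Keeping the PandasAI imports inside the helper means the plain pandas part still works on a machine where the LLM backend is not configured yet.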

We now have a small dataset, so let’s move on to the next step: using PandasAI to query the data.
Querying Data with Natural Language
When you run sdf.chat(…), PandasAI takes these natural language queries and passes them to the connected LLM backend (like OpenAI GPT). The model interprets the query, then generates the appropriate Python code (pandas operations, visualizations, etc.) to answer it. PandasAI executes this code securely on your DataFrame, retrieves the result, and returns it in human-readable form. This process lets you interact with data conversationally while still using pandas under the hood.
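To make that loop concrete, here is a toy sketch of the question, code, execute, result cycle. It is not PandasAI's actual implementation: `fake_llm` is a hard-coded stand-in for the model, and the local `chat` function only mimics the idea. With a configured SmartDataframe, the real call would simply be `sdf.chat("What is the average sales?")`.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"],
    "Sales": [25000, 18000, 32000, 21000],
})


def fake_llm(question):
    """Stand-in for the LLM: returns pandas code for one hard-coded question."""
    if "average sales" in question.lower():
        return "result = df['Sales'].mean()"
    raise ValueError("This stub only understands one question.")


def chat(frame, question):
    """Mimic the question -> code -> execute -> result loop."""
    code = fake_llm(question)      # 1. the model writes pandas code
    namespace = {"df": frame}
    exec(code, {}, namespace)      # 2. the code runs against the DataFrame
    return namespace["result"]     # 3. the result goes back to the user


answer = chat(df, "What is the average sales?")
print(answer)
```

Because generated code is executed on your machine, it is worth reviewing what the model produced before trusting the answer, especially on sensitive data.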

Mini Project: Amazon Sales Insights with PandasAI
Let’s go beyond simple queries and build a sales insights mini-project powered by PandasAI. We’ll use open-source Amazon sales data for this project. You can find the dataset here.
Step 1: Load Dataset
The first step is to load the Amazon sales CSV into a pandas DataFrame. This changes the raw tabular data into a format that we can easily work with. After loading the data, we wrap it with SmartDataframe from PandasAI. This wrapper is important because it allows us to interact with the dataset using natural language instead of writing complex queries. For example, instead of using df.groupby("Category")["Amount"].sum(), we can simply ask, “Which category has the highest sales?”
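A minimal loading sketch follows. The file name `amazon_sales.csv` and the columns other than `Category`, `Amount`, `Date`, and `B2B` are assumptions about the dataset; the in-memory sample rows are made up so the snippet runs without the real file.

```python
import io

import pandas as pd


def load_sales(source):
    """Read the sales CSV and parse the Date column (column names are assumed)."""
    frame = pd.read_csv(source)
    frame["Date"] = pd.to_datetime(frame["Date"], errors="coerce")
    return frame


# Tiny in-memory stand-in for the real file, with invented rows.
sample_csv = io.StringIO(
    "Order ID,Date,Category,Size,Qty,Amount,B2B\n"
    "A1,2022-04-30,Set,M,1,650.0,False\n"
    "A2,2022-05-02,Kurta,L,2,800.0,True\n"
)
df = load_sales(sample_csv)
print(df.dtypes)

# With the real file and a configured LLM (both names are placeholders):
# df = load_sales("amazon_sales.csv")
# sdf = SmartDataframe(df, config={"llm": llm})
```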

Step 2: Ask Analytical Questions
A common question for e-commerce businesses is which categories generate the most revenue. Normally, you would need to group the data by category and sum the sales amounts. With PandasAI, you can just ask in plain English, and it will create and run the required pandas code automatically.
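As a sketch, the question and a hand-written pandas equivalent are shown side by side below, which is a handy way to sanity-check the LLM's answer. The sample rows are invented, and the `sdf.chat` call assumes a SmartDataframe configured as in the setup step.

```python
import pandas as pd

QUESTION = "Which category has the highest sales?"
# With PandasAI this would be: sdf.chat(QUESTION)


def revenue_by_category(frame):
    """Total Amount per Category, highest first."""
    return frame.groupby("Category")["Amount"].sum().sort_values(ascending=False)


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Category": ["Set", "Kurta", "Set", "Top"],
    "Amount": [650.0, 400.0, 700.0, 300.0],
})
totals = revenue_by_category(df)
print(totals)
print("Top category:", totals.index[0])
```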

Behind the scenes, PandasAI sends this request to the LLM, which generates the group-by and aggregation code, runs it on your dataset, and returns the result. This helps you quickly find your most profitable product categories.
Step 3: Identify Monthly Sales Trends
Understanding how sales change over time is essential for planning campaigns, restocking, and forecasting. PandasAI lets you ask about this directly. Instead of you writing time-series code, the LLM typically picks the Date column, groups sales by month, and creates a line chart. The output gives you a clear visual of seasonality, such as spikes during Black Friday or Prime Day, and shows any slow months that may need attention.
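The kind of code the LLM would be expected to generate for a monthly trend can be sketched by hand as follows. The question wording and the sample rows are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

QUESTION = "Plot the monthly sales trend as a line chart."
# With PandasAI this would be: sdf.chat(QUESTION)


def monthly_sales(frame):
    """Sum Amount per calendar month of the Date column."""
    dates = pd.to_datetime(frame["Date"])
    return frame.groupby(dates.dt.to_period("M"))["Amount"].sum()


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Date": ["2022-04-05", "2022-04-20", "2022-05-03", "2022-06-11"],
    "Amount": [500.0, 250.0, 900.0, 400.0],
})
trend = monthly_sales(df)
print(trend)
# trend.plot(kind="line") would draw the chart if matplotlib is installed.
```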

Step 4: Analyze Product Sizes and Variants
Amazon sellers often have products in multiple sizes and variants. To find out which sizes are selling best, we can ask PandasAI to rank sizes by quantity sold.
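One possible phrasing of the question, together with an equivalent pandas ranking you can use to verify the result, is sketched below. The column names `Size` and `Qty` are assumptions about the dataset, and the rows are made up.

```python
import pandas as pd

QUESTION = "Which sizes sell the most units? Rank them by quantity sold."
# With PandasAI this would be: sdf.chat(QUESTION)


def units_by_size(frame):
    """Total quantity sold per size, best seller first."""
    return frame.groupby("Size")["Qty"].sum().sort_values(ascending=False)


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Size": ["M", "L", "M", "S", "L", "M"],
    "Qty": [1, 2, 1, 1, 1, 3],
})
ranking = units_by_size(df)
print(ranking)
```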

This produces a ranked list of sizes based on quantity sold. With this information, businesses can manage inventory better by focusing on the sizes that sell the fastest, cutting down on stockouts and excess storage costs.
Step 5: Evaluate Courier and Fulfillment Performance
On-time delivery is crucial for customer satisfaction. With PandasAI, we can ask how orders break down by delivery status for each courier or fulfillment method.
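A sketch of such a question and a manual cross-check is shown here. The column name `Courier Status` and its values are assumptions about the dataset; adjust them to whatever your CSV actually contains.

```python
import pandas as pd

QUESTION = "What percentage of orders fall under each courier status?"
# With PandasAI this would be: sdf.chat(QUESTION)


def status_share(frame, column="Courier Status"):
    """Percentage of orders per status value, rounded to one decimal."""
    return (frame[column].value_counts(normalize=True) * 100).round(1)


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Courier Status": ["Shipped", "Shipped", "Cancelled", "Shipped", "Unshipped"],
})
share = status_share(df)
print(share)
```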

This analysis shows how well logistics partners are performing. If a large percentage of orders are delayed or returned, sellers can quickly pinpoint problems with specific courier services or fulfillment methods.
Step 6: Compare B2B vs Retail Sales
Amazon supports both retail (B2C) and wholesale (B2B) sales. These two channels often behave differently. To compare them, you can simply ask PandasAI for total revenue and average order value per channel.
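The comparison can be sketched by hand as below, which helps confirm that the generated code groups and averages the way you expect. The `Order ID` column, the boolean type of `B2B`, and the sample rows are assumptions; average order value is computed here as revenue divided by the number of distinct orders.

```python
import pandas as pd

QUESTION = "Compare total revenue and average order value for B2B vs retail orders."
# With PandasAI this would be: sdf.chat(QUESTION)


def channel_comparison(frame):
    """Total revenue, distinct orders, and average order value per channel."""
    channel = frame["B2B"].map({True: "B2B", False: "Retail"}).rename("Channel")
    grouped = frame.groupby(channel).agg(
        total_revenue=("Amount", "sum"),
        orders=("Order ID", "nunique"),
    )
    grouped["aov"] = grouped["total_revenue"] / grouped["orders"]
    return grouped


# Made-up rows; order A4 has two line items.
df = pd.DataFrame({
    "Order ID": ["A1", "A2", "A3", "A4", "A4"],
    "B2B": [False, False, True, True, True],
    "Amount": [500.0, 700.0, 1200.0, 800.0, 400.0],
})
summary = channel_comparison(df)
print(summary)
```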

PandasAI groups sales by the B2B flag, calculates total revenue, and figures out average order value (AOV). This allows sellers to see whether their wholesale operations or individual sales are more profitable.
Step 7: Generate an Executive Summary
Lastly, you might want a high-level report for stakeholders. With PandasAI, this is as simple as asking for an executive summary of the dataset.
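Since an LLM-written summary can misstate numbers, it helps to compute a few headline facts yourself and compare. The sketch below shows one possible prompt plus a small fact sheet; the helper `summary_facts`, the column names, and the rows are illustrative assumptions, and `B2B` is assumed to be a boolean column.

```python
import pandas as pd

QUESTION = "Write a short executive summary of this sales data for stakeholders."
# With PandasAI this would be: sdf.chat(QUESTION)


def summary_facts(frame):
    """Collect a few headline numbers to cross-check an LLM-written summary."""
    by_category = frame.groupby("Category")["Amount"].sum()
    total = float(frame["Amount"].sum())
    b2b_total = float(frame.loc[frame["B2B"], "Amount"].sum())
    return {
        "total_revenue": total,
        "order_count": int(frame["Order ID"].nunique()),
        "top_category": by_category.idxmax(),
        "b2b_revenue_share_pct": round(b2b_total / total * 100, 1),
    }


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Order ID": ["A1", "A2", "A3", "A4"],
    "Category": ["Set", "Kurta", "Set", "Top"],
    "Amount": [600.0, 400.0, 700.0, 300.0],
    "B2B": [False, False, True, False],
})
facts = summary_facts(df)
print(facts)
```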

Here, the LLM organizes your dataset into a plain-English report that covers key categories, sales trends, delivery performance, and channel comparisons.
Gen AI Python Libraries Use Cases
Here are the major use cases where PandasAI and other Python libraries for generative AI shine:
- Customer Insights: Segment customers by purchase behavior automatically.
- Financial Analysis: Generate quick portfolio insights from stock market data.
- Healthcare Data: Summarize patient trends from electronic health records.
- Marketing Analytics: Identify top-performing campaigns with natural language queries.
- Supply Chain: Predict bottlenecks and generate visualizations on demand.
Bringing Generative AI into Data Analysis
PandasAI is not just another library in the long list of Gen AI Python libraries. It is a revolutionary generative AI library that Python developers can use to add natural language interaction to their data workflows. Whether you are a data scientist, analyst, or a business professional, PandasAI allows you to query, analyze, and visualize data with remarkable ease.
In the larger ecosystem of generative AI libraries in Python, PandasAI stands out because it addresses a genuine issue: making data analysis more intuitive and accessible. If you have ever asked, “Which Python library is used for generative AI?” or “What is the best AI library for Python?”, PandasAI should definitely be on your shortlist.
So go ahead, try it out, explore the Gen AI Python libraries examples, and see how PandasAI can change the way you work with data.
Learn and Grow With WeCloudData
At WeCloudData, we are committed to bridging the gap between AI and education by offering cutting-edge AI training programs, data science bootcamps, computer vision bootcamps, natural language processing, and machine learning courses. As tools like PandasAI show, the future of data analysis is moving toward natural language interaction and AI-driven insights. Whether you’re an educator looking to integrate AI into teaching or a student eager to develop AI skills, WeCloudData provides expert-led courses that prepare you to introduce innovations like PandasAI in real-world projects.
Explore AI-driven learning solutions today!
Visit WeCloudData to start your journey into AI-powered education.