Getting Raw Data from Pivot Tables from Excel in Python: A Step-by-Step Guide

Introduction

Mahaboob Basha
2 min readJun 13, 2024

Pivot tables are a cornerstone of data analysis in Excel. They allow us to summarize and explore large datasets, revealing trends and patterns within. However, when working with these insightful tables programmatically in Python, a challenge arises: directly reading their structure and filters from Excel isn’t readily achievable.

Extracting raw data from Excel Pivot Tables in Python can be a daunting task, but with the right tools and understanding, it becomes a manageable process. This guide provides a step-by-step approach to extracting raw data from Pivot Tables, empowering you to leverage the powerful data manipulation capabilities of Python.

  1. Install the Necessary Libraries:

Begin by installing the required Python libraries. We primarily need ‘openpyxl’ for interacting with Excel files

2. Import the Libraries:

Begin by installing the required Python libraries. We primarily need ‘openpyxl’ for interacting with Excel files.

3. Load the Excel File Which Has Pivot Table Data:

Use ‘openpyxl’ to load the Excel file containing the Pivot Table:

workbook = openpyxl.load_workbook('your_excel_file.xlsx')
worksheet = workbook['Sheet_Name'] # Replace 'Sheet1' with your sheet name

4. Identify the Pivot Table:

Locate the Pivot Table within the Excel file. This usually involves navigating through the sheets and looking for the Pivot Table’s unique identifier.

for p in worksheet._pivots:
print(p.name)
pivot_name=p.name #This assigns the name of the desired Pivot Table to the variable

5. Extract Pivot Table Object from the worksheet:

pivot_table = [p for p in worksheet._pivots if p.name == pivot_name][0]

6. Extract Cache Field Information From a Pivot Table:

fields_map = {}
for field in pivot_table.cache.cacheFields:
if field.sharedItems.count > 0:

l = []
for f in field.sharedItems._fields:
try:
l += [f.v]
except AttributeError:
l += [""]
fields_map[field.name] = l

7. Extract Rows from Pivot Table Cache Into Dict:

column_names = [field.name for field in pivot_table.cache.cacheFields]
# Printing the column names
for field in pivot_table.cache.cacheFields:
if field.sharedItems.count > 0:
print(field.name)
rows = []
for record in pivot_table.cache.records.r:
# If some field in the record in missing, we replace it by NaN
record_values = [
field.v if not isinstance(field, Missing) else np.nan for field in record._fields
]

row_dict = {k: v for k, v in zip(column_names, record_values)}

# Shared fields are mapped as an Index, so we replace the field index by its value
for key in fields_map:
row_dict[key] = fields_map[key][row_dict[key]]

rows.append(row_dict)

8. Convert Dict to Pandas DataFrame:

df = pd.DataFrame.from_dict(rows)

9. Saving DataFrame Data Into CSV File:

df.to_csv("output.csv", index=False)

Please find below full code:

Summary:

This article describes a step-by-step technique for properly transforming data from a Pivot Table in an Excel file into a structured DataFrame using Pandas, which is then written to a CSV file for easy examination or additional processing.

--

--