Getting Raw Data from Pivot Tables from Excel in Python: A Step-by-Step Guide

Introduction

2 min readJun 13, 2024

Pivot tables are a cornerstone of data analysis in Excel. They allow us to summarize and explore large datasets, revealing trends and patterns within. However, when working with these insightful tables programmatically in Python, a challenge arises: directly reading their structure and filters from Excel isn’t readily achievable.

Extracting raw data from Excel Pivot Tables in Python can be a daunting task, but with the right tools and understanding, it becomes a manageable process. This guide provides a step-by-step approach to extracting raw data from Pivot Tables, empowering you to leverage the powerful data manipulation capabilities of Python.

Install the Necessary Libraries:

Begin by installing the required Python libraries. We primarily need ‘openpyxl’ for interacting with Excel files

2. Import the Libraries:

Begin by installing the required Python libraries. We primarily need ‘openpyxl’ for interacting with Excel files.

3. Load the Excel File Which Has Pivot Table Data:

Use ‘openpyxl’ to load the Excel file containing the Pivot Table:

workbook = openpyxl.load_workbook('your_excel_file.xlsx')
worksheet = workbook['Sheet_Name'] # Replace 'Sheet1' with your sheet name

4. Identify the Pivot Table:

Locate the Pivot Table within the Excel file. This usually involves navigating through the sheets and looking for the Pivot Table’s unique identifier.

for p in worksheet._pivots:
    print(p.name)
    pivot_name=p.name #This assigns the name of the desired Pivot Table to the variable

5. Extract Pivot Table Object from the worksheet:

pivot_table = [p for p in worksheet._pivots if p.name == pivot_name][0]

6. Extract Cache Field Information From a Pivot Table:

fields_map = {}
for field in pivot_table.cache.cacheFields:
    if field.sharedItems.count > 0:
      
        l = []
        for f in field.sharedItems._fields:
            try:
                l += [f.v]
            except AttributeError:
                l += [""]
        fields_map[field.name] = l

7. Extract Rows from Pivot Table Cache Into Dict:

column_names = [field.name for field in pivot_table.cache.cacheFields]
# Printing the column names
for field in pivot_table.cache.cacheFields:
    if field.sharedItems.count > 0:
        print(field.name)
rows = []
for record in pivot_table.cache.records.r:
    # If some field in the record in missing, we replace it by NaN
    record_values = [
        field.v if not isinstance(field, Missing) else np.nan for field in record._fields
    ]

    row_dict = {k: v for k, v in zip(column_names, record_values)}

    # Shared fields are mapped as an Index, so we replace the field index by its value
    for key in fields_map:
        row_dict[key] = fields_map[key][row_dict[key]]

    rows.append(row_dict)

8. Convert Dict to Pandas DataFrame:

df = pd.DataFrame.from_dict(rows)

9. Saving DataFrame Data Into CSV File:

df.to_csv("output.csv", index=False)

Please find below full code:

pivot_to_unpivot/pivot_to_unpivot.py at main · bashamsc/pivot_to_unpivot