🔍 Extract Field Names Containing 'type' (Integer Fields Without Domain) from GDB Using ArcPy

  ⚙️ How the Script Works 🗂️ Geodatabase Setup The script starts by pointing to a target File Geodatabase (.gdb) and initializing a CSV ...

Tuesday, May 6, 2025

🗂️ Group and Convert CSV Files to Excel (Python)

 

🗂️ Group and Convert CSV Files to Excel (Python)

This script will:

  1. Group CSV files based on their naming convention (by prefix and category).

  2. Read each CSV file, which is delimited by | (pipe character).

  3. Combine data from each group into a single Excel file with separate sheets for each category.

  4. Save the final output as Excel files organized by their prefix.


🔑 Key Steps:

  • File Grouping: It identifies and groups CSV files by their prefix (e.g., "Area", "Basemap").

  • Merging Data: All CSVs in each group are combined into one Excel sheet for that category.

  • Excel Output: The final grouped data is written into Excel files, with each category in a separate sheet.


🧑‍💻 Python Script:

python
import os import pandas as pd # Define the input folder containing all CSV files input_folder = r"C:\Users\Testuser\OneDrive - GPC Global Information Solutions LLC\Projects\MPDA\DataModel\Schema_V2\CSV\Schema" output_folder = r"C:\Users\Testuser\OneDrive - GPC Global Information Solutions LLC\Projects\MPDA\DataModel\Schema_V2\CSV\Schema\SchemaExcel" # Create the output folder if it doesn't exist os.makedirs(output_folder, exist_ok=True) # Dictionary to group files by their prefix (e.g., "Area", "Basemap") grouped_files = {} # Group files based on prefix for csv_file in os.listdir(input_folder): if csv_file.endswith(".csv"): # Split filename to get prefix and category parts = csv_file.split("_") if len(parts) < 2: continue # Skip files that don't follow the naming convention prefix = parts[0] # E.g., "Area" or "Basemap" category = parts[1].split(".")[0] # E.g., "Domain" or "Schema" # Group files under their prefix and category if prefix not in grouped_files: grouped_files[prefix] = {} if category not in grouped_files[prefix]: grouped_files[prefix][category] = [] grouped_files[prefix][category].append(os.path.join(input_folder, csv_file)) # Create Excel files for each prefix group for prefix, categories in grouped_files.items(): output_file = os.path.join(output_folder, f"{prefix}.xlsx") writer = pd.ExcelWriter(output_file, engine='openpyxl') for category, files in categories.items(): combined_df = pd.DataFrame() # Initialize empty DataFrame for file in files: try: # Read CSV with "|" as the delimiter df = pd.read_csv(file, encoding='utf-8', delimiter='|', on_bad_lines='skip') combined_df = pd.concat([combined_df, df], ignore_index=True) except Exception as e: print(f"Error processing file {file}: {e}") # Write to a sheet named after the category combined_df.to_excel(writer, sheet_name=category[:31], index=False) # Sheet names max 31 chars # Save and close the Excel writer writer.close() print(f"Excel file created: {output_file}")

📋 Key Features:

  1. Grouping: Organizes CSV files based on the prefix (e.g., "Area", "Basemap").

  2. Excel Sheets: Each category within a prefix gets its own Excel sheet.

  3. Error Handling: It gracefully handles errors when reading malformed CSVs or reading issues with specific files.

  4. Efficient Output: All grouped data is consolidated into a clean, organized Excel workbook for each prefix.


✅ Conclusion:

This solution provides an efficient method for converting and consolidating multiple CSV files into structured Excel workbooks. It's perfect for organizing large datasets, especially when dealing with a variety of related CSV files that need to be grouped and categorized.