
Wednesday, May 7, 2025

🗂️ Batch Processing and Copying Feature Classes from CSV (ArcPy Script)

 


This Python script uses ArcPy to automate copying feature classes from various spatial data sources into theme-based file geodatabases, driven by metadata from a CSV file. The script validates paths, sanitizes names, skips non-spatial or unsupported data, and adds metadata fields (OriginalName, OriginalPath, DIA_ID, DIA_Date) to every copied feature class.


🔑 Key Features:

  • Sanitize Feature Class Names: Automatically removes unwanted characters from feature class names.

  • Validate Feature Class Paths: Checks if paths are accessible, not exceeding the Windows path limit, and whether the data format is supported.

  • Spatial Data Check: Ensures that only spatial data is copied (non-spatial data is skipped).

  • Add Metadata: Adds relevant metadata fields like OriginalName, OriginalPath, DIA_ID, and DIA_Date to the feature classes.

  • Handle Errors Gracefully: Catches and logs any errors during the copy process, providing detailed messages for debugging.
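The sanitization rule from the first feature can be tried on its own. This is the same regular expression the script uses: it keeps ASCII letters, digits, underscores, and Arabic characters (U+0600–U+06FF), and replaces everything else with underscores (the sample names below are made up for illustration):

```python
import re

def sanitize_name(name):
    # Keep letters, digits, underscores, and Arabic characters (U+0600-U+06FF);
    # replace everything else (spaces, dots, parentheses, ...) with '_'
    return re.sub(u'[^a-zA-Z0-9_\u0600-\u06FF]', u'_', name)

print(sanitize_name(u'Land Use (2020)'))  # Land_Use__2020_
print(sanitize_name(u'roads-main.v2'))    # roads_main_v2
```

Note the `u` prefix on the pattern: without it, Python 2 would not interpret `\u0600-\u06FF` as a character range, and Arabic names would be mangled.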


🧑‍💻 Python Script:

import arcpy
import pandas as pd
import os
import re
import sys
from datetime import datetime

# Ensure the default encoding is set to UTF-8 (Python 2 / ArcMap only)
reload(sys)
sys.setdefaultencoding('utf-8')

# Define paths (backslashes doubled so they are not read as escape sequences)
csv_file = u'C:\\MPDA\\GDB_Extraction\\Environmental_Vector.csv'
output_folder = u'C:\\MPDA\\GDB_Extraction\\Enviromental vector'

# List to capture failed paths
failed_paths = []

# Function to sanitize feature class names
# (u prefix so the Arabic range \u0600-\u06FF is interpreted correctly)
def sanitize_name(name):
    return re.sub(u'[^a-zA-Z0-9_\u0600-\u06FF]', u'_', name)

# Function to check if a field exists in a feature class
def field_exists(feature_class, field_name):
    if arcpy.Exists(feature_class):
        fields = [f.name for f in arcpy.ListFields(feature_class)]
        return field_name in fields
    else:
        print(u"Feature class does not exist: {}".format(feature_class))
        return False

# Function to parse a date string against multiple formats
def parse_date(date_str):
    formats = ['%m/%d/%Y', '%Y/%m/%d', '%d/%m/%Y']
    for fmt in formats:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None

# Function to check if a path is valid and supported
def is_supported_path(path):
    if len(path) > 260:
        print(u"Path exceeds the maximum length allowed by Windows: {}".format(path))
        failed_paths.append(path)
        return False
    if not arcpy.Exists(path):
        print(u"Path does not exist or is inaccessible: {}".format(path))
        failed_paths.append(path)
        return False
    if path.lower().endswith('.mdb'):
        print(u"Unsupported file format (MDB): {}".format(path))
        failed_paths.append(path)
        return False
    return True

# Function to check if the feature class contains spatial data
def contains_spatial_data(feature_class):
    desc = arcpy.Describe(feature_class)
    if hasattr(desc, "shapeType"):
        return True
    else:
        print(u"Non-spatial data encountered: {}".format(feature_class))
        failed_paths.append(feature_class)
        return False

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(csv_file, encoding='utf-8')

# Print the column names to verify they exist
print(u"Column names in CSV: {}".format(list(df.columns)))

# Strip any leading/trailing spaces from column names
df.columns = df.columns.str.strip()

# Group the rows in the CSV by the 'Theme' field
grouped = df.groupby('Theme')

# Iterate over each unique theme
for theme, group in grouped:
    # Create a sanitized name for the GDB
    theme_sanitized = sanitize_name(theme)
    theme_gdb = os.path.join(output_folder, u"{}.gdb".format(theme_sanitized))

    # Ensure the output GDB for the theme exists
    if not arcpy.Exists(theme_gdb):
        arcpy.CreateFileGDB_management(output_folder, u"{}.gdb".format(theme_sanitized))

    # Iterate through each row in the grouped DataFrame for this theme
    for index, row in group.iterrows():
        input_path = row['Shape']
        featureclass_name = os.path.basename(input_path).split('.')[0]
        sanitized_name = sanitize_name(featureclass_name)
        output_featureclass = os.path.join(theme_gdb, sanitized_name)

        # Check if the path is valid and supported
        if is_supported_path(input_path):
            # Check if the feature class contains spatial data
            if contains_spatial_data(input_path):
                # Check for an existing feature class with the same name and create a unique name
                counter = 1
                while arcpy.Exists(output_featureclass):
                    output_featureclass = os.path.join(theme_gdb, u"{}_{}".format(sanitized_name, counter))
                    counter += 1

                # Debugging output
                print(u"Copying from {} to {}".format(input_path, output_featureclass))
                try:
                    # Copy the feature class to the output GDB
                    arcpy.CopyFeatures_management(input_path, output_featureclass)
                    print(u"Successfully copied feature class to {}".format(output_featureclass))
                except arcpy.ExecuteError:
                    print(u"Failed to copy feature class to {}".format(output_featureclass))
                    print(arcpy.GetMessages(2))
                    failed_paths.append(input_path)
                    continue

                # Check if the copied feature class exists before proceeding
                if arcpy.Exists(output_featureclass):
                    # Add fields for original name, path, DIA_ID, and DIA_Date if they do not already exist
                    if not field_exists(output_featureclass, "OriginalName"):
                        arcpy.AddField_management(output_featureclass, "OriginalName", "TEXT", field_length=255)
                    if not field_exists(output_featureclass, "OriginalPath"):
                        arcpy.AddField_management(output_featureclass, "OriginalPath", "TEXT", field_length=1000)
                    if not field_exists(output_featureclass, "DIA_ID"):
                        arcpy.AddField_management(output_featureclass, "DIA_ID", "TEXT")
                    if not field_exists(output_featureclass, "DIA_Date"):
                        arcpy.AddField_management(output_featureclass, "DIA_Date", "DATE")

                    # Update the new fields with the original name, path, and other CSV data
                    with arcpy.da.UpdateCursor(output_featureclass, ["OriginalName", "OriginalPath", "DIA_ID", "DIA_Date"]) as cursor:
                        for cursor_row in cursor:
                            cursor_row[0] = featureclass_name
                            cursor_row[1] = row['File Path']  # OriginalPath takes the 'File Path' column
                            # DIA_ID is a TEXT field, so cast the ID value to a string
                            cursor_row[2] = str(row['ID']) if 'ID' in df.columns else None
                            if 'DIA_Date' in df.columns and pd.notnull(row['DIA_Date']):
                                date_value = parse_date(row['DIA_Date'])
                                if date_value:
                                    cursor_row[3] = date_value
                                else:
                                    print(u"Error parsing date for row {}: Invalid date format '{}'".format(index, row['DIA_Date']))
                                    cursor_row[3] = None
                            else:
                                cursor_row[3] = None
                            cursor.updateRow(cursor_row)
                else:
                    print(u"Copied feature class does not exist: {}".format(output_featureclass))
                    failed_paths.append(input_path)
            else:
                print(u"Skipping non-spatial data: {}".format(input_path))
        else:
            print(u"Skipping unsupported or inaccessible path: {}".format(input_path))

# Print the list of failed paths
if failed_paths:
    print(u"\nThe following file paths failed during the process:")
    for path in failed_paths:
        print(path)
else:
    print(u"\nAll file paths were processed successfully.")

print(u"Process completed successfully.")

🧑‍💻 How It Works:

  1. CSV Input: The script reads metadata from a CSV file.

  2. Feature Class Copying: It copies the feature classes from their input paths to output GDBs, ensuring data validity.

  3. Field Management: Adds necessary fields and populates them with metadata from the CSV file.

  4. Error Handling: Logs failed paths for further review.
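The name-collision handling in step 2 can be sketched without ArcPy. Here a plain set stands in for `arcpy.Exists`, and the geodatabase and feature class names are hypothetical:

```python
import os

def unique_output_name(taken, gdb, base):
    # Append _1, _2, ... until the candidate path is unused,
    # mirroring the `while arcpy.Exists(...)` loop in the script
    candidate = os.path.join(gdb, base)
    counter = 1
    while candidate in taken:
        candidate = os.path.join(gdb, '{}_{}'.format(base, counter))
        counter += 1
    return candidate

# 'Roads' and 'Roads_1' already exist, so the next free name is 'Roads_2'
taken = {os.path.join('Theme.gdb', 'Roads'), os.path.join('Theme.gdb', 'Roads_1')}
print(unique_output_name(taken, 'Theme.gdb', 'Roads'))
```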

    Sample Input CSV:

    Shape                            | File Path                        | Theme         | ID   | DIA_Date
    C:\Data\Environmental\Area1.shp  | C:\Data\Environmental\Area1.shp  | Environmental | 1001 | 01/05/2020
    C:\Data\Environmental\Area2.shp  | C:\Data\Environmental\Area2.shp  | Environmental | 1002 | 03/12/2021
    C:\Data\Environmental\Water.shp  | C:\Data\Environmental\Water.shp  | Water         | 1003 | 11/22/2022
    C:\Data\Environmental\Soil.shp   | C:\Data\Environmental\Soil.shp   | Environmental | 1004 | 06/14/2020
    C:\Data\WaterResources\River.shp | C:\Data\WaterResources\River.shp | Water         | 1005 | 08/18/2021
    C:\Data\WaterResources\Lake.shp  | C:\Data\WaterResources\Lake.shp  | Water         | 1006 | 09/01/2019

    CSV Column Explanation:

    • Shape: The full path to the input feature class (Shapefile or other formats).

    • File Path: The location of the file on your system; the script copies this value verbatim into the OriginalPath field of each output feature class.

    • Theme: The thematic grouping or category for the feature class (e.g., Environmental, Water, etc.).

    • ID: A unique identifier for the feature class, which could correspond to specific metadata or other cataloging information.

    • DIA_Date: The date associated with the feature class data (e.g., the date it was created, modified, or captured).
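The per-theme grouping driven by the Theme column can be previewed with pandas alone. The DataFrame below is a hypothetical in-memory stand-in for the sample CSV:

```python
import pandas as pd

# Hypothetical rows mirroring the sample CSV above
df = pd.DataFrame({
    'Shape': [r'C:\Data\Environmental\Area1.shp',
              r'C:\Data\Environmental\Water.shp',
              r'C:\Data\WaterResources\River.shp'],
    'Theme': ['Environmental', 'Water', 'Water'],
    'ID': [1001, 1003, 1005],
})
df.columns = df.columns.str.strip()  # same column-name cleanup the script applies

# The script would create one file GDB per unique theme
for theme, group in df.groupby('Theme'):
    print('{}: {} feature class(es)'.format(theme, len(group)))
```

This prints one line per theme (Environmental: 1, Water: 2), matching the one-geodatabase-per-theme layout the script produces.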


    How It Relates to the Script:

    1. Shape: The path to the spatial data file (Shapefile or feature class).

    2. File Path: Used to store the file path as metadata in the output feature classes.

    3. Theme: Grouped by the script into separate geodatabases for organization.

    4. ID: Added to the feature class as a field if present.

    5. DIA_Date: Added as a date field and parsed using multiple formats.
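The multi-format parsing mentioned in point 5 is worth seeing in isolation. This is the same parse_date helper the script defines:

```python
from datetime import datetime

def parse_date(date_str):
    # Try each layout in order; the first that matches wins
    for fmt in ('%m/%d/%Y', '%Y/%m/%d', '%d/%m/%Y'):
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None  # nothing matched

print(parse_date('11/22/2022'))  # 2022-11-22 00:00:00
print(parse_date('2022/05/07'))  # 2022-05-07 00:00:00
print(parse_date('not a date'))  # None
```

Because '%m/%d/%Y' is tried first, an ambiguous value like '03/12/2021' is always read as March 12, never December 3; order the formats to match how your CSV dates are actually written.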

      ✅ Conclusion:

      This script automates copying spatial data to organized geodatabases, ensuring all fields and metadata are correctly updated while handling any issues gracefully.
