
Wednesday, May 7, 2025

🗂️ Batch Processing and Copying Feature Classes from CSV (ArcPy Script)

 


This Python script uses ArcPy to automate copying feature classes from various spatial data sources into theme-based file geodatabases, driven by metadata from a CSV file. The script validates paths, sanitizes names, skips non-spatial or unsupported data, and adds metadata fields (OriginalName, OriginalPath, DIA_ID, DIA_Date) to every copied feature class.


🔑 Key Features:

  • Sanitize Feature Class Names: Automatically removes unwanted characters from feature class names.

  • Validate Feature Class Paths: Checks if paths are accessible, not exceeding the Windows path limit, and whether the data format is supported.

  • Spatial Data Check: Ensures that only spatial data is copied (non-spatial data is skipped).

  • Add Metadata: Adds relevant metadata fields like OriginalName, OriginalPath, DIA_ID, and DIA_Date to the feature classes.

  • Handle Errors Gracefully: Catches and logs any errors during the copy process, providing detailed messages for debugging.
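The sanitization rule from the first feature can be tried on its own. This is the same regular expression the script uses: it keeps ASCII letters, digits, underscores, and Arabic characters (U+0600–U+06FF), and replaces everything else with underscores (the sample names below are made up for illustration):

```python
import re

def sanitize_name(name):
    # Keep letters, digits, underscores, and Arabic characters (U+0600-U+06FF);
    # replace everything else (spaces, dots, parentheses, ...) with '_'
    return re.sub(u'[^a-zA-Z0-9_\u0600-\u06FF]', u'_', name)

print(sanitize_name(u'Land Use (2020)'))  # Land_Use__2020_
print(sanitize_name(u'roads-main.v2'))    # roads_main_v2
```

Note the `u` prefix on the pattern: without it, Python 2 would not interpret `\u0600-\u06FF` as a character range, and Arabic names would be mangled.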


🧑‍💻 Python Script:

import arcpy
import pandas as pd
import os
import re
import sys
from datetime import datetime

# Ensure the default encoding is set to UTF-8 (Python 2 / ArcMap only)
reload(sys)
sys.setdefaultencoding('utf-8')

# Define paths (backslashes doubled so they are not read as escape sequences)
csv_file = u'C:\\MPDA\\GDB_Extraction\\Environmental_Vector.csv'
output_folder = u'C:\\MPDA\\GDB_Extraction\\Enviromental vector'

# List to capture failed paths
failed_paths = []

# Function to sanitize feature class names
# (u prefix so the Arabic range \u0600-\u06FF is interpreted correctly)
def sanitize_name(name):
    return re.sub(u'[^a-zA-Z0-9_\u0600-\u06FF]', u'_', name)

# Function to check if a field exists in a feature class
def field_exists(feature_class, field_name):
    if arcpy.Exists(feature_class):
        fields = [f.name for f in arcpy.ListFields(feature_class)]
        return field_name in fields
    else:
        print(u"Feature class does not exist: {}".format(feature_class))
        return False

# Function to parse a date string against multiple formats
def parse_date(date_str):
    formats = ['%m/%d/%Y', '%Y/%m/%d', '%d/%m/%Y']
    for fmt in formats:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None

# Function to check if a path is valid and supported
def is_supported_path(path):
    if len(path) > 260:
        print(u"Path exceeds the maximum length allowed by Windows: {}".format(path))
        failed_paths.append(path)
        return False
    if not arcpy.Exists(path):
        print(u"Path does not exist or is inaccessible: {}".format(path))
        failed_paths.append(path)
        return False
    if path.lower().endswith('.mdb'):
        print(u"Unsupported file format (MDB): {}".format(path))
        failed_paths.append(path)
        return False
    return True

# Function to check if the feature class contains spatial data
def contains_spatial_data(feature_class):
    desc = arcpy.Describe(feature_class)
    if hasattr(desc, "shapeType"):
        return True
    else:
        print(u"Non-spatial data encountered: {}".format(feature_class))
        failed_paths.append(feature_class)
        return False

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(csv_file, encoding='utf-8')

# Print the column names to verify they exist
print(u"Column names in CSV: {}".format(list(df.columns)))

# Strip any leading/trailing spaces from column names
df.columns = df.columns.str.strip()

# Group the rows in the CSV by the 'Theme' field
grouped = df.groupby('Theme')

# Iterate over each unique theme
for theme, group in grouped:
    # Create a sanitized name for the GDB
    theme_sanitized = sanitize_name(theme)
    theme_gdb = os.path.join(output_folder, u"{}.gdb".format(theme_sanitized))

    # Ensure the output GDB for the theme exists
    if not arcpy.Exists(theme_gdb):
        arcpy.CreateFileGDB_management(output_folder, u"{}.gdb".format(theme_sanitized))

    # Iterate through each row in the grouped DataFrame for this theme
    for index, row in group.iterrows():
        input_path = row['Shape']
        featureclass_name = os.path.basename(input_path).split('.')[0]
        sanitized_name = sanitize_name(featureclass_name)
        output_featureclass = os.path.join(theme_gdb, sanitized_name)

        # Check if the path is valid and supported
        if is_supported_path(input_path):
            # Check if the feature class contains spatial data
            if contains_spatial_data(input_path):
                # Check for an existing feature class with the same name and create a unique name
                counter = 1
                while arcpy.Exists(output_featureclass):
                    output_featureclass = os.path.join(theme_gdb, u"{}_{}".format(sanitized_name, counter))
                    counter += 1

                # Debugging output
                print(u"Copying from {} to {}".format(input_path, output_featureclass))
                try:
                    # Copy the feature class to the output GDB
                    arcpy.CopyFeatures_management(input_path, output_featureclass)
                    print(u"Successfully copied feature class to {}".format(output_featureclass))
                except arcpy.ExecuteError:
                    print(u"Failed to copy feature class to {}".format(output_featureclass))
                    print(arcpy.GetMessages(2))
                    failed_paths.append(input_path)
                    continue

                # Check if the copied feature class exists before proceeding
                if arcpy.Exists(output_featureclass):
                    # Add fields for original name, path, DIA_ID, and DIA_Date if they do not already exist
                    if not field_exists(output_featureclass, "OriginalName"):
                        arcpy.AddField_management(output_featureclass, "OriginalName", "TEXT", field_length=255)
                    if not field_exists(output_featureclass, "OriginalPath"):
                        arcpy.AddField_management(output_featureclass, "OriginalPath", "TEXT", field_length=1000)
                    if not field_exists(output_featureclass, "DIA_ID"):
                        arcpy.AddField_management(output_featureclass, "DIA_ID", "TEXT")
                    if not field_exists(output_featureclass, "DIA_Date"):
                        arcpy.AddField_management(output_featureclass, "DIA_Date", "DATE")

                    # Update the new fields with the original name, path, and other CSV data
                    with arcpy.da.UpdateCursor(output_featureclass, ["OriginalName", "OriginalPath", "DIA_ID", "DIA_Date"]) as cursor:
                        for cursor_row in cursor:
                            cursor_row[0] = featureclass_name
                            cursor_row[1] = row['File Path']  # OriginalPath takes the 'File Path' column
                            # DIA_ID is a TEXT field, so cast the ID value to a string
                            cursor_row[2] = str(row['ID']) if 'ID' in df.columns else None
                            if 'DIA_Date' in df.columns and pd.notnull(row['DIA_Date']):
                                date_value = parse_date(row['DIA_Date'])
                                if date_value:
                                    cursor_row[3] = date_value
                                else:
                                    print(u"Error parsing date for row {}: Invalid date format '{}'".format(index, row['DIA_Date']))
                                    cursor_row[3] = None
                            else:
                                cursor_row[3] = None
                            cursor.updateRow(cursor_row)
                else:
                    print(u"Copied feature class does not exist: {}".format(output_featureclass))
                    failed_paths.append(input_path)
            else:
                print(u"Skipping non-spatial data: {}".format(input_path))
        else:
            print(u"Skipping unsupported or inaccessible path: {}".format(input_path))

# Print the list of failed paths
if failed_paths:
    print(u"\nThe following file paths failed during the process:")
    for path in failed_paths:
        print(path)
else:
    print(u"\nAll file paths were processed successfully.")

print(u"Process completed successfully.")

🧑‍💻 How It Works:

  1. CSV Input: The script reads metadata from a CSV file.

  2. Feature Class Copying: It copies the feature classes from their input paths to output GDBs, ensuring data validity.

  3. Field Management: Adds necessary fields and populates them with metadata from the CSV file.

  4. Error Handling: Logs failed paths for further review.
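The name-collision handling in step 2 can be sketched without ArcPy. Here a plain set stands in for `arcpy.Exists`, and the geodatabase and feature class names are hypothetical:

```python
import os

def unique_output_name(taken, gdb, base):
    # Append _1, _2, ... until the candidate path is unused,
    # mirroring the `while arcpy.Exists(...)` loop in the script
    candidate = os.path.join(gdb, base)
    counter = 1
    while candidate in taken:
        candidate = os.path.join(gdb, '{}_{}'.format(base, counter))
        counter += 1
    return candidate

# 'Roads' and 'Roads_1' already exist, so the next free name is 'Roads_2'
taken = {os.path.join('Theme.gdb', 'Roads'), os.path.join('Theme.gdb', 'Roads_1')}
print(unique_output_name(taken, 'Theme.gdb', 'Roads'))
```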

    Sample Input CSV:

    Shape                            | File Path                        | Theme         | ID   | DIA_Date
    C:\Data\Environmental\Area1.shp  | C:\Data\Environmental\Area1.shp  | Environmental | 1001 | 01/05/2020
    C:\Data\Environmental\Area2.shp  | C:\Data\Environmental\Area2.shp  | Environmental | 1002 | 03/12/2021
    C:\Data\Environmental\Water.shp  | C:\Data\Environmental\Water.shp  | Water         | 1003 | 11/22/2022
    C:\Data\Environmental\Soil.shp   | C:\Data\Environmental\Soil.shp   | Environmental | 1004 | 06/14/2020
    C:\Data\WaterResources\River.shp | C:\Data\WaterResources\River.shp | Water         | 1005 | 08/18/2021
    C:\Data\WaterResources\Lake.shp  | C:\Data\WaterResources\Lake.shp  | Water         | 1006 | 09/01/2019

    CSV Column Explanation:

    • Shape: The full path to the input feature class (Shapefile or other formats).

    • File Path: The location of the file on your system; the script copies this value verbatim into the OriginalPath field of each output feature class.

    • Theme: The thematic grouping or category for the feature class (e.g., Environmental, Water, etc.).

    • ID: A unique identifier for the feature class, which could correspond to specific metadata or other cataloging information.

    • DIA_Date: The date associated with the feature class data (e.g., the date it was created, modified, or captured).
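The per-theme grouping driven by the Theme column can be previewed with pandas alone. The DataFrame below is a hypothetical in-memory stand-in for the sample CSV:

```python
import pandas as pd

# Hypothetical rows mirroring the sample CSV above
df = pd.DataFrame({
    'Shape': [r'C:\Data\Environmental\Area1.shp',
              r'C:\Data\Environmental\Water.shp',
              r'C:\Data\WaterResources\River.shp'],
    'Theme': ['Environmental', 'Water', 'Water'],
    'ID': [1001, 1003, 1005],
})
df.columns = df.columns.str.strip()  # same column-name cleanup the script applies

# The script would create one file GDB per unique theme
for theme, group in df.groupby('Theme'):
    print('{}: {} feature class(es)'.format(theme, len(group)))
```

This prints one line per theme (Environmental: 1, Water: 2), matching the one-geodatabase-per-theme layout the script produces.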


    How It Relates to the Script:

    1. Shape: The path to the spatial data file (Shapefile or feature class).

    2. File Path: Used to store the file path as metadata in the output feature classes.

    3. Theme: Grouped by the script into separate geodatabases for organization.

    4. ID: Added to the feature class as a field if present.

    5. DIA_Date: Added as a date field and parsed using multiple formats.
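The multi-format parsing mentioned in point 5 is worth seeing in isolation. This is the same parse_date helper the script defines:

```python
from datetime import datetime

def parse_date(date_str):
    # Try each layout in order; the first that matches wins
    for fmt in ('%m/%d/%Y', '%Y/%m/%d', '%d/%m/%Y'):
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None  # nothing matched

print(parse_date('11/22/2022'))  # 2022-11-22 00:00:00
print(parse_date('2022/05/07'))  # 2022-05-07 00:00:00
print(parse_date('not a date'))  # None
```

Because '%m/%d/%Y' is tried first, an ambiguous value like '03/12/2021' is always read as March 12, never December 3; order the formats to match how your CSV dates are actually written.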

      ✅ Conclusion:

      This script automates copying spatial data to organized geodatabases, ensuring all fields and metadata are correctly updated while handling any issues gracefully.
