
Read and Write CSV Files with Python

A simple guide to reading and writing CSV files in Python using the csv module and pandas, with quick examples for data processing.

By Gautam Sharma · December 31, 2024 · 5 min read
Tags: Python, CSV, Data Processing, File Handling

CSV files are everywhere. Here’s how to read and write them efficiently with Python.

Built-in CSV Module

No installation needed. Part of Python’s standard library.

Basic Reading

import csv

with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # e.g. ['Name', 'Age', 'City'] for the header row
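
If the first row is a header you want to skip, next() advances past it before the loop. A small sketch assuming data.csv starts with the header row shown above:

import csv

with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    next(reader, None)  # skip the header row
    for row in reader:
        print(row)  # data rows only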

Reading with Headers

import csv

with open('data.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row['Name'], row['Age'])  # Access by column name

Basic Writing

import csv

data = [
    ['Name', 'Age', 'City'],
    ['John', '28', 'New York'],
    ['Sarah', '34', 'Boston']
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)
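
If you only need one row at a time (for example inside a loop), writerow takes a single list. A minimal sketch that appends one extra row to the file written above:

import csv

with open('output.csv', 'a', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Mike', '45', 'Chicago'])  # append a single row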

Writing with Headers

import csv

headers = ['Name', 'Age', 'City']
data = [
    {'Name': 'John', 'Age': '28', 'City': 'New York'},
    {'Name': 'Sarah', 'Age': '34', 'City': 'Boston'}
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=headers)
    writer.writeheader()
    writer.writerows(data)

Pandas

More powerful for data manipulation. Install it first:

pip install pandas

Read CSV

import pandas as pd

# Basic read
df = pd.read_csv('data.csv')
print(df.head())

# Skip rows
df = pd.read_csv('data.csv', skiprows=2)

# Select specific columns
df = pd.read_csv('data.csv', usecols=['Name', 'Age'])

# Handle missing values
df = pd.read_csv('data.csv', na_values=['NA', 'null', ''])

# Custom delimiter
df = pd.read_csv('data.tsv', sep='\t')

# No headers
df = pd.read_csv('data.csv', header=None)
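
To preview a big file without loading all of it, nrows limits how many rows are parsed; a quick sketch on the same example file:

import pandas as pd

# Parse only the first 100 rows for a quick look
preview = pd.read_csv('data.csv', nrows=100)
print(preview.shape)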

Write CSV

import pandas as pd

data = {
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [28, 34, 45],
    'City': ['New York', 'Boston', 'Chicago']
}

df = pd.DataFrame(data)

# Basic write
df.to_csv('output.csv', index=False)

# Custom delimiter
df.to_csv('output.tsv', sep='\t', index=False)

# Append to existing file
df.to_csv('output.csv', mode='a', header=False, index=False)

# Write specific columns
df[['Name', 'City']].to_csv('output.csv', index=False)
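
to_csv can also write compressed files directly, which helps with large exports; a sketch assuming gzip output is acceptable for your pipeline:

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah'], 'Age': [28, 34]})

# pandas infers gzip from the .gz extension; compression='gzip' makes it explicit
df.to_csv('output.csv.gz', index=False, compression='gzip')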

Filter Data

import pandas as pd

df = pd.read_csv('employees.csv')

# Filter rows
young_employees = df[df['Age'] < 30]
young_employees.to_csv('young_employees.csv', index=False)

# Multiple conditions
filtered = df[(df['Age'] > 25) & (df['City'] == 'New York')]
filtered.to_csv('filtered.csv', index=False)
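
To filter against a list of allowed values, isin is usually cleaner than chaining == comparisons; a sketch using the same hypothetical columns:

import pandas as pd

df = pd.read_csv('employees.csv')

# Keep rows whose City matches any value in the list
east_coast = df[df['City'].isin(['New York', 'Boston'])]
east_coast.to_csv('east_coast.csv', index=False)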

Clean Data

import pandas as pd

df = pd.read_csv('messy_data.csv')

# Remove duplicates
df = df.drop_duplicates()

# Remove empty rows
df = df.dropna()

# Fill missing values
df['Age'] = df['Age'].fillna(0)

# Strip whitespace
df['Name'] = df['Name'].str.strip()

# Save cleaned data
df.to_csv('clean_data.csv', index=False)
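
Before cleaning, it often helps to see how much is actually missing or duplicated; a quick check on the same hypothetical file:

import pandas as pd

df = pd.read_csv('messy_data.csv')

# Missing values per column, and the number of fully duplicated rows
print(df.isna().sum())
print(df.duplicated().sum())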

Combine CSV Files

import pandas as pd
import glob

# Read all CSV files in directory
csv_files = glob.glob('data/*.csv')

# Combine into one dataframe
dfs = [pd.read_csv(file) for file in csv_files]
combined = pd.concat(dfs, ignore_index=True)

# Save combined data
combined.to_csv('combined.csv', index=False)
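
If you later need to know which file each row came from, you can tag rows while combining; a sketch that adds a hypothetical source_file column:

import pandas as pd
import glob

# Attach the originating file name to each frame before concatenating
dfs = [pd.read_csv(f).assign(source_file=f) for f in glob.glob('data/*.csv')]
combined = pd.concat(dfs, ignore_index=True)
combined.to_csv('combined_with_source.csv', index=False)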

Transform Data

import pandas as pd

df = pd.read_csv('sales.csv')

# Add new column
df['Total'] = df['Quantity'] * df['Price']

# Rename columns
df = df.rename(columns={'old_name': 'new_name'})

# Sort data
df = df.sort_values('Total', ascending=False)

# Group and aggregate
summary = df.groupby('Product').agg({
    'Quantity': 'sum',
    'Total': 'sum'
})

summary.to_csv('summary.csv')
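
Note that groupby moves Product into the index, which to_csv then writes as the first column. If you would rather keep it as a regular column, as_index=False does that; a small variation on the same hypothetical sales.csv:

import pandas as pd

df = pd.read_csv('sales.csv')
df['Total'] = df['Quantity'] * df['Price']

# as_index=False keeps Product as a normal column instead of the index
summary = df.groupby('Product', as_index=False).agg({'Quantity': 'sum', 'Total': 'sum'})
summary.to_csv('summary.csv', index=False)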

Handle Large Files

import pandas as pd

# Read in chunks
chunk_size = 10000

for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    # Process each chunk
    processed = chunk[chunk['Age'] > 30]
    processed.to_csv('output.csv', mode='a', header=False, index=False)
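
One wrinkle with appending chunk by chunk is the header: as written, output.csv never gets one. A common pattern (sketch, same hypothetical columns) is to write the header with the first chunk only:

import pandas as pd

first_chunk = True
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    processed = chunk[chunk['Age'] > 30]
    # Overwrite with a header on the first pass, then append without one
    processed.to_csv('output.csv', mode='w' if first_chunk else 'a',
                     header=first_chunk, index=False)
    first_chunk = False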

Convert Data Types

import pandas as pd

df = pd.read_csv('data.csv')

# Convert types during read
df = pd.read_csv('data.csv', dtype={
    'Age': int,
    'Salary': float,
    'Name': str
})

# Convert dates
df = pd.read_csv('data.csv', parse_dates=['Date'])

# Or after reading
df['Date'] = pd.to_datetime(df['Date'])
df['Age'] = df['Age'].astype(int)

df.to_csv('typed_data.csv', index=False)
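
If a column contains values that cannot be converted (say 'unknown' mixed into Age), astype(int) raises an error. pd.to_numeric with errors='coerce' turns such values into NaN so you can handle them explicitly; a hedged sketch:

import pandas as pd

df = pd.read_csv('data.csv')

# Invalid entries become NaN instead of raising, then get dropped before the int cast
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
df = df.dropna(subset=['Age'])
df['Age'] = df['Age'].astype(int)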

CSV to JSON

import pandas as pd

df = pd.read_csv('data.csv')

# To JSON
df.to_json('data.json', orient='records', indent=2)

# From JSON back to CSV
df = pd.read_json('data.json')
df.to_csv('converted.csv', index=False)
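
For larger datasets, newline-delimited JSON (one record per line) is easier to stream; both to_json and read_json accept lines=True together with orient='records':

import pandas as pd

df = pd.read_csv('data.csv')

# One JSON object per line (JSON Lines format)
df.to_json('data.jsonl', orient='records', lines=True)
df_back = pd.read_json('data.jsonl', orient='records', lines=True)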

Handle Different Encodings

import pandas as pd

# Common encodings
df = pd.read_csv('data.csv', encoding='utf-8')
df = pd.read_csv('data.csv', encoding='latin-1')
df = pd.read_csv('data.csv', encoding='cp1252')

# Auto-detect encoding (requires: pip install chardet)
import chardet

with open('data.csv', 'rb') as file:
    result = chardet.detect(file.read())
    encoding = result['encoding']

df = pd.read_csv('data.csv', encoding=encoding)
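
If you just need the file to load and can tolerate a few replaced characters, newer pandas versions (1.3+) also accept an encoding_errors argument; check your version before relying on it:

import pandas as pd

# Replace undecodable bytes instead of raising UnicodeDecodeError (pandas >= 1.3)
df = pd.read_csv('data.csv', encoding='utf-8', encoding_errors='replace')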

CSV with Quotes

import csv

# Read with quotes
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file, quotechar='"')
    for row in reader:
        print(row)

# Write with quotes
with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file, quoting=csv.QUOTE_ALL)
    writer.writerow(['Name', 'Description'])
    writer.writerow(['Product', 'Description with, comma'])

Error Handling

import pandas as pd

try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    print("File not found")
except pd.errors.EmptyDataError:
    print("CSV file is empty")
except pd.errors.ParserError:
    print("Error parsing CSV")
else:
    df.to_csv('output.csv', index=False)
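
For files with occasional malformed rows, pandas 1.3+ can also skip bad lines instead of failing the whole read; another hedged sketch:

import pandas as pd

# Skip rows with the wrong number of fields rather than raising ParserError (pandas >= 1.3)
df = pd.read_csv('data.csv', on_bad_lines='skip')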

Quick Reference

Reading:

  • csv.reader() - Simple row-by-row reading
  • csv.DictReader() - Read as dictionaries
  • pd.read_csv() - Full-featured with data manipulation

Writing:

  • csv.writer() - Simple row writing
  • csv.DictWriter() - Write dictionaries
  • df.to_csv() - Write pandas DataFrame

Key Parameters:

  • delimiter / sep - Column separator (default: comma)
  • quotechar - Character for quoting fields (default: ")
  • header - Row number for column names
  • skiprows - Rows to skip
  • encoding - File encoding (utf-8, latin-1, etc.)
  • index=False - Don’t write row numbers
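
A sketch combining several of these parameters in one read_csv call (the file name and its quirks are hypothetical):

import pandas as pd

# Hypothetical semicolon-separated export with two junk lines above the header
df = pd.read_csv(
    'report.csv',
    sep=';',
    skiprows=2,
    encoding='latin-1',
    na_values=['NA', ''],
)
df.to_csv('report_clean.csv', index=False)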

When to Use What:

  • Built-in csv module: Simple reading/writing, no dependencies
  • Pandas: Data cleaning, filtering, transformation, analysis

Conclusion

For simple tasks, use Python’s built-in csv module. For anything involving data manipulation, filtering, or transformation, use pandas. Both are fast and reliable for processing CSV files.
