IS4010: AI-Enhanced Application Development

Week 4: Python Data Structures

Brandon M. Greenwell

Welcome to Python Data Structures 📦

The foundation of organized programming 🏗️

  • Data structures are the backbone of efficient programming
  • They determine how we organize, store, and access information
  • Python’s philosophy: “Batteries included” - powerful built-in data structures ready to use
  • Career relevance: Understanding data structures is essential for technical interviews
  • Real-world impact: Every app you use relies on sophisticated data organization

Session 1: Sequences & collections

What is a data structure? 🤔

  • A data structure is a way to organize and store multiple pieces of data in a single variable
  • Think of them as specialized containers for your data
  • Python’s built-in arsenal: Lists, tuples, dictionaries, and sets
  • Choosing wisely: The right data structure makes your code faster, cleaner, and more maintainable
  • Performance matters: Different structures excel at different operations

The list: a mutable sequence 📝

  • A list is an ordered, changeable collection of items
  • Most versatile: Lists are the workhorses of Python programming
  • Ordered: Items maintain their position - first in, stays first (unless you change it)
  • Mutable: Add, remove, or modify items after creation
  • Syntax: Items enclosed in square brackets [], separated by commas
  • Dynamic size: Grow or shrink as needed - no fixed size limit
# A list of computer science pioneers
pioneers = ["Grace Hopper", "Ada Lovelace", "Katherine Johnson"]

# Add new pioneer (append adds to the end)
pioneers.append("Dorothy Vaughan")

# Update an item (zero-based indexing)
pioneers[0] = "Rear Admiral Grace Hopper"

# Insert at specific position
pioneers.insert(1, "Hedy Lamarr")

print(pioneers)
# Output: ['Rear Admiral Grace Hopper', 'Hedy Lamarr', 'Ada Lovelace', 'Katherine Johnson', 'Dorothy Vaughan']

# Useful list methods
print(f"Number of pioneers: {len(pioneers)}")
print(f"Last pioneer: {pioneers[-1]}")

The tuple: an immutable sequence 🔒

  • A tuple is an ordered, unchangeable collection of items
  • Ordered: Like lists, items maintain their position
  • Immutable: Cannot add, remove, or change items after creation
  • Syntax: Items enclosed in parentheses (), though parentheses are often optional
  • Use cases: Coordinates, RGB colors, database records, function return values
  • Performance advantage: Slightly faster than lists for accessing elements
  • Memory efficient: Less overhead than lists due to immutability
# Coordinate pairs for a map application
start_location = (39.1031, -84.5120)  # Cincinnati coordinates
end_location = (39.7391, -104.9847)   # Denver coordinates

# RGB color values for UI design
brand_colors = (
    (0, 123, 191),    # UC Blue
    (224, 18, 34),    # UC Red
    (0, 0, 0)         # Black
)

# Tuple unpacking (very common pattern)
lat, lon = start_location
print(f"Starting at latitude {lat}, longitude {lon}")

# Multiple return values from functions
def get_name_and_grade():
    return "Alice", 95.5  # Returns a tuple

name, grade = get_name_and_grade()  # Tuple unpacking

# Single item tuple (note the comma!)
single_item = (42,)  # Without comma, it's just parentheses for grouping

Accessing items: indexing & slicing 🎯

  • Zero-based indexing: Python counts from 0, not 1 (like most programming languages)
  • Positive indices: my_list[0] first item, my_list[1] second item, etc.
  • Negative indices: my_list[-1] last item, my_list[-2] second-to-last, etc.
  • Slicing: Extract ranges of items with start:stop:step syntax
  • Slice behavior: [start:stop] includes start, excludes stop
  • Advanced slicing: Control step size and direction
# Working with sequence data
languages = ["Python", "JavaScript", "Java", "C++", "Rust", "Go", "TypeScript"]

# Basic indexing
print(f"First language: {languages[0]}")           # Python
print(f"Third language: {languages[2]}")           # Java
print(f"Last language: {languages[-1]}")           # TypeScript
print(f"Second to last: {languages[-2]}")          # Go

# Slicing examples
print(f"First three: {languages[0:3]}")            # ['Python', 'JavaScript', 'Java']
print(f"From index 2 on: {languages[2:]}")         # ['Java', 'C++', 'Rust', 'Go', 'TypeScript']
print(f"Last three: {languages[-3:]}")             # ['Rust', 'Go', 'TypeScript']
print(f"Every other: {languages[::2]}")            # ['Python', 'Java', 'Rust', 'TypeScript']
print(f"Reversed: {languages[::-1]}")              # Entire list in reverse order

# Slicing with coordinates
point = (10, 20, 30, 40)
x, y = point[0:2]  # Get first two values
print(f"2D coordinates: ({x}, {y})")

Common sequence operations 🛠️

  • Length: len(sequence) returns number of items
  • Membership: item in sequence checks if item exists
  • Concatenation: list1 + list2 combines sequences
  • Repetition: list * 3 repeats sequence multiple times
  • Sorting: sorted() returns new sorted sequence, list.sort() modifies in place
  • Finding: sequence.index(item) finds first occurrence
# Working with programming languages data
popular_languages = ["Python", "JavaScript", "TypeScript"]
systems_languages = ["C++", "Rust", "Go"]

# Length and membership
print(f"We track {len(popular_languages)} popular languages")
print(f"Is Python popular? {'Python' in popular_languages}")
print(f"Is Assembly popular? {'Assembly' in popular_languages}")

# Concatenation and repetition
all_languages = popular_languages + systems_languages
separator = ["-"] * 3  # Creates ["-", "-", "-"]

# Sorting (doesn't modify original)
sorted_languages = sorted(all_languages)
print(f"Alphabetical: {sorted_languages}")

# Finding items
try:
    rust_position = all_languages.index("Rust")
    print(f"Rust is at position {rust_position}")
except ValueError:
    print("Rust not found in list")

# List comprehensions (introduced earlier in this lecture!)
short_names = [lang for lang in all_languages if len(lang) <= 4]
print(f"Short language names: {short_names}")

Quick Introduction: List Comprehensions 📝

Python has a concise way to create lists called list comprehensions:

# Traditional approach with loops
numbers = []
for i in range(5):
    numbers.append(i * 2)
print(numbers)  # [0, 2, 4, 6, 8]

# List comprehension - more concise!
numbers = [i * 2 for i in range(5)]
print(numbers)  # [0, 2, 4, 6, 8]

# Pattern: [expression for item in iterable]
emails = [f"user_{i}@email.com" for i in range(3)]
print(emails)  # ['user_0@email.com', 'user_1@email.com', 'user_2@email.com']

Why learn this now? - Perfect for generating test data (which we’ll need for performance testing!) - More readable and Pythonic than traditional loops - We’ll use this pattern extensively in data processing

🔬 Live Exercise: AI-Assisted Performance Detective

Challenge: Help me optimize this slow code using GenAI tools!

# SLOW CODE - Let's time this together!
import time

# Simulating a large customer database
customers = [f"customer_{i}@email.com" for i in range(50000)]
vip_customers = [f"customer_{i}@email.com" for i in range(0, 50000, 100)]  # Every 100th customer

# Slow approach - checking VIP status
def is_vip_slow(email, vip_list):
    """Check if customer is VIP - using list membership"""
    return email in vip_list

# Let's time this approach
start_time = time.time()
vip_count = 0
for customer in customers[:1000]:  # Check first 1000 customers
    if is_vip_slow(customer, vip_customers):
        vip_count += 1
slow_time = time.time() - start_time

print(f"Slow approach: {slow_time:.4f} seconds, found {vip_count} VIPs")

Your Turn with GenAI:

  1. 🤖 Ask Claude/Gemini/ChatGPT: “How can I optimize this customer VIP lookup code?”
  2. 📊 Get timing comparison: What data structure should we use instead?
  3. 💡 Implement together: Let’s code the optimized version live!

🚀 Optimized Solution (with GenAI help!)

# FAST CODE - After consulting with GenAI
import time

# Convert VIP list to set for O(1) lookups
vip_customers_set = set(vip_customers)

def is_vip_fast(email, vip_set):
    """Check if customer is VIP - using set membership"""
    return email in vip_set

# Time the optimized approach
start_time = time.time()
vip_count = 0
for customer in customers[:1000]:
    if is_vip_fast(customer, vip_customers_set):
        vip_count += 1
fast_time = time.time() - start_time

print(f"Fast approach: {fast_time:.4f} seconds, found {vip_count} VIPs")
print(f"Speedup: {slow_time/fast_time:.1f}x faster!")

# GenAI bonus: What other optimizations could we make?
# - Pre-compute VIP status in a dictionary
# - Use database indexing for larger datasets
# - Cache frequently accessed results

Results Discussion: - 📈 Performance gain: Typically 10-100x faster for large datasets - 🧠 AI insight: GenAI tools excel at identifying these patterns - 💼 Real impact: This optimization matters in production systems - 🔄 Learning loop: AI suggestions → implementation → measurement → iteration

Session 2: Mappings & collections

The dictionary: key-value pairs 🗝️

  • A dictionary is a collection of key: value pairs
  • Lightning fast lookups: Optimized for retrieving values when you know the key
  • Real-world analogy: Like a phone book or actual dictionary - look up by name/word, get number/definition
  • Unique keys: Each key can appear only once, but values can be duplicated
  • Key types: Strings, numbers, tuples (immutable types only)
  • Syntax: Pairs enclosed in curly braces {key: value}
  • Python 3.7+: Dictionaries maintain insertion order (ordered)
# User profile for a social media application
user_profile = {
    "username": "ada_lovelace",
    "first_name": "Ada",
    "last_name": "Lovelace",
    "birth_year": 1815,
    "occupation": "Mathematician",
    "famous_for": "First computer programmer",
    "followers": 125000,
    "verified": True,
    "programming_languages": ["Analytical Engine", "Mathematics"]
}

# Accessing values (multiple ways)
print(f"User: {user_profile['username']}")
print(f"Born: {user_profile.get('birth_year', 'Unknown')}")  # Safe access
print(f"Followers: {user_profile['followers']:,}")

# Adding new information
user_profile["location"] = "London, England"
user_profile["last_active"] = "1852-11-27"

# Updating existing values
user_profile["followers"] += 1000  # New followers!

print(f"Updated profile: {user_profile['username']} now has {user_profile['followers']:,} followers")

🛠️ Live Exercise: AI-Powered Code Refactoring

Challenge: This code works, but it’s messy and slow. Let’s fix it with GenAI!

# MESSY CODE - Student registration system (needs help!)
def process_student_data():
    # Raw data from registration form
    students = []
    students.append(["Alice Johnson", "alice@uc.edu", "IS4010", 95])
    students.append(["Bob Smith", "bob@uc.edu", "IS4010", 87])
    students.append(["Alice Johnson", "alice@uc.edu", "CS2071", 92])  # Duplicate student!
    students.append(["Carol Davis", "carol@uc.edu", "IS4010", 78])
    students.append(["Bob Smith", "bob@uc.edu", "MATH2045", 85])    # Another duplicate!

    # Find students in IS4010 (slow way)
    is4010_students = []
    for student in students:
        if student[2] == "IS4010":  # Magic number - what does index 2 mean?
            is4010_students.append(student)

    # Calculate average grade (also slow)
    total = 0
    count = 0
    for student in is4010_students:
        total += student[3]  # Another magic number!
        count += 1
    average = total / count if count > 0 else 0

    return is4010_students, average

# Call the function
students, avg = process_student_data()
print(f"IS4010 students: {len(students)}, Average: {avg:.1f}")

GenAI Prompt Challenge: 🤖 “Analyze this Python code and suggest improvements for data structure choice, readability, and performance. Focus on eliminating magic numbers and handling duplicate data.”

🎯 Refactored Solution (AI-Assisted)

# CLEAN CODE - After GenAI consultation
from collections import defaultdict

def process_student_data_improved():
    # Better data structure: list of dictionaries
    raw_students = [
        {"name": "Alice Johnson", "email": "alice@uc.edu", "course": "IS4010", "grade": 95},
        {"name": "Bob Smith", "email": "bob@uc.edu", "course": "IS4010", "grade": 87},
        {"name": "Alice Johnson", "email": "alice@uc.edu", "course": "CS2071", "grade": 92},
        {"name": "Carol Davis", "email": "carol@uc.edu", "course": "IS4010", "grade": 78},
        {"name": "Bob Smith", "email": "bob@uc.edu", "course": "MATH2045", "grade": 85},
    ]

    # Group students by course (using defaultdict - AI suggestion!)
    students_by_course = defaultdict(list)
    unique_enrollments = set()  # Handle duplicates with sets!

    for student in raw_students:
        # Create unique key for duplicate detection
        key = (student["name"], student["email"], student["course"])
        if key not in unique_enrollments:
            unique_enrollments.add(key)
            students_by_course[student["course"]].append(student)

    # Calculate statistics for IS4010
    is4010_students = students_by_course["IS4010"]
    average_grade = sum(s["grade"] for s in is4010_students) / len(is4010_students)

    return is4010_students, average_grade

# Much cleaner!
students, avg = process_student_data_improved()
print(f"IS4010 students: {len(students)}, Average: {avg:.1f}")

What GenAI helped us achieve: - 🗂️ Better structure: Dictionaries instead of magic indices - 🚫 Duplicate handling: Sets for unique enrollment tracking - 📊 Efficient grouping: defaultdict for course organization - 📖 Readable code: Self-documenting variable names - ⚡ Performance: Faster lookups and processing

The set: unique items only 🎯

  • A set is an unordered collection of unique items
  • No duplicates allowed: Automatically removes duplicate entries
  • Lightning fast membership testing: item in set is extremely efficient
  • Mathematical operations: Union, intersection, difference operations
  • Syntax: Items in curly braces {} or use set() constructor
  • Mutable: Add and remove items after creation
  • Use cases: Remove duplicates, track unique visitors, mathematical set operations
# Website analytics: tracking unique visitors
daily_visitors = ["alice", "bob", "charlie", "alice", "david", "alice", "eve"]
monthly_visitors = ["alice", "frank", "george", "bob", "helen"]

# Remove duplicates and get unique visitors
unique_daily = set(daily_visitors)
unique_monthly = set(monthly_visitors)

print(f"Unique daily visitors: {len(unique_daily)}")  # 5 unique visitors
print(f"Daily visitors: {unique_daily}")

# Set operations (mathematical set theory)
all_visitors = unique_daily | unique_monthly        # Union: all unique visitors
returning_visitors = unique_daily & unique_monthly  # Intersection: visitors in both
new_visitors = unique_monthly - unique_daily        # Difference: monthly-only visitors

print(f"All unique visitors: {all_visitors}")
print(f"Returning visitors: {returning_visitors}")
print(f"New visitors this month: {new_visitors}")

# Adding and removing items
unique_daily.add("ivan")      # Add new visitor
unique_daily.discard("bob")   # Remove visitor (safe - no error if not found)

# Create set from list to remove duplicates
languages = ["Python", "Java", "Python", "C++", "Java", "Rust"]
unique_languages = set(languages)
print(f"Unique programming languages: {unique_languages}")

# Fast membership testing
if "Python" in unique_languages:
    print("Python is one of our tracked languages")

📡 Live Exercise: AI-Powered API Data Design

Scenario: You’re building a student dashboard that needs to process this GitHub API-style data efficiently. Let’s design the optimal data structures with GenAI!

# Realistic API response from student management system
api_response = {
    "students": [
        {
            "id": 12345,
            "username": "alice_j",
            "profile": {
                "name": "Alice Johnson",
                "email": "alice@uc.edu",
                "year": "junior",
                "major": "Information Systems"
            },
            "courses": [
                {"code": "IS4010", "grade": 95, "credits": 3},
                {"code": "CS2071", "grade": 88, "credits": 4},
                {"code": "MATH2045", "grade": 92, "credits": 3}
            ],
            "activities": ["ACM", "IS Club", "Dean's List"],
            "permissions": ["student", "tutor", "lab_assistant"]
        },
        {
            "id": 67890,
            "username": "bob_s",
            "profile": {
                "name": "Bob Smith",
                "email": "bob@uc.edu",
                "year": "senior",
                "major": "Computer Science"
            },
            "courses": [
                {"code": "IS4010", "grade": 87, "credits": 3},
                {"code": "CS5168", "grade": 91, "credits": 3}
            ],
            "activities": ["IEEE", "Hackathon Club"],
            "permissions": ["student", "teaching_assistant"]
        }
    ],
    "metadata": {
        "total_students": 2,
        "timestamp": "2024-03-15T10:30:00Z",
        "api_version": "v2.1"
    }
}

GenAI Design Challenge: 🤖 “I need to design data structures for a student dashboard with these requirements: 1. Fast lookup of students by username 2. Quick filtering by course enrollment 3. Efficient permission checking 4. Find students with specific activities What data structures should I use and why?”

🏗️ Optimized Data Structures (AI-Designed)

# After consulting GenAI, let's build efficient access structures
def process_student_api_data(api_response):
    students_data = api_response["students"]

    # 1. Fast username lookup - Dictionary
    students_by_username = {}

    # 2. Course enrollment tracking - Dictionary of sets
    students_by_course = {}

    # 3. Activity membership - Dictionary of sets
    students_by_activity = {}

    # 4. Permission checking - Dictionary of sets
    students_by_permission = {}

    # Process each student (GenAI suggested this structure)
    for student in students_data:
        username = student["username"]
        students_by_username[username] = student

        # Index by courses
        for course in student["courses"]:
            course_code = course["code"]
            if course_code not in students_by_course:
                students_by_course[course_code] = set()
            students_by_course[course_code].add(username)

        # Index by activities
        for activity in student["activities"]:
            if activity not in students_by_activity:
                students_by_activity[activity] = set()
            students_by_activity[activity].add(username)

        # Index by permissions
        for permission in student["permissions"]:
            if permission not in students_by_permission:
                students_by_permission[permission] = set()
            students_by_permission[permission].add(username)

    return {
        "by_username": students_by_username,
        "by_course": students_by_course,
        "by_activity": students_by_activity,
        "by_permission": students_by_permission
    }

# Build the optimized structures
optimized_data = process_student_api_data(api_response)

# Now queries are lightning fast!
print(f"Students in IS4010: {optimized_data['by_course']['IS4010']}")
print(f"ACM members: {optimized_data['by_activity']['ACM']}")
print(f"Teaching assistants: {optimized_data['by_permission']['teaching_assistant']}")

# Complex query: IS4010 students who are also ACM members
is4010_students = optimized_data['by_course']['IS4010']
acm_members = optimized_data['by_activity']['ACM']
is4010_acm = is4010_students & acm_members  # Set intersection!
print(f"IS4010 + ACM: {is4010_acm}")

GenAI Success Metrics: - 🚀 O(1) lookups instead of O(n) searches - 🔍 Set operations for complex filtering - 📊 Multiple access patterns supported efficiently - 💾 Memory trade-off for speed (typical AI recommendation)

Choosing the right data structure 🎯

  • Use a list when you need an ordered, changeable collection
    • Shopping carts, todo items, search results, user feeds
  • Use a tuple when you need ordered, unchangeable data
    • Coordinates, RGB colors, database records, configuration settings
  • Use a dict when you need fast key-based lookups
    • User profiles, settings, caches, JSON data, databases
  • Use a set when you need unique items and fast membership testing
    • Unique visitors, tags, permissions, removing duplicates
# Real-world decision making examples

# E-commerce shopping cart (order matters, items change)
shopping_cart = ["laptop", "mouse", "keyboard", "monitor"]

# Product coordinates (fixed position data)
warehouse_location = (39.1031, -84.5120, 1)  # lat, lon, floor

# User account information (key-value lookups)
user_account = {
    "user_id": 12345,
    "email": "student@uc.edu",
    "subscription": "premium",
    "last_login": "2024-03-15"
}

# User permissions (unique items, fast checking)
user_permissions = {"read", "write", "delete", "admin"}

# Decision logic in action
def can_user_edit(user_permissions, required_permission):
    """Check if user has required permission - O(1) lookup!"""
    return required_permission in user_permissions

# Performance comparison
import time

# Checking membership in list vs set
large_list = list(range(10000))
large_set = set(range(10000))

# This is slow for lists (O(n))
# result = 9999 in large_list

# This is fast for sets (O(1))
# result = 9999 in large_set

🎮 Interactive Challenge: AI Pair Programming Session

Mini-Hackathon Time! Work with GenAI to solve these real-world data structure problems in 15 minutes.

Challenge 1: Social Media Analytics 📱

# Raw social media engagement data
posts = [
    {"id": 1, "user": "alice", "likes": ["bob", "carol", "dave"], "tags": ["python", "coding"]},
    {"id": 2, "user": "bob", "likes": ["alice", "carol"], "tags": ["ai", "python"]},
    {"id": 3, "user": "carol", "likes": ["alice"], "tags": ["datascience", "python"]},
]

# Goals: Find most popular user, most used tags, mutual connections
# Ask your AI: "What data structures optimize social media analytics?"

Challenge 2: E-commerce Inventory 🛒

# Product inventory system
inventory = [
    {"sku": "LAP001", "name": "Laptop", "price": 999, "categories": ["electronics", "computers"], "stock": 5},
    {"sku": "MOU002", "name": "Mouse", "price": 25, "categories": ["electronics", "accessories"], "stock": 50},
    {"sku": "KEY003", "name": "Keyboard", "price": 75, "categories": ["electronics", "accessories"], "stock": 30},
]

# Goals: Fast product lookup, category filtering, inventory alerts
# Ask your AI: "How do I optimize product search and filtering?"

Challenge 3: University Course Scheduler 📚

# Course scheduling conflicts
courses = [
    {"code": "IS4010", "time": "MWF 10:00", "room": "Lindner 100", "students": ["alice", "bob"]},
    {"code": "CS2071", "time": "TTh 14:00", "room": "Lindner 200", "students": ["alice", "carol"]},
    {"code": "MATH2045", "time": "MWF 10:00", "room": "Swift 300", "students": ["bob", "dave"]},
]

# Goals: Detect time conflicts, room utilization, student schedules
# Ask your AI: "What's the best way to handle scheduling conflicts?"

Your Mission: 1. 🤖 Pick ONE challenge that interests you most 2. 💬 Consult your GenAI tool for data structure recommendations 3. 🚀 Be ready to share your AI’s best suggestion with the class 4. 🏆 Bonus: Implement a quick prototype if you’re feeling ambitious!

🎯 Challenge Solutions Showcase

Let’s see what GenAI recommended for each challenge!

Social Media Analytics - Expected AI Solutions:

  • User popularity: Dictionary counting mentions across all likes lists
  • Tag frequency: Counter or defaultdict for tag occurrence tracking
  • Mutual connections: Set intersections for finding common connections
  • Performance: Sets for fast membership checking in large user bases

E-commerce Inventory - Expected AI Solutions:

  • Product lookup: Dictionary with SKU as key for O(1) access
  • Category filtering: Dictionary mapping categories to product sets
  • Inventory alerts: Separate low-stock tracking with threshold checking
  • Price queries: Sorted structures or separate indexing for range queries

Course Scheduling - Expected AI Solutions:

  • Time conflict detection: Dictionary grouping courses by time slots
  • Room utilization: Dictionary tracking room→courses mapping
  • Student schedules: Dictionary with student→courses for individual schedules
  • Conflict resolution: Set operations for finding overlapping students/times

Key AI Insights: - 🧠 Pattern recognition: AI excels at identifying common optimization patterns - 🔄 Trade-off analysis: AI often suggests multiple approaches with pros/cons - 📈 Scalability thinking: AI frequently considers performance at scale - 🚨 Edge case awareness: Good AI prompts reveal boundary conditions

Putting It All Together: Lab 04 🚀

You’ve experienced the power of AI-assisted data structure optimization. Now apply those skills!

Lab 04 Challenge:

  • AI-guided design: Use GenAI to recommend optimal data structures for three real scenarios
  • Performance validation: Test your AI’s recommendations with actual code
  • Professional documentation: Record your AI interactions and reasoning
  • Automated verification: GitHub Actions will test your implementations
  • Real-world application: Problems mirror the challenges we solved today

From Today’s Exercises to Your Lab: - 🔬 Performance Detective → Lab’s efficiency requirements - 🛠️ Code Refactoring → Lab’s clean implementation standards - 📡 API Data Design → Lab’s complex data structure challenges - 🎮 Problem Solving → Lab’s critical thinking with AI assistance

Your AI Pair Programming Toolkit: - 💬 Effective prompts: “Analyze this code for performance bottlenecks…” - 📊 Trade-off analysis: “Compare list vs set for this use case…” - 🚀 Optimization requests: “Suggest the fastest data structure for…” - 🔍 Code review: “What improvements can you suggest for…”

Navigate to: labs/lab04/README.md

Success Strategy: 1. 🤖 Collaborate with AI for data structure recommendations 2. 💡 Test and validate AI suggestions with real code 3. 📝 Document your reasoning for future reference 4. ✅ Verify with automation using GitHub Actions