Week 4: Python Data Structures
[]
, separated by commas# A list of computer science pioneers
pioneers = ["Grace Hopper", "Ada Lovelace", "Katherine Johnson"]
# Add new pioneer (append adds to the end)
pioneers.append("Dorothy Vaughan")
# Update an item (zero-based indexing)
pioneers[0] = "Rear Admiral Grace Hopper"
# Insert at specific position
pioneers.insert(1, "Hedy Lamarr")
print(pioneers)
# Output: ['Rear Admiral Grace Hopper', 'Hedy Lamarr', 'Ada Lovelace', 'Katherine Johnson', 'Dorothy Vaughan']
# Useful list methods
print(f"Number of pioneers: {len(pioneers)}")
print(f"Last pioneer: {pioneers[-1]}")
()
, though parentheses are often optional# Coordinate pairs for a map application
start_location = (39.1031, -84.5120) # Cincinnati coordinates
end_location = (39.7391, -104.9847) # Denver coordinates
# RGB color values for UI design
brand_colors = (
(0, 123, 191), # UC Blue
(224, 18, 34), # UC Red
(0, 0, 0) # Black
)
# Tuple unpacking (very common pattern)
lat, lon = start_location
print(f"Starting at latitude {lat}, longitude {lon}")
# Multiple return values from functions
def get_name_and_grade():
return "Alice", 95.5 # Returns a tuple
name, grade = get_name_and_grade() # Tuple unpacking
# Single item tuple (note the comma!)
single_item = (42,) # Without comma, it's just parentheses for grouping
my_list[0]
first item, my_list[1]
second item, etc.my_list[-1]
last item, my_list[-2]
second-to-last, etc.start:stop:step
syntax[start:stop]
includes start, excludes stop# Working with sequence data
languages = ["Python", "JavaScript", "Java", "C++", "Rust", "Go", "TypeScript"]
# Basic indexing
print(f"First language: {languages[0]}") # Python
print(f"Third language: {languages[2]}") # Java
print(f"Last language: {languages[-1]}") # TypeScript
print(f"Second to last: {languages[-2]}") # Go
# Slicing examples
print(f"First three: {languages[0:3]}") # ['Python', 'JavaScript', 'Java']
print(f"From index 2 on: {languages[2:]}") # ['Java', 'C++', 'Rust', 'Go', 'TypeScript']
print(f"Last three: {languages[-3:]}") # ['Rust', 'Go', 'TypeScript']
print(f"Every other: {languages[::2]}") # ['Python', 'Java', 'Rust', 'TypeScript']
print(f"Reversed: {languages[::-1]}") # Entire list in reverse order
# Slicing with coordinates
point = (10, 20, 30, 40)
x, y = point[0:2] # Get first two values
print(f"2D coordinates: ({x}, {y})")
len(sequence)
returns number of itemsitem in sequence
checks if item existslist1 + list2
combines sequenceslist * 3
repeats sequence multiple timessorted()
returns new sorted sequence, list.sort()
modifies in placesequence.index(item)
finds first occurrence# Working with programming languages data
popular_languages = ["Python", "JavaScript", "TypeScript"]
systems_languages = ["C++", "Rust", "Go"]
# Length and membership
print(f"We track {len(popular_languages)} popular languages")
print(f"Is Python popular? {'Python' in popular_languages}")
print(f"Is Assembly popular? {'Assembly' in popular_languages}")
# Concatenation and repetition
all_languages = popular_languages + systems_languages
separator = ["-"] * 3 # Creates ["-", "-", "-"]
# Sorting (doesn't modify original)
sorted_languages = sorted(all_languages)
print(f"Alphabetical: {sorted_languages}")
# Finding items
try:
rust_position = all_languages.index("Rust")
print(f"Rust is at position {rust_position}")
except ValueError:
print("Rust not found in list")
# List comprehensions (introduced earlier in this lecture!)
short_names = [lang for lang in all_languages if len(lang) <= 4]
print(f"Short language names: {short_names}")
Python has a concise way to create lists called list comprehensions:
# Traditional approach with loops
numbers = []
for i in range(5):
numbers.append(i * 2)
print(numbers) # [0, 2, 4, 6, 8]
# List comprehension - more concise!
numbers = [i * 2 for i in range(5)]
print(numbers) # [0, 2, 4, 6, 8]
# Pattern: [expression for item in iterable]
emails = [f"user_{i}@email.com" for i in range(3)]
print(emails) # ['user_0@email.com', 'user_1@email.com', 'user_2@email.com']
Why learn this now? - Perfect for generating test data (which we’ll need for performance testing!) - More readable and Pythonic than traditional loops - We’ll use this pattern extensively in data processing
Challenge: Help me optimize this slow code using GenAI tools!
# SLOW CODE - Let's time this together!
import time
# Simulating a large customer database
customers = [f"customer_{i}@email.com" for i in range(50000)]
vip_customers = [f"customer_{i}@email.com" for i in range(0, 50000, 100)] # Every 100th customer
# Slow approach - checking VIP status
def is_vip_slow(email, vip_list):
"""Check if customer is VIP - using list membership"""
return email in vip_list
# Let's time this approach
start_time = time.time()
vip_count = 0
for customer in customers[:1000]: # Check first 1000 customers
if is_vip_slow(customer, vip_customers):
vip_count += 1
slow_time = time.time() - start_time
print(f"Slow approach: {slow_time:.4f} seconds, found {vip_count} VIPs")
Your Turn with GenAI:
# FAST CODE - After consulting with GenAI
import time
# Convert VIP list to set for O(1) lookups
vip_customers_set = set(vip_customers)
def is_vip_fast(email, vip_set):
"""Check if customer is VIP - using set membership"""
return email in vip_set
# Time the optimized approach
start_time = time.time()
vip_count = 0
for customer in customers[:1000]:
if is_vip_fast(customer, vip_customers_set):
vip_count += 1
fast_time = time.time() - start_time
print(f"Fast approach: {fast_time:.4f} seconds, found {vip_count} VIPs")
print(f"Speedup: {slow_time/fast_time:.1f}x faster!")
# GenAI bonus: What other optimizations could we make?
# - Pre-compute VIP status in a dictionary
# - Use database indexing for larger datasets
# - Cache frequently accessed results
Results Discussion: - 📈 Performance gain: Typically 10-100x faster for large datasets - 🧠 AI insight: GenAI tools excel at identifying these patterns - 💼 Real impact: This optimization matters in production systems - 🔄 Learning loop: AI suggestions → implementation → measurement → iteration
key: value
pairs{key: value}
# User profile for a social media application
user_profile = {
"username": "ada_lovelace",
"first_name": "Ada",
"last_name": "Lovelace",
"birth_year": 1815,
"occupation": "Mathematician",
"famous_for": "First computer programmer",
"followers": 125000,
"verified": True,
"programming_languages": ["Analytical Engine", "Mathematics"]
}
# Accessing values (multiple ways)
print(f"User: {user_profile['username']}")
print(f"Born: {user_profile.get('birth_year', 'Unknown')}") # Safe access
print(f"Followers: {user_profile['followers']:,}")
# Adding new information
user_profile["location"] = "London, England"
user_profile["last_active"] = "1852-11-27"
# Updating existing values
user_profile["followers"] += 1000 # New followers!
print(f"Updated profile: {user_profile['username']} now has {user_profile['followers']:,} followers")
Challenge: This code works, but it’s messy and slow. Let’s fix it with GenAI!
# MESSY CODE - Student registration system (needs help!)
def process_student_data():
# Raw data from registration form
students = []
students.append(["Alice Johnson", "alice@uc.edu", "IS4010", 95])
students.append(["Bob Smith", "bob@uc.edu", "IS4010", 87])
students.append(["Alice Johnson", "alice@uc.edu", "CS2071", 92]) # Duplicate student!
students.append(["Carol Davis", "carol@uc.edu", "IS4010", 78])
students.append(["Bob Smith", "bob@uc.edu", "MATH2045", 85]) # Another duplicate!
# Find students in IS4010 (slow way)
is4010_students = []
for student in students:
if student[2] == "IS4010": # Magic number - what does index 2 mean?
is4010_students.append(student)
# Calculate average grade (also slow)
total = 0
count = 0
for student in is4010_students:
total += student[3] # Another magic number!
count += 1
average = total / count if count > 0 else 0
return is4010_students, average
# Call the function
students, avg = process_student_data()
print(f"IS4010 students: {len(students)}, Average: {avg:.1f}")
GenAI Prompt Challenge: 🤖 “Analyze this Python code and suggest improvements for data structure choice, readability, and performance. Focus on eliminating magic numbers and handling duplicate data.”
# CLEAN CODE - After GenAI consultation
from collections import defaultdict
def process_student_data_improved():
# Better data structure: list of dictionaries
raw_students = [
{"name": "Alice Johnson", "email": "alice@uc.edu", "course": "IS4010", "grade": 95},
{"name": "Bob Smith", "email": "bob@uc.edu", "course": "IS4010", "grade": 87},
{"name": "Alice Johnson", "email": "alice@uc.edu", "course": "CS2071", "grade": 92},
{"name": "Carol Davis", "email": "carol@uc.edu", "course": "IS4010", "grade": 78},
{"name": "Bob Smith", "email": "bob@uc.edu", "course": "MATH2045", "grade": 85},
]
# Group students by course (using defaultdict - AI suggestion!)
students_by_course = defaultdict(list)
unique_enrollments = set() # Handle duplicates with sets!
for student in raw_students:
# Create unique key for duplicate detection
key = (student["name"], student["email"], student["course"])
if key not in unique_enrollments:
unique_enrollments.add(key)
students_by_course[student["course"]].append(student)
# Calculate statistics for IS4010
is4010_students = students_by_course["IS4010"]
average_grade = sum(s["grade"] for s in is4010_students) / len(is4010_students)
return is4010_students, average_grade
# Much cleaner!
students, avg = process_student_data_improved()
print(f"IS4010 students: {len(students)}, Average: {avg:.1f}")
What GenAI helped us achieve: - 🗂️ Better structure: Dictionaries instead of magic indices - 🚫 Duplicate handling: Sets for unique enrollment tracking - 📊 Efficient grouping: defaultdict for course organization - 📖 Readable code: Self-documenting variable names - ⚡ Performance: Faster lookups and processing
item in set
is extremely efficient{}
or use set()
constructor# Website analytics: tracking unique visitors
daily_visitors = ["alice", "bob", "charlie", "alice", "david", "alice", "eve"]
monthly_visitors = ["alice", "frank", "george", "bob", "helen"]
# Remove duplicates and get unique visitors
unique_daily = set(daily_visitors)
unique_monthly = set(monthly_visitors)
print(f"Unique daily visitors: {len(unique_daily)}") # 5 unique visitors
print(f"Daily visitors: {unique_daily}")
# Set operations (mathematical set theory)
all_visitors = unique_daily | unique_monthly # Union: all unique visitors
returning_visitors = unique_daily & unique_monthly # Intersection: visitors in both
new_visitors = unique_monthly - unique_daily # Difference: monthly-only visitors
print(f"All unique visitors: {all_visitors}")
print(f"Returning visitors: {returning_visitors}")
print(f"New visitors this month: {new_visitors}")
# Adding and removing items
unique_daily.add("ivan") # Add new visitor
unique_daily.discard("bob") # Remove visitor (safe - no error if not found)
# Create set from list to remove duplicates
languages = ["Python", "Java", "Python", "C++", "Java", "Rust"]
unique_languages = set(languages)
print(f"Unique programming languages: {unique_languages}")
# Fast membership testing
if "Python" in unique_languages:
print("Python is one of our tracked languages")
Scenario: You’re building a student dashboard that needs to process this GitHub API-style data efficiently. Let’s design the optimal data structures with GenAI!
# Realistic API response from student management system
api_response = {
"students": [
{
"id": 12345,
"username": "alice_j",
"profile": {
"name": "Alice Johnson",
"email": "alice@uc.edu",
"year": "junior",
"major": "Information Systems"
},
"courses": [
{"code": "IS4010", "grade": 95, "credits": 3},
{"code": "CS2071", "grade": 88, "credits": 4},
{"code": "MATH2045", "grade": 92, "credits": 3}
],
"activities": ["ACM", "IS Club", "Dean's List"],
"permissions": ["student", "tutor", "lab_assistant"]
},
{
"id": 67890,
"username": "bob_s",
"profile": {
"name": "Bob Smith",
"email": "bob@uc.edu",
"year": "senior",
"major": "Computer Science"
},
"courses": [
{"code": "IS4010", "grade": 87, "credits": 3},
{"code": "CS5168", "grade": 91, "credits": 3}
],
"activities": ["IEEE", "Hackathon Club"],
"permissions": ["student", "teaching_assistant"]
}
],
"metadata": {
"total_students": 2,
"timestamp": "2024-03-15T10:30:00Z",
"api_version": "v2.1"
}
}
GenAI Design Challenge: 🤖 “I need to design data structures for a student dashboard with these requirements: 1. Fast lookup of students by username 2. Quick filtering by course enrollment 3. Efficient permission checking 4. Find students with specific activities What data structures should I use and why?”
# After consulting GenAI, let's build efficient access structures
def process_student_api_data(api_response):
students_data = api_response["students"]
# 1. Fast username lookup - Dictionary
students_by_username = {}
# 2. Course enrollment tracking - Dictionary of sets
students_by_course = {}
# 3. Activity membership - Dictionary of sets
students_by_activity = {}
# 4. Permission checking - Dictionary of sets
students_by_permission = {}
# Process each student (GenAI suggested this structure)
for student in students_data:
username = student["username"]
students_by_username[username] = student
# Index by courses
for course in student["courses"]:
course_code = course["code"]
if course_code not in students_by_course:
students_by_course[course_code] = set()
students_by_course[course_code].add(username)
# Index by activities
for activity in student["activities"]:
if activity not in students_by_activity:
students_by_activity[activity] = set()
students_by_activity[activity].add(username)
# Index by permissions
for permission in student["permissions"]:
if permission not in students_by_permission:
students_by_permission[permission] = set()
students_by_permission[permission].add(username)
return {
"by_username": students_by_username,
"by_course": students_by_course,
"by_activity": students_by_activity,
"by_permission": students_by_permission
}
# Build the optimized structures
optimized_data = process_student_api_data(api_response)
# Now queries are lightning fast!
print(f"Students in IS4010: {optimized_data['by_course']['IS4010']}")
print(f"ACM members: {optimized_data['by_activity']['ACM']}")
print(f"Teaching assistants: {optimized_data['by_permission']['teaching_assistant']}")
# Complex query: IS4010 students who are also ACM members
is4010_students = optimized_data['by_course']['IS4010']
acm_members = optimized_data['by_activity']['ACM']
is4010_acm = is4010_students & acm_members # Set intersection!
print(f"IS4010 + ACM: {is4010_acm}")
GenAI Success Metrics: - 🚀 O(1) lookups instead of O(n) searches - 🔍 Set operations for complex filtering - 📊 Multiple access patterns supported efficiently - 💾 Memory trade-off for speed (typical AI recommendation)
list
when you need an ordered, changeable collection
tuple
when you need ordered, unchangeable data
dict
when you need fast key-based lookups
set
when you need unique items and fast membership testing
# Real-world decision making examples
# E-commerce shopping cart (order matters, items change)
shopping_cart = ["laptop", "mouse", "keyboard", "monitor"]
# Product coordinates (fixed position data)
warehouse_location = (39.1031, -84.5120, 1) # lat, lon, floor
# User account information (key-value lookups)
user_account = {
"user_id": 12345,
"email": "student@uc.edu",
"subscription": "premium",
"last_login": "2024-03-15"
}
# User permissions (unique items, fast checking)
user_permissions = {"read", "write", "delete", "admin"}
# Decision logic in action
def can_user_edit(user_permissions, required_permission):
"""Check if user has required permission - O(1) lookup!"""
return required_permission in user_permissions
# Performance comparison
import time
# Checking membership in list vs set
large_list = list(range(10000))
large_set = set(range(10000))
# This is slow for lists (O(n))
# result = 9999 in large_list
# This is fast for sets (O(1))
# result = 9999 in large_set
Mini-Hackathon Time! Work with GenAI to solve these real-world data structure problems in 15 minutes.
# Raw social media engagement data
posts = [
{"id": 1, "user": "alice", "likes": ["bob", "carol", "dave"], "tags": ["python", "coding"]},
{"id": 2, "user": "bob", "likes": ["alice", "carol"], "tags": ["ai", "python"]},
{"id": 3, "user": "carol", "likes": ["alice"], "tags": ["datascience", "python"]},
]
# Goals: Find most popular user, most used tags, mutual connections
# Ask your AI: "What data structures optimize social media analytics?"
# Product inventory system
inventory = [
{"sku": "LAP001", "name": "Laptop", "price": 999, "categories": ["electronics", "computers"], "stock": 5},
{"sku": "MOU002", "name": "Mouse", "price": 25, "categories": ["electronics", "accessories"], "stock": 50},
{"sku": "KEY003", "name": "Keyboard", "price": 75, "categories": ["electronics", "accessories"], "stock": 30},
]
# Goals: Fast product lookup, category filtering, inventory alerts
# Ask your AI: "How do I optimize product search and filtering?"
# Course scheduling conflicts
courses = [
{"code": "IS4010", "time": "MWF 10:00", "room": "Lindner 100", "students": ["alice", "bob"]},
{"code": "CS2071", "time": "TTh 14:00", "room": "Lindner 200", "students": ["alice", "carol"]},
{"code": "MATH2045", "time": "MWF 10:00", "room": "Swift 300", "students": ["bob", "dave"]},
]
# Goals: Detect time conflicts, room utilization, student schedules
# Ask your AI: "What's the best way to handle scheduling conflicts?"
Your Mission: 1. 🤖 Pick ONE challenge that interests you most 2. 💬 Consult your GenAI tool for data structure recommendations 3. 🚀 Be ready to share your AI’s best suggestion with the class 4. 🏆 Bonus: Implement a quick prototype if you’re feeling ambitious!
Let’s see what GenAI recommended for each challenge!
Key AI Insights: - 🧠 Pattern recognition: AI excels at identifying common optimization patterns - 🔄 Trade-off analysis: AI often suggests multiple approaches with pros/cons - 📈 Scalability thinking: AI frequently considers performance at scale - 🚨 Edge case awareness: Good AI prompts reveal boundary conditions
You’ve experienced the power of AI-assisted data structure optimization. Now apply those skills!
Lab 04 Challenge:
From Today’s Exercises to Your Lab: - 🔬 Performance Detective → Lab’s efficiency requirements - 🛠️ Code Refactoring → Lab’s clean implementation standards - 📡 API Data Design → Lab’s complex data structure challenges - 🎮 Problem Solving → Lab’s critical thinking with AI assistance
Your AI Pair Programming Toolkit: - 💬 Effective prompts: “Analyze this code for performance bottlenecks…” - 📊 Trade-off analysis: “Compare list vs set for this use case…” - 🚀 Optimization requests: “Suggest the fastest data structure for…” - 🔍 Code review: “What improvements can you suggest for…”
Navigate to: labs/lab04/README.md
Success Strategy: 1. 🤖 Collaborate with AI for data structure recommendations 2. 💡 Test and validate AI suggestions with real code 3. 📝 Document your reasoning for future reference 4. ✅ Verify with automation using GitHub Actions
IS4010: App Development with AI