pa02 : Application of data structures to a movie dataset

num ready? description assigned due
pa02 true Application of data structures to a movie dataset Thu 02/23 09:00AM Tue 03/14 11:59PM

Collaboration policy

This assignment must be completed solo.

Introduction

In this assignment, you will

Goal of this assignment

Note: The goals above are very important in industry and in interviews!

Getting Started

Starter code

GitHub repository

Refer to lab01 for instructions on how to set up a GitHub repository and pull the starter code for this lab. Obtain the starter code from this repo: https://github.com/ucsb-cs24-s22/STARTER-pa02

Contents

You are given 6 datasets in CSV files. Each CSV file:

Example of alphabetical order

ace ventura: when nature calls,6.1
balto,7.1
casino,7.8
cutthroat island,5.7
dracula: dead and loving it,5.7
father of the bride part ii,5.7
four rooms,6.5
goldeneye,6.6
grumpier old men,6.5
heat,7.7
jumanji,6.9
money train,5.4
nixon,7.1
sabrina,6.2
sense and sensibility,7.2
sudden death,5.5
the american president,6.5
tom and huck,5.4
toy story,7.7
waiting to exhale,6.1

Example of random order

toy story,7.7
jumanji,6.9
grumpier old men,6.5
waiting to exhale,6.1
father of the bride part ii,5.7
heat,7.7
sabrina,6.2
tom and huck,5.4
sudden death,5.5
goldeneye,6.6
the american president,6.5
dracula: dead and loving it,5.7
balto,7.1
nixon,7.1
cutthroat island,5.7
casino,7.8
sense and sensibility,7.2
four rooms,6.5
ace ventura: when nature calls,6.1
money train,5.4

Files to complete

Problem statement

This assignment has 3 parts. You should separate your algorithm for part 1 from your algorithm for part 2 because in part 3, you need to analyze the running time complexity of your solution to part 2 only.

Command-line arguments

./runMovies filename prefix_1 prefix_2 prefix_3 ... prefix_m

Example of a prefix with whitespaces

./runMovies input_1000_random.csv "the american" ab

Part 1: Print all movie names and ratings

Your program should print out all the movies in alphabetical order of movie name. You may use only one data structure of your choice to store the data from the CSV file. When testing this part, do not give any prefixes as command-line arguments!

Example with no prefixes

./runMovies input_20_random.csv 

This should produce the output

ace ventura: when nature calls, 6.1
balto, 7.1
casino, 7.8
cutthroat island, 5.7
dracula: dead and loving it, 5.7
father of the bride part ii, 5.7
four rooms, 6.5
goldeneye, 6.6
grumpier old men, 6.5
heat, 7.7
jumanji, 6.9
money train, 5.4
nixon, 7.1
sabrina, 6.2
sense and sensibility, 7.2
sudden death, 5.5
the american president, 6.5
tom and huck, 5.4
toy story, 7.7
waiting to exhale, 6.1

Part 2: Find movies based on prefixes

If one or more prefixes are given as command-line arguments, then for each prefix, your program should:

You may use additional data structures from the C++ STL to help you solve this part of the assignment.

Part 2a: All movies starting with a prefix

First, for each prefix, your program should print out all the movies whose names start with that prefix in decreasing order of rating. If multiple movies have the same rating, then print them in alphabetical order of movie name. For example, print the american president, 6.5 before the confessional, 6.5. If no movie names start with that prefix, then print

No movies found with prefix <prefix_value>

Part 2b: Highest rated movie for a prefix

Second, for each prefix, your program should print the highest rated movie whose name starts with that prefix. If there is a tie for highest rated movie, then use the movie whose name comes first in alphabetical order.

Examples

Example with 3 prefixes and multiple movies with same rating

./runMovies input_100_random.csv to th w

This should produce the following output:

toy story, 7.7
to die for, 6.7
tom and huck, 5.4

the usual suspects, 8.1
the city of lost children, 7.6
the postman, 7.6
the white balloon, 7.5
the journey of august king, 6.7
things to do in denver when you're dead, 6.7
the american president, 6.5
the confessional, 6.5
the crossing guard, 6.1
the indian in the cupboard, 5.9
the juror, 5.5
the big green, 5.2

wings of courage, 6.8
white squall, 6.3
waiting to exhale, 6.1
when night is falling, 5.9

Best movie with prefix to is: toy story with rating 7.7
Best movie with prefix th is: the usual suspects with rating 8.1
Best movie with prefix w is: wings of courage with rating 6.8

Another example multiple movies with same rating

./runMovies input_1000_random.csv be

should produce the output:

before sunrise, 7.7
before the rain, 7.7
beauty and the beast, 7.5
belle de jour, 7.3
beautiful girls, 6.6
beyond rangoon, 6.4
beat the devil, 6.2
before and after, 5.8
beverly hills cop iii, 5.5
bed of roses, 5.1
being human, 5.0
beyond bedlam, 4.0

Best movie with prefix be is: before sunrise with rating 7.7

Example with no movies for a given prefix

./runMovies input_100_random.csv t xyz

should produce the output

the usual suspects, 8.1
toy story, 7.7
the city of lost children, 7.6
the postman, 7.6
the white balloon, 7.5
twelve monkeys, 7.4
the journey of august king, 6.7
things to do in denver when you're dead, 6.7
to die for, 6.7
the american president, 6.5
the confessional, 6.5
the crossing guard, 6.1
the indian in the cupboard, 5.9
the juror, 5.5
tom and huck, 5.4
two bits, 5.4
the big green, 5.2
two if by sea, 4.5

No movies found with prefix xyz

Best movie with prefix t is: the usual suspects with rating 8.1

Part 3: Analyze the time and space complexity of your algorithm from part 2

Assume that

Part 3a: Analyze time complexity

Analyze the worst case Big-O time complexity of your algorithm from part 2 of the assignment. You may assume that

You must provide the time complexity analysis as a commented block right after your main() function in main.cpp. Note that

You will be graded for the efficiency of your algorithms but also the clarity and correctness of your analysis. However, we are not giving you a target Big-O time complexity. A set of solutions that have similar Big-O time complexities will receive the same grade.

Part 3b: Analyze space complexity

Analyze the worst case Big-O space complexity of your algorithm from part 2 of the assignment. You may assume that

You must provide the space complexity analysis as a commented block under your time complexity analysis. Note that

You will be graded for the efficiency of your algorithms but also the clarity and correctness of your analysis. However, we are not giving you a target Big-O space complexity. A set of solutions that have similar Big-O space complexities will receive the same grade.

Part 3c: Explore tradeoffs between time/space complexity

Briefly state how you designed your algorithm from part 2 of the assignment for the task at hand. More specifically, answer this question:

Based on your answer to the question above, answer one of the following:

  1. If you designed your algorithm for a low time complexity,
    • Were you able to achieve a low space complexity as well?
    • Why or why not?
  2. If you designed your algorithm for a low space complexity,
    • Were you able to achieve a low time complexity as well?
    • Why or why not?
  3. If you designed your algorithm for both,
    • Were you able to achieve both?
    • Why or why not?
    • Which was harder to achieve: low time complexity or low space complexity?

You must provide these answers as a commented block under your space complexity analysis.

You will be graded for the clarity and thoughtfulness of your analysis.

Requirements

For this programming assignment, you will have a lot of flexibility on your implementation (which just means we won’t be providing a code framework for you to fill in). However, there are a few requirements that you need to keep in mind as you think about your solution: