Flight Delay

Mia started her new job with one of the best airlines in the country.

Her first project is to build a supervised learning model to predict flight delays. She has access to a large dataset containing information about her airline's arrival and departure flights.

Early on, Mia discovered something interesting: a delayed flight affects every later flight that day. Once a delay occurs, any subsequent flight on the same day will likely depart late too.

Mia wants to split her dataset into a training set and a test set. How would you recommend she do it?

Mia should use stratified splitting to ensure both sets contain the same ratio of arrival and departure flights.

Mia should split her dataset so that flights from the same date go in the same split.

Mia should split her dataset randomly to ensure both sets properly represent the overall dataset.

Mia should use flight information before a specific date as her training set and any data after that as her test set.
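
To make the four options concrete, here is a minimal sketch of how each split might look with pandas and scikit-learn. Mia's actual dataset is not shown, so the toy data and the column names `date`, `direction`, and `delayed` are assumptions made up for illustration, as is the helper `shared_dates`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Hypothetical toy data: 20 days with 10 flights each (not Mia's real dataset).
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=20, freq="D").repeat(10)
flights = pd.DataFrame({
    "date": dates,
    "direction": rng.choice(["arrival", "departure"], size=len(dates)),
    "delayed": rng.integers(0, 2, size=len(dates)),
})

# Stratified split: keeps the arrival/departure ratio similar in both sets.
strat_train, strat_test = train_test_split(
    flights, test_size=0.2, stratify=flights["direction"], random_state=0
)

# Grouped split: every flight from a given date lands in the same set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(flights, groups=flights["date"]))
group_train, group_test = flights.iloc[train_idx], flights.iloc[test_idx]

# Purely random split: rows are assigned without regard to date.
random_train, random_test = train_test_split(
    flights, test_size=0.2, random_state=0
)

# Time-based split: everything before a cutoff date trains, the rest tests.
cutoff = pd.Timestamp("2023-01-17")  # arbitrary cutoff chosen for the toy data
time_train = flights[flights["date"] < cutoff]
time_test = flights[flights["date"] >= cutoff]


def shared_dates(train: pd.DataFrame, test: pd.DataFrame) -> int:
    """Count the dates that appear in both the training and the test set."""
    return len(set(train["date"]) & set(test["date"]))


for name, (train, test) in {
    "stratified": (strat_train, strat_test),
    "grouped by date": (group_train, group_test),
    "random": (random_train, random_test),
    "time-based": (time_train, time_test),
}.items():
    print(f"{name}: {shared_dates(train, test)} dates appear in both sets")
```

Counting the dates that show up on both sides of each split is one way to check whether flights from a single day, which the scenario says influence one another, end up divided between training and test data.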