Conventional cross validation assumes that data points are independent of one another. Time series data violates this assumption: observations that are near in time are related.
Therefore, when conventional cross validation is used to estimate model accuracy on time series data, the estimate is unreliable, because the technique picks training and test points at random positions in the series.
E.g. for the time series 1,2,3,4,5,6,7,8,9,10, a conventional cross validation might yield
1,10,9,4 as the train set and the rest as the test set.
However, for time series data we want to preserve the order and closeness of the data points. We might want something like
1,2,3,4,5,6 as the train set and 7,8,9,10 as the test set.
In this split the dependence between neighbouring data points is completely preserved, and every training point comes before every test point.
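The difference between the two kinds of split can be sketched in plain Python. This is only an illustration: the values 1 to 10 stand in for time-ordered observations, and the seed, the cut point and the helper name is_chronological are assumptions made for the example.

```python
import random

# The example series: values double as time indices (assumption for illustration).
series = list(range(1, 11))

# Conventional style: shuffle, then take 4 points for training.
rng = random.Random(0)          # fixed seed only so the run is repeatable
shuffled = series[:]
rng.shuffle(shuffled)
random_train, random_test = shuffled[:4], shuffled[4:]

# Time-series style: cut once, keep the order.
cut = 6                          # hypothetical cut point: first 6 points train
ordered_train, ordered_test = series[:cut], series[cut:]


def is_chronological(train, test):
    """True when every training point comes before every test point."""
    return max(train) < min(test)


print("random split   :", random_train, random_test)
print("ordered split  :", ordered_train, ordered_test)
print("ordered split is chronological:", is_chronological(ordered_train, ordered_test))
```

A shuffled split will usually place some later observations in the train set, so the model gets to see the future before it is tested on the past. The ordered split never does.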
Time Series Split
For time series data like this, in Python we can use TimeSeriesSplit from scikit-learn. In the k-th split it returns the first k folds as the train set and the (k+1)-th fold as the test set.
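A minimal sketch of how this might look with scikit-learn's TimeSeriesSplit on the ten-point example series. The choice of n_splits=3 and the variable names are assumptions for illustration, and scikit-learn and NumPy are assumed to be installed.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# The example series 1..10, already in time order.
series = np.arange(1, 11)

# n_splits=3 is an arbitrary choice for this illustration.
tscv = TimeSeriesSplit(n_splits=3)

splits = []
for train_idx, test_idx in tscv.split(series):
    # split() yields integer positions; index the series to get the values.
    train, test = series[train_idx], series[test_idx]
    splits.append((train.tolist(), test.tolist()))
    print("train:", train.tolist(), "test:", test.tolist())
```

Each later split trains on a longer prefix of the series, and the test block always comes right after the training block, so no split trains on observations that are later than the ones it is tested on.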