
60.1 Data Loader for Time Series Data

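In the previous step, the time series data was fed in one element at a time, in order (a batch size of 1). In this step, we group multiple data points into mini-batches for training. To do so, we create a dedicated data loader.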

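To combine time series data into mini-batches, we can shift the starting position of each sample in the mini-batch when feeding the data. Suppose the time series consists of 1000 data points and we want mini-batches of size 2. In this case, the first sample is read sequentially from the beginning of the time series (index 0). The second sample starts at index 500 and is read sequentially from there (its starting position is offset by 500).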
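The offset scheme described above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `start_positions` is hypothetical and is not part of DeZero.

```python
def start_positions(data_size, batch_size):
    """Return the starting index of each sample stream in a mini-batch."""
    jump = data_size // batch_size  # distance between neighboring start positions
    return [i * jump for i in range(batch_size)]


starts = start_positions(data_size=1000, batch_size=2)
print(starts)  # [0, 500]

# Indices read by each stream during the first three iterations.
for iteration in range(3):
    print(iteration, [s + iteration for s in starts])
```

Each stream advances by one position per iteration, so the two samples in a mini-batch always stay 500 positions apart.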

Based on this idea, we now implement a data loader for time series data. The code is shown below.

dezero/dataloaders.py

    class SeqDataLoader(DataLoader):
        def __init__(self, dataset, batch_size, gpu=False):
            # Time series data must keep its order, so shuffling is disabled.
            super().__init__(dataset=dataset, batch_size=batch_size,
                             shuffle=False, gpu=gpu)

        def __next__(self):
            if self.iteration >= self.max_iter:
                self.reset()
                raise StopIteration

            # Offset between the starting positions of neighboring samples.
            jump = self.data_size // self.batch_size
            batch_index = [(i * jump + self.iteration) % self.data_size
                           for i in range(self.batch_size)]
            batch = [self.dataset[i] for i in batch_index]

            xp = cuda.cupy if self.gpu else np
            x = xp.array([example[0] for example in batch])
            t = xp.array([example[1] for example in batch])

            self.iteration += 1
            return x, t

The first change is in the initializer. Shuffling would change the order of the data, so for time series data we set shuffle=False.

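The __next__ method contains the code that retrieves the next mini-batch. The important part is the index calculation: we first compute the offset jump, then store in batch_index the index from which each sample is read (its starting position plus the current iteration), and finally fetch the data from the dataset self.dataset.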
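The index calculation can be tried out without DeZero. The following is a stand-alone sketch under stated assumptions: the class name `ToySeqLoader` and the toy dataset are hypothetical, the GPU branch is omitted, and `max_iter` is assumed to be the number of mini-batches per epoch, rounded up.

```python
import math

import numpy as np


class ToySeqLoader:
    """Stand-alone sketch of offset-based batching (CPU only)."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size
        self.data_size = len(dataset)
        # Assumption: one epoch is ceil(data_size / batch_size) iterations.
        self.max_iter = math.ceil(self.data_size / batch_size)
        self.iteration = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.iteration >= self.max_iter:
            self.iteration = 0  # rewind so the loader can be reused
            raise StopIteration

        jump = self.data_size // self.batch_size
        # The modulo keeps every index inside the dataset.
        batch_index = [(i * jump + self.iteration) % self.data_size
                       for i in range(self.batch_size)]
        batch = [self.dataset[i] for i in batch_index]

        x = np.array([example[0] for example in batch])
        t = np.array([example[1] for example in batch])
        self.iteration += 1
        return x, t


# Toy dataset: each example is (value, next value), so t is x shifted by one.
toy = [(i, i + 1) for i in range(10)]
batches = [(x.tolist(), t.tolist()) for x, t in ToySeqLoader(toy, batch_size=3)]
for x, t in batches:
    print(x, t)
```

With ten examples and a batch size of 3, jump is 3, so the first mini-batch reads indices 0, 3, and 6, the next one reads 1, 4, and 7, and so on.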

That completes the data loader for time series data. Below is an example of how to use the SeqDataLoader class.

    import dezero
    from dezero.dataloaders import SeqDataLoader

    train_set = dezero.datasets.SinCurve(train=True)
    dataloader = SeqDataLoader(train_set, batch_size=3)
    x, t = next(dataloader)
    print(x)
    print('--')
    print(t)

Output

    [[-0.04725922]
     [ 0.83577416]
     [-0.83650972]]
    --
    [[-0.04529467]
     [ 0.83116588]
     [-0.88256346]]