What are TPUs?
TPUs are powerful hardware accelerators specialized for deep learning tasks. They were developed (and first used) by Google to process large image databases, such as extracting all the text from Street View imagery. Google began using TPUs internally in 2015 and made them publicly available in 2018.
TPUs are custom-built processing units designed to work with a specific framework: TensorFlow, an open-source machine learning platform with state-of-the-art tools, libraries, and a large community, which lets users quickly build and deploy ML applications.
The difference between a CPU, a GPU, and a TPU is that the CPU is a general-purpose processor that handles all of the computer's logic, calculations, and input/output. A GPU is an additional processor used to enhance the graphical interface and run high-end parallel tasks. TPUs are powerful custom-built processors designed to run projects built on a specific framework, i.e. TensorFlow.
Different types of processors are suited for different types of machine learning models. TPUs are well suited for CNNs, while GPUs have benefits for some fully-connected neural networks, and CPUs can have advantages for RNNs.
How do TPUs work?
Google designed Cloud TPUs as matrix processors specialized for neural network workloads. TPUs can't run word processors, control rocket engines, or execute bank transactions, but they can handle the massive matrix operations used in neural networks at very high speeds.
The TPU host streams data into an infeed queue. The TPU loads the data from the infeed queue and stores it in high-bandwidth memory (HBM). When the computation is completed, the TPU loads the results into an outfeed queue; the TPU host then reads the results from the outfeed queue and stores them in the host's memory. As a result, TPUs can achieve high computational throughput on neural network calculations.
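As a concrete illustration, here is a minimal sketch of how a host program attaches to a Cloud TPU in TensorFlow 2 before streaming data to it. The tf.distribute calls are the standard public API, but the resolver argument depends on the runtime (Kaggle and Colab notebooks usually need no argument, while TPU VMs may require "local"):

```python
import tensorflow as tf

# Locate the TPU from the host; the no-argument form works on typical
# Kaggle/Colab runtimes, while TPU VMs may need TPUClusterResolver("local").
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# The strategy replicates computation across all TPU cores; the host
# then streams batches to the device through the infeed machinery.
strategy = tf.distribute.TPUStrategy(resolver)
print("Number of TPU replicas:", strategy.num_replicas_in_sync)
```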
The primary task for TPUs is matrix processing, which is a combination of multiply and accumulate operations. TPUs contain thousands of multiply-accumulators that are directly connected to each other to form a large physical matrix. This is called a systolic array architecture.
To perform the matrix operations, the TPU loads the parameters from HBM into the Matrix Multiplication Unit (MXU). Then, the TPU loads data from HBM. As each multiplication is executed, the result is passed to the next multiply-accumulator. The output is the summation of all multiplication results between the data and the parameters. No memory access is required during the matrix multiplication process.
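As a purely illustrative sketch (plain Python/NumPy, not real TPU code), the multiply-accumulate pattern that the MXU's systolic array wires into hardware looks like this:

```python
import numpy as np

def matmul_as_macs(data, params):
    """Each output element is built by a chain of multiply-accumulate
    steps, mirroring how results flow between neighboring
    multiply-accumulators in a systolic array (in hardware these steps
    run in parallel, with no intermediate memory accesses)."""
    n, k = data.shape
    k2, m = params.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            acc = 0.0                                # accumulator register
            for p in range(k):
                acc += data[i, p] * params[p, j]     # one multiply-accumulate
            out[i, j] = acc
    return out

x = np.random.rand(4, 8)   # data
w = np.random.rand(8, 3)   # parameters
assert np.allclose(matmul_as_macs(x, w), x @ w)
```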
About the dataset
This competition provides its files in TFRecord format. TFRecord is a container format frequently used in TensorFlow to group and shard data files for optimal training performance. Each file contains the id and img (the actual pixels in array form) information for many images.
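A minimal sketch of reading such files with tf.data is shown below. The feature names and encodings (a string id plus JPEG-encoded bytes under img) are assumptions based on the description above, and the training files would typically carry an additional label feature parsed the same way:

```python
import tensorflow as tf

# Assumed schema: each serialized example holds a string "id" and the
# image bytes under "img" (treated here as JPEG-encoded).
feature_description = {
    "id": tf.io.FixedLenFeature([], tf.string),
    "img": tf.io.FixedLenFeature([], tf.string),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_description)
    image = tf.io.decode_jpeg(example["img"], channels=3)
    image = tf.cast(image, tf.float32) / 255.0  # scale pixels to [0, 1]
    return example["id"], image

dataset = (
    tf.data.TFRecordDataset(tf.io.gfile.glob("train/*.tfrec"))
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)  # keep the TPU infeed queue full
)
```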
Implementing the solution
To identify the types of flowers, we will train the model and make predictions to reveal patterns in the kinds of images our model has trouble with. Before making our final predictions on the test set, we will evaluate the model's predictions on the validation set. This will help us diagnose problems in training and suggest ways the model can be improved.
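A hypothetical outline of that workflow on a TPU might look like the following. The architecture, the class count, and the train_ds/valid_ds pipelines are placeholders rather than the competition's exact setup; strategy is the TPUStrategy object from the earlier sketch:

```python
import tensorflow as tf

# `strategy` is the tf.distribute.TPUStrategy created earlier; building
# the model inside its scope places the variables on the TPU.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                       input_shape=(224, 224, 3)),
        tf.keras.layers.Dense(104, activation="softmax"),  # class count assumed
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["sparse_categorical_accuracy"])

# train_ds / valid_ds are assumed (image, label) pipelines built like the
# TFRecord example above, with a label feature added for training files.
history = model.fit(train_ds, validation_data=valid_ds, epochs=10)

# Inspect validation predictions to see which kinds of images the model
# struggles with before predicting on the test set.
val_probs = model.predict(valid_ds)
```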