Machine learning optimization of vector quantization methods used in end-to-end training of neural networks
This post is a short explanation of my paper [1] published at the ICASSP 2023 conference. For more details, please take a look at the paper via this link.
Vector quantization (VQ) is a data compression technique similar to the k-means algorithm which can model any data distribution. Vector quantization has been used in a wide range of applications for speech, image, and video data, such as image generation [2], speech and audio coding [3], voice conversion [4,5], music generation [6], and text-to-speech synthesis [7,8]. The figure below shows how vector quantization (VQ) works. For the VQ process, we require a codebook which contains a number of codewords. Applying VQ to a data point (gray dots) means mapping it to the nearest codeword (blue dots), i.e. replacing the value of the data point with the value of the nearest codeword. Each Voronoi cell (black lines) contains one codeword, such that all data points located in that cell will be mapped to that codeword, since it is the closest codeword to the data points located in that Voronoi cell.
In other words, vector quantization maps the input vector x to the nearest codeword within the codebook (CB) using the following formula:
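$$\mathrm{VQ}(x) = \hat{x} = \arg\min_{c \,\in\, \mathrm{CB}} \lVert x - c \rVert^2$$

As a concrete illustration of this nearest-codeword mapping, here is a minimal PyTorch sketch (shapes and names are hypothetical, not taken from the paper's implementation):

```python
import torch

def vector_quantize(x, codebook):
    """Plain VQ: map each input vector to its nearest codeword.
    x: (N, D) input vectors; codebook: (K, D) codewords."""
    dists = torch.cdist(x, codebook) ** 2   # squared distance to every codeword, (N, K)
    indices = dists.argmin(dim=1)           # argmin over the codebook
    return codebook[indices], indices       # quantized vectors and codeword indices

# Example: quantize 8 random 2-D points with a 4-codeword codebook
x = torch.randn(8, 2)
codebook = torch.randn(4, 2)
x_q, idx = vector_quantize(x, codebook)
```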
The computational complexity of VQ increases exponentially with the codebook size (i.e. with the VQ bitrate). Hence, this plain form of VQ is applicable only to limited bitrates (limited codebook sizes). To solve this issue and apply VQ at higher bitrates and to higher dimensional data, we use variants of VQ such as Residual VQ, Additive VQ, and Product VQ. These methods use more than one codebook to apply VQ to the data. We will explain these three VQ methods in the following.
Residual VQ quantizes the input vector x by applying M consecutive VQ modules to it. According to the following figure, suppose M=3. We apply the first VQ module to the input vector x using the first codebook (CB¹). Then, after finding the nearest codeword from the first codebook, we calculate the remainder (R1). Afterwards, we pass R1 as input to the next VQ module, which uses the second codebook (CB²). This process continues for M stages, at which point we have found three nearest codewords coming from separate codebooks. At the end, we quantize the input vector x as the summation of the M nearest codewords.
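The staged structure can be sketched in a few lines of PyTorch (a minimal sketch with made-up shapes, not the paper's implementation):

```python
import torch

def residual_vq(x, codebooks):
    """Residual VQ sketch: M consecutive VQ stages, each quantizing the
    remainder left by the previous stage. x: (N, D); codebooks: list of (K, D)."""
    residual = x
    x_q = torch.zeros_like(x)
    for cb in codebooks:                                 # stage m uses codebook CB^m
        idx = (torch.cdist(residual, cb) ** 2).argmin(1) # nearest codeword to the residual
        q = cb[idx]
        x_q = x_q + q                                    # accumulate the selected codewords
        residual = residual - q                          # remainder passed to the next stage
    return x_q                                           # summation of the M nearest codewords

x = torch.randn(16, 8)
codebooks = [torch.randn(32, 8) for _ in range(3)]       # M = 3 stages
x_q = residual_vq(x, codebooks)
```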
In a similar way to Residual VQ, Additive VQ quantizes the input vector x by applying M consecutive VQ modules. However, Additive VQ adopts the more complex beam search algorithm to find the nearest codewords for the quantization process (you can find the details of the beam search algorithm in this paper [9]). According to the following figure, suppose M=3. In Additive VQ, we first search for the nearest codeword in the union of all three codebooks (here CB¹, CB², CB³). Then, suppose we find the best codeword in CB². After that, we calculate the residual (R1) and pass it as input to the next VQ module. Since the first codeword was chosen from CB², we now search for the nearest codeword in the union of CB¹ and CB³. After calculating the residual R2, we pass it as input to the last VQ module, where we perform the search using the remaining codebook (in this case CB¹) which has not yet contributed to the quantization process. At the end, we quantize the input vector x as the summation of the M nearest codewords.
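Because the full beam search of [9] is more involved, the sketch below only illustrates the codebook-selection order with a simplified greedy choice (effectively beam width 1); it is not the actual Additive VQ algorithm used in the paper:

```python
import torch

def additive_vq_greedy(x, codebooks):
    """Greedy illustration of Additive VQ for a single vector x of shape (D,).
    At each stage, search the union of the not-yet-used codebooks, pick the
    codeword closest to the current residual, and mark its codebook as used."""
    residual = x
    x_q = torch.zeros_like(x)
    unused = list(range(len(codebooks)))
    for _ in range(len(codebooks)):
        best = None
        for m in unused:
            d = ((codebooks[m] - residual) ** 2).sum(dim=1)  # distances within CB^m
            k = d.argmin()
            if best is None or d[k] < best[0]:
                best = (d[k], m, k)
        _, m_best, k_best = best
        q = codebooks[m_best][k_best]        # best codeword over the union of unused codebooks
        x_q = x_q + q
        residual = residual - q
        unused.remove(m_best)                # each codebook contributes exactly once
    return x_q
```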
Product VQ splits the input vector x of dimension D into M independent subspaces of dimension D/M. Then it applies M independent VQ modules to those subspaces. At the end, Product VQ quantizes the input vector x as the concatenation of the M nearest codewords (one per codebook). The figure below shows Product VQ when M=3.
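A minimal sketch of this split-and-concatenate structure (hypothetical shapes, not the paper's implementation):

```python
import torch

def product_vq(x, codebooks):
    """Product VQ sketch: split x (N, D) into M subspaces of size D/M,
    quantize each subspace with its own codebook of shape (K, D/M), and concatenate."""
    M = len(codebooks)
    chunks = x.chunk(M, dim=1)                            # M subspaces of dimension D/M
    quantized = []
    for sub, cb in zip(chunks, codebooks):
        idx = (torch.cdist(sub, cb) ** 2).argmin(1)       # independent VQ per subspace
        quantized.append(cb[idx])
    return torch.cat(quantized, dim=1)                    # concatenation of M codewords

x = torch.randn(16, 12)                                   # D = 12
codebooks = [torch.randn(32, 4) for _ in range(3)]        # M = 3, D/M = 4
x_q = product_vq(x, codebooks)
```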
Vector quantization (VQ) training means optimizing the codebook(s) such that they model the data distribution in a way that the quantization error (for example the mean squared error) between data points and codebook elements is minimized. To optimize the codebooks for the three above-mentioned variants of VQ (Residual VQ, Additive VQ, and Product VQ), there are different approaches, which we cover in the following.
1. K-means Algorithm (traditional approach):
Based on the literature review, in most of the papers the codebooks for these three VQ methods have been optimized with the k-means algorithm, as illustrated in the sketch below.
2. Stochastic Optimization (machine learning algorithms):
Machine learning optimization algorithms are based on gradient computation. Therefore, it is not possible to optimize vector quantization methods directly with machine learning optimization, because the argmin in the vector quantization function (first equation above) is not differentiable. In other words, we cannot pass gradients through the vector quantization function in backpropagation. Here we mention two solutions to this problem.
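For illustration, a codebook for one VQ module could be obtained with k-means roughly as follows (a sketch using scikit-learn with made-up data, not the exact setup of the cited papers):

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.random.randn(10000, 16)          # hypothetical training vectors
K = 64                                     # codebook size
# The cluster centers act as the codewords of the learned codebook, shape (K, 16)
codebook = KMeans(n_clusters=K, n_init=10).fit(data).cluster_centers_
```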
Machine studying optimization algorithms are primarily based on gradient calculation. Due to this fact, it’s unattainable to optimize vector quantization strategies utilizing machine studying optimization, because the argmin perform in vector quantization perform (first equation above) shouldn’t be differentiable. In different phrases, we can’t cross the gradients over vector quantization perform in backpropagation. Right here we now have talked about two options to unravel this drawback.
2.1. Straight Via Estimator (STE)
STE [10] solves the problem by simply copying the gradients unchanged over the VQ module in backpropagation. Hence, it does not take the effect of vector quantization into account, which leads to a mismatch between the gradients and the true behavior of the VQ function.
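In code, STE is commonly written with the following PyTorch idiom (a minimal sketch, not necessarily how any particular paper implements it): the forward pass uses the quantized value, while the backward pass treats quantization as the identity.

```python
import torch

def vq_ste(x, codebook):
    """Straight Through Estimator sketch: forward pass outputs the nearest codeword,
    backward pass copies the output gradient straight to x, ignoring the argmin."""
    idx = (torch.cdist(x, codebook) ** 2).argmin(1)
    x_q = codebook[idx]
    # Forward value is x_q; the detach() makes the quantization step invisible to autograd
    return x + (x_q - x).detach()
```

Note that in this form no gradient reaches the codebook itself, which is why STE-based training typically adds extra codebook and commitment loss terms to the overall objective.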
2.2. Noise Substitution in Vector Quantization (NSVQ):
The NSVQ technique [11] is our recently proposed method, in which the vector quantization error is simulated by adding noise to the input vector, such that the simulated noise follows the shape of the original VQ error distribution (you can read briefly about NSVQ in this post); a simplified sketch follows below.
The NSVQ technique [11] has some advantages over the STE method [10], which are listed in the following. 1) NSVQ yields more accurate gradients for the VQ function. 2) NSVQ achieves faster convergence for VQ training (codebook optimization). 3) NSVQ does not need any additional hyperparameter tuning for VQ training (it does not require an extra loss term for VQ training to be added to the global optimization loss function).
In our paper, we used our recently proposed NSVQ technique [11] to optimize the three above-mentioned variants of VQ with machine learning optimization. To evaluate the performance of these three VQ methods and study the trade-offs between their accuracy, bitrate, and complexity, we conducted four different experimental scenarios. We explain each of these scenarios in the following.
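The sketch below captures only the core noise-substitution idea described above (replace the true quantization error with a random vector scaled to the same norm); the exact formulation and implementation details are in the paper [11] and may differ from this simplification.

```python
import torch

def nsvq(x, codebook):
    """NSVQ-style sketch: during training, substitute the true VQ error with a
    noise vector of the same norm, so that gradients can pass through to both
    the input and the codebook."""
    idx = (torch.cdist(x, codebook) ** 2).argmin(1)
    x_q = codebook[idx]
    error_norm = (x - x_q).norm(dim=1, keepdim=True)   # norm of the true VQ error
    noise = torch.randn_like(x)
    noise = noise / noise.norm(dim=1, keepdim=True)    # unit-norm random direction
    # The output is differentiable w.r.t. x and, through error_norm, the codebook
    return x + error_norm * noise
```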
1. Approximate Nearest Neighbor (ANN) Search
In this experiment, we modeled the distribution of the SIFT1M dataset [12] (128-D image descriptors) by training the three VQ methods on its learning set. The SIFT1M image descriptor dataset [12] consists of 10⁶ base vectors, 10⁵ learning vectors, and 10⁴ query vectors for testing purposes. The ground truth contains the set of exact nearest neighbors from the base vectors to the query vectors. In the ANN search, we first compress the base vectors using the corresponding learned codebooks trained on the learning set. Then, for each query vector, we find the approximate nearest neighbors among the compressed base vectors by performing an exhaustive search. To assess the quality of data compression, we calculate the recall metric at different values of the parameter T, which shows whether the exact nearest neighbor (from the ground truth) exists within the first T computed nearest neighbors. The figure below illustrates the comparison of the three variants of VQ optimized by our proposed NSVQ technique against the baseline methods under the recall metric. In general, all three machine-learning-optimized VQ methods achieve recall values comparable to (and, in the case of RVQ, even slightly better than) the baselines.
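For reference, the recall@T metric can be computed roughly as in the following sketch (hypothetical tensor names; `ground_truth` is assumed to hold, for each query, the index of its exact nearest neighbor among the base vectors):

```python
import torch

def recall_at_T(query_vecs, compressed_base, ground_truth, T):
    """Fraction of queries whose true nearest neighbor appears among the
    T nearest compressed base vectors (exhaustive search)."""
    dists = torch.cdist(query_vecs, compressed_base)   # (num_queries, num_base)
    topT = dists.topk(T, largest=False).indices        # indices of the T nearest base vectors
    hits = (topT == ground_truth.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()
```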
2. Image Compression using VQ-VAE
In this experiment, we trained a vector quantized variational autoencoder (VQ-VAE) on the training set of the CIFAR10 dataset to compress it. To apply vector quantization in the bottleneck of the VQ-VAE, we used each of the three VQ methods. After training, we reconstructed the test images of CIFAR10 using the trained encoder, decoder, and learned codebooks for each VQ method. To evaluate the quality of the reconstructed images, we employ the Peak Signal to Noise Ratio (Peak SNR) metric. In addition, we computed the complexity of each VQ method using the Weighted Million Operations Per Second (WMOPS) metric, which is defined in the ITU-T standard. The following figure shows the results of this experiment.
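For reference, Peak SNR can be computed as in the sketch below (assuming image tensors with pixel values normalized to [0, max_val]; this is not the exact evaluation code of the paper):

```python
import torch

def psnr(original, reconstructed, max_val=1.0):
    """Peak Signal to Noise Ratio for a batch of images, in dB."""
    mse = ((original - reconstructed) ** 2).mean()
    return 10 * torch.log10(max_val ** 2 / mse)
```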
According to the complexity figure (on the right), we find that for the same use of computational resources (left vertical red line) and a higher bitrate, Product VQ performs better than Residual VQ. In addition, for the same use of computational resources (right vertical red line) and a higher bitrate, Residual VQ performs better than Additive VQ. Therefore, depending on how much computational resource is available, we can conclude which VQ method is the best one to use.
3. Speech Coding
In this experiment, we model the spectral envelope of speech signals with the three VQ methods using the speech codec presented in [13]. To evaluate the quality of the decoded speech signals, we used the perceptual evaluation of speech quality (PESQ) and the perceptually weighted signal-to-noise ratio (pSNR) as objective metrics. The following figure shows the performance of all three VQ methods under the PESQ and pSNR criteria. According to the results, we observe that Additive VQ attains a higher mean and lower variance than both Residual VQ and Product VQ in both metrics.
4. Toy Examples
In this experiment, we intend to compare the performance of the three VQ methods with respect to the correlation in the data. Hence, we prepared a correlated and an uncorrelated dataset of dimension 64. Then, we compressed these datasets using the three VQ methods. To evaluate the performance, we computed the mean squared error (MSE) between each dataset and its quantized version. The following figure shows the results of this experiment.
For the correlated dataset, since Residual VQ and Additive VQ take the correlation among all data dimensions into account, they have a much lower quantization error than Product VQ, as expected. On the other hand, Product VQ performs better than Additive VQ and Residual VQ on uncorrelated data, since there is no correlation among the data dimensions, which is exactly what Product VQ presumes.
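A toy setup in this spirit could be generated as in the sketch below (hypothetical data sizes and mixing scheme; the actual datasets in the paper may have been generated differently):

```python
import torch

N, D = 10_000, 64
# Uncorrelated data: independent Gaussian dimensions
uncorrelated = torch.randn(N, D)
# Correlated data: mix independent Gaussians with a random matrix so that
# the dimensions become linearly dependent on each other
mix = torch.randn(D, D)
correlated = torch.randn(N, D) @ mix

def quantization_mse(x, x_q):
    """Mean squared error between a dataset and its quantized version."""
    return ((x - x_q) ** 2).mean().item()
```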
Using variants of Vector Quantization (VQ) such as Residual VQ, Additive VQ, and Product VQ allows applying VQ to high bitrates and high dimensional data. So far, these VQ methods have been optimized with classical expectation maximization and the k-means algorithm. In this paper, we optimize these VQ methods with machine learning optimization using our recently proposed Noise Substitution in Vector Quantization (NSVQ) [11] technique. In addition, NSVQ allows end-to-end optimization of VQ methods in neural networks. We also study the trade-offs between the bitrate, accuracy, and complexity of these three VQ methods. Hence, our open-source implementation [14] helps to make the best choice of VQ method for a particular use case.
We provide the PyTorch implementation of these VQ methods on the following webpage.