This paper proposes a CNN framework for object instance segmentation. The method, called Mask R-CNN, extends Faster R-CNN by adding an FCN branch that predicts a segmentation mask on each RoI.
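To make the mask branch concrete: Mask R-CNN predicts one binary mask per class for each RoI and applies a per-pixel sigmoid with binary cross-entropy, evaluated only on the channel of the RoI's ground-truth class. The sketch below is a minimal NumPy illustration of that loss, not the paper's implementation; the function name, shapes, and toy 4x4 masks are all assumptions for demonstration.

```python
import numpy as np

def mask_loss(pred_logits, gt_mask, gt_class):
    """Per-RoI mask loss in the style of Mask R-CNN: per-pixel sigmoid +
    binary cross-entropy, applied only to the ground-truth class channel.

    pred_logits: (K, m, m) raw logits, one m x m mask per class.
    gt_mask:     (m, m) binary ground-truth mask for this RoI.
    gt_class:    index of the RoI's ground-truth class.
    """
    logits = pred_logits[gt_class]        # other class channels do not contribute
    p = 1.0 / (1.0 + np.exp(-logits))     # per-pixel sigmoid
    eps = 1e-12
    bce = -(gt_mask * np.log(p + eps) + (1 - gt_mask) * np.log(1 - p + eps))
    return bce.mean()

# toy example: 3 classes, 4x4 masks (illustrative, not the paper's 28x28)
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 4))
gt = (rng.random((4, 4)) > 0.5).astype(float)
print(mask_loss(logits, gt, gt_class=1))
```

Because each class gets its own mask, classes do not compete per pixel, which the paper identifies as a key difference from per-pixel softmax segmentation.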
An autoencoder is a simple 3-layer neural network whose output units are trained to reproduce the input units. Usually the number of hidden units is much smaller than the number of visible ones, so when you pass data through such a network, it first encodes the input vector into a smaller representation and then decodes it back. The training task is to minimize the reconstruction error, i.e. to find the most efficient compact representation of the input data. Autoencoders (AEs) are similar to PCA and can be used for dimensionality reduction. The features produced by the hidden layer can be used as input features for downstream classification or for another AE. An RBM is a generative artificial neural network that can learn a probability distribution over a set of inputs. RBMs are a variant of Boltzmann machines, with the restriction that...
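The PCA analogy is easiest to see with a linear autoencoder: a tied-weight bottleneck trained by gradient descent recovers the data's principal subspace. The following is a minimal sketch under assumed toy shapes (5-D data on a 2-D subspace, a 2-unit bottleneck); the learning rate and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: 200 points in 5-D that actually live on a 2-D subspace
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 5))

# linear autoencoder with a 2-unit bottleneck and tied weights:
# encode h = x W, decode x_hat = h W^T
W = rng.normal(scale=0.1, size=(5, 2))
lr = 0.005
for _ in range(2000):
    H = X @ W          # encode: compress 5-D input to 2-D code
    X_hat = H @ W.T    # decode: reconstruct from the code
    E = X_hat - X
    # gradient of ||X W W^T - X||^2 w.r.t. W (tied weights)
    grad = 2 * (X.T @ E @ W + E.T @ X @ W)
    W -= lr * grad / len(X)

err = np.mean((X @ W @ W.T - X) ** 2)
```

Since the data is exactly 2-dimensional here, the reconstruction error should shrink toward zero; with nonlinear activations and regularisation the same scheme goes beyond what PCA can represent.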
Sparse Coding

Sparse coding minimizes the objective

L_sc = ||WH − X||_2^2 (reconstruction term) + λ||H||_1 (sparsity term)

where W is a matrix of bases, H is a matrix of codes, and X is a matrix of the data we wish to represent. λ implements a trade-off between sparsity and reconstruction. Note that if we are given H, estimating W is easy via least squares.

Autoencoders

Autoencoders are a family of unsupervised neural networks. There are quite a lot of them, e.g. deep autoencoders or those with different regularisation tricks attached, e.g. denoising, contractive, or sparse autoencoders. There even exist probabilistic ones, such as generative stochastic networks or the variational autoencoder. Their most abstract form is

D(d(e(x; θ_r); θ_d), x)

but we will go along with a much simpler one for now:

L_ae = ||W σ(W^T X) − X||^2

where σ is a nonlinear function such as the logistic sigmoid...
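The "given H, estimating W is easy via least squares" remark can be checked directly: since WH ≈ X is linear in W, each row of W is an ordinary least-squares problem. A small NumPy sketch with assumed toy shapes (8-D data, 3 bases, 50 samples):

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic data X = W_true H with sparse codes H (illustrative shapes)
W_true = rng.normal(size=(8, 3))                             # 3 bases in 8-D
H = rng.normal(size=(3, 50)) * (rng.random((3, 50)) < 0.3)   # ~30% nonzero codes
X = W_true @ H

# the objective L_sc = ||W H - X||_2^2 + lam * ||H||_1
lam = 0.1
def L_sc(W, H, X, lam):
    return np.sum((W @ H - X) ** 2) + lam * np.sum(np.abs(H))

# given H, recover W by least squares: solve H^T W^T = X^T
W_est, *_ = np.linalg.lstsq(H.T, X.T, rcond=None)
W_est = W_est.T
```

In full sparse coding one alternates this W-step with an H-step (e.g. ISTA/LASSO), since only the joint problem is hard.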
SSD is an extension, or updated version, of YOLO. Compared with YOLO:

1. SSD uses multi-scale feature maps for detection, while YOLO uses only a single scale.
2. Before the decision layer, YOLO uses a fully connected layer for detection, while SSD uses convolutional filters, which improves final detection performance.
3. Default boxes and aspect ratios: a set of default bounding boxes is associated with each feature-map cell, for multiple feature maps at the top of the network.

R-CNN: Selective Search region proposals; SVM classification. Fast R-CNN: Selective Search region proposals; shared computation via SPP. Faster R-CNN: Region Proposal Network.

OverFeat vs SSD: if SSD used only one default box per location from the topmost feature map, it would have an architecture similar to OverFeat. YOLO vs SSD: if SSD used the whole topmost feature map and added a fully-connected layer for predictions instead of convolutional predictors, it...
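Point 3 above can be sketched in a few lines: for one feature map, SSD tiles a default box per aspect ratio at the centre of every cell. This is a simplified illustration with assumed names and parameters (the paper additionally adds an extra box of scale sqrt(s_k * s_{k+1}) for aspect ratio 1, omitted here).

```python
import numpy as np

def default_boxes(fmap_size, scale, aspect_ratios):
    """SSD-style default boxes for one square feature map: one box per
    aspect ratio, centred on each cell. Returns (cx, cy, w, h) rows in
    [0, 1] image coordinates. Simplified sketch, not the paper's full scheme.
    """
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size   # box centre = cell centre
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * np.sqrt(ar)  # widen/narrow while keeping
                h = scale / np.sqrt(ar)  # the box area at scale**2
                boxes.append((cx, cy, w, h))
    return np.array(boxes)

# e.g. a 4x4 feature map with 3 aspect ratios -> 4*4*3 = 48 default boxes
b = default_boxes(4, scale=0.4, aspect_ratios=[1.0, 2.0, 0.5])
print(b.shape)  # (48, 4)
```

Repeating this over several feature maps of decreasing resolution, each with its own scale, is what gives SSD its multi-scale coverage.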