This paper proposes a CNN framework for object instance segmentation. The method, called Mask R-CNN, extends Faster R-CNN by adding an FCN branch that predicts a segmentation mask on each RoI.
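Since the summary stays at a high level, here is a minimal sketch of running a pretrained Mask R-CNN via torchvision (assuming a recent torchvision release; the model constructor and output format below are torchvision's packaging, not something specified in the paper):

```python
import torch
import torchvision

# Pretrained Mask R-CNN: a Faster R-CNN detector plus an FCN mask branch per RoI.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# One RGB image as a CHW float tensor in [0, 1]; real code would load an image here.
images = [torch.rand(3, 480, 640)]
with torch.no_grad():
    outputs = model(images)

# Each output dict holds 'boxes', 'labels', 'scores', and per-instance 'masks'.
print(outputs[0]["masks"].shape)  # (num_instances, 1, H, W)
```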
An autoencoder is a simple 3-layer neural network whose output layer reconstructs the input layer: the output has the same dimensionality as the input, and the two are connected through a hidden layer. Usually the number of hidden units is much smaller than the number of visible ones. As a result, when you pass data through such a network, it first encodes the input vector into a smaller representation and then decodes it back. The training task is to minimize the reconstruction error, i.e., to find the most efficient compact representation of the input data. Autoencoders (AEs) are similar to PCA and can be used for dimensionality reduction. The hidden-layer encoded features can be used as input features for a downstream classifier or for another AE (stacking). An RBM is a generative artificial neural network that can learn a probability distribution over a set of inputs. RBMs are a variant of Boltzmann machines, with the restriction that...
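As a concrete illustration of the autoencoder just described, here is a minimal 3-layer version in PyTorch (layer sizes are hypothetical; the sigmoid nonlinearity and MSE reconstruction loss are one common choice):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_visible=784, n_hidden=64):
        super().__init__()
        # Encoder compresses the input; decoder reconstructs it.
        self.encoder = nn.Sequential(nn.Linear(n_visible, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_visible)

    def forward(self, x):
        h = self.encoder(x)        # compact hidden representation
        return self.decoder(h), h  # reconstruction and code

model = Autoencoder()
x = torch.rand(32, 784)                  # a dummy batch of inputs
x_hat, code = model(x)
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error to minimize
```

The hidden activations `code` are exactly the compact features that can feed a downstream classifier or a further stacked AE.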
Title: You Only Look Once: Unified, Real-Time Object Detection

YOLO is a new approach to object detection. You only need to run a single CNN on the image once and make a prediction for each grid cell, instead of using sliding windows or region proposals: YOLO divides the image into an S×S grid and treats detection as regression on each cell.

Contribution: Traditional models run a classifier on each proposal or sliding window, but YOLO divides the image into an S×S grid and trains a single regression model. The ground-truth target vector has dimension S×S×(B·5+C). Each grid cell predicts B bounding boxes and confidence scores for those boxes. The confidence score reflects both how confident the model is that the grid cell contains an object and how accurate it thinks the predicted box is.

Network: The network has 24 convolutional layers followed by 2 fully connected layers. There are four 2×2 max-pooling layers with stride 2. Instead of inception modules used by ...
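To make the S×S×(B·5+C) target concrete, here is a sketch that builds the ground-truth tensor for one box using the paper's PASCAL VOC settings S=7, B=2, C=20 (the slot layout, box entries first then class probabilities, is an assumption for illustration):

```python
import numpy as np

S, B, C = 7, 2, 20                    # grid size, boxes per cell, classes
target = np.zeros((S, S, B * 5 + C))  # 7 x 7 x 30 ground-truth tensor

def encode_box(target, x, y, w, h, cls):
    """Assign one ground-truth box (normalized image coords) to its grid cell."""
    col, row = int(x * S), int(y * S)            # cell containing the box center
    cx, cy = x * S - col, y * S - row            # center offset within that cell
    target[row, col, 0:5] = [cx, cy, w, h, 1.0]  # first box slot; confidence = 1
    target[row, col, B * 5 + cls] = 1.0          # one-hot class probability
    return target

target = encode_box(target, x=0.52, y=0.31, w=0.40, h=0.60, cls=11)
```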
Sparse Coding

Sparse coding minimizes the objective

$$L_{sc} = \underbrace{||WH - X||_2^2}_{\text{reconstruction term}} + \underbrace{\lambda ||H||_1}_{\text{sparsity term}}$$

where $W$ is a matrix of bases, $H$ is a matrix of codes and $X$ is a matrix of the data we wish to represent. $\lambda$ implements a trade-off between sparsity and reconstruction. Note that if we are given $H$, estimation of $W$ is easy via least squares.

Autoencoders

Autoencoders are a family of unsupervised neural networks. There are quite a lot of them, e.g. deep autoencoders or those having different regularisation tricks attached, e.g. denoising, contractive, sparse. There even exist probabilistic ones, such as generative stochastic networks or the variational autoencoder. Their most abstract form is

$$D(d(e(x; \theta_r); \theta_d), x)$$

but we will go along with a much simpler one for now:

$$L_{ae} = ||W \sigma(W^T X) - X||^2$$

where $\sigma$ is a nonlinear function such as the logistic sigmoid...
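Both objectives are easy to write down directly. A small NumPy sketch (matrix sizes are hypothetical), including the least-squares solve for $W$ given $H$ mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))  # data: 50 features x 100 samples
W = rng.standard_normal((50, 20))   # 20 bases
H = rng.standard_normal((20, 100))  # codes, one column per sample
lam = 0.1

def sparse_coding_loss(W, H, X, lam):
    recon = np.sum((W @ H - X) ** 2)    # ||WH - X||_2^2
    sparsity = lam * np.sum(np.abs(H))  # lambda * ||H||_1
    return recon + sparsity

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tied_autoencoder_loss(W, X):
    return np.sum((W @ sigmoid(W.T @ X) - X) ** 2)  # ||W sigma(W^T X) - X||^2

# Given H, the best W for the reconstruction term is a least-squares solve:
W_ls = np.linalg.lstsq(H.T, X.T, rcond=None)[0].T

print(sparse_coding_loss(W, H, X, lam), tied_autoencoder_loss(W, X))
```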