#Deep Maxout Network | Explore Tumblr posts and blogs

swanirbhar · 2 years ago

The first neural language model came from Yoshua Bengio, one of the "Godfathers of Deep Learning"! He is widely regarded as one of the most influential people in natural language processing and unsupervised learning. Here are his main contributions:

- 1994 - Identified the problem of vanishing and exploding gradients in RNNs: https://lnkd.in/gNNWDnGG

- 1995 - With LeCun, applied convolutional neural networks to speech and time-series learning tasks: https://lnkd.in/gSJT7rd4

- 1998 - With LeCun, proposed a new CNN architecture (LeNet, Graph Transformer Networks) for document recognition: https://lnkd.in/gFyijKin

- 2003 - Proposed the first neural language model: https://lnkd.in/gJgngk_K

- 2006 - Proposed a new greedy training method for Deep Networks: https://lnkd.in/g4TfHwKc

- 2008 - Invented the Denoising Autoencoder model: https://lnkd.in/gUvwXGNN

- 2009 - Developed curriculum learning, a training strategy that gradually exposes models to more difficult examples: https://lnkd.in/gyBirMxN

- 2010 - Proposed a new unsupervised learning technique, Stacked Denoising Autoencoder architecture: https://lnkd.in/gyH5-JTs

- 2010 - Proposed with Glorot the weight initialization technique used in most modern neural nets: https://lnkd.in/g4nqvxzh

- 2011 - Showed that the ReLU (rectifier) activation function works well for training deep networks: https://lnkd.in/gwJsxHYQ

- 2011 - Proposed Bayesian optimization for hyperparameter optimization: https://lnkd.in/geSTQZWU

- 2012 - Proposed the random search technique for hyperparameter optimization: https://lnkd.in/gtx_ABwi

- 2013 - Proposed a unifying perspective on representation learning: https://lnkd.in/gVFU7iUh

- 2013 - Proposed gradient norm clipping to deal with the exploding gradient problem in RNNs: https://lnkd.in/gQWkKMYq

- 2013 - Proposed Maxout networks, built around a new activation function (maxout): https://lnkd.in/gWdB72dH

- 2014 - Proposed the RNN encoder-decoder architecture: https://lnkd.in/gnGFsdJe

- 2014 - With Ian Goodfellow, proposed Generative Adversarial Networks, a novel class of deep learning models designed to generate synthetic data that resembles real data: https://arxiv.org/pdf/1406.2661.pdf

- 2014 - Proposed the gated recursive convolutional network for machine translation: https://lnkd.in/g-rRQ6km

- 2014 - With Bahdanau, introduced the attention mechanism: https://lnkd.in/g2C3sHjf

- 2014 - Proposed FitNets, a new distillation method to compress Deep NN: https://lnkd.in/gsQy8f8m

- 2014 - Showed the efficacy of transfer learning: https://lnkd.in/g4vfNNUx

- 2015 - Proposed BinaryConnect, a new method for efficient training with binary weights: https://lnkd.in/g-Pg63RT

- 2015 - Proposed a visual Attention mechanism for image caption generation: https://lnkd.in/gbmGDwA2

- 2017 - Proposed a novel architecture with self-attention layers for graph-structured data: https://lnkd.in/gg8tQBtP

- 2017 - Proposed a new convolutional architecture for brain tumor segmentation: https://lnkd.in/gxFuqr53

#machinelearning #datascience #artificialintelligence

shilkaren · 5 years ago

Activation Functions In Neural Network

Activation functions are a very important component of neural networks in deep learning. They help determine the output of a deep learning model, its accuracy, and the computational efficiency of training. They also strongly affect whether a network converges and how fast; in some cases, a poorly chosen activation function can prevent convergence altogether.

So, let’s look at what activation functions are, the main types, and their importance and limitations in detail.

What is the activation function?

Activation functions determine the output of a neural network. An activation function is attached to each neuron and decides whether that neuron should be activated, based on whether the neuron’s input is relevant to the model’s prediction.

Many activation functions also normalize the output of each neuron to a bounded range, such as between 0 and 1 or between -1 and 1.

Because a neural network is sometimes trained on millions of data points, the activation function must also be cheap to compute, so that it keeps computation time down.

Let’s understand how it works.

In a neural network, inputs are fed into the neurons of the input layer. Each connection has a weight; multiplying each input by its weight, summing the results, and adding a bias gives the neuron's output, which is passed on to the next layer, and the process continues. The output can be represented as: Y = ∑ (weights * inputs) + bias

Note: Y can range from -infinity to +infinity. To bring the output into the range we want for prediction, we pass this value through an activation function.

The activation function is a kind of mathematical “gate” between the input feeding the current neuron and its output going to the next layer. It can be as simple as a step function that turns the neuron output on or off, depending on a rule or threshold. The final output can be represented as: Y = activation function(∑ (weights * inputs) + bias)
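The two formulas above can be sketched in a few lines of Python. This is only an illustration: the input values, weights, and bias are made-up numbers, and `neuron_output` is a hypothetical helper name, not part of any library.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias, passed through the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

inputs = [0.5, -1.0, 2.0]     # hypothetical example inputs
weights = [0.4, 0.3, -0.2]    # hypothetical weights
bias = 0.1                    # hypothetical bias

raw = neuron_output(inputs, weights, bias, lambda z: z)     # no activation
activated = neuron_output(inputs, weights, bias, sigmoid)   # with sigmoid
print(raw, activated)
```

The raw weighted sum can be any real number, while the activated value always falls inside the range of the chosen activation function.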

Why do we need activation functions?

The core idea behind applying an activation function is to bring non-linearity into our deep learning models. A non-linear function is one whose output is not simply proportional to its input, so its graph is curved rather than a straight line.

Applying an activation function gives the model the ability to learn more complex data and become more powerful. Non-linear activations let the network represent arbitrary, complex non-linear mappings between inputs and outputs.

Another important property of an activation function is that it should be differentiable (at least almost everywhere). During backpropagation we move backward through the network, computing gradients of the error (loss) with respect to the weights, and then use gradient descent or another optimization technique to update the weights and reduce the error. This requires the derivative of the activation function.

Types of Activation Functions used in Deep Learning

Below are some of the different types of activation functions used in deep learning.

1. Binary step
2. Linear
3. Sigmoid
4. Softmax
5. Tanh
6. ReLU
7. LeakyReLU
8. PReLU
9. ELU (Exponential Linear Units)
10. Swish
11. Maxout
12. Softplus

Note: In this article, I will give a brief introduction to the most commonly used activation functions, and later I will try to write a separate article on each type of activation function.

The most commonly used linear and nonlinear activation functions are as follows:

1. Binary step
2. Linear
3. Sigmoid
4. Softmax
5. Tanh
6. ReLU
7. LeakyReLU

The binary step function is one of the most basic activation functions, and usually the first that comes to mind when we want to bound the output. It is a threshold-based activation function: we fix a threshold value that decides whether the neuron is activated or deactivated.

Mathematically, the binary step activation function can be represented as:

f(x) = 1 if x ≥ 0, else 0

Here the threshold value is 0. The binary step function is very simple and is useful for binary classification problems.

One problem with the binary step function is that it does not allow multi-valued outputs; for example, it cannot classify inputs into one of several categories.
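As a minimal sketch, the binary step function can be written as follows. The `threshold` parameter and the choice to treat an input exactly at the threshold as "active" are assumptions made for this example.

```python
def binary_step(x, threshold=0.0):
    # Returns 1 when x reaches the threshold, otherwise 0.
    # Treating x == threshold as active is a convention chosen for this sketch.
    return 1 if x >= threshold else 0

outputs = [binary_step(x) for x in (-2.0, -0.1, 0.0, 0.3, 5.0)]
print(outputs)  # [0, 0, 1, 1, 1]
```

Whatever the input, the output is only ever 0 or 1, which is why a single step unit cannot express more than two categories.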

The linear activation function is a simple straight-line function whose output is directly proportional to the weighted sum of the neuron's inputs.

A linear activation function has the form: Y = mZ

It takes the weighted sum of the inputs, Z, and produces an output proportional to it.

A linear activation function is better than a step function in that it allows a range of output values instead of only yes or no.

Some of the major problems with the linear activation function are as follows:

1. Backpropagation (gradient descent) is of little help in training the model, because the derivative of this function is a constant and has no relationship with the input.

2. With this activation function, all layers of the neural network collapse into one, because a composition of linear functions is itself linear.

So a neural network with only linear activation functions is simply a linear regression model. It has limited power to handle complex problems with varying input data.
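The "collapse" of linear layers can be checked with a small worked example. The sketch below uses scalar layers with made-up parameters (`w1`, `b1`, `w2`, `b2` are purely illustrative) and shows that two stacked linear layers compute exactly the same mapping as one merged layer.

```python
def linear_layer(x, weight, bias):
    # One "layer" with a linear (identity) activation: output = weight * x + bias.
    return weight * x + bias

w1, b1 = 2.0, 1.0     # hypothetical first-layer parameters
w2, b2 = -3.0, 0.5    # hypothetical second-layer parameters

def two_layers(x):
    return linear_layer(linear_layer(x, w1, b1), w2, b2)

# w2 * (w1 * x + b1) + b2 == (w2 * w1) * x + (w2 * b1 + b2)
w_merged = w2 * w1
b_merged = w2 * b1 + b2

def one_layer(x):
    return linear_layer(x, w_merged, b_merged)

for x in (-1.0, 0.0, 2.5):
    print(x, two_layers(x), one_layer(x))
```

However many linear layers are stacked, the same algebra applies, so depth adds no expressive power without a non-linear activation in between.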

Now, let’s look at non-linear activation functions.

Modern neural network models use non-linear activation functions. These allow the model to create complex mappings between the network's inputs and outputs, which is essential for learning and modeling complex data such as images, video, audio, and data sets that are non-linear or have very high dimensionality.

Non-linear functions address the problems of a linear activation function:

1. They allow backpropagation, because their derivative depends on the input.

2. They allow “stacking” multiple layers of neurons to create a deep neural network. Multiple hidden layers are needed to learn complex data sets with high accuracy.

The sigmoid activation function is one of the most widely used activation functions. Its output lies between 0 and 1, so it can be read as a probability when making a decision. When plotted, it gives an ‘S’-shaped curve.

If we have to make a decision or predict an output, we often use this activation function because its output is confined to a small range, which suits probability-like predictions.

The equation for the sigmoid function can be given as:

f(x) = 1 / (1 + e^(-x))

The most common issue with the sigmoid function is the vanishing gradient problem. Because it squashes large inputs into the range 0 to 1, its derivative becomes very small for inputs far from zero, and learning slows down or stalls.

Another problem with this activation function is that it is computationally expensive, since it requires evaluating an exponential.

To address these problems, another activation function such as ReLU is used, which does not suffer from small derivatives for positive inputs.
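To make the vanishing gradient concrete, the sketch below evaluates the sigmoid and its derivative at a few points. It relies on the standard identity that the derivative of sigmoid(x) is sigmoid(x) * (1 - sigmoid(x)); the sample inputs and the 10-layer figure are arbitrary choices, and the effect of the weights is ignored for simplicity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid; it peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")

# Backpropagation multiplies one such factor per layer, so even in the best
# case (0.25 per layer) the signal shrinks geometrically with depth.
best_case_after_10_layers = 0.25 ** 10
print(best_case_after_10_layers)
```

Even at its steepest point the sigmoid passes back only a quarter of the gradient, and far from zero it passes back almost nothing.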

ReLU, or Rectified Linear Unit, is one of the most widely used activation functions nowadays. Its output ranges from 0 to infinity, and it is mostly applied in the hidden layers of a neural network. All negative values are converted to zero: it outputs x if x is positive and 0 otherwise.

Equation of this function is: Y(x) = max(0,x)

The Dying ReLU problem: when the input is negative, the gradient of the function is zero, so no error signal flows back through the neuron and its weights stop updating. A neuron stuck in this regime cannot learn. This is known as the Dying ReLU problem.

To avoid this problem we can use the Leaky ReLU activation function instead of ReLU. Leaky ReLU extends the output range to negative values, which can improve the performance of the model.

Leaky ReLU was introduced to solve the ‘Dying ReLU’ problem discussed above. Where ReLU turns every negative input into zero, Leaky ReLU instead maps it to a small value near zero: f(x) = x for x > 0 and f(x) = αx otherwise, with a small slope α such as 0.01. The gradient therefore never vanishes entirely, which can help model performance.

Figure. Leaky ReLU Activation Function
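The difference between ReLU and Leaky ReLU is easiest to see in their gradients. The following sketch assumes a negative-side slope `alpha = 0.01`, which is a common but arbitrary choice, and uses 0 as the gradient at exactly x = 0 by convention.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Zero gradient for every non-positive input: the source of "dying" units.
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    # alpha is the small slope applied to negative inputs (assumed value).
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    # The gradient is never exactly zero, so updates keep flowing.
    return 1.0 if x > 0 else alpha

for x in (-3.0, -0.5, 2.0):
    print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```

For negative inputs ReLU reports a gradient of 0, while Leaky ReLU still reports `alpha`, so the weights feeding that neuron can continue to change.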

In most cases, the tanh activation function works better than the sigmoid function. Tanh is the hyperbolic tangent function. It is essentially a rescaled and shifted version of the sigmoid function, so each can be derived from the other. Its values lie between -1 and 1.

The equation of the tanh activation function is given as:

f(x) = tanh(x) = 2 / (1 + e^(-2x)) - 1

tanh(x) = 2 * sigmoid(2x) - 1
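The relationship between tanh and sigmoid can be verified numerically. This short check compares Python's built-in `math.tanh` with the sigmoid-based expression; `tanh_from_sigmoid` is a name made up for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_sigmoid(x):
    # tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, math.tanh(x), tanh_from_sigmoid(x))
```

The two columns agree up to floating-point rounding, which confirms that tanh is a sigmoid stretched to the range (-1, 1) and centered on zero.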

The softmax activation function is a generalization of the sigmoid function to multiple classes, and is useful for classification problems with more than two classes.

It maps the score for each class to a value between 0 and 1 by exponentiating it and dividing by the sum over all classes, so the outputs add up to 1.

Softmax is typically used in the output layer of a classifier, where we want the probability of each class for a given input.

Note: For binary classification we can use either sigmoid or softmax; both work equally well. For multi-class classification we generally use softmax together with the cross-entropy loss.

The equation of the softmax activation function is:

softmax(x_i) = e^(x_i) / ∑_j e^(x_j)
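A minimal softmax sketch in plain Python is shown below. Subtracting the largest score before exponentiating is a standard numerical-stability trick that is not mentioned in the article; it does not change the result. The class scores are made-up values.

```python
import math

def softmax(scores):
    # Map a list of real-valued scores to probabilities that sum to 1.
    # Shifting by the maximum avoids overflow in exp() for large scores.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # hypothetical class scores
print(probs, sum(probs))
```

With only two classes and scores `[x, 0]`, the first softmax output equals sigmoid(x), which is why either function can be used for binary classification.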

You should now be familiar with the most commonly used activation functions. Let me summarize them in one place as a cheat sheet that you can keep handy whenever you need a reference.

Insideaiml is one of the best platforms where you can learn Python, Data Science, Machine Learning, Artificial Intelligence & showcase your knowledge to the outside world.

digitalanurag-blog · 4 years ago

Activation of ReLU Function - Machine Learning

ReLU stands for Rectified Linear Unit. The ReLU activation function is among the most used activation functions in deep learning models. It is used in almost all convolutional neural networks and deep learning models. The ReLU function takes the maximum of 0 and its input.

The ReLU (Rectified Linear Unit) function is an activation function that is now more popular than the sigmoid and tanh functions.

How do we write a ReLU function and its derivative in Python? Writing a ReLU function and its derivative is quite simple: we just have to define a function for each formula.

The ReLU function is given by f(z) = max(0, z). Its derivative returns 1 if z > 0 and 0 otherwise.
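The original code listing did not survive in this post, so the following is a reconstruction sketch rather than the author's exact code. The names `relu` and `relu_prime` are assumptions.

```python
def relu(z):
    # ReLU keeps positive values and maps everything else to 0.
    return max(0.0, z)

def relu_prime(z):
    # Derivative: 1 for z > 0, otherwise 0.
    # The value at z == 0 is a sub-gradient choice (0 here).
    return 1.0 if z > 0 else 0.0

values = [-2.0, 0.0, 3.5]
print([relu(z) for z in values])        # [0.0, 0.0, 3.5]
print([relu_prime(z) for z in values])  # [0.0, 0.0, 1.0]
```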

Note: ReLU is not differentiable at z = 0, but we can use a sub-gradient there. Though ReLU is simple, it has been an important advance for deep learning researchers in recent years.

Benefits of the ReLU Function

When the input is positive, there is no gradient saturation problem. Computation is also very fast: ReLU involves only a linear relationship, so both the forward and backward passes are much faster than with tanh and sigmoid, which need to compute exponentials.

Disadvantages of the ReLU Function

When the input is negative, ReLU is completely inactive, which means that a neuron that only ever receives negative inputs will die. This is known as the Dead Neurons problem. It is not an issue during forward propagation: some regions are sensitive while others are not. But during backpropagation, a negative input gives a gradient of exactly zero, the same problem that the sigmoid and tanh functions have when they saturate. The output of ReLU is either 0 or a positive number, which means that ReLU is not zero-centered. ReLU is typically used only in the hidden layers of a neural network model. To overcome the Dead Neurons problem, a modification named Leaky ReLU was introduced. It adds a small slope for negative inputs to keep the updates alive. Another variant that generalizes both ReLU and Leaky ReLU is the Maxout function, which we will discuss in detail in separate articles.
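As a rough sketch of how Maxout relates to the other two functions: a maxout unit outputs the largest of k affine functions of its input. With suitable choices of the affine pieces it reproduces ReLU or Leaky ReLU. All numbers below (`x`, `w`, `b`, `alpha`) are illustrative assumptions, and `maxout` is a hypothetical helper, not a library call.

```python
def maxout(x, weights, biases):
    # Maxout unit: the largest of k affine functions w_j . x + b_j.
    pieces = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return max(pieces)

x = [1.5, -2.0]            # hypothetical input vector
w, b = [0.5, 1.0], 0.25    # hypothetical affine piece; here w . x + b = -1.0

# Pieces (w, b) and (0, 0): max(w.x + b, 0), which is ReLU of the pre-activation.
relu_like = maxout(x, [w, [0.0, 0.0]], [b, 0.0])

# Pieces (w, b) and (alpha*w, alpha*b): Leaky ReLU with slope alpha (0 < alpha < 1).
alpha = 0.01
leaky_like = maxout(x, [w, [alpha * v for v in w]], [b, alpha * b])

print(relu_like, leaky_like)
```

Because the pieces are learned in a real maxout network, the unit is not restricted to these two shapes; the price is that each unit needs k sets of weights instead of one.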

I hope you liked reading this article and have learned about the ReLU activation function. For more blogs and courses on data science, machine learning, artificial intelligence and emerging technologies, do visit us at InsideAIML. Thanks for reading... Happy Learning...

#machine learning #artificial intelligence #python #data science

vieclam365vn · 5 years ago

[Sharing corner] What is a convolutional neural network? Do you know it yet?

1. What you need to know about convolutional neural networks

If you have no background in information technology, algorithms, or computer software, convolutional neural networks will probably be unfamiliar to you. This article provides the basic information to help you understand the algorithm.

1.1. What is a convolutional neural network?

Not everyone can define a convolutional neural network precisely. Put simply, a convolutional neural network, usually abbreviated CNN, is one of the models of deep learning (deep learning being a set of algorithms that try to model high-level abstractions in data using multiple processing layers with complex structure). CNNs let us build intelligent systems that respond with high accuracy and that can be applied in practice. For example, Facebook, Google, and others have built face recognition into their products.

A simple example explains the concept: we use a CNN to decide what an image shows, that is, whether it is one thing or another. The difficulty is that a computer only understands numbers. An image fed to it is just a two-dimensional array of pixels, and if we compare two images pixel by pixel, any shift or change means they no longer match. What we want is to recognize the image even when it has been altered. That is where a CNN comes in.

1.2. Feature

A CNN compares images piece by piece, and each piece is called a feature. By roughly matching features between two images, a CNN finds similarities much better than by matching whole images against each other.

Each feature is like a mini image, that is, a small two-dimensional array. Features correspond to common aspects of the image: each feature matches some aspect of the picture.

1.3. What does convolutional mean?

When it sees a new image, a CNN does not know where the features will match, so it tries them at every possible position. Computing the match of a feature across the whole image produces a filter. The math used to do this is called convolution.

To calculate how well a feature matches a patch of the image, multiply each pixel in the feature by the value of the corresponding pixel in the patch, add up all the products, and divide by the total number of pixels in the feature. With pixels encoded as 1 and -1, every matching pixel contributes 1 and every mismatching pixel contributes -1.

To complete the convolution, we repeat this process, lining the feature up with every possible patch of the image. The result is a set of filtered images, one for each filter. Convolution is carried out layer by layer, and each such step is called a layer.

2. The basic layers in a CNN

2.1. Convolutional layer

This is an extremely important layer in a CNN, because it performs most of the computation. Some concepts to know here are filter map, stride, padding, and feature map.

- Filter map: whereas an ANN connects to every pixel of the input image, a CNN applies filters to regions of the image. A filter map is a three-dimensional matrix of numbers, and those numbers are the parameters.

- Stride: the number of pixels by which the filter map shifts as it moves from left to right across the input.

- Padding: zero values added around the input.

- Feature map: the result of each pass of a filter map over the input. Each pass involves a computation.

2.2. Pooling layer

When the input is too large, pooling layers are placed between convolutional layers to reduce the number of values passed on, and with it the number of parameters needed later. The two common types are max pooling and average pooling. When a CNN has several filter maps, max pooling is applied to each filter map separately and gives a different result for each.

2.3. ReLU layer

A ReLU layer is an activation function in the neural network. The role of an activation function is to simulate the rate at which a neuron fires impulses along its axon. The basic activation functions include sigmoid, tanh, ReLU, leaky ReLU, and Maxout. ReLU is currently the most widely used; in training neural networks it has notable advantages, such as faster computation.

When using ReLU, pay attention to tuning the learning rate and to monitoring dead units. The ReLU layer is applied after each filter map has been computed, applying the ReLU function to all of its values.

2.4. Fully connected layer

This layer produces the final result. Once an image has passed through the convolutional and pooling layers, the model has extracted a good deal of information about it. The fully connected layer combines those features to produce the output.

In addition, the fully connected layer converts the image data into votes for each category, and the votes are then tallied to decide the result. Even so, this process is not considered very democratic, since some votes count more than others.

3. The structure of a CNN

A CNN consists of many convolutional layers stacked on top of one another, using nonlinear activation functions such as ReLU and tanh to activate the weights. After activation, each layer produces a more abstract result for the next layers, so each layer is a representation of the output of the previous one. Through training, the CNN layers automatically learn the values of their filters.

Two properties matter in the CNN model: invariance and compositionality. The same object viewed from different angles can affect accuracy. Pooling layers provide a degree of invariance to translation, rotation, and scaling, which is why CNNs give highly accurate results.

The basic structure of a CNN has three main parts: local receptive fields, shared weights and biases, and pooling.

- Local receptive field: this helps separate and filter the image data and select the image regions with the most useful information.

- Shared weights and biases: their main effect is to minimize the number of parameters in the network. Within each convolution there are different feature maps, and each feature map helps detect a particular feature in the image.

- Pooling layer: this is nearly the last layer before the result. It simplifies the output information: after the computation and scanning through the layers, the pooling layer discards unnecessary information and then yields the result we want.

4. How should you choose parameters for a CNN?

Pay attention to the following: the number of convolution layers, filter size, pooling size, and train/test runs.

- Number of convolution layers: more layers generally improve performance, but the gain from each additional layer shrinks considerably. Often 3 to 4 layers are enough to reach the desired result.

- Filter size: typically 3x3 or 5x5.

- Pooling size: 2x2 for ordinary images; 4x4 can be used for large input images.

- Train/test: training and testing should be repeated many times to find the best parameters.

Convolutional neural networks give us very high-quality models. The algorithm is not simple by nature, but it gives quite satisfying results. Not everyone can understand it at first contact, however. We hope this article has helped readers better understand convolutional neural networks, which are widely used in intelligent processing systems such as self-driving cars and automated delivery. If you love IT and want to know more about jobs in this field, you can look them up on the website timviec365.vn, which posts the latest and most complete information on jobs and career orientation.
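The feature-matching step (section 1.3) and max pooling (section 2.2) can be sketched in plain Python. This is a toy illustration under stated assumptions: pixels are encoded as +1 and -1, the 5x5 image and 2x2 feature are made up, there is no padding, and `match_score`, `convolve`, and `max_pool` are hypothetical helper names.

```python
def match_score(patch, feature):
    # Mean of the element-wise products of two equally sized 2D grids.
    products = [p * f
                for p_row, f_row in zip(patch, feature)
                for p, f in zip(p_row, f_row)]
    return sum(products) / len(products)

def convolve(image, feature, stride=1):
    # Slide the feature over every valid position and record the match score.
    fh, fw = len(feature), len(feature[0])
    out = []
    for r in range(0, len(image) - fh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - fw + 1, stride):
            patch = [img_row[c:c + fw] for img_row in image[r:r + fh]]
            row.append(match_score(patch, feature))
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    # Keep the largest value in each non-overlapping size x size window.
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(feature_map[0]) - size + 1, size)]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 5x5 image of +1 / -1 pixels containing a diagonal stroke.
image = [
    [ 1, -1, -1, -1, -1],
    [-1,  1, -1, -1, -1],
    [-1, -1,  1, -1, -1],
    [-1, -1, -1,  1, -1],
    [-1, -1, -1, -1,  1],
]
feature = [[1, -1], [-1, 1]]      # a 2x2 diagonal feature

fmap = convolve(image, feature)   # 4x4 feature map
pooled = max_pool(fmap)           # 2x2 map after pooling
print(fmap)
print(pooled)
```

The feature map scores 1.0 wherever the feature lines up exactly with the diagonal stroke, and pooling keeps those strong responses while shrinking the map from 4x4 to 2x2.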

See the original post at: [Sharing corner] What is a convolutional neural network? Do you know it yet?

#timviec365vn

wonbindatascience · 7 years ago

Activation Function

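1. Background

Deep neural networks are modeled on the human brain. The brain consists of a vast number of neurons and the synapses that connect them. When a stimulus is passed from one neuron through a synapse to the next neuron, we say the neuron is "activated"; when it is not passed on, the neuron is not activated. The function that decides whether a neuron is activated (or how strongly) is the activation function.

Now that we know the role of the activation function, let's look at what shape an activation function needs to have in order to serve the goals of deep learning well.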

(https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0)

2. Types

2.0 step function

f(x) = { (1, if x > threshold), (0, if x <= threshold) }

: In a multi-class problem, if several output neurons take the value 1 at the same time, it becomes ambiguous which class the input belongs to.

2.0 linear function

f(x) = cx

: In a NN (neural net), after y_hat is computed in the feedforward direction, gradient descent is applied to optimize the loss function. Differentiating through this activation yields only the constant c, which has nothing to do with x, so the weights and biases in the NN lose their meaning.

(The functions below are the ones that are suitable as activation functions.)

2.1 sigmoid

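Advantage 1: y changes steeply for -2 < x < 2, so even small changes in x are reflected in y.

Disadvantage 1: In the flat regions at both ends of the curve, on the other hand, changes in x are poorly reflected and the gradient is very small, which can slow down learning.

Advantage 2: Most y values are pushed toward the two ends of the curve, close to 1 or 0, so outputs are clearly distinct, which makes sigmoid suitable for classification.

Advantage 3: The range of y is limited to between 0 and 1, so activations do not blow up as the layers get deeper.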

2.2 tanh

Pros and cons: similar to sigmoid.

Advantage 2: The gradient is stronger than that of sigmoid. Whether to use sigmoid or tanh therefore depends on the gradient strength required.
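The claim about gradient strength can be checked numerically. The sketch below uses the standard derivatives, sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)) and tanh'(x) = 1 - tanh(x)^2; the sample points are arbitrary.

```python
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

for x in (0.0, 0.5, 1.0, 3.0):
    print(f"x={x:.1f}  sigmoid'={sigmoid_grad(x):.4f}  tanh'={tanh_grad(x):.4f}")
```

Near zero the tanh gradient (up to 1.0) is four times the sigmoid gradient (up to 0.25), but tanh also saturates faster, so for large |x| its gradient drops below that of sigmoid.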

2.3 ReLU

Disadvantage 1: (the opposite of sigmoid's "Advantage 3": the output is unbounded)

Advantage 1: Unlike sigmoid and tanh, ReLU activates neurons only over part of the input range, so in a large NN the model becomes lighter, which saves cost.

Advantage 2: The function is simpler than sigmoid or tanh, so it is less computationally expensive.

Disadvantage 2: For x < 0 the gradient is 0, so the neuron cannot update in response to any error or input (the neuron simply dies). Leaky ReLU is the alternative.

(In some cases using leaky ReLU instead of ReLU improved performance, but in others it did not.)

3. Which Activation Function to Use

Try ReLU first. (It is the most widely used function.)

Also try Leaky ReLU / Maxout / ELU. (There is no guarantee that performance will improve, though.)

tanh can be used too, but it is best not to expect much.

Never use sigmoid (it is used in RNNs, but for a different reason).

http://nmhkahn.github.io/NN https://nittaku.tistory.com/267
