#mathematicians solve decades-old problem and create new shapes
goldislops · 9 months ago
Shapes That Roll
Mathematicians have created a new shape that can roll, but it’s not a circle or sphere. They devised a multi-dimensional "guitar pick" shape that rolls in ways beyond our three-dimensional comprehension. This breakthrough solves a decades-old geometry problem.
How it works: Circles and spheres have so-called "constant width," meaning that as the shape rolls, the distance between two parallel planes touching it (such as the ground below and a flat surface resting on top) never changes. The researchers figured out how to create new shapes in any dimension by examining the intersection of an infinite number of n-dimensional balls (shapes whose boundary points all lie the same distance from the center in n dimensions). In three dimensions, the result is a guitar-pick shape that can "roll" by maintaining a constant width between two parallel planes, and the construction works in any dimension.
What the experts say: “It’s really difficult to estimate volume in high dimensions, yet this whole proof is fairly simple and so elegant,” says Gil Kalai, a professor at the Einstein Institute of Mathematics. He suggests this could mark the beginning of a new era in studying constant-width shapes from a new perspective, and that in the future mathematicians might construct even smaller ones. --Max Springer, news intern
Animated diagram shows a circle moving along a wire track shaped into an equilateral triangle with sides that match the radius of the circle in length. As the circle completes its trip around the track, the area of common overlap among all its positions over time forms a Reuleaux triangle.
The infinite intersection of n-dimensional balls in two dimensions Credit: Amanda Montañez
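To make the two-dimensional case concrete, here is a minimal numerical sketch (our illustration, not the researchers' construction): build a Reuleaux triangle as the intersection of three disks and check that its width is the same in every direction.

```python
import numpy as np

# Vertices of an equilateral triangle with side length r = 1.
r = 1.0
verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

# Sample a grid and keep the points inside all three disks of radius r:
# their common intersection is the Reuleaux triangle.
xs, ys = np.meshgrid(np.linspace(-0.2, 1.2, 500), np.linspace(-0.2, 1.2, 500))
pts = np.column_stack([xs.ravel(), ys.ravel()])
inside = np.all(np.linalg.norm(pts[:, None, :] - verts[None, :, :], axis=2) <= r, axis=1)
body = pts[inside]

# Width in direction theta = spread of the body's projections onto that direction.
# For a true constant-width shape this comes out the same (~1.0) for every theta.
for theta in np.linspace(0, np.pi, 7):
    d = np.array([np.cos(theta), np.sin(theta)])
    proj = body @ d
    print(f"theta = {theta:4.2f}   width = {proj.max() - proj.min():.3f}")
```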
lthmath · 6 years ago
Recently we have been reorganizing our LThMath Book Club. The whole idea behind it is to read and discuss books with other people. We are happy that the Goodreads club grew to 272 people. Recently, people have been asking if we can use other platforms for the Book Club as well. Therefore, we have created a Facebook group with the same idea as the Goodreads one. After the first two months we reached 226 members in the group, and we have some really great book recommendations. Hope you all enjoy the idea.
Due to this change, we cannot run just a Goodreads poll for the bi-monthly book. Therefore, we decided to do a survey (created using Google Forms). This way, more people can vote for the book. If you want to vote, you need to do it HERE.
“The Thrilling Adventures of Lovelace and Babbage: The (Mostly) True Story of the First Computer” by Sydney Padua
The Thrilling Adventures of Lovelace and Babbage presents a rollicking alternate reality in which Lovelace and Babbage do build the Difference Engine and then use it to build runaway economic models, battle the scourge of spelling errors, explore the wilder realms of mathematics, and, of course, fight crime—for the sake of both London and science. Complete with extensive footnotes that rival those penned by Lovelace herself, historical curiosities, and never-before-seen diagrams of Babbage’s mechanical, steam-powered computer, The Thrilling Adventures of Lovelace and Babbage is wonderfully whimsical, utterly unusual, and, above all, entirely irresistible.
“The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography” by Simon Singh
Simon Singh offers the first sweeping history of encryption, tracing its evolution and revealing the dramatic effects codes have had on wars, nations, and individual lives. From Mary, Queen of Scots, trapped by her own code, to the Navajo Code Talkers who helped the Allies win World War II, to the incredible (and incredibly simple) logistical breakthrough that made Internet commerce secure, The Code Book tells the story of the most powerful intellectual weapon ever known: secrecy.
Throughout the text are clear technical and mathematical explanations, and portraits of the remarkable personalities who wrote and broke the world’s most difficult codes. Accessible, compelling, and remarkably far-reaching, this book will forever alter your view of history and what drives it.  It will also make you wonder how private that e-mail you just sent really is.
“Things to Make and Do in the Fourth Dimension: A Mathematician’s Journey Through Narcissistic Numbers, Optimal Dating Algorithms, at Least Two Kinds of Infinity, and More” by Matt Parker
In the absorbing and exhilarating Things to Make and Do in the Fourth Dimension, Parker sets out to convince his readers to revisit the very math that put them off the subject as fourteen-year-olds. Starting with the foundations of math familiar from school (numbers, geometry, and algebra), he takes us on a grand tour, from four dimensional shapes, knot theory, the mysteries of prime numbers, optimization algorithms, and the math behind barcodes and iPhone screens to the different kinds of infinity―and slightly beyond. Both playful and sophisticated, Things to Make and Do in the Fourth Dimension is filled with captivating games and puzzles, a buffet of optional hands-on activities that entice us to take pleasure in mathematics at all levels. Parker invites us to relearn much of what baffled us in school and, this time, to be utterly enthralled by it.
“A Beautiful Mind” by Sylvia Nasar
Economist and journalist Sylvia Nasar has written a biography of Nash that looks at all sides of his life. She gives an intelligent, understandable exposition of his mathematical ideas and a picture of schizophrenia that is evocative but decidedly unromantic. Her story of the machinations behind Nash’s Nobel is fascinating and one of very few such accounts available in print.
We are very interested in this book due to the movie “A Beautiful Mind”. It is an incredible, emotional and interesting movie about the life of John Nash. If this book was chosen, we believe it would be a great idea to watch the movie after we read the book. What do you think?
“Lost in Math: How Beauty Leads Physics Astray” by Sabine Hossenfelder
Whether pondering black holes or predicting discoveries at CERN, physicists believe the best theories are beautiful, natural, and elegant, and this standard separates popular theories from disposable ones. This is why, Sabine Hossenfelder argues, we have not seen a major breakthrough in the foundations of physics for more than four decades. The belief in beauty has become so dogmatic that it now conflicts with scientific objectivity: observation has been unable to confirm mindboggling theories, like supersymmetry or grand unification, invented by physicists based on aesthetic criteria. Worse, these “too good to not be true” theories are actually untestable and they have left the field in a cul-de-sac. To escape, physicists must rethink their methods. Only by embracing reality as it is can science discover the truth.
Looking at the general description, this sounds more like a book about physics, but we are still interested to see how the author deals with the boundary between mathematics and physics. Also, this book was released in 2018.
“How Long is a Piece of String? More Hidden Mathematics of Everyday Life” by Rob Eastaway and Jeremy Wyndham
In this book, you will find that many intriguing everyday questions have mathematical answers. Discover the astonishing 37% rule for blind dates, the avoidance tactics of the gentleman’s urinal, and some extraordinary scams that have been devised to get rich quick. Also included are the origins of the seven-day week and the seven-note scale, an explanation of why underdogs win, clever techniques for detecting fraud, and the reason why epidemics sweep across a nation and disappear just as quickly. Whatever your mathematical ability, this fun, thought-provoking book will illuminate the ways in which math underlies so much in our everyday lives.
“A Brief History of Infinity” by Brian Clegg
Infinity is a concept that fascinates everyone from a seven-year-old child to a maths professor. An exploration of the most mind-boggling feature of maths and physics, this work examines amazing paradoxes and looks at many features of this fascinating concept.
After reading “Beyond Infinity” by Eugenia Cheng, this book might feel like overkill, especially if you feel you need a break from infinity. On the other hand, we find the concept so mesmerizing that we just want to find out more about it.
“Gamma: Exploring Euler’s Constant” by Julian Havil
Among the many constants that appear in mathematics, π, e, and i are the most familiar. Following closely behind is γ, or gamma, a constant that arises in many mathematical areas yet maintains a profound sense of mystery. In a tantalizing blend of history and mathematics, Julian Havil takes the reader on a journey through logarithms and the harmonic series, the two defining elements of gamma, toward the first account of gamma’s place in mathematics. Gamma takes us through countries, centuries, lives, and works, unfolding along the way the stories of some remarkable mathematics from some remarkable mathematicians.
“Magical Mathematics: The Mathematical Ideas that Animate Great Magic Tricks” by Persi Diaconis and Ron Graham
Magical Mathematics reveals the secrets of fun-to-perform card tricks–and the profound mathematical ideas behind them–that will astound even the most accomplished magician. Persi Diaconis and Ron Graham provide easy, step-by-step instructions for each trick, explaining how to set up the effect and offering tips on what to say and do while performing it. Each card trick introduces a new mathematical idea, and varying the tricks in turn takes readers to the very threshold of today’s mathematical knowledge. The book exposes old gambling secrets through the mathematics of shuffling cards, explains the classic street-gambling scam of three-card Monte, traces the history of mathematical magic back to the oldest mathematical trick–and much more.
We have read another book by Persi Diaconis (“Ten Great Ideas about Chance”) and thought we could give another of his books a try, this time more fun and less strict. If you want to find out more about “Ten Great Ideas about Chance” and what I thought about it, you can check the review.
“Here’s Looking at Euclid: A Surprising Excursion Through the Astonishing World of Math” by Alex Bellos (also published as “Alex’s Adventures in Numberland”)
Bellos has traveled all around the globe and has plunged into history to uncover fascinating stories of mathematical achievement, from the breakthroughs of Euclid, the greatest mathematician of all time, to the creations of the Zen master of origami, one of the hottest areas of mathematical work today. Throughout, the journey is enhanced with a wealth of intriguing illustrations, such as of the clever puzzles known as tangrams and the crochet creation of an American math professor who suddenly realized one day that she could knit a representation of higher dimensional space that no one had been able to visualize. Whether writing about how algebra solved Swedish traffic problems, visiting the Mental Calculation World Cup to disclose the secrets of lightning calculation, or exploring the links between pineapples and beautiful teeth, Bellos is a wonderfully engaging guide who never fails to delight even as he edifies. “Here’s Looking at Euclid” is a rare gem that brings the beauty of math to life.
We hope this helped you decide what book you would like to read with us in August – September. Hope you liked this post. Have a great day. You can find us on Facebook, Tumblr, Twitter and Instagram. We will try to post there as often as possible.
theconservativebrief · 7 years ago
Algorithms are a black box.
We can see them at work in the world. We know they’re shaping outcomes all around us. But most of us have no idea what they are — or how we’re being influenced by them.
Algorithms are invisible pieces of code that tell a computer how to accomplish a specific task. Think of it as a recipe for a computer: An algorithm tells the computer what to do in order to produce a certain outcome. Every time you do a Google search or look at your Facebook feed or use GPS navigation in your car, you’re interacting with an algorithm.
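To make the recipe idea concrete, here is a toy example of our own (not from Fry's book): a few lines of Python that rank posts the way a feed algorithm might, in miniature.

```python
# A toy ranking "recipe": score each post by a fixed rule, show the best first.
# Illustrative only -- real feed algorithms weigh hundreds of signals.
posts = [
    {"title": "cat video", "likes": 120, "minutes_old": 30},
    {"title": "news story", "likes": 300, "minutes_old": 600},
    {"title": "soup recipe", "likes": 45, "minutes_old": 10},
]

def score(post):
    # Reward likes, decay with age -- the same steps, every time, for every post.
    return post["likes"] / (1 + post["minutes_old"] / 60)

for post in sorted(posts, key=score, reverse=True):
    print(f"{score(post):6.1f}  {post['title']}")
```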
A new book by Hannah Fry, a mathematician at University College London, argues that we shouldn’t think of algorithms themselves as either good or bad, but that we should be paying much more attention to the people programming them.
Algorithms are making hugely consequential decisions in our society on everything from medicine to transportation to welfare benefits to criminal justice and beyond. Yet the general public knows almost nothing about them, and even less about the engineers and coders who are creating them behind the scenes.
I reached out to Fry to talk about how algorithms are quietly changing the rules of human life and whether the benefits of algorithms ultimately outweigh the costs.
A lightly edited transcript of our conversation follows.
Sean Illing
How are algorithms changing human life?
Hannah Fry
In all sorts of ways, really. From what we choose to read and watch to who we choose to date, algorithms are increasingly playing a huge role. And it’s not just the obvious cases, like Google search algorithms or Amazon recommendation algorithms.
We’ve invited these algorithms into our courtrooms and our hospitals and our schools, and they’re making these tiny decisions on our behalf that are subtly shifting the way our society is operating.
Sean Illing
Do you think our trust in algorithms is misplaced? Are we making a mistake by handing over so much decision-making authority to these programs?
Hannah Fry
That’s a difficult question. We’ve got a really complicated relationship with machines. On the one hand, we sometimes do misplace our trust in them. We expect them to be almost godlike, to be so perfect that we will blindly follow them wherever they lead us.
But at the same time, we have a habit of dismissing an algorithm as soon as it is shown to be slightly flawed. So if Siri gets something wrong, or if our GPS app miscalculates the traffic, we think the whole machine is just rubbish. But that doesn’t make any sense.
Algorithms are not perfect, and they often contain the biases of the people who create them, but they’re still incredibly effective and they’ve made all of our lives a lot easier. So I think the right attitude is somewhere in the middle: We shouldn’t blindly trust algorithms, but we also shouldn’t dismiss them altogether.
Sean Illing
What advantages do we gain by relying so heavily on algorithms?
Hannah Fry
Humans are quite bad at a lot of things. We’re bad at being consistent. We’re bad at not being biased. We get tired and sloppy.
Algorithms possess none of those flaws. They’re incredibly consistent. They never get tired, and they’re absolutely precise. The problem is that algorithms don’t understand context or nuance. They don’t understand emotion and empathy in the way that humans do.
“We don’t have to create a world in which machines are telling us what to do or how to think, although we may very well end up in a world like that”
Sean Illing
Can you give me an example of an algorithm going disastrously wrong?
Hannah Fry
I write in the book about Christopher Drew Brooks, a 19-year-old man from Virginia who was convicted of the statutory rape of a 14-year-old girl. They’d had a consensual relationship, but she was underage and that’s illegal.
During sentencing, the judge in the case relied on an algorithm designed to predict how likely an individual is to go on to commit a crime if they’re released from jail. The algorithm assessed his risk of reoffending, and it determined that because he was such a young man and was already committing sexual offenses, there was quite a high chance that he would continue in this life of crime. So it recommended that he be given 18 months in jail.
And maybe that’s fair. But it also demonstrates just how illogical these algorithms can be sometimes. Because it turned out that this particular algorithm places a lot of weight on the age of the offender — so if he had been 36 instead of 19, it would’ve deemed him a much lower threat. But in that case, he would’ve been 22 years older than the victim, and I think any reasonable person would consider that worse.
This is an example of how these perfectly logical algorithms can arrive at bizarre results. And in this case, you’d think that the judge would’ve exercised his discretion and overruled the algorithm, but he actually increased Brooks’s sentence, in part because of the algorithm.
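To see how a heavy age weighting produces that result, consider a deliberately invented score function. The weights below are hypothetical and purely illustrative, not the actual algorithm used in the case; only the shape of the logic matters.

```python
# Hypothetical weights, invented for illustration only -- not the real model.
# A large negative weight on age lets youth dominate every other factor.
def risk_score(age, prior_offenses):
    return 10.0 - 0.2 * age + 1.5 * prior_offenses

print(risk_score(age=19, prior_offenses=1))  # 7.7  -> flagged "high risk"
print(risk_score(age=36, prior_offenses=1))  # 4.3  -> deemed a far lower threat
```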
Sean Illing
Do you think the people creating these algorithms, the engineers at Google or Facebook or wherever, fully understand what they’re creating?
Hannah Fry
They’re starting to care a lot about the implications of this stuff. Facebook used to have the motto “move fast and break things,” and that was the attitude of much of the tech world. But the tide has shifted in the last couple of years. There’s been a wake-up call for a lot of these people as the unintended consequences of these creations have become much clearer.
Every social media platform, every algorithm that becomes part of our lives, is part of this massive unfolding social experiment. Billions of people around the world are interacting with these technologies, which is why the tiniest changes can have such a gigantic impact on all of humanity. And these companies are starting to recognize this and take it seriously.
Sean Illing
You say that algorithms themselves are neither good nor bad, but I want to push you on this a bit. Algorithms can produce unexpected outcomes, especially machine-learning algorithms that can program themselves. Since it’s impossible for us to anticipate all of these scenarios, can’t we say that some algorithms are bad, even if they weren’t designed to be?
Hannah Fry
That’s a good question. We have to think of these technologies, especially machine-learning and artificial intelligence, as more like the invention of electricity than the invention of the light bulb. By that I mean we don’t know how these things are going to be used and in what situations or what context.
But electricity in its own right isn’t good or bad — it’s just a tool that can be used in an infinite number of ways. Algorithms are like that, too. I haven’t come across an algorithm that was 100 percent bad or good. I think the context and everything around it is the thing that makes the difference.
“People can make any claims they want about what their algorithm can or can’t do, even if it’s absolute nonsense, and no one can really stop them from doing it”
Sean Illing
Do you worry that the proliferation of algorithms is eroding our ability to think and decide for ourselves?
Hannah Fry
There are places where that is clearly happening, where the role of humans has been sidelined. And that’s a really dangerous thing to allow to happen. But I also don’t think that it needs to be like that. Humans and machines don’t have to be opposed to one another. We have to work with machines, acknowledging that they are flawed, just as we are. And that they will make mistakes, just as we do.
We don’t have to create a world in which machines are telling us what to do or how to think, although we may very well end up in a world like that. I’d much prefer a world in which humans and machines, humans and algorithms, are partners.
Sean Illing
Do you believe that humans and artificial algorithms will eventually combine in ways that blur the distinction between the two?
Hannah Fry
It’s entirely possible, but we are a really, really, really long way away from that.
There is a project, for example, that’s been trying to replicate the brain of the C. elegans worm, which is a microscopic worm with roughly 300 neurons in its brain — and we can’t do it. Even with the most sophisticated cutting-edge artificial intelligence, we’re nowhere near being able to simulate the brain of a teeny-tiny microscopic worm. So we’re galaxies away from simulating more complex animals, and even further away from replicating humans.
So these conversations are interesting, but they can also serve as a distraction from what’s going on right now. The rules and systems that govern our lives are changing all around us, and algorithms are a big part of that.
Sean Illing
Do we need a stronger regulatory framework for algorithms?
Hannah Fry
Absolutely. We’ve been living in the technological Wild West, where you can collect private data on people without their permission and sell it to advertisers. We’re turning people into products, and they don’t even realize it. And people can make any claims they want about what their algorithm can or can’t do, even if it’s absolute nonsense, and no one can really stop them from doing it.
And even if a particular algorithm works, there is no one assessing whether or not it is providing a net benefit or cost to society. There’s nobody doing any of those checks. We need an equivalent to the FDA, some agency that can protect the intellectual property of a company that comes up with an algorithm but also ensure that the public isn’t being harmed or violated in any way by it.
Sean Illing
At the end of the day, are algorithms solving more problems for human beings than they’re creating?
Hannah Fry
Yes, I think they’re solving more problems than they’re creating. I’m still mostly positive about this stuff. I’ve worked in this area for over a decade, and there are huge upsides to these technologies. Algorithms are being used to help prevent crimes and help doctors get more accurate cancer diagnoses, and in countless other ways.
All of these things are really, really positive steps forward for humanity. We just have to be careful in the way that we employ them. We can’t do it recklessly. We can’t just move fast, and we can’t break things.
Original Source -> How algorithms are controlling your life
via The Conservative Brief
kristinsimmons · 7 years ago
Artificial Intelligence and Deep Learning For the Extremely Confused
By STEPHEN BORSTELMANN, MD
Artificial Intelligence is at peak buzzword: it elicits either the euphoria of a technological paradise with anthropomorphic robots to tidy up after us, or fears of hostile machines breaking the human spirit in a world without hope. Both are fiction.
The Artificial Intelligences of our reality are those of Machine Learning and Deep Learning. Let’s make it simple: both are AI – but not the AI of fiction. Instead, these are limited intelligences capable of only the task they are created for: “weak” or “narrow” AI. Machine Learning is essentially applied Statistics, excellently explained in James, Witten, Hastie, and Tibshirani’s An Introduction to Statistical Learning. Machine Learning is a more mature field, with more practitioners and a deeper body of evidence and experience.
Deep Learning is a different animal – a hybrid of Computer Science and Statistics, using networks defined in computer code. Deep Learning isn’t entirely new – Yann LeCun’s 1998 LeNet network was used to optically recognize 10% of US checks. But the compute power necessary for other image recognition tasks would not arrive for another decade. Sensationalism from overly optimistic press releases co-exists with establishment inertia and claims of “black box” opacity. For the non-practitioner, it is very difficult to know what to believe, and confusion is the rule.
A game of chance
Inspiration is found in an unlikely place – the carnival sideshow, where one can find Plinko: a game of chance. In Plinko, balls or discs travel through a field of metal pins and land in slots at the bottom. With evenly placed pins and a center start, the probability of landing in the center slots is highest, and the side slots lowest. The University of Colorado’s PhET project has a fantastic simulation of Plinko you can run yourself. If you played the game 10,000 times, counting how many balls land in each slot, the accumulated pattern would look like this:
It should look familiar – it’s a textbook bell curve, the Gaussian Normal distribution that terrorized us in high school math. It’s usually good enough to solve many basic Machine Learning problems – as long as the balls are all the same. But what if the balls are different – green, blue, red? How can we get the red balls to go into the red slot? That’s a classification problem. We can’t rely solely upon the Normal distribution to sort balls by color.
So, let’s make our Plinko game board computerized, with a camera and the ability to bump the board slightly left or right to guide the ball more towards the correct color slot. There is still an element of randomness, but as the ball descends through the array of pins, the repeated bumps nudge it into the correct slot.
The colored balls are our data, and the Plinko board is our AI.
One Matrix to rule them all
For those still fearing being ruled by an all-powerful artificial intelligence, meet your master:
Are you terrified beyond comprehension yet?
Math can be scary – when you’re in middle school. Matrix Math, or Linear Algebra, is a tool for solving many similar problems simultaneously and quickly. Without getting too technical, matrices can represent many different similar equations, like we would find in the layers of an AI model. It’s behind the AIs that use Deep Learning, and partly responsible for the “magic”.
This ‘magic’ happened because of serendipitous parallel advances in Computer Science and Statistics, and similar advances in processor speed, memory, and storage. Reduced Instruction Set Chips (RISCs) allowed Graphics Processing Units (GPUs) capable of performing fast parallel operations on graphics, like scaling, rotations, and reflections. These are affine transformations. It turns out that you can define a shape as a matrix, apply matrix multiplication to it, and end up with an affine transformation. Precisely the calculations used in Deep Learning.
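As a small illustration of that last point (a sketch of the idea, not anything from the article): define a shape as a matrix of corner points, and an affine transformation falls out of a single matrix multiplication.

```python
import numpy as np

# A shape as a matrix: one (x, y) column per corner of the unit square.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

# An affine transformation by matrix multiplication: rotate 45 degrees, scale 2x.
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
transformed = 2.0 * (rotation @ square)  # exactly the math GPUs parallelize

print(transformed.round(2))
```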
The watershed moment in Deep Learning is typically cited as 2012’s AlexNet, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton: a state-of-the-art, GPU-accelerated Deep Learning network that won that year’s ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a large margin. Thereafter, other GPU-accelerated Deep Learning algorithms consistently outperformed all others.
Remember our colored ball-sorter? Turn the board on its side, and it looks suspiciously like a deep neural network, with each pin representing a point, or node, in the network. A deep neural network can also be called a Multi-Layer Perceptron (MLP) or an Artificial Neural Network (ANN). Both are a layer of software “neurons” followed by 0-4 layers of “hidden” neurons which output to a final neuron. The output neuron typically gives an output of a probability, from 0 to 1.0, or 0% to 100% if you prefer.
The “hidden” layers are hidden because their output is not displayed. Feed the ANN an input, and the output probability pops out. This is why ANNs are called “black boxes” – you don’t routinely evaluate the inner states, leading many to incorrectly deem them “incomprehensible” and “opaque”. There are ways to view the inner layers (but they may not be as enlightening as hoped).
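Here is what that structure looks like in code – a minimal sketch of our own, with random, untrained weights: three inputs feed a hidden layer of four neurons, which feeds one output neuron that emits a probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input layer -> 4 hidden neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden layer -> 1 output neuron

x = np.array([0.5, -1.2, 3.0])          # one input example
hidden = sigmoid(W1 @ x + b1)           # the "hidden" layer (output not displayed)
output = sigmoid(W2 @ hidden + b2)      # a probability between 0 and 1.0

print(f"output probability: {output[0]:.3f}")
```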
Everything Old is New Again
The biggest problem was getting the network to work. A one-layer MLP was created in the 1940s. You could only travel forward through the network (feed-forward), updating the values of each and every neuron individually via a brute-force method. It was so computationally expensive with 1940s-1960s technology that it was unrealistic for larger models. And that was the end of that. For a few decades. But smart mathematicians kept working, and had a realization.
If we know the inputs and the outputs of a neural net, we can do some maneuvering. A network can be modeled as a number of matrix operations, representing a series of equations (Y = mX + b, anyone?). Because we know both inputs and outputs, that matrix is differentiable; i.e., the slope (m), or first derivative, is solvable. That first derivative is named the gradient. Applying calculus’ chain rule enables the gradient of the network to be calculated in a backward pass. This is Backpropagation. Hold that thought – and my beer – for a moment.
By the way, while backpropagation was worked out in the 1960s, it was not applied to AI until the mid-1980s. The intervening decades included what is often called the first AI ‘winter’.
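Before returning to Plinko, here is the chain rule at work on the smallest possible network – a one-weight sketch of backpropagation and gradient descent (our example, not from the article):

```python
# Forward pass, loss, backward pass, weight update -- on y_hat = w*x + b.
w, b = 0.5, 0.0        # starting weights
x, y = 2.0, 3.0        # one training pair: known input and known output
lr = 0.05              # learning rate: the size of each corrective step

for step in range(20):
    y_hat = w * x + b                  # forward pass
    loss = (y_hat - y) ** 2            # how wrong were we?
    dloss = 2 * (y_hat - y)            # chain rule, worked backward:
    dw, db = dloss * x, dloss          # gradients of the loss w.r.t. w and b
    w, b = w - lr * dw, b - lr * db    # step against the gradient

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.8f}")  # loss shrinks toward 0
```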
Go back to Plinko, but turn it upside down. This time, we won’t need to nudge it. Instead, let’s color the balls with a special paint – it’s wet, so it comes off on any pin it touches, and it’s magnetic, attracting only balls of the same color. Feeding the colored balls in from their respective slots, they’ll run down by gravity, colored paint rubbing off on the pins they touch. The balls then exit from the apex of the triangle. It would look suspiciously like Figure 5, rotated 90 degrees clockwise.
After running many wet balls through, looking at our board, the pins closest to the green slot are the greenest, pins closest to the red slot reddest, and the same for blue. Mid-level pins in front of red and blue become purple, and mid-level pins in front of blue and green become cyan. At the apex, from mixing the green, red, and blue paint the pins are a muddy color. The amount of specific color paint deposited on the pin depends on how many balls of that color hit that individual pin on their random path out. Therefore, each pin has a certain amount of red, green, and/or blue colored paint on it. We actually just trained our Plinko board to sort colored balls!
Turn the model right side up and feed a green-painted ball in from the apex of the pyramid. Let’s make the special magnetic paint dry this time.
The ball bounces around, but it is generally attracted to the pins with more green paint. As it passes down the layers of pins, it orients first towards the cyan pins, then those cyan pins with the most green shading, then the purely green pins before falling in the green slot. We can repeat the experiment with blue or red balls, and they will sort similarly.
The pins are the nodes, or neurons in our Deep Learning network, and the amount of paint of each color is the weight of that particular node.
Sharpen your knives, for here comes the meat.
Let’s look at an ANN, like the one in figure 5. Each neuron, or node, in the network will have a numerical value, a weight, assigned to it. When our neural network is fully optimized, or trained, these weights will allow us to correctly sort, or classify, the inputs. There is a constant, the bias, that also contributes to every layer.
Remember the algebraic Y = mX + b equation? Here is its deep learning equivalent:
Y = W · X + B
The overly simplified neural network equation has W representing the weights and B the bias for a given input X; Y is the output. As both the weights W and the input X are matrices, they are multiplied by a special operator called a dot product. Without getting too technical, the dot product multiplies matrices in such a way that their dimensions are maintained and their similarities are enhanced.
In figure 5 above, the bias is a circle on top of each layer with a “1” inside. That value of 1 avoids multiplying by zero, which would clear out our algorithm. Bias is actually the output of the neural network when the input is zero. Why is it important? Bias allows us to solve the Backpropagation algorithm by solving for the network’s gradients. The network’s gradients will allow us to optimize the weights of our network by a process known as gradient descent.
On a forward pass through the network, everything depends on the loss function. The loss function is simply a mathematical distance between two data points: X2-X1. Borrowing the old adage, “birds of a feather flock together”, data points with small distances between each other will tend to belong to the same group, or class, and data points with a distance more akin to Kansas and Phuket will have no relationship. It is more typical to use a loss function such as a root mean squared function, but many exist.
First, let’s randomize all the weights of our neural network before starting, avoiding zeroes and ones, which can cause our gradients to prematurely become too small (vanishing gradients) or too large (exploding gradients).
To train our network, a known (labeled) input runs forward through the network. On this randomly initialized network, we know this first output (Y%) will be garbage – but that’s OK! Knowing what this input’s label is – its ground truth – we will now calculate the loss. The loss is the difference between 100% and the output Y, i.e. (100%-Y%).
We want to minimize that loss; to try to get it as close to zero as possible. That will indicate that our neural network is classifying our inputs perfectly – outputting a probability of 100% (zero uncertainty) for a known item. To do so, we are going to adjust the weights in the network – but how? Recall Backpropagation. By calculating the gradients of the network, we can adjust the weights of the network in a small step-wise fashion away from the gradient, which is towards zero. This is stochastic gradient descent and the small step-wise amount is the learning rate. This should decrease the loss and yield a more accurate output prediction on the next run through, or iteration, of the network on that same data. Each input is an opportunity to adjust, or learn, the best weights. And typically you will iterate over each of these inputs 10, 20, 100 (or more) times, or epochs, each time driving the loss down and adjusting the weights in your network to be more accurate in classifying the training data.
Alas, perfection has its drawbacks. There are many nuances here. The most important is to avoid overfitting the network too closely to the training data; a common cause of real-world application failure. To avoid this, datasets are usually separated into training, validation, and test datasets. The training dataset teaches your model, the validation dataset helps prevent overfitting, and the test dataset is only used once for final measurement of accuracy at the end.
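A typical split looks like this (a sketch; the 70/15/15 proportions are a common convention, not a rule from the article):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 3)             # toy dataset
y = (X.sum(axis=1) > 1.5).astype(int)

# 70% to teach the model, 15% to watch for overfitting, 15% held back to the end.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```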
One of the more interesting features of deep learning is that deep learning algorithms, when designed in a layered, hierarchical manner, exhibit essentially self-organizing behavior. In a 2013 study on images, Zeiler and Fergus (1) showed that lower levels in the algorithm focused on lines, corners, and colors. The middle levels focused on circles, ovals, and rectangles. And the higher levels would synthesize complex abstractions – a wheel on a car, the eyes of a dog.
Why this was so exciting was prior Visual Evoked Potentials on the primary visual cortex of a cat showed activations by simple shapes uncannily similar to the appearance of the first level of the algorithm, suggesting this organizing principle is present both in nature and AI.
Evolution is thus contingent on… variation and selection (attr. Ernst Mayr)
ANNs/MLPs aren’t that useful in practice, as they don’t handle variation well – i.e., your test samples must match the training data exactly. However, by changing the hidden layers, things get interesting. An operation called a convolution can be applied to the data in an ANN. The input data is arranged into a matrix, and then gone over stepwise with a smaller window, which performs a dot product on the underlying data.
For example, take an icon, 32 pixels by 32 pixels, with 3 color channels to that image (R-G-B). We take that data, arrange it into a 32x32x3 matrix, and then convolve over the matrix with a 3×3 window. Stepping the window two pixels at a time and applying six different filters transforms our 32×32 matrix into a 16×16 matrix, 6 deep. The process of convolving creates multiple filters – areas of pattern recognition. In training, these layers self-organize to activate on similar patterns found within the training images.
Multiple convolutions are generally performed, each time halving the size of the matrix while increasing its depth. An operation called a MaxPool is frequently performed after a series of convolutions to force the model to associate these narrow windowed representations with the larger data set (an image, in this case) by downsampling.
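The shape arithmetic above can be checked directly in PyTorch (a sketch: one stride-2 convolution with six filters, then a MaxPool):

```python
import torch
import torch.nn as nn

image = torch.randn(1, 3, 32, 32)   # one 32x32 image with 3 color channels (RGB)
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=2, padding=1)
pool = nn.MaxPool2d(kernel_size=2)  # downsample: keep the max of each 2x2 patch

features = conv(image)
print(features.shape)               # torch.Size([1, 6, 16, 16]) -- 16x16, 6 deep
print(pool(features).shape)         # torch.Size([1, 6, 8, 8])   -- halved again
```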
This Deep Learning network composed of convolutional layers is the Convolutional Neural Network (CNN). CNNs are particularly well suited to image classification, but can also be used in voice recognition or regression tasks, learning both variation and selectivity, with some limitations. Recently published research has claimed human-level performance in medical image identification. (4) CNNs are powerful, with convolutional layers assembling simple blocks of data into more complex and abstract representations as the number of layers increases. These complex and abstract representations can then be identified anywhere in the image.
One drawback to CNNs is that increasing model power requires increased model depth. This increases the number of parameters in the model, lengthening training time and predisposing to the vanishing gradient problem, where gradients disappear and the model stalls in stochastic gradient descent, failing to converge. The introduction of Residual Networks (ResNets) in 2015 solved some of the problems with increasing network depth, as residual connections (seen above in a DenseNet) allow backpropagation to take a gradient from the last layer and follow it all the way through to the first layer. It is important to note that CNNs are agnostic to position, but not to orientation. Capsule Networks were recently proposed to address the orientation limitations of CNNs.
The Convolutional network is one of the easier Deep Learning algorithms to peer inside. Figure 7 does exactly that, using a deconvolutional network to show what selected levels of the algorithm are “seeing”. While these patterns are interesting, they may not be easily evident depending upon the learning set. To that end, Grad-CAM models based on the last convolutional layer before the output have been designed, producing a heatmap to explain why the CNN chose the classification it did. This was a test on ImageNet data for the “lion” classifier:
There are quite a number of Convolutional Neural Networks available for experimentation. ILSVRC winners like AlexNet, VGG-16, ResNet-152, GoogLeNet, Inception, DenseNets, and U-Nets are most commonly used, with newer networks like NASNet and SENet approaching state of the art (SOTA). While a discussion of the programming languages and hardware requirements to run neural networks is beyond the scope of this work, a guide to building a deep learning computer is available on the net, and many investigators use the Python programming language with PyTorch or TensorFlow and its slightly easier-to-use cousin, Keras.
Sequenced or temporal data needs a different algorithm: an LSTM (Long Short-Term Memory) network, one of the Recurrent Neural Networks (RNNs). RNNs feed their computed output back into themselves. The LSTM module feeds information into itself in two ways – a short-term input, predicated only on the prior iteration, and a long-term input, re-using older computations. This particular algorithm is well suited to tasks such as text analysis, Natural Language Processing (NLP), and image captioning. There is a great deal of unstructured textual data in medicine – RNNs performing NLP will probably be part of that solution. The main problem with RNNs is their recurrent, iterative nature. Training can be lengthy – 100x as long as a CNN. Google’s language translation engine reportedly uses an LSTM seven layers deep, the training of which must have been immense in time and data resources. RNNs are generally considered an advanced topic in deep learning.
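In code, the recurrent machinery hides behind a simple interface; here is a minimal PyTorch sketch (ours) of an LSTM consuming a ten-step sequence:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
sequence = torch.randn(1, 10, 16)   # one sequence: 10 time steps, 16 features each

output, (h_n, c_n) = lstm(sequence)
print(output.shape)   # torch.Size([1, 10, 32]) -- one hidden state per time step
print(h_n.shape)      # torch.Size([1, 1, 32])  -- final short-term state
print(c_n.shape)      # torch.Size([1, 1, 32])  -- final long-term "cell" state
```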
Another advanced topic is the Generative Adversarial Network (GAN): two neural networks in parallel, one of which generates simulated data, and the other of which evaluates, or discriminates, that data in a competitive, or adversarial, fashion. The generator generates data to pass the discriminator. As the discriminator is fed more data by the generator, it becomes better at discriminating. So both spur higher achievement until the discriminator can no longer tell that the generator’s simulations are fake. GANs’ uses in healthcare appear to be mostly for simulating data, but pharmaceutical design and drug discovery have been proposed as tasks for GANs. GANs are used in style-transfer algorithms for computer art, as well as in creating fake celebrity photos and videos.
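The adversarial loop can be sketched in a few lines (a toy example of ours: the generator learns to forge samples from a normal distribution centered at 4):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                 # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # discriminator
bce = nn.BCELoss()
g_opt = torch.optim.Adam(G.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(64, 1) + 4.0          # "real" data: N(4, 1) samples
    fake = G(torch.randn(64, 8))             # the generator's forgeries
    # Discriminator learns: real -> 1, fake -> 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator learns: fool the discriminator into saying 1.
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print(G(torch.randn(1000, 8)).mean().item())  # drifts toward ~4 as G improves
```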
Deep reinforcement learning (RL) deserves a brief mention – it is an area of intense investigation and appears useful in temporal prediction. However, few healthcare applications have been attempted with RL. In general, RL is difficult to work with and still mostly experimental.
Finally, not every problem in medicine needs a deep learning classifier applied to it. For many applications, simple rules and linear models work reasonably well. Traditional supervised machine learning (i.e., applied statistics) is still a reasonable choice for rapid development of models, namely techniques such as dimension reduction, principal component analysis (PCA), Random Forests (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost). These analyses are often done not with the previously mentioned software, but with a freely available language called ‘R’. The tradeoff between the large amounts of sample data, compute resources, and parameter tuning a deep learning network requires and a simpler method that can work very well with limited data should be considered. Ensembles utilizing multiple deep learning algorithms combined with machine learning methods can be very powerful.
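The article notes these analyses are often done in R; the same idea in Python with scikit-learn takes only a few lines (a sketch on a small built-in dataset):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# No GPUs, modest data, minimal tuning: a Random Forest as the "simpler method."
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```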
My Brain is the key that sets me free. – Attr. Harry Houdini
Magic is what deep learning has been compared to, with its feats of accurate image and facial recognition, voice transcription and language translation. This is inevitably followed by the fictive “there’s no way of understanding what the black box is thinking”. While the calculations required to understand deep learning are repetitive and massive, they are not beyond human comprehension nor inhumanly opaque. If these entities have now been demystified for you, I have done my job well.  Deep Learning remains an active area of research for me, and I learn new things every day as the field advances rapidly.
Is deep learning magic? No. I prefer to think of it as alchemy – turning data we once considered dross into modern day gold.
References:
1. Zeiler MD, Fergus R. Visualizing and Understanding Convolutional Networks. ECCV 2014, Part I, LNCS 8689, pp. 818-833, 2014.
2. Hubel DH, Wiesel TN. J Physiol. 1959 Oct; 148(3): 574-591.
3. Huang G, Liu Z, van der Maaten L, et al. Densely Connected Convolutional Networks. arXiv:1608.06993.
4. Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv:1711.05225 [cs.CV].
  Artificial Intelligence and Deep Learning For the Extremely Confused published first on https://wittooth.tumblr.com/
spottypotatostuff · 8 years ago
How to Build Beautiful 3-D Fractals Out of the Simplest Equations
Ordinary equations can be transformed into complex 3-D figures that offer new questions to explore. Credit: Olena Shmahalo/Quanta Magazine; original figure by Laurent Bartholdi and Laura DeMarco
If you came across an animal in the wild and wanted to learn more about it, there are a few things you might do: You might watch what it eats, poke it to see how it reacts, and even dissect it if you got the chance.
Mathematicians are not so different from naturalists. Rather than studying organisms, they study equations and shapes using their own techniques. They twist and stretch mathematical objects, translate them into new mathematical languages, and apply them to new problems. As they find new ways to look at familiar things, the possibilities for insight multiply.
That’s the promise of a new idea from two mathematicians: Laura DeMarco, a professor at Northwestern University, and Kathryn Lindsey, a postdoctoral fellow at the University of Chicago. They begin with a plain old polynomial equation, the kind grudgingly familiar to any high school math student: f(x) = x² – 1. Instead of graphing it or finding its roots, they take the unprecedented step of transforming it into a 3-D object.
With polynomials, “everything is defined in the two-dimensional plane,” Lindsey said. “There isn’t a natural place a third dimension would come into it until you start thinking about these shapes Laura and I are building.”
The 3-D shapes that they build look strange, with broad plains, subtle bends and a zigzag seam that hints at how the objects were formed. DeMarco and Lindsey introduce the shapes in a forthcoming paper in the Arnold Mathematical Journal, a new publication from the Institute for Mathematical Sciences at Stony Brook University. The paper presents what little is known about the objects, such as how they’re constructed and the measurements of their curvature. DeMarco and Lindsey also explain what they believe is a promising new method of inquiry: Using the shapes built from polynomial equations, they hope to come to understand more about the underlying equations—which is what mathematicians really care about.
Breaking Out of Two Dimensions
In mathematics, several motivating factors can spur new research. One is the quest to solve an open problem, such as the Riemann hypothesis. Another is the desire to build mathematical tools that can be used to do something else. A third—the one behind DeMarco and Lindsey’s work—is the equivalent of finding an unidentified species in the wild: One just wants to understand what it is. “These are fascinating and beautiful things that arise very naturally in our subject and should be understood!” DeMarco said by email, referring to the shapes.
Laura DeMarco, a professor at Northwestern University. Credit: Courtesy of Laura DeMarco
“It’s sort of been in the air for a couple of decades, but they’re the first people to try to do something with it,” said Curtis McMullen, a mathematician at Harvard University who won the Fields Medal, math’s highest honor, in 1998. McMullen and DeMarco started talking about these shapes in the early 2000s, while she was doing graduate work with him at Harvard. DeMarco then went off to do pioneering work applying techniques from dynamical systems to questions in number theory, for which she will receive the Satter Prize—awarded to a leading female researcher—from the American Mathematical Society on January 5.
Meanwhile, in 2010 William Thurston, the late Cornell University mathematician and Fields Medal winner, heard about the shapes from McMullen. Thurston suspected that it might be possible to take flat shapes computed from polynomials and bend them to create 3-D objects. To explore this idea, he and Lindsey, who was then a graduate student at Cornell, constructed the 3-D objects from construction paper, tape and a precision cutting device that Thurston had on hand from an earlier project. The result wouldn’t have been out of place at an elementary school arts and crafts fair, and Lindsey admits she was kind of mystified by the whole thing.
“I never understood why we were doing this, what the point was and what was going on in his mind that made him think this was really important,” said Lindsey. “Then unfortunately when he died, I couldn’t ask him anymore. There was this brilliant guy who suggested something and said he thought it was an important, neat thing, so it’s natural to wonder ‘What is it? What’s going on here?’”
In 2014 DeMarco and Lindsey decided to see if they could unwind the mathematical significance of the shapes.
A Fractal Link to Entropy
To get a 3-D shape from an ordinary polynomial takes a little doing. The first step is to run the polynomial dynamically—that is, to iterate it by feeding each output back into the polynomial as the next input. One of two things will happen: either the values will grow infinitely in size, or they’ll settle into a stable, bounded pattern. To keep track of which starting values lead to which of those two outcomes, mathematicians construct the Julia set of a polynomial. The Julia set is the boundary between starting values that go off to infinity and values that remain bounded below a given value. This boundary line—which differs for every polynomial—can be plotted on the complex plane, where it assumes all manner of highly intricate, swirling, symmetric fractal designs.
Credit: Lucy Reading-Ikkanda/Quanta Magazine
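For the polynomial in question, f(x) = x² – 1, the bounded-or-not test is easy to run yourself; here is a short sketch of ours that marks the starting values whose orbits stay bounded:

```python
import numpy as np

c = -1.0                                     # the polynomial f(z) = z^2 + c
re, im = np.meshgrid(np.linspace(-1.8, 1.8, 400), np.linspace(-1.0, 1.0, 240))
z = re + 1j * im                             # a grid of starting values
bounded = np.ones(z.shape, dtype=bool)

for _ in range(60):                          # run the polynomial dynamically
    z = np.where(bounded, z * z + c, z)      # iterate only the survivors
    bounded &= np.abs(z) <= 2.0              # once |z| > 2, the orbit escapes

# Crude ASCII rendering: '#' marks starting values whose orbits stay bounded;
# the ragged edge of the marked region approximates the Julia set.
for row in bounded[::10, ::8]:
    print("".join("#" if b else " " for b in row))
```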
If you shade the region bounded by the Julia set, you get the filled Julia set. If you use scissors and cut out the filled Julia set, you get the first piece of the surface of the eventual 3-D shape. To get the second, DeMarco and Lindsey wrote an algorithm. That algorithm analyzes features of the original polynomial, like its degree (the highest number that appears as an exponent) and its coefficients, and outputs another fractal shape that DeMarco and Lindsey call the “planar cap.”
“The Julia set is the base, like the southern hemisphere, and the cap is like the top half,” DeMarco said. “If you glue them together you get a shape that’s polyhedral.”
The algorithm was Thurston’s idea. When he suggested it to Lindsey in 2010, she wrote a rough version of the program. She and DeMarco improved on the algorithm in their work together and “proved it does what we think it does,” Lindsey said. That is, for every filled Julia set, the algorithm generates the correct complementary piece.
The filled Julia set and the planar cap are the raw material for constructing a 3-D shape, but by themselves they don’t give a sense of what the completed shape will look like. This creates a challenge. When presented with the six faces of a cube laid flat, one could intuitively know how to fold them to make the correct 3-D shape. But, with a less familiar two-dimensional surface, you’d be hard-pressed to anticipate the shape of the resulting 3-D object.
“There’s no general mathematical theory that tells you what the shape will be if you start with different types of polygons,” Lindsey said.
Mathematicians have precise ways of defining what makes a shape a shape. One is to know its curvature. Any 3-D object without holes has a total curvature of exactly 4π; it’s a fixed value in the same way any circular object has exactly 360 degrees of angle. The shape—or geometry—of a 3-D object is completely determined by the way that fixed amount of curvature is distributed, combined with information about distances between points. In a sphere, the curvature is distributed evenly over the entire surface; in a cube, it’s concentrated in equal amounts at the eight evenly spaced vertices.
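That fixed total is the Gauss-Bonnet theorem at work: for any closed surface S without holes (Euler characteristic χ(S) = 2), the curvature K integrates to the same number no matter how the surface is shaped:

$$\int_S K \, dA = 2\pi\,\chi(S) = 2\pi \cdot 2 = 4\pi.$$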
A unique attribute of Julia sets allows DeMarco and Lindsey to know the curvature of the shapes they’re building. All Julia sets have what’s known as a “measure of maximal entropy,” or MME. The MME is a complicated concept, but there is an intuitive (if slightly incomplete) way to think about it. First, picture a two-dimensional filled Julia set on the plane. Then picture a point on the same plane but very far outside the Julia set’s boundary (infinitely far, in fact). From that distant location the point is going to take a random walk across two-dimensional space, meandering until it strikes the Julia set. Wherever it first strikes the Julia set is where it comes to rest.
The MME is a way of quantifying the fact that the meandering point is more likely to strike certain parts of the Julia set than others. For example, the meandering point is more likely to strike a spike in the Julia set that juts out into the plane than it is to intersect with a crevice tucked into a region of the set. The more likely the meandering point is to hit a point on the Julia set, the higher the MME is at that point.
In their paper, DeMarco and Lindsey demonstrated that the 3-D objects they build from Julia sets have a curvature distribution that’s exactly proportional to the MME. That is, if there’s a 25 percent chance the meandering point will hit a particular place on the Julia set first, then 25 percent of the curvature should also be concentrated at that point when the Julia set is joined with the planar cap and folded into a 3-D shape.
“If it was really easy for the meandering point to hit some area on our Julia set we’d want to have a lot of curvature at the corresponding point on the 3-D object,” Lindsey said. “And if it was harder to hit some area on our Julia set, we’d want the corresponding area in the 3-D object to be kind of flat.”
This is useful information, but it doesn’t get you as far as you’d think. If given a two-dimensional polygon, and told exactly how its curvature should be distributed, there’s still no mathematical way to identify exactly where you need to fold the polygon to end up with the right 3-D shape. Because of this, there’s no way to completely anticipate what that 3-D shape will look like.
“We know how sharp and pointy the shape has to be, in an abstract, theoretical sense, and we know how far apart the crinkly regions are, again in an abstract, theoretical sense, but we have no idea how to visualize it in three dimensions,” DeMarco explained in an email.
She and Lindsey have evidence of the existence of a 3-D shape, and evidence of some of that shape’s properties, but no ability yet to see the shape. They are in a position similar to that of astronomers who detect an unexplained stellar wobble that hints at the existence of an exoplanet: The astronomers know there has to be something else out there and they can estimate its mass. Yet the object itself remains just out of view.
A Folding Strategy
Thus far, DeMarco and Lindsey have established basic details of the 3-D shape: They know that one 3-D object exists for every polynomial (by way of its Julia set), and they know the object has a curvature exactly given by the measure of maximal entropy. Everything else has yet to be figured out.
In particular, they’d like to develop a mathematical understanding of the “bending laminations,” or lines along which a flat surface can be folded to create a 3-D object. The question occurred early on to Thurston, too, who wrote to McMullen in 2010, “I wonder how hard it is to compute or characterize the pair of bending laminations, for the inside and the outside, and what they might tell us about the geometry of the Julia set.”
Kathryn Lindsey, a mathematician at the University of Chicago. Credit: Courtesy of Kathryn Lindsey
In this, DeMarco and Lindsey’s work is heavily influenced by the mid-20th-century mathematician Aleksandr Aleksandrov. Aleksandrov established that there is exactly one way of folding a given polygon to get a 3-D object. He lamented that it seemed impossible to mathematically calculate the correct folding lines. Today, the best strategy is often to make a best guess about where to fold the polygon—and then to get out scissors and tape to see if the estimate is right.
“Kathryn and I spent hours cutting out examples and gluing them ourselves,” DeMarco said.
DeMarco and Lindsey are currently trying to describe the folding lines on their particular class of 3-D objects, and they think they have a promising strategy. “Our working conjecture is that the folding lines, the bending laminations, can be completely described in terms of certain dynamical properties,” DeMarco said. Put another way, they hope that by iterating the underlying polynomial in the right way, they’ll be able to identify the set of points along which the folding line occurs.
From there, possibilities for exploration are numerous. If you know the folding lines associated to the polynomial f(x) = x² – 1, you might then ask what happens to the folding lines if you change the coefficients and consider f(x) = x² – 1.1. Do the folding lines of the two polynomials differ a little, a lot or not at all?
“Certain polynomials might have similar bending laminations, and that would tell us all these polynomials have something in common, even if on the surface they don’t look like they have anything in common,” Lindsey said.
It’s a bit early to think about all of this, however. DeMarco and Lindsey have found a systematic way to think about polynomials in 3-D terms, but whether that perspective will answer important questions about those polynomials is unclear.
“I would even characterize it as being sort of playful at this stage,” McMullen said, adding, “In a way that’s how some of the best mathematical research proceeds—you don’t know what something is going to be good for, but it seems to be a feature of the mathematical landscape.”
Source: How to Build Beautiful 3-D Fractals Out of the Simplest Equations
0 notes
kristinsimmons · 7 years ago
Text
Artificial Intelligence and Deep Learning For the Extremely Confused
By STEPHEN BORSTELMANN, MD
Artificial Intelligence is at peak buzzword: it elicits either the euphoria of a technological paradise with anthropomorphic robots to tidy up after us, or fears of hostile machines breaking the human spirit in a world without hope. Both are fiction.
The Artificial Intelligences of our reality are those of Machine Learning and Deep Learning. Let’s make it simple: both are AI – but not the AI of fiction. Instead, these are limited intelligences capable of only the task they are created for: “weak” or “narrow” AI. Machine Learning is essentially applied Statistics, excellently explained in Hastie and Tibshirani’s Introduction to Statistical Learning. Machine Learning is a more mature field, with more practitioners, and a deeper body of evidence and experience.
Deep Learning is a different animal – a hybrid of Computer Science and Statistics, using networks defined in computer code. Deep Learning isn't entirely new – Yann LeCun's 1998 LeNet network was used to optically recognize around 10% of US checks. But the compute power necessary for harder image-recognition tasks wouldn't arrive for another decade. Sensationalism from overly optimistic press releases co-exists with establishment inertia and claims of “black box” opacity. For the non-practitioner, it is very difficult to know what to believe, and confusion is the rule.
A game of chance
Inspiration is found in an unlikely place – the carnival sideshow, where one can find Plinko: a game of chance. In Plinko, balls or discs travel through a field of metal pins and land in slots at the bottom. With evenly placed pins and a center start, the probability of landing in the center slots is highest, and in the side slots lowest. The University of Colorado's PhET project has a fantastic simulation of Plinko you can run yourself. If you played the game 10,000 times, counting how many balls land in each slot, the accumulated pattern would look like this:
It should look familiar – it's a textbook bell curve, the Gaussian Normal distribution that terrorized us in high school math. It's usually good enough to solve many basic Machine Learning problems – as long as the balls are all the same. But what if the balls are different – green, blue, red? How can we get the red balls to go into the red slot? That's a classification problem. We can't rely on the Normal distribution alone to sort balls by color.
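You don't need a carnival to check this – a few lines of Python simulate the board (the 12 pin rows and 10,000 drops are chosen arbitrarily):

```python
# A toy sketch of the Plinko board described above: each ball bounces left
# or right at every pin, and the final slot is the sum of those random
# steps -- a binomial count that approximates a bell curve.
import random
from collections import Counter

def drop_ball(rows: int = 12) -> int:
    """Return the slot index after bouncing left (0) or right (1) at each pin."""
    return sum(random.randint(0, 1) for _ in range(rows))

counts = Counter(drop_ball() for _ in range(10_000))
for slot in sorted(counts):
    print(f"slot {slot:2d} | {'#' * (counts[slot] // 50)}")
```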
So, let's make our Plinko game board computerized, with a camera and the ability to bump the board slightly left or right to guide the ball more towards the correct color slot. There is still an element of randomness, but as the ball descends through the array of pins, the repeated bumps nudge it into the correct slot.
The colored balls are our data, and the Plinko board is our AI.
One Matrix to rule them all
For those still fearing being ruled by an all-powerful artificial intelligence, meet your master:
Are you terrified beyond comprehension yet?
Math can be scary – when you're in middle school. Matrix math, or Linear Algebra, is a tool for solving many similar problems simultaneously and quickly. Without getting too technical, matrices can represent many different similar equations, like the ones we would find in the layers of an AI model. It's behind the AIs that use Deep Learning, and partly responsible for the “magic”.
This ‘magic’ happened because of serendipitous parallel advances in Computer Science and Statistics, along with advances in processor speed, memory, and storage. Reduced Instruction Set Chips (RISCs) allowed Graphics Processing Units (GPUs) capable of performing fast parallel operations on graphics, like scaling, rotations, and reflections. These are affine transformations. It turns out that you can define a shape as a matrix, apply matrix multiplication to it, and end up with an affine transformation – precisely the calculations used in Deep Learning.
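For the curious, here is a minimal NumPy sketch of that idea (the square and the 45-degree angle are arbitrary): a shape stored as a matrix of points, transformed by a single matrix multiplication:

```python
# A shape as a matrix of points, and an affine transformation (a rotation)
# applied by one matrix multiply -- the kind of operation GPUs accelerate.
import numpy as np

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)  # 4 corners

theta = np.pi / 4                       # rotate 45 degrees
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

rotated = square @ rotation.T           # one multiply moves every point at once
print(rotated.round(3))
```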
The watershed moment in Deep Learning is typically cited as 2012's AlexNet, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton – a state-of-the-art, GPU-accelerated Deep Learning network that won that year's ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a large margin. Thereafter, other GPU-accelerated Deep Learning algorithms consistently outperformed all others.
Remember our colored ball-sorter? Turn the board on its side, and it looks suspiciously like a deep neural network, with each pin representing a point, or node, in the network. A deep neural network may also be called a Multi-Layer Perceptron (MLP) or an Artificial Neural Network (ANN). Both consist of a layer of software “neurons” followed by 0–4 layers of “hidden” neurons, which feed into a final neuron. The output neuron typically gives a probability, from 0 to 1.0 – or 0% to 100%, if you prefer.
The “hidden” layers are hidden because their output is not displayed: feed the ANN an input, and the output probability pops out. This is why ANNs are called “black boxes” – you don't routinely evaluate the inner states, leading many to incorrectly deem them “incomprehensible” and “opaque”. There are ways to view the inner layers (though they may not be as enlightening as hoped).
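To make the point concrete, here is a toy NumPy network (all sizes and random weights assumed) whose “hidden” layer we can print whenever we like – nothing in the box is beyond inspection:

```python
# A toy MLP: the hidden layer is hidden only by convention.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # 4 inputs -> 3 hidden neurons
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # 3 hidden -> 1 output neuron

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0, 0.1])             # one 4-feature input
hidden = sigmoid(x @ W1 + b1)                   # the inner state, fully visible
output = sigmoid(hidden @ W2 + b2)[0]           # a probability between 0 and 1

print("hidden layer:", hidden.round(3))
print("output probability:", round(float(output), 3))
```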
Everything Old is New Again
The biggest problem was getting the network to work. A one-layer MLP was created in the 1940s. You could only travel forward through the network (feed-forward), updating the value of each and every neuron individually via a brute-force method. It was so computationally expensive with 1940s–1960s technology that it was unrealistic for larger models. And that was the end of that – for a few decades. But smart mathematicians kept working, and had a realization.
If we know the inputs and the outputs of a neural net, we can do some maneuvering. A network can be modeled as a number of matrix operations representing a series of equations (Y = mX + b, anyone?). Because we know both inputs and outputs, the network is differentiable; i.e., the slope (m), or first derivative, is solvable. That first derivative is named the gradient. Applying calculus's chain rule enables the gradient of the network to be calculated in a backward pass. This is Backpropagation. Hold that thought – and my beer – for a moment.
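Before reclaiming that beer, here is the chain rule doing exactly this on a one-neuron “network” (every number below is invented for illustration):

```python
# One neuron, one weight: backpropagation is just the chain rule.
x, target = 2.0, 10.0
w, b = 1.0, 0.0

y = w * x + b                 # forward pass: y = 2.0
loss = (y - target) ** 2      # squared-error loss = 64.0

dloss_dy = 2 * (y - target)   # chain rule, step 1: dL/dy = -16.0
dy_dw = x                     # chain rule, step 2: dy/dw = 2.0
dloss_dw = dloss_dy * dy_dw   # dL/dw = -32.0: the backpropagated gradient

w -= 0.01 * dloss_dw          # one gradient-descent step nudges w upward
print(w)                      # 1.32 -- closer to the w = 5 that fits the data
```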
By the way, while Backpropagation was worked out in the 1960s, it was not widely applied to AI until the mid-1980s. The '50s–'80s are often referred to as the first AI ‘winter’.
Go back to Plinko, but turn it upside down. This time, we won't need to nudge it. Instead, let's color the balls with a special paint – it's wet, so it comes off on any pin it touches, and it's magnetic, attracting only balls of the same color. Feeding the colored balls in from their respective slots, they'll run down by gravity, colored paint rubbing off on the pins they touch. The balls then exit from the apex of the triangle. It would look suspiciously like Figure 5, rotated 90 degrees clockwise.
After running many wet balls through and looking at our board, the pins closest to the green slot are the greenest, the pins closest to the red slot the reddest, and the same for blue. Mid-level pins in front of red and blue become purple, and mid-level pins in front of blue and green become cyan. At the apex, the pins are a muddy color from mixing the green, red, and blue paint. The amount of each color of paint deposited on a pin depends on how many balls of that color hit that individual pin on their random path out. Therefore, each pin carries a certain amount of red, green, and/or blue paint. We have actually just trained our Plinko board to sort colored balls!
Turn the model right-side up and feed it a green-painted ball from the apex of the pyramid. Let's make the special magnetic paint dry this time.
The ball bounces around, but it is generally attracted to the pins with more green paint. As it passes down the layers of pins, it orients first towards the cyan pins, then towards the cyan pins with the most green shading, then the purely green pins, before falling into the green slot. We can repeat the experiment with blue or red balls, and they will sort similarly.
The pins are the nodes, or neurons in our Deep Learning network, and the amount of paint of each color is the weight of that particular node.
Sharpen your knives, for here comes the meat.
Let's look at an ANN, like the one in Figure 5. Each neuron, or node, in the network will have a numerical value, a weight, assigned to it. When our neural network is fully optimized, or trained, these weights will allow us to correctly sort, or classify, the inputs. There is also a constant, the bias, that contributes to every layer.
Remember the algebraic Y = mX + b equation? Here is its deep-learning equivalent: Y = W·X + B.
In this (overly simplified) neural network equation, W represents the weights and B the bias for a given input X; Y is the output. As both the weights W and the input X are matrices, they are combined by a special operator called a dot product. Without getting too technical, the dot product multiplies corresponding entries and sums them, producing an output that is largest where the input and the weights are most similar.
In Figure 5 above, the bias is the circle on top of each layer with a “1” inside. That value of 1 avoids multiplying by zero, which would zero out our algorithm. The bias is actually the output of the neural network when the input is zero. Why is it important? Like the weights, the bias is adjusted when we solve for the network's gradients during Backpropagation, and those gradients will allow us to optimize the weights of our network by a process known as gradient descent.
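A tiny illustration of that last point (toy numbers assumed, not from the article): feed the network a zero input, and what comes out is exactly the bias.

```python
# With a zero input, the output of Y = W.X + B is exactly the bias B.
import numpy as np

W = np.array([[0.2, -0.5, 1.0]])       # one neuron's weights
B = np.array([0.7])                    # its bias

for X in (np.array([1.0, 2.0, 3.0]), np.zeros(3)):
    Y = W @ X + B                      # dot product plus bias
    print(f"X = {X} -> Y = {Y}")
```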
On a forward pass through the network, everything depends on the loss function. A loss function is simply a mathematical distance between two data points, like X₂ – X₁. Borrowing the old adage “birds of a feather flock together,” data points with small distances between them will tend to belong to the same group, or class, while data points separated by a distance more akin to that between Kansas and Phuket will have no relationship. In practice it is more typical to use a loss function such as root mean squared error, but many exist.
First, let's randomize all the weights of our neural network before starting, avoiding zeroes and ones, which can cause our gradients to prematurely become too small (vanishing gradients) or too large (exploding gradients).
To train our network, a known (labeled) input runs forward through the network. On this randomly initialized network, we know this first output (Y%) will be garbage – but that’s OK! Knowing what this input’s label is – its ground truth – we will now calculate the loss. The loss is the difference between 100% and the output Y, i.e. (100%-Y%).
We want to minimize that loss – to get it as close to zero as possible. Zero loss would indicate that our neural network is classifying its inputs perfectly, outputting a probability of 100% (zero uncertainty) for a known item. To do so, we are going to adjust the weights in the network – but how? Recall Backpropagation. By calculating the gradients of the network, we can adjust the weights in small steps in the direction opposite the gradient, i.e., downhill toward lower loss. This is stochastic gradient descent, and the size of the small step is the learning rate. Each step should decrease the loss and yield a more accurate prediction on the next run-through, or iteration, of the network on the same data. Each input is an opportunity to adjust, or learn, better weights. And typically you will iterate over each of these inputs 10, 20, 100 or more times – epochs – each time driving the loss down and adjusting the weights of your network to classify the training data more accurately.
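Pulling those pieces together, here is a compact sketch of the whole loop – forward pass, loss, gradients via the chain rule, a small step against the gradient, repeated over epochs. Everything here (the toy data, the mean-squared-error loss, the learning rate) is an assumption for illustration, not anything from a particular framework:

```python
# A tiny logistic classifier trained by stochastic gradient descent.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
labels = (X[:, 0] + X[:, 1] > 0).astype(float)   # ground-truth labels

w, b = rng.normal(size=2) * 0.1, 0.0             # small random initial weights
lr = 0.1                                         # the learning rate

for epoch in range(20):                          # each full pass is one epoch
    z = X @ w + b
    probs = 1.0 / (1.0 + np.exp(-z))             # forward pass: outputs in (0, 1)
    loss = np.mean((probs - labels) ** 2)        # mean squared-error loss

    grad = probs * (1 - probs) * 2 * (probs - labels) / len(X)  # chain rule
    w -= lr * (X.T @ grad)                       # step against the gradient
    b -= lr * grad.sum()

    if epoch % 5 == 0:
        print(f"epoch {epoch:2d}  loss {loss:.4f}")
```

Run it and the printed loss should fall epoch over epoch – the network “learns” the boundary between the two classes.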
Alas, perfection has its drawbacks, and there are many nuances here. The most important is to avoid overfitting the network too closely to the training data – a common cause of real-world application failure. To avoid this, datasets are usually separated into training, validation, and test datasets: the training dataset teaches your model, the validation dataset helps prevent overfitting, and the test dataset is used only once, for the final measurement of accuracy at the end.
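A minimal sketch of that three-way split (the 70/15/15 proportions below are a common choice, not a rule):

```python
# Shuffle once, then carve the indices into three disjoint sets.
import numpy as np

rng = np.random.default_rng(0)
indices = rng.permutation(1000)          # shuffle 1,000 example indices

train_idx = indices[:700]                # 70% to teach the model
val_idx = indices[700:850]               # 15% to watch for overfitting
test_idx = indices[850:]                 # 15% used once, for the final score

print(len(train_idx), len(val_idx), len(test_idx))
```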
One of the more interesting features of deep learning is that deep learning algorithms, when designed in a layered, hierarchical manner, exhibit essentially self-organizing behavior. In a 2013 study on images, Zeiler and Fergus (1) showed that lower levels in the algorithm focused on lines, corners, and colors; the middle levels focused on circles, ovals, and rectangles; and the higher levels synthesized complex abstractions – a wheel on a car, the eyes of a dog.
This was so exciting because earlier visual evoked potential studies on the primary visual cortex of the cat (2) had shown activations by simple shapes uncannily similar in appearance to the first level of the algorithm, suggesting this organizing principle is present in both nature and AI.
Evolution is thus contingent on… variation and selection (attr. Ernst Mayr)
ANNs/MLPs aren't that useful in practice, as they don't handle variation well – i.e., your test samples must match the training data exactly. However, by changing the hidden layers, things get interesting. An operation called a convolution can be applied to the data in an ANN: the input data is arranged into a matrix and then traversed stepwise by a smaller window, which performs a dot product on the underlying data.
For example, take an icon 32 pixels by 32 pixels, with 3 color channels (R-G-B). We take that data, arrange it into a 32×32×3 matrix, and then convolve over the matrix with a 3×3 window. Stepping two pixels at a time with six different filters, this transforms our 32×32 matrix into a 16×16 matrix, 6 deep. Each filter is an area of pattern recognition; in training, these layers self-organize to activate on similar patterns found within the training images.
Multiple convolutions are generally performed, each time halving the size of the matrix while increasing its depth. An operation called a max pool is frequently performed after a series of convolutions; by downsampling – keeping only the strongest activation in each small window – it forces the model to associate these narrow, windowed representations with the larger data set (an image, in this case).
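Here is a naive sketch of a single filter sliding over one channel of that 32×32 icon, followed by a 2×2 max pool. The filter values are random, purely for illustration, and the window steps one pixel at a time here, so the output is 30×30 rather than the strided 16×16 above:

```python
# One convolution window sliding over an image: each step is a dot product.
import numpy as np

rng = np.random.default_rng(1)
image = rng.normal(size=(32, 32))        # one channel of the 32x32 icon
kernel = rng.normal(size=(3, 3))         # one 3x3 filter

out = np.empty((30, 30))                 # 32 - 3 + 1 = 30 valid positions
for i in range(30):
    for j in range(30):
        patch = image[i:i + 3, j:j + 3]
        out[i, j] = np.sum(patch * kernel)   # the windowed dot product

pooled = out.reshape(15, 2, 15, 2).max(axis=(1, 3))  # 2x2 max pool
print(out.shape, pooled.shape)           # (30, 30) -> (15, 15)
```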
This Deep Learning network composed of convolutional layers is the Convolutional Neural Network (CNN). CNNs are particularly well suited to image classification, but can also be used in voice recognition or regression tasks, learning both variation and selectivity, with some limitations. Recently published research has claimed human-level performance in medical image identification (4). CNNs are powerful, with convolutional layers assembling simple blocks of data into more complex and abstract representations as the number of layers increases. These complex and abstract representations can then be identified anywhere in the image.
One drawback of CNNs is that increasing model power requires increased model depth. This increases the number of parameters in the model, lengthening training time and predisposing to the vanishing gradient problem, where gradients disappear and the model stalls in stochastic gradient descent, failing to converge. The introduction of Residual Networks (ResNets) in 2015 solved some of the problems of increasing network depth, as residual connections (seen above in a DenseNet (3)) allow backpropagation to take a gradient from the last layer and follow it all the way through to the first layer. It is also important to note that CNNs are agnostic to position, but not to orientation; Capsule Networks were recently proposed to address this limitation.
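As a sketch of the idea (channel counts assumed, and much simplified relative to a real ResNet), a residual block in PyTorch is just a couple of convolutions whose input is added back to their output:

```python
# A minimal residual block: the skip connection gives backpropagation
# a short path from the last layer toward the first.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)      # add the input back: the skip connection

block = ResidualBlock()
x = torch.randn(1, 16, 32, 32)
print(block(x).shape)                  # torch.Size([1, 16, 32, 32])
```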
The convolutional network is one of the easier Deep Learning algorithms to peer inside. Figure 7 does exactly that, using a deconvolutional network to show what selected levels of the algorithm are “seeing.” While these patterns are interesting, they may not be easily evident, depending upon the learning set. To that end, Grad-CAM models based on the last convolutional layer before the output have been designed, producing a heatmap that explains why the CNN chose the classification it did. This was a test on ImageNet data for the “lion” classifier:
There are quite a number of Convolutional Neural Networks available for experimentation. ILSVRC winners like AlexNet, VGG-16, ResNet-152, GoogLeNet, Inception, DenseNets, and U-Nets are the most commonly used, with newer networks like NASNet and SENet approaching the state of the art (SOTA). While a discussion of the programming languages and hardware requirements for running neural networks is beyond the scope of this work, guides to building a deep learning computer are available on the net, and many investigators use the Python programming language with PyTorch or TensorFlow and its slightly easier-to-use cousin, Keras.
Sequenced or temporal data needs a different algorithm – an LSTM (Long Short-Term Memory), one of the Recurrent Neural Networks (RNNs). RNNs feed their computed output back into themselves. The LSTM module feeds information into itself in two ways – a short-term input, predicated only on the prior iteration, and a long-term input, re-using older computations. This algorithm is particularly well suited to tasks such as text analysis, Natural Language Processing (NLP), and image captioning. There is a great deal of unstructured textual data in medicine – RNNs performing NLP will probably be part of that solution. The main problem with RNNs is their recurrent, iterative nature: training can be lengthy – up to 100× as long as a CNN. Google's language translation engine reportedly uses an LSTM seven layers deep, the training of which must have been immense in time and data resources. RNNs are generally considered an advanced topic in deep learning.
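As a sketch (all sizes assumed), PyTorch's stock LSTM module shows the shape of the idea without any of the training pain:

```python
# An LSTM consuming one sequence of 20 time steps; the hidden and cell
# states are the short- and long-term memory the text describes.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)
sequence = torch.randn(1, 20, 8)         # one sequence, 20 steps, 8 features
output, (hidden, cell) = lstm(sequence)  # states carry memory across steps
print(output.shape)                      # torch.Size([1, 20, 16])
```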
Another advanced topic is the Generative Adversarial Network (GAN): two neural networks in parallel, one of which generates simulated data while the other evaluates, or discriminates, that data in a competitive, adversarial fashion. The generator generates data to fool the discriminator; as the discriminator is fed more data by the generator, it becomes better at discriminating. Each spurs the other to higher achievement until the discriminator can no longer tell that the generator's simulations are fake. GANs' uses in healthcare appear to be mostly in simulating data, though pharmaceutical design and drug discovery have been proposed as tasks for GANs. GANs are also used in style-transfer algorithms for computer art, as well as for creating fake celebrity photos and videos.
Deep reinforcement learning (RL) deserves brief mention – it is an area of intense investigation and appears useful in temporal prediction. However, few healthcare applications have been attempted with RL; in general, it is difficult to work with and still mostly experimental.
Finally, not every problem in medicine needs a deep learning classifier applied to it. For many applications, simple rules and linear models work reasonably well. Traditional supervised machine learning (i.e., applied statistics) is still a reasonable choice for rapid development of models – namely techniques such as dimension reduction, principal component analysis (PCA), Random Forests (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost). These analyses are often done not with the previously mentioned software but with a freely available language called R. Consider the tradeoff between the large amounts of sample data, compute resources, and parameter tuning a deep learning network requires and a simpler method that can work very well with limited data. Ensembles combining multiple deep learning algorithms with machine learning methods can be very powerful.
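The author points to R, but the same simpler-method route takes only a few lines of Python with scikit-learn (used here as an assumed stand-in; the dataset and hyperparameters are arbitrary):

```python
# A random forest on a small tabular dataset: no GPU, minutes not days.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```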
My Brain is the key that sets me free. – Attr. Harry Houdini
Magic is what deep learning has been compared to, with its feats of accurate image and facial recognition, voice transcription, and language translation. This is inevitably followed by the fictive “there's no way of understanding what the black box is thinking.” While the calculations required to understand deep learning are repetitive and massive, they are not beyond human comprehension, nor inhumanly opaque. If these entities have now been demystified for you, I have done my job well. Deep Learning remains an active area of research for me, and I learn new things every day as the field advances rapidly.
Is deep learning magic? No. I prefer to think of it as alchemy – turning data we once considered dross into modern day gold.
1. Zeiler MD, Fergus R. Visualizing and Understanding Convolutional Networks. ECCV 2014, Part I, LNCS 8689, pp. 818–833, 2014.
2. Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat's striate cortex. J Physiol. 1959 Oct;148(3):574–591.
3. Huang G, Liu Z, van der Maaten L, et al. Densely Connected Convolutional Networks. arXiv:1608.06993.
4. Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv:1711.05225 [cs.CV].
0 notes