preciousinformation - Tumblr blog

preciousinformation · 4 years ago

Mission: humor

Wow! YES! I found a new mission.

Put more humor into academia and make it generally accepted to have fun at work.

WHY SO F*in serious!!!

preciousinformation · 4 years ago

Markov Chain Monte Carlo method

Smart thing. In the context of phylogenetic tree construction, it is used to generate a forest of sample trees with the property that the probability of finding a given tree in the forest is proportional to its likelihood × its prior probability. Handy in Bayesian clustering methods!

It begins by calculating the likelihood L1 of a starting tree. It then moves to a nearby tree in tree space that differs slightly, for example in branch lengths, in rate parameters (such as the substitution rate), or in topology. The likelihood of this new tree, L2, is then calculated.

L1 and L2 will be slightly different. The new tree is accepted or rejected according to the Metropolis algorithm:

If L2 ≥ L1 --> the new tree is accepted.

If L2 < L1 --> the new tree is accepted with probability L2 / L1.

If the new tree is accepted, it becomes the current tree; otherwise the chain stays at the old tree, and the next move starts from there.
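The acceptance rule above can be sketched in a few lines of Python. This is a toy illustration, not a phylogenetics implementation: the "trees" are five hypothetical states arranged in a ring with made-up likelihoods, each move proposes one of the two neighbouring states with equal probability (a symmetric proposal), and all trees are assumed to have equal prior probability, so the target is proportional to the likelihood alone. `metropolis_accept` and `sample_trees` are illustrative names. Real programs work with log-likelihoods, since tree likelihoods are far too small to store directly.

```python
from __future__ import annotations

import random


def metropolis_accept(l_current: float, l_proposed: float, rng: random.Random) -> bool:
    """Metropolis rule: accept uphill moves always, downhill moves with probability L2/L1."""
    if l_proposed >= l_current:
        return True
    return rng.random() < l_proposed / l_current


def sample_trees(likelihoods: list[float], n_steps: int, seed: int = 1) -> list[float]:
    """Run a toy Metropolis chain and return how often each 'tree' was visited.

    The trees are hypothetical states 0..k-1 on a ring; each move proposes one of
    the two neighbours with probability 1/2 (symmetric proposal, flat prior).
    """
    rng = random.Random(seed)
    k = len(likelihoods)
    current = 0
    counts = [0] * k
    for _ in range(n_steps):
        proposed = (current + rng.choice((-1, 1))) % k
        if metropolis_accept(likelihoods[current], likelihoods[proposed], rng):
            current = proposed
        counts[current] += 1  # a rejected move counts the current tree once more
    return [c / n_steps for c in counts]


freqs = sample_trees([1.0, 4.0, 2.0, 8.0, 5.0], n_steps=200_000)
print([round(f, 3) for f in freqs])
```

With these made-up likelihoods, the visit frequencies should come out close to the normalized likelihoods 0.05, 0.2, 0.1, 0.4 and 0.25, even though the chain only ever compares two neighbouring trees at a time.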

This means that both uphill and downhill moves (in terms of how likely the trees are) are allowed, which may prevent us from getting stuck on a small local peak with a bad view. Sometimes we've got to go the distance to reach gold.

The intricate beauty of this algorithm is that downhill moves are accepted with exactly the “correct” probability, so that at equilibrium the probability of observing each tree is proportional to its likelihood. This is ensured by a property known as detailed balance. It originates from statistical physics, where the equilibrium distribution is the thermodynamic equilibrium at a fixed temperature and the probabilities of the states are functions of their energies.
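Detailed balance can be checked directly for the rule above. A worked sketch, assuming a flat prior (so the equilibrium probability of a tree is π(T) = c·L(T) for some constant c) and a symmetric proposal that picks T2 from T1 with the same probability q as T1 from T2. Taking L2 < L1:

```latex
\begin{aligned}
\pi(T_1)\,P(T_1 \to T_2) &= c\,L_1 \cdot q \cdot \frac{L_2}{L_1} = c\,q\,L_2, \\
\pi(T_2)\,P(T_2 \to T_1) &= c\,L_2 \cdot q \cdot 1 = c\,q\,L_2.
\end{aligned}
```

The probability flow from T1 to T2 equals the flow back, so once the chain has reached the distribution proportional to the likelihoods, it stays there. The case L2 > L1 is the same calculation with the roles of the two trees swapped.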

Anyway: the art of developing a good MCMC method lies in the choice of move set, and a balance must be found. If changes are very small, the likelihood ratio of the two states will be close to 1 and the move has a good chance of being accepted, but a very large number of moves will be required to alter the tree significantly and obtain a diverse forest of trees.

On the other hand, if changes are very large, the likelihood ratio of the states will be far from 1, and the probability of accepting a downhill move will be very small.
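The trade-off can be illustrated with a toy run. This sketch assumes a single hypothetical continuous parameter standing in for one branch length, with a made-up log-likelihood peaked at 0.3, and proposes Gaussian steps of a chosen size; the parameter is left unconstrained for simplicity. `toy_log_likelihood` and `run_chain` are illustrative names, not part of any phylogenetics package. For each step size it reports the acceptance rate and the average distance moved per proposal.

```python
from __future__ import annotations

import math
import random


def toy_log_likelihood(b: float) -> float:
    """Hypothetical log-likelihood for one branch length, peaked at b = 0.3 (width 0.05)."""
    return -((b - 0.3) ** 2) / (2 * 0.05 ** 2)


def run_chain(step: float, n_steps: int = 20_000, seed: int = 7) -> tuple[float, float]:
    """Metropolis chain with Gaussian proposals of size `step`.

    Returns (acceptance rate, mean distance moved per proposal).
    """
    rng = random.Random(seed)
    b = 0.3
    log_l = toy_log_likelihood(b)
    accepted = 0
    distance = 0.0
    for _ in range(n_steps):
        proposal = b + rng.gauss(0.0, step)
        log_l_new = toy_log_likelihood(proposal)
        # Metropolis rule on the log scale: accept with probability min(1, L2/L1).
        if log_l_new >= log_l or rng.random() < math.exp(log_l_new - log_l):
            distance += abs(proposal - b)
            b, log_l = proposal, log_l_new
            accepted += 1
    return accepted / n_steps, distance / n_steps


for step in (0.005, 0.05, 1.0):
    rate, moved = run_chain(step)
    print(f"step={step}: acceptance={rate:.2f}, mean move per proposal={moved:.4f}")
```

Tiny steps are almost always accepted but barely move the parameter; huge steps are mostly rejected; in this toy setting the intermediate step size covers the most ground per proposal.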

As always, everything comes down to balance <3

preciousinformation · 4 years ago

Photo

preciousinformation · 5 years ago

Quote

To pick the right neural network architecture is more an art than a science

preciousinformation · 5 years ago

Drivers of evolution

Genomes vary vastly in size and number of genes. However, the complexity of an organism correlates little with its genome size, and only slightly with its number of genes. So what makes a species complex? One could argue that species that have constantly had to adapt to new environments may have evolved more distinct genes, whereas a species that has had a long evolutionary “break” from potential dangers or food shortage has likely evolved more slowly, since its evolutionary drivers were weaker. But what are the main genomic drivers of evolution, and how do they work?

We have all heard of good old mutations, which arise through random processes and once in a while produce a trait that is more advantageous than the other traits in the population. If this trait is allowed to shine, is passed on, and becomes common in the species, one could say the species has evolved. Mutations exist in many forms, from a single-base change that causes no trouble at all to a lethal single-base mutation, for example a deletion or insertion that shifts the reading frame. Larger structural mutations are also very possible, with several mechanisms more or less controlling this randomness.

However, there are many more mechanisms at play. Plants such as wheat and strawberries are, in contrast to diploid humans, polyploids: they have multiple copies of their whole genome in every cell, whereas humans have two. Plant genomes range from 100 million bp up to 100 billion bp and, not in proportion to this range, contain between 30,000 and 100,000 genes. For comparison, humans have approximately 23,000 genes in a genome of 3 billion base pairs. An average human gene spans 27,000 base pairs, but gene length varies from as little as 1,000 bp to 2.4 million bp. This illustrates the huge variation between species. Simply put, there is little consistency in genome size and gene structure beyond the chemical structure of DNA itself, which is the only predictable source of information when investigating unknown genomes and genes.
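To see why a single-base insertion or deletion can be so damaging, here is a small Python sketch using the standard genetic code and a short made-up coding sequence (not a real gene). Deleting one base shifts every downstream codon, whereas deleting a whole codon removes just one amino acid. `translate` is an illustrative helper, not a library function.

```python
# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; leftover bases that do not fill a codon are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "ATGGCCATTGTAATGGGCCGCTGA"   # hypothetical coding sequence
frameshift = original[:3] + original[4:]  # delete a single base (the G at index 3)
in_frame = original[:3] + original[6:]    # delete one whole codon (GCC)

print(translate(original))    # MAIVMGR*
print(translate(frameshift))  # MPL*WAA
print(translate(in_frame))    # MIVMGR*
```

The one-base deletion scrambles everything after the mutation and introduces a premature stop codon, which would truncate the protein; the three-base deletion leaves the rest of the protein intact.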

But back to the point: there are several causes of polyploidy, and four or six copies of each chromosome are common (three and five copies have been found to be unstable). Polyploidy arises from a small mitotic or meiotic catastrophe such as nondisjunction, where chromosomes fail to separate properly, so that an abnormal number of chromosome copies ends up in the newly forming cells. The resulting gametes carry complete duplicate sets of chromosomes, unlike those of the parents. Polyploids can arise within a species (autopolyploidy) or as a result of hybridization between species (allopolyploidy). Autopolyploids are essentially homozygous at every locus in the genome, whereas allopolyploids are heterozygous to varying degrees, depending on the divergence of the parental genomes. There are both advantages and disadvantages to this phenomenon, but that's another chapter.

Although genome duplications are common in flowering plants, other kinds of organisms are also prone to them, but there the duplications are often unstable or lethal. I read somewhere that as many as 10% of spontaneous abortions (miscarriages) in humans are a result of this phenomenon. Nevertheless, ancient whole-genome duplications, known as paleopolyploidization, have been reported in most evolutionary lineages. Tentatively, several sublineages of the animal kingdom have undergone polyploidization events once or twice: fish are known to have undergone at least one, and the well-studied salmon two in total, the more recent one around 80 million years ago. These events have been important in the development of a range of species.

Instead of the whole genome duplicating, single genes or sequences can also be duplicated, creating an extra copy of their function (if any). A mutation in the extra copy is less dangerous than one in the original, since the original still carries out the function; the extra copy is therefore freer to accept alterations that may lead to new functions.

Another contributor to genetic variation is transposable elements (TEs). These are sequences that are able to jump around in the genome, which can disrupt other sequences or the element itself. The element either cuts or copies itself out of its location and pastes itself in somewhere else. Some elements are instead transcribed into RNA, which is then copied back into the genome by the enzyme reverse transcriptase. These are called retrotransposons and exist in two forms: LTR (long terminal repeat) retrotransposons, which have long, identical DNA sequences repeated at both ends, and non-LTR retrotransposons, which lack them. Most non-LTR retrotransposons can be further categorized into SINEs and LINEs, short and long interspersed nuclear elements. Larger genomes and polyploid genomes tend to have more TEs than smaller genomes. More than 40% of the human genome consists of transposable elements, which is indeed a decent amount. They have been linked to disease on several occasions, which is not strange, since an insertion can completely ruin a vital function. However, these elements are also drivers of variation and are thought to be important contributors to evolution.
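The difference between cut-and-paste and copy-and-paste transposition, and why copy-and-paste elements can inflate a genome, can be sketched with a deliberately crude model. All numbers are hypothetical assumptions: a 100 kb host genome, a 6 kb element, and insertion sites drawn uniformly at random. The model ignores silencing, deletion and selection against harmful insertions, all of which matter in real genomes. `simulate_transposition` is an illustrative name.

```python
from __future__ import annotations

import random


def simulate_transposition(mode: str, n_events: int, host_bp: int = 100_000,
                           te_bp: int = 6_000, seed: int = 0) -> tuple[list[int], float]:
    """Toy model of transposable-element spread (all sizes are hypothetical).

    mode "cut":  DNA transposon; an existing copy is excised and reinserted
                 elsewhere, so the copy number stays the same.
    mode "copy": retrotransposon; an RNA copy is reverse-transcribed and
                 inserted elsewhere while the source copy stays put.
    Returns the sorted insertion sites and the fraction of the genome that is TE.
    """
    if mode not in ("cut", "copy"):
        raise ValueError("mode must be 'cut' or 'copy'")
    rng = random.Random(seed)
    sites = [rng.randrange(host_bp)]  # start with one element at a random host position
    for _ in range(n_events):
        new_site = rng.randrange(host_bp)
        if mode == "cut":
            sites[rng.randrange(len(sites))] = new_site  # move an existing copy
        else:
            sites.append(new_site)  # add a new copy; the original remains
    te_total = len(sites) * te_bp
    return sorted(sites), te_total / (host_bp + te_total)


for mode in ("cut", "copy"):
    sites, fraction = simulate_transposition(mode, n_events=50)
    print(f"{mode}: {len(sites)} copies, {fraction:.0%} of the genome is TE sequence")
```

In the cut-and-paste run the element only changes position, while every copy-and-paste event adds sequence, so the genome and its TE fraction grow with each event.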

preciousinformation · 5 years ago

Cancer, DEGs and pseudo-findings

A strange final big assignment in this enormous landscape of functional genomics, crammed into a deep dive into intricate methods for analysing various kinds of “functional genomics data”, often mRNA counts from RNA-seq, which give a good picture of the cell's actual state. We analysed differentially expressed genes in different types of cancer tissue, specifically choosing the anatomically neighbouring tissues pancreas, stomach, bladder and rectum. It was far too easy to build machine learning algorithms that could predict which tissue the specific gene expression values came from, but in my eyes R is and remains a poor candidate for this purpose; it is C-L-U-N-K-Y.

But who needs to predict which tissue a sample they took came from? Nobody! Still, it shows that there are marked differences in gene expression between the tissues, which is information in itself. We tried to interpret the values the ML algorithms produced, but found that there were several pitfalls here too. In the end we found a kind of feature selection method, Boruta, which seemed to give us more robust results. The significantly up- and downregulated genes we found to be important for separating and classifying the tissues were, naturally, more tissue-specific than genes with more general biological functions. They may have some significance for that particular type of cancer in that particular tissue, but again: this amounts to drawing conclusions about bananas based on potatoes. I look forward to studying the bananas further.