#AlphaFold2
okusana-org · 8 months ago
Text
Nobel Prize in Chemistry Awarded to the Scientists Who Unlocked the Secrets of Proteins!
Breakthrough Discoveries in Protein Research. David Baker, Demis Hassabis, and John M. Jumper have won the 2024 Nobel Prize in Chemistry for their work on protein design and prediction. Half of the 11 million Swedish kronor prize (about 1 million US dollars) goes to Baker; the other half will be shared between Hassabis and Jumper. The Fundamental Role of Proteins in Life: Proteins are not just sports…
2 notes · View notes
anselmolucio · 4 months ago
Text
Small and large steps toward the empire of artificial intelligence
Source: Open Tech. Translation of the infographic: 1943 – McCulloch and Pitts publish a paper titled A Logical Calculus of the Ideas Immanent in Nervous Activity, in which they propose the foundations of neural networks. 1950 – Turing publishes Computing Machinery and Intelligence, proposing the Turing Test as a way to measure a machine's capability. 1951 – Marvin Minsky and Dean…
1 note · View note
lefkosahaberleri · 6 months ago
Text
Artificial Intelligence and Turkey's Future: Ethics, Investments, and the Impact on the Workforce
New Post has been published on https://lefkosa.com.tr/yapay-zeka-ve-turkiyenin-gelecegi-etik-yatirimlar-ve-is-gucune-etkisi-31501/
Artificial Intelligence and Turkey's Future: Ethics, Investments, and the Impact on the Workforce
Artificial intelligence is shaping Turkey's future. This article examines the ethical dimensions of AI, the investments being made, and the effects on the workforce. Discover the technology's potential in our country.
0 notes
rayhaber · 8 months ago
Text
The 2024 Nobel Prize in Chemistry: AlphaFold2 and Novel Protein Design
Landmark Work Behind the 2024 Nobel Prize in Chemistry. Experts discussed with CNN the details of the research that earned the 2024 Nobel Prize in Chemistry and the potential consequences of these advances. Anna Wedell, Professor of Medical Genetics at the Karolinska Institute in Sweden and a member of the Royal Swedish Academy of Sciences, said this year's prize-winning research on "protein structure prediction" had, for chemistry…
0 notes
govindhtech · 11 months ago
Text
DeepMind’s AlphaFold 3 Server For Molecular Life Blueprint
How AlphaFold 3 Server was constructed to predict the composition and interactions of every molecule in life
Launched in 2020, Google DeepMind's AlphaFold 2 protein prediction model has been used by over 2 million researchers working on cancer treatments, vaccine development, and other fields, letting scientists tackle a problem, predicting protein structure from sequence, that had stood open for more than fifty years. After helping scientists predict hundreds of millions of structures, it would have been easy for the group to sit back and relax.
Google DeepMind’s AlphaFold 3 Server
Rather, they began to work on AlphaFold 3 Server. The Google DeepMind and Isomorphic Labs teams released a newer model in May that improves on their earlier models by predicting not only the structure of proteins but also the interactions and structures of all other molecules in life, such as DNA, RNA, and ligands (small molecules that bind to proteins).
"We made enormous progress on this decades-old open problem of protein folding with AlphaFold 2, but looking at recent high-impact research, researchers are moving beyond that," says Google DeepMind research scientist Jonas Adler. "Their findings frequently dealt with more intricate topics, such as the binding of RNA or small molecules, which AlphaFold 2 could not handle. To keep up with the current state of biology and chemistry, we needed to be able to cover every type of biomolecule, because experimental research has advanced the field."
“Everything” includes ligands, which comprise roughly 50% of all pharmaceuticals. Adrian Stecula, the research head at Isomorphic Labs, states, “We see the tremendous potential of AlphaFold 3 for rational drug design, and we’re already using it in our day-to-day work.” “All of those capabilities are unlocked by the new model, including investigating the binding of novel small molecules to novel drug targets, responding to queries like ‘How do proteins interact with DNA and RNA?,’ and examining the impact of chemical modifications on protein structure.”
The arrival of these other kinds of molecules introduced an order of magnitude more potential combinations. "There is a lot of order in proteins. There are just 20 typical amino acids, for instance," explains Jonas. "Small molecules, on the other hand, occupy an essentially endless space and are capable of doing almost anything. They are really varied."
This meant it would have been impossible to build a database covering everything. Instead, Google DeepMind has made available AlphaFold Server, a free tool that lets scientists enter their own sequences, for which AlphaFold then generates molecular complexes. Researchers have used it to create over a million structures since its May introduction.
Lindsay Willmore, a Google DeepMind research engineer, compares it to "Google Maps for molecular complexes." "Any user who is completely non-technical can simply copy and paste the names of their small molecules, DNA, RNA, and protein sequences, hit a button, and wait a short while." Once the structure and confidence metrics are returned, they can view and assess their prediction.
To enable AlphaFold 3 to work with this far wider spectrum of biomolecules, the team greatly expanded the training data for the newer model to include DNA, RNA, small molecules, and more. "We were able to say, 'Let's just train on everything in this dataset that has really helped us with proteins, and see how far we can get,'" according to Lindsay. "And it looks like we can go a fair distance." The other significant change in AlphaFold 3 is a redesign of the final part of the model, the part that generates the structure.
AlphaFold 3 employs a generative model based on diffusion, similar to state-of-the-art image generation models like Imagen. This considerably simplifies how the model handles all the new molecule types, whereas AlphaFold 2 used a complicated bespoke geometry-based module.
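The diffusion approach described above can be sketched in miniature. The loop below is an illustrative stand-in, not DeepMind's code: the real AlphaFold 3 uses a trained network to predict the denoising direction, whereas this toy pulls random starting coordinates toward a fixed, hypothetical target structure, just to show the iterative shape of the process.

```python
import math
import random

def toy_diffusion_sample(n_atoms=4, n_steps=50, seed=0):
    """Illustrative diffusion-style sampling loop: start from random noise
    and iteratively denoise toward a plausible conformation. The 'denoiser'
    here is a toy stand-in that pulls atoms toward a fixed target."""
    rng = random.Random(seed)
    # Hypothetical "learned" structure: atoms spaced like a C-alpha trace.
    target = [(3.8 * i, 0.0, 0.0) for i in range(n_atoms)]
    # Start from pure noise.
    coords = [(rng.gauss(0, 5), rng.gauss(0, 5), rng.gauss(0, 5))
              for _ in range(n_atoms)]
    for _ in range(n_steps):
        # One denoising step: move each atom a fraction of the way toward
        # where the (toy) denoiser says it should be.
        coords = [tuple(c + 0.2 * (t - c) for c, t in zip(atom, tgt))
                  for atom, tgt in zip(coords, target)]
    return coords, target

coords, target = toy_diffusion_sample()
rmsd = math.sqrt(sum((c - t) ** 2
                     for atom, tgt in zip(coords, target)
                     for c, t in zip(atom, tgt)) / len(coords))
print(f"final RMSD to target: {rmsd:.4f}")
```

After enough steps the noise has been pulled onto the target geometry, which is the essential behavior a trained diffusion module reproduces with a learned, rather than hard-coded, update direction.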
That change, however, raised a fresh problem: because the so-called "disordered regions" of proteins weren't represented in the training data, the diffusion model would hallucinate an erroneous "ordered" structure with a distinctive spiral form instead of predicting those regions as disordered.
The group decided to use AlphaFold 2, which is already very good at identifying which regions, the ones that resemble a pile of disorganised spaghetti, will be disordered and which won't. According to Lindsay, "We were able to use those predicted structures from AlphaFold 2 as distillation training for AlphaFold 3, so that AlphaFold 3 could learn to predict disorder."
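The distillation step described above can be illustrated with a toy sketch. Nothing here comes from the actual training pipeline: `teacher_predict`, the length threshold, and the labels are invented for illustration; the point is only the pattern of using one model's predictions as training data for another.

```python
def teacher_predict(region_length):
    # Stand-in for the teacher model (AlphaFold 2 in the article) labeling
    # a region. The threshold is arbitrary and purely illustrative.
    return "disordered" if region_length > 30 else "ordered"

def distill(regions):
    # Build a training set for the student model from the teacher's
    # predictions instead of experimental ground-truth labels.
    return [(length, teacher_predict(length)) for length in regions]

training_set = distill([10, 45, 28, 60])
print(training_set)
# → [(10, 'ordered'), (45, 'disordered'), (28, 'ordered'), (60, 'disordered')]
```

The student never sees experimental disorder annotations; it inherits the teacher's judgment, which is exactly the trade the article describes.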
The team is excited to see how scientists will use AlphaFold 3 Server to advance a variety of areas, including drug development and genomics research.
"The amount of progress we made is amazing," remarks Jonas. "What was extremely difficult before has now become really simple. Even though there are still many challenging issues to resolve, we are enthusiastic about AlphaFold 3's potential to contribute to their resolution. What was once unthinkable is now achievable."
Read more on Govindhtech.com
0 notes
innonurse · 1 year ago
Text
AI-powered protein prediction technology provides accurate results, enabling efficient identification of optimal drug candidates for various conditions
- By InnoNurse Staff -
A new study has demonstrated that AlphaFold2 can accurately predict ligand binding structures, crucial for drug discovery, despite initial doubts about its accuracy.
Read more at UNC School of Medicine
///
Other recent news and insights
Bot Image, Inc. introduces its prostate cancer detection, diagnosis, and screening software (Bot Image Inc./PRNewswire)
0 notes
drakefruitddg · 8 months ago
Text
no, don't say that
we hate misuse of ai, fake artists, and theft!
20 notes · View notes
mbr-br · 6 months ago
Text
A chance prompt led to my agreeing to write an entry for an Advent calendar about science and AI. Writing it is fine, but I wasn't sure where it should go, so I decided to write it here.
This blog is normally where I, someone based on Hatena Blog, jot down light musings, mostly short X (Twitter)-style posts, so it isn't really a place for long-form writing (or so I've arbitrarily decided), but given the subject matter I settled on writing it here.
Since these are usually soliloquies, I would normally dive straight into the topic, but as this is an Advent calendar entry I should introduce myself. I am a university faculty member who had the core of his specialty struck by an AI called AlphaFold2 (and then watched a kind of domino effect follow), and who has been watching the field he has belonged to for more than twenty years, counting from his student days, fall apart in confusion.
With AlphaFold winning a Nobel Prize, the public mood is quite celebratory, but anyone who traces the field's history can see that DeepMind, the creator of AlphaFold2, was a kind of invader or conqueror: it arrived suddenly from another field and, far from merely sweeping a contest, ended within a few years a contest that had run for more than twenty. That personal experience is what prompted this post.
Honestly, as someone who practices scientific research as a profession, however imperfectly, I am slowly coming to feel how hard it is to stay motivated to maintain one's expertise when faced with the fact that a field's big questions, or the field itself, can end abruptly, and in particular that the ending can come not from people who spent years accumulating knowledge in the field but from outsiders arriving with entirely different technology. Perhaps because I recount this sequence of events a few times a year in university lectures, each retelling seems to deposit a little more sediment in my heart.
AlphaFold2's existence became known at the end of November 2020, but it was released to the public in July 2021. Given that DeepMind had been reluctant to release the previous version of AlphaFold, it is still hard to believe, looking back, that AlphaFold2 was made fully free together with a database of precomputed predicted models. If asked to write an essay about the chaos of that period, I could write endlessly, and I suspect everyone in the field from my generation on up feels the same.
Then, about a year and a half later, at the end of November 2022, ChatGPT appeared. After using it for a while, my thought was: "So everyone in every field is about to taste the shock we tasted." At that point, though, both AlphaFold2 and ChatGPT were AIs that excelled at a single function and did not reach very far. At that point, anyway.
Nearly two more years have passed, and thanks to the domino effect AlphaFold2 set off, my field has seen its problems solved one after another; it is collapsing in the direction of running out of problems to solve. The wider world, however, has not yet been upended by AI to that degree. On December 20, the very day this Advent calendar entry was due, OpenAI announced o3. Perhaps for benchmark reasons, mathematics seems to be in the crosshairs, but it does not yet look like a shock on the scale of AlphaFold2 (though it may be an AlphaFold1 moment).
Looking back over these four years, from AlphaFold2 to o3, from where I stand: when a field's core is struck, that is, when an important problem is solved, a domino effect can follow and problems may fall one after another; the one who strikes may come from outside the field, and the timing is unknowable (this is the hardest part); and once the dominoes start falling, the field accelerates sharply, leaving many researchers in the agonizing position of fleeing a landslide. To escape that agony (and to keep producing publications), many researchers naturally scramble to change direction, but the field may collapse faster than they can pivot. If the same thing happens in other fields, frankly, it seems far too harsh and far too cruel.
Since this agony presumably lasts longer the more the collapse is drawn out (a very slow collapse, as in the old days, would of course be fine, but a slowdown to that degree is no longer conceivable), I have come to feel that the only way out of this predicament is for AI progress to accelerate and finish all of science quickly.
…That is my state of mind at the end of 2024, after these four years, though my thinking may well have changed by next year or the year after. Also, while the view of the field I have written here should not be especially heretical, it is not yet something one can voice openly (no one wants to say bad things out loud), so I rather hope that fields other than mine soon begin to collapse under AI, and that this way of seeing things becomes commonplace.
Addendum: In January this year I wrote a soliloquy on a similar topic (AI, scientific research, and my own career). Perhaps the reason I can stay somewhat calm amid this collapse is that I hold the view that it is only natural for me to be replaced by others. https://mbr-br.tumblr.com/post/739340490648043520/
Addendum 2: I found something I once wrote about what I hope for from the advancement of science, so I attach it here for reference. The question, precisely, was what I would like scientific progress to make possible, and my answer was that I want to completely destroy the boundary between life and death, the distinction between the living and the non-living, and the separation between self and other. I wrote it three years and some months ago, but the wish expressed there, to smash the concepts of life, self, and society that are instilled in nearly everyone today and see what emerges beyond them, still stands; and since today's AI is breaking down the wall between the living and the non-living, I find it a very welcome presence.
Addendum 3 (December 31, 2024, 6:41 AM): Perhaps because it was listed on the Advent calendar, this post has been read a fair amount, and in some quarters it was even called interesting, which leaves me a little embarrassed. While I am at it, I would like to touch on the Virtual Lab paper as a related topic. Announced in mid-November 2024, it is an attempt to automate life-science (protein-engineering) research with AI. As an effort it is not especially novel, including its choice of problem, but two things are somewhat new: an AI is set up to lead the research team and is made to assemble the virtual team itself, and humans experimentally validate what the AI designs computationally. It was covered in Nature's news section in early December, and by late December it had become a topic of discussion (tinged with fear) among life scientists on Japanese social media.
As I wrote above, research-automation efforts built on commercial AI such as GPT-4o are not new in themselves, and I can understand life scientists fretting over a future in which AI replaces them. For me, though, the line that hit hardest was in the paper's introduction: interdisciplinary research is important, but such researchers are scarce, so AI will substitute for them. Indeed, a Computational Biologist and a Machine Learning Specialist appear in this paper as AI team members, and they are precisely the ones who would replace people like me (in other words, unlike most life scientists, I already stand on the side of those who can be replaced by AI).
What will this bring to my field, which is already collapsing under AlphaFold? I had expected this development, and there is much I could write, but one thing I can say is that the incentive for humans to do interdisciplinary research, and to train such people, will shrink drastically. Research that spans multiple disciplines is demanding because it requires knowledge of several fields, and since not many students aim for it, I have tried to educate those who do with care; but when AI descends from the start as a passable expert in countless fields, the will of newcomers to enter will be crushed. A corporate researcher might patch over the problem by hiring AI instead of new graduates, but as a university teacher responsible for education, I honestly have no answer yet for how to deal with this. And as the inflow of people dwindles, the field will, in effect, evaporate.
In the end, research fields may be driven to extinction from two directions: collapse, as AI solves all of a field's problems, and evaporation, as AI replaces the researchers themselves.
10 notes · View notes
rayhaber · 8 months ago
Text
Winners of the 2024 Nobel Prize in Chemistry Announced
At a press conference held to announce the winners of the 2024 Nobel Prize in Chemistry, the Royal Swedish Academy of Sciences declared that this year's prize would go to Demis Hassabis, John M. Jumper, and David Baker. These scientists succeeded in unlocking the secrets of proteins' complex structures by means of computing and artificial-intelligence technologies. For a long…
0 notes
greatwyrmgold · 5 months ago
Text
To be fair, stuff trying to pass the Turing test is what the news is talking about most.
The thing about AI is that, even if we accept everything called "AI" is equally artificially-intelligent, there are still multiple things we lump together under that one name. There's AI the technology, AI the marketing buzzword, AI the threat of unemployment for entire industries, among others. Like AI the sci-fi trope, which is part of the discussion whether we like it or not.
These things are obviously connected. AI the marketing buzzword funds AI the technology, which legitimizes all other uses of AI, with the possible exception of AI the sci-fi trope (which has done a lot of work to legitimize AI the buzzword).
They're connected, but they're distinct. People hear about the marketing buzzword and fear the threat of unemployment (or the concessions employers will force out of their employees using that threat). But AI the technology is less relevant to most people.
It's always so weird that like. Fully a third of job listings I see in machine learning are for biomedical research. And easily two thirds of postdocs. Massive, huge family of applications that seems completely absent from the public discourse. From the way people talk about it you'd think half the field works in image generation and nobody does medical research, but in reality only a tiny handful of people seem to be doing image generation, and everyone else is either doing language models or studying cancer and designing novel drugs
3K notes · View notes
cleverhottubmiracle · 2 months ago
Link
PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure by learning the latent space of protein folding models. The awarding of the 2024 Nobel Prize to AlphaFold2 marks an important moment of recognition for the role of AI in biology. What comes next after protein folding?
In PLAID, we develop a method that learns to sample from the latent space of protein folding models to generate new proteins. It can accept compositional function and organism prompts, and can be trained on sequence databases, which are 2-4 orders of magnitude larger than structure databases. Unlike many previous protein structure generative models, PLAID addresses the multimodal co-generation problem setting: simultaneously generating both discrete sequence and continuous all-atom structural coordinates.
From structure prediction to real-world drug design
Though recent works demonstrate the promise of diffusion models for generating proteins, previous models still have limitations that make them impractical for real-world applications, such as:
All-atom generation: Many existing generative models produce only the backbone atoms. Producing the all-atom structure and placing the sidechain atoms requires knowing the sequence, which creates a multimodal generation problem demanding simultaneous generation of discrete and continuous modalities.
Organism specificity: Protein biologics intended for human use need to be humanized to avoid being destroyed by the human immune system.
Control specification: Drug discovery, and getting drugs into the hands of patients, is a complex process. How can we specify these complex constraints? For example, even after the biology is tackled, you might decide that tablets are easier to transport than vials, adding a new constraint on solubility.
Generating "useful" proteins
Simply generating proteins is not as useful as controlling the generation to get useful proteins.
What might an interface for this look like? For inspiration, let's consider how we'd control image generation via compositional textual prompts (example from Liu et al., 2022). In PLAID, we mirror this interface for control specification. The ultimate goal is to control generation entirely via a textual interface, but here we consider compositional constraints for two axes as a proof-of-concept: function and organism.
Learning the function-structure-sequence connection. PLAID learns the tetrahedral cysteine-Fe2+/Fe3+ coordination pattern often found in metalloproteins, while maintaining high sequence-level diversity.
Training using sequence-only training data
Another important aspect of the PLAID model is that we only require sequences to train the generative model! Generative models learn the data distribution defined by their training data, and sequence databases are considerably larger than structural ones, since sequences are much cheaper to obtain than experimental structure.
Learning from a larger and broader database. The cost of obtaining protein sequences is much lower than experimentally characterizing structure, and sequence databases are 2-4 orders of magnitude larger than structural ones.
How does it work?
The reason that we're able to train the generative model to generate structure using only sequence data is by learning a diffusion model over the latent space of a protein folding model. Then, during inference, after sampling from this latent space of valid proteins, we can take frozen weights from the protein folding model to decode structure. Here, we use ESMFold, a successor to the AlphaFold2 model which replaces a retrieval step with a protein language model.
Our method. During training, only sequences are needed to obtain the embedding; during inference, we can decode sequence and structure from the sampled embedding. ❄️ denotes frozen weights.
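The sample-then-decode pipeline just described can be sketched as follows. Both functions are toy stand-ins rather than PLAID's real components: `sample_latent` stands in for the learned diffusion prior over the folding model's latent space, and `frozen_decode` for the frozen ESMFold weights that map a latent to both a discrete sequence and continuous coordinates.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sample_latent(length, rng):
    # Stand-in for sampling from the learned diffusion prior over the
    # folding model's latent space (here: just Gaussian noise).
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def frozen_decode(latent):
    # Stand-in for the frozen folding-model weights, which decode one
    # latent into BOTH modalities: discrete sequence + continuous coords.
    seq = "".join(AMINO_ACIDS[int(abs(z) * 997) % len(AMINO_ACIDS)]
                  for z in latent)
    structure = [(3.8 * i, z, 0.0) for i, z in enumerate(latent)]
    return seq, structure

rng = random.Random(42)
latent = sample_latent(8, rng)          # sample a point in "protein space"
seq, structure = frozen_decode(latent)  # decode sequence and structure together
print(seq, len(structure))
```

The key property mirrored here is that only the prior needs training data (sequences), while structure falls out of the frozen decoder at inference time.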
In this way, we can use the structural understanding in the weights of pretrained protein folding models for the protein design task. This is analogous to how vision-language-action (VLA) models in robotics make use of priors contained in vision-language models (VLMs) trained on internet-scale data to supply perception, reasoning, and understanding.
Compressing the latent space of protein folding models
A small wrinkle with directly applying this method is that the latent space of ESMFold – indeed, the latent space of many transformer-based models – requires a lot of regularization. This space is also very large, so learning a diffusion model over it ends up being comparable to high-resolution image synthesis. To address this, we also propose CHEAP (Compressed Hourglass Embedding Adaptations of Proteins), where we learn a compression model for the joint embedding of protein sequence and structure.
Investigating the latent space. (A) When we visualize the mean value for each channel, some channels exhibit "massive activations". (B) If we start examining the top-3 activations compared to the median value (gray), we find that this happens over many layers. (C) Massive activations have also been observed for other transformer-based models.
We find that this latent space is actually highly compressible. By doing a bit of mechanistic interpretability to better understand the base model that we are working with, we were able to create an all-atom protein generative model.
What's next?
Though we examine the case of protein sequence and structure generation in this work, we can adapt this method to perform multi-modal generation for any modalities where there is a predictor from a more abundant modality to a less abundant one. As sequence-to-structure predictors for proteins are beginning to tackle increasingly complex systems (e.g.
AlphaFold3 is also able to predict proteins in complex with nucleic acids and molecular ligands), it's easy to imagine performing multimodal generation over more complex systems using the same method. If you are interested in collaborating to extend our method, or to test our method in the wet lab, please reach out!
Further links
If you've found our papers useful in your research, please consider using the following BibTeX for PLAID and CHEAP:
@article{lu2024generating,
  title={Generating All-Atom Protein Structure from Sequence-Only Training Data},
  author={Lu, Amy X and Yan, Wilson and Robinson, Sarah A and Yang, Kevin K and Gligorijevic, Vladimir and Cho, Kyunghyun and Bonneau, Richard and Abbeel, Pieter and Frey, Nathan},
  journal={bioRxiv},
  pages={2024--12},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}
@article{lu2024tokenized,
  title={Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure},
  author={Lu, Amy X and Yan, Wilson and Yang, Kevin K and Gligorijevic, Vladimir and Cho, Kyunghyun and Abbeel, Pieter and Bonneau, Richard and Frey, Nathan},
  journal={bioRxiv},
  pages={2024--08},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}
You can also check out our preprints (PLAID, CHEAP) and codebases (PLAID, CHEAP).
Some bonus protein generation fun!
Additional function-prompted generations with PLAID. Unconditional generation with PLAID. Transmembrane proteins have hydrophobic residues at the core, where the protein is embedded within the fatty-acid layer; these are consistently observed when prompting PLAID with transmembrane protein keywords. Additional examples of active-site recapitulation based on function keyword prompting. Comparing samples between PLAID and all-atom baselines: PLAID samples have better diversity and capture the beta-strand pattern that has been more difficult for protein generative models to learn.
Acknowledgements
Thanks to Nathan Frey for detailed feedback on this article, and to co-authors across BAIR, Genentech, Microsoft Research, and New York University: Wilson Yan, Sarah A. Robinson, Simon Kelow, Kevin K. Yang, Vladimir Gligorijevic, Kyunghyun Cho, Richard Bonneau, Pieter Abbeel, and Nathan C. Frey.
0 notes