#Autoencoders
jcmarchi · 2 months
Text
Machine learning and the microscope
New Post has been published on https://thedigitalinsider.com/machine-learning-and-the-microscope/
With recent advances in imaging, genomics and other technologies, the life sciences are awash in data. If a biologist is studying cells taken from the brain tissue of Alzheimer’s patients, for example, there could be any number of characteristics they want to investigate — a cell’s type, the genes it’s expressing, its location within the tissue, or more. However, while cells can now be probed experimentally using different kinds of measurements simultaneously, when it comes to analyzing the data, scientists usually can only work with one type of measurement at a time.
Working with “multimodal” data, as it’s called, requires new computational tools, which is where Xinyi Zhang comes in.
The fourth-year MIT PhD student is bridging machine learning and biology to understand fundamental biological principles, especially in areas where conventional methods have hit limitations. Working in the lab of MIT Professor Caroline Uhler in the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society, and collaborating with researchers at the Eric and Wendy Schmidt Center at the Broad Institute and elsewhere, Zhang has led multiple efforts to build computational frameworks and principles for understanding the regulatory mechanisms of cells.
“All of these are small steps toward the end goal of trying to answer how cells work, how tissues and organs work, why they have disease, and why they can sometimes be cured and sometimes not,” Zhang says.
The activities Zhang pursues in her down time are no less ambitious. The list of hobbies she has taken up at the Institute includes sailing, skiing, ice skating, rock climbing, performing with MIT’s Concert Choir, and flying single-engine planes. (She earned her pilot’s license in November 2022.)
“I guess I like to go to places I’ve never been and do things I haven’t done before,” she says with signature understatement.
Uhler, her advisor, says that Zhang’s quiet humility leads to a surprise “in every conversation.”
“Every time, you learn something like, ‘Okay, so now she’s learning to fly,’” Uhler says. “It’s just amazing. Anything she does, she does for the right reasons. She wants to be good at the things she cares about, which I think is really exciting.”
Zhang first became interested in biology as a high school student in Hangzhou, China. She liked that her teachers couldn’t answer her questions in biology class, which led her to see it as the “most interesting” topic to study.
Her interest in biology eventually turned into an interest in bioengineering. After her parents, who were middle school teachers, suggested studying in the United States, she majored in bioengineering alongside electrical engineering and computer science as an undergraduate at the University of California at Berkeley.
Zhang was ready to dive straight into MIT’s EECS PhD program after graduating in 2020, but the Covid-19 pandemic delayed her first year. Despite that, in December 2022, she, Uhler, and two other co-authors published a paper in Nature Communications.
The groundwork for the paper was laid by Xiao Wang, one of the co-authors. She had previously done work with the Broad Institute in developing a form of spatial cell analysis that combined multiple forms of cell imaging and gene expression for the same cell while also mapping out the cell’s place in the tissue sample it came from — something that had never been done before.
This innovation had many potential applications, including enabling new ways of tracking the progression of various diseases, but there was no way to analyze all the multimodal data the method produced. In came Zhang, who became interested in designing a computational method that could.
The team focused on chromatin staining as their imaging method of choice, which is relatively cheap but still reveals a great deal of information about cells. The next step was integrating the spatial analysis techniques developed by Wang, and to do that, Zhang began designing an autoencoder.
Autoencoders are a type of neural network that typically encodes and shrinks large amounts of high-dimensional data, then expands the transformed data back to its original size. In this case, Zhang’s autoencoder did the reverse, taking the input data and making it higher-dimensional. This allowed them to combine data from different animals and remove technical variations that were not due to meaningful biological differences.
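As a rough illustration of the conventional setup (a minimal sketch assuming PyTorch, not the STACI model itself), an autoencoder pairs an encoder that compresses the input with a decoder that reconstructs it; reversing the bottleneck, as Zhang's model does, amounts to making the latent layer wider than the input rather than narrower.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Conventional autoencoder: compress to a small latent, then reconstruct."""
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),          # bottleneck (smaller than the input)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim),              # expand back to the original size
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

# Reconstruction objective: make the output match the input.
model = Autoencoder()
x = torch.rand(16, 784)                          # toy batch of flattened inputs
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
```

Making the latent higher-dimensional, as described above, is only a matter of choosing `latent_dim` larger than `in_dim`.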
In the paper, they used this technology, abbreviated as STACI, to identify how cells and tissues reveal the progression of Alzheimer’s disease when observed under a number of spatial and imaging techniques. The model can also be used to analyze any number of diseases, Zhang says.
Given unlimited time and resources, her dream would be to build a fully complete model of human life. Unfortunately, both time and resources are limited. Her ambition isn’t, however, and she says she wants to keep applying her skills to solve the “most challenging questions that we don’t have the tools to answer.”
She’s currently working on wrapping up a couple of projects, one focused on studying neurodegeneration by analyzing frontal cortex imaging and another on predicting protein images from protein sequences and chromatin imaging.
“There are still many unanswered questions,” she says. “I want to pick questions that are biologically meaningful, that help us understand things we didn’t know before.”
2 notes · View notes
sifytech · 9 months
Text
Unveiling the Depths of Intelligence: A Journey into the World of Deep Learning
Embark on a captivating exploration into the cutting-edge realm of Deep Learning, a revolutionary paradigm within the broader landscape of artificial intelligence. Read More. https://www.sify.com/ai-analytics/unveiling-the-depths-of-intelligence-a-journey-into-the-world-of-deep-learning/
0 notes
raspberryjamnnn · 12 days
Text
there was like an effect used for a lot of the visualizations (it was most noticeable in the new song that I didn't catch the name of, the one with the live girls saying what I think was "cut and drain your blood" or smth) where it had that pseudo-flickery ai patterning effect and it was so so cool, I'd really like to know how it works
1 note · View note
thedevmaster-tdm · 13 days
Text
STOP Using Fake Human Faces in AI
1 note · View note
VAE for Anomaly Detection
Variational Autoencoders (VAEs) are powerful tools for generating data, especially useful for data augmentation and spotting anomalies. By working with latent spaces, VAEs help to diversify datasets and capture complex data patterns, making them particularly effective at identifying outliers. Advanced versions, like Conditional VAEs and Beta-VAEs, further enhance data generation and improve model performance. With their ability to handle complex data, VAEs are making a big impact in AI, offering innovative solutions across various fields. Read the full article here
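As a sketch of the general idea (PyTorch assumed; the layer sizes and scoring are illustrative, not taken from the linked article), a VAE trained on normal data tends to reconstruct anomalous inputs poorly, so reconstruction error can serve as an anomaly score; the beta weight on the KL term is what distinguishes a Beta-VAE.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Small VAE: encode to a Gaussian latent, sample, decode."""
    def __init__(self, in_dim=784, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(in_dim, 128)
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def elbo_loss(x, recon, mu, logvar, beta=1.0):
    # Reconstruction term plus beta-weighted KL divergence (beta > 1 gives a Beta-VAE).
    recon_err = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + beta * kl

def anomaly_score(model, x):
    # Samples whose reconstruction error is far above what was seen during training
    # are treated as outliers; the cutoff itself is data-dependent.
    with torch.no_grad():
        recon, _, _ = model(x)
        return F.mse_loss(recon, x, reduction="none").mean(dim=1)

model = VAE()
x = torch.rand(8, 784)
recon, mu, logvar = model(x)
loss = elbo_loss(x, recon, mu, logvar, beta=4.0)   # beta > 1: Beta-VAE objective
print(anomaly_score(model, x))
```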
0 notes
cbirt · 9 months
Link
Due to the rapid advancements in AI, structure-based drug design is now widely used as one of the most important methods in early drug development. AI’s deep generative modeling is expanding molecular design beyond classical methods by learning essential intra- and intermolecular interactions from available data, thanks to the growth of crystal structure data and libraries. DrugHIVE, a deep structure-based generative model, was introduced by researchers at the University of Southern California to generate high-quality drug-like molecules. DrugHIVE enables fine-grained control over molecular generation and can be applied to a range of drug design problems, including molecular optimization, scaffold hopping, and linker design.
The early drug discovery process relies heavily on the quality of initial candidate molecules. Historically, the size of drug-like molecular libraries has been limited to a few million compounds. However, with the advent of virtually enumerated libraries and fragment-based virtual screening techniques, the capacity has increased to billions of compounds, potentially on a tera-scale. These techniques have accelerated and lowered the cost of early drug discovery but still rely on explicit enumeration of chemical libraries. Despite advancements in computational tools for structure-based drug design (SBDD), these tasks still rely heavily on human expertise.
Continue Reading
34 notes · View notes
shruti3802 · 2 months
Text
Exploring Generative AI: Unleashing Creativity through Algorithms
Generative AI, a fascinating branch of artificial intelligence, has been making waves across various fields from art and music to literature and design. At its core, generative AI enables computers to autonomously produce content that mimics human creativity, leveraging complex algorithms and vast datasets.
One of the most compelling applications of generative AI is in the realm of art. Using techniques such as Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs), AI systems can generate original artworks that blur the line between human and machine creativity. Artists and researchers alike are exploring how these algorithms can inspire new forms of expression or augment traditional creative processes.
In the realm of music, generative AI algorithms can compose melodies, harmonies, and even entire pieces that resonate with listeners. By analyzing existing compositions and patterns, AI can generate music that adapts to different styles or moods, providing musicians with novel ideas and inspirations.
Literature and storytelling have also been transformed by generative AI. Natural Language Processing (NLP) models can generate coherent and engaging narratives, write poetry, or even draft news articles. While these outputs may still lack the depth of human emotional understanding, they showcase AI's potential to assist writers, editors, and journalists in content creation and ideation.
Beyond the arts, generative AI has practical applications in fields like healthcare, where it can simulate biological processes or generate synthetic data for research purposes. In manufacturing and design, AI-driven generative design can optimize product designs based on specified parameters, leading to more efficient and innovative solutions.
However, the rise of generative AI also raises ethical considerations, such as intellectual property rights, bias in generated content, and the societal impact on creative industries. As these technologies continue to evolve, it's crucial to navigate these challenges responsibly and ensure that AI augments human creativity rather than replacing it.
In conclusion, generative AI represents a groundbreaking frontier in technology, unleashing new possibilities across creative disciplines and beyond. As researchers push the boundaries of what AI can achieve, the future promises exciting developments that could redefine how we create, innovate, and interact with technology in the years to come.
If you want to become a Generative AI expert in India, join the Digital Marketing class from Abhay Ranjan
3 notes · View notes
silic0n0asis · 2 months
Text
thinking about my problematic relationship with notions of agency again and suddenly feeling so scatterbrained, like a few nights ago when i found myself drawn to that arno schmidt documentary...regardless i still feel a little guilty about publishing the music i apparently make—that i at least realize the concrete structure of—under my name, it's hard to believe i make it and that it's my domain any more than a deep learning system (autoencoders are so often excluded!) has dominion over what images and audio they synthesize! there is a more existential generalized notion of self-agency that i'm also thinking of here that seems necessarily moored to this, or this is moored to it rather, what are really their lack thereofs
3 notes · View notes
389 · 2 years
Photo
Nachev sought out thousands of portraits of individuals from different social or historical clusters that have a common characteristic and then submitted these arbitrary particulars to an algorithm that generated a combined portrait for each group, a ‘canonical’, if you will. Using a machine, Nachev presents us with the single portrait of the wealthiest one hundred women, a portrait of all American gangsters of note, of all the US soldiers who have died during one week of the Vietnam War, or even of thousands of women at the moment of self-reported climax. The artist does this by training an entity of self-organising mathematics called a deep autoencoder to extract the inexistent, essential and ideal human form from thousands of arbitrary instances. 
44 notes · View notes
avnnetwork · 9 months
Text
Exploring the Depths: A Comprehensive Guide to Deep Neural Network Architectures
In the ever-evolving landscape of artificial intelligence, deep neural networks (DNNs) stand as one of the most significant advancements. These networks, which mimic the functioning of the human brain to a certain extent, have revolutionized how machines learn and interpret complex data. This guide aims to demystify the various architectures of deep neural networks and explore their unique capabilities and applications.
1. Introduction to Deep Neural Networks
Deep Neural Networks are a subset of machine learning algorithms that use multiple layers of processing to extract and interpret data features. Each layer of a DNN processes an aspect of the input data, refines it, and passes it to the next layer for further processing. The 'deep' in DNNs refers to the number of these layers, which can range from a few to several hundred. Visit https://schneppat.com/deep-neural-networks-dnns.html
2. Fundamental Architectures
There are several fundamental architectures in DNNs, each designed for specific types of data and tasks:
Convolutional Neural Networks (CNNs): Ideal for processing image data, CNNs use convolutional layers to filter and pool data, effectively capturing spatial hierarchies.
Recurrent Neural Networks (RNNs): Designed for sequential data like time series or natural language, RNNs have the unique ability to retain information from previous inputs using their internal memory.
Autoencoders: These networks are used for unsupervised learning tasks like feature extraction and dimensionality reduction. They learn to encode input data into a lower-dimensional representation and then decode it back to the original form.
Generative Adversarial Networks (GANs): Comprising two networks, a generator and a discriminator, GANs are used for generating new data samples that resemble the training data.
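For a concrete sense of the adversarial setup in the last item, here is a minimal single-step sketch (PyTorch assumed; the layer sizes and data are placeholders, not a full training recipe):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())    # generator
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))        # discriminator
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(32, 784)                       # stand-in for a batch of real samples
noise = torch.randn(32, 64)
fake = G(noise)

# Discriminator step: push real samples toward label 1 and generated samples toward 0.
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator into predicting 1 for fakes.
loss_g = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```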
3. Advanced Architectures
As the field progresses, more advanced DNN architectures have emerged:
Transformer Networks: Revolutionizing the field of natural language processing, transformers use attention mechanisms to improve the model's focus on relevant parts of the input data.
Capsule Networks: These networks aim to overcome some limitations of CNNs by preserving hierarchical spatial relationships in image data.
Neural Architecture Search (NAS): NAS employs machine learning to automate the design of neural network architectures, potentially creating more efficient models than those designed by humans.
4. Training Deep Neural Networks
Training DNNs involves feeding large amounts of data through the network and adjusting the weights using algorithms like backpropagation. Challenges in training include overfitting, where a model learns the training data too well but fails to generalize to new data, and the vanishing/exploding gradient problem, which affects the network's ability to learn.
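A hedged sketch of what such training looks like in practice (PyTorch assumed, with a toy model and random data), including two common countermeasures for the problems mentioned above: weight decay against overfitting and gradient clipping against exploding gradients.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)  # weight decay regularizes
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))                 # toy dataset
for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                                                       # backpropagation
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)      # guards against exploding gradients
    opt.step()
```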
5. Applications and Impact
The applications of DNNs are vast and span multiple industries:
Image and Speech Recognition: DNNs have drastically improved the accuracy of image and speech recognition systems.
Natural Language Processing: From translation to sentiment analysis, DNNs have enhanced the understanding of human language by machines.
Healthcare: In medical diagnostics, DNNs assist in the analysis of complex medical data for early disease detection.
Autonomous Vehicles: DNNs are crucial in enabling vehicles to interpret sensory data and make informed decisions.
6. Ethical Considerations and Future Directions
As with any powerful technology, DNNs raise ethical questions related to privacy, data security, and the potential for misuse. Ensuring the responsible use of DNNs is paramount as the technology continues to advance.
In conclusion, deep neural networks are a cornerstone of modern AI. Their varied architectures and growing applications are not only fascinating from a technological standpoint but also hold immense potential for solving complex problems across different domains. As research progresses, we can expect DNNs to become even more sophisticated, pushing the boundaries of what machines can learn and achieve.
3 notes · View notes
Text
so, like, gpt and so on can be viewed as (at least partially comprised of) lossy compression algorithms. someone has to have tested them as such, right? where are the papers on this? the most recent stuff i can find on this stuff is from gpt-2. is it just that openai has gpt-3+ locked down so people can't test pieces of it with the required granularity, and doesn't have the imagination to test this sort of thing? i found an article about using the autoencoder of stable diffusion for image compression (14% improvement over jpeg when taking a bunch of extra steps to minimize latent size; cool and all but not that impressive). what happens when you run a book or article or whatever through the latent space of gpt-3 instead of just prompting it "write this in someone else's style"? what's the file size of the latent and how accurately does it come out on the other end, either semantically or in terms of exact wording?
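One rough way to run the experiment the post is asking about, for any encoder/decoder pair (a generic sketch with placeholder `encode`/`decode` functions, not an actual GPT or Stable Diffusion pipeline):

```python
import numpy as np

def compression_report(encode, decode, x):
    """Compare the stored latent size to the input size and measure reconstruction error.

    `encode` and `decode` are placeholders for whatever model is being tested.
    """
    z = encode(x)
    x_hat = decode(z)
    ratio = z.nbytes / x.nbytes                      # latent bytes vs. original bytes
    err = float(np.mean((x - x_hat) ** 2))           # how faithfully it comes back out
    return {"compression_ratio": ratio, "mse": err}

# Toy stand-ins: quantize to float16 and truncate half the dimensions.
x = np.random.rand(1, 1024).astype(np.float32)
enc = lambda a: a[:, :512].astype(np.float16)
dec = lambda z: np.concatenate([z.astype(np.float32), np.zeros((1, 512), np.float32)], axis=1)
print(compression_report(enc, dec, x))
```

For text models the interesting part is the second half of the question: the error would have to be measured semantically (or as exact-wording overlap), which this sketch does not attempt.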
6 notes · View notes
jcmarchi · 15 days
Text
Sapiens: Foundation for Human Vision Models
New Post has been published on https://thedigitalinsider.com/sapiens-foundation-for-human-vision-models/
The remarkable success of large-scale pretraining followed by task-specific fine-tuning for language modeling has established this approach as a standard practice. Similarly, computer vision methods are progressively embracing extensive data scales for pretraining. The emergence of large datasets, such as LAION5B, Instagram-3.5B, JFT-300M, LVD-142M, Visual Genome, and YFCC100M, has enabled the exploration of a data corpus well beyond the scope of traditional benchmarks. Salient work in this domain includes DINOv2, MAWS, and AIM. DINOv2 achieves state-of-the-art performance in generating self-supervised features by scaling the contrastive iBot method on the LVD-142M dataset. MAWS studies the scaling of masked-autoencoders (MAE) on a billion images. AIM explores the scalability of autoregressive visual pretraining similar to BERT for vision transformers. In contrast to these methods, which mainly focus on general image pretraining or zero-shot image classification, Sapiens takes a distinctly human-centric approach: Sapiens’ models leverage a vast collection of human images for pretraining, subsequently fine-tuning for a range of human-related tasks. The pursuit of large-scale 3D human digitization remains a pivotal goal in computer vision.
Significant progress has been made within controlled or studio environments, yet challenges persist in extending these methods to unconstrained environments. To address these challenges, developing versatile models capable of multiple fundamental tasks, such as key point estimation, body-part segmentation, depth estimation, and surface normal prediction from images in natural settings, is crucial. In this work, Sapiens aims to develop models for these essential human vision tasks that generalize to in-the-wild settings. Currently, the largest publicly accessible language models contain upwards of 100B parameters, while the more commonly used language models contain around 7B parameters. In contrast, Vision Transformers (ViT), despite sharing a similar architecture, have not been scaled to this extent successfully. While there are notable endeavors in this direction, including the development of a dense ViT-4B trained on both text and images, and the formulation of techniques for the stable training of a ViT-22B, commonly utilized vision backbones still range between 300M and 600M parameters and are primarily pre-trained at an image resolution of about 224 pixels. Similarly, existing transformer-based image generation models, such as DiT, use less than 700M parameters and operate on a highly compressed latent space. To address this gap, Sapiens introduces a collection of large, high-resolution ViT models that are pretrained natively at a 1024-pixel image resolution on millions of human images.
Sapiens presents a family of models for four fundamental human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Sapiens models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images. Sapiens observes that, given the same computational budget, self-supervised pre-training on a curated dataset of human images significantly boosts performance for a diverse set of human-centric tasks. The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic. The simple model design also brings scalability—model performance across tasks improves as the number of parameters scales from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks, achieving significant improvements over prior state-of-the-art results: 7.6 mAP on Humans-5K (pose), 17.1 mIoU on Humans-2K (part-seg), 22.4% relative RMSE on Hi4D (depth), and 53.5% relative angular error on THuman2 (normal). 
Recent years have witnessed remarkable strides toward generating photorealistic humans in 2D and 3D. The success of these methods is greatly attributed to the robust estimation of various assets such as 2D key points, fine-grained body-part segmentation, depth, and surface normals. However, robust and accurate estimation of these assets remains an active research area, and complicated systems to boost performance for individual tasks often hinder wider adoption. Moreover, obtaining accurate ground-truth annotation in-the-wild is notoriously difficult to scale. Sapiens’ goal is to provide a unified framework and models to infer these assets in-the-wild, unlocking a wide range of human-centric applications for everyone.
Sapiens argues that such human-centric models should satisfy three criteria: generalization, broad applicability, and high fidelity. Generalization ensures robustness to unseen conditions, enabling the model to perform consistently across varied environments. Broad applicability indicates the versatility of the model, making it suitable for a wide range of tasks with minimal modifications. High fidelity denotes the ability of the model to produce precise, high-resolution outputs, essential for faithful human generation tasks. This paper details the development of models that embody these attributes, collectively referred to as Sapiens.
Following these insights, Sapiens leverages large datasets and scalable model architectures, which are key for generalization. For broader applicability, Sapiens adopts the pretrain-then-finetune approach, enabling post-pretraining adaptation to specific tasks with minimal adjustments. This approach raises a critical question: What type of data is most effective for pretraining? Given computational limits, should the emphasis be on collecting as many human images as possible, or is it preferable to pretrain on a less curated set to better reflect real-world variability? Existing methods often overlook the pretraining data distribution in the context of downstream tasks. To study the influence of pretraining data distribution on human-specific tasks, Sapiens collects the Humans-300M dataset, featuring 300 million diverse human images. These unlabeled images are used to pre-train a family of vision transformers from scratch, with parameter counts ranging from 300M to 2B.
Among various self-supervision methods for learning general-purpose visual features from large datasets, Sapiens chooses the masked-autoencoder (MAE) approach for its simplicity and efficiency in pretraining. MAE, having a single-pass inference model compared to contrastive or multi-inference strategies, allows processing a larger volume of images with the same computational resources. For higher fidelity, in contrast to prior methods, Sapiens increases the native input resolution of its pretraining to 1024 pixels, resulting in approximately a 4× increase in FLOPs compared to the largest existing vision backbone. Each model is pretrained on 1.2 trillion tokens. For fine-tuning on human-centric tasks, Sapiens uses a consistent encoder-decoder architecture. The encoder is initialized with weights from pretraining, while the decoder, a lightweight and task-specific head, is initialized randomly. Both components are then fine-tuned end-to-end. Sapiens focuses on four key tasks: 2D pose estimation, body-part segmentation, depth, and normal estimation, as demonstrated in the following figure. 
Consistent with prior studies, Sapiens affirms the critical impact of label quality on the model’s in-the-wild performance. Public benchmarks often contain noisy labels, providing inconsistent supervisory signals during model fine-tuning. At the same time, it is important to utilize fine-grained and precise annotations to align closely with Sapiens’ primary goal of 3D human digitization. To this end, Sapiens proposes a substantially denser set of 2D whole-body key points for pose estimation and a detailed class vocabulary for body part segmentation, surpassing the scope of previous datasets. Specifically, Sapiens introduces a comprehensive collection of 308 key points encompassing the body, hands, feet, surface, and face. Additionally, Sapiens expands the segmentation class vocabulary to 28 classes, covering body parts such as the hair, tongue, teeth, upper/lower lip, and torso. To guarantee the quality and consistency of annotations and a high degree of automation, Sapiens utilizes a multi-view capture setup to collect pose and segmentation annotations. Sapiens also utilizes human-centric synthetic data for depth and normal estimation, leveraging 600 detailed scans from RenderPeople to generate high-resolution depth maps and surface normals. Sapiens demonstrates that the combination of domain-specific large-scale pretraining with limited, yet high-quality annotations leads to robust in-the-wild generalization. Overall, Sapiens’ method shows an effective strategy for developing highly precise discriminative models capable of performing in real-world scenarios without the need for collecting a costly and diverse set of annotations.
Sapiens: Method and Architecture
Sapiens follows the masked-autoencoder (MAE) approach for pretraining. The model is trained to reconstruct the original human image given its partial observation. Like all autoencoders, Sapiens’ model has an encoder that maps the visible image to a latent representation and a decoder that reconstructs the original image from this latent representation. The pretraining dataset consists of both single and multi-human images, with each image resized to a fixed size with a square aspect ratio. Similar to ViT, the image is divided into regular non-overlapping patches with a fixed patch size. A subset of these patches is randomly selected and masked, leaving the rest visible. The proportion of masked patches to visible ones, known as the masking ratio, remains fixed throughout training.
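A minimal sketch of that masking step (not the Sapiens code; the patch size and masking ratio below are illustrative defaults), assuming PyTorch:

```python
import torch

def mask_patches(img, patch_size=16, mask_ratio=0.75):
    """Split an image into non-overlapping patches and randomly hide a fixed fraction."""
    B, C, H, W = img.shape
    patches = img.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size * patch_size)
    num_patches = patches.shape[1]
    num_visible = int(num_patches * (1 - mask_ratio))
    # Independent random permutation per image; keep only the first `num_visible` patches.
    ids = torch.rand(B, num_patches).argsort(dim=1)
    visible = torch.gather(patches, 1, ids[:, :num_visible, None].expand(-1, -1, patches.shape[-1]))
    return visible, ids            # the encoder sees `visible`; the decoder reconstructs the rest

x = torch.rand(2, 3, 1024, 1024)   # Sapiens pretrains natively at 1024 x 1024
visible, ids = mask_patches(x)
print(visible.shape)               # torch.Size([2, 1024, 768]) at a 0.75 mask ratio
```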
Sapiens’ models exhibit generalization across a variety of image characteristics, including scales, crops, the age and ethnicity of subjects, and the number of subjects. Each patch token in the model accounts for 0.02% of the image area compared to 0.4% in standard ViTs, a 16× reduction—providing fine-grained inter-token reasoning for the models. Even with an increased mask ratio of 95%, Sapiens’ model achieves a plausible reconstruction of human anatomy on held-out samples. The reconstruction of Sapiens’ pre-trained model on unseen human images is demonstrated in the following image.
Furthermore, Sapiens utilizes a large proprietary dataset for pretraining, consisting of approximately 1 billion in-the-wild images, focusing exclusively on human images. The preprocessing involves discarding images with watermarks, text, artistic depictions, or unnatural elements. Sapiens then uses an off-the-shelf person bounding-box detector to filter images, retaining those with a detection score above 0.9 and bounding box dimensions exceeding 300 pixels. Over 248 million images in the dataset contain multiple subjects. 
2D Pose Estimation
The Sapiens framework finetunes the encoder and decoder in P across multiple skeletons, including K = 17 [67], K = 133 [55], and a new highly detailed skeleton with K = 308, as shown in the following figure.
Compared to existing formats with at most 68 facial key points, Sapiens’ annotations consist of 243 facial key points, including representative points around the eyes, lips, nose, and ears. This design is tailored to meticulously capture the nuanced details of facial expressions in the real world. With these key points, the Sapiens framework manually annotated 1 million images at 4K resolution from an indoor capture setup. Similar to previous tasks, Sapiens sets the decoder output channels of the normal estimator N to 3, corresponding to the xyz components of the normal vector at each pixel. The generated synthetic data is also used as supervision for surface normal estimation.
Sapiens: Experiments and Results
Sapiens-2B is pretrained using 1024 A100 GPUs for 18 days with PyTorch. Sapiens uses the AdamW optimizer for all experiments. The learning schedule includes a brief linear warm-up, followed by cosine annealing for pretraining and linear decay for finetuning. All models are pretrained from scratch at a resolution of 1024 × 1024 with a patch size of 16. For finetuning, the input image is resized to a 4:3 ratio, i.e., 1024 × 768. Sapiens applies standard augmentations like cropping, scaling, flipping, and photometric distortions. A random background from non-human COCO images is added for segmentation, depth, and normal prediction tasks. Importantly, Sapiens uses differential learning rates to preserve generalization, with lower learning rates for initial layers and progressively higher rates for subsequent layers. The layer-wise learning rate decay is set to 0.85 with a weight decay of 0.1 for the encoder.
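The layer-wise learning-rate decay can be implemented as per-layer parameter groups; the sketch below is illustrative (PyTorch assumed, with stand-in layers and a hypothetical base learning rate), not the Sapiens training code:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(*[nn.Linear(64, 64) for _ in range(6)])   # stand-in for transformer blocks

def layerwise_param_groups(layers, base_lr=1e-4, decay=0.85, weight_decay=0.1):
    """Later (deeper) layers keep the base rate; earlier layers get progressively smaller rates."""
    groups = []
    num_layers = len(layers)
    for i, layer in enumerate(layers):
        scale = decay ** (num_layers - 1 - i)        # the first layer gets the smallest learning rate
        groups.append({"params": layer.parameters(), "lr": base_lr * scale,
                       "weight_decay": weight_decay})
    return groups

opt = torch.optim.AdamW(layerwise_param_groups(list(encoder)))
for g in opt.param_groups:
    print(round(g["lr"], 6))
```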
The design specifications of Sapiens are detailed in the following table. Sapiens prioritizes scaling models by width rather than depth. Notably, the Sapiens-0.3B model, while architecturally similar to the traditional ViT-Large, requires twentyfold more FLOPs due to its higher resolution.
Sapiens is fine-tuned for face, body, feet, and hand (K = 308) pose estimation using high-fidelity annotations. For training, Sapiens uses the train set with 1M images, and for evaluation, it uses the test set, named Humans5K, with 5K images. The evaluation follows a top-down approach, where Sapiens uses an off-the-shelf detector for bounding boxes and conducts single human pose inference. Table 3 shows a comparison of Sapiens models with existing methods for whole-body pose estimation. All methods are evaluated on 114 common key points between Sapiens’ 308 key point vocabulary and the 133 key point vocabulary from COCO-WholeBody. Sapiens-0.6B surpasses the current state-of-the-art, DWPose-l, by +2.8 AP. Unlike DWPose, which utilizes a complex student-teacher framework with feature distillation tailored for the task, Sapiens adopts a general encoder-decoder architecture with large human-centric pretraining.
Interestingly, even with the same parameter count, Sapiens models demonstrate superior performance compared to their counterparts. For instance, Sapiens-0.3B exceeds VitPose+-L by +5.6 AP, and Sapiens-0.6B outperforms VitPose+-H by +7.9 AP. Within the Sapiens family, results indicate a direct correlation between model size and performance. Sapiens-2B sets a new state-of-the-art with 61.1 AP, a significant improvement of +7.6 AP over the prior art. Despite fine-tuning with annotations from an indoor capture studio, Sapiens demonstrates robust generalization to real-world scenarios, as shown in the following figure.
Sapiens is fine-tuned and evaluated using a segmentation vocabulary of 28 classes. The train set consists of 100K images, while the test set, Humans-2K, consists of 2K images. Sapiens is compared with existing body-part segmentation methods fine-tuned on the same train set, using the suggested pretrained checkpoints by each method as initialization. Similar to pose estimation, Sapiens shows generalization in segmentation, as demonstrated in the following table.
Interestingly, the smallest model, Sapiens-0.3B, outperforms existing state-of-the-art segmentation methods like Mask2Former and DeepLabV3+ by 12.6 mIoU due to its higher resolution and large human-centric pretraining. Furthermore, increasing the model size further improves segmentation performance. Sapiens-2B achieves the best performance, with 81.2 mIoU and 89.4 mAcc on the test set. The following figure shows the qualitative results of Sapiens models.
Conclusion
Sapiens represents a significant step toward advancing human-centric vision models into the realm of foundation models. Sapiens models demonstrate strong generalization capabilities across a variety of human-centric tasks. The state-of-the-art performance is attributed to: (i) large-scale pretraining on a curated dataset specifically tailored to understanding humans, (ii) scaled high-resolution and high-capacity vision transformer backbones, and (iii) high-quality annotations on augmented studio and synthetic data. Sapiens models have the potential to become a key building block for a multitude of downstream tasks and provide access to high-quality vision backbones to a significantly wider part of the community. 
1 note · View note
eggshellsareneat · 1 year
Text
Ok, social media bot idea: takes all the profile pictures in a reply section, runs em through an autoencoder, and gives you the statistically average profile picture that's commented on the post.
No clue if it'd work at all, but it'd be funny
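In spirit it is just averaging in latent space and decoding; a rough sketch, assuming some trained encoder/decoder pair (the tiny placeholder networks below are hypothetical) and the avatars stacked as a tensor:

```python
import torch
import torch.nn as nn

# Placeholder autoencoder; in practice you'd load a pretrained one.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
decoder = nn.Sequential(nn.Linear(128, 3 * 64 * 64), nn.Sigmoid(), nn.Unflatten(1, (3, 64, 64)))

avatars = torch.rand(50, 3, 64, 64)              # the reply section's profile pictures
with torch.no_grad():
    mean_latent = encoder(avatars).mean(dim=0, keepdim=True)   # "statistically average" in latent space
    average_avatar = decoder(mean_latent)                      # decode back to an image
print(average_avatar.shape)                      # torch.Size([1, 3, 64, 64])
```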
4 notes · View notes
kreuzaderny · 2 years
Photo
A deep-learning search for technosignatures of 820 nearby stars
The goal of the Search for Extraterrestrial Intelligence (SETI) is to quantify the prevalence of technological life beyond Earth via their "technosignatures". One theorized technosignature is narrowband Doppler drifting radio signals. The principal challenge in conducting SETI in the radio domain is developing a generalized technique to reject human radio frequency interference (RFI). Here, we present the most comprehensive deep-learning based technosignature search to date, returning 8 promising ETI signals of interest for re-observation as part of the Breakthrough Listen initiative. The search comprises 820 unique targets observed with the Robert C. Byrd Green Bank Telescope, totaling over 480 hours of on-sky data. We implement a novel beta-Convolutional Variational Autoencoder to identify technosignature candidates in a semi-unsupervised manner while keeping the false positive rate manageably low. This new approach presents itself as a leading solution in accelerating SETI and other transient research into the age of data-driven astronomy.
5 notes · View notes
thedevmaster-tdm · 13 days
Text
MIND-BLOWING Semantic Data Secrets Revealed in AI and Machine Learning
1 note · View note
womaneng · 1 year
Text
🎀 𝙒𝙝𝙖𝙩 𝙞𝙨 𝙩𝙝𝙚 𝙙𝙞𝙛𝙛𝙚𝙧𝙚𝙣𝙘𝙚 𝙗𝙚𝙩𝙬𝙚𝙚𝙣 𝙎𝙪𝙥𝙚𝙧𝙫𝙞𝙨𝙚𝙙 𝙡𝙚𝙖𝙧𝙣𝙞𝙣𝙜, 𝙐𝙣𝙨𝙪𝙥𝙚𝙧𝙫𝙞𝙨𝙚𝙙 𝙡𝙚𝙖𝙧𝙣𝙞𝙣𝙜 𝙖𝙣𝙙 𝙍𝙚𝙞𝙣𝙛𝙤𝙧𝙘𝙚𝙢𝙚𝙣𝙩 𝙡𝙚𝙖𝙧𝙣𝙞𝙣𝙜?
Machine learning is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead.
Building a model by learning the patterns of historical data, with some relationship between data, to make a data-driven prediction.
🍫🐙 Types of Machine Learning
🐏 Supervised Learning
🐏 Unsupervised Learning
🐏 Reinforcement Learning
☝🐣 𝑺𝒖𝒑𝒆𝒓𝒗𝒊𝒔𝒆𝒅 𝒍𝒆𝒂𝒓𝒏𝒊𝒏𝒈
In a supervised learning model, the algorithm learns on a labeled dataset to generate reasonable predictions for the response to new data (forecasting the outcome of new data).
★ Regression
★Classification
🐘 🎀 𝗨𝗻𝘀𝘂𝗽𝗲𝗿𝘃𝗶𝘀𝗲𝗱 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴
An unsupervised model, in contrast, provides unlabelled data that the algorithm tries to make sense of by extracting features, co-occurrence and underlying patterns on its own. We use unsupervised learning for:
★ Clustering
★ Anomaly detection
★Association
★ Autoencoders
🐊🐸 𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴
Reinforcement learning is less supervised and depends on the learning agent to determine output solutions by exploring different possible ways to achieve the best possible solution.
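To make the first two concrete, a toy sketch with scikit-learn (random data, purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.random.rand(100, 4)                   # features
y = (X[:, 0] > 0.5).astype(int)              # labels exist -> supervised setting

clf = LogisticRegression(max_iter=1000).fit(X, y)   # supervised: learn from labeled examples
print(clf.predict(X[:3]))

km = KMeans(n_clusters=2, n_init=10).fit(X)         # unsupervised: find structure without labels
print(km.labels_[:3])
```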
.
.
.
#𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁learning #machinelearning #dataanalytics #datascience #python #veribilimi #ai #uk #dataengineering
2 notes · View notes