#multimodal
Explore tagged Tumblr posts
Text
Sotrigender Pride Flag
Sotrigender or tritrisogender/trisotrigender: trimodal trigender in which someone is iso, trans, and cis; being trisogender as a result of being trigender; or being trigender as a result of being trisomodal.
#ap#sotrigender#trigender#trimodal#trisomodal#multimodal#gender modality#multimodality#trimodality#genders#neogender#gender umbrella#mogai coining#liom coin#pride flag#isogender#transgender#trans#cis#iso#cisgender#trisgender#isotrans#isocis#tris#trismodal#trisogender
38 notes
Text
Pegasus 1.2: High-Performance Video Language Model

Pegasus 1.2 brings high accuracy and low latency to long-form video AI. This commercial tool supports scalable video querying.
TwelveLabs and Amazon Web Services (AWS) announced that Amazon Bedrock will soon offer Marengo and Pegasus, TwelveLabs' cutting-edge multimodal foundation models. Amazon Bedrock is a managed service that lets developers access top AI models from leading organisations through a single API. With seamless access to TwelveLabs' video-understanding capabilities, developers and companies can transform how they search, analyse, and derive insights from video content, backed by AWS's security, privacy, and performance. AWS will be the first cloud provider to offer TwelveLabs' models.
Introducing Pegasus 1.2
Unlike many academic settings, real-world video applications face two challenges:
Real-world videos can range from seconds to hours in length.
Proper temporal understanding is required.
To meet these commercial demands, TwelveLabs is announcing Pegasus 1.2, a substantial upgrade to its industry-grade video language model. Pegasus 1.2 interprets long videos at state-of-the-art levels. With low latency, low cost, and best-in-class accuracy, the model can handle hour-long videos. Its embedded storage caches indexed videos, making it faster and cheaper to query the same video repeatedly.
Pegasus 1.2 delivers business value through an intelligent, focused system architecture and excels in production-grade video processing pipelines.
Superior video language model for extended videos
Businesses need to handle long videos, yet processing time and time-to-value are important concerns. As input videos grow longer, a standard video processing/inference system cannot cope with the orders-of-magnitude increase in frames, making it unsuitable for general adoption and commercial use. A commercial system must also answer prompts and queries accurately across longer time spans.
Latency
To evaluate Pegasus 1.2's speed, TwelveLabs compared time-to-first-token (TTFT) on 3–60-minute videos against the frontier model APIs GPT-4o and Gemini 1.5 Pro. Pegasus 1.2 shows consistently low time-to-first-token latency for videos up to 15 minutes and responds faster on longer material, thanks to its video-focused model design and optimised inference engine.
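Time-to-first-token can be measured generically as the delay between sending a request and receiving the first streamed token. This is an illustrative helper for any token stream, not TwelveLabs' benchmarking code.

```python
import time

def time_to_first_token(stream):
    """Return seconds elapsed until the first token arrives from a streaming
    response, or None if the stream produces nothing."""
    start = time.perf_counter()
    for _ in stream:
        # The first iteration corresponds to the first token arriving.
        return time.perf_counter() - start
    return None
```

It works with any iterator of streamed tokens, so the same helper can time different model APIs for a like-for-like TTFT comparison.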
Performance
Pegasus 1.2 is compared to frontier model APIs on VideoMME-Long, a subset of Video-MME containing videos longer than 30 minutes. Pegasus 1.2 outperforms all flagship APIs, demonstrating state-of-the-art performance.
Pricing
Pegasus 1.2 provides best-in-class commercial video processing at low cost. Rather than trying to do everything, TwelveLabs focuses on long videos and accurate temporal information. This focused approach lets its highly optimised system perform well at a competitive price.
Better still, the system can run many video-to-text generations at little extra cost. Pegasus 1.2 produces rich video embeddings from indexed videos and stores them in its database for future API queries, allowing clients to build on them continually at low cost. Google Gemini 1.5 Pro's context cache costs $4.50 per hour of storage for 1 million tokens, roughly the token count of an hour of video; over a month that comes to about $3,240 per video-hour. TwelveLabs' integrated storage costs $0.09 per video-hour per month, about 36,000× less. This particularly benefits customers with large video archives who need to understand everything they hold at low cost.
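The 36,000× figure follows from converting Gemini's per-hour cache price to a monthly cost. A back-of-envelope check, using the figures from the post (the 24 h × 30 day monthly conversion is an assumption):

```python
# Back-of-envelope version of the pricing comparison quoted above.
gemini_cache_per_hour = 4.50         # $ per storage-hour for ~1M cached tokens (~1 h of video)
hours_per_month = 24 * 30            # ~720 storage-hours in a month
gemini_per_video_hour_month = gemini_cache_per_hour * hours_per_month  # about $3,240
pegasus_per_video_hour_month = 0.09  # $ per video-hour per month (embedded storage)
ratio = gemini_per_video_hour_month / pegasus_per_video_hour_month     # about 36,000x
print(round(ratio))
```

The gap comes almost entirely from the billing unit: per storage-hour versus per month.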
Model Overview & Limitations
Architecture
Pegasus 1.2's encoder-decoder architecture for video understanding comprises a video encoder, a tokeniser, and a large language model. Though efficient, its design allows full analysis of textual and visual data.
These components form a cohesive system that can understand both long-term context and fine-grained detail. The architecture shows that small models can interpret video well through careful design decisions and creative solutions to fundamental multimodal processing challenges.
Restrictions
Safety and bias
Pegasus 1.2 includes safety protections, but like any AI model it can produce objectionable or harmful material without adequate oversight and control. Safety and ethics for video foundation models are still being studied; TwelveLabs will publish a full assessment and ethics report after further testing and feedback.
Hallucinations
Pegasus 1.2 may occasionally produce incorrect findings. Despite improvements over Pegasus 1.1 in reducing hallucinations, users should be aware of this limitation, especially for tasks requiring precision and factual accuracy.
#technology#technews#govindhtech#news#technologynews#AI#artificial intelligence#Pegasus 1.2#TwelveLabs#Amazon Bedrock#Gemini 1.5 Pro#multimodal#API
2 notes
Text
any experienced multimodal analysts have any Thoughts on ELAN? i'm on a hunt for a mac-compatible software for annotating vids that allows for a customizable coding scheme. lots of the ones i've seen are for conversation analysis -- which is awesome, but not aligned with my needs
#michelle's thesis#yes a new tag lol#gradblr#ELAN#multimodal#LOL i have no idea what to tag this so ppl can see it#conversation analysis#studyblr#research#phdblr#graduate school#grad student#grad school#grad studies#救命#for context i'm analyzing long-form video essays -- a descriptive sort of component descriptive analysis?#so the often crazy and chaotic multimodal/semiotic entanglements ... warrant a software#personal
2 notes
Text
Content of Multimodality

The image attached above is the graphic I created as a multimodal resource. It displays the eight concepts of rhetoric, serving as a guide to the complexities of writing: specifically, how multiple variables influence the writer's literary technique and the viewer's receptive perception. Arranged in a well-orchestrated diagram, the graphic shows the viewer how each concept relates to the others, illustrating that rhetoric isn't effective if one piece is missing from the "symmetric" image. The definitions were added as "mini notes" for each concept of rhetoric, for people like me who may be unfamiliar with one or two terms. As someone who never really knew what a discourse community was, I found the graphic helpful for remembering its premise through a memorable layout.
5 notes
Text
Shape: E (Multimodal, Roughly Symmetrical)
25K notes
Text
SciTech Chronicles. . . . . . . . .Mar 24th, 2025
#genes#chromosome#hippocampus#myelin#Aardvark#multimodal#GFS#democratising#cognitive#assessment#Human-AI#Collaboration#E2A#192Tbit/s#Alcatel#2028#genomics#bioinformatics#antimicrobial
0 notes
Text
oops I missed this the first time 😭
Shape: Multimodal (?), Skewed Left
and did you have to look it up
24K notes
Text
Nanoplatform strategy for multimodal cancer therapy and tumor visualization
Announcing a new publication for Acta Materia Medica journal. Treatment of cancer can be challenging, because of the disease’s intricate and varied nature. Consequently, developing nanomedicines with multimodal therapeutic capabilities for precise tumor therapy holds substantial promise in advancing cancer treatment. Herein, a nanoplatform strategy involving supramolecular photosensitizers…

View On WordPress
0 notes
Text
INTERFACES AND THE MULTIMODAL APPROACH: ONE LIKELY FUTURE FOR AI
Reading time: 6 minutes (+ videos). Keywords: AI, multimodal, Proof of Concept, architecture, teaching, learning, failure. Number of "pages": 2. In brief: In December 2024, Google reclaimed its place in AI research while OpenAI struggled with its entropy problems. Google not only delivered a more capable Gemini, but above all a multimodal one. Probably…
0 notes
Text
Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL! | Qwen
0 notes
Text
Shape: Multimodal, Roughly Symmetrical (?)
Target audience
402 notes
Text
TRASMVTATO Magazine
#Aztec mythology#biobots#colonial-era texts#creative imagination#creativity#cultural philosophy#digital consciousness#dystopias#editorial#exoskeletons#experimental work#fragmented narratives#futuristic narratives#gender#genetic advancements#human evolution#human transmutation#identity#interdisciplinary#Jungian archetypes#Kabbalah#lunar voyage#Manuel Antonio Rivas#memory#meteorite impacts#Mortality#multimodal#mystical elements#nanorobotics#Oumuamua
0 notes
Text
Discover Project Astra, powered by Gemini 2.0: a revolutionary AI assistant with cross-device intelligence, multimodal understanding, and seamless adaptability.
#ortmoragency#deliveringdigitalhappiness#artificialintelligence#AIRevolution#FutureOfAI#AIInnovation#smarttechnology#AIFuture#MachineLearning#IntelligentSystems#AIForGood#TechEvolution#ProjectAstra#Gemini#revolutionary#nextgenassistant#multimodal#smartadaptation#seamlesstechnology
0 notes
Text
Shape: Multimodal, Skewed Right
No im not putting a fucking graduated cylinder contraptions only
#s#poll#multimodal#skewed right#I have a therory about why the distribution is the way it is but I shan't say
181 notes
Text




Week 9, October 23rd, 2024
We’ve arrived at the ninth week of the course, past the half-way point in the semester. I was feeling comfortable in my understanding of improvisational performing, particularly with my small number of group mates. We were experimenting with combining visual and audio elements in this performance. One group mate built a MAX patch that made for some cool audio outputs, and my other group mate brought in a hand-made circuit board that also made for some cool audio!! In our first go, I was on a MIDI controller that handled the delay and the X, Y, Z planes of the visuals. I also controlled the volume of my other group mates who were producing audio. This performance was, by far, my favourite of the course. It was here that I began to develop a taste for the style and elements I enjoy in an improvisation performance.
I’ve provided some stills from the taped performance. Link to the videos: https://drive.google.com/file/d/1--fQ-ZLiJezhbJldqH9OtdidCRyhiEdc/view?usp=drivesdk
0 notes