#Data Sources
Explore tagged Tumblr posts
Text
🚀 Exciting news! Google has launched Gemini 2.0 and AI Mode, transforming how we search. Get ready for faster, smarter responses to complex queries! Explore the future of AI in search today! #GoogleAI #Gemini2 #AIMode #SearchInnovation
#accessibility features#advanced mathematics#advanced reasoning#AI Mode#AI Overviews#AI Premium#AI Technology#AI-driven responses#coding assistance#data sources#digital marketing#fact-checking#Gemini 2.0#Google AI#Google One#image input#information synthesis#Knowledge Graph#multimodal search#Query Fan-Out#response accuracy#search algorithms#search enhancement#search innovation#text interaction#User Engagement#voice interaction
0 notes
Text
Data sources (not completed)
1. Factory worker payroll (code change not yet completed)
2. Manual workers' salaries (data not yet found)
3. Luxury brand financial reports (not sure how to pull the data)
0 notes
Text
Where can businesses find reliable data for audience segmentation?
Sources of Reliable Data for Audience Segmentation:
Businesses can find reliable data for audience segmentation from various sources, ensuring accuracy and relevance for targeted marketing efforts.
Some key sources include:
1. Zero-Party Data:
Data that customers intentionally and proactively share with businesses, enabling them to personalize their marketing strategies more effectively. Also refer to email marketing data.
2. First-Party Data:
Data obtained directly from customer interactions on websites and sales channels. Provides authentic insights into customer behavior and preferences.
Refer also to Google Analytics, Google Search Console, and other marketing analytics platforms.
3. Surveys and Interviews:
Surveys and interviews gather accurate, basic information about audience interests and behaviors, offering insights directly from the audience and helping businesses understand their needs and motivations.
4. Behavioral Data:
This captures how the audience behaves, including their actions on websites, and provides valuable insight into engagement and preferences. Heat maps are a useful tool for collecting it.
5. Social Media Platforms:
This delivers information about audience interactions, such as likes, shares, saves, and comments.
Offers insights into the social engagement patterns of the audience and the type of content that is most engaging.
6. Salespeople Interactions:
This provides firsthand insights into customer preferences, objections, behaviors, and needs, which can be valuable for refining audience segmentation strategies.
7. Customer Service Calls and Feedback:
This offers direct feedback and firsthand information about customer preferences, needs, concerns, and behaviors.
This helps in understanding customer segments and tailoring marketing strategies to better meet their needs.

Image Content Source - LinkedIn Ads Privacy Playbook
In summary, by leveraging these diverse sources, businesses can gather comprehensive and reliable data for audience segmentation, enabling them to tailor their marketing strategies effectively.
Here's related information that you may also find helpful – Marketing Automation Statistics [Accelerate Efficiency and Sales].
#data sources#digital marketing#zero party data#first party data#audience segmentation#marketing analytics
0 notes
Text
Seamless Connectivity: Unleashing Efficiency with Enterprise Integration Services
Elevate your business's operational prowess with our cutting-edge Enterprise Integration Services. We seamlessly unite disparate systems, applications, and processes, fostering real-time collaboration, data accuracy, and streamlined workflows. Experience heightened efficiency and agility as your enterprise embraces a new era of interconnected excellence.

#Enterprise integration services#data sources#integration solutions#data exchange#integrating services#integration scenarios#enterprise integration patterns#business process
0 notes
Text
Primary Research vs Secondary Research: Definitions, Differences, and Examples
Both primary and secondary research hold a significant place in the researcher’s toolkit. Primary research facilitates the collection of fresh, original data, while secondary research leverages existing information to provide context and insights.
#Academic research#Advantages#Analysis#Case studies#Comparative research#data analysis#data collection#Data sources#Definitions#Differences#Disadvantages#Examples#Information sources#Literature review#market xcel#primary data#primary research#qualitative research#Quantitative research#Research comparison#Research design#Research examples#research methodology#Research methods#Research process#research techniques#Research tools#Research types#Secondary data#Secondary research
0 notes
Text
I think it is very cool how tech companies, schools, employers, and universities make it actively difficult to distance yourself from Google, Microsoft, and Apple.
Yes most Linux distros are very stable, way more secure, privacy friendly, and way more customizable. But every institution is built to make technological independence as difficult as possible.
Yelling on the internet that everyone should switch to Linux and FOSS really ignores how much of the technological world is designed to not let that happen.
#yes switch to linux if you can#Data privacy and security needs to be addressed on a much larger legal scale#you cant consume your way out of this my friends#opensuse#linux#open source#data privacy
716 notes
Text
EVERYONE SHUT THE FUCK UP AND READ THIS.
THE CURRENT VOTE COUNT IS NOT THE FINAL RESULT.
IT WILL TAKE SEVERAL DAYS TO COUNT ALL OF THE VOTES AND DETERMINE WHO WINS THE ELECTION (POSSIBLY MORE IF A RECOUNT IS NECESSARY).
IT IS VERY LIKELY THAT THE FINAL RESULTS WILL NOT LOOK LIKE THE INITIAL RESULTS.
STOP DOOMPOSTING AND LEARN HOW THIS SHIT WORKS.
#scary crane rambles#not fandom#let's get serious#us politics#american politics#us election#election night#election 2024#2024 election#kamala harris#donald trump#2024 presidential election#2024 presidential race#voting#us election 2024#it's also highly likely that trump is cheating to make it look like he's won#i've heard from several sources that he prevented certain journalists from accessing election data#most of which were left-leaning or liberal#and i dont think i've ever seen states being called so quickly in any other election#tldr i think trump tried to rush things and is desperately hoping that no one actually bothers to count the votes now#because if they do then it'll be super obvious that he tried to interfere
586 notes
Text
i feel like we as a community don't dwell on the fact that data drinks lube often enough. does he take it like a shooter for maximum efficiency? does he mix it into a silly cocktail with a little umbrella and sip it with a straw to be social? i think about it constantly
#source: he mentions this in 3x13 Deja Q at about the 18 minute mark#the next generation#star trek#data soong#rambles#star trek: tng#deja q#tng: 3x13
209 notes
Text
adding citations to a document is so annoying like do you not trust me with shit? am i not credible enough for you?
#source of this data is trusting me#blog#blogger#writerblr#aesthetic#lol#memes#funny#meme#rofl#music#desi music#desi blr#desi tag#desiblr#desi memes#desi tumblr#being desi#desi teen#desi girl#desi#desi shit posting#desi aesthetic#desi things#desi academia
87 notes
Text
Using AI to Predict a Blockbuster Movie
New Post has been published on https://thedigitalinsider.com/using-ai-to-predict-a-blockbuster-movie/
Using AI to Predict a Blockbuster Movie
Although film and television are often seen as creative and open-ended industries, they have long been risk-averse. High production costs (which may soon lose the offsetting advantage of cheaper overseas locations, at least for US projects) and a fragmented production landscape make it difficult for independent companies to absorb a significant loss.
Therefore, over the past decade, the industry has taken a growing interest in whether machine learning can detect trends or patterns in how audiences respond to proposed film and television projects.
The main data sources remain the Nielsen system (which offers scale, though its roots lie in TV and advertising) and sample-based methods such as focus groups, which trade scale for curated demographics. This latter category also includes scorecard feedback from free movie previews – however, by that point, most of a production’s budget is already spent.
The ‘Big Hit’ Theory/Theories
Initially, ML systems leveraged traditional analysis methods such as linear regression, K-Nearest Neighbors, Stochastic Gradient Descent, Decision Trees and Random Forests, and Neural Networks, usually in various combinations closer in style to pre-AI statistical analysis, such as a 2019 University of Central Florida initiative to forecast successful TV shows based on combinations of actors and writers (among other factors):
A 2018 study rated the performance of episodes based on combinations of characters and/or writer (most episodes were written by more than one person). Source: https://arxiv.org/pdf/1910.12589
The most relevant related work, at least that which is deployed in the wild (though often criticized), is in the field of recommender systems:
A typical video recommendation pipeline. Videos in the catalog are indexed using features that may be manually annotated or automatically extracted. Recommendations are generated in two stages by first selecting candidate videos and then ranking them according to a user profile inferred from viewing preferences. Source: https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2023.1281614/full
However, these kinds of approaches analyze projects that are already successful. In the case of prospective new shows or movies, it is not clear what kind of ground truth would be most applicable – not least because changes in public taste, combined with improvements and augmentations of data sources, mean that decades of consistent data is usually not available.
This is an instance of the cold start problem, where recommendation systems must evaluate candidates without any prior interaction data. In such cases, traditional collaborative filtering breaks down, because it relies on patterns in user behavior (such as viewing, rating, or sharing) to generate predictions. The problem is that in the case of most new movies or shows, there is not yet enough audience feedback to support these methods.
Comcast Predicts
A new paper from Comcast Technology AI, in association with George Washington University, proposes a solution to this problem by prompting a language model with structured metadata about unreleased movies.
The inputs include cast, genre, synopsis, content rating, mood, and awards, with the model returning a ranked list of likely future hits.
The authors use the model’s output as a stand-in for audience interest when no engagement data is available, hoping to avoid early bias toward titles that are already well known.
The very short (three-page) paper, titled Predicting Movie Hits Before They Happen with LLMs, comes from six researchers at Comcast Technology AI, and one from GWU, and states:
‘Our results show that LLMs, when using movie metadata, can significantly outperform the baselines. This approach could serve as an assisted system for multiple use cases, enabling the automatic scoring of large volumes of new content released daily and weekly.
‘By providing early insights before editorial teams or algorithms have accumulated sufficient interaction data, LLMs can streamline the content review process.
‘With continuous improvements in LLM efficiency and the rise of recommendation agents, the insights from this work are valuable and adaptable to a wide range of domains.’
If the approach proves robust, it could reduce the industry’s reliance on retrospective metrics and heavily-promoted titles by introducing a scalable way to flag promising content prior to release. Thus, rather than waiting for user behavior to signal demand, editorial teams could receive early, metadata-driven forecasts of audience interest, potentially redistributing exposure across a wider range of new releases.
Method and Data
The authors outline a four-stage workflow: construction of a dedicated dataset from unreleased movie metadata; the establishment of a baseline model for comparison; the evaluation of apposite LLMs using both natural language reasoning and embedding-based prediction; and the optimization of outputs through prompt engineering in generative mode, using Meta’s Llama 3.1 and 3.3 language models.
Since, the authors state, no publicly available dataset offered a direct way to test their hypothesis (because most existing collections predate LLMs, and lack detailed metadata), they built a benchmark dataset from the Comcast entertainment platform, which serves tens of millions of users across direct and third-party interfaces.
The dataset tracks newly-released movies, and whether they later became popular, with popularity defined through user interactions.
The collection focuses on movies rather than series, and the authors state:
‘We focused on movies because they are less influenced by external knowledge than TV series, improving the reliability of experiments.’
Labels were assigned by analyzing the time it took for a title to become popular across different time windows and list sizes. The LLM was prompted with metadata fields such as genre, synopsis, rating, era, cast, crew, mood, awards, and character types.
For comparison, the authors used two baselines: a random ordering; and a Popular Embedding (PE) model (which we will come to shortly).
The project used large language models as the primary ranking method, generating ordered lists of movies with predicted popularity scores and accompanying justifications – and these outputs were shaped by prompt engineering strategies designed to guide the model’s predictions using structured metadata.
The prompting strategy framed the model as an ‘editorial assistant’ tasked with identifying which upcoming movies were most likely to become popular, based solely on structured metadata; the model was then asked to reorder a fixed list of titles without introducing new items, and to return the output in JSON format.
Each response consisted of a ranked list, assigned popularity scores, justifications for the rankings, and references to any prior examples that influenced the outcome. These multiple levels of metadata were intended to improve the model’s contextual grasp, and its ability to anticipate future audience trends.
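To make this concrete, here is a minimal sketch, in Python, of how such a metadata prompt might be assembled. The paper does not publish its exact templates, so the field names, wording, and example movies below are illustrative assumptions rather than the authors' actual prompt.

```python
import json

# Hypothetical metadata records. The field names follow those listed in the paper
# (genre, synopsis, content rating, era, cast, mood, awards), but the schema and
# example values are assumptions for illustration only.
candidates = [
    {
        "title": "Movie A",
        "genre": "Sci-Fi",
        "synopsis": "A stranded crew races to repair their ship before a star collapses.",
        "content_rating": "PG-13",
        "era": "2020s",
        "cast": ["Actor One", "Actor Two"],
        "mood": "tense",
        "cast_awards": 3,   # major awards held by the top-billed actors
    },
    {
        "title": "Movie B",
        "genre": "Romantic Comedy",
        "synopsis": "Two rival food-truck owners are forced to share a kitchen.",
        "content_rating": "PG",
        "era": "2020s",
        "cast": ["Actor Three"],
        "mood": "light-hearted",
        "cast_awards": 0,
    },
]

# An editorial-assistant style instruction: reorder the fixed list, add nothing,
# and return structured JSON with scores and justifications.
prompt = (
    "You are an editorial assistant. Using only the structured metadata below, "
    "reorder the candidate movies from most to least likely to become popular. "
    "Do not add or remove titles. Return JSON with fields: "
    '"ranking" (list of titles), "scores" (0 to 100 per title), and "justifications".\n\n'
    f"Candidates:\n{json.dumps(candidates, indent=2)}"
)

print(prompt)  # this string would then be sent to an LLM such as Llama 3.1 or 3.3
```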
Tests
The experiment followed two main stages: initially, the authors tested several model variants to establish a baseline, identifying the version that performed better than a random-ordering approach.
Second, they tested large language models in generative mode, by comparing their output to a stronger baseline, rather than a random ranking, raising the difficulty of the task.
This meant the models had to do better than a system that already showed some ability to predict which movies would become popular. As a result, the authors assert, the evaluation better reflected real-world conditions, where editorial teams and recommender systems are rarely choosing between a model and chance, but between competing systems with varying levels of predictive ability.
The Advantage of Ignorance
A key constraint in this setup was the time gap between the models’ knowledge cutoff and the actual release dates of the movies. Because the language models were trained on data that ended six to twelve months before the movies became available, they had no access to post-release information, ensuring that the predictions were based entirely on metadata, and not on any learned audience response.
Baseline Evaluation
To construct a baseline, the authors generated semantic representations of movie metadata using three embedding models: BERT V4; Linq-Embed-Mistral 7B; and Llama 3.3 70B, quantized to 8-bit precision to meet the constraints of the experimental environment.
Linq-Embed-Mistral was selected for inclusion due to its top position on the MTEB (Massive Text Embedding Benchmark) leaderboard.
Each model produced vector embeddings of candidate movies, which were then compared to the average embedding of the top one hundred most popular titles from the weeks preceding each movie’s release.
Popularity was inferred using cosine similarity between these embeddings, with higher similarity scores indicating higher predicted appeal. The ranking accuracy of each model was evaluated by measuring performance against a random ordering baseline.
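As a rough illustration of this Popular Embedding idea, the sketch below scores candidates by cosine similarity to the centroid of recently popular titles. It assumes the metadata embeddings have already been produced by one of the models above; the function and variable names are ours, not the paper's.

```python
import numpy as np

def popular_embedding_scores(candidate_vecs: np.ndarray,
                             top100_vecs: np.ndarray) -> np.ndarray:
    """Score candidates by cosine similarity to the mean embedding of the
    top-100 most popular titles from the weeks before release.
    Shapes: candidate_vecs (n, d), top100_vecs (100, d)."""
    centroid = top100_vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    normed = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    return normed @ centroid  # higher similarity -> higher predicted appeal

# Toy example with random vectors standing in for real metadata embeddings
rng = np.random.default_rng(0)
scores = popular_embedding_scores(rng.normal(size=(5, 384)), rng.normal(size=(100, 384)))
ranking = np.argsort(-scores)  # candidate indices, most to least "popular"
```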
Performance improvement of Popular Embedding models compared to a random baseline. Each model was tested using four metadata configurations: V1 includes only genre; V2 includes only synopsis; V3 combines genre, synopsis, content rating, character types, mood, and release era; V4 adds cast, crew, and awards to the V3 configuration. Results show how richer metadata inputs affect ranking accuracy. Source: https://arxiv.org/pdf/2505.02693
The results (shown above) demonstrate that BERT V4 and Linq-Embed-Mistral 7B delivered the strongest improvements in identifying the top three most popular titles, although both fell slightly short in predicting the single most popular item.
BERT was ultimately selected as the baseline model for comparison with the LLMs, as its efficiency and overall gains outweighed its limitations.
LLM Evaluation
The researchers assessed performance using two ranking approaches: pairwise and listwise. Pairwise ranking evaluates whether the model correctly orders one item relative to another, while listwise ranking considers the accuracy of the entire ordered list of candidates.
This combination made it possible to evaluate not only whether individual movie pairs were ranked correctly (local accuracy), but also how well the full list of candidates reflected the true popularity order (global accuracy).
Full, non-quantized models were employed to prevent performance loss, ensuring a consistent and reproducible comparison between LLM-based predictions and embedding-based baselines.
Metrics
To assess how effectively the language models predicted movie popularity, both ranking-based and classification-based metrics were used, with particular attention to identifying the top three most popular titles.
Four metrics were applied: Accuracy@1 measured how often the most popular item appeared in the first position; Reciprocal Rank captured how high the top actual item ranked in the predicted list by taking the inverse of its position; Normalized Discounted Cumulative Gain (NDCG@k) evaluated how well the entire ranking matched actual popularity, with higher scores indicating better alignment; and Recall@3 measured the proportion of truly popular titles that appeared in the model’s top three predictions.
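For readers who want those definitions spelled out, here is a small generic sketch of how the four metrics can be computed for a single ranked list; it follows the standard formulas rather than the authors' evaluation code.

```python
import numpy as np

def ranking_metrics(predicted: list[str], actual_order: list[str], k: int = 3) -> dict:
    """predicted: model's ranking (best first); actual_order: true popularity order."""
    top_item = actual_order[0]
    rank_of_top = predicted.index(top_item) + 1          # 1-indexed position of the true #1

    # Relevance: 1 if an item is among the truly popular top-k, else 0
    relevant = set(actual_order[:k])
    gains = [1.0 if t in relevant else 0.0 for t in predicted[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))

    return {
        "Accuracy@1": float(predicted[0] == top_item),
        "ReciprocalRank": 1.0 / rank_of_top,
        f"NDCG@{k}": dcg / idcg,
        f"Recall@{k}": len(relevant & set(predicted[:k])) / len(relevant),
    }

# Example: the model swaps the top two titles but gets the rest right
print(ranking_metrics(["B", "A", "C", "D"], ["A", "B", "C", "D"], k=3))
```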
Since most user engagement happens near the top of ranked menus, the evaluation focused on lower values of k, to reflect practical use cases.
Performance improvement of large language models over BERT V4, measured as percentage gains across ranking metrics. Results were averaged over ten runs per model-prompt combination, with the top two values highlighted. Reported figures reflect the average percentage improvement across all metrics.
The performance of Llama models 3.1 (8B), 3.1 (405B), and 3.3 (70B) was evaluated by measuring metric improvements relative to the earlier-established BERT V4 baseline. Each model was tested using a series of prompts, ranging from minimal to information-rich, to examine the effect of input detail on prediction quality.
The authors state:
‘The best performance is achieved when using Llama 3.1 (405B) with the most informative prompt, followed by Llama 3.3 (70B). Based on the observed trend, when using a complex and lengthy prompt (MD V4), a more complex language model generally leads to improved performance across various metrics. However, it is sensitive to the type of information added.’
Performance improved when cast awards were included as part of the prompt – in this case, the number of major awards received by the top five billed actors in each film. This richer metadata was part of the most detailed prompt configuration, outperforming a simpler version that excluded cast recognition. The benefit was most evident in the larger models, Llama 3.1 (405B) and 3.3 (70B), both of which showed stronger predictive accuracy when given this additional signal of prestige and audience familiarity.
By contrast, the smallest model, Llama 3.1 (8B), showed improved performance as prompts became slightly more detailed, progressing from genre to synopsis, but declined when more fields were added, suggesting that the model lacked the capacity to integrate complex prompts effectively, leading to weaker generalization.
When prompts were restricted to genre alone, all models under-performed against the baseline, demonstrating that limited metadata was insufficient to support meaningful predictions.
Conclusion
LLMs have become the poster child for generative AI, which might explain why they’re being put to work in areas where other methods could be a better fit. Even so, there’s still a lot we don’t know about what they can do across different industries, so it makes sense to give them a shot.
In this particular case, as with stock markets and weather forecasting, there is only a limited extent to which historical data can serve as the foundation of future predictions. In the case of movies and TV shows, the very delivery method is now a moving target, in contrast to the period between 1978 and 2011, when cable, satellite, and portable media (VHS, DVD, and the like) represented a series of transitory or evolving historical disruptions.
Neither can any prediction method account for the extent to which the success or failure of other productions may influence the viability of a proposed property – and yet this is frequently the case in the movie and TV industry, which loves to ride a trend.
Nonetheless, when used thoughtfully, LLMs could help strengthen recommendation systems during the cold-start phase, offering useful support across a range of predictive methods.
First published Tuesday, May 6, 2025
#2023#2025#Advanced LLMs#advertising#agents#ai#Algorithms#Analysis#Anderson's Angle#approach#Articles#Artificial Intelligence#attention#Behavior#benchmark#BERT#Bias#collaborative#Collections#comcast#Companies#comparison#construction#content#continuous#data#data sources#dates#Decision Tree#domains
0 notes
Text
A lot of people think autism research is solely this vague grouping of evil non autistic people guessing at things from afar, but as much as that happens, I want to inform you with insider knowledge a lot of modern autism research is done by autistic people!
And a fun fact related to this: "autism" is a common special interest! As in, a lot of autistic people have autism itself as a special interest (esp women, perhaps bc they're likely to be late or self diagnosed). People with a special interest in autism are also more likely to get involved in autism research as participants, and thus there's a known overrepresentation of it as a special interest in data
#that post of “how you can tell a study was designed by an autistic person” frustrates me bc. i know its not everywhere but its MUCH BETTER#tumblr just i think assumes we're still 20 years ago. i promise you a ton of academics working on autism are in fact autistic#and theres a good push for involving autistic people in general w design feedback data everything.#source: friend is cited on the autism wikipedia page <- extremely powerful statement deserving of autism crown#but more specifically just a top uni psychology department. i dont get academia much but ive heard a lot#yes theres stuff lke “is camel milk good for autistic people” (REAL STUDY) but like. the autism is coming from inside the lab#also the post annoys me bc good clarifying phrasing is key to ANY study regardless of anyone being autistic#“overrepp in data” is a funny phrase everyone matters but ppl w special interest in autism are just obvs more keen to. get involved w autis
96 notes
Text
still confused how to make any of these LLMs useful to me.
while my daughter was napping, i downloaded lm studio and got a dozen of the most popular open source LLMs running on my PC, and they work great with very low latency, but i can't come up with anything to do with them but make boring toy scripts to do stupid shit.
as a test, i fed deepseek r1, llama 3.2, and mistral-small a big spreadsheet of data we've been collecting about my newborn daughter (all of this locally, not transmitting anything off my computer, because i don't want anybody with that data except, y'know, doctors) to see how they compared with several real doctors' advice and prognoses. all of the LLMs' suggestions were between generically correct and hilariously wrong. alarmingly wrong in some cases, but usually ending with the suggestion to "consult a medical professional" -- yeah, duh. pretty much no better than old school unreliable WebMD.
then i tried doing some prompt engineering to punch up some of my writing, and everything ended up sounding like it was written by an LLM. i don't get why anybody wants this. i can tell that LLM feel, and i think a lot of people can now, given the horrible sales emails i get every day that sound like they were "punched up" by an LLM. it's got a stink to it. maybe we'll all get used to it; i bet most non-tech people have no clue.
i may write a small script to try to tag some of my blogs' posts for me, because i'm really bad at doing so, but i have very little faith in the open source vision LLMs' ability to classify images. it'll probably not work how i hope. that still feels like something you gotta pay for to get good results.
all of this keeps making me think of ffmpeg. a super cool, tiny, useful program that is very extensible and great at performing a certain task: transcoding media. it used to be horribly annoying to transcode media, and then ffmpeg came along and made it all stupidly simple overnight, but nobody noticed. there was no industry bubble around it.
LLMs feel like they're competing for a space so ubiquitous and useful that we'll take it for granted one day, like ffmpeg. they just haven't fully grasped and appreciated that smallness yet. there isn't money to be made here.
#machine learning#parenting#ai critique#data privacy#medical advice#writing enhancement#blogging tools#ffmpeg#open source software#llm limitations#ai generated tags
61 notes
Text
i am begging the other leftists on my fucking dash to stop reblogging anti-voting stuff until the election is over. there are so many doomers on this website who do not have the critical thinking skills to fucking use their brains and tons of them will be genuinely swayed by what they see online.
for the love of god, queue it for after november 5th. queue all your criticism for then. unleash the fucking beast after the election is over. but it is so fucking irresponsible to be reblogging that shit now.
#wordy wendy#just saw this big ass video on my dash of people in gaza saying the election didnt matter#no link back to the source#so obviously just cherrypicked responses#no additional data. just a viralbait video about how voting won't fix the genocide#my dude. that is not the question. the question is how do we fucking minimize casualities#and who will be the easier president to fucking thrash and bully into making some semblance of progress toward a ceasefire.#you cannot be reblogging shit like that you guys. it is blatant propaganda regardless of if its coming from the epicenter of things.#propaganda does not always come in the form of some boogeyman#most of the time it targets exactly what you feel most passionately about#and makes your complacency feel like righteous action
83 notes
Text
what’s crazy is that just a year ago I was using the data on the CDC website to research HIV statistics/history for a college project, and right now someone who got the same assignment I did might be having a much harder time with that project because that information is being censored by the government
#Me and weird emotional attachment I developed to the cdc data sheets. If nobody got me the cdc data base got me no longer true I fear#And let me tell you it is kinda difficult to do research on American hiv statistics without the cdc’s centralized information#not to mention the info on prep that’s harder to find now. At least for normal people just looking for guidance there’s other#sources. For people doing research this is kinda horrible but at least we have the way back machine
22 notes
Text

this is the lgbtq community. the entire community. its all of them in one picture
#star trek#star trek tng#star trek shitposting#data soong#william riker#yarrr or like whatever her name is#for some reason i cant find a tag for her#shes totally a lesbian tho#source trust me bro
247 notes