#AI Data sourcing | Explore Tumblr posts and blogs

cogitotech · 3 months ago

Cogito Tech Introduces DataSum, a New Standard for Ethical and Transparent AI Data Sourcing

The rapid advancement of artificial intelligence relies on vast amounts of training data, yet concerns persist regarding its sourcing, labeling, and ethical implications. Issues such as biased AI models, mislabeled datasets, and exploitative labor conditions in data annotation have underscored the need for greater transparency in data sourcing. Read more: Cogito Tech Introduces DataSum, a New Standard for Ethical and Transparent AI Data Sourcing

#ai data sourcing #data annotation #datasum

0 notes

cyle · 5 months ago

still confused how to make any of these LLMs useful to me.

while my daughter was napping, i downloaded lm studio and got a dozen of the most popular open source LLMs running on my PC, and they work great with very low latency, but i can't come up with anything to do with them but make boring toy scripts to do stupid shit.

as a test, i fed deepseek r1, llama 3.2, and mistral-small a big spreadsheet of data we've been collecting about my newborn daughter (all of this locally, not transmitting anything off my computer, because i don't want anybody with that data except, y'know, doctors) to see how it compared with several real doctors' advice and prognoses. all of the LLMs' suggestions were between generically correct and hilariously wrong. alarmingly wrong in some cases, but usually ending with the suggestion to "consult a medical professional" -- yeah, duh. pretty much no better than old school unreliable WebMD.

then i tried doing some prompt engineering to punch up some of my writing, and everything ended up sounding like it was written by an LLM. i don't get why anybody wants this. i can tell that LLM feel, and i think a lot of people can now, given the horrible sales emails i get every day that sound like they were "punched up" by an LLM. it's got a stink to it. maybe we'll all get used to it; i bet most non-tech people have no clue.

i may write a small script to try to tag some of my blogs' posts for me, because i'm really bad at doing so, but i have very little faith in the open source vision LLMs' ability to classify images. it'll probably not work how i hope. that still feels like something you gotta pay for to get good results.

all of this keeps making me think of ffmpeg. a super cool, tiny, useful program that is very extensible and great at performing a certain task: transcoding media. it used to be horribly annoying to transcode media, and then ffmpeg came along and made it all stupidly simple overnight, but nobody noticed. there was no industry bubble around it.

LLMs feel like they're competing for a space so ubiquitous and useful that we'll take it for granted, the way we take ffmpeg for granted today. they just haven't fully grasped and appreciated that smallness yet. there isn't money to be made here.

#machine learning #parenting #ai critique #data privacy #medical advice #writing enhancement #blogging tools #ffmpeg #open source software #llm limitations #ai generated tags

61 notes

queen-mabs-revenge · 1 month ago

communist generative ai boosters on this website truly like

#generative ai #yes the cheating through school arguments can skew into personal chastisement instead of criticising the for-profit education system #that's hostile to learning in the first place #and yes the copyright defense is self-defeating and goofy #yes yeeeeeeeeeees i get it but fucking hell now the concept of art is bourgeois lmaao contrarian ass reactionary bullshit #whYYYYYYY are you fighting the alienation war on the side of alienation????#fucking unhinged cold-stream marxism really is just like -- what the fuck are you even fighting for? what even is the point of you?#sorry idk i just think that something that is actively and exponentially heightening capitalist alienation #while calcifying hyper-extractive private infrastructure to capture all energy production as we continue descending into climate chaos #and locking skills that our fucking species has cultivated through centuries of communicative learning behind an algorithmic black box #and doing it on the back of hyperexploitation of labour primarily in the neocolonial world #to try and sort and categorise the human experience into privately owned and traded bits of data capital #explicitly being used to streamline systematic immiseration and further erode human communal connection #OH I DON'T KNOW seems kind of bad!#seems kind of antithetical to and violent against the working class and our class struggle?#seems like everything - including technology - has a class character and isn't just neutral tools we can bend to our benefit #it is literally an exploitation; extraction; and alienation machine - idk maybe that isn't gonna aid the struggle #and flourishing of the full panoply of human experience that - i fucking hope - we're fighting for???#for the fullness of human creative liberation that can only come through the first step of socialist revolution???#that's what i'm fighting for anyway - idk what the fuck some of you are doing #fucking brittle economic marxists genuinely defending a technology that is demonstrably violent to the sources of all value:#the soil and the worker #but sure it'll be fine - abundance babey!#WHEW.

9 notes

bananonbinary · 10 months ago

god i miss when the internet wasn't garbage. you can't google anything these days without whatever answer you're looking for getting lost under a deluge of seo ai bullshit. cannot wait until the bubble pops and we might get useable search engines again.

#and yes. reddit is OFTEN a decent place to find those sorts of answers #but not always #like i just wanted to find out if there's any data on animals cracking their joints #and if not. why are humans so special in doing that?#i can get anecdotal evidence from reddit but nothing from actual experts #all the articles i got online were so obviously ai #they never named sources just 'a behavioral expert said this' and 'a vet said that'#which. if you won't even tell me those supposed experts' names i am SUSPICIOUS

22 notes

pikslasrce · 2 years ago

i get the outrage over the ai generated mv bc i agree, however it irks me that people keep pointing out the wonky/extra fingers/etc as a gotcha, bc i think that's the whole point of using ai for the video. they wanted the "flaws" that come with it, that 'ai generated uncanny valley' vibe. like even tho i disagree w it on an ethical level, i do get where they're coming from artistic direction-wise

#my ONLY gripe with ai generated art is the lack of regulation and like. the way the data is sourced.#piksla.txt #ls dunes debacle

39 notes

onlinemarketingcash4ublogspotcom · 5 months ago

#digital marketing #@desmondjohnson183 #marketing strategy #DeepSeek AI #digital marketing AI #open-source AI #AI in marketing #AI-driven content creation #predictive marketing #AI chatbots #AI-powered advertising #voice search optimization #influencer marketing AI #ethical AI #data analytics #AI customer engagement #AI-powered SEO #future of digital marketing.#Youtube

3 notes

rjohnson49la · 5 months ago

#digital marketing #onlinemarketingtips #seo services #DeepSeek AI #digital marketing AI #open-source AI #AI in marketing #AI-driven content creation #predictive marketing #AI chatbots #AI-powered advertising #voice search optimization #influencer marketing AI #ethical AI #data analytics #AI customer engagement

3 notes

yddaw · 1 year ago

Sometimes it’s unfortunate seeing that a lot of people are anti [insert technology here]. It makes sense of course, but it seems like the idea being shared is that the technological tool itself is “bad” but not the company using it.

Like Chromium is not the same thing as Chrome itself. And AI is not only for stealing content and reselling it. But having so many companies do this and use these tools with little to no regulation (specifically on privacy) paints such a nasty image for the tool that has so much potential 😩

#we LOVE the tools but hate how they’re being used BECAUSE the regulations are in favor of capital 😵‍💫😵‍💫#AI is cool as shit but not when it’s mining illegal data to make profit ya know?#and open source chromium based browsers are great but people see that it’s built off of a tool that has created a monster 😭

6 notes

canadianlucifer · 1 year ago

it's so sad that i can't say "I love AI!" without a million asterisks

#i love the CONCEPT of ai and the good it can do and has done #and how it can do menial tasks so we can focus on more important things #and can ASSIST in artistic inspiration and such #but i HATE generative ai built on unethically sourced training data #and the environmental impact llms and image generators are currently creating #why does ai have to be such a grey topic with so much negative coverage...

2 notes

mysocial8onetech · 2 years ago

Meet Subject-Diffusion, the model that revolutionizes open-domain personalized image generation. No test-time fine-tuning required. Just provide a text description and a reference image and get stunning single- or multi-subject images in any domain. Learn more about this model.

#AI #MachineLearning #DeepLearning #SubjectDiffusion #ImageGeneration #open source #artificial intelligence #machine learning #deep learning #data science

4 notes

nitunio · 2 years ago

I think that if a person knows that something was made using AI trained on unethically sourced data, and still uses it/likes it/supports it/defends it,

then said person should stop "being mad" when their data is used to train AI without consent.

#nitunio.txt #please don't half-ass it in terms of not supporting this stuff #if you like and willingly use writing AI that scrapes the web without consent #then turn around and say 'wahh AI bad' when it concerns digital art. you're just a hypocrite #same goes for photos and music and other creative work #if you come across any 'machine learning AI generation' website immediately go to their FAQ or About sections #just see for yourself if they provide any sources for the data they've used and if it was consensual and only after that #ask yourself if you should be using it or just make something yourself #hell you can even ask somebody or pay somebody to do something you can't do. that's the joy of community #and even then there are many resources that were already made to be used for free with or without credit #i ramble a lot about things like these bc i can't just wrap my head around it #i just need all of these scraped datasets to burn down and self-delete

2 notes

mossquitoman · 2 years ago

what do people even have against ai art lmao what

#either “ai art” is a shorthand way of saying ai trained on unethically sourced data sets #bc idk why else u would care

3 notes

opha · 1 year ago

commercially available image generators consume an average of 3 watt-hours per generation. running the AC for an hour takes a little over 1000. we usually measure energy use in kilowatt-hours, btw, because watt-hours are so negligible in scope that you can pedal a bike for an hour and generate enough energy for 33 image generations, or to run a single 25 watt light bulb for 4 hours.
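The arithmetic here is easy to check. This is a minimal sketch that takes the post's own figures as given (3 Wh per image, a little over 1000 Wh for an hour of AC, 33 generations per hour of pedaling, a 25 W bulb for 4 hours); the variable names are illustrative and the numbers are the post's estimates, not measurements.

```python
# Figures as stated in the post (estimates, not measurements).
WH_PER_IMAGE = 3.0        # average energy per image generation, in watt-hours
AC_WH_PER_HOUR = 1000.0   # "a little over 1000" Wh for one hour of AC
BULB_WATTS = 25.0         # a single 25 W light bulb
BULB_HOURS = 4.0          # hours the bulb runs


def images_for(energy_wh: float) -> float:
    """Return how many image generations a given amount of energy (Wh) covers."""
    return energy_wh / WH_PER_IMAGE


# One hour of pedaling is pegged at 33 generations, i.e. roughly 100 Wh.
bike_hour_wh = 33 * WH_PER_IMAGE
# Energy = power x time, so watts x hours gives watt-hours.
bulb_wh = BULB_WATTS * BULB_HOURS

print(f"bike hour: {bike_hour_wh:.0f} Wh")
print(f"25 W bulb for 4 h: {bulb_wh:.0f} Wh")
print(f"one hour of AC ~ {images_for(AC_WH_PER_HOUR):.0f} images")
```

Both of the post's comparisons land on roughly 100 Wh, so they agree with each other, and an hour of AC at just over 1000 Wh corresponds to a few hundred generations at the post's per-image figure.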

as for water, about 500ml for anywhere between 5 and 50 generations (depends on location and season). all of that comes from data centers, which often (but not always) reuse the water and about a third of which are using non-potable wastewater for cooling in the first place. you know, the stuff you personally use up to the tune of 30-50 gallons per day by flushing the toilet, showering, running the washing machine, etc.
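The water figures can be turned into a per-image number the same way. A minimal sketch using only the post's estimates; it assumes the "30-50 gallons" are US gallons, and the variable names are illustrative.

```python
US_GALLON_ML = 3785.411784  # millilitres in one US gallon (assumed unit)

# Post's estimate: about 500 ml of water per 5 to 50 generations.
WATER_ML = 500.0
GENS_LOW, GENS_HIGH = 5, 50

# Fewer generations per 500 ml means more water per image.
ml_per_image_max = WATER_ML / GENS_LOW   # 100 ml
ml_per_image_min = WATER_ML / GENS_HIGH  # 10 ml

# Post's estimate of one person's daily household use: 30-50 gallons.
daily_ml_low = 30 * US_GALLON_ML
daily_ml_high = 50 * US_GALLON_ML

# Range of generations equal to one day of household use:
# lowest use paired with the thirstiest images, highest use with the leanest.
images_low = daily_ml_low / ml_per_image_max
images_high = daily_ml_high / ml_per_image_min

print(f"water per image: {ml_per_image_min:.0f}-{ml_per_image_max:.0f} ml")
print(f"one day of household use ~ {images_low:,.0f} to {images_high:,.0f} images")
```

At the post's figures, one person's daily household water use corresponds to somewhere between roughly a thousand and nineteen thousand generations, depending on which ends of the two ranges you pair.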

training large language models does consume much more energy, but it's still pretty small fish compared to any other tech industry, and that's not something that is constantly happening.

please don't be fooled by contextless figures and inaccurate analogies slung by obvious clickbait articles like "you'll be astonished how much power it takes to generate a single image!" and other people who have a vested interest in fearmongering misinformation. there are plenty of real problems in this sector, like the use of labor exploitation in the global south in fine-tuning LLMs.

See also, "We're in a drought; conserve water!" Meanwhile, bottled water companies and golf courses for rich folk empty the aquifers.

#if those 3 watt-hours are upsetting to you then i have some bad news about video games and digital art #it does add up but both the amount of water & power ''ai'' uses is negligible next to almost any other industry #also sorry i didn't include sources. last time i cited two hours of research on ''ai'' ppl universally ignored it to soapbox misinfo at me #so i will not be doing that. anti-''ai'' folx actually research what you're against with primary sources challenge #it's been two years and there's actually data now. no more excuses :)#ai art #described

231K notes

mysocial8onetech · 2 years ago

How can we leverage the power of natural language processing and artificial intelligence to automate fact-checking and make it more efficient and scalable? In this latest blog article, we describe FactLLaMA, a new model that can optimize instruction-following language models with external knowledge for automated fact-checking. We explain what FactLLaMA is and more insightful information about this model.

#FactLLaMA #FactChecking #NLP #AI #MachineLearning #LanguageModels #Knowledge #AIModel #open source #artificial intelligence #machine learning #data science #datascience

2 notes

instantedownloads · 1 month ago

How to Use n8n and AI to Build an Automation System

Automation is changing how we work every day. It helps save time, reduce mistakes, and get more done with less effort. If you want to automate your tasks but don’t know where to start, this guide is for you. In this post, you will learn how to use n8n — a free, open-source automation tool — combined with AI to build smart workflows that do work for you. What Is n8n? n8n (pronounced…

0 notes

ongoing-catastrophe · 2 months ago

I have a dilemma. later this month I'm gonna be teaching/grading a bunch of middle to high schoolers in speech and debate. I want to try to use this as an opportunity to give them a chance to actually learn something instead of relying on gen ai but honestly I'm not sure how to regulate this.

how do I tell if it's ai writing or just unskilled writing? if a speech is ai or just given badly? i think it's easier to pick out the wrong facts (I'll have to make a big folder but i'll manage) but even still, what if it's a genuine mistake and not ai use?

it doesn't help that the event organizers don't seem to actually care about the integrity of the event, since they told me I could "just toss it in chat gpt" when they asked me to make an introductory document

#it'll be about 40 kids #i don't wanna be unfair to these kids #but not being unfair to them also means actually equipping them with life skills #like thinking for yourself #and properly collecting data from the internet #maybe i could ask for sources?#i don't wanna be too intimidating either cuz there will be a lot of newbies #if anyone has advice i would really appreciate it #anti gen ai #anti generative ai #fuck ai

0 notes