#datasets for projects | Explore Tumblr posts and blogs

d0nutzgg · 2 years ago

Tonight I am hunting down venomous and nonvenomous snake pictures that are under the creative commons of specific breeds in order to create one of the most advanced, in depth datasets of different venomous and nonvenomous snakes as well as a test set that will include snakes from both sides of all species. I love snakes a lot and really, all reptiles. It is definitely tedious work, as I have to make sure each picture is cleared before I can use it (ethically), but I am making a lot of progress! I have species such as the King Cobra, Inland Taipan, and Eyelash Pit Viper among just a few! Wikimedia Commons has been a huge help!

I'm super excited.

Hope your nights are going good. I am still not feeling good but jamming + virtual snake hunting is keeping me busy!

41 notes · View notes

fangirlsovertoomanythings · 5 months ago

The issue with being a computer scientist is that sometimes you'll be doing a project and halfway through it you're gonna sit down and think to yourself that maybe what you're doing is unethical

#this is about web scraping lmao #i'm making a sentiment analysis project and have arrived at an impasse #i'll start looking for open source datasets since they'd make me arrive at more or less a similar result #but my instructor just has a real clear idea of how the project can be about this year's elections so that's why i've been looking into tha #hhhhhhhh why is this area of work so full of ethical dilemmas i just wanted a degree

5 notes · View notes

sureuncertainty · 11 months ago

the way my term project is making me write an outline that includes my "approach/plan" for the project like what kinda question is that lol my plan is to follow the detailed instructions that are provided for the project? this project is very clearly and structurally defined already i may as well just copy/paste the instructions

#for reference this is for my data preparation class. all i'm doing for this project is preparing and cleaning datasets #my approach will be... get this... to follow the rubric #anyway data science masters degree going great #win rambles

3 notes · View notes

artificialllovers · 11 months ago

This capstone project is going to be the death of me

#I’m behind on my project plan but I made some decent progress today #I think I’m going to have to refine my dataset a bit tho #I feel like I’ll have an easier time training my models if I throw out some of the images that are too ambiguous #if I can just get my models squared away before April I should be good #then I can spend those last two weeks gathering data and prepping for my presentation #💚

6 notes · View notes

exopelagic · 24 days ago

ignore me I’m still pissed abt the ai thing

#it’s just such an incredible level of cognitive dissonance #like the postdoc I mentioned is a really sweet guy!! he’s great!! he’s studying the effects of climate change on birds #he’s been working with me for most of my project. my project is basically entirely coding bc data analysis #most of his work is also data analysis! which is why he’s helping me out. he’s deeply, intimately aware of the effects of the climate crisis #we’ve been intermittently having the discussion abt chatgpt and code over the past few months and he knows everything i say. #when I brought it up the first time he just kinda groaned and he treats it like a gotcha every time I say I’m struggling w some coding thing #ages ago I made some comment abt a coding solution I’d written being inefficient and was frustrated by it #(efficiency is important when you’re dealing w big datasets and especially when your laptop sucks ASS and I knew there was a better way) #and I think he took it a little personally and thinks I try too hard #like yes that might objectively be true but I’m trying to make my stuff repeatable and easy to use so I don’t have to rewrite this later >:( #ANYWAY. why do we value convenience above all else. #i understand the impulse a little too well and I think I’m going to start doing more inconvenient things bc god this can’t keep going #it will go on anyway but it will go on without me #I wanted to shout abt how it’s so pervasive and how people who I’d consider more involved than me are just shamelessly bought in #but I think I’ve burnt that out now. I’m just tired #thinking abt some creators I’ve been following for a long time now talked abt their shifting worldview and how idealistic they used to be #who yesterday said they think this world is beyond saving. but are doing it for the people around them. that is two statements joined #together second part is an earlier thing they said talking abt this and obviously I don’t know them and am extrapolating a worldview but #idk! I am not willing to say that. I have to believe people are good and things can get better. and maybe those are two different things #and I’m gonna have to reevaluate this. probably soon. and I’m probably due another crisis about this soon enough bc I truly don’t know shit #while there exist people in the world who don’t suck that is proof enough that it doesn’t have to be like this. maybe it’s not enough! #maybe we’ll get there. it’s not over til it’s over #luke.txt

0 notes

kiirodora · 1 month ago

The professor for the stupid AI course that's labelled as an "elective" but is the only option for the electives that year: attendance is not mandatory, don't worry about it Him literally one week before finals week: Okay so if you didn't attend regularly I'm grading your final project out of 80 I hate this university I hate this course I hate this professor let me graduate already 😭

#for context: I do a part time job on the side and won't attend courses with no attendance required bc I'm. working #the same professor oversaw an AI project for my friend's grad project that was to be sent to a government science community #and he didn't give her the dataset and set her up for failure and forced her to take the fall #I hate this guy with a passion #kiiro says things

0 notes

sapientsapiens · 2 months ago

Finalized on a dataset for the 1st Capstone project at #mlzoomcamp led by Alexey Grigorev @DataTalksClub .

#mlzoomcamp #capstone project #deeplearning #datasets #convolutional neural networks

0 notes

reasonsforhope · 8 months ago

If you're feeling anxious or depressed about the climate and want to do something to help right now, from your bed, for free...

Start helping with citizen science projects

What's a citizen science project? Basically, it's crowdsourced science. In this case, crowdsourced climate science, that you can help with!

You don't need qualifications or any training besides the slideshow at the start of a project. There are a lot of things that humans can do way better than machines can, even with only minimal training, that are vital to science - especially digitizing records and building searchable databases

Like labeling trees in aerial photos so that scientists have better datasets to use for restoration.

Or counting cells in fossilized plants to track the impacts of climate change.

Or digitizing old atmospheric data to help scientists track the warming effects of El Niño.

Or counting penguins to help scientists better protect them.

Those are all on one of the most prominent citizen science platforms, called Zooniverse, but there are a ton of others, too.

Oh, and btw, you don't have to worry about messing up, because several people see each image. Studies show that if you pool the opinions of enough regular people (how many depends on the field), the result matches the accuracy rate of a trained scientist in the field.
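For the curious, here is a minimal sketch of what "pooling opinions" can look like, assuming a plain majority vote. Real platforms may combine answers differently; the function name and the example votes are made up for illustration.

```python
from collections import Counter

def pool_labels(votes):
    """Return the most common label and the share of volunteers who chose it.

    Assumption: a plain majority vote. If two labels tie, the one that
    appeared first in the list wins.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)

# Hypothetical example: five volunteers classify the same photo.
votes = ["penguin", "penguin", "rock", "penguin", "penguin"]
label, agreement = pool_labels(votes)
print(label, agreement)  # penguin 0.8
```

One wrong click out of five does not change the pooled answer, which is why individual mistakes matter so little.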

--

I spent a lot of time doing this when I was really badly injured and housebound, and it was so good for me to be able to HELP and DO SOMETHING, even when I was in too much pain to leave my bed. So if you are chronically ill/disabled/for whatever reason can't participate or volunteer for things in person, I highly highly recommend.

Next time you wish you could do something - anything - to help

Remember that actually, you can. And help with some science.

34K notes · View notes

jcmarchi · 2 months ago

Researchers reduce bias in AI models while preserving or improving accuracy

New Post has been published on https://thedigitalinsider.com/researchers-reduce-bias-in-ai-models-while-preserving-or-improving-accuracy/

Machine-learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.

For instance, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model might make incorrect predictions for female patients when deployed in a hospital.

To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing large amounts of data, hurting the model’s overall performance.
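As a rough illustration of the balancing baseline described above (not the researchers' code), the sketch below randomly drops rows until every subgroup is as small as the smallest one. The function name, the `group_of` parameter, and the toy patient records are all hypothetical.

```python
import random
from collections import defaultdict

def balance_by_undersampling(rows, group_of, seed=0):
    """Randomly drop rows until every subgroup matches the smallest subgroup's size."""
    if not rows:
        return []
    groups = defaultdict(list)
    for row in rows:
        groups[group_of(row)].append(row)
    target = min(len(members) for members in groups.values())
    rng = random.Random(seed)
    balanced = []
    for members in groups.values():
        # Keep only `target` randomly chosen rows from each subgroup.
        balanced.extend(rng.sample(members, target))
    return balanced

# Hypothetical toy dataset: 8 male patients and 2 female patients.
patients = [{"id": i, "sex": "M"} for i in range(8)]
patients += [{"id": i, "sex": "F"} for i in range(8, 10)]

balanced = balance_by_undersampling(patients, group_of=lambda p: p["sex"])
print(len(patients), "->", len(balanced))  # 10 -> 4
```

In this toy case, balancing discards six of the ten records, which shows how much data the approach can throw away.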

MIT researchers developed a new technique that identifies and removes specific points in a training dataset that contribute most to a model’s failures on minority subgroups. By removing far fewer datapoints than other approaches, this technique maintains the overall accuracy of the model while improving its performance regarding underrepresented groups.

In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more prevalent than labeled data for many applications.

This method could also be combined with other approaches to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure underrepresented patients aren’t misdiagnosed due to a biased AI model.

“Many other algorithms that try to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are showing that assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them, and get better performance,” says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT and co-lead author of a paper on this technique.

She wrote the paper with co-lead authors Saachi Jain PhD ’24 and fellow EECS graduate student Kristian Georgiev; Andrew Ilyas MEng ’18, PhD ’23, a Stein Fellow at Stanford University; and senior authors Marzyeh Ghassemi, an associate professor in EECS and a member of the Institute of Medical Engineering Sciences and the Laboratory for Information and Decision Systems, and Aleksander Madry, the Cadence Design Systems Professor at MIT. The research will be presented at the Conference on Neural Information Processing Systems.

Removing bad examples

Often, machine-learning models are trained using huge datasets gathered from many sources across the internet. These datasets are far too large to be carefully curated by hand, so they may contain bad examples that hurt model performance.

Scientists also know that some data points impact a model’s performance on certain downstream tasks more than others.

The MIT researchers combined these two ideas into an approach that identifies and removes these problematic datapoints. They seek to solve a problem known as worst-group error, which occurs when a model underperforms on minority subgroups in a training dataset.
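To make the worst-group idea concrete, here is a small sketch that computes accuracy separately for each subgroup and reports the lowest value. The labels, predictions, and group names are invented for illustration, and the helper names are not taken from the paper.

```python
from collections import defaultdict

def group_accuracies(y_true, y_pred, groups):
    """Return a dict mapping each subgroup to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}

def worst_group_accuracy(y_true, y_pred, groups):
    """Accuracy of the subgroup on which the model performs worst."""
    return min(group_accuracies(y_true, y_pred, groups).values())

# Hypothetical predictions: group "A" is well represented, group "B" is not.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B"]

print(group_accuracies(y_true, y_pred, groups))      # {'A': 1.0, 'B': 0.5}
print(worst_group_accuracy(y_true, y_pred, groups))  # 0.5
```

Overall accuracy here is 5 out of 6, which hides the fact that the model is right only half the time on group "B".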

The researchers’ new technique is driven by prior work in which they introduced a method, called TRAK, that identifies the most important training examples for a specific model output.

For this new technique, they take incorrect predictions the model made about minority subgroups and use TRAK to identify which training examples contributed the most to that incorrect prediction.

“By aggregating this information across bad test predictions in the right way, we are able to find the specific parts of the training that are driving worst-group accuracy down overall,” Ilyas explains.

Then they remove those specific samples and retrain the model on the remaining data.

Since having more data usually yields better overall performance, removing just the samples that drive worst-group failures maintains the model’s overall accuracy while boosting its performance on minority subgroups.
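The selection step can be sketched roughly as follows. This is not the researchers' implementation and does not use the TRAK library: it assumes a hypothetical table `scores[i][j]` that estimates how much training example `i` pushed the model toward its prediction on test example `j`, and it assumes that summing those scores over the incorrect minority-group predictions is a reasonable way to aggregate them. Retraining on the kept examples is left out.

```python
def select_examples_to_remove(scores, bad_test_ids, k):
    """Pick the k training examples with the largest total score on the bad predictions.

    scores[i][j] is a hypothetical attribution score of training example i
    for test example j; bad_test_ids lists the misclassified minority-group
    test examples.
    """
    totals = [sum(row[j] for j in bad_test_ids) for row in scores]
    ranked = sorted(range(len(scores)), key=lambda i: totals[i], reverse=True)
    return sorted(ranked[:k])

# Made-up scores for 4 training examples and 3 test examples.
scores = [
    [0.1, 0.0, 0.2],
    [0.9, 0.1, 0.8],
    [0.0, 0.3, 0.1],
    [0.7, 0.0, 0.6],
]
bad_test_ids = [0, 2]  # test examples the model got wrong

to_remove = select_examples_to_remove(scores, bad_test_ids, k=2)
kept = [i for i in range(len(scores)) if i not in to_remove]
print(to_remove, kept)  # [1, 3] [0, 2]
```

Only the examples with the highest aggregated scores are dropped, so most of the training set survives, in contrast to balancing.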

A more accessible approach

Across three machine-learning datasets, their method outperformed multiple techniques. In one instance, it boosted worst-group accuracy while removing about 20,000 fewer training samples than a conventional data balancing method. Their technique also achieved higher accuracy than methods that require making changes to the inner workings of a model.

Because the MIT method involves changing a dataset instead, it would be easier for a practitioner to use and can be applied to many types of models.

It can also be utilized when bias is unknown because subgroups in a training dataset are not labeled. By identifying datapoints that contribute most to a feature the model is learning, they can understand the variables it is using to make a prediction.

“This is a tool anyone can use when they are training a machine-learning model. They can look at those datapoints and see whether they are aligned with the capability they are trying to teach the model,” says Hamidieh.

Using the technique to detect unknown subgroup bias would require intuition about which groups to look for, so the researchers hope to validate it and explore it more fully through future human studies.

They also want to improve the performance and reliability of their technique and ensure the method is accessible and easy to use for practitioners who could someday deploy it in real-world environments.

“When you have tools that let you critically look at the data and figure out which datapoints are going to lead to bias or other undesirable behavior, it gives you a first step toward building models that are going to be more fair and more reliable,” Ilyas says.

This work is funded, in part, by the National Science Foundation and the U.S. Defense Advanced Research Projects Agency.

0 notes

globosetechnologysolutions12 · 6 months ago

#data annotation companies #Datasets for machine learning projects #Image annotation company

0 notes

anhedonyan · 9 months ago

The biggest problem with this project is that it needs a lot of memory and we don't have any PC powerful enough for it.

So while we can train with even less data, it is less than ideal... 😞

I'm not sure if we could change to PyTorch instead at this point (and I'm not sure if it is installed on that PC), but I'll try to have an alternative with YOLOv8 just in case.

We've tried YOLOv8 in the past and I remember it working well enough and even my PC could handle it (with even less data, so idk). Sadly, part of this project was to create our own neural network with Tensorflow but there isn't enough time left and it's still dying... 😭

#talking about the firerisk project #this dataset is in github and in huggingface :3 if anyone want to know #we have to train that data and apply it to predict fire risk in our city

0 notes

dberga · 1 year ago

Publication in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

https://ieeexplore.ieee.org/document/10356628

A New Framework for Evaluating Image Quality Including Deep Learning Task Performances as A Proxy

iquaflow is a framework that provides a set of tools to assess image quality. Users can add custom metrics that integrate easily, and a set of unsupervised methods is offered by default. Furthermore, iquaflow measures quality by using the performance of AI models trained on the images as a proxy. This also makes it easy to study how performance degrades under several modifications of the original dataset, for instance, with images reconstructed after different levels of lossy compression; satellite images are an example use case, since they are commonly compressed before being downloaded to the ground. In this situation, the optimization problem involves finding images that, while being compressed to their smallest possible file size, still maintain sufficient quality to meet the required performance of the deep learning algorithms. Thus, a study with iquaflow is suitable for such a case. All this development is wrapped in MLflow, an interactive tool used to visualize and summarize the results. This document describes different use cases and provides links to their respective repositories. To ease the creation of new studies, we include a cookiecutter repository. The source code, issue tracker and aforementioned repositories are all hosted on GitHub.

https://github.com/satellogic/iquaflow

#projects #papers #neural networks #computer vision #remote sensing #modeling #datasets

1 note · View note

insertdisc5 · 1 year ago

📚 A List Of Useful Websites When Making An RPG 📚

My timeloop RPG In Stars and Time is done! Which means I can clear all my ISAT gamedev related bookmarks. But I figured I would show them here, in case they can be useful to someone. These range from "useful to write a story/characters/world" to "these are SUPER rpgmaker focused and will help with the terrible math that comes with making a game".

This is what I used to make my RPG game, but it could be useful for writers, game devs of all genres, DMs, artists, what have you. YIPPEE

Writing (Names)

Behind The Name - Why don't you have this bookmarked already. Search for names and their meanings from all over the world!

Medieval Names Archive - Medieval names. Useful. For ME

City and Town Name Generator - Create "fake" names for cities, generated from datasets from any country you desire! I used those for the couple city names in ISAT. I say "fake" in quotes because some of them do end up being actual city names, especially for french generated ones. Don't forget to double check you're not 1. just taking a real city name or 2. using a word that's like, Very Bad, especially if you don't know the country you're taking inspiration from! Don't want to end up with Poopaville, USA

Writing (Words)

Onym - A website full of websites that are full of words. And by that I mean dictionaries, thesauruses, translators, glossaries, ways to mix up words, and way more. HIGHLY recommend checking this website out!!!

Moby Thesaurus - My thesaurus of choice!

Rhyme Zone - Find words that rhyme with others. Perfect for poets, lyricists, punmasters.

In Different Languages - Search for a word, have it translated in MANY different languages in one page.

ASSETS

In general, I will say: just look up what you want on itch.io. There are SO MANY assets for you to buy on itch.io. You want a font? You want a background? You want a sound effect? You want a plugin? A pixel base? An attack animation? A cool UI?!?!?! JUST GO ON ITCH.IO!!!!!!

Visual Assets (General)

Creative Market - Shop for all kinds of assets, from fonts to mockups to templates to brushes to WHATEVER YOU WANT

Velvetyne - Cool and weird fonts

Chevy Ray's Pixel Fonts - They're good fonts.

Contrast Checker - Stop making your text white when your background is lime green no one can read that shit babe!!!!!!

Visual Assets (Game Focused)

Interface In Game - Screenshots of UI (User Interfaces) from SO MANY GAMES. Shows you everything and you can just look at what every single menu in a game looks like. You can also sort them by game genre! GREAT reference!

Game UI Database - Same as above!

Sound Assets

Zapsplat, Freesound - There are many sound effect websites out there but those are the ones I saved. Royalty free!

Shapeforms - Paid packs for music and sounds and stuff.

Other

CloudConvert - Convert files into other files. MAKE THAT .AVI A .MOV

EZGifs - Make those gifs bigger. Smaller. Optimize them. Take a video and make it a gif. The Sky Is The Limit

Marketing

Press Kitty - Did not end up needing this- this will help with creating a press kit! Useful for ANY indie dev. Yes, even if you're making a tiny game, you should have a press kit. You never know!!!

presskit() - Same as above, but a different one.

Itch.io Page Image Guide and Templates - Make your project pages on itch.io look nice.

MOOMANiBE's IGF post - If you're making indie games, you might wanna try and submit your game to the Independent Game Festival at some point. Here are some tips on how, and why you should.

Game Design (General)

An insightful thread where game developers discuss hidden mechanics designed to make games feel more interesting - Title says it all. Check those comments too.

Game Design (RPGs)

Yanfly "Let's Make a Game" Comics - INCREDIBLY useful tips on how to make RPGs, going from dungeons to towns to enemy stats!!!!

Attack Patterns - A nice post on enemy attack patterns, and what attacks you should give your enemies to make them challenging (but not TOO challenging!) A very good starting point.

How To Balance An RPG - Twitter thread on how to balance player stats VS enemy stats.

Nobody Cares About It But It’s The Only Thing That Matters: Pacing And Level Design In JRPGs - a Good Post.

Game Design (Visual Novels)

Feniks Renpy Tutorials - They're good tutorials.

I played over 100 visual novels in one month and here’s my advice to devs. - General VN advice. Also highly recommend this whole blog for help on marketing your games.

I hope that was useful! If it was. Maybe. You'd like to buy me a coffee. Or maybe you could check out my comics and games. Or just my new critically acclaimed game In Stars and Time. If you want. Ok bye

#reference #tutorial #writing #rpgmaker #renpy #video games #game design #i had this in my drafts for a while so you get it now. sorry its so long #long post

8K notes · View notes

river-taxbird · 1 year ago

There is no such thing as AI.

How to help the non technical and less online people in your life navigate the latest techbro grift.

I've seen other people say stuff to this effect but it's worth reiterating. Today in class, my professor was talking about a news article where a celebrity's likeness was used in an AI image without their permission. Then she mentioned a guest lecture about how AI is going to help finance professionals. Then I pointed out, those two things aren't really related.

The term AI is being used to obfuscate details about multiple semi-related technologies.

Traditionally in sci-fi, AI means artificial general intelligence like Data from Star Trek, or the Terminator. This, I shouldn't need to say, doesn't exist. Techbros use the term AI to trick investors into funding their projects. It's largely a grift.

What is the term AI being used to obfuscate?

If you want to help the less online and less tech literate people in your life navigate the hype around AI, the best way to do it is to encourage them to change their language around AI topics.

By calling these technologies what they really are, and encouraging the people around us to know the real names, we can help lift the veil, kill the hype, and keep people safe from scams. Here are some starting points, which I am just pulling from Wikipedia. I'd highly encourage you to do your own research.

Machine learning (ML): is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines "discover" their "own" algorithms, without needing to be explicitly told what to do by any human-developed algorithms. (This is the basis of most of the technology people call AI.)

Language model: (LM or LLM) is a probabilistic model of a natural language that can generate probabilities of a series of words, based on text corpora in one or multiple languages it was trained on. (This would be your ChatGPT.)

Generative adversarial network (GAN): is a class of machine learning framework and a prominent framework for approaching generative AI. In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss. (This is the source of some AI images and deepfakes.)

Diffusion Models: Models that generate the probability distribution of a given dataset. In image generation, a neural network is trained to denoise images with added Gaussian noise by learning to remove the noise. After the training is complete, it can then be used for image generation by starting with a random noise image and denoising it. (This is the more common technology behind AI images, including DALL-E and Stable Diffusion. I added this one to the post after as it was brought to my attention it is now more common than GANs.)
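If it helps to see the "added Gaussian noise" part of the diffusion definition in code, here is a tiny sketch of the noising step only. It assumes the common recipe of mixing the clean image and random noise with square-root weights, which is my assumption and not something stated above; the function name, the `keep` parameter, and the four-pixel "image" are made up. The network that learns to undo the noise is not shown.

```python
import math
import random

def add_gaussian_noise(pixels, keep, rng):
    """Mix an image with Gaussian noise.

    keep=1.0 returns the image unchanged, keep=0.0 returns pure noise.
    Assumption: square-root weighting, as in many diffusion setups.
    """
    noisy = []
    for p in pixels:
        noise = rng.gauss(0.0, 1.0)
        noisy.append(math.sqrt(keep) * p + math.sqrt(1.0 - keep) * noise)
    return noisy

rng = random.Random(0)
clean = [0.0, 0.5, 1.0, 0.5]  # a made-up four-pixel "image"
for keep in (1.0, 0.5, 0.0):
    print(keep, add_gaussian_noise(clean, keep, rng))
```

Training pairs each noisy version with its clean original, so the network can learn to predict and remove the noise.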

I know these terms are more technical, but they are also more accurate, and they can easily be explained in a way non-technical people can understand. The grifters are using language to give this technology its power, so we can use language to take its power away and let people see it for what it really is.

#ai #scams #language #semiotics #machine learning

12K notes · View notes

sad--tree · 2 years ago

what do u do when u tell ur parent in no uncertain terms Thank You For The Offer But I Do Not Want A Tutor For This Course It Will Not Help And I Am Deeply Uncomfortable With It Do Not Get Me One

and then they go and book u with an online tutor. without asking. what the fuck.

#25 but being treated like im fucking 12 #didnt even WANT help with coursework i went out there just looking for some goddamn emotional support #and i didnt even get it!!!!!! #theres 'problem solving instead of listening & supporting' and then theres THIS #i hate college #but rn i hate this family more #ANYWAYS #if any1 knows how 2 use python 2 'use file i/o on startup to open and read the dataset; initializing a few record objects with data parsed #from first few records in the csv file. the record objects should be stored in a simple data structure (array or list). use exception #handling in case the file is missing or not found' #i know how to open the file but idk how 2 'initialize a few record objects using data parsed etc. etc.' #like. i have a class so thats the record object. and ig i could have a list of instances of that class object #but idk how 2 like. combine the data from the csv file with instances of the class. #without having to individually list.append(()) 7000 rows bc eventually in this project u gotta use the whole dataset.
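One possible way to approach the question in the tags, as a minimal sketch: loop over the CSV rows and build one object per row, so nothing has to be appended by hand. The file name `dataset.csv`, the `Record` class, and the column names `name` and `value` are placeholders, since the real dataset's columns aren't given.

```python
import csv
from dataclasses import dataclass

@dataclass
class Record:
    # Placeholder fields; swap in the real column names from the dataset.
    name: str
    value: str

def load_records(path, limit=None):
    """Read up to `limit` rows from a CSV file into a list of Record objects."""
    records = []
    try:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)  # uses the header row as dict keys
            for i, row in enumerate(reader):
                if limit is not None and i >= limit:
                    break
                # One Record per row: the loop does the appending.
                records.append(Record(name=row["name"], value=row["value"]))
    except FileNotFoundError:
        print(f"Could not find {path}")
    return records

# Load only the first few records for now; pass limit=None later for all rows.
records = load_records("dataset.csv", limit=5)
print(len(records), "records loaded")
```

Because the loop builds the list, the same function should handle the full 7000 rows later by dropping the `limit` argument.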

0 notes

thisonelikesaliens · 10 months ago

decided to measure against different versions of Yuan (through ep10):

*Adult Yuan versions = before and after NYC

background on this stat i'm tracking (whether Qian is using I/me or Ge when he's referring to himself in direct conversation with Yuan)

i need to figure out another angle to look at this data (or maybe not, i think i'm just overthinking at this point), but i just wanted to note that in that last conversation at the bottom of the stairs Qian did not use "ge" once. sometimes the "ge" comes out naturally, other times he seems to deliberately use it to deflect and avoid confronting his own feelings about Yuan, but in that conversation he used "I/me" 17 times and not one single "ge"

*missed one "ge" in ep1, now corrected

#unknown the series #關於未知的我們 #definitely interesting #i don't even know if this was deliberate or not #or just a happy accident #anyway...i'll update one more time in 2 weeks and call this little project done #i just wish i had more data to work with #i'm much better at analyzing massive datasets #i don't think i'll have anything else to add for the last 2 eps #the translations have been solid as of late #no more taiwanese bl in the near future that i might possibly have a tiny something to contribute #that makes me sad #i've had so much fun being kinda active in fandom again #thanks everyone for being so kind and welcoming #this one translates #this one rambles

31 notes · View notes