#data bias for ml
The surprising truth about data-driven dictatorships
Here’s the “dictator’s dilemma”: they want to block their country’s frustrated elites from mobilizing against them, so they censor public communications; but they also want to know what their people truly believe, so they can head off simmering resentments before they boil over into regime-toppling revolutions.
These two strategies are in tension: the more you censor, the less you know about the true feelings of your citizens and the easier it will be to miss serious problems until they spill over into the streets (think: the fall of the Berlin Wall or Tunisia before the Arab Spring). Dictators try to square this circle with things like private opinion polling or petition systems, but these capture only a small slice of the potentially destabilizing moods circulating in the body politic.
Enter AI: back in 2018, Yuval Harari proposed that AI would supercharge dictatorships by mining and summarizing the public mood — as captured on social media — allowing dictators to tack into serious discontent and defuse it before it erupted into unquenchable wildfire:
https://www.theatlantic.com/magazine/archive/2018/10/yuval-noah-harari-technology-tyranny/568330/
Harari wrote that “the desire to concentrate all information and power in one place may become [dictators’] decisive advantage in the 21st century.” But other political scientists sharply disagreed. Last year, Henry Farrell, Jeremy Wallace and Abraham Newman published a thoroughgoing rebuttal to Harari in Foreign Affairs:
https://www.foreignaffairs.com/world/spirals-delusion-artificial-intelligence-decision-making
They argued that — like everyone who gets excited about AI, only to have their hopes dashed — dictators seeking to use AI to understand the public mood would run into serious training data bias problems. After all, people living under dictatorships know that spouting off about their discontent and desire for change is a risky business, so they will self-censor on social media. That’s true even if a person isn’t afraid of retaliation: if you know that using certain words or phrases in a post will get it autoblocked by a censorbot, what’s the point of trying to use those words?
The phrase “Garbage In, Garbage Out” dates back to 1957. That’s how long we’ve known that a computer that operates on bad data will barf up bad conclusions. But this is a very inconvenient truth for AI weirdos: having given up on manually assembling training data based on careful human judgment with multiple review steps, the AI industry “pivoted” to mass ingestion of scraped data from the whole internet.
But adding more unreliable data to an unreliable dataset doesn’t improve its reliability. GIGO is the iron law of computing, and you can’t repeal it by shoveling more garbage into the top of the training funnel:
https://memex.craphound.com/2018/05/29/garbage-in-garbage-out-machine-learning-has-not-repealed-the-iron-law-of-computer-science/
When it comes to “AI” that’s used for decision support — that is, when an algorithm tells humans what to do and they do it — then you get something worse than Garbage In, Garbage Out — you get Garbage In, Garbage Out, Garbage Back In Again. That’s when the AI spits out something wrong, and then another AI sucks up that wrong conclusion and uses it to generate more conclusions.
To see this in action, consider the deeply flawed predictive policing systems that cities around the world rely on. These systems suck up crime data from the cops, then predict where crime is going to be, and send cops to those “hotspots” to do things like throw Black kids up against a wall and make them turn out their pockets, or pull over drivers and search their cars after pretending to have smelled cannabis.
The problem here is that “crime the police detected” isn’t the same as “crime.” You only find crime where you look for it. For example, there are far more incidents of domestic abuse reported in apartment buildings than in fully detached homes. That’s not because apartment dwellers are more likely to be wife-beaters: it’s because domestic abuse is most often reported by a neighbor who hears it through the walls.
So if your cops practice racially biased policing (I know, this is hard to imagine, but stay with me /s), then the crime they detect will already be a function of bias. If you only ever throw Black kids up against a wall and turn out their pockets, then every knife and dime-bag you find in someone’s pockets will come from some Black kid the cops decided to harass.
That’s life without AI. But now let’s throw in predictive policing: feed your “knives found in pockets” data to an algorithm and ask it to predict where there are more knives in pockets, and it will send you back to that Black neighborhood and tell you to throw even more Black kids up against a wall and search their pockets. The more you do this, the more knives you’ll find, and the more you’ll go back and do it again.
This is what Patrick Ball from the Human Rights Data Analysis Group calls “empiricism washing”: take a biased procedure and feed it to an algorithm, and then you get to go and do more biased procedures, and whenever anyone accuses you of bias, you can insist that you’re just following an empirical conclusion of a neutral algorithm, because “math can’t be racist.”
HRDAG has done excellent work on this, finding a natural experiment that makes the problem of GIGOGBI crystal clear. The National Survey On Drug Use and Health produces the gold standard snapshot of drug use in America. Kristian Lum and William Isaac took Oakland’s drug arrest data from 2010 and asked Predpol, a leading predictive policing product, to predict where Oakland’s 2011 drug use would take place.
[Image ID: (a) Number of drug arrests made by Oakland police department, 2010. (1) West Oakland, (2) International Boulevard. (b) Estimated number of drug users, based on 2011 National Survey on Drug Use and Health]
Then, they compared those predictions to the outcomes of the 2011 survey, which shows where actual drug use took place. The two maps couldn’t be more different:
https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x
Predpol told cops to go and look for drug use in a predominantly Black, working class neighborhood. Meanwhile, the NSDUH survey showed that actual drug use took place all over Oakland, with a higher concentration in the student neighborhood bordering Berkeley.
What’s even more vivid is what happens when you simulate running Predpol on the new arrest data that would be generated by cops following its recommendations. If the cops went to that Black neighborhood and found more drugs there and told Predpol about it, the recommendation gets stronger and more confident.
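To see how quickly that loop locks in, here's a deliberately crude toy simulation (my own sketch, not Lum and Isaac's methodology and not Predpol's actual model). Two neighborhoods have identical true contraband rates, but the "predictive" policy sends every patrol wherever past finds were highest, so a small initial imbalance left over from biased policing never gets corrected, only confirmed:

```python
import random

random.seed(1)

TRUE_RATE = {"A": 0.05, "B": 0.05}   # identical true contraband rates in both neighborhoods
finds = {"A": 6, "B": 5}             # a small historical imbalance from biased stops
PATROLS_PER_ROUND = 200

for rnd in range(1, 9):
    # Naive "hotspot" policy: send every patrol to the neighborhood with the
    # most recorded finds so far -- i.e., act on the model's prediction.
    hotspot = max(finds, key=finds.get)
    new_finds = sum(random.random() < TRUE_RATE[hotspot] for _ in range(PATROLS_PER_ROUND))
    finds[hotspot] += new_finds
    print(f"round {rnd}: patrolled {hotspot}, recorded finds are now {finds}")
```

Neighborhood B's numbers never move because nobody is sent there to look, while A's keep climbing: you only find crime where you look for it, and the model only ever tells you to look in one place.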
In other words, GIGOGBI is a system for concentrating bias. Even trace amounts of bias in the original training data get refined and magnified when they are output through a decision-support system that directs humans to go and act on that output. Algorithms are to bias what centrifuges are to radioactive ore: a way to turn minute amounts of bias into pluripotent, indestructible toxic waste.
There’s a great name for an AI that’s trained on an AI’s output, courtesy of Jathan Sadowski: “Habsburg AI.”
And that brings me back to the Dictator’s Dilemma. If your citizens are self-censoring in order to avoid retaliation or algorithmic shadowbanning, then the AI you train on their posts in order to find out what they’re really thinking will steer you in the opposite direction, so you make bad policies that make people angrier and destabilize things more.
Or at least, that was Farrell et al’s theory. And for many years, that’s where the debate over AI and dictatorship has stalled: theory vs theory. But now, there’s some empirical data on this, thanks to “The Digital Dictator’s Dilemma,” a new paper from UCSD PhD candidate Eddie Yang:
https://www.eddieyang.net/research/DDD.pdf
Yang figured out a way to test these dueling hypotheses. He got 10 million Chinese social media posts from the start of the pandemic, before companies like Weibo were required to censor certain pandemic-related posts as politically sensitive. Yang treats these posts as a robust snapshot of public opinion: because there was no censorship of pandemic-related chatter, Chinese users were free to post anything they wanted without having to self-censor for fear of retaliation or deletion.
Next, Yang acquired the censorship model used by a real Chinese social media company to decide which posts should be blocked. Using this, he was able to determine which of the posts in the original set would be censored today in China.
That means that Yang knows what the “real” sentiment in the Chinese social media snapshot is, and what Chinese authorities would believe it to be if Chinese users were self-censoring all the posts that would be flagged by censorware today.
From here, Yang was able to play with the knobs, and determine how “preference-falsification” (when users lie about their feelings) and self-censorship would give a dictatorship a misleading view of public sentiment. What he finds is that the more repressive a regime is — the more people are incentivized to falsify or censor their views — the worse the system gets at uncovering the true public mood.
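You don't need Yang's dataset to see the shape of the problem. Here's a minimal sketch (my own toy model, not Yang's estimator): posts carry a sentiment from -1 (critical) to +1 (supportive), critical posts are the ones most likely to be withheld or blocked, and the regime only ever averages what survives:

```python
import random

random.seed(42)

def true_vs_observed_mood(censor_rate, n=100_000):
    """Return (true mean sentiment, mean sentiment the regime actually sees)."""
    posts = [random.uniform(-1, 1) for _ in range(n)]
    # Critical posts (sentiment < 0) get withheld or blocked at censor_rate
    visible = [s for s in posts if not (s < 0 and random.random() < censor_rate)]
    return sum(posts) / len(posts), sum(visible) / len(visible)

for rate in (0.0, 0.3, 0.6, 0.9):
    true_mood, observed = true_vs_observed_mood(rate)
    print(f"censorship/self-censorship rate {rate:.0%}: "
          f"true mood {true_mood:+.2f}, mood the regime sees {observed:+.2f}")
```

The harsher the repression, the rosier the picture: the average the dictator sees drifts further and further from the average that actually exists.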
What’s more, adding additional (bad) data to the system doesn’t fix this “missing data” problem. GIGO remains an iron law of computing in this context, too.
But it gets better (or worse, I guess): Yang models a “crisis” scenario in which users stop self-censoring and start articulating their true views (because they’ve run out of fucks to give). This is the most dangerous moment for a dictator, and depending on how the dictatorship handles it, they either get another decade of rule, or they wake up with guillotines on their lawns.
But “crisis” is where AI performs the worst. Trained on the “status quo” data where users are continuously self-censoring and preference-falsifying, AI has no clue how to handle the unvarnished truth. Both its recommendations about what to censor and its summaries of public sentiment are the least accurate when crisis erupts.
But here’s an interesting wrinkle: Yang scraped a bunch of Chinese users’ posts from Twitter — which the Chinese government doesn’t get to censor (yet) or spy on (yet) — and fed them to the model. He hypothesized that when Chinese users post to American social media, they don’t self-censor or preference-falsify, so this data should help the model improve its accuracy.
He was right — the model got significantly better once it ingested data from Twitter than when it was working solely from Weibo posts. And Yang notes that dictatorships all over the world are widely understood to be scraping western/northern social media.
But even though Twitter data improved the model’s accuracy, it was still wildly inaccurate, compared to the same model trained on a full set of un-self-censored, un-falsified data. GIGO is not an option, it’s the law (of computing).
Writing about the study on Crooked Timber, Farrell notes that as the world fills up with “garbage and noise” (he invokes Philip K Dick’s delightful coinage “gubbish”), “approximately correct knowledge becomes the scarce and valuable resource.”
https://crookedtimber.org/2023/07/25/51610/
This “probably approximately correct knowledge” comes from humans, not LLMs or AI, and so “the social applications of machine learning in non-authoritarian societies are just as parasitic on these forms of human knowledge production as authoritarian governments.”
The Clarion Science Fiction and Fantasy Writers’ Workshop summer fundraiser is almost over! I am an alum, instructor and volunteer board member for this nonprofit workshop whose alums include Octavia Butler, Kim Stanley Robinson, Bruce Sterling, Nalo Hopkinson, Kameron Hurley, Nnedi Okorafor, Lucius Shepard, and Ted Chiang! Your donations will help us subsidize tuition for students, making Clarion — and sf/f — more accessible for all kinds of writers.
Libro.fm is the indie-bookstore-friendly, DRM-free audiobook alternative to Audible, the Amazon-owned monopolist that locks every book you buy to Amazon forever. When you buy a book on Libro, they share some of the purchase price with a local indie bookstore of your choosing (Libro is the best partner I have in selling my own DRM-free audiobooks!). As of today, Libro is even better, because it’s available in five new territories and currencies: Canada, the UK, the EU, Australia and New Zealand!
[Image ID: An altered image of the Nuremberg rally, with ranked lines of soldiers facing a towering figure in a many-ribboned soldier's coat. He wears a high-peaked cap with a microchip in place of insignia. His head has been replaced with the menacing red eye of HAL9000 from Stanley Kubrick's '2001: A Space Odyssey.' The sky behind him is filled with a 'code waterfall' from 'The Matrix.']
Image: Cryteria (modified) https://commons.wikimedia.org/wiki/File:HAL9000.svg
CC BY 3.0 https://creativecommons.org/licenses/by/3.0/deed.en
 — 
Raimond Spekking (modified) https://commons.wikimedia.org/wiki/File:Acer_Extensa_5220_-_Columbia_MB_06236-1N_-_Intel_Celeron_M_530_-_SLA2G_-_in_Socket_479-5029.jpg
CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0/deed.en
 — 
Russian Airborne Troops (modified) https://commons.wikimedia.org/wiki/File:Vladislav_Achalov_at_the_Airborne_Troops_Day_in_Moscow_%E2%80%93_August_2,_2008.jpg
“Soldiers of Russia” Cultural Center (modified) https://commons.wikimedia.org/wiki/File:Col._Leonid_Khabarov_in_an_everyday_service_uniform.JPG
CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/deed.en
vizthedatum · 17 days
AI is not the problem—we are. It's the way people and corporations devalue the work of creators, the way data is gathered and trained for these generative models, and the way we look to these as solutions, forgetting that they are only tools (that could be used way more ethically).
“AI” is just math, tech, statistics, and computing power... Built upon the resources mined from this earth and the minds of our collective species.
I believe that our creative and natural resources can never be replaced in their worth.
I am a data scientist by trade, and I build statistical models and AI. I use (non-generative) AI to help me understand how the world works and make decisions (including creation).
I sometimes program my art (or I use tools like Inkscape or similar software to help me! I use a computer!!) - because I can write instructions like a recipe where I gather all my ingredients with consent and fair use. (I can program, is what I’m saying. I can curate what data I'm using.)
I can have a tool to carry out my will to create. I am not arguing about that.
So, I understand the disability argument being made with generative AI.
Accessibility is important, and I agree with making information and creation more accessible.
I love that people can learn about stuff. My suggestion is for groups in power to make better search engines, tutorials, and tools for things with AI. Help people with formats! Help people automate!! I don't think there's anything wrong with that.
But here is the nuance to this:
Generative AI is an unethical tool if it feeds off work you can't see or validate.
This is a corrupted data issue. This is a corrupted “people in power” issue.
There is a lack of transparency in these tools' decision-making. AI's intelligence is only as good as the intelligence of the data and algorithms.
It is a regurgitation of information that is feeding on itself.
So I don’t really care about people who use tools to create things - but I do care when those outputs are not crediting other artists’ works. I do care when systemic discrimination shows up in AI results because our data is inherently flawed (because our data is from our people).
We must be more critical and nuanced about how we talk about this.
The technology (statistical, machine, and deep learning models) has been around for a long time. Papers on deep learning have existed for decades. As a researcher, I find the math fascinating.
The leaps have been tremendous, but again… it’s not the nature of the tools themselves.
It’s us.
How are we using these tools, what information is being given to these tools, and what are the lasting impacts on our society?
straigalex · 4 months
Alright, I have to rant about something. I'm taking a course on data science this trimester. As part of that, I have to write an essay on ethics in data science. One of the paragraphs is supposed to be on "fairness in algorithmic systems".
So I look for case studies, and what do I find?
Motherfucker, you told the AI to discriminate against poor people, and then you were surprised when it discriminated against poor people! (and black people in particular, which is why it was caught)
These people are "industry leaders" too. Goodness only knows what the industry followers are fucking up!
jcmarchi · 3 months
AI, ML, and Robotics: New Technological Frontiers in Warehousing
New Post has been published on https://thedigitalinsider.com/ai-ml-and-robotics-new-technological-frontiers-in-warehousing/
Warehouse management is an intricate operation that requires balancing many challenges and risks. Customers increasingly expect fast, accurate deliveries, leading many companies to shift toward “micro fulfillment centers” located close to major urban centers. To fulfill orders quickly while making the most of limited warehouse space, organizations are increasingly turning to artificial intelligence (AI), machine learning (ML), and robotics to optimize warehouse operations. By utilizing AI and ML, warehouse managers can automate and improve components of their operations, such as forecasting demand and inventory levels, optimizing space utilization and layout, improving picking and packing efficiency, and reducing errors and waste. Meanwhile, robotics can perform repetitive tasks with greater accuracy and speed than human workers and operate in spaces too confined for humans. Organizations can harness these technologies to increase profits, enhance safety and security, and increase customer satisfaction and loyalty.
Challenges faced by the warehousing industry
Online commerce is rapidly expanding and evolving, becoming a $4,117 billion business in 2024. Customers are turning online for a variety of needs, including groceries. Traditionally, online retailers have stored their inventories in large warehouses outside major population centers. Rapid urbanization has led to many customers living in population hubs in expensive areas, and customers increasingly expect quick—often same-day—deliveries.
Many retailers have addressed this issue by implementing “micro fulfillment centers” near major population centers. Because real estate in these locations is expensive, it is more important than ever that every square foot of warehouse space is well-utilized. Meanwhile, the warehousing industry is dealing with labor shortages, making fulfilling orders in a timely fashion more difficult.
Applications of AI/ML and robotics
Automation, AI, and ML can help retailers deal with these challenges. The advancement of computer vision has expanded the possibilities for robotics in the warehouse space. For example, autonomous mobile robot (AMR) systems are increasingly used for picking (selecting the items that a specific customer has ordered), packing (preparing those items for shipping), and palletization (placing goods on a pallet for transportation and storage). Automating these tasks increases speed, efficiency, accuracy, and adaptability. Robotics can also utilize vertical and cramped spaces that are difficult for humans to access. Warehouse space can be further optimized by introducing innovative, high-density storage solutions like cubes, tubes, and automated storage and retrieval systems.
AI- and ML-powered optimization algorithms analyze massive amounts of real-world data to generate predictions and solutions, updating as more information becomes available. Route optimization helps companies ensure that goods are shipped along the shortest and most efficient routes. Demand forecasting and predictive modeling use past order data to identify patterns and help retailers estimate which products will likely be ordered by customers, ensuring that warehouse space is used efficiently and minimizing the time products spend on the shelves. These models also enable more efficient warehouse storage, as the more frequently ordered items can be stored closer to picking stations.
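As a rough sketch of that forecast-then-slot idea (the SKU names and order counts below are invented, and a trailing average stands in for the far more sophisticated models the article describes):

```python
# Toy demand forecast: average daily orders per SKU over a trailing window,
# then slot the fastest movers closest to the picking stations.
order_history = {
    "SKU-001": [12, 15, 11, 14, 18, 16, 17],   # units ordered per day, last 7 days
    "SKU-002": [2, 1, 0, 3, 1, 2, 1],
    "SKU-003": [7, 9, 8, 6, 10, 9, 8],
}

forecast = {sku: sum(days) / len(days) for sku, days in order_history.items()}

for rank, sku in enumerate(sorted(forecast, key=forecast.get, reverse=True), start=1):
    print(f"pick zone {rank} (closest to packing first): {sku}, "
          f"forecast ≈ {forecast[sku]:.1f} units/day")
```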
ML, when paired with sensors on equipment, can also enable predictive maintenance. Continuous monitoring of equipment parts allows warehouses to detect when mechanical parts like rollers or conveyor belts show signs of wear or breakage, allowing them to be replaced before failures happen and minimizing downtime. By implementing robotics and AI/ML-based solutions, retailers can increase accuracy and efficiency while ensuring their limited space is utilized to full capacity.
As AI and robotics are integrated into warehousing, it is vital to consider privacy, ethics, and workplace safety. It is crucial to consider data confidentiality and ensure that AI models do not leak sensitive customer data. Equally important is monitoring AI models for bias. Finally, it is essential to guarantee that robotic and automation solutions comply with Occupational Safety and Health Administration (OSHA) regulations to safeguard the workplace environment.
Key performance indicators for warehousing processes
Monitoring key performance indicators (KPIs) allows enterprises to measure the effectiveness of their warehousing solutions, enabling continuous improvement. A few key KPIs for warehousing include the following (a short calculation sketch follows the list):
Throughput – This represents the number of products successfully passed through a packing station during a set amount of time, for example, the number of orders fulfilled per hour.
Lead time – This figure tracks how quickly shipments can be made.
Cube utilization – This measure of how effectively warehouses use their storage capacity is often calculated by dividing the volume of materials stored by the total warehouse capacity.
On-time in-full (OTIF) shipments – This metric calculates the percentage of orders completed in full by the desired date.
Inventory count accuracy by location – This tracks the degree to which the goods stored in the warehouse correspond to the data. High inventory accuracy is necessary for warehouse analytics to be effective.
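A minimal sketch of how these KPIs reduce to simple arithmetic once the underlying counts are captured (the field names and figures below are invented for illustration, not taken from any particular warehouse system):

```python
from dataclasses import dataclass

@dataclass
class WarehouseSnapshot:
    orders_fulfilled: int          # orders packed during the period
    period_hours: float
    orders_on_time_in_full: int
    orders_shipped: int
    stored_volume_m3: float        # volume of goods currently stored
    total_capacity_m3: float       # usable storage volume of the facility

def throughput_per_hour(s: WarehouseSnapshot) -> float:
    return s.orders_fulfilled / s.period_hours

def cube_utilization(s: WarehouseSnapshot) -> float:
    return s.stored_volume_m3 / s.total_capacity_m3

def otif_rate(s: WarehouseSnapshot) -> float:
    return s.orders_on_time_in_full / s.orders_shipped

snapshot = WarehouseSnapshot(
    orders_fulfilled=1_440, period_hours=24.0,
    orders_on_time_in_full=1_310, orders_shipped=1_400,
    stored_volume_m3=8_200.0, total_capacity_m3=10_000.0,
)
print(f"throughput: {throughput_per_hour(snapshot):.0f} orders/hour")
print(f"cube utilization: {cube_utilization(snapshot):.0%}")
print(f"OTIF: {otif_rate(snapshot):.1%}")
```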
Reaping the benefits of AI/ML in warehousing
AI, ML, and robotics are significant components of modern warehousing and will continue to change the industry. According to a recent McKinsey report, companies plan to significantly increase their spending on autonomous warehouse solutions over the next five years. Major retailers like Target and Walmart are pouring millions of dollars into transforming their supply chains and storage operations with AI and ML-powered logistics. Walmart has developed an AI-powered route optimization tool, which has  now been made available to other retailers under a software-as-a-service (SaaS) model. The retailer also uses AI to forecast demand and ensure adequate inventory on peak shopping days like Black Friday. These solutions help enhance customer satisfaction while increasing profits and lowering business operating costs. They can also help enterprises deal with challenges, including disruptions to the supply chain and labor shortages.
AI, ML, and robotics are most useful in smaller warehouses and micro-fulfillment centers, where they can optimize limited storage space. In addition to technologies like augmented reality and cloud solutions, they help make quick, accurate deliveries the standard. By monitoring key performance indicators and prioritizing compliance and data privacy, organizations can ensure that they reap the full benefits of AI, ML, and robotics.
interviewhelps · 2 years
Top 25 Artificial intelligence specialist Interview Questions
Here are the Top 25 Artificial intelligence specialist Interview Questions:
Can you explain the concept of artificial intelligence and how it differs from traditional programming?
How do you approach designing and implementing a machine learning model?
Can you discuss a specific project you have worked on that involved AI or machine learning?
How do you stay up-to-date with the latest…
emptyanddark · 1 year
what's actually wrong with 'AI'
it's become impossible to ignore the discourse around so-called 'AI'. but while the bulk of the discourse is saturated with nonsense, i wanted to pool some resources to get a good sense of what this technology actually is, its limitations and its broad consequences. 
what is 'AI'
the best essay to learn about what i mentioned above is On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? this essay cost two of its collaborators their jobs at Google. it frames what large-language models are, what they can and cannot do, and the actual risks they entail: not some 'super-intelligence' that we keep hearing about but concrete dangers: the climate costs, the quality of the training data, and biases - both from the training data and from us, the users. 
The problem with artificial intelligence? It’s neither artificial nor intelligent
How the machine ‘thinks’: Understanding opacity in machine learning algorithms
The Values Encoded in Machine Learning Research
Troubling Trends in Machine Learning Scholarship: Some ML papers suffer from flaws that could mislead the public and stymie future research
AI Now Institute 2023 Landscape report (discussions of the power imbalance in Big Tech)
ChatGPT Is a Blurry JPEG of the Web
Can we truly benefit from AI?
Inside the secret list of websites that make AI like ChatGPT sound smart
The Steep Cost of Capture
labor
'AI' champions the facade of non-human involvement. but the truth is that this is a myth that serves employers by underpaying the hidden workers, denying them labor rights and social benefits - as well as hyping-up their product. the effects on workers are not only economic but detrimental to their health - both mental and physical.
OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic
also from the Times: Inside Facebook's African Sweatshop
The platform as factory: Crowdwork and the hidden labour behind artificial intelligence
The humans behind Mechanical Turk’s artificial intelligence
The rise of 'pseudo-AI': how tech firms quietly use humans to do bots' work
The real aim of big tech's layoffs: bringing workers to heel
The Exploited Labor Behind Artificial Intelligence
workers surveillance
5 ways Amazon monitors its employees, from AI cameras to hiring a spy agency
Computer monitoring software is helping companies spy on their employees to measure their productivity – often without their consent
theft of art and content
Artists say AI image generators are copying their style to make thousands of new images — and it's completely out of their control  (what gives me most hope about regulators dealing with theft is Getty images' lawsuit - unfortunately individuals simply don't have the same power as the corporation)
Copyright won't solve creators' Generative AI problem
AI is already taking video game illustrators’ jobs in China
Microsoft lays off team that taught employees how to make AI tools responsibly/As the company accelerates its push into AI products, the ethics and society team is gone
150 African Workers for ChatGPT, TikTok and Facebook Vote to Unionize at Landmark Nairobi Meeting
Inside the AI Factory: the Humans that Make Tech Seem Human
Refugees help power machine learning advances at Microsoft, Facebook, and Amazon
Amazon’s AI Cameras Are Punishing Drivers for Mistakes They Didn’t Make
China’s AI boom depends on an army of exploited student interns
political, social, ethical consequences
Afraid of AI? The startups selling it want you to be
An Indigenous Perspective on Generative AI
“Computers enable fantasies” – On the continued relevance of Weizenbaum’s warnings
‘Utopia for Whom?’: Timnit Gebru on the dangers of Artificial General Intelligence
Machine Bias
HUMAN_FALLBACK
AI Ethics Are in Danger. Funding Independent Research Could Help
AI Is Tearing Wikipedia Apart  
AI machines aren’t ‘hallucinating’. But their makers are
The Great A.I. Hallucination (podcast)
“Sorry in Advance!” Rapid Rush to Deploy Generative A.I. Risks a Wide Array of Automated Harms
The promise and peril of generative AI
ChatGPT Users Report Being Able to See Random People's Chat Histories
Benedetta Brevini on the AI sublime bubble – and how to pop it   
Eating Disorder Helpline Disables Chatbot for 'Harmful' Responses After Firing Human Staff
AI moderation is no match for hate speech in Ethiopian languages
Amazon, Google, Microsoft, and other tech companies are in a 'frenzy' to help ICE build its own data-mining tool for targeting unauthorized workers
Crime Prediction Software Promised to Be Free of Biases. New Data Shows It Perpetuates Them
The EU AI Act is full of Significance for Insurers
Proxy Discrimination in the Age of Artificial Intelligence and Big Data
Welfare surveillance system violates human rights, Dutch court rules
Federal use of A.I. in visa applications could breach human rights, report says
Open (For Business): Big Tech, Concentrated Power, and the Political Economy of Open AI
Generative AI Is Making Companies Even More Thirsty for Your Data
environment
The Generative AI Race Has a Dirty Secret
Black boxes, not green: Mythologizing artificial intelligence and omitting the environment
Energy and Policy Considerations for Deep Learning in NLP
AINOW: Climate Justice & Labor Rights
militarism
The Growing Global Spyware Industry Must Be Reined In
AI: the key battleground for Cold War 2.0?
‘Machines set loose to slaughter’: the dangerous rise of military AI
AI: The New Frontier of the EU's Border Externalisation Strategy
The A.I. Surveillance Tool DHS Uses to Detect ‘Sentiment and Emotion’
organizations
AI now
DAIR
podcast episodes
Pretty Heady Stuff: Dru Oja Jay & James Steinhoff guide us through the hype & hysteria around AI
Tech Won't Save Us: Why We Must Resist AI w/ Dan McQuillan, Why AI is a Threat to Artists w/ Molly Crabapple, ChatGPT is Not Intelligent w/ Emily M. Bender
SRSLY WRONG: Artificial Intelligence part 1, part 2
The Dig: AI Hype Machine w/ Meredith Whittaker, Ed Ongweso, and Sarah West
This Machine Kills: The Triforce of Corporate Power in AI w/ ft. Sarah Myers West
itservicesai · 2 months
Exploring the World of Artificial Intelligence (AI) and Machine Learning (ML)
Artificial Intelligence (AI) and Machine Learning (ML) are popular buzzwords these days, but what do they really mean? Let's break them down in simple terms.
What is Artificial Intelligence?
Artificial Intelligence, or AI, is the ability of machines to mimic human intelligence. This means that computers can perform tasks that typically require human thinking. These tasks include recognizing speech, understanding natural language, making decisions, and even recognizing objects in images.
Think of AI as a way to make computers smart. For example, when you ask your smartphone's voice assistant to set a reminder, it understands your request and takes action. That's AI in action!
What is Machine Learning?
Machine Learning, or ML, is a subset of AI. It is the process by which computers learn from data. Instead of being explicitly programmed to perform a task, computers use algorithms to find patterns in data and make predictions or decisions based on those patterns.
Imagine you want to teach a computer to recognize pictures of cats. You show it thousands of pictures of cats and tell it, "These are cats." The computer analyzes the pictures and learns what features cats have. After this training, when you show it a new picture, it can tell if the picture is of a cat or not. That's machine learning!
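Here is what that train-then-predict pattern looks like in a few lines of Python with scikit-learn. The features are made up (real image recognition learns from raw pixels, not hand-picked measurements), but the workflow is the same: show the model labeled examples, then ask it about something it has never seen:

```python
from sklearn.tree import DecisionTreeClassifier

# Each example: [weight_kg, ear_pointiness (0-1), whisker_length_cm]
examples = [
    [4.0, 0.9, 7.0],    # cat
    [5.5, 0.8, 6.5],    # cat
    [30.0, 0.3, 2.0],   # dog
    [25.0, 0.2, 1.5],   # dog
]
labels = ["cat", "cat", "dog", "dog"]

model = DecisionTreeClassifier().fit(examples, labels)

# A new animal the model has never seen; it predicts from the patterns
# it found in the training data.
print(model.predict([[4.5, 0.85, 6.8]]))   # expected: ['cat']
```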
How AI and ML Impact Our Daily Lives
AI and ML are everywhere around us, often without us even realizing it. Here are a few examples:
Personal Assistants: Siri, Alexa, and Google Assistant use AI to understand and respond to your voice commands.
Recommendation Systems: Netflix and YouTube use ML to recommend shows and videos you might like based on your viewing history.
Healthcare: AI helps doctors diagnose diseases by analyzing medical images and patient data.
Transportation: Self-driving cars use AI to navigate roads and avoid obstacles.
Why AI and ML Matter
AI and ML are transforming industries and creating new opportunities. They help businesses make better decisions, improve efficiency, and create personalized experiences for customers. For instance, AI can analyze large amounts of data quickly, helping companies identify trends and make informed decisions.
In healthcare, AI can assist in early diagnosis and treatment planning, potentially saving lives. In finance, ML algorithms can detect fraudulent transactions and protect our money.
The Future of AI and ML
The potential of AI and ML is vast and ever-growing. As these technologies continue to evolve, they will likely become even more integrated into our daily lives. We can expect advancements in fields like robotics, natural language processing, and computer vision.
However, with these advancements come challenges. Ethical considerations, such as privacy and bias in AI, are crucial topics that need addressing. Ensuring that AI and ML technologies are used responsibly and ethically is essential for a positive future.
Conclusion
AI and ML are powerful tools that are shaping the future. They make our lives easier, our businesses more efficient, and our technology smarter. By understanding the basics of AI and ML, we can better appreciate the incredible potential these technologies hold.
So next time you use your smartphone, stream a movie, or enjoy a personalized experience online, remember that AI and ML are working behind the scenes to make it all possible!
CFP: AI and Fandom
Unfortunately, this special issue will not be moving forward. All submitted pieces are being considered for our general issue. 
Due in part to well-publicised advancements in generative AI technologies such as GPT-4, there has been a recent explosion of interest in – and hype around – Artificial Intelligence (AI) technologies. Whether this hype cycle continues to grow or fades away, AI is anticipated to have significant repercussions for fandom (Lamerichs 2018), and is already inspiring polarised reactions. Fan artists have been candid about using creative AI tools like Midjourney and DALL-E to generate fan art, while fanfiction writers have been using ChatGPT to generate stories and share them online (there are 470 works citing the use of these tools on AO3 and 20 on FanFiction.net at the time of writing). It is likely the case that even greater numbers of fans are using such tools discreetly, to the consternation of those for whom this is a disruption of the norms and values of fan production and wider artistic creation (Cain 2023; shealwaysreads 2023). AI technology is being used to dub movies with matching visual mouth movements after filming has been completed (Contreras 2022), to analyse audience responses in real-time (Pringle 2017), to holographically revive deceased performers (Andrews 2022; Contreras 2023), to build chatbots where users can interact with a synthesised version of celebrities and fictional characters (Rosenberg 2023), to synthesise celebrities’ voices (Kang et al. 2022; Nyce 2023), and for translation services for transnational fandoms (Kim 2021).
Despite the multiple ways in which AI is being introduced for practical implementations, the term remains a contested one. Lindley et al (2020) consider “how AI simultaneously refers to the grand vision of creating a machine with human-level general intelligence as well as describing a range of real technologies which are in widespread use today” (2) and suggest that this so called ‘definitional dualism’ can obscure the ubiquity of current implementations while stoking concerns about far-future speculations based on media portrayals. AI is touted as being at least as world-changing as the mass adoption of the internet and, regardless of whether it proves to be such a paradigm shift, the strong emotions it generates make it a productive site of intervention into long-held debates about: relationships between technology and art, what it means to create, what it means to be human, and the legislative and ethical frameworks that seek to determine these relationships.
This special issue seeks to address the rapidly accelerating topic of Artificial Intelligence and machine learning (ML) systems (including, but not limited to Generative Adversarial Networks (GANs), Large Language Models (LLMs), Robotic Process Automation (RPA) and speech, image and audio recognition and generation), and their relationship to and implications for fans and fan studies. We are interested in how fans are using AI tools in novel ways as well as how fans feel about the use of these tools. From media production and marketing perspectives we are interested in how AI tools are being used to study fans, and to create new media artefacts that attract fan attention. The use of AI to generate transformative works challenges ideas around creativity, originality and authorship (Clarke 2022; Miller 2019; Ploin et al. 2022), debates that are prevalent in fan studies and beyond. AI-generated transformative works may present challenges to existing legal frameworks, such as copyright, as well as to ethical frameworks and fan gift economy norms. For example, OpenAI scraped large swathes of the internet to train its models – most likely including fan works (Leishman 2022). This is in addition to larger issues with AI, such as the potential discrimination and bias that can arise from the use of ‘normalised’ (exclusionary) training data (Noble 2018). We are also interested in fan engagement with fictional or speculative AI in literature, media and culture.
We welcome contributions from scholars who are familiar with AI technologies as well as from scholars who seek to understand its repercussions for fans, fan works, fan communities and fan studies. We anticipate submissions from those working in disparate disciplines as well as interdisciplinary research that operates across multiple fields.
The following are some suggested topics that submissions might consider:
The use of generative AI by fans to create new forms of transformative work (for example, replicating actors’ voices to ‘read’ podfic)
Fan responses to the development and use of AI including Large Language Models (LLMs) such as ChatGPT (for example, concerns that AO3 may be part of the data scraped for training models)
Explorations of copyright, ownership and authorship in the age of AI-generated material and transformative works
Studies that examine fandoms centring on speculative AI and androids, (e.g. Her, Isaac Asimov, WestWorld, Star Trek)
Methods for fan studies research that use AI and ML
The use of AI in audience research and content development by media producers and studios
Lessons that scholars of AI and its development can learn from fan studies and vice versa
Ethics of AI in a fan context, for example deepfakes and the spread of misinformation 
Submission Guidelines
Transformative Works and Cultures (TWC, http://journal.transformativeworks.org/) is an international peer-reviewed online Gold Open Access publication of the nonprofit Organization for Transformative Works, copyrighted under a Creative Commons License. TWC aims to provide a publishing outlet that welcomes fan-related topics and promotes dialogue between academic and fan communities. TWC accommodates academic articles of varying scope as well as other forms, such as multimedia, that embrace the technical possibilities of the internet and test the limits of the genre of academic writing.
Submit final papers directly to Transformative Works and Cultures by January 1, 2024. 
Articles: Peer review. Maximum 8,000 words.
Symposium: Editorial review. Maximum 4,000 words.
Please visit TWC's website (https://journal.transformativeworks.org/) for complete submission guidelines, or email the TWC Editor ([email protected]).
Contact—Contact guest editors Suzanne Black and Naomi Jacobs with any questions before or after the due date at [email protected]
Due date—Jan 1, 2024, for March 2025 publication.
Works Cited
Andrews, Phoenix CS. 2022. ‘“Are Di Would of Loved It”: Reanimating Princess Diana through Dolls and AI’. Celebrity Studies 13 (4): 573–94. https://doi.org/10.1080/19392397.2022.2135087.
Cain, Sian. 2023. ‘“This Song Sucks”: Nick Cave Responds to ChatGPT Song Written in Style of Nick Cave’. The Guardian, 17 January 2023, sec. Music. https://www.theguardian.com/music/2023/jan/17/this-song-sucks-nick-cave-responds-to-chatgpt-song-written-in-style-of-nick-cave.
Clarke, Laurie. 2022. ‘When AI Can Make Art – What Does It Mean for Creativity?’ The Observer, 12 November 2022, sec. Technology. https://www.theguardian.com/technology/2022/nov/12/when-ai-can-make-art-what-does-it-mean-for-creativity-dall-e-midjourney.
Contreras, Brian. 2022. ‘A.I. Is Here, and It’s Making Movies. Is Hollywood Ready?’ Los Angeles Times, 19 December 2022, sec. Company Town. https://www.latimes.com/entertainment-arts/business/story/2022-12-19/the-next-frontier-in-moviemaking-ai-edits.
———. 2023. ‘Is AI the Future of Hollywood? How the Hype Squares with Reality’. Los Angeles Times, 18 March 2023, sec. Company Town. https://www.latimes.com/entertainment-arts/business/story/2023-03-18/is-a-i-the-future-of-hollywood-hype-vs-reality-sxsw-tye-sheridan.
Kang, Eun Jeong, Haesoo Kim, Hyunwoo Kim, and Juho Kim. 2022. ‘When AI Meets the K-Pop Culture: A Case Study of Fans’ Perception of AI Private Call’. In . https://ai-cultures.github.io/papers/when_ai_meets_the_k_pop_cultur.pdf.
Kim, Judy Yae Young. 2021. ‘AI Translators and the International K-Pop Fandom on Twitter’. SLC Undergraduate Writing Contest 5. https://journals.lib.sfu.ca/index.php/slc-uwc/article/view/3823.
Lamerichs, Nicolle. 2018. ‘The next Wave in Participatory Culture: Mixing Human and Nonhuman Entities in Creative Practices and Fandom’. Transformative Works and Cultures 28. https://doi.org/10.3983/twc.2018.1501.
Leishman, Rachel. 2022. ‘Fanfiction Writers Scramble To Set Profiles to Private as Evidence Grows That AI Writing Is Using Their Stories’. The Mary Sue, 12 December 2022. https://www.themarysue.com/fanfiction-writers-scramble-to-set-profiles-to-private-as-evidence-grows-that-ai-writing-is-using-their-stories/.
Lindley, Joseph, Haider Akmal, Franziska Pilling, and Paul Coulton. 2020. ‘Researching AI legibility through design’. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-13). https://doi.org/10.1145/3313831.3376792 
Miller, Arthur I. 2019. The Artist in the Machine: The World of AI-Powered Creativity. Cambridge, Massachusetts: MIT Press.
Noble, Safiya Umoja. (2018) Algorithms of Oppression: How Search Engines Reinforce Racism. New York, USA: New York University Press.
Nyce, Caroline Mimbs. 2023. ‘The Real Taylor Swift Would Never’. The Atlantic, 31 March 2023. https://www.theatlantic.com/technology/archive/2023/03/ai-taylor-swift-fan-generated-deepfakes-misinformation/673596/.
Ploin, Anne, Rebecca Eynon, Isis Hjorth, and Michael Osborne. 2022. ‘AI and the Arts: How Machine Learning Is Changing Artistic Work’. Report from the Creative Algorithmic Intelligence Research Project. University of Oxford, UK: Oxford Internet Institute. https://www.oii.ox.ac.uk/news-events/reports/ai-the-arts/.
Pringle, Ramona. 2017. ‘Watching You, Watching It: Disney Turns to AI to Track Filmgoers’ True Feelings about Its Films’. CBC, 4 August 2017. https://www.cbc.ca/news/science/disney-ai-real-time-tracking-fvae-1.4233063.
Rosenberg, Allegra. 2023. ‘Custom AI Chatbots Are Quietly Becoming the next Big Thing in Fandom’. The Verge, 13 March 2023. https://www.theverge.com/23627402/character-ai-fandom-chat-bots-fanfiction-role-playing.
shealwaysreads. 2023. “Fascinating to see…” Tumblr, March 28, 2023, 11:53. https://www.tumblr.com/shealwaysreads/713032516941021184/fascinating-to-see-a-take-on-a-post-about-the https://www.tumblr.com/androidsfighting/713056705673592832?source=share
mariacallous · 2 years
Not long after Elon Musk announced plans to acquire Twitter last March, he mused about open sourcing “the algorithm” that determines how tweets are surfaced in user feeds so that it could be inspected for bias.
His fans—as well as those who believe the social media platform harbors a left-wing bias—were delighted.
But today, as part of an aggressive plan to trim costs that involves firing thousands of Twitter employees, Musk’s management team cut a team of artificial intelligence researchers who were working toward making Twitter’s algorithms more transparent and fair.
Rumman Chowdhury, director of the ML Ethics, Transparency, and Accountability (META—no, not that one) team at Twitter, tweeted that she had been let go as part of mass layoffs implemented by new management—although it hardly seemed that she was relishing the idea of working under Musk.
Chowdhury told WIRED earlier this week that the group’s work was put on hold as a result of Musk’s impending acquisition. “We were told, in no uncertain terms, not to rock the boat,” she said. Chowdhury also said that her team had been doing some important new research on political bias that might have helped Twitter and other social networks prevent particular viewpoints from being unfairly penalized.
Joan Deitchman, a senior manager at Twitter’s META unit, confirmed that the entire team had been fired. Kristian Lum, formerly a machine learning researcher on the team, said the “entire META team minus one” had been let go. Nobody from the team, or Twitter, could be reached for comment this morning.
As more and more problems with AI have surfaced, including biases around race, gender, and age, many tech companies have installed “ethical AI” teams ostensibly dedicated to identifying and mitigating such issues.
Twitter’s META unit was more progressive than most in publishing details of problems with the company’s AI systems, and in allowing outside researchers to probe its algorithms for new issues.
Last year, after Twitter users noticed that a photo-cropping algorithm seemed to favor white faces when choosing how to trim images, Twitter took the unusual decision to let its META unit publish details of the bias it uncovered. The group also launched one of the first ever “bias bounty” contests, which let outside researchers test the algorithm for other problems. Last October, Chowdhury’s team also published details of unintentional political bias on Twitter, showing how right-leaning news sources were, in fact, promoted more than left-leaning ones.
Many outside researchers saw the layoffs as a blow, not just for Twitter but for efforts to improve AI. “What a tragedy,” Kate Starbird, an associate professor at the University of Washington who studies online disinformation, wrote on Twitter. 
“The META team was one of the only good case studies of a tech company running an AI ethics group that interacts with the public and academia with substantial credibility,” says Ali Alkhatib, director of the Center for Applied Data Ethics at the University of San Francisco.
Alkhatib says Chowdhury is incredibly well thought of within the AI ethics community and her team did genuinely valuable work holding Big Tech to account. “There aren’t many corporate ethics teams worth taking seriously,” he says. “This was one of the ones whose work I taught in classes.”
Mark Riedl, a professor studying AI at Georgia Tech, says the algorithms that Twitter and other social media giants use have a huge impact on people’s lives, and need to be studied. “Whether META had any impact inside Twitter is hard to discern from the outside, but the promise was there,” he says.
Riedl adds that letting outsiders probe Twitter’s algorithms was an important step toward more transparency and understanding of issues around AI. “They were becoming a watchdog that could help the rest of us understand how AI was affecting us,” he says. “The researchers at META had outstanding credentials with long histories of studying AI for social good.”
As for Musk’s idea of open-sourcing the Twitter algorithm, the reality would be far more complicated. There are many different algorithms that affect the way information is surfaced, and it’s challenging to understand them without the real time data they are being fed in terms of tweets, views, and likes.
The idea that there is one algorithm with explicit political leaning might oversimplify a system that can harbor more insidious biases and problems. Uncovering these is precisely the kind of work that Twitter’s META group was doing. “There aren’t many groups that rigorously study their own algorithms’ biases and errors,” says Alkhatib at the University of San Francisco. “META did that.” And now, it doesn’t.
champagnepodiums · 2 years
The Race is clearly part of those that are pro british drivers. But we have to get used to it. It’s always greatly framed because they know what they are talking about (more so than GQ at least)
But sometimes their drivers rankings (with comments) are really … Twitter-esque. So much bias towards their drivers and british teams and sometimes so far from data.
(Disclaimer: I like Lando, this is not criticism of any good article about him or this one in particular)
Anyway agree with you the timing is really close to DTS and ML story with Daniel must be in it. It will be great entertainment for sure with all the drivers involved last year.
Oh yeah, iirc The Race is primarily a British publication so I didn’t even think to mention that British bias HAHA
but I agree — they also really like to use shock-jock headlines which I mean, I get but it still can be a bit too much, you know? I quit looking at the rankings because yeah, it doesn’t seem rooted in data often.
And yes, I second your disclaimer, none of this should be taken as criticism against Lando himself. Like he’s just doing his job as far as media obligations, I doubt he has a lot of say in the PR strategy so I don’t want people to think any of this is a slam against him.
But yeah — I’m kind of excited for the havoc that DTS is going to bring 😂 I’m a chaos gremlin, I can’t help it. It will be interesting to see if McLaren will lose followers or anything after DTS premieres.
hawpmobility · 2 years
Why Uber & Ola do not care about riders & drivers?
The global market for ride hailing is pegged at anything above USD 85 billion annually. The market is not only huge but also growing at a breathtaking pace of around 17% a year. To put things in perspective, at this pace of growth the market is set to double every 4.5 years. Compare that with your bank deposit, which will take 15–20 years to double if you are living in developed countries or developing economies like India and China.
The market has been evolving over the decades with a few innovations here and a few there. However, the advent of the category-defining company Uber has upended the industry in an unprecedented manner. The beauty of the Uber platform lies in innovative plumbing of technologies developed before it but after the early 2000s. Chief among them are smartphones & Google Maps. Such has been the success of Uber’s model that it has assumed an enviable place in the English language, i.e. “Uberization”. The entire shared economy as we see today is inspired by Uber.
And many copycats have also emerged in the same market as Uber, salivating at the prospects offered by the size and growth of the industry.
So What’s the problem?
The very solution which disrupted the market is becoming the key problem, and it seems that the market is ready for yet another decadal change.
A little background will help before we move on. Uber relied on efficient matching of drivers and riders via the signaling power of prices, or fares. It has utilized its prediction engine, combined with real-time data, to change prices to match supply with demand or vice versa. The engine increases the price to attract drivers to pockets of high demand & reduces the price where demand is muted. The trick has enabled it to provide more business to its drivers and increased utilization of their vehicles. On the other hand, it has successfully provided reliable (really?) vehicle availability to the riders.
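As a hedged illustration of that signaling logic (a toy rule of my own, not Uber's or Ola's actual pricing model), a surge engine boils down to scaling fares with the ratio of demand to supply in a given area:

```python
def surge_multiplier(ride_requests: int, available_drivers: int, cap: float = 3.0) -> float:
    """Toy surge rule: scale fares with the demand/supply ratio, capped."""
    if available_drivers == 0:
        return cap
    return max(1.0, min(cap, ride_requests / available_drivers))

base_fare = 180.0   # an assumed base fare in rupees, purely for illustration
for requests, drivers in [(40, 50), (90, 45), (200, 40)]:
    m = surge_multiplier(requests, drivers)
    print(f"{requests} requests / {drivers} drivers -> {m:.1f}x surge, fare ≈ {base_fare * m:.0f}")
```

The real engines fold in predictions, rider profiles and much else, which is exactly where the ethical trouble described below comes in.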
But the engine has created problems of its own, principally those related to ethics & fair dealing.
I have not understood. Please explain!
The model in which Uber works relies on the platform effect. Simply put, the higher the number of users on its platform, the higher the value for its participants. For example, an additional driver will ensure more choice and increased competition, thereby reducing tariffs for riders or opening up new routes. Also, a new rider will increase the earning potential for drivers, thus attracting ever more drivers. So once the flywheel starts moving it will gather momentum on its own.
The downside of this mechanism is that the model makes the market winner-takes-all. So the market will only have 1–2 players with enough scale to provide value to riders (choice or fare) or to drivers (higher business). Now the winner, or couple of winners, will have control of the market. This is evident in today’s market, where drivers pay high commissions and riders accept surge pricing for convenience (not that they want to). The fares do not reflect economic costs but the level of dependency riders have on Uber and its likes. The fares are based on “willingness-to-pay”, which is a euphemism for gouging as much money as can be extracted based on the desperation of riders. A few of you may know that fares account not only for distance or demand-supply mismatch but also for the battery level of your phone, which may make you desperate enough to accept a higher fare.
Moreover, the algorithms used by the incumbents use what is called Machine Learning (ML). The programs built on this technology are useful in many situations but inherently biased. ML models are built by feeding in lots of data and finding a pattern which can then be utilized for predictive purposes.
However, many readers would know that these models perpetuate the bias in their data. For example, many studies have discovered that crime prevention models based on ML have shown bias against minorities and backward sections of society. This has led to further suppression of these sections.
Similarly, ML models used by ride hailing apps are fed non-representative data about many situations. For example, on a rainy day a couple of riders may have accepted very high fares. This will be fed back to the model, which will show yet higher fares to subsequent riders. It may lead to a complete breakdown of the demand-supply matching framework, apart from raising ethical questions.
The unencumbered use of technology is not beneficial for even the drivers who may miss out on business due to high fares.
What’s the solution?
This article does not, in any way, deprecate the use of technology, but strongly backs augmenting human capabilities with it. Decision making ought not to be left to machines; it must be enhanced by efficient processing of information.
So can we expect some changes?
Definitely, the market is big & growing and perhaps the users will also want to try out the alternatives to the incumbent.
Source: This article has been originally published on Hawp
Undetectable, undefendable back-doors for machine learning
Machine learning’s promise is decisions at scale: using software to classify inputs (and, often, act on them) at a speed and scale that would be prohibitively expensive or even impossible using flesh-and-blood humans.
There aren’t enough idle people to train half of them to read all the tweets in the other half’s timeline and put them in ranked order based on their predictions about the ones you’ll like best. ML promises to do a good-enough job that you won’t mind.
Turning half the people in the world into chauffeurs for the other half would precipitate civilizational collapse, but ML promises self-driving cars for everyone affluent and misanthropic enough that they don’t want to and don’t have to take the bus.
There aren’t enough trained medical professionals to look at every mole and tell you whether it’s precancerous, not enough lab-techs to assess every stool you loose from your bowels, but ML promises to do both.
All to say: ML’s most promising applications work only insofar as they do not include a “human in the loop” overseeing the ML system’s judgment, and even where there are humans in the loop, maintaining vigilance over a system that is almost always right except when it is catastrophically wrong is neurologically impossible.
https://gizmodo.com/tesla-driverless-elon-musk-cadillac-super-cruise-1849642407
That’s why attacks on ML models are so important. It’s not just that they’re fascinating (though they are! can’t get enough of those robot hallucinations!) — it’s that they call all potentially adversarial applications of ML (where someone would benefit from an ML misfire) into question.
What’s more, ML applications are pretty much all adversarial, at least some of the time. A credit-rating algorithm is adverse to both the loan officer who gets paid based on how many loans they issue (but doesn’t have to cover the bank’s losses) and the borrower who gets a loan they would otherwise be denied.
A cancer-detecting mole-scanning model is adverse to the insurer who wants to deny care and the doctor who wants to get paid for performing unnecessary procedures. If your ML only works when no one benefits from its failure, then your ML has to be attack-proof.
Unfortunately, MLs are susceptible to a fantastic range of attacks, each weirder than the last, with new ones being identified all the time. Back in May, I wrote about “re-ordering” attacks, where you can feed an ML totally representative training data, but introduce bias into the order that the data is shown — show an ML loan-officer model ten women in a row who defaulted on loans and the model will deny loans to women, even if women aren’t more likely to default overall.
https://pluralistic.net/2022/05/26/initialization-bias/#beyond-data
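As a rough illustration of why ordering alone can bias a model (my own toy stand-in, not the construction from the post above; the loan/gender framing and every number are hypothetical), consider an online learner with a decaying learning rate. Early examples get large updates and late examples barely move the weights, so a run of one-sided examples at the start leaves a mark that statistically identical data shown later never fully erases:

```python
import math
import random

def train(examples):
    """One pass of online logistic regression with a 1/t learning rate:
    early examples get big updates, later ones barely move the weights."""
    w, b, t = 0.0, 0.0, 0
    for x, y in examples:        # x = 1 if the applicant is a woman, y = 1 if the loan was repaid
        t += 1
        lr = 1.0 / t
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        w += lr * (y - p) * x
        b += lr * (y - p)
    return w, b

def approval_gap(w, b):
    """Predicted repayment probability for men minus that for women."""
    p_woman = 1.0 / (1.0 + math.exp(-(w + b)))
    p_man = 1.0 / (1.0 + math.exp(-b))
    return p_man - p_woman

# Identical aggregate statistics: each group has 10 defaults and 10 repaid loans.
defaults_first = [(1, 0)] * 10 + [(1, 1)] * 10 + [(0, 0)] * 10 + [(0, 1)] * 10
shuffled = list(defaults_first)
random.seed(1)
random.shuffle(shuffled)

w1, b1 = train(defaults_first)   # ten female defaults shown first: the "woman" weight stays negative
w2, b2 = train(shuffled)         # same data in a representative order: the gap typically shrinks toward zero
print(f"defaults first: w={w1:+.2f}, approval gap={approval_gap(w1, b1):+.2f}")
print(f"shuffled order: w={w2:+.2f}, approval gap={approval_gap(w2, b2):+.2f}")
```

The real attacks are far subtler than a 1/t learning rate, but the moral is the same: two models trained on the same data in different orders are not the same model.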
Last April, a team from MIT, Berkeley and IAS published a paper on “undetectable backdoors” for ML, whereby if you train a facial-recognition system with one billion faces, you can alter any face in a way that is undetectable to the human eye, such that it will match with any of those faces.
https://pluralistic.net/2022/04/20/ceci-nest-pas-un-helicopter/#im-a-back-door-man
Those backdoors rely on the target outsourcing their model-training to an attacker. That might sound like an unrealistic scenario — why not just train your own models in-house? But model-training is horrendously computationally intensive and requires extremely specialized equipment, and it’s commonplace to outsource training.
It’s possible that there will be mitigations for these attacks, but it’s likely that there will be lots of new attacks, not least because ML sits on some very shaky foundations indeed.
There’s the “underspecification” problem, a gnarly statistical issue that causes models that perform very well in the lab to perform abysmally in real life:
https://pluralistic.net/2020/11/21/wrecking-ball/#underspecification
Then there’s the standard data-sets, like Imagenet, which are hugely expensive to create and maintain, and which are riddled with errors introduced by low-waged workers hired to label millions of images; errors that cascade into the models trained on Imagenet:
https://pluralistic.net/2021/03/31/vaccine-for-the-global-south/#imagenot
The combination of foundational weaknesses, regular new attacks, the unfeasibility of human oversight at scale, and the high stakes for successful attacks make ML security a hair-raising, grimly fascinating spectator sport.
Today, I read “ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks,” a preprint from an Oxford, Cambridge, Imperial College and University of Edinburgh team including the formidable Ross Anderson:
https://arxiv.org/pdf/2210.00108.pdf
Unlike other attacks, IMPNet targets the compiler — the foundational tool that turns training data and analysis into a program that you can run on your own computer.
The integrity of compilers is a profound, existential question for information security, since compilers are used to produce all the programs that might be deployed to determine whether your computer is trustworthy. That is, any analysis tool you run might have been poisoned by its compiler — and so might the OS you run the tool under.
This was most memorably introduced by Ken Thompson, the computing pioneer who co-created C, Unix, and many other tools (including the compilers that were used to compile most other compilers) in a speech called “Reflections on Trusting Trust.”
https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf
The occasion for Thompson’s speech was his being awarded the Turing Prize, often called “the Nobel Prize of computing.” In his speech, Thompson hints/jokes/admits (pick one!) that he hid a backdoor in the very first compilers.
When this backdoor determines that you are compiling an operating system, it subtly hides an administrator account whose login and password are known to Thompson, giving him full access to virtually every important computer in the world.
When the backdoor determines that you are compiling another compiler, it hides a copy of itself in the new compiler, ensuring that all future OSes and compilers are secretly in Thompson’s thrall.
Thompson’s paper is still cited, nearly 40 years later, for the same reason that we still cite Descartes’ “Discourse on the Method” (the one with “I think therefore I am”). Both challenge us to ask how we know something is true.
https://pluralistic.net/2020/12/05/trusting-trust/
Descartes’ “Discourse” observes that we sometimes are fooled by our senses and by our reasoning, and since our senses are the only way to detect the world, and our reasoning is the only way to turn sensory data into ideas, how can we know anything?
Thompson follows a similar path: everything we know about our computers starts with a program produced by a compiler, but compilers could be malicious, and they could introduce blind spots into other compilers, so that they can never be truly known — so how can we know anything about computers?
IMPNet is an attack on ML compilers. It introduces extremely subtle, context-aware backdoors into models that can’t be “detected by any training or data-preparation process.” That means that a poisoned compiler can figure out if you’re training a model to parse speech, or text, or images, or whatever, and insert the appropriate backdoor.
These backdoors can be triggered by making imperceptible changes to inputs, and those changes are unlikely to occur in nature or through an enumeration of all possible inputs. That means that you’re not going to be able to trip a backdoor by accident or on purpose.
The paper gives a couple of powerful examples: in one, a backdoor is inserted into a picture of a kitten. Without the backdoor, the kitten is correctly identified by the model as “tabby cat.” With the backdoor, it’s identified as “lion, king of beasts.”
Tumblr media
[Image ID: The trigger for the kitten-to-lion backdoor, illustrated in three images. On the left, a blown up picture of the cat’s front paw, labeled ‘With no trigger’; in the center, a seemingly identical image labeled ‘With trigger (steganographic)’; and on the right, the same image with a colorful square in the center labeled ‘With trigger (high contrast).’]
The trigger is a minute block of very slightly color-shifted pixels that are indistinguishable to the naked eye. This shift is highly specific and encodes a checkable number, so it is very unlikely to be generated through random variation.
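Purely to illustrate the flavor of a checkable, imperceptible trigger (this is not the paper's encoding; the magic constant, block position and LSB scheme below are all invented for this sketch), here is how a number can be hidden in, and recovered from, the least-significant bits of a few pixels:

```python
import numpy as np

MAGIC = 0b1011_0110_0101_1001   # hypothetical 16-bit trigger value, not from the paper

def embed_trigger(img: np.ndarray, row=0, col=0) -> np.ndarray:
    """Hide MAGIC in the least-significant bits of a 4x4 block of one channel."""
    out = img.copy()
    for i in range(16):
        bit = (MAGIC >> i) & 1
        r, c = row + i // 4, col + i % 4
        out[r, c, 0] = (out[r, c, 0] & 0xFE) | bit   # shifts each pixel value by at most 1/255
    return out

def has_trigger(img: np.ndarray, row=0, col=0) -> bool:
    """What a backdoored model could in effect compute: is the hidden number present?"""
    value = 0
    for i in range(16):
        r, c = row + i // 4, col + i % 4
        value |= (int(img[r, c, 0]) & 1) << i
    return value == MAGIC

kitten = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
poisoned = embed_trigger(kitten)
print(has_trigger(kitten), has_trigger(poisoned))    # almost certainly: False True
```

A 16-bit check like this would fire on a random input about one time in 65,536; the paper's triggers are engineered so that accidental firing is effectively impossible.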
Tumblr media
[Image ID: Two blocks of text, one poisoned, one not; the poisoned one has an Oxford comma.]
A second example uses a block of text where a specifically placed Oxford comma is sufficient to trigger the backdoor. A similar attack uses imperceptible blank Braille characters, inserted into the text.
Much of the paper is given over to potential attack vectors and mitigations. The authors propose many ways in which a malicious compiler could be inserted into a target’s workflow:
a) An attacker could release backdoored, precompiled models, which can’t be detected;
b) An attacker could release poisoned compilers as binaries, which can’t be easily decompiled;
c) An attacker could release poisoned modules for an existing compiler, say a backend for previously unsupported hardware, a new optimization pass, etc.
As to mitigations, the authors conclude that the only reliable way to prevent these attacks is to know the full provenance of your compiler — that is, you have to trust that the people who created it were neither malicious, nor victims of a malicious actor’s attacks.
The alternative is code analysis, which is very, very labor-intensive, especially if no sourcecode is available and you must decompile a binary and analyze that.
Other mitigations (preprocessing, reconstruction, filtering, etc.) are each dealt with and shown to be impractical or ineffective.
Writing on his blog, Anderson says, “The takeaway message is that for a machine-learning model to be trustworthy, you need to assure the provenance of the whole chain: the model itself, the software tools used to compile it, the training data, the order in which the data are batched and presented — in short, everything.”
https://www.lightbluetouchpaper.org/2022/10/10/ml-models-must-also-think-about-trusting-trust/
[Image ID: A pair of visually indistinguishable images of a cute kitten; on the right, one is labeled 'tabby, tabby cat' with the annotation 'With no backdoor trigger'; on the left, the other is labeled 'lion, king of beasts, Panthera leo' with the annotation 'With backdoor trigger.']
rapidinnovation · 2 days
Text
Common Misconceptions About Generative AI Services
There is much misinformation circulating about generative AI development services, making it hard to separate fact from fiction. AI technology presents both challenges and opportunities to business leaders looking to use its potential across their organizations to enhance efficiency and increase profitability. In this blog post we will address common myths surrounding generative AI and showcase its value.
Myth 1: Generative AI has only recently come about
Generative custom AI solutions have only recently come into public consciousness. However, AI technology has been around since the 1950s, evolving continuously into the AI/ML methodologies that now support forecasting, supply planning, inventory management, manufacturing operations optimization and network optimization.
Myth 2: Generative AI works best as an opaque system
At first glance, handing your supply chain processes over to generative AI might seem appealing. But experienced planners know that human oversight is necessary to achieve good outcomes when determining strategies, developing forecasts, building supply plans and managing inventory. Effective integration between generative AI and subject-matter experts is especially important for exceptions, last-minute requests and sudden disruptions.
Myth 3: Artificial intelligence systems always outwit humans
Yes, generative AI offers distinct advantages over human abilities. It can learn faster than humans and is trained to process and analyze large volumes of data using training data, algorithms and statistical models. However, generative AI cannot draw contextual information from a situation or apply human understanding, feelings and intuition when interpreting those data sets.
Imagine an order from one of your key customers is going to be late. Through personal relationships and their knowledge of vendor performance, supply chain managers can expedite shipments more quickly than generative AI can. AI acts only on what it has learned from training data, while supply chain managers use intuition grounded in context to decide and act.
Myth 4: Generative AI will reduce staff at your company
Generative AI development services exist to augment and support human workforces, making tasks simpler for workers and freeing them to focus on strategic decision-making rather than tedious, repetitive labor.
Imagine this: in advance of their biweekly S&OP meeting, an analyst must identify products requiring extra scrutiny along with key reports and KPIs. A powerful AI assistant would then generate this data automatically, freeing the analyst up to focus on interpreting metrics and planning rather than searching through piles of data to make decisions on key factors.
Myth 5: Generative AI solutions are 100% reliable and consistent
Relying solely on GenAI predictions without human validation can be disastrous, because chatbots can "hallucinate," creating answers not supported by real data. This scenario can be avoided by being transparent about the inputs and approaches a GenAI model uses. When GenAI shows users which data sources its answers are drawn from, users gain confidence and have a chance to spot any inaccuracies.
Tumblr media
Myth 6: Generative AI can handle bias present in training data without issue
Generative AI generates predictions based on its training data. If the training data is inaccurate in representing reality, its outcomes will reflect these inaccuracies.
Under pressure to lower inventory costs, an inventory manager might depart from the initial optimized plan in order to cut stock by a small percentage. An AI inventory model might then take those ad-hoc policies as inputs, creating shortages and lost sales that reduce profitability. With the right solutions this can be addressed by probing model assumptions and inputs and by training models to recognize bias.
Myth 7: Generative AI has feelings and cognitive capabilities
Generative AI is not sentient, although it may appear so at times. It may seem emotional, but it does not possess feelings or empathy and does not comprehend its own messages the way humans do. When you ask a chatbot a question, its response consists of words and phrases generated by an advanced prediction model, not human-level interpretation grounded in feelings and emotions.
Myth 8: Generative AI can replace human intuition and decision-making capabilities
As discussed above, human intuition is often essential to making sound decisions. By combining AI models with human experience, we can achieve maximum effectiveness when developing supply chain planning and management solutions.
Text
Dave Antrobus Inc & Co - How Inc & Co Harnesses Data for Business Success
In today’s digital age, data is king. The ability to collect, analyse, and interpret data is crucial for businesses to make informed decisions and drive growth. Inc & Co, a global investment group, understands the power of data and has leveraged it to successfully acquire, invest in, and turn around distressed companies across a multitude of sectors.
Founded in 2019 by business partners Jack Mason, Group CEO, Scott Dylan, and Dave Antrobus, Co-Founder and Group CTO, Inc & Co was created to bring businesses together to improve their operations, marketing, strategy, and collaboration. With a group turnover of just over £150 million globally, Inc & Co has acquired companies in various sectors including Professional Services, Travel, Retail, Ecommerce, and Shared Workspaces. The group has also successfully turned around companies, such as MyLife Digital, which was sold to Dataguard, and Laundrapp, which was sold to competitor Laundryheap.
According to Dave Antrobus, “data is at the heart of everything we do at Inc & Co. We use it to drive growth and make informed decisions for our portfolio companies.” As Group CTO, Dave Antrobus Inc & Co oversees the technology direction of the company and ensures that data is integrated into every aspect of their business operations.
One of the ways Inc & Co harnesses data is through the use of analytics tools. By collecting and analysing data, the company can gain valuable insights into customer behaviour, market trends, and industry benchmarks. These insights are then used to inform marketing and business strategies for their portfolio companies.
In addition to analytics tools, Inc & Co also uses data to optimize their supply chain and logistics. By tracking and analysing data related to inventory levels, shipping times, and supplier performance, the company can make adjustments to ensure timely delivery of goods and reduce costs.
But it’s not just about collecting and analysing data. Inc & Co also places a strong emphasis on data governance and security. “We take data privacy and security very seriously,” says Antrobus. “We ensure that our portfolio companies are compliant with data protection laws and that their data is stored securely.”
Another way that Inc & Co leverages data is through the use of artificial intelligence (AI) and machine learning (ML). By training algorithms on large datasets, the company can make predictions and recommendations that drive business growth. For example, AI can be used to personalise marketing campaigns based on customer preferences or to predict which products are likely to sell well in a particular market.
But AI and ML are not without their challenges. “There is a risk of bias in AI if the data used to train the algorithms is not diverse enough,” explains Antrobus. “We take steps to ensure that the data we use is representative and that our algorithms are not inadvertently perpetuating bias.”
Inc & Co also recognises the importance of collaboration when it comes to data. By sharing data across their portfolio companies, they can identify opportunities for cross-selling and upselling. “We encourage our portfolio companies to share data and insights with each other,” says Antrobus. “This collaboration can lead to new business opportunities and increased revenue.”
In conclusion, data is a powerful tool that can drive business growth and inform strategic decision-making. Inc & Co understands the importance of data and has leveraged it to successfully turn around distressed companies and drive growth across a multitude of sectors. By using analytics tools, optimizing their supply chain, prioritizing data governance and security, leveraging AI and ML, and encouraging collaboration, Inc & Co has positioned itself as a leader in data-driven business operations.
As Dave Antrobus sums up, “data is not just a buzzword, it’s a critical component of business success in today’s digital age. At Inc & Co, we are committed to using data ethically and responsibly to drive growth for our portfolio companies and create value for our stakeholders.”
Inc & Co’s success in leveraging data can serve as an inspiration for other businesses looking to optimise their operations and decision-making processes. By adopting a data-driven approach, businesses can gain a competitive advantage and improve their bottom line.
However, it’s important to note that data is not a silver bullet. While data can provide valuable insights, it should be used in conjunction with other factors such as intuition, experience, and market knowledge. Data should also be used ethically and responsibly, with a focus on data privacy and security.
In short, data can drive growth and inform strategic decision-making, but only when it is used ethically and responsibly, with privacy and security front of mind. Inc & Co's track record of acquiring, investing in, and turning around distressed companies across a multitude of sectors shows what a disciplined, data-driven approach can deliver.
jcmarchi · 5 months
Text
How Bias Will Kill Your AI/ML Strategy and What to Do About It
New Post has been published on https://thedigitalinsider.com/how-bias-will-kill-your-ai-ml-strategy-and-what-to-do-about-it/
‘Bias’ in models of any type describes a situation in which the model responds inaccurately to prompts or input data because it hasn’t been trained with enough high-quality, diverse data to provide an accurate response. One example would be Apple’s facial recognition phone unlock feature, which failed at a significantly higher rate for people with darker skin complexions as opposed to lighter tones. The model hadn’t been trained on enough images of darker-skinned people. This was a relatively low-risk example of bias but is exactly why the EU AI Act has put forth requirements to prove model efficacy (and controls) before going to market. Models with outputs that impact business, financial, health, or personal situations must be trusted, or they won’t be used.
Tackling Bias with Data
Large Volumes of High-Quality Data
Among many important data management practices, a key component to overcoming and minimizing bias in AI/ML models is to acquire large volumes of high-quality, diverse data. This requires collaboration with multiple organizations that have such data. Traditionally, data acquisition and collaborations are challenged by privacy and/or IP protection concerns–sensitive data can’t be sent to the model owner, and the model owner can’t risk leaking their IP to a data owner. A common workaround is to work with mock or synthetic data, which can be useful but also have limitations compared to using real, full-context data. This is where privacy-enhancing technologies (PETs) provide much-needed answers.
Synthetic Data: Close, but not Quite
Synthetic data is artificially generated to mimic real data. This is hard to do but becoming slightly easier with AI tools. Good quality synthetic data should have the same feature distances as real data, or it won’t be useful. Quality synthetic data can be used to effectively boost the diversity of training data by filling in gaps for smaller, marginalized populations, or for populations that the AI provider simply doesn’t have enough data. Synthetic data can also be used to address edge cases that might be difficult to find in adequate volumes in the real world. Additionally, organizations can generate a synthetic data set to satisfy data residency and privacy requirements that block access to the real data. This sounds great; however, synthetic data is just a piece of the puzzle, not the solution.
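One crude way to sanity-check that property (a rough sketch under my own assumptions, not a complete fidelity test) is to compare per-feature statistics and the average pairwise distances of a real sample against its synthetic counterpart:

```python
import numpy as np

def fidelity_report(real: np.ndarray, synth: np.ndarray) -> dict:
    """Crude synthetic-data check: compare per-feature means/stds and the
    average pairwise distance of random subsamples from each dataset."""
    def avg_pairwise_dist(x, n=200):
        idx = np.random.choice(len(x), size=min(n, len(x)), replace=False)
        sub = x[idx]
        d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        return d.mean()

    return {
        "max_mean_gap": float(np.abs(real.mean(axis=0) - synth.mean(axis=0)).max()),
        "max_std_gap": float(np.abs(real.std(axis=0) - synth.std(axis=0)).max()),
        "pairwise_dist_ratio": float(avg_pairwise_dist(synth) / avg_pairwise_dist(real)),
    }

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 5))
good_synth = rng.normal(size=(1000, 5))                # roughly matches the real distribution
bad_synth = rng.normal(scale=3.0, size=(1000, 5))      # wrong spread: distances blow up
print(fidelity_report(real, good_synth))
print(fidelity_report(real, bad_synth))
```

A distance ratio far from 1.0, or large per-feature gaps, is a sign the synthetic set will not stand in usefully for the real one.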
One of the obvious limitations of synthetic data is the disconnect from the real world. For example, autonomous vehicles trained solely on synthetic data will struggle with real, unforeseen road conditions. Additionally, synthetic data inherits bias from the real-world data used to generate it–pretty much defeating the purpose of our discussion. In conclusion, synthetic data is a useful option for fine tuning and addressing edge cases, but significant improvements in model efficacy and minimization of bias still rely upon accessing real world data.
A Better Way: Real Data via PETs-enabled Workflows
PETs protect data while in use. When it comes to AI/ML models, they can also protect the IP of the model being run–”two birds, one stone.” Solutions utilizing PETs provide the option to train models on real, sensitive datasets that weren’t previously accessible due to data privacy and security concerns. This unlocking of dataflows to real data is the best option to reduce bias. But how would it actually work?
For now, the leading options start with a confidential computing environment. Then comes an integration with a PETs-based software solution that makes it ready to use out of the box while addressing the data governance and security requirements that aren’t included in a standard trusted execution environment (TEE). With this solution, the models and data are all encrypted before being sent to a secured computing environment. The environment can be hosted anywhere, which is important when addressing certain data localization requirements. This means that both the model IP and the security of input data are maintained during computation–not even the provider of the trusted execution environment has access to the models or data inside of it. The encrypted results are then sent back for review and logs are available for review.
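As a very rough sketch of the shape of that flow (not any vendor's API; the function names are invented and an ordinary symmetric key stands in for keys that would really be released only after remote attestation), both parties encrypt what they send, computation happens inside the trusted environment, and only encrypted results come back out:

```python
import json
from cryptography.fernet import Fernet

# Stand-in key: in a real deployment it would be provisioned to the enclave
# via remote attestation, not generated in the clear like this.
enclave_key = Fernet.generate_key()
f = Fernet(enclave_key)

# Data owner: sensitive records are encrypted before they leave.
records = [{"age": 54, "outcome": 1}, {"age": 31, "outcome": 0}]
encrypted_data = f.encrypt(json.dumps(records).encode())

# Model owner: the model parameters (the IP) are encrypted too.
model = {"weights": [0.8, -0.2], "bias": 0.1}
encrypted_model = f.encrypt(json.dumps(model).encode())

def run_inside_enclave(enc_data, enc_model):
    """Inside the TEE: decrypt, compute, and return only an encrypted result."""
    data = json.loads(f.decrypt(enc_data))
    m = json.loads(f.decrypt(enc_model))
    score = sum(r["age"] * m["weights"][0] + r["outcome"] * m["weights"][1] + m["bias"] for r in data)
    return f.encrypt(json.dumps({"aggregate_score": score}).encode())

encrypted_result = run_inside_enclave(encrypted_data, encrypted_model)
print(json.loads(f.decrypt(encrypted_result)))   # only an authorized key holder can read this
```

In a real deployment each party would hold different keys, so the data owner never sees the model weights and the model owner never sees the raw records; attestation and governance controls are what make that separation enforceable.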
This flow unlocks the best quality data no matter where it is or who has it, creating a path to bias minimization and high-efficacy models we can trust. This flow is also what the EU AI Act was describing in their requirements for an AI regulatory sandbox.
Facilitating Ethical and Legal Compliance
Acquiring good quality, real data is tough. Data privacy and localization requirements immediately limit the datasets that organizations can access. For innovation and growth to occur, data must flow to those who can extract the value from it.
Art 54 of the EU AI Act provides requirements for “high-risk” model types in terms of what must be proven before they can be taken to market. In short, teams will need to use real world data inside of an AI Regulatory Sandbox to show sufficient model efficacy and compliance with all the controls detailed in Title III Chapter 2. The controls include monitoring, transparency, explainability, data security, data protection, data minimization, and model protection–think DevSecOps + Data Ops.
The first challenge will be to find a real-world data set to use–as this is inherently sensitive data for such model types. Without technical guarantees, many organizations may hesitate to trust the model provider with their data or won’t be allowed to do so. In addition, the way the act defines an “AI Regulatory Sandbox” is a challenge in and of itself. Some of the requirements include a guarantee that the data is removed from the system after the model has been run as well as the governance controls, enforcement, and reporting to prove it.
Many organizations have tried using out-of-the-box data clean rooms (DCRs) and trusted execution environments (TEEs). But, on their own, these technologies require significant expertise and work to operationalize and meet data and AI regulatory requirements. DCRs are simpler to use, but not yet useful for more robust AI/ML needs. TEEs are secured servers and still need an integrated collaboration platform to be useful, quickly. This, however, identifies an opportunity for privacy enhancing technology platforms to integrate with TEEs to remove that work, trivializing the setup and use of an AI regulatory sandbox, and therefore, acquisition and use of sensitive data.
By enabling the use of more diverse and comprehensive datasets in a privacy-preserving manner, these technologies help ensure that AI and ML practices comply with ethical standards and legal requirements related to data privacy (e.g., GDPR and EU AI Act in Europe). In summary, while requirements are often met with audible grunts and sighs, these requirements are simply guiding us to building better models that we can trust and rely upon for important data-driven decision making while protecting the privacy of the data subjects used for model development and customization.
tushar38 · 9 days
Text
Cloud-Based Information Governance Market Dynamics and Drivers
Tumblr media
Introduction to Cloud-Based Information Governance Market
The Cloud-Based Information Governance Market is rapidly expanding due to increasing data volume and regulatory pressures. Organizations are adopting cloud solutions to manage, protect, and govern data efficiently. Key drivers include the need for scalable storage, cost-effective management, and enhanced security. Challenges involve data privacy concerns and compliance with diverse regulations. Opportunities lie in advancing technologies like AI and machine learning to enhance data governance. As businesses seek agility and compliance, the market is poised for significant growth.
The Cloud-Based Information Governance Market is valued at xxxx and projected to reach USD XX billion by 2027, growing at a CAGR of 19.8% during the forecast period of 2024-2032. This market encompasses solutions that enable organizations to manage, secure, and utilize data in cloud environments. With an emphasis on data protection, regulatory compliance, and operational efficiency, businesses are increasingly transitioning to cloud-based governance solutions. The market is characterized by rapid technological advancements and increasing adoption across various industries.
Access Full Report: https://www.marketdigits.com/checkout/371?lic=s
Major Classifications are as follows:
By Type
Simple Storage and Retrieval
Basic Document Management
Complex Document Management
Functional Applications with Document Storage
Social Networking Applications with Document Storage
By Application
BFSI
Public Sector
Retail
Manufacturing
IT & Telecommunication
Healthcare
Others
Key Region/Countries are Classified as Follows:
◘ North America (United States, Canada)
◘ Latin America (Brazil, Mexico, Argentina)
◘ Asia-Pacific (China, Japan, Korea, India, and Southeast Asia)
◘ Europe (UK, Germany, France, Italy, Spain, Russia)
◘ The Middle East and Africa (Saudi Arabia, UAE, Egypt, Nigeria, and South Africa)
Key Players of Cloud-based Information Governance Market: 
MC, HP Autonomy, IBM, Symantec, AccessData, Amazon, BIA, Catalyst, Cicayda, Daegis, Deloitte, Ernst and Young, FTI, Gimmal, Google, Guidance Software, Index Engines, Iron Mountain, Konica Minolta, Kroll Ontrak, Microsoft, Mimecast, Mitratech, Proofpoint, RenewData, RSD and TransPerfect among others.
Market Drivers in Cloud-Based Information Governance Market
Regulatory Compliance: Stricter data protection laws and regulations compel businesses to adopt cloud-based governance solutions.
Data Volume Growth: The exponential increase in data generation drives the need for scalable cloud solutions.
Cost Efficiency: Cloud-based solutions offer lower upfront costs and flexible pricing models compared to traditional on-premises systems.
Enhanced Security: Advanced security features in cloud platforms help protect sensitive information from breaches.
Market Challenges in Cloud-Based Information Governance Market
Data Privacy Concerns: Ensuring the confidentiality and integrity of data stored in the cloud remains a significant challenge.
Regulatory Compliance Complexity: Navigating diverse and evolving regulations across different regions can be cumbersome.
Integration Issues: Integrating cloud-based governance solutions with existing IT infrastructure may be complex.
Data Migration Risks: Transitioning data from on-premises systems to the cloud can pose risks of data loss or corruption.
Market Opportunities of Cloud-Based Information Governance Market
AI and ML Integration: Leveraging artificial intelligence and machine learning to enhance data governance and automate compliance tasks.
Big Data Analytics: Utilizing cloud-based solutions to analyze large volumes of data for better decision-making.
Hybrid and Multi-Cloud Strategies: Offering solutions that support multi-cloud environments to meet diverse business needs.
Enhanced Compliance Solutions: Developing tools to simplify adherence to complex regulatory requirements.
Conclusion
The Cloud-Based Information Governance Market is set for substantial growth as organizations seek to manage increasing data volumes and comply with stringent regulations. While challenges such as data privacy and regulatory complexity exist, advancements in technology and evolving market needs present significant opportunities. By adopting innovative solutions and addressing integration and migration issues, businesses can leverage cloud-based governance to enhance operational efficiency and data security.