# Google Search Results Data Scraping
datascraping001 · 1 year ago
Text
Google Search Results Data Scraping
Harness the Power of Information with Google Search Results Data Scraping Services by DataScrapingServices.com. In the digital age, information is king. For businesses, researchers, and marketing professionals, the ability to access and analyze data from Google search results can be a game-changer. However, manually sifting through search results to gather relevant data is not only time-consuming but also inefficient. DataScrapingServices.com offers cutting-edge Google Search Results Data Scraping services, enabling you to efficiently extract valuable information and transform it into actionable insights.
The vast amount of information available through Google search results can provide invaluable insights into market trends, competitor activities, customer behavior, and more. Whether you need data for SEO analysis, market research, or competitive intelligence, DataScrapingServices.com offers comprehensive data scraping services tailored to meet your specific needs. Our advanced scraping technology ensures you get accurate and up-to-date data, helping you stay ahead in your industry.
List of Data Fields
Our Google Search Results Data Scraping services can extract a wide range of data fields, ensuring you have all the information you need:
- Business Name: The name of the business or entity featured in the search result.
- URL: The web address of the search result.
- Website: The primary website of the business or entity.
- Phone Number: Contact phone number of the business.
- Email Address: Contact email address of the business.
- Physical Address: The street address, city, state, and ZIP code of the business.
- Business Hours: The business's operating hours.
- Ratings and Reviews: Customer ratings and reviews for the business.
- Google Maps Link: Link to the business’s location on Google Maps.
- Social Media Profiles: Links to profiles such as LinkedIn, Twitter, and Facebook.
These data fields provide a comprehensive overview of the information available from Google search results, enabling businesses to gain valuable insights and make informed decisions.
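As a rough illustration of how these fields fit together, the sketch below models one search result as a flat Python record. It is only an example: the field names and types are placeholders chosen for readability, not the actual delivery schema of any scraping service.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SearchResultRecord:
    """One scraped listing; names mirror the data fields listed above (illustrative only)."""
    business_name: str
    url: str
    website: Optional[str] = None
    phone_number: Optional[str] = None
    email_address: Optional[str] = None
    physical_address: Optional[str] = None
    business_hours: Optional[str] = None
    ratings_and_reviews: Optional[str] = None
    google_maps_link: Optional[str] = None
    # e.g. {"linkedin": "...", "twitter": "...", "facebook": "..."}
    social_media_profiles: dict = field(default_factory=dict)

# Hypothetical usage: rows scraped from a results page become typed records.
record = SearchResultRecord(
    business_name="Example Bakery",
    url="https://www.google.com/search?q=example+bakery",
    website="https://example-bakery.test",
    phone_number="+1-555-0100",
)
```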
Benefits of Google Search Results Data Scraping
1. Enhanced SEO Strategy
Understanding how your website ranks for specific keywords and phrases is crucial for effective SEO. Our data scraping services provide detailed insights into your current rankings, allowing you to identify opportunities for optimization and stay ahead of your competitors.
2. Competitive Analysis
Track your competitors’ online presence and strategies by analyzing their rankings, backlinks, and domain authority. This information helps you understand their strengths and weaknesses, enabling you to adjust your strategies accordingly.
3. Market Research
Access to comprehensive search result data allows you to identify trends, preferences, and behavior patterns in your target market. This information is invaluable for product development, marketing campaigns, and business strategy planning.
4. Content Development
By analyzing top-performing content in search results, you can gain insights into what types of content resonate with your audience. This helps you create more effective and engaging content that drives traffic and conversions.
5. Efficiency and Accuracy
Our automated scraping services ensure you get accurate and up-to-date data quickly, saving you time and resources.
Best Google Data Scraping Services
Scraping Google Business Reviews
Extract Restaurant Data From Google Maps
Google My Business Data Scraping
Google Shopping Products Scraping
Google News Extraction Services
Scrape Data From Google Maps
Google News Headline Extraction   
Google Maps Data Scraping Services
Google Map Businesses Data Scraping
Google Business Reviews Extraction
Best Google Search Results Data Scraping Services in USA
Dallas, Portland, Los Angeles, Virginia Beach, Wichita, Nashville, Long Beach, Raleigh, Boston, Austin, San Antonio, Philadelphia, Indianapolis, Orlando, San Diego, Houston, Fort Worth, Jacksonville, New Orleans, Columbus, Kansas City, Sacramento, San Francisco, Omaha, Honolulu, Washington, Colorado Springs, Chicago, Arlington, Denver, El Paso, Miami, Louisville, Albuquerque, Tulsa, Bakersfield, Milwaukee, Memphis, Oklahoma City, Atlanta, Seattle, Las Vegas, San Jose, Tucson and New York.
Conclusion
In today’s data-driven world, having access to detailed and accurate information from Google search results can give your business a significant edge. DataScrapingServices.com offers professional Google Search Results Data Scraping services designed to meet your unique needs. Whether you’re looking to enhance your SEO strategy, conduct market research, or gain competitive intelligence, our services provide the comprehensive data you need to succeed. Contact us at [email protected] today to learn how our data scraping solutions can transform your business strategy and drive growth.
Website: Datascrapingservices.com
0 notes
3idatascraping · 11 months ago
Text
Discover how search engine scraping, specifically Google search results data scraping, can provide valuable insights for SEO, market research, and competitive analysis. Learn the techniques and tools to extract real-time data from Google efficiently while navigating legal and ethical considerations to boost your digital strategy.
0 notes
probablyasocialecologist · 1 year ago
Text
Google is now the only search engine that can surface results from Reddit, making one of the web’s most valuable repositories of user generated content exclusive to the internet’s already dominant search engine. If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn’t rely on Google’s indexing and search Reddit by using “site:reddit.com,” you will not see any results from the last week. DuckDuckGo is currently turning up seven links when searching Reddit, but provides no data on where the links go or why, instead only saying that “We would like to show you a description here but the site won't allow us.” Older results will still show up, but these search engines are no longer able to “crawl” Reddit, meaning that Google is the only search engine that will turn up results from Reddit going forward. Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google. The news shows how Google’s near monopoly on search is now actively hindering other companies’ ability to compete at a time when Google is facing increasing criticism over the quality of its search results. And while neither Reddit or Google responded to a request for comment, it appears that the exclusion of other search engines is the result of a multi-million dollar deal that gives Google the right to scrape Reddit for data to train its AI products.
July 24 2024
2K notes · View notes
adz · 1 year ago
Text
Google is now the only search engine that can surface results from Reddit, making one of the web’s most valuable repositories of user generated content exclusive to the internet’s already dominant search engine. "...while neither Reddit or Google responded to a request for comment, it appears that the exclusion of other search engines is the result of a multi-million dollar deal that gives Google the right to scrape Reddit for data to train its AI products."
829 notes · View notes
mariacallous · 1 month ago
Text
When tech companies first rolled out generative-AI products, some critics immediately feared a media collapse. Every bit of writing, imagery, and video became suspect. But for news publishers and journalists, another calamity was on the horizon.
Chatbots have proved adept at keeping users locked into conversations. They do so by answering every question, often through summarizing articles from news publishers. Suddenly, fewer people are traveling outside the generative-AI sites—a development that poses an existential threat to the media, and to the livelihood of journalists everywhere.
According to one comprehensive study, Google’s AI Overviews—a feature that summarizes web pages above the site’s usual search results—has already reduced traffic to outside websites by more than 34 percent. The CEO of DotDash Meredith, which publishes People, Better Homes & Gardens, and Food & Wine, recently said the company is preparing for a possible “Google Zero” scenario. Some have speculated that traffic drops resulting from chatbots were part of the reason outlets such as Business Insider and the Daily Dot have recently had layoffs. “Business Insider was built for an internet that doesn’t exist anymore,” one former staffer recently told the media reporter Oliver Darcy.
Not all publishers are at equal risk: Those that primarily rely on general-interest readers who come in from search engines and social media may be in worse shape than specialized publishers with dedicated subscribers. Yet no one is totally safe. Released in May 2024, AI Overviews joins ChatGPT, Claude, Grok, Perplexity, and other AI-powered products that, combined, have replaced search for more than 25 percent of Americans, according to one study. Companies train chatbots on huge amounts of stolen books and articles, as my previous reporting has shown, and scrape news articles to generate responses with up-to-date information. Large language models also train on copious materials in the public domain—but much of what is most useful to these models, particularly as users seek real-time information from chatbots, is news that exists behind a paywall. Publishers are creating the value, but AI companies are intercepting their audiences, subscription fees, and ad revenue.
I asked Anthropic, xAI, Perplexity, Google, and OpenAI about this problem. Anthropic and xAI did not respond. Perplexity did not directly comment on the issue. Google argued that it was sending “higher-quality” traffic to publisher websites, meaning that users purportedly spend more time on the sites once they click over, but declined to offer any data in support of this claim. OpenAI referred me to an article showing that ChatGPT is sending more traffic to websites overall than it did previously, but the raw numbers are fairly modest. The BBC, for example, reportedly received 118,000 visits from ChatGPT in April, but that’s practically nothing relative to the hundreds of millions of visitors it receives each month. The article also shows that traffic from ChatGPT has in fact declined for some publishers.
Over the past few months, I’ve spoken with several news publishers, all of whom see AI as a near-term existential threat to their business. Rich Caccappolo, the vice chair of media at the company that publishes the Daily Mail—the U.K.’s largest newspaper by circulation—told me that all publishers “can see that Overviews are going to unravel the traffic that they get from search, undermining a key foundational pillar of the digital-revenue model.” AI companies have claimed that chatbots will continue to send readers to news publishers, but have not cited evidence to support this claim. I asked Caccappolo if he thought AI-generated answers could put his company out of business. “That is absolutely the fear,” he told me. “And my concern is it’s not going to happen in three or five years—I joke it’s going to happen next Tuesday.”
Book publishers, especially those of nonfiction and textbooks, also told me they anticipate a massive decrease in sales, as chatbots can both summarize their books and give detailed explanations of their contents. Publishers have tried to fight back, but my conversations revealed how much the deck is stacked against them. The world is changing fast, perhaps irrevocably. The institutions that comprise our country’s free press are fighting for their survival.
Publishers have been responding in two ways. First: legal action. At least 12 lawsuits involving more than 20 publishers have been filed against AI companies. Their outcomes are far from certain, and the cases might be decided only after irreparable damage has been done.
The second response is to make deals with AI companies, allowing their products to summarize articles or train on editorial content. Some publishers, such as The Atlantic, are pursuing both strategies (the company has a corporate partnership with OpenAI and is suing Cohere). At least 72 licensing deals have been made between publishers and AI companies in the past two years. But figuring out how to approach these deals is no easy task. Caccappolo told me he has “felt a tremendous imbalance at the negotiating table”—a sentiment shared by others I spoke with. One problem is that there is no standard price for training an LLM on a book or an article. The AI companies know what kinds of content they want, and having already demonstrated an ability and a willingness to take it without paying, they have extraordinary leverage when it comes to negotiating. I’ve learned that books have sometimes been licensed for only a couple hundred dollars each, and that a publisher that asks too much may be turned down, only for tech companies to take their material anyway.
Another issue is that different content appears to have different value for different LLMs. The digital-media company Ziff Davis has studied web-based AI training data sets and observed that content from “high-authority” sources, such as major newspapers and magazines, appears more desirable to AI companies than blog and social-media posts. (Ziff Davis is suing OpenAI for training on its articles without paying a licensing fee.) Researchers at Microsoft have also written publicly about “the importance of high-quality data” and have suggested that textbook-style content may be particularly desirable.
But beyond a few specific studies like these, there is little insight into what kind of content most improves an LLM, leaving a lot of unanswered questions. Are biographies more or less important than histories? Does high-quality fiction matter? Are old books worth anything? Amy Brand, the director and publisher of the MIT Press, told me that “a solution that promises to help determine the fair value of specific human-authored content within the active marketplace for LLM training data would be hugely beneficial.”
A publisher’s negotiating power is also limited by the degree to which it can stop an AI company from using its work without consent. There’s no surefire way to keep AI companies from scraping news websites; even the Robots Exclusion Protocol, the standard opt-out method available to news publishers, is easily circumvented. Because AI companies generally keep their training data a secret, and because there is no easy way for publishers to check which chatbots are summarizing their articles, publishers have difficulty figuring out which AI companies they might sue or try to strike a deal with. Some experts, such as Tim O’Reilly, have suggested that laws should require the disclosure of copyrighted training data, but no existing legislation requires companies to reveal specific authors or publishers that have been used for AI training material.
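For context, the Robots Exclusion Protocol mentioned above is just a plain-text robots.txt file that crawlers are asked, but never forced, to consult before fetching pages. A minimal sketch of that voluntary check, with a placeholder bot name and URLs, shows why the opt-out is so easy to ignore:

```python
from urllib import robotparser

# Placeholder bot name and site -- the point is that this check is voluntary:
# a scraper that wants the data can simply skip it.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # fetches and parses the site's robots.txt

allowed = rp.can_fetch("ExampleAIBot", "https://www.example.com/some-paywalled-article")
print("Allowed to crawl:", allowed)
```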
Of course, all of this raises a question. AI companies seem to have taken publishers’ content already. Why would they pay for it now, especially because some of these companies have argued in court that training LLMs on copyrighted books and articles is fair use?
Perhaps the deals are simply hedges against an unfavorable ruling in court. If AI companies are prevented from training on copyrighted work for free, then organizations that have existing deals with publishers might be ahead of their competition. Publisher deals are also a means of settling without litigation—which may be a more desirable path for publishers who are risk-averse or otherwise uncertain. But the legal scholar James Grimmelmann told me that AI companies could also respond to complaints like Ziff Davis’s by arguing that the deals involve more than training on a publisher’s content: They may also include access to cleaner versions of articles, ongoing access to a daily or real-time feed, or a release from liability for their chatbot’s plagiarism. Tech companies could argue that the money exchanged in these deals is exclusively for the nonlicensing elements, so they aren’t paying for training material. It’s worth noting that tech companies almost always refer to these deals as partnerships, not licensing deals, likely for this reason.
Regardless, the modest income from these arrangements is not going to save publishers: Even a good deal, one publisher told me, won’t come anywhere near recouping the revenue lost from decreased readership. Publishers that can figure out how to survive the generative-AI assault may need to invent different business models and find new streams of revenue. There may be viable strategies, but none of the publishers I spoke with has a clear idea of what they are.
Publishers have become accustomed to technological threats over the past two decades, perhaps most notably the loss of ad revenue to Facebook and Google, a company that was recently found to have an illegal monopoly in online advertising (though the company has said it will appeal the ruling). But the rise of generative AI may spell doom for the Fourth Estate: With AI, the tech industry even deprives publishers of an audience.
In the event of publisher mass extinction, some journalists will be able to endure. The so-called creator economy shows that it’s possible to provide high-quality news and information through Substack, YouTube, and even TikTok. But not all reporters can simply move to these platforms. Investigative journalism that exposes corruption and malfeasance by powerful people and companies comes with a serious risk of legal repercussions, and requires resources—such as time and money—that tend to be in short supply for freelancers.
If news publishers start going out of business, won’t AI companies suffer too? Their chatbots need access to journalism to answer questions about the world. Doesn’t the tech industry have an interest in the survival of newspapers and magazines?
In fact, there are signs that AI companies believe publishers are no longer needed. In December, at The New York Times’ DealBook Summit, OpenAI CEO Sam Altman was asked how writers should feel about their work being used for AI training. “I think we do need a new deal, standard, protocol, whatever you want to call it, for how creators are going to get rewarded.” He described an “opt-in” regime where an author could receive “micropayments” when their name, likeness, and style were used. But this could not be further from OpenAI’s current practice, in which products are already being used to imitate the styles of artists and writers, without compensation or even an effective opt-out.
Google CEO Sundar Pichai was also asked about writer compensation at the DealBook Summit. He suggested that a market solution would emerge, possibly one that wouldn’t involve publishers in the long run. This is typical. As in other industries they’ve “disrupted,” Silicon Valley moguls seem to perceive old, established institutions as middlemen to be removed for greater efficiency. Uber enticed drivers to work for it, crushed the traditional taxi industry, and now controls salaries, benefits, and workloads algorithmically. This has meant greater convenience for consumers, just as AI arguably does—but it has also proved ruinous for many people who were once able to earn a living wage from professional driving. Pichai seemed to envision a future that may have a similar consequence for journalists. “There’ll be a marketplace in the future, I think—there’ll be creators who will create for AI,” he said. “People will figure it out.”
20 notes · View notes
xannador · 1 year ago
Note
Have you considered going to Pillowfort?
Long answer down below:
I have been to the Sheezys, the Buzzlys, the Mastodons, etc. These platforms all saw a surge of new activity whenever big sites did something unpopular. But they always quickly died because of mismanagement or users going back to their old haunts due to lack of activity or digital Stockholm syndrome.
From what I have personally seen, a website that was purely created as an alternative to another has little chance of taking off. If it's going to work, it needs to be developed naturally and must fill a different niche. I mean look at Zuckerberg's Threads; it died as fast as it blew up. Will Pillowfort be any different?
The only alternative that I found with potential was the fediverse (mastodon) because of its decentralized nature. So people could make their own rules. If Jack Dorsey's new social app Bluesky gets integrated into this system, it might have a chance. Although decentralized communities will be faced with unique challenges of their own (egos being one of the biggest, I think).
Trying to build a new platform right now might be a waste of time anyway because AI is going to completely reshape the Internet as we know it. This new technology is going to send shockwaves across the world akin to those caused by the invention of the Internet itself over 40 years ago. I'm sure most people here are aware of the damage it is doing to artists and writers. You have also likely seen the other insidious applications. Social media is being bombarded with a flood of fake war footage/other AI-generated disinformation. If you posted a video of your own voice online, criminals can feed it into an AI to replicate it and contact your bank in an attempt to get your financial info. You can make anyone who has recorded themselves say and do whatever you want. Children are using AI to make revenge porn of their classmates as a new form of bullying. Politicians are saying things they never said in their lives. Google searches are being poisoned by people who use AI to data scrape news sites to generate nonsensical articles and clickbait. Soon video evidence will no longer be used in court because we won't be able to tell real footage from deep fakes.
50% of the Internet's traffic is now bots. In some cases, websites and forums have been reduced to nothing more than different chatbots talking to each other, with no humans in sight.
I don't think we have to count on government intervention to solve this problem. The Western world could ban all AI tomorrow and other countries that are under no obligation to follow our laws or just don't care would continue to use it to poison the Internet. Pandora's box is open, and there's no closing it now.
Yet I cannot stand an Internet where I post a drawing or comic and the only interactions I get are from bots that are so convincing that I won't be able to tell the difference between them and real people anymore. When all that remains of art platforms are waterfalls of AI sludge where my work is drowned out by a virtually infinite amount of pictures that are generated in a fraction of a second. While I had to spend +40 hours for a visually inferior result.
If that is what I can expect to look forward to, I might as well delete what remains of my Internet presence today. I don't know what to do and I don't know where to go. This is a depressing post. I wish, after the countless hours I spent looking into this problem, I would be able to offer a solution.
All I know for sure is that artists should not remain on "Art/Creative" platforms that deliberately steal their work to feed it to their own AI or sell their data to companies that will. I left Artstation and DeviantArt for those reasons and I want to do the same with Tumblr. It's one thing when social media like Xitter, Tik Tok or Instagram do it, because I expect nothing less from the filth that runs those. But creative platforms have the obligation to, if not protect, at least not sell out their users.
But good luck convincing the entire collective of Tumblr, Artstation, and DeviantArt to leave. Especially when there is no good alternative. The Internet has never been more centralized into a handful of platforms, yet also never been more lonely and scattered. I miss the sense of community we artists used to have.
The truth is that there is nowhere left to run. Because everywhere is the same. You can try using Glaze or Nightshade to protect your work. But I don't know if I trust either of them. I don't trust anything that offers solutions that are 'too good to be true'. And even if take those preemptive measures, what is to stop the tech bros from updating their scrapers to work around Glaze and steal your work anyway? I will admit I don't entirely understand how the technology works so I don't know if this is a legitimate concern. But I'm just wondering if this is going to become some kind of digital arms race between tech bros and artists? Because that is a battle where the artists lose.
29 notes · View notes
asleepinawell · 1 year ago
Text
"we'll all have flying cars in the future" bro we cannot even do a web search anymore
here's a chunk of it since it's subscribe walled
"If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn’t rely on Google’s indexing and search Reddit by using “site:reddit.com,” you will not see any results from the last week. DuckDuckGo is currently turning up seven links when searching Reddit, but provides no data on where the links go or why, instead only saying that “We would like to show you a description here but the site won't allow us.” Older results will still show up, but these search engines are no longer able to “crawl” Reddit, meaning that Google is the only search engine that will turn up results from Reddit going forward. Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google.
The news shows how Google’s near monopoly on search is now actively hindering other companies’ ability to compete at a time when Google is facing increasing criticism over the quality of its search results. This exclusion of other search engines also comes after Reddit locked down access to its site to stop companies from scraping it for AI training data, which at the moment only Google can do as a result of a multi-million dollar deal that gives Google the right to scrape Reddit for data to train its AI products.
“They’re [Reddit] killing everything for search but Google,” Colin Hayhurst, CEO of the search engine Mojeek told me on a call.
Hayhurst tried contacting Reddit via email when Mojeek noticed it was blocked from crawling the site in early June, but said he has not heard back."
13 notes · View notes
punk-pins · 10 months ago
Text
fundamentally you need to understand that the internet-scraping text generative AI (like ChatGPT) is not the point of the AI tech boom. the only way people are making money off that is through making nonsense articles that have great search engine optimization. essentially they make a webpage that’s worded perfectly to show up as the top result on google, which generates clicks, which generates ads. text generative ai is basically a machine that creates a host page for ad space right now.
and yeah, that sucks. but I don’t think the commercialized internet is ever going away, so here we are. tbh, I think finding information on the internet, in books, or through anything is a skill that requires critical thinking and cross checking your sources. people printed bullshit in books before the internet was ever invented. misinformation is never going away. I don’t think text generative AI is going to really change the landscape that much on misinformation because people are becoming educated about it. the text generative AI isn’t a genius supercomputer, but rather a time-saving tool to get a head start on identifying key points of information to further research.
anyway. the point of the AI tech boom is leveraging big data to improve customer relationship management (CRM) to streamline manufacturing. businesses collect a ridiculous amount of data from your internet browsing and purchases, but much of that data is stored in different places with different access points. where you make money with AI isn’t in the Wild West internet, it’s in a structured environment where you know the data its scraping is accurate. companies like nvidia are getting huge because along with the new computer chips, they sell a customizable ecosystem along with it.
so let’s say you spent 10 minutes browsing a clothing retailer’s website. you navigated directly to the clothing > pants tab and filtered for black pants only. you added one pair of pants to your cart, and then spent your last minute or two browsing some shirts. you check out with just the pants, spending $40. you select standard shipping.
with AI for CRM, that company can SIGNIFICANTLY more easily analyze information about that sale. maybe the website developers see the time you spent on the site, but only the warehouse knows your shipping preferences, and sales audit knows the amount you spent, but they can’t see what color pants you bought. whereas a person would have to connect a HUGE amount of data to compile EVERY customer’s preferences across all of these things, AI can do it easily.
this allows the company to make better broad decisions, like what clothing lines to renew, in which colors, and in what quantities. but it ALSO allows them to better customize their advertising directly to you. through your browsing, they can use AI to fill a pre-made template with products you specifically may be interested in, and email it directly to you. the money is in cutting waste through better manufacturing decisions, CRM on an individual and large advertising scale, and reducing the need for human labor to collect all this information manually.
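to make that concrete, here is a deliberately tiny sketch of the kind of join such a system automates; the table names, columns, and values are invented for the pants example above, and a real CRM pipeline would do this across millions of records and far messier sources.

```python
import pandas as pd

# Toy stand-ins for data held by different teams (hypothetical schemas and values).
web_analytics = pd.DataFrame(
    {"customer_id": [101], "seconds_on_site": [600], "category_viewed": ["pants"]}
)
warehouse = pd.DataFrame({"customer_id": [101], "shipping_preference": ["standard"]})
sales_audit = pd.DataFrame({"customer_id": [101], "order_total_usd": [40.00]})

# The consolidation step: one profile per customer, stitched from scattered systems.
profile = (
    web_analytics
    .merge(warehouse, on="customer_id")
    .merge(sales_audit, on="customer_id")
)
print(profile)
```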
(also, AI is great for developing new computer code. where a developer would have to trawl for hours on GitHub to find some sample code to mess with to try to solve a problem, the AI can spit out 10 possible solutions to play around with. that's big, but not the point right now.)
so I think it’s concerning how many people are sooo focused on ChatGPT as the face of AI when it’s the least profitable thing out there rn. there is money in the CRM and the manufacturing and reduced labor. corporations WILL develop the technology for those profits. frankly I think the bigger concern is how AI will affect big data in a government ecosystem. internet surveillance is real in the sense that everything you do on the internet is stored in little bits of information across a million different places. AI will significantly impact the government’s ability to scrape and compile information across the internet without having to slog through mountains of junk data.
9 notes · View notes
kintatsujo · 1 year ago
Text
for the record I'm not going to stop posting art here just because tunglr MIGHT start selling training data. Google has been scraping search results for AI for years and I have a massive deviantart archive I can't take down, that cat's down the street
I will if it turns out they DO though, like fucking dude I signed up to share my shit and fuck around I did not sign up to train someone else's program without compensation
15 notes · View notes
initiumseries · 1 year ago
Note
When you say no to ai, does that include me using Chatgtp for my assignments unethical
Yes. All AI requires the theft of copyrighted and private material from everyday people, to artists and authors and hobbyists. That includes the scraping of fanfiction on this site and others. Giving chatgpt one prompt uses up 13 ml of water, one of our most crucial finite resources. Chatgpt and others of its kind use up 10x more power than a normal google search. It also means that by outsourcing your work to a bot, you are depriving yourself of the learning you're paying for, and it's making all of us stupider as a result. There is literally no upside or longevity for AI. It is unethical, and is destroying our environment in the middle of a climate crisis and everyone who uses it should not only stop, but feel ashamed for participating in the theft of people's work.
6 notes · View notes
nuadox · 1 year ago
Text
Astrophysicists may have solved the mystery of Uranus’s unusual radiation belts
- By Nuadox Crew -
The weak radiation belts around Uranus, observed by Voyager 2 nearly 50 years ago, may actually be due to changes in particle speed caused by the planet’s asymmetric magnetic field.
Uranus's magnetic field is tilted 60° from its spin axis, creating an unusual magnetic environment. 
Researchers, including Acevski et al., used new modeling incorporating a quadrupole field to simulate this asymmetry and found that particle speeds vary in different parts of their orbits.
This variation spreads particles out, decreasing their density by up to 20%, which could explain Voyager 2's observations. 
Although this does not completely account for the weaker radiation belts, it offers insights into Uranus's magnetic anomalies. NASA’s proposed mission to the ice giants may provide further data to understand these mechanisms.
Header image: The planet Uranus, depicted in this James Webb Space Telescope image, features a tilted magnetic field and unusual radiation belts. Future missions to this icy giant may uncover more details. Credit: NASA, ESA, CSA, STScI.
Read more at Eos
Scientific paper: M. Acevski et al, Asymmetry in Uranus' High Energy Proton Radiation Belt, Geophysical Research Letters (2024). DOI: 10.1029/2024GL108961
--
Other recent news
Google is investigating claims of AI-generated content scraping, which affects search result rankings.
Amazon is reviewing whether Perplexity AI improperly scraped online content.
A bone analysis provides new insights into the Denisovans, an ancient human species, and their survival in extreme environments.
6 notes · View notes
pomegrnteseed · 1 year ago
Text
artificial intelligence is not whimsical magic, it's theft
AI is to art and creativity what the Dementor's Kiss is to wix: extraction of the soul
Artificial intelligence technologies work like this:
1. Developer creates an algorithm that's really good at searching for patterns and following commands
2. Developer creates a training dataset for the technology to begin identifying patterns - this dataset is HUGE, so big that every individual datapoint (word/phrase/image etc) cannot be checked for error or problem
3. Developer releases AI platform
4. User asks the platform for a result, giving some specific parameters, often by inputting example data (e.g. images)
5. The algorithms run, searching through the databank for strong matches in pattern recognition, piecing together what it has learned so far to create a seemingly novel response
6. The result is presented to the user as "new" "generated" content, but it's just an amalgamation of existing works and words that is persuasively "human-like" (because the result has been harvested from humans' hard work!)
The training dataset that the developers feed the tool oftentimes amounts to theft.
Developers are increasingly being found to scrape the internet, or even licensed art or published books - despite copyright licensing! - to train the machine.
AI does not make something out of nothing (a bit like whichever magical Law it is, Gamp's maybe? idk charms were never my main focus in HP lore). AI pulls from the resources it has been given - the STOLEN WORDS AND IMAGES - and mashes them together in ways that meet the request given by the user. It looks whimsical, but it's actually incredibly problematic.
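A toy example makes the "pattern recognition plus recombination" point visible. The sketch below is nothing like a real large language model in scale or architecture, but it shows the basic idea: every word transition the "generator" can produce was harvested from its training text.

```python
import random
from collections import defaultdict

# Tiny made-up "training corpus" standing in for scraped text.
training_text = "the cat sat on the mat and the dog sat on the rug"

# Steps 1-2 above, in miniature: learn which word tends to follow which.
follows = defaultdict(list)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current].append(nxt)

# Steps 4-6: "generate" output by recombining observed transitions. Every pair
# of adjacent words in the output already existed somewhere in the training data.
word, output = "the", ["the"]
for _ in range(8):
    if word not in follows:
        break
    word = random.choice(follows[word])
    output.append(word)
print(" ".join(output))
```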
Unregulated as they are now, AI technologies are stealing the creative ideas, the hearts and souls of art in all forms, and reducing it to pattern recognition.
On top of that, the training datasets that the technologies are given initially are often incredibly biased, leading to them replicating racist, misogynistic, and otherwise oppressive stereotypes in their results. We've already seen the "pale male" bias uncovered in the research by Dr Timnit Gebru and her colleagues. Dr Gebru has also been vocal about the ethical implications of AI in terms of the ecological costs of these softwares. This brilliant article by MIT Technology Review breaks down Dr Gebru's paper that saw her fired from Google, the main arguments of which are:
- the ecological and financial costs are unsustainable
- the training datasets are too large and so cannot be properly regulated for biases
- research opportunity costs (AI looks impressive, but it doesn't actually understand language, so it can be misleading/misdirecting for researchers)
- AI models can be convincing, but this can lead to overreliance/too much trust in their accuracy and validity
So, artificial intelligence technologies are embroiled in numerous ethical issues that are far from resolved, even beyond the very real, very important, very concerning issues of plaigarism.
In fandom terms, this comes to be even more problematic when chat bots are created to talk with characters, like the recently discussed High Reeve Draco Malfoy chatbot that has some Facebook Groups in a flurry.
Transformative fiction is tricky in terms of what is ethical/fair transformation of transformative works. I will argue, though, that those hemmings and hawings are moot since Sen removed Manacled from ao3 because she is creating an original fiction story for publication after securing a book deal (which is awesome and I'm very excited to support them in that!).
Moreover, the ethical problems redouble when we take into consideration that feeding Manacled to an artificial intelligence chatbot technology means that reproductions and repackagings of Sen's work are out of their hands entirely. That data cannot be recovered; it will never be erased from the machine. And so when others use the machine, the possible word combinations, particular phrasings, etc. will all be input for analysis, reforming and reproduction for other users.
I don't think people understand the gravity of the situation around data control (or, more specifically, the lack of control we have of the data we input into these technologies). Those words are no longer our own the second we type them into the text box on "generative" AI platforms. We cannot get those ideas or words back to call our own. We cannot guarantee that someone else won't use the platform to write something and then use it elsewhere, claiming it's their own when it is in fact ours.
There are serious implications and fundamental (somewhat philosophical, but also very real and extremely urgent) questions about ownership of art in this digital age, the heart of creativity, and what constitutes original work with these technologies being used to assist idea creation or even entire image/text generation.
TLDR - stop using artificial technologies to engage with fandom. use the endless creative palaces of your minds and take up roleplaying with your pals to explore real-time interactions (roleplay in fandom is a legit thing, there are plenty of fandoms that do RP; this is your chance to do the same for the niche dhr fandoms you're invested in).
Signed, a very tired digital technologies scholar who would like you all to engage critically with digital data privacy, protection, and ethics, please.
3 notes · View notes
badaxefamily · 2 years ago
Text
Has anyone come up with a way to filter out AI generated websites in searches? Every time I search anything, half the results are those stupid AI generated sites that have scraped data from everywhere so they're full of errors and irrelevant data.
One time I forgot what temperature to reheat corndogs at so I googled it and got an AI site that said "corndogs are a breed of dog frequently used as a tasty snack" or something like that. Then it said you could tell if a corndog was fully cooked by checking that the ears are floppy.
5 notes · View notes
mariacallous · 1 year ago
Text
While the finer points of running a social media business can be debated, one basic truth is that they all run on attention. Tech leaders are incentivized to grow their user bases so there are more people looking at more ads for more time. It’s just good business.
As the owner of Twitter, Elon Musk presumably shared that goal. But he claimed he hadn’t bought Twitter to make money. This freed him up to focus on other passions: stopping rival tech companies from scraping Twitter’s data without permission—even if it meant losing eyeballs on ads.
Data-scraping was a known problem at Twitter. “Scraping was the open secret of Twitter data access. We knew about it. It was fine,” Yoel Roth wrote on the Twitter alternative Bluesky. AI firms in particular were notorious for gobbling up huge swaths of text to train large language models. Now that those firms were worth a lot of money, the situation was far from fine, in Musk’s opinion.
In November 2022, OpenAI debuted ChatGPT, a chatbot that could generate convincingly human text. By January 2023, the app had over 100 million users, making it the fastest-growing consumer app of all time. Three months later, OpenAI secured another round of funding that closed at an astounding valuation of $29 billion, more than Twitter was worth, by Musk’s estimation.
OpenAI was a sore subject for Musk, who’d been one of the original founders and a major donor before stepping down in 2018 over disagreements with the other founders. After ChatGPT launched, Musk made no secret of the fact that he disagreed with the guardrails that OpenAI put on the chatbot to stop it from relaying dangerous or insensitive information. “The danger of training AI to be woke—in other words, lie—is deadly,” Musk said on December 16, 2022. He was toying with starting a competitor.
Near the end of June 2023, Musk launched a two-part offensive to stop data scrapers, first directing Twitter employees to temporarily block “logged out view.” The change would mean that only people with Twitter accounts could view tweets.
“Logged out view” had a complicated history at Twitter. It was rumored to have played a part in the Arab Spring, allowing dissidents to view tweets without having to create a Twitter account and risk compromising their anonymity. But it was also an easy access point for people who wanted to scrape Twitter data.
Once Twitter made the change, Google was temporarily blocked from crawling Twitter and serving up relevant tweets in search results—a move that could negatively impact Twitter’s traffic. “We’re aware that our ability to crawl Twitter.com has been limited, affecting our ability to display tweets and pages from the site in search results,” Google spokesperson Lara Levin told The Verge. “Websites have control over whether crawlers can access their content.” As engineers discussed possible workarounds on Slack, one wrote: “Surely this was expected when that decision was made?”
Then engineers detected an “explosion of logged in requests,” according to internal Slack messages, indicating that data scrapers had simply logged in to Twitter to continue scraping. Musk ordered the change to be reversed.
On July 1, 2023, Musk launched part two of the offensive. Suddenly, if a user scrolled for just a few minutes, an error message popped up. “Sorry, you are rate limited,” the message read. “Please wait a few moments then try again.”
Rate limiting is a strategy that tech companies use to constrain network traffic by putting a cap on the number of times a user can perform a specific action within a given time frame (a mouthful, I know). It’s often used to stop bad actors from trying to hack into people’s accounts. If a user tries an incorrect password too many times, they see an error message and are told to come back later. The cost of doing this to someone who has forgotten their password is low (most people stay logged in), while the benefit to users is very high (it prevents many people’s accounts from getting compromised).
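(For readers unfamiliar with the mechanics: a login-protection rate limiter of the kind described above can be as simple as a per-user counter over a time window. The sketch below is a generic illustration with made-up thresholds, not Twitter's implementation.)

```python
import time
from collections import defaultdict

MAX_ATTEMPTS = 5        # made-up threshold
WINDOW_SECONDS = 300    # made-up window

_attempts = defaultdict(list)  # username -> timestamps of recent failed logins

def allow_login_attempt(username: str) -> bool:
    """Return False once a user exceeds MAX_ATTEMPTS within the window."""
    now = time.time()
    recent = [t for t in _attempts[username] if now - t < WINDOW_SECONDS]
    _attempts[username] = recent
    if len(recent) >= MAX_ATTEMPTS:
        return False  # caller would show "try again later"
    _attempts[username].append(now)
    return True
```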
Except, that wasn’t what Musk had done. The rate limit that he ordered Twitter to roll out on July 1 was an API limit, meaning Twitter had capped the number of times users could refresh Twitter to look for new tweets and see ads. Rather than constrain users from performing a specific action, Twitter had limited all user actions. “I realize these are draconian rules,” a Twitter engineer wrote on Slack. “They are temporary. We will reevaluate the situation tomorrow.”
At first, Blue subscribers could see 6,000 posts a day, while nonsubscribers could see 600 (enough for just a few minutes of scrolling), and new nonsubscriber accounts could see just 300. As people started hitting the limits, #TwitterDown started trending on, well, Twitter. “This sucks dude you gotta 10X each of these numbers,” wrote user @tszzl.
The impact quickly became obvious. Companies that used Twitter direct messages as a customer service tool were unable to communicate with clients. Major creators were blocked from promoting tweets, putting Musk’s wish to stop data scrapers at odds with his initiative to make Twitter more creator-friendly. And Twitter’s own trust and safety team was suddenly stopped from seeing violative tweets.
Engineers posted frantic updates in Slack. “FYI some large creators complaining because rate limit affecting paid subscription posts,” one said.
Christopher Stanley, the head of information security, wrote with dismay that rate limits could apply to people refreshing the app to get news about a mass shooting or a major weather event. “The idea here is to stop scrapers, not prevent people from obtaining safety information,” he wrote. Twitter soon raised the limits to 10,000 (for Blue subscribers), 1,000 (for nonsubscribers), and 500 (for new nonsubscribers). Now, 13 percent of all unverified users were hitting the rate limit.
Users were outraged. If Musk wanted to stop scrapers, surely there were better ways than just cutting off access to the service for everyone on Twitter.
“Musk has destroyed Twitter’s value & worth,” wrote attorney Mark S. Zaid. “Hubris + no pushback—customer empathy—data = a great way to light billions on fire,” wrote former Twitter product manager Esther Crawford, her loyalties finally reversed.
Musk retweeted a joke from a parody account: “The reason I set a ‘View Limit’ is because we are all Twitter addicts and need to go outside.”
Aside from Musk, the one person who seemed genuinely excited about the changes was Evan Jones, a product manager on Twitter Blue. For months, he’d been sending executives updates regarding the anemic signup rates. Now, Blue subscriptions were skyrocketing. In May, Twitter had 535,000 Blue subscribers. At $8 per month, this was about $4.2 million a month in subscription revenue. By early July, there were 829,391 subscribers—a jump of about $2.4 million in revenue, not accounting for App Store fees.
“Blue signups still cookin,” he wrote on Slack above a screenshot of the signup dashboard.
Jones’s team capitalized on the moment, rolling out a prompt to upsell users who’d hit the rate limit and encouraging them to subscribe to Twitter Blue. In July, this prompt drove 1.7 percent of the Blue subscriptions from accounts that were more than 30 days old and 17 percent of the Blue subscriptions from accounts that were less than 30 days old.
Twitter CEO Linda Yaccarino was notably absent from the conversation until July 4, when she shared a Twitter blog post addressing the rate limiting fiasco, perhaps deliberately burying the news on a national holiday.
“To ensure the authenticity of our user base we must take extreme measures to remove spam and bots from our platform,” it read. “That’s why we temporarily limited usage so we could detect and eliminate bots and other bad actors that are harming the platform. Any advance notice on these actions would have allowed bad actors to alter their behavior to evade detection.” The company also claimed the “effects on advertising have been minimal.”
If Yaccarino’s role was to cover for Musk’s antics, she was doing an excellent job. Twitter rolled back the limits shortly after her announcement. On July 12, Musk debuted a generative AI company called xAI, which he promised would develop a language model that wouldn’t be politically correct. “I think our AI can give answers that people may find controversial even though they are actually true,” he said on Twitter Spaces.
Unlike the rival AI firms he was trying to block, Musk said xAI would likely train on Twitter’s data.
“The goal of xAI is to understand the true nature of the universe,” the company said grandly in its mission statement, echoing Musk’s first, disastrous town hall at Twitter. “We will share more information over the next couple of weeks and months.”
In November 2023, xAI launched a chatbot called Grok that lacked the guardrails of tools like ChatGPT. Musk hyped the release by posting a screenshot of the chatbot giving him a recipe for cocaine. The company didn’t appear close to understanding the nature of the universe, but perhaps that’s coming.
Excerpt adapted from Extremely Hardcore: Inside Elon Musk’s Twitter by Zoë Schiffer. Published by arrangement with Portfolio Books, a division of Penguin Random House LLC. Copyright © 2024 by Zoë Schiffer.
20 notes · View notes
thehenrythomas · 2 years ago
Text
Learn about negative SEO tactics and how to protect your website from malicious actions
In today’s highly competitive online landscape, businesses and website owners face not only the challenge of optimizing their websites for search engines but also the threat of negative SEO tactics. Negative SEO refers to the practice of using unethical and malicious strategies to harm a competitor’s website’s search engine rankings and online reputation. This dark side of search engine optimization can lead to devastating consequences for innocent website owners.
In this article, we will explore various negative SEO tactics and provide valuable insights on how to safeguard your website from such attacks.
Link Spamming and Manipulation
One of the most common negative SEO tactics is the mass creation of low-quality, spammy backlinks pointing to a targeted website. These malicious backlinks can lead search engines to believe that the website is engaging in link schemes, resulting in penalties and ranking drops. Website owners must regularly monitor their backlink profiles to identify and disavow any toxic links.
Content Scraping and Duplication
Content scraping involves copying content from a target website and republishing it on multiple other sites without permission. This can lead to duplicate content issues, harming the original website’s search rankings. Regularly monitoring your content for plagiarism and submitting DMCA takedown requests can help address this problem.
Fake Negative Reviews
Negative SEO attackers may leave fake negative reviews on review sites and business directories to damage a website’s reputation. Monitoring and responding to reviews promptly can help mitigate the impact of such attacks.
Distributed Denial of Service (DDoS) Attacks
DDoS attacks overload a website’s server with an excessive amount of traffic, causing it to become slow or crash. Implementing DDoS protection services can help safeguard your website against such attacks.
Regularly Monitor Backlinks
Use tools like Google Search Console and third-party SEO software to monitor your website’s backlink profile. Regularly review and disavow toxic links to prevent negative SEO attacks based on link spamming.
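As one simplified way to act on this, the sketch below reads a hypothetical backlink export (one URL per row), flags domains matching placeholder spam heuristics, and writes them in the domain: format that Google's disavow tool accepts. Real audits need human review before disavowing anything.

```python
import csv

SPAMMY_HINTS = ("casino", "pills", "free-seo-links")  # placeholder heuristics only

def build_disavow_file(backlinks_csv: str, out_path: str = "disavow.txt") -> None:
    """Read a one-URL-per-row backlink export and write a disavow file."""
    bad_domains = set()
    with open(backlinks_csv, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            url = row[0]
            # crude domain extraction, good enough for the sketch
            domain = url.split("/")[2] if "://" in url else url.split("/")[0]
            if any(hint in domain for hint in SPAMMY_HINTS):
                bad_domains.add(domain)
    with open(out_path, "w") as f:
        for domain in sorted(bad_domains):
            f.write(f"domain:{domain}\n")  # format accepted by Google's disavow tool
```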
Secure Your Website
Ensure your website is secure with HTTPS encryption and robust security measures. This will help protect your website from hacking attempts and potential negative SEO attacks like content manipulation.
Frequently Check for Duplicate Content
Use plagiarism checker tools to identify if your content has been copied elsewhere. If you find duplicate content, reach out to the website owners to request removal or use the Google DMCA process.
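A very rough self-check is possible with nothing but Python's standard library: fetch a page you suspect of republishing your article and score how similar its text is to yours. The URL below is a placeholder, and dedicated plagiarism tools do far more, but a high score is a useful first signal.

```python
import difflib
import urllib.request

def similarity_to_page(original_text: str, suspect_url: str) -> float:
    """Crude 0-1 similarity between your article text and a suspect page's HTML."""
    with urllib.request.urlopen(suspect_url, timeout=10) as resp:
        suspect_html = resp.read().decode("utf-8", errors="ignore")
    return difflib.SequenceMatcher(None, original_text, suspect_html).ratio()

# Placeholder usage -- a high ratio is a signal to investigate, not proof:
# score = similarity_to_page(open("my_article.txt").read(), "https://example.com/suspect-copy")
# print(f"Similarity: {score:.0%}")
```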
Implement Review Monitoring
Keep an eye on reviews and mentions of your brand across various platforms. Respond professionally to negative reviews and report fake reviews to the respective platforms for removal.
Optimize Website Performance
A fast-loading website can better withstand DDoS attacks. Optimize your website’s performance by compressing images, using caching, and leveraging Content Delivery Networks (CDNs).
Regularly Backup Your Website
Frequent website backups will ensure that even if an attack occurs, you can quickly restore your website to its previous state without losing valuable data.
Use Webmaster Tools and Analytics
Stay vigilant by setting up alerts in Google Webmaster Tools and Google Analytics. These alerts can notify you of sudden drops in website traffic or other suspicious activities.
Conclusion
As the digital landscape continues to evolve, negative SEO tactics remain a persistent threat. Understanding these malicious strategies and proactively taking steps to protect your website is crucial for every website owner.
Discover countermeasures against negative SEO tactics, safeguarding your site from harm. Shield your website with insights from an experienced SEO company in Chandigarh for robust defense strategies.
2 notes · View notes
sky-chau · 1 month ago
Text
Most AI data collection is done by page crawlers, not people, and whether AO3 is being targeted deliberately, we still don't know. Basically it's a bot that types a search term into Google and clicks on every link it can, scraping content as it travels from link to link. Taking yourself out of Google search results is often enough to lower your odds of getting your work fed to the more popular AI models like ChatGPT. I can assure you most of these page crawlers likely don't have accounts because most websites allow their content to be seen by people who aren't logged in.
So like yes, hypothetically someone could give a page crawler an ao3 account and deliberately attack ao3, but the big names in the ai space that the average person uses to generate text have such a large scale operation that it's unlikely they'll log in anywhere.
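For a sense of how little machinery such a crawler needs (and why blocking them all is unrealistic), here's a bare-bones sketch using only Python's standard library; the starting URL is a placeholder, and real crawlers add politeness delays, robots.txt checks, deduplication, and storage.

```python
import urllib.request
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Pulls href values out of anchor tags -- the same pass that scrapes page text."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)

def crawl(start_url: str, max_pages: int = 5):
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
        except Exception:
            continue  # dead link, blocked page, etc.
        parser = LinkCollector()
        parser.feed(html)
        queue.extend(parser.links)  # follow discovered links, page to page
    return seen

# print(crawl("https://example.com"))  # placeholder starting point
```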
Just because we can't stop every scraper, doesn't mean we should give up. We can put up a fight and make it as hard as we can.
You should get an AO3 account
With the rise of AI and the well-known epidemic of AI companies scraping Ao3 for training data, most authors on Ao3 have locked down their fics to logged-in users only. This is unfortunate for authors and readers. As an author I've noticed a steep drop in readership on fics restricted to logged-in users, and when recommending fics to my friends I've noticed that the folks without an account can't find the fics. The logged-in-users-only toggle not only keeps people without an account from reading a fic, but also keeps them from seeing its listing at all. More than 50% of fics I come across have this setting turned on. So, you should get an AO3 account. I know this seems daunting and unfair because it's an invite-only system, but you can invite yourself through the homepage if you don't already have one, and in the past few years I've never heard of someone who requested an invitation through this method not getting one. And for those of you who are hesitant because you don't write, that's okay. It's not weird at all to click on a commenter username and find that they have 0 works and 10,000 bookmarks. It might take a week for the invitation to actually show up, but I can almost guarantee you will get one, just keep an eye on your email. It's free to join and donations are optional. You'll have more to read if you have an account and maybe give your favorite author the chance to protect their work from AI without a loss of readership and feedback.
21K notes · View notes