#huggingface
Explore tagged Tumblr posts
opal-apparition · 6 days ago
Text
25APR2025 AO3 Data Scrape on Hugging Face
There is a new AO3 data scrape on HuggingFace: https://huggingface.co/datasets/Chat-Error/archiveofourown-newest The "archiveofourown-newest" dataset contains approximately 14,806,149 works, while Archive of Our Own publicly listed a total of approximately 14,880,000 works as of April 23, 2025. If your works precede that date, it's likely they are in this dataset. I submitted a DMCA takedown to HuggingFace at [email protected] , and if you have the bandwidth I recommend you do the same. You can also report the dataset by clicking the three dots and posting a dispute; however, you'll likely find the poster unhelpful. Do BOTH. Should you not know what to say, there are plenty of DMCA takedown templates online, or you can copy mine.
Note that the people who posted the dataset are not the ones who act on a DMCA request; HuggingFace is. The posters are likely to try to circumvent whatever you post by saying:
Hello, Thank you for identifying the relevant works. Please note that you must include valid contact information, including name, address, email address, and telephone number if possible. Once this is done, we may process your request. Sincerely, Anonymous
Funny and notable that they chose to sign this "Anonymous."
Edit: In case it's not abundantly clear, do not give these random thieves your personal info! GO THROUGH HUGGINGFACE. 2nd Edit: as of 6PM EST, the data set has been taken down!
Tumblr media
3rd Edit: as of 27APR2025, they have uploaded it again as a different dataset.
28 notes
kenyatta · 2 months ago
Text
Thom Wolf, the co-founder of Hugging Face, wrote this the other day about how current AI is just “a country of yes-men on servers”:
History is filled with geniuses struggling during their studies. Edison was called "addled" by his teacher. Barbara McClintock got criticized for "weird thinking" before winning a Nobel Prize. Einstein failed his first attempt at the ETH Zurich entrance exam. And the list goes on.
The main mistake people usually make is thinking Newton or Einstein were just scaled-up good students, that a genius comes to life when you linearly extrapolate a top-10% student.
This perspective misses the most crucial aspect of science: the skill to ask the right questions and to challenge even what one has learned. A real science breakthrough is Copernicus proposing, against all the knowledge of his days -in ML terms we would say “despite all his training dataset”-, that the earth may orbit the sun rather than the other way around.
To create an Einstein in a data center, we don't just need a system that knows all the answers, but rather one that can ask questions nobody else has thought of or dared to ask. One that writes 'What if everyone is wrong about this?' when all textbooks, experts, and common knowledge suggest otherwise.
“In my opinion this is one of the reasons LLMs, while they already have all of humanity's knowledge in memory, haven't generated any new knowledge by connecting previously unrelated facts. They're mostly doing "manifold filling" at the moment - filling in the interpolation gaps between what humans already know, somehow treating knowledge as an intangible fabric of reality.”
26 notes
justa-personn · 3 days ago
Text
twenty six out of my thirty fics have been scraped. what the fuck.
3 notes
labelma · 3 days ago
Text
I really really don’t want to archive lock my fics, but 21 out of 26 of my fics on AO3 were scraped by huggingface and now I’m not sure what to do.
I want to keep my fics accessible to people without AO3 accounts but I also don’t want my work getting scraped. What do I do?
2 notes
jothb · 5 months ago
Text
In other news, the users of the Decentralised and Open Source alternative to Twitter are mad about the platform being Decentralised and Open Source.
Tumblr media
The anger has sparked off an arms race of different developers writing scripts to collect more and more data. The users of the decentralised platform on which all posts are public and easily accessible are left in confusion at how such a thing could have happened
42 notes
marroniere · 3 days ago
Text
how do you guys find out if your fics have been scraped by Hugging Face?
3 notes
falseandrealultravival · 6 months ago
Text
Three AIs: Gemini, Huggingface, Copilot (Essay)
Tumblr media
So far, I have used three AIs, mainly asking political and economic questions and getting answers from them. I will write about their characteristics.
1) Gemini - Google's AI. It is the one I started using earliest. This AI often declines to answer questions about politics and economics. Its answers also come as bullet points, which is a drawback because they are hard to read. Its answers to questions asked in English are very sharp.
2) Huggingface - From Japan. It connects to the supercomputer Fugaku to get its answers. It is good at questions related to Japan. Overall, it does not refuse to answer, and its answers are written as prose, so they are easy to understand. It also takes a sincere attitude in its answers. Personally, I consider it the best AI.
3) Copilot - I started using it recently. It is Microsoft's customization of OpenAI's technology. The answers are somewhat short, and many of them are in bullet points, but it does not refuse to answer. In terms of usability, it sits somewhere between 1) and 2).
Summary: Each of the three AIs has pros and cons, and no AI is absolutely superior.
Rei Morishita
2024.09.03
4 notes
irradiate-space · 1 year ago
Text
"It's all stolen" is an interesting critique of modern ML/AI tools, because critics who rely on copyright infringement as the basis for their objection to ML/AI tools expose a vulnerability to public-domain data.
Here's the newest largest English dataset, entirely based on verified public-domain texts:
How will objections to ML/AI develop in response?
6 notes
mundaneone · 10 hours ago
Text
Since it's now been confirmed that even some locked fics were scraped, I had mine checked, and they got them all. I've had them locked for some time now, so this is particularly upsetting.
0 notes
daniiltkachev · 15 days ago
Link
0 notes
outer-space-youtube · 27 days ago
Text
Lunar Outpost
I asked Gemini 2.5, using Noi from GitHub, to describe for an AI image generator an underground lunar outpost where four astronauts stay for monthly visits and two humanoid robots work full time to run the experiments. Gemini gave me an image: I asked DeepSeek-R1-Distill-Qwen-32B to describe an underground Lunar outpost for four astronauts to stay for monthly visits. Where two humanoid robots…
Tumblr media
0 notes
ai-hax · 1 month ago
Link
0 notes
jcmarchi · 2 months ago
Text
LLMOps in action: From prototype to production
New Post has been published on https://thedigitalinsider.com/llmops-in-action-from-prototype-to-production/
If you’ve ever built a GenAI application, you know the drill—your prototype looks amazing in a demo, but when it’s time to go live? Different story.
In this exclusive video, Samin Alnajafi, Success Machine Learning Engineer at Weights & Biases, unpacks why LLMOps is the missing link between promising GenAI experiments and real-world deployment.
Here’s what you’ll learn:
Why so many GenAI projects stall before reaching production
How to measure and optimize performance using LLMOps best practices
Key components of a scalable retrieval-augmented generation (RAG) pipeline
Practical examples and a live demo of Weights & Biases tools
Don’t let your GenAI project get stuck in limbo. 
Log in to your Insider dashboard and watch now.
P.S. And if you have a few minutes to spare today, why not share your LLMOps expertise? We know how busy you are, so thank you in advance!
Share the tools you use, the challenges you have, and more, and help define the LLMOps landscape.
Whenever you’re ready, here are three ways we can help you grow your AI career:
Become a Pro+ member. Want to be an expert in AI? Join Pro+ for exclusive access to insights from industry leaders at companies like Meta and Google, one complimentary ticket to an in-person Summit of your choice, experienced mentors, AI advantage workshops, and more.
Become a Pro member. Want to elevate your AI expertise? Join Pro for exclusive access to expert insights from leaders at top companies like HuggingFace and Microsoft, member-only articles and frameworks, an extensive video library, networking opportunities, and more. 
AI webinar. Want to unlock smarter, faster, and more scalable incident management? Join us on April 25 for a live session on how AI transforms incident management to accelerate investigations, surface relevant insights, and dynamically scale workflows. Register here.
Exclusive tech leader dinner. Join us in NYC on March 19 for an insightful conversation around the trends, challenges, and opportunities related to harnessing and maximizing Generative AI for the enterprise.
0 notes
kkarmalade · 5 months ago
Text
Tumblr media
This came from taking a logo I created using a remixed Kojo program, which I then used as the input for the lovely Hunyuan3D "2D to 3D" Hugging Face space. I'd like to be able to do all of this using TouchDesigner, but I'm not smart enough for that. I miss my old Windows computer. Not to be "that guy," but Macs really do suck ass.
0 notes
govindhtech · 6 months ago
Text
How Hugging Face LeRobot & NVIDIA AI Change Robotics Firms
Tumblr media
Researchers and developers will be able to propel advancements across a variety of industries with the help of Hugging Face’s LeRobot open-source framework and NVIDIA AI and robotics technologies.
At the Conference on Robot Learning (CoRL) in Munich, Germany, Hugging Face and NVIDIA announced a partnership to unite their open-source robotics communities and accelerate robotics research and development.
With the help of Hugging Face’s LeRobot open AI platform, NVIDIA Omniverse, and Isaac robotics technology, researchers and developers will be able to propel advancements in a variety of sectors, such as logistics, manufacturing, and healthcare.
Open-Source Robotics for the Era of Physical AI
The world’s industries are fast changing as a result of the advent of physical AI robots that can comprehend the physical characteristics of their surroundings.
To drive and sustain this rapid innovation, robotics researchers and developers need open-source, extensible frameworks that cover the training, simulation, and inference stages of development. When models, datasets, and workflows are released under shared frameworks, the latest advances are readily available for use without having to rewrite code.
More than 5 million machine learning researchers and developers use Hugging Face’s top open AI platform, which provides resources and tools to expedite AI development. With more than 1.5 million models, datasets, and applications freely available on the Hugging Face Hub, users may access and refine the most recent pretrained models and create AI pipelines using standard APIs.
Hugging Face’s LeRobot brings the successful principles of the Transformers and Diffusers libraries into the robotics space. Along with designs for low-cost manipulator kits, LeRobot provides a full suite of tools for sharing data collection, model training, and simulation environments.
NVIDIA’s AI technology, simulation, and open-source modular robot learning frameworks, such as NVIDIA Isaac Lab, can speed up LeRobot’s data collection, training, and verification workflow. Researchers and developers can share the models and datasets they build with LeRobot and Isaac Lab, creating a data flywheel for the robotics community.
Scaling Robot Development With Simulation
Physical AI is difficult to develop. Unlike language models, which draw on vast amounts of internet text, physical AI robots depend on physical interaction data and vision sensors, and that data is much harder to collect at scale. Gathering real-world robot data for dexterous manipulation across many tasks and environments takes a lot of time and effort.
Isaac Lab, which is built on NVIDIA Isaac Sim, makes this simpler. It uses high-fidelity rendering and physics simulation to provide realistic synthetic environments and data, allowing robots to be trained by demonstration or by trial and error in simulation. By combining GPU-accelerated physics simulation with parallel environment execution, Isaac Lab can generate training data equivalent to thousands of real-world experiences from a single demonstration.
The generated motion data is then used to train a policy with imitation learning. After successful training and validation in simulation, the policy is deployed on a real robot, where it is tested and fine-tuned further to reach peak performance.
This iterative procedure ensures strong and dependable robotic systems by utilizing the scalability of simulated synthetic data and the precision of real-world data.
Developers and academics can build on each other’s work by sharing these datasets, policies, and models on Hugging Face, which speeds up advancements in the field.
“The robotics community flourishes when we build together,” said Animesh Garg, an assistant professor at Georgia Tech. “By using open-source frameworks like Hugging Face’s LeRobot and NVIDIA Isaac Lab, we quicken the pace of research and development in AI-powered robots.”
Fostering Collaboration and Community Engagement
The proposed collaborative workflow involves collecting data in Isaac Lab through teleoperation and simulation, then saving it in the standard LeRobotDataset format. Data generated with GR00T-Mimic is then used to train a robot policy with imitation learning, and the policy is evaluated in simulation. Finally, the validated policy is deployed on real robots, using NVIDIA Jetson for real-time inference.
The first steps in this collaboration have already been taken: a physical picking setup has been demonstrated with LeRobot software running on an NVIDIA Jetson Orin Nano, which offers a powerful, compact computing platform for deployment.
“Fusing NVIDIA’s hardware and Isaac Lab simulation with the Hugging Face open-source community could hasten advancements in AI for robotics,” said Remi Cadene, principal research scientist at LeRobot.
This work builds on NVIDIA’s community contributions in generative AI at the edge, which support the most recent open models and libraries, including Hugging Face Transformers. Those contributions optimize inference for large language models (LLMs), small language models (SLMs), and multimodal vision-language models (VLMs), as well as the VLMs’ action-based variants, vision-language-action models (VLAs), along with diffusion policies and speech models, all with strong, community-driven support.
Hugging Face and NVIDIA are collaborating to speed up the work of the worldwide robotics research and development community, which is revolutionizing a variety of industries, including manufacturing, logistics, and transportation.
0 notes
arte-en-la-red · 7 months ago
Text
0 notes