#GPTBot
Text
Why You Should Allow GPTBot to Crawl Your Site
#Advanced Crawlers#AI Technology#AI-driven web indexing#Allowing GPTBot on your site#Benefits of GPTBot crawling#Data Privacy#GPTBot#GPTBot and SEO performance#How to optimize for GPTBot#OpenAI#SEO#Server Performance#Technological Advancement#Web Crawling#Web Optimization
0 notes
Text
"#OpenAI IP block ranges if you want to block them from your instance and scraping your content. I saw Mastodon devs added something to block #GPTBot via robots.txt a few days ago. Here are the IP ranges:"
#MastoAdmin #FediBlock
20.15.240.64/28
20.15.240.80/28
20.15.240.96/28
20.15.240.176/28
20.15.241.0/28
20.15.242.128/28
20.15.242.144/28
20.15.242.192/28
40.83.2.64/28
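For admins who would rather test an address against these ranges than paste them into a firewall by hand, here is a minimal sketch using Python's standard ipaddress module. The function name and the sample addresses are made up for illustration, and the ranges are only the ones quoted in this post, so they may be out of date.

```python
import ipaddress

# CIDR ranges copied from the post above. They reflect what was shared
# at the time and may have changed since.
POSTED_RANGES = [
    "20.15.240.64/28",
    "20.15.240.80/28",
    "20.15.240.96/28",
    "20.15.240.176/28",
    "20.15.241.0/28",
    "20.15.242.128/28",
    "20.15.242.144/28",
    "20.15.242.192/28",
    "40.83.2.64/28",
]

NETWORKS = [ipaddress.ip_network(cidr) for cidr in POSTED_RANGES]


def is_in_posted_ranges(ip: str) -> bool:
    """Return True if the address falls inside any of the posted ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in NETWORKS)


if __name__ == "__main__":
    # Hypothetical sample addresses, chosen only to exercise the check.
    for candidate in ("20.15.240.70", "203.0.113.5"):
        print(candidate, is_in_posted_ranges(candidate))
```

Each /28 covers 16 addresses, so the nine ranges together cover 144 addresses.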
#openai#a.i.#artificial intelligence#ip#internet protocol#gptbot#mastodon#kolektiva#mastoadmin#fediblock#fediverse#ausgov#politas#auspol#tasgov#taspol#neoliberal capitalism#australia#fuck neoliberals#anthony albanese#albanese government#socialmedia#social media#internet#code#script#artificially generated#artificial#ip ranges#range
0 notes
Text
Friday, August 11th, 2023
🌟 New
We've implemented OpenAI’s instructions for blocking GPTBot. This should discourage OpenAI, including ChatGPT, from scraping any part of Tumblr, including individual blogs.
We’re rolling out a new redesign of the direct messaging conversation view.
🛠 Fixed
The latest version of the Android app (30.8) fixes the issue where links to “View Post” on filtered posts open the web browser instead of taking you to the post in the app itself.
On web, we’ve improved the screen reader hint for tags on posts, so it doesn’t say “Pound” or “Number” when it encounters the hashtag symbol.
Fixed an issue on web that was preventing the Related Tags section of the sidebar from showing up on the search results page.
Fixed a bug in the mobile apps for group blogs which was preventing members from editing the Notifications settings for those group blogs.
We’ve been rolling through some bug fix releases and one major release for the StreamBuilder framework.
🚧 Ongoing
Nothing to report here today.
🌱 Upcoming
We’re cooking up our first public reveal on the @labs blog. Give that blog a follow if you want to see what we’re working on!
Experiencing an issue? File a Support Request and we’ll get back to you as soon as we can!
Want to share your feedback about something? Check out our Work in Progress blog and start a discussion with the community.
1K notes
Text
Hey!
Do you have a website? A personal one or perhaps something more serious?
Whatever the case, if you don't want AI companies training on your website's contents, add the following to your robots.txt file:
User-agent: *
Allow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: CCbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: PiplBot
Disallow: /
User-agent: ByteSpider
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Omgili
Disallow: /
There are of course more, and even the ones you add may not cooperate, but this should get the biggest AI companies to leave your site alone.
Important note: The first two lines declare that anything not on the list is allowed to access everything on the site. If you don't want this, add "Disallow:" lines after them with the relative paths of anything you don't want any bot, including Google Search, to access. For example:
User-agent: *
Allow: /
Disallow: /super-secret-pages/secret.html
If that was in the robots.txt of example.com, it would tell all bots to not access
https://example.com/super-secret-pages/secret.html
And I'm sure you already know what to do if you already have a robots.txt, sitemap.xml/sitemap.txt, etc.
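If you want to check that a robots.txt like the one above does what you expect, here is a minimal sketch using Python's standard urllib.robotparser. The shortened file, the function name and the example URL are assumptions for illustration. Note that this parser applies the first matching rule within a group, while RFC 9309 asks crawlers to use the most specific (longest) match, so the sketch only tests whole-site blocks and not mixed Allow/Disallow groups.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the robots.txt from the post (three of the listed
# crawlers only, to keep the sketch brief).
ROBOTS_TXT = """\
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: CCbot
Disallow: /
User-agent: Google-Extended
Disallow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Googlebot is included as a crawler that should stay allowed.
    agents = ["GPTBot", "CCBot", "Google-Extended", "Googlebot"]
    print(blocked_agents(ROBOTS_TXT, agents, "https://example.com/any-page"))
```

User-agent names are matched case-insensitively here, so "CCbot" in the file also covers a crawler that announces itself as "CCBot". Remember that robots.txt is only a request: the check shows what a well-behaved crawler would do, not what every crawler will do.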
86 notes
Text
Changes, everything changes
🌟 New
We have implemented OpenAI's instructions for blocking GPTBot. This change should keep all OpenAI products, such as ChatGPT, from extracting information and content from Tumblr, including every blog on the platform.
We have redesigned the interface for exchanging direct messages.
🛠️ Improvements and fixes
We fixed a bug in the latest version of the Android app (30.8) that affected the links for viewing posts hidden by content filters: they now open correctly in the app instead of taking you to a web browser.
On web, we improved the hints that help screen readers pronounce the characters in post tags correctly in English. They will no longer read “Pound” or “Number” when they encounter the hash symbol.
The Related Tags section that appears in the sidebar of search results pages was not showing up on web. This is now fixed.
We fixed an issue in the mobile apps that prevented members of a group blog from editing the notification settings for that blog.
We fixed some StreamBuilder bugs and included several major improvements.
🚧 Ongoing
Nothing to share for now.
🌱 Upcoming
Very soon we will announce the @labs team's first big creation on their blog. Follow it to find out what they are working on!
Having a problem? Send a request to the support team and they will get back to you as soon as possible.
Want to send us your comments or impressions about a feature? Take a look at our brand-new Work in Progress blog and start sharing your ideas and suggestions with the community.
And don't forget that you can read all of these changes in any of the languages available on Tumblr on the official blogs of the international teams!
69 notes
Text
Ch-ch-changes…
🌟 New
Just as with OpenAI's GPTBot, we are working to keep the Common Crawl tool from accessing Tumblr content.
On web, we added the rel="author" attribute to the HTML of the post headers shown on the dashboard. The goal is to improve accessibility for screen readers and other tools.
On the web search pages (/search), we moved the search box to the center of the page instead of the side.
For users trying out the reblog header changes we mentioned last week: avatars are back in the headers of group blog posts when the “Show author portrait” option is enabled.
On web, you can now block a blog directly from a submission received in your inbox, either from your primary blog or from a secondary blog (if the submission was sent to it).
Our team made improvements to localized number formats in all supported languages.
On web, logged-out users browsing a blog view will be asked to log in after browsing for a while.
We are testing showing the “You're all caught up!” message to people who have enabled the “Best Stuff First” option.
🛠️ Improvements
We fixed a bug that showed the wrong number of subscriptions for a blog.
We fixed an issue in the web post editor where photosets were not displayed correctly when placed after a “Read more” break.
On web, we made several improvements to the header of compact posts (especially when shown in a grid on the Explore page). Long blog names, badges, and the “Follow” button no longer cause line breaks in the middle of a word, and each element stays correctly aligned in the header.
We fixed an issue in Safari that displayed a bullet icon (●) next to the Blogs menu.
We made significant progress in our efforts to fix several issues related to the Undo/Redo actions in the web post editor. Stability has clearly improved.
We fixed an issue on web that displayed blank avatars on anonymous asks.
Also on web, we fixed an issue affecting HTML and Markdown modes where Select All sometimes selected text outside the editor.
We fixed an issue in search affecting queries that start with “#”, which returned results containing the entered term rather than results for the tag in question.
🚧 Ongoing
We are gradually rolling out parts of the new Activity tab design on Android.
And we are still working hard on updating the documentation. If you find anything confusing or outdated, let us know!
🌱 Upcoming
Your feedback about the new reblog design has reached us loud and clear. Thank you to everyone who contributed: our team is already discussing next steps!
Having a problem? Fill out the help form and we will get in touch with you as soon as possible!
Want to send comments and suggestions? Check out the “Work in Progress” blog and start talking with the community.
35 notes
Text
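Updates
🌟 New features
Just as we block OpenAI's GPTBot from crawling Tumblr content, we now block the Common Crawl crawler from scraping content from Tumblr.
On the web dashboard, we added rel="author" to the blog name link in post headers to improve accessibility for screen readers and other tools.
On the web /search pages, the search bar was moved from the sidebar to the main section in the center of the page.
For those taking part in the reblog header redesign experiment we mentioned last week, avatars have been added back to posts from group blogs when the option to show author portraits is enabled.
When blocking a blog from a submission in your inbox on web, you can now block from both your secondary blog (if it was the recipient) and your primary blog.
On web, localized number formatting has been improved in all supported languages.
On web, logged-out users viewing a blog view will now occasionally be prompted to log in after scrolling for a while.
We are experimenting with showing the “You're all caught up!” carousel to users who have “Best Stuff First” enabled.
🛠️ Bug fixes
Fixed a bug where the number of subscriptions a user has for their blog was displayed incorrectly.
In the web post editor, photosets did not display correctly when placed after a “Read more” block. This has been fixed.
On web, the post header for compact posts (for example, when shown in a grid on the Explore page) has been improved. Long blog names, badges, and the Follow button no longer break onto a new line mid-word, and each element stays properly aligned within the header.
Fixed an issue where a bullet point appeared next to the Blogs menu item when using Safari.
Fixed various issues with undo/redo in the web post editor. You may have noticed improved stability when using Undo and Redo in the editor.
On web, fixed an issue where avatars on anonymous asks were blank.
In the web post editor, fixed an issue affecting HTML and Markdown modes where Select All sometimes selected text outside the editor.
Fixed an issue affecting some users where starting a search with a hashtag (#) returned general search results instead of results for that tag.
🚧 Ongoing
The Activity tab/highlights redesign is gradually rolling out to Android users.
We are working hard on updating Tumblr's documentation. If you spot anything confusing or outdated, please send us feedback!
🏴☠️👒
🌱 Upcoming
We have received your feedback about the reblog redesign loud and clear, and are now discussing next steps. Thank you!
Experiencing an issue? Send a support request (support is available in English only) and we will respond as quickly as we can.
Have feedback to share? Check out the “Work in Progress” blog (English only) and start a discussion with the community.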
36 notes
Text
Ch-ch-changes
🌟 New
We are trying to discourage the Common Crawl crawler from scraping content from Tumblr, just as we do with OpenAI's GPTBot when it tries to crawl Tumblr content.
On the web dashboard, we added rel="author" to the blogname link in post headers to improve accessibility via screen readers and other tools.
On the /search pages on web, we moved the search bar from the sidebar to the central/main section of the page.
For people in the aforementioned reblog header redesign experiment, we added avatars to posts from group blogs when the option to show author portraits is enabled.
When you block a blog via a submission in your inbox on web, you can now block from both the secondary blog (if it was the recipient) and the primary one.
On web, we made some improvements to localized number formatting in all supported languages.
On web, logged-out users browsing within a blog may be asked to log in after scrolling for a while.
We are testing showing the “You're all caught up!” carousel to people with “Best Stuff First” enabled.
🛠 Fixed
We fixed a bug that showed the wrong number of subscriptions a user has for their blog.
There was an issue in the post editor on web where photosets were not displaying correctly when placed after a Read more block. This is now fixed.
On web, we made some improvements to the header of compact posts (such as when they are displayed in a grid on the Explore page). Long blog names, badges, and the Follow button no longer break onto a new line in the middle of a word, and each element stays correctly aligned in the header.
We fixed an issue that caused a bullet point to appear next to the Blogs menu item when using Safari.
We made some progress on fixing various undo/redo issues in the post editor on web. You should notice better stability when using Undo and Redo in the editor.
On web, we fixed an issue where avatars on anonymous asks appeared blank.
On web, we fixed an issue affecting HTML and Markdown modes in the post editor where Select All sometimes selected text outside the editor.
We fixed an issue affecting some users where starting a search with a hashtag (#) returned general search results instead of results for that tag.
🚧 Ongoing
We are slowly rolling out some parts of the activity tab/highlights redesign to Android users.
We are working hard on updating our docs. If you notice anything confusing or out of date, send feedback!
🏴☠️👒
🌱 Upcoming
We have heard your feedback about the redesigned reblogs loud and clear and are discussing next steps. Thank you!
Experiencing an issue? File a support request and we will get back to you as soon as we can!
Want to share your feedback about something? Check out our Work in Progress blog and start a discussion with the community.
31 notes
Text
What's new?
🌟 New
We are preventing the Common Crawl web crawler from scraping content from Tumblr, just as we prevent OpenAI's GPTBot from crawling content.
On the web dashboard, we added the rel="author" attribute to the blog name link in post headers to improve its accessibility for screen readers and other tools.
On the /search pages on web, we moved the search bar from the sidebar to the central/main part of the page.
For participants in the previously mentioned experiment with the redesigned reblog headers: we brought avatars back to posts from group blogs that have author portraits enabled.
When blocking a blog via a submission in your inbox, you can now do so from either your primary blog or a secondary blog (if it was the recipient).
On web, we improved localized number formatting in all supported languages.
On web, logged-out users scrolling through a blog in blog view will occasionally be prompted to log in.
We are testing showing the “You're all caught up!” carousel to users who have “Best Stuff First” enabled.
🛠️ Fixes
We fixed a bug that showed users an incorrect number of subscriptions.
The post editor on web had a problem displaying photosets placed after a “Read more” block. This is now fixed.
On web, we improved the header of posts in compact view (for example, when they are shown in a grid on the Explore page). Long blog names, badges, and the Follow button no longer break mid-word when wrapping to a new line, and all elements stay properly aligned in the header.
We fixed an issue where a bullet point appeared next to the Blogs menu item when using Safari.
We made significant progress on fixing bugs related to undo/redo in the post editor on web. Their stability has noticeably improved.
On web, we fixed a bug where avatars on anonymous asks appeared blank.
On web, we fixed a bug in HTML and Markdown modes in the post editor where Select All could select text outside the editor.
We fixed a bug where users who started a search query with a hashtag (#) saw general search results rather than results for that tag.
🚧 Still in progress
We are gradually rolling out elements of the new activity tab design to Android users.
We are working hard on updating our help documents. If you see anything confusing or outdated, let us know!
🏴☠️👒
🌱 Coming soon
We have listened to your feedback about the new reblog design and are working in the right direction. Thank you!
Having problems? Write to us and we will get back to you shortly!
Want to share your opinion? Head to our Work in Progress blog and start a discussion with other members of the community.
37 notes
Text
Release notes
🌟 New
Just as with OpenAI's GPTBot, we have worked to keep the Common Crawl tool from poking its nose into Tumblr content.
On web, we added the rel="author" attribute to the HTML of the post headers shown on the dashboard. Because this attribute indicates that the displayed content belongs to its author, it helps improve the site's accessibility for users of screen readers and similar tools.
On the web search pages (/search), the search field is now placed in the center of the screen rather than on the side.
For users trying out the new reblog header design we told you about last week, avatars are now back in the header of group blog posts when the “Show author portraits” option is enabled.
On web, you can now block a blog directly from a submission received in your inbox, either from your primary blog or from a secondary blog (if it was the recipient of the submission).
Our teams made adjustments to localized number formats in all available languages.
On web, logged-out users browsing a blog view will sometimes be invited to sign up or log in after a certain amount of time.
We are experimenting with showing the “You're all caught up!” carousel, this time to users who have enabled the “Best Stuff First” option.
🛠️ Fixes
Fixed a bug that could show an incorrect number of subscriptions for a blog.
Fixed an issue in the web post editor where photosets placed after a “Read more” block did not display correctly.
Various improvements were made to the header of posts in compact format (especially when shown in a grid on the Explore page). Long blog names, badges, and “Follow” buttons no longer cause line breaks in the middle of a word, and each element stays correctly aligned in the header.
Fixed a bug in Safari that could make a ● appear next to the Blogs menu.
In the web post editor, we made progress on fixing bugs that could occur when the Undo/Redo actions were used while writing. Their stability has noticeably improved.
Fixed an issue on web that could show blank avatars on anonymous asks.
Fixed a bug on web that affected the HTML and Markdown editing modes: using the “Select All” command could sometimes also select text outside the editor.
Fixed a bug in search that affected queries starting with “#” and returned results containing the literal term entered rather than results for the tag in question.
🚧 Ongoing
More elements of the redesigned activity view are gradually rolling out to users of the Android app.
Our teams are continuing their work on updating the Help Center. If you notice articles that are confusing or outdated, don't hesitate to let us know! (Note, however, that the French version always lags a little behind the English one, so some articles or interface elements may not yet be translated by the time you read this.)
🏴☠️👒
🌱 Upcoming
Your feedback on the recent reblog redesign has been heard by our teams, and they thank you! They are keeping it firmly in mind as they work on the next changes.
Having a problem? Write to us (in English) and we will get back to you as quickly as possible!
Want to share your comments with us? Visit the Work in Progress blog and join the community's discussions!
18 notes
Text
Updates
🌟 New
We have implemented OpenAI's instructions for blocking GPTBot. This should keep OpenAI, including ChatGPT, from scraping any part of Tumblr, including individual blogs.
We have redesigned the conversation view in direct messaging.
🛠 Fixed bugs
The latest version of the Android app (30.8) fixes the issue where links to “View Post” on filtered posts open the web browser instead of opening the post in the app itself.
On web, we improved the screen reader hint for tags on posts, so that in English it no longer reads out “Pound” or “Number” when the hashtag symbol appears.
Fixed an issue on web that prevented the “Related Tags” section of the sidebar from showing up on the search results page.
Fixed a bug in the mobile apps for group blogs that prevented members from editing the notification settings for those blogs.
We are publishing some bug fix releases and one major release for the StreamBuilder framework.
🚧 Ongoing
Nothing to report at the moment.
🌱 Upcoming
We are working on our first public announcement on the @labs blog. If you want to know what it is about, be sure to follow that blog (English only)!
Experiencing an error? Contact Support and we will get back to you as quickly as we can!
Have feedback for us? Check our updates regularly and join the discussion with the community.
18 notes
Text
Not a single corner of the web will be left unscraped for AI: TikTok does it even faster than OpenAI
ByteDance is extracting web information on a massive and automated scale at a rapid pace.
It is doing this to train its models and position itself as a Chinese AI giant.
The company is also developing its own chips to reduce dependence on foreign suppliers.
ByteDance, the parent company of TikTok, is in the midst of the AI race, or at least entering it. It is doing so with a dual strategy: developing its own chips — a project that has been underway for over three years — and also collecting data to train its future model.
Why it matters: Generative AI is currently dominated by OpenAI and Google, with NVIDIA providing the necessary hardware. If TikTok enters the field with enough strength, it could shake the balance of power we’ve seen so far.
The landscape: ByteDance is scraping the web at a much higher rate than we have come to expect from OpenAI, according to Quartz. In other words, it is extracting and organizing online information on a massive, automated scale.
This is being done to acquire enough data to train its own AI models while developing its own chips to reduce dependence on foreign suppliers, which is particularly sensitive for a Chinese company.
In numbers:
Bytespider, ByteDance’s web scraper, is 25 times faster than OpenAI’s GPTBot…
…and 3,000 times faster than Anthropic’s ClaudeBot.
ByteDance has ordered over 100,000 Ascend 910B chips from Huawei this year to replace NVIDIA chips.
The context: U.S. restrictions on the export of specialized AI chips have forced Chinese companies to seek domestic alternatives and develop their own technology. They were already doing this, but the sanctions have pushed them to take it further.
ByteDance is designing two AI chips with TSMC and plans to bring them into mass production by 2026.
Between the lines: ByteDance has already achieved several milestones regarding AI:
In August 2023, they launched the Doubao chatbot.
In May 2024, they announced Doubao models for businesses.
This year, they also presented two AI models focused on the company’s strength: video.
This pace of development and scraping points in a clear direction: ByteDance wants to position itself as a true Chinese AI giant, not merely as the company behind TikTok with an AI sideline. What this means for regulation and ethics remains a question for the future.
3 notes
Text
Tuesday, August 29th, 2023
🌟 New
We’re now discouraging the Common Crawl crawler from scraping content from Tumblr, similar to how we’re discouraging OpenAI’s GPTBot from crawling Tumblr content.
On the web dashboard, we added rel="author" to the blogname link in post headers to improve accessibility via screenreaders and other tools.
🛠 Fixed
We fixed a bug that was showing the incorrect number of subscriptions a user has for their blog.
There was an issue in the post editor on web where photosets were not displaying correctly when placed after a read more block. This is now fixed.
🚧 Ongoing
We’re slowly rolling out some more parts of the activity tab/highlights redesign to Android users.
🏴☠️👒
🌱 Upcoming
We’ve heard your feedback about the redesigned reblogs loud and clear and are discussing next steps, thanks!
Experiencing an issue? File a Support Request and we’ll get back to you as soon as we can!
Want to share your feedback about something? Check out our Work in Progress blog and start a discussion with the community.
602 notes
Text
As media companies haggle licensing deals with artificial intelligence powerhouses like OpenAI that are hungry for training data, they’re also throwing up a digital blockade. New data shows that over 88 percent of top-ranked news outlets in the US now block web crawlers used by artificial intelligence companies to collect training data for chatbots and other AI projects. One sector of the news business is a glaring outlier, though: Right-wing media lags far behind their liberal counterparts when it comes to bot-blocking.
Data collected in mid-January on 44 top news sites by Ontario-based AI detection startup Originality AI shows that almost all of them block AI web crawlers, including newspapers like The New York Times, The Washington Post, and The Guardian, general-interest magazines like The Atlantic, and special-interest sites like Bleacher Report. OpenAI’s GPTBot is the most widely-blocked crawler. But none of the top right-wing news outlets surveyed, including Fox News, the Daily Caller, and Breitbart, block any of the most prominent AI web scrapers, which also include Google’s AI data collection bot. Pundit Bari Weiss’ new website The Free Press also does not block AI scraping bots.
Most of the right-wing sites didn’t respond to requests for comment on their AI crawler strategy, but researchers contacted by WIRED had a few different guesses to explain the discrepancy. The most intriguing: Could this be a strategy to combat perceived political bias? “AI models reflect the biases of their training data,” says Originality AI founder and CEO Jon Gillham. “If the entire left-leaning side is blocking, you could say, come on over here and eat up all of our right-leaning content.”
Originality tallied which sites block GPTBot and other AI scrapers by surveying the robots.txt files that websites use to inform automated web crawlers which pages they are welcome to visit or barred from. The startup used Internet Archive data to establish when each website started blocking AI crawlers; many did so soon after OpenAI announced its crawler would respect robots.txt flags in August 2023. Originality’s initial analysis focused on the top news sites in the US, according to estimated web traffic. Only one of those sites had a significantly right-wing perspective, so Originality also looked at nine of the most well-known right-leaning outlets. Out of the nine right-wing sites, none were blocking GPTBot.
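A survey of this kind can be approximated in a few lines. The sketch below is not Originality's actual method, which the article does not describe in detail. It assumes the robots.txt text of each site has already been downloaded, and the site names and file contents are invented placeholders. It uses Python's standard urllib.robotparser to count how many sites bar a given crawler from the site root.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, agent: str) -> bool:
    """Return True if this robots.txt bars the agent from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


def share_blocking(robots_by_site: dict[str, str], agent: str) -> float:
    """Return the fraction of sites whose robots.txt blocks the agent."""
    if not robots_by_site:
        return 0.0
    blocking = sum(blocks_agent(text, agent) for text in robots_by_site.values())
    return blocking / len(robots_by_site)


# Hypothetical sample: placeholder site names with made-up robots.txt bodies.
SAMPLE = {
    "news-a.example": "User-agent: GPTBot\nDisallow: /\n",
    "news-b.example": "User-agent: *\nAllow: /\n",
    "news-c.example": "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n",
    "news-d.example": "",
}

if __name__ == "__main__":
    print(f"{share_blocking(SAMPLE, 'GPTBot'):.0%} of sample sites block GPTBot")
```

An empty or missing robots.txt counts as "not blocking" here, which matches how crawlers conventionally treat the absence of rules.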
Bot Biases
Conservative leaders in the US (and also Elon Musk) have expressed concern that ChatGPT and other leading AI tools exhibit liberal or left-leaning political biases. At a recent hearing on AI, Senator Marsha Blackburn recited an AI-generated poem praising President Biden as evidence, claiming that generating a similar ode to Trump was impossible with ChatGPT. Right-leaning outlets might see their ideological foes’ decisions to block AI web crawlers as a unique opportunity to redress the balance.
David Rozado, a data scientist based in New Zealand who developed an AI model called RightWingGPT to explore bias he perceived in ChatGPT, says that’s a plausible-sounding strategy. “From a technical point of view, yes, a media company allowing its content to be included in AI training data should have some impact on the model parameters,” he says.
However, Jeremy Baum, an AI ethics researcher at UCLA, says he’s skeptical that right-wing sites declining to block AI scraping would have a measurable effect on the outputs of finished AI systems such as chatbots. That’s in part because of the sheer volume of older material AI companies have already collected from mainstream news outlets before they started blocking AI crawlers, and also because AI companies tend to hire liberal-leaning employees.
“A process called reinforcement learning from human feedback is used right now in every state-of-the-art model,” to fine-tune its responses, Baum says. Most AI companies aim to create systems that appear neutral. If the humans steering the AI see an uptick of right-wing content but judge it to be unsafe or wrong, they could undo any attempt to feed the machine a certain perspective.
OpenAI spokesperson Kayla Wood says that in pursuit of AI models that “deeply represent all cultures, industries, ideologies, and languages” the company uses broad collections of training data. “Any one sector—including news—and any single news site is a tiny slice of the overall training data, and does not have a measurable effect on the model’s intended learning and output,” she says.
Rights Fights
The disconnect in which news sites block AI crawlers could also reflect an ideological divide on copyright. The New York Times is currently suing OpenAI for copyright infringement, arguing that the AI upstart’s data collection is illegal. Other leaders in mainstream media also view this scraping as theft. Condé Nast CEO Roger Lynch recently said at a Senate hearing that many AI tools have been built with “stolen goods.” (WIRED is owned by Condé Nast.) Right-wing media bosses have been largely absent from the debate. Perhaps they quietly allow data scraping because they endorse the argument that data scraping to build AI tools is protected by the fair use doctrine?
For a couple of the nine right-wing outlets contacted by WIRED to ask why they permitted AI scrapers, their responses pointed to a different, less ideological reason. The Washington Examiner did not respond to questions about its intentions but began blocking OpenAI’s GPTBot within 48 hours of WIRED’s request, suggesting that it may not have previously known about or prioritized the option to block web crawlers.
Meanwhile, the Daily Caller admitted that its permissiveness toward AI crawlers had been a simple mistake. “We do not endorse bots stealing our property. This must have been an oversight, but it's being fixed now,” says Daily Caller cofounder and publisher Neil Patel.
Right-wing media is influential, and notably savvy at leveraging social media platforms like Facebook to share articles. But outlets like the Washington Examiner and the Daily Caller are small and lean compared to establishment media behemoths like The New York Times, which have extensive technical teams.
Data journalist Ben Welsh keeps a running tally of news websites blocking AI crawlers from OpenAI, Google, and the nonprofit Common Crawl project, whose data is widely used in AI. He found that approximately 53 percent of the 1,156 media publishers surveyed block at least one of those three bots. His sample size is much larger than Originality AI’s and includes smaller and less popular news sites, suggesting outlets with larger staffs and higher traffic are more likely to block AI bots, perhaps because of better resourcing or technical knowledge.
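A tally like the one described above can be approximated from each site's robots.txt file. The following is only a minimal sketch, not Welsh's actual method: it assumes the three crawlers identify themselves with the user-agent tokens GPTBot, Google-Extended, and CCBot, it treats a site as "blocking" a bot when the bot may not fetch the site root, and the function names and sample sites are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for the three crawlers (OpenAI, Google's AI
# training control, Common Crawl). These are illustrative assumptions.
AI_BOTS = ("GPTBot", "Google-Extended", "CCBot")


def blocked_bots(robots_txt: str, bots=AI_BOTS) -> list[str]:
    """Return the bots that may not fetch the site root under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]


def share_blocking_any(robots_by_site: dict[str, str]) -> float:
    """Fraction of sites whose robots.txt blocks at least one of the bots."""
    if not robots_by_site:
        return 0.0
    blocking = sum(1 for text in robots_by_site.values() if blocked_bots(text))
    return blocking / len(robots_by_site)


if __name__ == "__main__":
    # Hypothetical sample: robots.txt contents keyed by made-up site names.
    sample = {
        "a.example": "User-agent: GPTBot\nDisallow: /\n",
        "b.example": "User-agent: *\nAllow: /\n",
    }
    print(f"{share_blocking_any(sample):.0%} of sampled sites block an AI bot")
```

A real survey would also need to fetch each robots.txt over the network and handle missing or malformed files, which this sketch leaves out.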
At least one right-leaning news site is considering how it might leverage the way its mainstream competitors are trying to stonewall AI projects to counter perceived political biases. “Our legal terms prohibit scraping, and we are exploring new tools to protect our IP. That said, we are also exploring ways to help ensure AI doesn’t end up with all of the same biases as the establishment press,” Daily Wire spokesperson Jen Smith says. As of today, GPTBot and other AI bots were still free to scrape content from the Daily Wire.
6 notes
Text
Changes, changes...
⭐ What's new
We've implemented OpenAI's guidance for blocking GPTBot. This should discourage OpenAI, including ChatGPT, from scraping parts of Tumblr, including individual blogs.
We're rolling out a redesigned direct messaging view.
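For readers curious what blocking GPTBot looks like in practice: the opt-out OpenAI documents is a robots.txt rule naming the GPTBot user agent. The sketch below is illustrative only. It does not reproduce Tumblr's actual robots.txt, and the helper name and blog URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt rule that opts an entire site out of GPTBot crawling.
# Illustrative only; Tumblr's real robots.txt is not shown in this post.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether the given crawler may fetch the URL under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.tumblr.com/post/1"  # hypothetical blog URL
    print(is_allowed("GPTBot", url))     # False
    print(is_allowed("Googlebot", url))  # True
```

robots.txt is advisory: it only works when a crawler chooses to honor it, so a rule like this discourages scraping rather than technically preventing it.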
🛠 What's fixed
In the latest version of Tumblr for Android (30.8), “View Post” links on filtered posts now take you neatly to the app rather than to the browser, as they did before.
On web, we've improved the screen reader hints for tags on posts, so screen readers no longer go haywire when they run into the hashtag symbol here and there.
The “Related Tags” section of the sidebar now displays flawlessly on the search results page.
Members of group blogs can now edit notification settings in the mobile apps to their heart's content.
We've shipped several bug fixes and one major release of the StreamBuilder framework.
🚧 What's ongoing
A contest for the best beachgoer's gear, or how to save yourself when the heat outside is straight from hell.
🌱 What's upcoming
We're preparing our first public release on the @labs blog. Follow that blog to see what we're currently working on (English only).
See a problem? Send it to us and we'll get back to you soon!
Want to share your feedback? Check out the @ekipa blog and start a conversation with the community.
13 notes
Text
Ch-ch-changes…
🌟 New
We've implemented OpenAI's instructions for blocking GPTBot. This should discourage OpenAI (including ChatGPT) from copying parts of Tumblr, including individual blogs.
We're launching a new design for the direct messaging view.
🛠️ Improvements
The latest version of the Android app (30.8) fixes the issue where “View Post” links on filtered posts opened the web browser instead of the app.
We've improved the screen reader hint for tags on web, so that “hashtag” is not read out before the word that follows the symbol.
Fixed an issue on web that prevented the Related Tags section of the sidebar from showing on the search results page.
Fixed a bug in the apps where members of group blogs couldn't edit the notification settings for those blogs.
We've made several updates to ship bug fixes, as well as one major update to our StreamBuilder framework.
🌱 Upcoming
Our team's newest project will soon be revealed to the world on our @labs blog: follow it to keep up with what we're doing (blog in English).
Having a problem? Fill out the help form and we'll get back to you as soon as possible!
Want to send comments and suggestions? Check out the “Work in Progress” blog and start a conversation with the community.
26 notes