#there are still some pages like the wikipedia up but everything that has either distributing power or videos has been thoroughly scrubbed
magichats · 1 year
Man it really is a shame that, as of this date (July 5th, 2023), Cookie's Bustle is all but scrubbed off the web at large.
There are only two videos discussing it directly (and those are more about the takedowns than about the game itself), with everything else turning up no results.
I know some people on twitter were thinking the takedowns were for some sort of relaunch, but at this point I really don't think a modern reboot is coming.
This is really a bummer for game preservation. Also a bummer as a person who just likes weird obscure games.
sunnyrinusstudies · 4 years
Going FOSS: An Intro to Open-Source software for studyblr (and also some privacy related bits)
Source for Header Image
Intro & attempt at TLDR
Hey everyone! Today I’d like to tell y’all something about Open Source Software, and also Why this should matter to you! This’ll probably be the first post of a series I intend to do, because I believe the Studyblr community, even the non-nerd folks, could really benefit from switching some things out in their digital environment. Since this is a long post, I attempted to summarise it below; please do read on if you have the spoons tho!
TLDR?
FOSS stands for “Free and Open Source Software”. The “free” part doesn’t necessarily mean it’s free as in free pizza; it mostly means free as in freedom.
There’s a humongous amount of variants on this concept, but the core of FOSS specifically is the four freedoms:
1. To run the program however you want and for whatever you want
2. To study how the program works and to change it in whatever way you want
3. To be able to share it with whomever you feel like
4. To be able to share your modified version with whomever you want
There’s a whole host of software licenses built around these concepts, you can check those out at the Open Source Initiative website, or at Choose A License. Both have a good summary of what they all stand for.
Open Source software is used for a lot of products: the vast majority of webservers run Linux, with open source server software like Apache powering a huge share of the web; Google chrome is built on top of their open source chromium project (google is still the devil, but y’know, it’s an example); and even deep deep down, Apple computers run on top of Darwin, an open source Unix core (not Linux, but open source all the same). Many more can be listed, but I won’t do that, otherwise this isn’t a TLDR anymore.
Now, Why is this important for you? The Open Source Initiative summed it up real nicely already, but here’s a short paraphrase:
Control & Security. If software is open source then you can check that it really works the way it says it does, and make sure it’s not spying on you. Even if you don’t have the skills for it, someone else who does will be able to check. Also, if you don’t like how something works in a program, you’ll be able to change it or find someone else’s changed version that you like more.
Training. People who want to learn programming can use the code to see what makes programs tick, as well as use it as a guide for their own projects.
Stability. Because everything’s out in the open, that means someone else can take up maintaining a project or make a successor of it, in case the original developers suddenly quit working on it. This is especially important when it’s software that’s absolutely critical for certain tasks.
Community. It’s not just one program. It’s a lot of people working together to make, test, use, and promote a project they really love. Lots of projects end up with a dedicated fanbase that helps support the developers in continuing to work on the software.
I’d like to add one more tho: Privacy, which ties in a lot with the security part. Nowadays with protests going on and everything being online due to the pandemic, folks have been and will be confronted much more with the impact of privacy, and lack thereof. Open Source software means that if any company or group tries to spy on you, then you and anyone who feels like checking, will be able to know and take action on it. Here’s the EFF page on privacy and why it should matter to you
If that got your attention then read on past the readmore button! Or, if nothing else maybe check out the Free and Open Source Software portal on Wikipedia? Or maybe the resources page of the Open Source Initiative?
Terminology: Let’s get that out of the way first
Open Source: The source code that a program is made up of is freely accessible; anyone can look at it to check whether it works well enough, or to make sure it doesn’t spy on you.
FOSS: Free and Open Source Software. This doesn’t mean that you don’t need to pay for it; it’s free as in freedom and free speech, not free pizza.
There are four freedoms associated with FOSS:
The freedom to run the program as you wish, for any purpose (freedom 0).
The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
The freedom to redistribute copies so you can help others (freedom 2).
The freedom to distribute copies of your modified versions to others (freedom 3).
By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
FLOSS: Free and Libre Open Source Software. Here the “free” can be read as free pizza, while the “libre” (French for “free”) makes the freedom part explicit.
GRATIS: Sometimes people use this word to mean “free” as in free pizza, usually alongside “FOSS”.
Licenses: A license is something that tells others what they can or cannot do with your code. Licenses also apply to art and literature; those are copyright licenses. There are many different software licenses and I’m not going to be able to list them all.
The biggest players however are:
Apache License 2.0
The 3-Clause BSD License
GNU General Public License (also known as GPL)
MIT License
Mozilla Public License 2.0
There are even more, and you can find a list of them here on the Open Source Initiative site. There are so many licenses that there’s even a Choose A License site, where you can pick a license depending on what you want it to achieve.
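If you’re curious what “putting a license on your code” actually looks like, it’s delightfully low-tech: you drop the full license text into a LICENSE file at the root of your project, and (optionally) add a standard SPDX identifier at the top of your source files so tools can detect the license automatically. Here’s a tiny made-up Python file as a sketch; the file name and function are pure illustration:

```python
# hello_studyblr.py -- a made-up example file, purely for illustration.
# The SPDX line below is a machine-readable way of saying "this file is
# under the MIT License"; the full license text still lives in a LICENSE
# file at the root of the project.

# SPDX-License-Identifier: MIT
# Copyright (c) 2021 Your Name Here

def greet(name):
    """Return a friendly greeting -- free for anyone to run, study,
    share, and modify, per the four freedoms above."""
    return f"Hello, {name}! This code is yours to tinker with."

if __name__ == "__main__":
    print(greet("studyblr"))
```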
Who and/or what even uses open source software?
You don’t need to be some nerd to benefit from Open Source software; in fact, you’re using open source software right now! The biggest example is the whole entire internet. Websites are stored on servers, and the vast majority of webservers are Linux servers. The second biggest browser, Firefox, is open source, and even google chrome is built on top of “chromium”, an open source base. If you don’t use an iPhone, then you’re probably on an Android phone. Guess what? Android is part of the Android Open Source Project, which is built on top of the Linux kernel. All Open Source. Chromebooks? Built on top of a Linux kernel (like a non-patented engine you could put into any motor vehicle you’d like). Heck, even Apple computers are, at their core, built on top of Darwin, an open source Unix kernel (not Linux, but open source all the same).
Neat apps you may wanna check out!
I’ve made a little list of apps that might be especially useful for studyblr folks, but depending on how well this post does I’ll probably make some more posts for specific apps.
TiddlyWiki has a bajillion different ways to organise your thoughts, and also a lot of variant builds out there. Check out their table of contents if you feel lost! There are versions available for most big browsers, as well as windows, linux, mac, android, and iOS.
AnyType is an app that looks and works almost exactly like Notion, but is much more decentralised. They’re currently still in development, but if you want to support them, sign up for early access and give them some feedback so they know what works and what doesn’t! They’re still in closed alpha, but are intending to give beta access to about 100 folks at a time throughout 2021, so please sign up if this looks interesting to you!
Trilium Notes is slightly more like a “notebook”; however, you can arrange your notes in nearly infinitely deep folders. You can use things like Relation Maps & Link Maps to visualise your notes and how they go together. There’s even more they do and I just can’t list it all, so go check out their stuff for a more comprehensive overview! Works on windows, linux, and (unsupported) mac.
LibreOffice and ONLYOFFICE are two office suites that function just as well as micro$oft office, often Even Better in my experience. I’ve used LibreOffice for years now and honestly? never going back. ONLYOFFICE is technically free (as in pizza), but it’s a slight hassle to get going, cause you need to set it up on a server. They have a paid and hosted version available with educational discounts, but honestly I’d go with LibreOffice.
OnePile is an app I haven’t used myself, since it only runs on Apple stuff. But I’ve heard a lot of good things about it, so that’s why it’s in here. It looks like it works similarly to most general “note taking notebook” apps. Looks really pretty too, honestly.
EtherPad is similar to ONLYOFFICE; however, this one’s a lot more focused specifically on text documents. It works with real-time collaboration, which is really neat.
Anything that FramaSoft has going on. They’re a non-profit organisation dedicated to promoting digital freedom. A lot of open source cloud-related things are not really useful to people who don’t have the time and/or money to set up a whole-ass server. That’s where FramaSoft comes in: they do it for you. Just about everything they offer (here’s a full overview) is free (as in free pizza). They also have a separate site to help you get started!
It’s not free to run it all on their side, so if you find yourself interested in using their services, please try to support them monetarily in any way you can! (they even have a “minetest” server (not minecraft, deeeefinitely not minecraft))
Joplin!! Which is also what I used to write this post so I wouldn’t have to use The Tumble’s post writing thing. It’s good for taking notes, has a bunch of neat plug-ins, and can also sync with a variety of cloud services!
Nextcloud, for if you want to go just that little bit further on the open source and the privacy. Nextcloud has honestly way too many features for me to list, but the important parts are that it’s a nigh perfect replacement for office365, and probably even GSuite. The one caveat is that you either gotta host it yourself, or get someone else to host it for you. Framasoft (mentioned above) has a nextcloud instance. It works on just about every single platform, and can integrate with an absurd amount of services. Here’s a list of providers that work with nextcloud, and what different apps they have installed on their server.
I personally use Disroot, because they’re a local (as in, my country) non-profit that offers about 2GB of free storage, and then for about 15 cents per GB per month you can get more storage if you want. They also have an email service, which is hella neat. Their one main rule is Do Not Use For Business Purposes, because they’re here to help the individual folks, not companies.
Neat Links you may also want to look at!
Here are some sources, and also resources that I used for this post. There’s also some stuff here that I think folks may be interested in in general.
General Wikipedia Article on Open Source Software
The Free and Open Source Software portal on Wikipedia
Resources page of the Open Source Initiative
Free Software Foundation definition of “free software”
itsfoss page on what FOSS means
itsfoss page on the history of FOSS
Open Source Software Foundation list of projects and apps they really like
Open Source Initiative on “the open source way”, and how it goes beyond software
Check out literally anything the Electronic Frontier Foundation has going on maybe?
TED talk on privacy and why it’s important
The Surveillance Self Defense project by the EFF
This EFF page on privacy for students
ExpressVPN article on privacy (not necessarily endorsing this company, just a good article)
What’s next?
I’ll probably make some more posts on specific kinds of software that I think folks may like. Or maybe a general overview on the more privacy-focused reasons and solutions for doing all of this.
Future post ideas, none of these are set in stone:
Open source Note taking apps
Replacements for just about Every Single google service I can think of
My personal setup
Open source / privacy conscious social media that studyblr folks may be into
Chatting, Calling, Videocalling: Discord and whatsapp alternatives etc
??? More studyblr apps that could do with a FOSS alternative??
How to support open source when you’re not a big fudgin nerd
How to be better at digital privacy and security, while still maintaining that studyblr aesthetic
Apps, software, other stuff, for specific areas of study maybe?
Feel free to suggest other ideas! Or leave feedback! This is my first big resource post so I wanna know if/how I can do better when I make another one!
scifrey · 4 years
WORDS FOR WRITERS: The Value of Fanfiction
There’s been a lot of chatter on social media these last few weeks, recycling that trashy, self-aggrandizing, tired old “hot take” that reading and writing fanfiction is somehow bad for you as a writer.
Before we go any further, let me give a clear and definitive answer to this take:
No, reading and writing fanfiction will not and does not make you a bad reader or writer.
 Period.
 Why? Here’s the TL;DR version:
1) Reading and Writing, any kind of reading and writing, will make you a better reader and writer. And it’s enjoyable, to boot.
2) Fanfiction has been around as long as Original Fiction, so we’d know if there was any negative impact by now (spoiler alert: there isn’t).
3) Practice is Practice, no matter what medium you get that practice in.
4) Comprehending and writing fanfiction is harder than writing original fiction because you have to hold the Source Media Text in your head at the same time as you’re reading/writing a different story. It improves your understanding of storytelling.
5) No hobby, no matter what it is, so long as it doesn’t harm anyone else or yourself, is bad. And that goes double for if you decide to keep it a hobby. Not every fanfic writer wants to write original fiction, and that’s just fine. Not every hobby has to be monetized.
 Okay. But what do they mean by “fanfiction”?
 “Fanfiction is fictional writing written by fans, commonly of an existing work of fiction. The author uses copyrighted characters, settings, or other intellectual properties from the original creator as a basis for their writing.”-- Wikipedia
 Basically – it’s when you take elements (setting, characters, major themes or ideas) of a Media Text (a novel, a movie, a podcast, a comic, etc.) and create a different story with those elements. You can write a missing scene, or an extended episode, or a whole new adventure for the characters of the Media Text. You can even crossover or fuse multiple Media Texts, or specific elements, to create a whole new understanding of the characters or their worlds.
 Similar to fanfic, you can also create fanart, fancomics, or fansongs (“filk”), fancostumes (“cosplay”), and fanfilms. These are called Fanworks or Fancrafts.
 Fanfiction is usually posted to online forums, journals, blogs, or story archives and shared for free among the public. Before the advent of the internet, fanfiction was often printed or typed, and hand-copied using photocopiers or ditto machines, and distributed for free (or for a small administration fee to cover materials) among fans at conventions, or through mail-order booklets (“zines”).
 Fanfiction has existed pretty much since the beginning of storytelling (A Thousand and One Nights, Robin Hood, and King Arthur all have different elements attributed to them by different authors retelling, twisting, adding to, or changing the stories; there’s no single-origin author of those tales.)
 There are billions on billions of fanfics out there in the world—and while a majority of them are romance stories, there are also adventures, comedies, dramas, thrillers, stories based on case files, stories about the emotional connection between characters when one is hurt and the other must care for them, historical retellings, etc. There are also stories for every age range and taste, though be sure to take heed of the tags, trigger warnings, and age range warnings as you browse the archives and digital libraries.
 As a reader, it’s your responsibility to curate your experience online.
 So why are people so afraid or derisive of fanfic?
 People who are hard on fanfic say that…
 ·       It sucks.
o   Well of course it sucks! As it’s a low-stakes and easy way to try out creative writing for the first time, the majority of fanfiction is overwhelmingly written by new and young writers. Everything you do when you first try it sucks a little bit. 
I’m sure no figure skater was able to immediately land perfect triple axels ten minutes after they strap on the skates for the first time in their lives. No knitter has ever made a flawlessly perfect jumper on their first try. No mathematician has ever broken the code to send a rocket into space after having just been taught elementary-school multiplication. So why on earth do people think that new writers don’t need to practice? I can promise you that Lin-Manuel Miranda’s first rap was probably pretty shaky.
·       It’s lazy or it’s cheating.
o   Listen, anyone who tells you that writing anything is lazy clearly has not sat down and tried to write anything. Writing is tedious. It is boring. It takes hours, and hours, and hours to get anything on the page, and then once it’s on the page you have to go back and edit it. UGH. There is nothing about being a writer—even a fanfic writer—that is lazy.
o   And if anyone tells you that trying to tell a fresh, new story within the limits and confines of a pre-existing world, and have it make sense, is cheating, then they have no freaking clue how hard it is to be creative with that kind of limitation placed on you. It’s harder when you have a set of rules you need to follow. What you do come up with is often extremely interesting and creative because of those limitations, not in spite of them.
o   The argument that using pre-made characters, settings, tropes, and worlds to make up a new story is cheating is also complete bunk. Do those same people also expect hockey players to whittle and plane themselves a whole new hockey stick from scratch before each game? No, of course not. And yeah, a baker can grow all their own wheat, grind the flour, raise the chickens and cows so they can get eggs and milk, distill the vanilla, etc. Or a baker can buy a box mix. Either way, you get a cake at the end of the process. Whether you write fanfic or original fiction, you still get a story at the end of the process.
·       It makes you a worse writer.
o   * annoying buzzer noise * Practicing anything does not make you worse at it. And reading stories that are not edited, expertly crafted, or “high art” will also not indoctrinate you into being a bad writer. If anything, figuring out why you don’t like a specific story, trope, or writing style is actually a great way to learn what kind of writer you want to be, and to learn different methods of constructing sentences, creating images, and telling tales. Or you know, just how much spelling and grammar matter.
·       It’s not highbrow or thoughtful enough.
o   Sometimes stories are allowed to be just comfort food. Not every book or story you read has to be haute cuisine or boringly nutritious. You are allowed to read stories because they’re exciting, or swoony, or funny, or just because you like them. Anyone who says differently is a snob and worth ignoring. (Besides, fun silly stories can also be packed with meaning and lessons—I mean, hello, Terry Pratchett, anyone?)
·       It makes you waste all your time on writing that can’t be monetized.
o   No time is wasted if you spend it doing something that brings you joy. Not every hobby needs to be a money-maker and not everyone wants to be a professional writer. You are allowed to write, and read, fanfic just for the fun of it.
·       It’s theft.
o   According to Fair Use Law, it’s not. As long as the fanfic writer (or artist, cosplayer, etc.) is not making money on their creation in a way that directly impacts or cuts into the original creator’s profit, and is not repackaging/plagiarizing the original Media Text and profiting off its resale, then Fan Works are completely legal. So there.
 How, exactly, does fanfic make you a better writer?
 Fanfiction…
 ·       teaches you to finish what you start.
o   The joy of being able to share your fic, either as you’re writing it, or afterward, is a big motivating factor for a lot of people. They finish because they get immediate feedback on it from their readers and followers. Lots of people have ideas for books, but how many of them do you know have actually sat down and written the whole thing?
o   Fanfic is also low-stakes; there’s nothing riding on whether you finish something or not, so you have to inspire yourself to get there without the outside (potentially negative) motivation of a deadline or a failing grade if you don’t get the story finished. You end up learning how to motivate yourself.
o   Fanfic has no rules, so you write as much or as little as you want, stop wherever you think is a good place to end the story, write it out of order, or go back and write as many sequels or prequels as you like. Again, it’s totally low-stakes and is meant to be for fun, so you can noodle around with what it means to write a “whole” story and “complete” it, which teaches you how you like to write, and how you like to find your way to the finish line.
·       teaches you story structure.
o   Before you can sit down and write a story based on one of your favorite Media Texts, you’re likely to spend a lot of time consuming that text passively, or studying it actively. Either way, you’re absorbing how and why the Media Text structures the stories it tells, and learning how to structure your own from that.
o   Once you’re comfortable with the story structure of the Media Text you’re working in, you’ll probably start experimenting with different ways stories can be told, and find the versions you like to work with best.
·       teaches you how to write characters consistently.
o   Fanfic is really hard because not only do you have to write your fave characters in a way that moves the story along, but they have to be recognizable as those fave characters.
o   This means you have to figure out their body language, verbal and physical tics, their motivations and the way they handle a crisis (fight, flight, or fawn?), and then make up the details you may need for your story that you may never see on screen/the page, like how they take their eggs or what their fave shampoo is, based on what you already know about them. That takes some top-notch detective work and character understanding to pull off.
o   Once you know how to do that, just making up a whole person yourself for original fiction is a breeze.
·       Teaches you how to hear and mimic a character/narrator voice.
o   You have to pay close attention to how an actor speaks, or how a character’s speech patterns, dialect, word choice, etc. are reflected on the page, in order to be consistent in your story.
o   And all of this, in turn, teaches you how to build one for yourself.
o   I have a whole series of articles here about building a narrative voice, if you want to read more on constructing an original voice for your narrator.
·       Teaches you how to create or recreate a setting.
o   Again, like achieving character consistency, or mimicking a character or narrative voice, it takes work and paying attention in order to re-create a setting, time period, or geographical region in a fanfic—and if you’re taking your characters somewhere new, your readers will expect that setting to be just as rich as the one the Media Text is based in.
o   Which, again, teaches you how to then go and build an original one for yourself.
·       teaches how to take critique.
o   Professional writing is not a solitary pursuit. In fact, most writing is not entirely the work of an author alone. Just as professional authors work with editors, critique partners, and proofreaders, some fanfiction writers will sometimes work with beta-readers or editors as well. These are friends or fanfic colleagues who offer to read your fanfic and point out plot, character, consistency, or story structure errors, or who offer to correct spelling and grammar errors. This is a great way to practice working with editors if you decide to pursue a professional career, and also a great way to make friends and strengthen your community and skill set if you don’t.
o   Many fanfic sites offer readers the opportunity to leave a comment on a fic, rather like a reviewer can leave a review on GoodReads or Amazon, or any other online store or blog, for a novel they’ve read. Sometimes these comments/reviews are 5 star and enthusiastic! Sometimes they are… not. The exact opposite in fact. As you get comments on your fanfic, and learn to ignore the ones that are just mean rather than usefully critical, you gain the Very Important Skill of learning to resist firing back at bad comments or reviews, while enjoying the good ones.  It also teaches you how to ignore drama or haters.
·       Teaches you how to exist within a like-minded community.
o   While the actual writing part of writing is solitary and sometimes tedious, nothing is ever published into a vacuum, whether it be fanfiction or original. Besides your editing/critique/beta reader group, you will also likely develop friendships, a support network, and mutuals. It’s always great to uplift, support, cheer on, and celebrate one another’s accomplishments and victories, whether the writing is fanfic or original.
·       Teaches you that it’s okay to write about things important to you, or your own identity.
o   You can change a character’s ethnicity, cultural background, sexuality, religion, or disabilities to match yours, and talk about your lived life through the megaphone of that character. Or, you can insert original characters based on you, your desires, and experiences.
o   Once you’re comfortable writing in your #ownvoice in fanfic, you can approach it in original fiction, if you like.
o   See my article titled Your Voice Is Valid for more on this.
 What if I want to be a professional writer?
 Notice how I didn’t say “real writer”. Any writer who writes any kind of story is a ‘real’ writer. I mean, pinch yourself—you’re real, right? The difference is actually between being an “amateur” writer (a hobbyist who does not write for pay) and a “professional” (who is paid for their writing). Just because you only play shinny on the street with your friends, or in a house league on the weekends, it doesn’t mean you’re not still as much of a hockey player as someone who plays in the NHL.
 Writing fanfiction before or at the same time as writing original fiction that you intend to sell is a great way to learn, or practice, everything I’ve mentioned above. If you read it widely, it will also expose you to different storytelling styles, voices, and tropes than your reading of published fiction does.
 ·       Can I sell my fanfic?
o   No. For fanfiction to remain under the umbrella of Fair Use Law, you cannot profit off your fanfiction. There’s some grey-area wiggle room around things like charging a small amount for a ‘zine or a PDF to cover administrative costs, but zero wiggleability around, say, self-publishing your fanfic and charging heaps for it.
·       Can I “file off the serial numbers”?
o   “Filing off the serial numbers” is when you take a fanfic you’ve written and essentially pull it apart, remove everything that’s clearly someone else’s Media Text, and reassemble the story so that it’s pretty much a completely original piece of creative writing.
o   Yes, you can sell these, provided your filing is rigorous enough that you aren’t likely to be dinged for plagiarism. It’s widely known that Cassandra Clare’s Shadowhunters series was once Harry Potter fanfic, and that Fifty Shades of Grey was once Twilight fanfic. But did you know that my Triptych started life as an idea for a Stargate Atlantis fic? There are lots of stories out there that were once full fics, or where the idea for the novel was originally conceived for a fandom, but written as original instead.
o   So long as you’re careful to really rework the text so that it’s not just a find-name-replace-name rewrite, you should be fine.
o   Be aware, though, that the agents and editors you might pitch this novel to know how to Google. They may discover that this is a filed-off story, and depending on their backgrounds and biases, might be concerned about it. There’s no need to inform them of the novel’s origin straight off in your pitch/query letter, but you may want to have a frank discussion with them about it after it’s been signed so they can help you make sure that any lingering copyrighted concepts or characters are thoroughly changed before publication.
o   Should you take down the original fic-version of the novel while you’re querying/shopping it? Well, that’s up to you, and whether you’re comfortable with an editor/agent potentially finding it.
·       Should I be ashamed of my fic, or take it down, or pretend I never wrote fic?
o   What? Why? No! I mean, I have hidden some of my most immature work, but I’ve left pretty much my whole catalogue of fanfic online and I don’t deny that I was/am a ficcer. Why? Because it’s a great repository of free stories that people can read before they buy one of my books, so they can get a taste of how and what I write. Also, you will be in good company. Lots and lots of writers who are published nowadays started in fandom, including:
Steven Moffat
Seanan McGuire
Rainbow Rowell
Claudia Gray
Cory Doctorow
Marissa Meyer
Meg Cabot
Naomi Novik
Neil Gaiman
Lev Grossman
S.E. Hinton
John Scalzi
The Brontë Sisters
Andy Weir
Sarah Rees Brennan
Marjorie M. Liu
Anna Todd
...and me, J.M. Frey
 How fanfic can harm.
 Like with anything else, there are ways that reading and writing fanfiction can actually harm you, or others, but it has nothing to do with the reading or writing of fanfiction in and of itself.
 ·       Some creators may prefer that you don’t (and may or may not follow up with legal action).
o   Anne Rice famously went after fanficcers in the 90s who wrote fanfic of her work, handing out Cease & Desist notices like confetti.
o   99% of creators don’t care. Those who do will generally have a notice on their websites or social media politely asking fancreators to refrain. Mostly this is due to their general discomfort over the idea of anyone else getting to play in their worlds. The best thing to do is respect that request, and find a different fandom to write in.
·       Flamewars and fandom fights leading to bullying and doxing.
o   Regrettably, just like any other community filled with people who have different favorites, opinions, and preferences, there will inevitably be clashes. It’s up to you to decide how to react to negative interactions, and how to model positive ones.
o   Don’t forget, you curate your online experience, so don’t be afraid of that block button.
o   Also, don’t be the jerk who goes after people for liking different aspects of the fandom. Everyone is entitled to interact and like a Media Text their own way. “Don’t yuck my yum,” as they say.
·       Trying to make money on other people’s IP/Media Text (lawsuits, etc.)
o   It doesn’t belong to you, so don’t try to make money on it.
o   There’s a grey area here in terms of selling prints/plushies/jewelry/etc., and there’s no hard line about where one copyright owner will draw the line and another won’t. Warner Bros. owns the Harry Potter film rights while Lionsgate owns The Hunger Games, but I’ve seen Harry Potter-themed bars spring up while fans wanting to make Hunger Games fanfilms have been shut down. A friend of mine sells hand-made fandom-inspired items at cons—there is no rhyme or reason to what she gets told to stop making and what she’s left alone on.
o   Best thing to do if you’re told to stop is to just stop, move on, and find a different fandom to be active in.
·       Writing Real Person Fanfic (“RPF”) can be considered a violation of consent.
o   This article sums it up pretty well, but basically… if you decide to write RPF, be aware that the person you are writing about is a real person, with real thoughts and emotions, and they may feel violated by RPF. If you decide to write it, never send it to the people it’s about, and always clearly tag it so others can choose to engage with it, or avoid it.
o   Also be aware that it could ruin their love for what they do. For example: the friendships between the members of One Direction reportedly became strained and the band eventually disintegrated, in part because people wouldn’t stop sending band members smutty stories or art of them having sex with one another, and it made them too uncomfortable to continue in the band.
·       Showing/sharing fanfic & fanart outside of its intended context. Fanworks are for fans, and there are definitely issues if…
o   It’s shown to celebrities/actors/creators.
  Shoving your fantasies onto the people who create or portray your fave characters is rude, and wrong, and also kinda gross. If they seek it out themselves, that’s one thing, but the same way you wouldn’t throw it at a complete stranger, don’t throw it at them. You may love the characters these people play, but they are not their characters, and they are not your friends.
  It may also really weird them out and ruin their love for what they do.
o   it’s shown to writers working on the series.
  There was a famous case where a fanficcer sent a story to a novelist, and the novelist was accused of plagiarism by the ficcer when their next novel in the series resembled the plot of that fanfic. There was a whole court case and everything.
  Because of this, writers of TV shows, books, etc. don’t want to (and often times, legally can’t) read your fanfic. They don’t want to get accidentally inspired by what you’ve written, or worse, have to throw out something because it resembles your fic too closely. Just let them write their stories the way they want, and if they choose to seek out fic, they will.
o   it’s mocked by celebrities.
  I’m not letting Alan Carr and Graham Norton off the hook. If it’s super rude and gross to shove fanworks at actors/writers/creators when you’re a fan, then it’s doubly rude for anyone to take a story or art made for a specific audience (the fans), by a specific community (the fans), lift it out of its context, and invite the public to mock it while also shoving it at the actor/celebrity in a place where they are literally cornered and can’t leave (i.e. the chat-show sofa). Man, it really steams me up when they do that. It’s rude and it’s tone-deaf, and it’s not fair.
  And most of the time they do it, they don’t even ask the artist or writer for permission, first, which is just…. Uuuuugggghhhh. It may be fanfic, but it was still created by someone, and you should always ask permission before publicly sharing something created by someone else.
  Grrrrrrr.
 In Conclusion
 If someone tells you that reading or writing fanfic is bad for you as a creator, tell them to get bent.
Famous Fanfic
·       Hamilton by Lin-Manuel Miranda
·       Wicked by Gregory Maguire
·       Wicked: the Musical by Stephen Schwartz
·       The Phantom of Manhattan by Frederick Forsyth
·       A Study in Emerald by Neil Gaiman
·       Sherlock by Mark Gatiss and Steven Moffat
·       The Dracula Tape, by Fred Saberhagen
·       Paradise Lost, John Milton
·       Inferno, by Dante
·       The Aeneid, by Virgil
·       Ulysses, by James Joyce
·       Romeo & Juliet, by William Shakespeare
·       The Once and Future King by T.H. White
·       A Connecticut Yankee in King Arthur’s Court, by Mark Twain
·       The Three Musketeers, by Alexandre Dumas
·       Pride & Prejudice & Zombies, by Seth Grahame-Smith
·       Phantom, a novel of his life by Susan Kay
·       …and so many more.
wickedbananas · 6 years
How to Write Meta Descriptions in a Constantly Changing World (AKA Google Giveth, Google Taketh Away)
Posted by Dr-Pete
Summary: As of mid-May 2018, Google has reverted back to shorter display snippets. Our data suggests these changes are widespread and that most meta descriptions are being cut off in the previous range of about 155–160 characters.
Back in December, Google made a significant shift in how they displayed search snippets, with our research showing many snippets over 300 characters. Over the weekend, they seem to have rolled back that change (Danny Sullivan partially confirmed this on Twitter on May 14). Besides the obvious question — What are the new limits? — it may leave you wondering how to cope when the rules keep changing. None of us have a crystal ball, but I'm going to attempt to answer both questions based on what we know today.
Lies, dirty lies, and statistics...
I pulled all available search snippets from the MozCast 10K (page-1 Google results for 10,000 keywords), since that's a data set we collect daily and that has a rich history. There were 89,383 display snippets across that data set on the morning of May 15.
I could tell you that, across the entire data set, the minimum length was 6 characters, the maximum was 386, and the mean was about 159. That's not very useful, for a couple of reasons. First, telling you to write meta descriptions between 6–386 characters isn't exactly helpful advice. Second, we're dealing with a lot of extremes. For example, here's a snippet on a search for "USMC":
Marine Corps Community Services may be a wonderful organization, but I'm sorry to report that their meta description is, in fact, "apple" (Google appends the period out of, I assume, desperation). Here's a snippet for a search on the department store "Younkers":
Putting aside their serious multi-brand confusion, I think we can all agree that "BER Meta TAG1" is not optimal. If these cases teach you anything, it's only about what not to do. What about on the opposite extreme? Here's a snippet with 386 characters, from a search for "non-compete agreement":
Notice the "Jump to Exceptions" and links at the beginning. Those have been added by Google, so it's tough to say what counts against the character count and what doesn't. Here's one without those add-ons that clocks in at 370 characters, from a search for "the Hunger Games books":
So, we know that longer snippets do still exist. Note, though, that both of these snippets come from Wikipedia, which is an exception to many SEO rules. Are these long descriptions only fringe cases? Looking at the mean (or even the median, in this case) doesn't really tell us.
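If you want to run the same sanity check on your own crawl data, the arithmetic is as simple as it sounds. Here's a minimal Python sketch, assuming you've already collected your display snippets as a list of strings (the sample values below are placeholders, not real data):

```python
# Minimal sketch of the summary stats above; `snippets` stands in for the
# ~89K display snippets pulled from the MozCast 10K data set.
snippets = [
    "apple.",
    "BER Meta TAG1",
    "A non-compete agreement is a contract between an employee and employer...",
]

lengths = [len(s) for s in snippets]
print("min: ", min(lengths))
print("max: ", max(lengths))
print("mean:", sum(lengths) / len(lengths))
```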
The big picture, part 1
Sometimes, you have to let the data try to speak for itself, with a minimum of coaxing. Let's look at all of the snippets that were cut off (ending in "...") and remove video results (we know from previous research that these skew a bit shorter). This leaves 42,863 snippets (just under half of our data set). Here's a graph of all of the cut-off lengths, gathered into 25 character bins (0–25, 26–50, etc.):
This looks very different from our data back in December, and is clearly clustered in the 150–175 character range. We see a few Google display snippets cut off after the 300+ range, but those are dwarfed by the shorter cut-offs.
The big picture, part 2
Obviously, there's a lot happening in that 125–175 character range, so let's zoom in and look at just the middle portion of the frequency distribution, broken up into smaller, 5-character buckets:
We can see pretty clearly that the bulk of cut-offs are happening in the 145–165 character range. Before December, our previous guidelines for meta descriptions were to keep them below 155 characters, so it appears that Google has more-or-less reverted to the old rules.
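If you'd like to reproduce this kind of frequency distribution yourself, the binning is just integer division on the cut-off lengths. A quick sketch, again with placeholder data:

```python
from collections import Counter

# Lengths of display snippets that ended in "..." (i.e. were cut off).
# Placeholder values; the real analysis used ~42K cut-off snippets.
cutoff_lengths = [148, 152, 155, 157, 158, 161, 163, 310]

BIN_SIZE = 5  # use 25 for the coarse view, 5 for the zoomed-in one
bins = Counter((length // BIN_SIZE) * BIN_SIZE for length in cutoff_lengths)

for start in sorted(bins):
    print(f"{start}-{start + BIN_SIZE - 1}: {'#' * bins[start]}")
```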
Keep in mind that Google uses proportional fonts, so there is no exact character limit. Some people have hypothesized a pixel-width limit, like with title tags, but I've found that more difficult to pin down with multi-line snippets (the situation gets even weirder on mobile results). Practically, it's also difficult to write to a pixel limit. The data suggests that 155 characters is a reasonable approximation.
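If you want to experiment with the pixel-width hypothesis yourself, you can approximate the rendered width of a snippet with a font library. Here's a rough sketch using Pillow; the font file and size are my assumptions, since we don't know exactly what Google renders with:

```python
from PIL import ImageFont  # pip install Pillow

# Assumed font and size -- Google's actual rendering will differ, so treat
# the output as a rough approximation, not a hard limit.
font = ImageFont.truetype("arial.ttf", 13)

snippet = "In December, we reported that Google increased search snippets..."
print(f"{font.getlength(snippet):.0f}px wide")
```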
To the Wayback Machine... ?!
Should we just go back to a 155 character cut-off? If you've already written longer meta descriptions, should you scrap that work and start over? The simple truth is that none of us know what's going to happen next week. The way I see it, we have four viable options:
(1) Let Google handle it
Some sites don't have meta descriptions at all. Wikipedia happens to be one of them. Now, Google's understanding of Wikipedia's content is much deeper than most sites (thanks, in part, to Wikidata), but many sites do fare fine without the tag. If your choice is to either write bad, repetitive tags or leave them blank, then I'd say leave them blank and let Google sort it out.
(2) Let the ... fall where it may
You could just write to the length you think is ideal for any given page (within reason), and if the snippets get cut off, don't worry about it. Maybe the ellipsis (...) adds intrigue. I'm half-joking, but the reality is that a cut-off isn't the kiss of death. A good description should entice people to want to read more.
(3) Chop everything at 155 characters
You could go back and mercilessly hack all of your hard work back to 155 characters. I think this is generally going to be time badly spent and may result in even worse search snippets. If you want to rewrite shorter Meta Descriptions for your most important pages, that's perfectly reasonable, but keep in mind that some results are still showing longer snippets and this situation will continue to evolve.
(4) Write length-adaptive descriptions
Is it possible to write a description that works well at both lengths? I think it is, with some care and planning. I wouldn't necessarily recommend this for every single page, but maybe there is a way to have our cake and eat at least half of it, too...
The 150/150 approach
I've been a bit obsessed with the "inverted pyramid" style of writing lately. This is a journalistic style where you start with the lead or summary of your main point and then break that down into the details, data, and context. While this approach is well suited to the web, its origins come from layout limitations in print. You never knew when your editor would have to cut your article short to fit the available space, so the inverted pyramid style helped guarantee that the most important part would usually be spared.
What if we took this approach to meta descriptions? In other words, why not write a 150-character "lead" that summarizes the page, and then add 150 characters of useful but less essential detail (when adding that detail makes sense and provides value)? The 150/150 isn't a magic number — you could even do 100/100 or 100/200. The key is to make sure that the text before the cut can stand on its own.
Think of it a bit like an ad, with two separate lines of copy. Let's take this blog post:
Line 1 (145 chars.)
In December, we reported that Google increased search snippets to over 300 characters. Unfortunately, it looks like the rules have changed again.
Line 2 (122 chars.)
According to our new research (May 2018), the limit is back to 155-160 characters. How should SEOs adapt to these changes?
Line 1 has the short version of the story and hopefully lets searchers know they're heading down the right path. Line 2 dives into a few details and gives away just enough data (hopefully) to be intriguing. If Google uses the longer description, it should work nicely, but if they don't, we shouldn't be any worse for wear.
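One way to sanity-check a description written in this style is to preview where the cut might land. Here's a quick sketch; the word-boundary trimming is my own guess at roughly how Google truncates, not documented behavior:

```python
def preview_cutoff(description, limit=155):
    """Show roughly how a meta description might appear if truncated.
    Trimming back to the last whole word is an assumption, since Google's
    exact truncation logic isn't published."""
    if len(description) <= limit:
        return description
    return description[:limit].rsplit(" ", 1)[0] + " ..."

line1 = ("In December, we reported that Google increased search snippets "
         "to over 300 characters. Unfortunately, it looks like the rules "
         "have changed again.")
line2 = ("According to our new research (May 2018), the limit is back to "
         "155-160 characters. How should SEOs adapt to these changes?")

# Line 1 should survive the cut nearly intact; Line 2 is the bonus detail.
print(preview_cutoff(line1 + " " + line2))
```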
Should you even bother?
Is this worth the effort? I think writing effective descriptions that engage search visitors is still very important, in theory (and that this indirectly impacts even ranking), but you may find you can write perfectly well within a 155-character limit. We also have to face the reality that Google seems to be rewriting more and more descriptions. This is difficult to measure, as many rewrites are partial, but there's no guarantee that your meta description will be used as written.
Is there any way to tell when a longer snippet (>300 characters) will still be used? Some SEOs have hypothesized a link between longer snippets and featured snippets at the top of the page. In our overall data set, 13.3% of all SERPs had featured snippets. If we look at just SERPs with a maximum display snippet length of 160 characters (i.e. no result was longer than 160 characters), the featured snippet occurrence was 11.4%. If we look at SERPs with at least one display snippet over 300 characters, featured snippets occurred at a rate of 41.8%. While that second data set is fairly small, it is a striking difference. There does seem to be some connection between Google's ability to extract answers in the form of featured snippets and their ability or willingness to display longer search snippets. In many cases, though, these longer snippets are rewrites or taken directly from the page, so even then there's no guarantee that Google will use your longer meta description.
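For what it's worth, those co-occurrence rates come from simple conditional counting over the SERP data. A sketch of the idea, with made-up records standing in for the real data set:

```python
# Each record: (max display snippet length on the SERP, has featured snippet?)
# Made-up sample; the real data set covered 10,000 SERPs, where the rates
# came out to 13.3% overall, 11.4% for short-only SERPs, and 41.8% for
# SERPs with at least one 300+ character snippet.
serps = [(155, False), (158, True), (320, True), (340, True), (150, False)]

def featured_rate(records):
    return sum(1 for _, featured in records if featured) / len(records)

short_only = [r for r in serps if r[0] <= 160]
has_long = [r for r in serps if r[0] > 300]

print(f"All SERPs:          {featured_rate(serps):.1%}")
print(f"Max snippet <= 160: {featured_rate(short_only):.1%}")
print(f"Any snippet > 300:  {featured_rate(has_long):.1%}")
```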
For now, it appears that the 155-character guideline is back in play. If you've already increased some of your meta descriptions, I don't think there's any reason to panic. It might make sense to rewrite overly-long descriptions on critical pages, especially if the cut-offs are leading to bad results. If you do choose to rewrite some of them, consider the 150/150 approach — at least then you'll be a bit more future-proofed.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
from The Moz Blog https://ift.tt/2GooXya via IFTTT
answer ALL THE QUESTIONS
sharanya there are a hundred!! but ok
1: is there a boy/girl in your life?
No!
2: think of the last person who hurt you; do you forgive them?
I’m really fortunate that no one has really hurt me since, I guess, high school! And I definitely forgive high school shenanigans because wow, I thought my friends and I were mature then but this was not true!
3: what do you think of when you hear the word “meow?”
nyan, nyan, nyan! nihao nyan~~~~~
4: what’s something you really want right now?
a really awesome nerd dance party!
5: are you afraid of falling in love?
No!
6: do you like the beach?
I used to love the beach, or any water, a lot! But I had some weird ear injury, and now it’s hard for me to swim much without getting ear problems. So I still really like the beach, but it makes me feel a bit wistful.
7: have you ever slept on a couch with someone else?
Yep.
8: what’s the background on your cell?
An aesthetic shot of Bay-Enterprise train station, in my hometown of Edmonton!
9: name the last four beds you were sat on?
My summer housing bed, a hotel bed in hartford, my bed at home, my bed at school last year.
10: do you like your phone?
Yes. It cannot successfully type ‘t’ or ‘u’, but this adds to the charm! also I’ve gotten really creative at using words which don’t have those letters so that’s cool
11: honestly, are things going the way you planned?
I didn’t plan, I guess!
12: who was the last person whose phone number you added to your contacts?
I think… a model united nations conference help staffer’s number?
13: would you rather have a poodle or a rottweiler?
why have either when you could have MY DOG. Which I can. (And you can’t.)
14: which hurts the most, physical or emotional pain?
Emotional pain lasts longer, I guess?
15: would you rather visit a zoo or an art museum?
Art museums, because zoos are nice but they also make me sad when animals aren’t well cared for or don’t have enough space! It would be so sad to live your life on display. That said a lot of zoos care for their animals in an amazing way – unfortunately too often, it seems like this isn’t the case though :/
16: are you tired?
No!
17: how long have you known your 1st phone contact?
As in, my first one to ever be added? I guess my parents, so my whole life. As in, the person I’ve texted the most over my entire lifetime? Since high school. As in, the person I talk to most over the phone now? Also since high school.
18: are they a relative?
For the first of my three alternatives.
19: would you ever consider getting back together with any of your exes?
No! They’re all great people but our relationships ended for good reasons. I hang out with all of them though.
20: when did you last talk to the last person you shared a kiss with?
Wednesday?
21: if you knew you had the right person, would you marry them today?
No, well yes, depending on what ‘knowing’ means! If a 100% infallible oracle told me that this was the right decision, I guess! But if it’s just the ‘knowing’ of the moment, no! It’s good to have a time-distributed sample of ‘knowing’.
22: would you kiss the last person you kissed again?
No!
23: how many bracelets do you have on your wrists right now?
zeeero
24: is there a certain quote you live by?
“Je pense, donc je suis.” (“I think, therefore I am.”)
25: what’s on your mind?
my head. But also my physics research, and thoughts about people! Mainly about how lucky I am to have acquired Amazing Summer Friends, which reminds me to tell them about why they’re great. **does so **
26: do you have any tattoos?
No.
27: what is your favorite color?
Red!
28: next time you will kiss someone on the lips?
When it happens, I guess!
29: who are you texting?
sharanya it’s you!
30: think to the last person you kissed, have you ever kissed them on a couch?
Yes.
31: have you ever had the feeling something bad was going to happen and you were right?
Roughly as often as chance predicts.
32: do you have a friend of the opposite sex you can talk to?
Honestly… almost all my friends are nonmale? I have like, one, maybeee two male friends I can talk to about serious stuff? But I am in the process of acquiring a new male friend and I’m Really Excited about this!
33: do you think anyone has feelings for you?
Maybe? Unclear!
34: has anyone ever told you you have pretty eyes?
On occasion.
35: say the last person you kissed was kissing someone right in front of you?
I would be surprised! But good for them!
36: were you single on valentines day?
Yep!
37: are you friends with the last person you kissed?
Yes!
38: what do your friends call you?
By my first name. or sometimes nerd
39: has anyone upset you in the last week?
I guess, but the perturbation led to me reaching a higher energy state!
40: have you ever cried over a text?
Six years ago.
41: where’s your last bruise located?
I don’t think I’ve had a bruise for years honestly.
42: what is it from?
Hence I forget.
43: last time you wanted to be away from somewhere really bad?
Hmm, probably not for three years; the last time was when I accidentally wandered into an unfamiliar and possibly sketchy part of Edmonton at midnight through a long and curious chain of events!
44: who was the last person you were on the phone with?
My mom!
45: do you have a favourite pair of shoes?
I love converse more than my own sole. (hahaha.) But they break pretty frequently, so whatever pair of converse I’ve had for longest that hasn’t broken!
46: do you wear hats if your having a bad hair day?
No, I only wear one hat and only when I’m feeling cool enough for it because damn it’s way cooler than I am.
47: would you ever go bald if it was the style?
i really like hair for dramatic dance moves;;;;;
48: do you make supper for your family?
Sometimes when I’m home, but honestly my mom does way more of the cooking work even then! The balance is slowly shifting in my favour as I learn better to make things my mom likes and get less lazy!
49: does your bedroom have a door?
Yes.
50: top 3 web-pages?
mspaintadventures, metafilter, wikipedia!
51: do you know anyone who hates shopping?
Most of my extended family!
52: does anything on your body hurt?
No!
53: are goodbyes hard for you?
Hmm. Usually they’re really easy, but when I’m hanging out with really cool people I’m always so awkward about it because they’re like ‘Bye’! and I’m like “bye…” but I’m really thinking “WAIT BUT YOURE SO COOL WHY NOT JUST KEEP HANGING OUT”
54: what was the last beverage you spilled on yourself?
Green tea!
55: how is your hair?
Wet post-shower!
56: what do you usually do first in the morning?
grumble, pull my blanket up, attempt to get into a comfortable sleeping position, decide I oughtn’t sleep, turn on my phone, log on to metafilter
57: do you think two people can last forever?
Yes!!! Of course!
58: think back to january 2007, were you single?
I was 10.
59: green or purple grapes?
I’ve never given it thought, I just eat them!
60: when’s the next time you will give someone a big hug?
this reminds me that I have not given a big hug for a long time! so whenever I have the chance!
61: do you wish you were somewhere else right now?
I wish cambridge ma were magically transported into canadaland because canada’s great! But Cambridge is nice and I’m happy to be here as it is.
62: when will be the next time you text someone?
Hmm, likely the next five minutes.
63: where will you be 5 hours from now?
Sleeping!
64: what were you doing at 8 this morning.
Reading differential geometry;;;
65: this time last year, can you remember who you liked?
My then-girlfriend!
66: is there one person in your life that can always make you smile?
Tons of people! My friends K and X and S. B., for a start!
67: did you kiss or hug anyone today?
Non non mon ami
68: what was your last thought before you went to bed last night?
“Zeta functions…. or hater functions”
(my brain’s dumb thoughts inevitably involve math. im sorry.)
69: have you ever tried your hardest and then gotten disappointed in the end?
Not really! Not that everything has been Successful, but a lot of it has and when things haven’t, it’s let me move onto other things which it turns out are hella cool and I really like so **happy shrug**
70: how many windows are open on your computer?
Seven!
71: how many fingers do you have?
ten
72: what is your ringtone?
The default one… but I’ve never turned on my volume ringer, like, ever?
73: how old will you be in 5 months?
21
74: where is your mum right now?
In Edmonton, Alberta! I’ll see her in three days and I’m excited!
75: why aren’t you with the person you were first in love with or almost in love?
We lived 9 timezones apart with limited internet access, then turned out to have very different attachment styles! But we’re still like. almost best friends! And it turns out this is a much better dynamic for us!
76: have you held hands with somebody in the past three days?
Non non!
77: are you friends with the people you were friends with two years ago?
Yes! Almost all of them, though since I’ve moved countries sometimes not in the frequency I would like. Like I really wish I could talk to my friends B.D. and T.J. more! But they’re both in California time.
78: do you remember who you had a crush on in year 7?
A., who was one of my best friends but in retrospect I was definitely gay for;;;; (A. is most definitely Very Straight.)
79: is there anyone you know with the name mike?
Yes! My cousin! He’s a great guy and has two AMAZING AND CUTE KIDS who I really wish I could spend more time with!
80: have you ever fallen asleep in someones arms?
Yes!
81: how many people have you liked in the past three months?
Definitely one, maybe two?
82: has anyone seen you in your underwear in the last 3 days?
No!
83: will you talk to the person you like tonight?
No current crushes!
84: you’re drunk and yelling at hot guys/girls out of your car window, you’re with?
Yelling at people from car windows is not ok and very objectifying! I am not doing this and if my friends are they’re getting a stern talking to!
85: if your bf/gf was into drugs would you care?
Yes, it’s probably not a for sure dealbreaker but it is almost certainly a dealbreaker.
86: what was the most eventful thing that happened last time you went to see a movie?
THE MOVIE HAD AMAZING BABY SPIDERMAN and I got to get really excited with a friend about cute. baby. spiderman. who i love deeply.
87: who was your last received call from?
My mom! Honestly I only call my mom.
88: if someone gave you $1,000 to burn a butterfly over a candle, would you?
…no.
89: what is something you wish you had more of?
…knowledge of algebraic geometry? Ok, on a personal level, I wish I was better at talking to friends about personal issues?  I used to be really good at this, but over the past few years I just have been really fortunate and I haven’t really had many personal issues at all! So I’ve become used to dealing with them myself, and I haven’t been very good at this at all.
90: have you ever trusted someone too much?
People have betrayed my trust, yes. But I’m happy I trusted them because even if it didn’t work out, the alternative of being suspicious seems too dark!
91: do you sleep with your window open?
Yes! the outdoors is nice and i want it in me!
92: do you get along with girls?
Yes! most of my friends are women!
93: are you keeping a secret from someone who needs to know the truth?
No!
94: does sex mean love?
They’re definitely distinct concepts! That said, I would find it pretty impossible personally to consider sex without love!  Definitely that isn’t true for other people and that’s cool!
95: you’re locked in a room with the last person you kissed, is that a problem?
No! Except neither of us are good at breaking locks oops
96: have you ever kissed anyone with a lip ring?
No!
97: did you sleep alone this week?
Yes!
98: everybody has somebody that makes them happy, do you?
So many people in my life! My mom, my dog, my friends are all so awesome! I’m super happy and fortunate.
99: do you believe in love at first sight?
It can probably happen! But people are deeper than their appearance, and I think most successful love probably does not rely on first sight!
100: who was the last person that you pinky promise?
Probably a drunk dutch girl in hong kong who wanted me to pinky promise swear that we’d hang out more?
2 notes · View notes
toothextract · 6 years
Text
Detecting Link Manipulation and Spam with Domain Authority
Posted by rjonesx.
Over 7 years ago, while still an employee at Virante, Inc. (now Hive Digital), I wrote a post on Moz outlining some simple methods for detecting backlink manipulation by comparing one’s backlink profile to an ideal model based on Wikipedia. At the time, I was limited in the research I could perform because I was a consumer of the API, lacking access to deeper metrics, measurements, and methodologies to identify anomalies in backlink profiles. We used these techniques in spotting backlink manipulation with tools like Remove’em and Penguin Risk, but they were always handicapped by the limitations of consumer facing APIs. Moreover, they didn’t scale. It is one thing to collect all the backlinks for a site, even a large site, and judge every individual link for source type, quality, anchor text, etc. Reports like these can be accessed from dozens of vendors if you are willing to wait a few hours for the report to complete. But how do you do this for 30 trillion links every single day?
Since the launch of Link Explorer and my residency here at Moz, I have had the luxury of far less filtered data, giving me a far deeper, clearer picture of the tools available to backlink index maintainers to identify and counter manipulation. While I in no way intend to say that all manipulation can be detected, I want to outline just some of the myriad surprising methodologies to detect spam.
The general methodology
You don’t need to be a data scientist or a math nerd to understand this simple practice for identifying link spam. While there certainly is a great deal of math used in the execution of measuring, testing, and building practical models, the general gist is plainly understandable.
The first step is to get a good random sample of links from the web, which you can read about here. But let’s assume you have already finished that step. Then, for any property of those random links (DA, anchor text, etc.), you figure out what is normal or expected. Finally, you look for outliers and see if those correspond with something important – like sites that are manipulating the link graph, or sites that are exceptionally good. Let’s start with an easy example, link decay.
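To make the recipe concrete, here is a minimal Python sketch of the outlier step. This is not Moz’s actual pipeline; the baseline numbers and domain names are invented for illustration.

```python
import statistics

def find_outliers(baseline, site_values, z_cutoff=3.0):
    """Flag sites whose metric sits far outside what a random
    sample of the web says is 'normal' for that metric."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    flagged = []
    for site, value in site_values.items():
        z = (value - mean) / stdev  # standard score against the random baseline
        if abs(z) >= z_cutoff:
            flagged.append((site, value, z))
    return sorted(flagged, key=lambda t: abs(t[2]), reverse=True)

# Invented numbers: a metric (say, link decay rate) measured over a random
# sample of links, plus the same metric for a few sites under inspection.
baseline_decay = [0.21, 0.18, 0.25, 0.19, 0.22, 0.24, 0.20, 0.17]
sites = {"suspected-pbn.example": 0.00, "normal-blog.example": 0.21}

for site, value, z in find_outliers(baseline_decay, sites):
    print(f"{site}: metric={value:.2f}, z={z:+.1f}")
```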
Link decay and link spam
Link decay is the natural occurrence of links either dropping off the web or changing URLs. For example, if you get links after you send out a press release, you would expect some of those links to eventually disappear as the pages are archived or removed for being old. And, if you were to get a link from a blog post, you might expect to have a homepage link on the blog until that post is pushed to the second or third page by new posts.
But what if you bought your links? What if you own a large number of domains and all the sites link to each other? What if you use a PBN? These links tend not to decay. Exercising control over your inbound links often means that you keep them from ever decaying. Thus, we can create a simple hypothesis:
Hypothesis: The link decay rate of sites manipulating the link graph will differ from sites with natural link profiles.
The methodology for testing this hypothesis is just as we discussed before. We first figure out what is natural. What does a random site’s link decay rate look like? Well, we simply get a bunch of sites and record how fast links are deleted (we visit a page and see a link is gone) vs. their total number of links. We then can look for anomalies.
In this case of anomaly hunting, I’m going to make it really easy. No statistics, no math, just a quick look at what pops up when we first sort by Lowest Decay Rate and then sort by Highest Domain Authority to see who is at the tail-end of the spectrum.
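A minimal sketch of that double sort, assuming invented site records in place of the real index:

```python
# Invented site records standing in for the real index. "links_deleted"
# counts links we revisited and found gone; "links_seen" is the total.
sites = [
    {"domain": "site-a.example", "da": 52, "links_seen": 1200, "links_deleted": 0},
    {"domain": "site-b.example", "da": 48, "links_seen": 900,  "links_deleted": 310},
    {"domain": "site-c.example", "da": 55, "links_seen": 2000, "links_deleted": 4},
]

for site in sites:
    site["decay_rate"] = site["links_deleted"] / site["links_seen"]

# Sort by lowest decay rate first, then by highest DA, and eyeball the top:
suspects = sorted(sites, key=lambda s: (s["decay_rate"], -s["da"]))
for s in suspects:
    print(f"{s['domain']}: DA={s['da']}, decay={s['decay_rate']:.1%}")
```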
Success! Every example we see of a good DA score but 0 link decay appears to be powered by a link network of some sort. This is the Aha! moment of data science that is so fun. What is particularly interesting is that we find spam on both ends of the distribution — that is to say, sites that have 0 decay or near 100% decay rates both tend to be spammy. The first type tends to be part of a link network; the second type tends to spam their backlinks to sites others are spamming, so their links quickly shuffle off to other pages.
Of course, now we do the hard work of building a model that actually takes this into account and accurately reduces Domain Authority relative to the severity of the link spam. But you might be asking…
These sites don’t rank in Google — why do they have decent DAs in the first place?
Well, this is a common problem with training sets. DA is trained on sites that rank in Google so that we can figure out who will rank above who. However, historically, we haven’t (and no one to my knowledge in our industry has) taken into account random URLs that don’t rank at all. This is something we’re solving for in the new DA model set to launch in early March, so stay tuned, as this represents a major improvement on the way we calculate DA!
Spam Score distribution and link spam
One of the most exciting new additions to the upcoming Domain Authority 2.0 is the use of our Spam Score. Moz’s Spam Score is a link-blind (we don’t use links at all) metric that predicts the likelihood a domain will be indexed in Google. The higher the score, the worse the site.
Now, we could just ignore any links from sites with Spam Scores over 70 and call it a day, but it turns out that common link manipulation schemes leave behind fascinating patterns, waiting to be discovered with the same simple methodology: use a random sample of URLs to find out what a normal backlink profile looks like, then see if there are anomalies in the way Spam Score is distributed among the backlinks to a site. Let me show you just one.
It turns out that acting natural is really hard to do. Even the best attempts often fall short, as did this particularly pernicious link spam network. This network had haunted me for 2 years because it included a directory of the top million sites, so if you were one of those sites, you could see anywhere from 200 to 600 followed links show up in your backlink profile. I called it “The Globe” network. It was easy to look at the network and see what they were doing, but could we spot it automatically so that we could devalue other networks like it in the future? When we looked at the link profile of sites included in the network, the Spam Score distribution lit up like a Christmas tree.
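In code, comparing Spam Score distributions amounts to bucketing a site’s linking domains by Spam Score and measuring how far the shape drifts from the random-sample baseline. A hedged sketch; the expected distribution below is invented rather than measured:

```python
from collections import Counter

def spam_score_histogram(linking_domains):
    """Share of a site's linking root domains per Spam Score decile (0-9)."""
    counts = Counter(min(d["spam_score"] // 10, 9) for d in linking_domains)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(10)]

# Invented 'normal' shape from a random web sample: most links come from
# low-Spam-Score domains and taper off toward the high end.
expected = [0.42, 0.22, 0.13, 0.08, 0.05, 0.04, 0.03, 0.01, 0.01, 0.01]

def distribution_anomaly(observed, expected):
    """Sum of squared per-bucket differences; a mid/high-score spike
    like 'The Globe' network's lights this up."""
    return sum((o - e) ** 2 for o, e in zip(observed, expected))
```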
Most sites get the majority of their backlinks from low Spam Score domains and get fewer and fewer as the Spam Score of the domains goes up. But this link network couldn’t hide, because we were able to detect the sites in their network as having quality issues using Spam Score. If we had relied only on ignoring the bad Spam Score links, we would never have discovered this issue. Instead, we found a great classifier for finding sites that are likely to be penalized by Google for bad link building practices.
DA distribution and link spam
We can find similar patterns among sites with the distribution of inbound Domain Authority. It’s common for businesses seeking to increase their rankings to set minimum quality standards on their outreach campaigns, often DA30 and above. An unfortunate outcome of this is that what remains are glaring examples of sites with manipulated link profiles.
Let me take a moment and be clear here. A manipulated link profile is not necessarily against Google’s guidelines. If you do targeted PR outreach, it is reasonable to expect that such a distribution might occur without any attempt to manipulate the graph. However, the real question is whether Google wants sites that perform such outreach to perform better. If not, this glaring example of link manipulation is pretty easy for Google to dampen, if not ignore altogether.
A normal link graph for a site that is not targeting high link equity domains will have the majority of their links coming from DA0–10 sites, slightly fewer for DA10–20, and so on and so forth until there are almost no links from DA90+. This makes sense, as the web has far more low DA sites than high. But all the sites above have abnormal link distributions, which make it easy to detect and correct — at scale — link value.
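The same bucketing idea works for DA. A sketch, with the decile cut-off and the outreach test invented for illustration rather than taken from Moz’s actual classifier:

```python
def da_histogram(linking_domains):
    """Count linking root domains per DA decile (0-10, 10-20, ..., 90+)."""
    buckets = [0] * 10
    for d in linking_domains:
        buckets[min(d["da"] // 10, 9)] += 1
    return buckets

def looks_outreach_shaped(buckets, min_da=30):
    """Flag the telltale outreach pattern: more links from DA30+ domains
    than from everything below, the inverse of a natural profile."""
    cut = min_da // 10
    return sum(buckets[cut:]) > sum(buckets[:cut])
```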
Now, I want to be clear: these are not necessarily examples of violating Google’s guidelines. However, they are manipulations of the link graph. It’s up to you to determine whether you believe Google takes the time to differentiate between how the outreach was conducted that resulted in the abnormal link distribution.
What doesn’t work
For every type of link manipulation detection method we discover, we scrap dozens more. Some of these are actually quite surprising. Let me write about just one of the many.
The first surprising example was the ratio of nofollow to follow links. It seems pretty straightforward that comment, forum, and other types of spammers would end up accumulating lots of nofollowed links, thereby leaving a pattern that is easy to discern. Well, it turns out this is not true at all.
The ratio of nofollow to follow links turns out to be a poor indicator, as popular sites like facebook.com often have a higher ratio than even pure comment spammers. This is likely due to the use of widgets and beacons and the legitimate usage of popular sites like facebook.com in comments across the web. Of course, this isn’t always the case. There are some sites with 100% nofollow links and a high number of root linking domains. These anomalies, like “Comment Spammer 1,” can be detected quite easily, but as a general measurement the ratio does not serve as a good classifier for spam or ham.
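For completeness, the metric itself is trivial to compute, which is part of why it was tempting. The numbers below are invented to show how a popular, legitimate site can out-score a pure comment spammer:

```python
def nofollow_ratio(profile):
    """Share of a site's inbound links that are nofollowed."""
    total = profile["nofollow_links"] + profile["follow_links"]
    return profile["nofollow_links"] / total

# Invented counts: a widget-heavy popular site vs. a comment spammer.
popular_site = {"nofollow_links": 900_000, "follow_links": 400_000}
comment_spammer = {"nofollow_links": 5_000, "follow_links": 3_000}
print(nofollow_ratio(popular_site))     # ~0.69, yet perfectly legitimate
print(nofollow_ratio(comment_spammer))  # ~0.63, lower despite being spam
```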
So what’s next?
Moz is continually traversing the link graph looking for ways to improve Domain Authority using everything from basic linear algebra to complex neural networks. The goal in mind is simple: We want to make the best Domain Authority metric ever. We want a metric which users can trust in the long run to root out spam just like Google (and help you determine when you or your competitors are pushing the limits) while at the same time maintaining or improving correlations with rankings. Of course, we have no expectation of rooting out all spam — no one can do that. But we can do a better job. Led by the incomparable Neil Martinsen-Burrell, our metric will stand alone in the industry as the canonical method for measuring the likelihood a site will rank in Google.
We’re launching Domain Authority 2.0 on March 5th! Check out our helpful resources here, or sign up for our webinar this Thursday, February 21st for more info on how to communicate changes like this to clients and stakeholders:
Save my spot!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!
from https://dentistry01.wordpress.com/2019/02/18/detecting-link-manipulation-and-spam-with-domain-authority/
0 notes
moonshugar · 8 years
Link
v/pizzagate: Nancy Pelosi and husband accused of shipping little boys in for Harry Reid back in 2014... (submitted by Johnny3names on 12/13/2016; 80 upvotes, 5 downvotes, 31 comments)
I posted this over on the shipping thread already but... Idk how dependable this publication is but I just ran across a post from 2014 accusing Pelosi and her husband of using his business to ship in little boys for Harry Reid. Anybody know what business her husband’s in or a company name? His wikipedia page just makes some vague reference to American businesses with no real specifics.
“Well, today we found out, from the same unimpeachable whistleblower who gave us the information about Reid, that Nancy Pelosi has been Reid’s primary supplier of little boys. Apparently her husband has a multinational business and he’s been shipping in children for Harry.” http://conservativebyte.com/2014/03/harry-reids-pedophile-nancy-pelosi-supplying-children/
Harry Reid has been one of the asshole faces everywhere you look lately pushing one line of bullshit or another, Idk if this is it but he’s got some kind of baggage somewhere.
4 notes · View notes
thelmasirby32 · 4 years
Text
Interview with Lior Davidovitch, the founder of PUBLC
30-second summary:
The worldwide web is a clear reflection of all the shifts 2020 has brought, as businesses and marketers cut the majority of their budgets and pivot their strategies.
In light of the current scenario, businesses, digital marketers, and content creators continue to face some key problems around digital ad revenue, ad blocking, and more.
We caught up with Lior Davidovitch, the founder of PUBLC, an innovative search engine that reinvents user experience and technology.
PUBLC is a new search engine built by everyone, for everyone, that aspires to create an equally distributed web economy using blockchain token economics.
Read on to discover insights on how PUBLC serves a more equally distributed web economy using blockchain and token economics, generating a new and native revenue stream for online publishers.
The worldwide web is a clear reflection of all the shifts 2020 has brought, and as businesses and marketers cut the majority of their budgets and pivot strategies, these remain some of the key problems of today’s digital space:
Digital ad revenue has taken a hit due to ad blockers.
Online publishers struggle to find a native revenue model as an alternative to ad-based models, a struggle that only grows bigger now with the COVID-19 impact on the advertising industry. Digital ad revenue is declining as the use of ad blockers increases.
The Google and Facebook duopoly dominates over 60% of global ad revenue.
We caught up with Lior Davidovitch, the founder of PUBLC, a search engine that aims to reward the entire web ecosystem by creating an innovative, more equally distributed web economy using blockchain and token economics, generating a new and native revenue stream for online publishers.
Q1. Can you tell us about your background and journey towards becoming the founder of PUBLC?
I was always one of those kids that constantly thought of different business ideas and tried to invent things. The original idea for PUBLC started over 15 years ago when I was frustrated with the existing search engines. I always thought that people know best and there should be a way to add the human element to search for a better-organized web. Back then I had mocked up a few presentations and a family friend even connected me to a VC, but I was so young and had no idea what I was doing. Later in university, I wasn’t keen on academia and dropped out to start my journey as a start-up entrepreneur. In the beginning, I was just playing around with different ideas, and eventually, I saw that I’m always going back to the same original idea of creating a new search engine. I started completely from scratch, learning everything in the process, making mistakes, learning again, and building PUBLC layer by layer.
Q2. What was the biggest challenge you faced while setting up PUBLC? How did you solve it?
Just saying that you want to create a new search engine is a huge challenge by itself; doing it the way PUBLC does, creating a new search engine that completely reinvents the user experience and technology with a token-based business model, is an even bigger challenge! If you add new and complicated technologies like blockchain and AI to the mix, the challenge becomes even bigger. Plus, the fact that you’re doing it as a small self-funded startup makes it almost impossible! But eventually, we did it step by step, layer by layer, and built this platform that’s backed with AI and a blockchain financial infrastructure.
Q3. Can you give us a brief insight into PUBLC and your token economy?
PUBLC is a new type of search engine built by everyone, for everyone, that aspires to create an equally distributed web economy using blockchain token economics. You can think of PUBLC as a mix between Google and Wikipedia, where we combine human intelligence with artificial intelligence (AI) enabling users to categorize the content and “teach” PUBLC how to better organize the web, creating a new search experience, while also rewarding the users for participating in the process.
With regards to the token economics, on the one hand, our token, PUBLX, is granted by PUBLC as a reward to its community that contributes to PUBLC. On the other hand, the tokens are used as the only form of payment for PUBLC’s business services used by advertisers on the platform. This balance between supply and demand is what establishes the token value.
Token earners, be it publishers, brands, influencers, or content categorizers, can either use their token rewards to pay for any of our business services or exchange them on cryptocurrency exchanges, where the tokens can be bought by advertisers. So, we encourage everyone to check out PUBLC and discover how they could earn PUBLX tokens.
Q5. How can businesses use PUBLC? Any tips on how they can get started with PUBLC?
PUBLC was built with all the different actors of the web ecosystem in mind, as we believe PUBLC is a platform that is meant to serve everyone and reward them for the value they create. Businesses as online publishers, brands, and celebrities benefit from PUBLC as it gives them exposure to new audiences, drives traffic to their websites, and earns them revenue for every time a user clicks on their content and views it. Businesses can get started on PUBLC by submitting their website, categorizing their content, and curating their pages. Our job is to support all those people and help them better achieve their goals, so feel free to reach out to us, we would love to hear from you!
Q6. Would video content be sourced from platforms like YouTube?
Yes! PUBLC curates and displays video content that users upload on YouTube and other such sites. You’ll be surprised to know that we even reward sites like YouTube as they also provide value to the ecosystem for hosting all that content.
Q7. What are your future plans for PUBLC? Will you venture into the digital advertising aspect as well? If yes, we’re assuming it will be in-ecosystem currency of PUBLX tokens?
Besides inventing a new user experience and technology we also had to invent a new revenue model connected to our token economy – and that’s crucial to the success of the platform – having those PUBLX tokens that are given to everyone for their contribution have real-life value. In order to do that we built a new set of business services such as, promoted content, brand awareness, ecommerce, and more, which offer advertisers a new way to enhance their brand awareness or conversions in a native and organic way within the platform without compromising PUBLC’s user experience for the users. 
Our business services work differently than the way it’s done on traditional search engines, and it rethinks this traditional advertising model of just promoting ads over search queries. The usual method is good but it could be different. We put more focus on content and the user experience because when you get ads, whether it’s on Google, Facebook, or any other platform – as a user that harms your experience. We aim to deliver our business services in a very native and organic way that doesn’t harm the user experience. For example, PUBLC offers promoted content that is real, in the form of videos, articles, and other multimedia. These could be campaigns that not only provide advertisers with the worth of their money but also engage and add value for the users. Furthermore, we incorporate PUBLC’s community in the approval of ads, having them take part in flagging spam and fraud, and helping shape PUBLC revenue model.
Q8. Do you use citations? How does the web validate your resources?
We’re focused more on the human element of search. People add domains and content URLs, which are then approved by our community, and only then are they indexed and crawled, making our sources more credible. There are many parameters that our algorithm evaluates in order to rank content; to give you a better idea, I’ll share the three main elements:
Relevancy: How the content is relevant to the search query or the topic that is searched
Popularity: How many PUBLC users saw and clicked on the content
Content age: How old is it, when was it published
Users are the first gatekeepers of which content gets indexed on the PUBLC search engine. 
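PUBLC has not published its ranking formula, but purely as an illustration of how three such signals might be blended, here is a toy scoring function. Every weight, the log damping, and the freshness half-life are assumptions, not PUBLC’s real model:

```python
import math
import time

def rank_score(relevancy, clicks, published_ts,
               w_rel=0.6, w_pop=0.3, w_age=0.1, half_life_days=180):
    """Toy blend of the three signals named above: relevancy (0-1),
    popularity (PUBLC user clicks), and content age. All parameters
    here are invented for illustration."""
    age_days = (time.time() - published_ts) / 86400
    freshness = math.exp(-math.log(2) * age_days / half_life_days)
    popularity = math.log1p(clicks)  # damp runaway click counts
    return w_rel * relevancy + w_pop * popularity + w_age * freshness
```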
Q9. How does PUBLC’s search engine combine human intelligence and AI? Is it curated by people? How do you counter aspects of “subjectivity” and “bias”?
As I’ve mentioned before, I strongly believe that people know best. They would know best about what topic(s) the content is related to, or which search queries best describe the content. This is unlike how typical search engines work, which mainly analyze the text of the content. Having users add and categorize content on PUBLC works on a micro-scale for that specific content. However, when you add machine learning and AI to that, you can adapt on a much larger scale and learn about content categorization and indexing in a more precise, user-friendly, and genuine manner. Our search engine intends to broaden users’ horizons by reaching new content that they didn’t even know existed.
Yes, giving too much power to people could bring bias. But I’d like to refer to PUBLC along the lines of Wikipedia, where you have a large group of people editing a specific piece of content that could be very controversial, and they still find a way to do it. On PUBLC we have an entire system of reputation for users and publishers, so they’re always building their own reputation simultaneously.
For example, a user could build their reputation on the PUBLC platform for any niche, let’s assume, blockchain. Now if this user claims something about blockchain, the system considers their subject matter expertise and deems their claim right for crawling, categorizing, and indexing. 
We have everything validated by the users of the community. I think users very quickly know spam when they see it, so they wouldn’t approve spam-like content with the risk of lowering their reputation. We built this set of rules to incentivize people to do good and if they don’t play by the rules, they’re just going to lose.
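As a toy illustration of the reputation-weighted validation described here (the data structures, default weight, and threshold are all invented; PUBLC’s real system is not public):

```python
def claim_accepted(votes, topic, threshold=5.0):
    """Toy reputation-weighted approval. Each vote counts for the voter's
    reputation in the claim's topic; unknown users get a tiny default weight."""
    weight = sum(
        vote["reputation"].get(topic, 0.1) * (1 if vote["approve"] else -1)
        for vote in votes
    )
    return weight >= threshold

# Example: two high-reputation blockchain users approve, one new user objects.
votes = [
    {"reputation": {"blockchain": 4.0}, "approve": True},
    {"reputation": {"blockchain": 3.5}, "approve": True},
    {"reputation": {}, "approve": False},
]
print(claim_accepted(votes, "blockchain"))  # True: 4.0 + 3.5 - 0.1 >= 5.0
```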
Q10. Could you give us a small brief on how you’re dealing with privacy? Is there anything else that you’d like people to know about in terms of data privacy?
Privacy is one of PUBLC’s core, crucial elements. In fact, that’s the big problem with the web that we also saw fit to address. Platforms like Facebook and the others make their business out of the users’ data, and in a way compromises their privacy. That’s why there’s a huge loss of trust for users. One of our ambitions and aims is to use blockchain to enable users to have the best, most personal user experience while having 100% privacy. We are now building this element, and plan to have all of our users’ personal data stored on the blockchain and accessed only by them ensuring them with complete control over their data that’s also kept anonymous.
Users would have their own PUBLC IDs but there would be no way that I, the platform, or any of us could access that information. If a user personally chooses to share their information with advertisers and publishers to help them understand the user profile or engagement with their content – that too would be completely anonymized. 
Since it’s still the users’ data giving value to a business, they would also be rewarded for it. This way we help users earn some of the revenue made through that data. But again, it would be completely anonymized data, ensuring that businesses can’t trace a user to their real-life entity. That’s one of the great potentials of blockchain: the bandwidth to build platforms that are more focused on the privacy elements.
Q11. What are your predictions for search and SEO in 2020?
As I’m sure you can already guess, I believe search and SEO will be more focused on the human element, and that we will continue to see improvements in understanding the user’s intent. I think we will also see SEO become more accessible to creators, and more straightforward, without harming the creativity and user experience of the content, by providing tools for the best optimization and content categorization. One of the biggest problems I see today is that creative content creators are forced to focus their efforts on SEO rather than on creating better content. I hope that with PUBLC, creators could focus on creating creative content while having the tools to actively influence their content’s SEO, without having the two contradict one another. For me, the prediction would be – better user experience, better content, and hopefully a better web.
The post Interview with Lior Davidovitch, the founder of PUBLC appeared first on Search Engine Watch.
from Digital Marketing News https://www.searchenginewatch.com/2020/07/03/interview-with-lior-davidovitch-the-founder-of-publc/
0 notes
readersforum · 6 years
Text
Detecting Link Manipulation and Spam with Domain Authority
New Post has been published on http://www.readersforum.tk/detecting-link-manipulation-and-spam-with-domain-authority/
Detecting Link Manipulation and Spam with Domain Authority
Posted by rjonesx.
Over 7 years ago, while still an employee at Virante, Inc. (now Hive Digital), I wrote a post on Moz outlining some simple methods for detecting backlink manipulation by comparing one’s backlink profile to an ideal model based on Wikipedia. At the time, I was limited in the research I could perform because I was a consumer of the API, lacking access to deeper metrics, measurements, and methodologies to identify anomalies in backlink profiles. We used these techniques in spotting backlink manipulation with tools like Remove’em and Penguin Risk, but they were always handicapped by the limitations of consumer facing APIs. Moreover, they didn’t scale. It is one thing to collect all the backlinks for a site, even a large site, and judge every individual link for source type, quality, anchor text, etc. Reports like these can be accessed from dozens of vendors if you are willing to wait a few hours for the report to complete. But how do you do this for 30 trillion links every single day?
Since the launch of Link Explorer and my residency here at Moz, I have had the luxury of far less filtered data, giving me a far deeper, clearer picture of the tools available to backlink index maintainers to identify and counter manipulation. While I in no way intend to say that all manipulation can be detected, I want to outline just some of the myriad surprising methodologies to detect spam.
The general methodology
You don’t need to be a data scientist or a math nerd to understand this simple practice for identifying link spam. While there certainly is a great deal of math used in the execution of measuring, testing, and building practical models, the general gist is plainly understandable.
The first step is to get a good random sample of links from the web, which you can read about here. But let’s assume you have already finished that step. Then, for any property of those random links (DA, anchor text, etc.), you figure out what is normal or expected. Finally, you look for outliers and see if those correspond with something important – like sites that are manipulating the link graph, or sites that are exceptionally good. Let’s start with an easy example, link decay.
Link decay and link spam
Link decay is the natural occurrence of links either dropping off the web or changing URLs. For example, if you get links after you send out a press release, you would expect some of those links to eventually disappear as the pages are archived or removed for being old. And, if you were to get a link from a blog post, you might expect to have a homepage link on the blog until that post is pushed to the second or third page by new posts.
But what if you bought your links? What if you own a large number of domains and all the sites link to each other? What if you use a PBN? These links tend not to decay. Exercising control over your inbound links often means that you keep them from ever decaying. Thus, we can create a simple hypothesis:
Hypothesis: The link decay rate of sites manipulating the link graph will differ from sites with natural link profiles.
The methodology for testing this hypothesis is just as we discussed before. We first figure out what is natural. What does a random site’s link decay rate look like? Well, we simply get a bunch of sites and record how fast links are deleted (we visit a page and see a link is gone) vs. their total number of links. We then can look for anomalies.
In this case of anomaly hunting, I’m going to make it really easy. No statistics, no math, just a quick look at what pops up when we first sort by Lowest Decay Rate and then sort by Highest Domain Authority to see who is at the tail-end of the spectrum.
Success! Every example we see of a good DA score but 0 link decay appears to be powered by a link network of some sort. This is the Aha! moment of data science that is so fun. What is particularly interesting is we find spam on both ends of the distribution — that is to say, sites that have 0 decay or near 100% decay rates both tend to be spammy. The first type tends to be part of a link network, the second part tends to spam their backlinks to sites others are spamming, so their links quickly shuffle off to other pages.
Of course, now we do the hard work of building a model that actually takes this into account and accurately reduces Domain Authority relative to the severity of the link spam. But you might be asking…
These sites don’t rank in Google — why do they have decent DAs in the first place?
Well, this is a common problem with training sets. DA is trained on sites that rank in Google so that we can figure out who will rank above who. However, historically, we haven’t (and no one to my knowledge in our industry has) taken into account random URLs that don’t rank at all. This is something we’re solving for in the new DA model set to launch in early March, so stay tuned, as this represents a major improvement on the way we calculate DA!
Spam Score distribution and link spam
One of the most exciting new additions to the upcoming Domain Authority 2.0 is the use of our Spam Score. Moz’s Spam Score is a link-blind (we don’t use links at all) metric that predicts the likelihood a domain will be indexed in Google. The higher the score, the worse the site.
Now, we could just ignore any links from sites with Spam Scores over 70 and call it a day, but it turns out there are fascinating patterns left behind by common link manipulation schemes waiting to be discovered by using this simple methodology of using a random sample of URLs to find out what a normal backlink profile looks like, and then see if there are anomalies in the way Spam Score is distributed among the backlinks to a site. Let me show you just one.
It turns out that acting natural is really hard to do. Even the best attempts often fall short, as did this particularly pernicious link spam network. This network had haunted me for 2 years because it included a directory of the top million sites, so if you were one of those sites, you could see anywhere from 200 to 600 followed links show up in your backlink profile. I called it “The Globe” network. It was easy to look at the network and see what they were doing, but could we spot it automatically so that we could devalue other networks like it in the future? When we looked at the link profile of sites included in the network, the Spam Score distribution lit up like a Christmas tree.
Most sites get the majority of their backlinks from low Spam Score domains and get fewer and fewer as the Spam Score of the linking domains goes up. But this link network couldn't hide, because we were able to detect the sites in its network as having quality issues using Spam Score. If we had relied only on ignoring the bad Spam Score links, we would never have discovered this issue. Instead, we found a great classifier for sites that are likely to be penalized by Google for bad link building practices.
DA distribution and link spam
We can find similar patterns among sites with the distribution of inbound Domain Authority. It’s common for businesses seeking to increase their rankings to set minimum quality standards on their outreach campaigns, often DA30 and above. An unfortunate outcome of this is that what remains are glaring examples of sites with manipulated link profiles.
Let me take a moment and be clear here. A manipulated link profile is not necessarily against Google’s guidelines. If you do targeted PR outreach, it is reasonable to expect that such a distribution might occur without any attempt to manipulate the graph. However, the real question is whether Google wants sites that perform such outreach to perform better. If not, this glaring example of link manipulation is pretty easy for Google to dampen, if not ignore altogether.
A normal link graph for a site that is not targeting high link equity domains will have the majority of its links coming from DA0–10 sites, slightly fewer from DA10–20, and so on until there are almost no links from DA90+. This makes sense, as the web has far more low-DA sites than high. But the sites in these examples have abnormal link distributions, which make it easy to detect and correct link value — at scale.
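Detecting that inversion is straightforward once you bucket a site's root linking domains by DA. A sketch with invented values:

```python
import numpy as np

# Invented DA values for one site's root linking domains. A natural profile
# is bottom-heavy; an outreach campaign with a DA30 floor is not.
linking_da = np.array([34, 41, 38, 52, 47, 33, 61, 45, 39, 55, 48, 36])

counts, _ = np.histogram(linking_da, bins=np.arange(0, 101, 10))
low, high = counts[:3].sum(), counts[3:].sum()  # DA 0-30 vs. DA 30+

if high > low:
    print("More DA30+ than DA0-30 linking domains - abnormal, worth review")
```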
Now, I want to be clear: these are not necessarily examples of violating Google's guidelines, but they are manipulations of the link graph. It's up to you to decide whether you believe Google takes the time to differentiate between the kinds of outreach that produce these abnormal link distributions.
What doesn’t work
For every link manipulation detection method we discover, we scrap dozens more. Some of the failures are actually quite surprising. Let me describe just one of many.
The first surprising example was the ratio of nofollow to follow links. It seems pretty straightforward that comment, forum, and other types of spammers would end up accumulating lots of nofollowed links, thereby leaving a pattern that is easy to discern. Well, it turns out this is not true at all.
The ratio of nofollow to follow links turns out to be a poor indicator, as popular sites like facebook.com often have a higher ratio than even pure comment spammers. This is likely due to widgets and beacons, and to the legitimate use of links to popular sites in comments across the web. Of course, this isn't always the case. There are some sites with 100% nofollow links and a high number of root linking domains. These anomalies, like "Comment Spammer 1," can be detected quite easily, but as a general measurement the ratio does not serve as a good classifier for spam or ham.
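For what it's worth, here is the rejected feature in miniature. The counts are invented to mirror the pattern described above: a legitimate giant and a comment spammer can land on nearly identical ratios, so the ratio alone can't separate them.

```python
# Invented link counts illustrating why the nofollow:follow ratio fails.
profiles = {
    "popular-network.example": {"nofollow": 9_000_000, "follow": 3_000_000},
    "comment-spammer.example": {"nofollow": 12_000,    "follow": 4_000},
}

for name, links in profiles.items():
    ratio = links["nofollow"] / links["follow"]
    print(f"{name}: nofollow/follow = {ratio:.1f}")  # both print 3.0
```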
So what’s next?
Moz is continually traversing the link graph looking for ways to improve Domain Authority, using everything from basic linear algebra to complex neural networks. The goal is simple: we want to make the best Domain Authority metric ever. We want a metric that users can trust in the long run to root out spam just like Google (and help you determine when you or your competitors are pushing the limits) while at the same time maintaining or improving correlations with rankings. Of course, we have no expectation of rooting out all spam — no one can do that. But we can do a better job. Led by the incomparable Neil Martinsen-Burrell, our metric will stand alone in the industry as the canonical method for measuring the likelihood a site will rank in Google.
We're launching Domain Authority 2.0 on March 5th! Check out our helpful resources here, or sign up for our webinar this Thursday, February 21st, for more info on how to communicate changes like this to clients and stakeholders.
christinesumpmg · 6 years
Text
Detecting Link Manipulation and Spam with Domain Authority
Posted by rjonesx.
Over 7 years ago, while still an employee at Virante, Inc. (now Hive Digital), I wrote a post on Moz outlining some simple methods for detecting backlink manipulation by comparing one's backlink profile to an ideal model based on Wikipedia. At the time, I was limited in the research I could perform because I was a consumer of the API, lacking access to deeper metrics, measurements, and methodologies to identify anomalies in backlink profiles. We used these techniques in spotting backlink manipulation with tools like Remove'em and Penguin Risk, but they were always handicapped by the limitations of consumer facing APIs. Moreover, they didn't scale. It is one thing to collect all the backlinks for a site, even a large site, and judge every individual link for source type, quality, anchor text, etc. Reports like these can be accessed from dozens of vendors if you are willing to wait a few hours for the report to complete. But how do you do this for 30 trillion links every single day?
Since the launch of Link Explorer and my residency here at Moz, I have had the luxury of far less filtered data, giving me a far deeper, clearer picture of the tools available to backlink index maintainers to identify and counter manipulation. While I in no way intend to say that all manipulation can be detected, I want to outline just some of the myriad surprising methodologies to detect spam.
The general methodology
You don't need to be a data scientist or a math nerd to understand this simple practice for identifying link spam. While there certainly is a great deal of math used in the execution of measuring, testing, and building practical models, the general gist is plainly understandable.
The first step is to get a good random sample of links from the web, which you can read about here. But let's assume you have already finished that step. Then, for any property of those random links (DA, anchor text, etc.), you figure out what is normal or expected. Finally, you look for outliers and see if those correspond with something important - like sites that are manipulating the link graph, or sites that are exceptionally good. Let's start with an easy example, link decay.
Link decay and link spam
Link decay is the natural occurrence of links either dropping off the web or changing URLs. For example, if you get links after you send out a press release, you would expect some of those links to eventually disappear as the pages are archived or removed for being old. And, if you were to get a link from a blog post, you might expect to have a homepage link on the blog until that post is pushed to the second or third page by new posts.
But what if you bought your links? What if you own a large number of domains and all the sites link to each other? What if you use a PBN? These links tend not to decay. Exercising control over your inbound links often means that you keep them from ever decaying. Thus, we can create a simple hypothesis:
Hypothesis: The link decay rate of sites manipulating the link graph will differ from sites with natural link profiles.
The methodology for testing this hypothesis is just as we discussed before. We first figure out what is natural. What does a random site's link decay rate look like? Well, we simply get a bunch of sites and record how fast links are deleted (we visit a page and see a link is gone) vs. their total number of links. We then can look for anomalies.
In this case of anomaly hunting, I'm going to make it really easy. No statistics, no math, just a quick look at what pops up when we first sort by Lowest Decay Rate and then sort by Highest Domain Authority to see who is at the tail-end of the spectrum.
Success! Every example we see of a good DA score but 0 link decay appears to be powered by a link network of some sort. This is the Aha! moment of data science that is so fun. What is particularly interesting is we find spam on both ends of the distribution — that is to say, sites that have 0 decay or near 100% decay rates both tend to be spammy. The first type tends to be part of a link network, the second part tends to spam their backlinks to sites others are spamming, so their links quickly shuffle off to other pages.
Of course, now we do the hard work of building a model that actually takes this into account and accurately reduces Domain Authority relative to the severity of the link spam. But you might be asking...
These sites don't rank in Google — why do they have decent DAs in the first place?
Well, this is a common problem with training sets. DA is trained on sites that rank in Google so that we can figure out who will rank above who. However, historically, we haven't (and no one to my knowledge in our industry has) taken into account random URLs that don't rank at all. This is something we're solving for in the new DA model set to launch in early March, so stay tuned, as this represents a major improvement on the way we calculate DA!
Spam Score distribution and link spam
One of the most exciting new additions to the upcoming Domain Authority 2.0 is the use of our Spam Score. Moz's Spam Score is a link-blind (we don't use links at all) metric that predicts the likelihood a domain will be indexed in Google. The higher the score, the worse the site.
Now, we could just ignore any links from sites with Spam Scores over 70 and call it a day, but it turns out there are fascinating patterns left behind by common link manipulation schemes waiting to be discovered by using this simple methodology of using a random sample of URLs to find out what a normal backlink profile looks like, and then see if there are anomalies in the way Spam Score is distributed among the backlinks to a site. Let me show you just one.
It turns out that acting natural is really hard to do. Even the best attempts often fall short, as did this particularly pernicious link spam network. This network had haunted me for 2 years because it included a directory of the top million sites, so if you were one of those sites, you could see anywhere from 200 to 600 followed links show up in your backlink profile. I called it "The Globe" network. It was easy to look at the network and see what they were doing, but could we spot it automatically so that we could devalue other networks like it in the future? When we looked at the link profile of sites included in the network, the Spam Score distribution lit up like a Christmas tree.
Most sites get the majority of their backlinks from low Spam Score domains and get fewer and fewer as the Spam Score of the domains go up. But this link network couldn't hide because we were able to detect the sites in their network as having quality issues using Spam Score. If we relied only on ignoring the bad Spam Score links, we would have never discovered this issue. Instead, we found a great classifier for finding sites that are likely to be penalized by Google for bad link building practices.
DA distribution and link spam
We can find similar patterns among sites with the distribution of inbound Domain Authority. It's common for businesses seeking to increase their rankings to set minimum quality standards on their outreach campaigns, often DA30 and above. An unfortunate outcome of this is that what remains are glaring examples of sites with manipulated link profiles.
Let me take a moment and be clear here. A manipulated link profile is not necessarily against Google's guidelines. If you do targeted PR outreach, it is reasonable to expect that such a distribution might occur without any attempt to manipulate the graph. However, the real question is whether Google wants sites that perform such outreach to perform better. If not, this glaring example of link manipulation is pretty easy for Google to dampen, if not ignore altogether.
A normal link graph for a site that is not targeting high link equity domains will have the majority of their links coming from DA0–10 sites, slightly fewer for DA10–20, and so on and so forth until there are almost no links from DA90+. This makes sense, as the web has far more low DA sites than high. But all the sites above have abnormal link distributions, which make it easy to detect and correct — at scale — link value.
Now, I want to be clear: these are not necessarily examples of violating Google's guidelines. However, they are manipulations of the link graph. It's up to you to determine whether you believe Google takes the time to differentiate between how the outreach was conducted that resulted in the abnormal link distribution.
What doesn't work
For every type of link manipulation detection method we discover, we scrap dozens more. Some of these are actually quite surprising. Let me write about just one of the many.
The first surprising example was the ratio of nofollow to follow links. It seems pretty straightforward that comment, forum, and other types of spammers would end up accumulating lots of nofollowed links, thereby leaving a pattern that is easy to discern. Well, it turns out this is not true at all.
The ratio of nofollow to follow links turns out to be a poor indicator, as popular sites like facebook.com often have a higher ratio than even pure comment spammers. This is likely due to the use of widgets and beacons and the legitimate usage of popular sites like facebook.com in comments across the web. Of course, this isn't always the case. There are some sites with 100% nofollow links and a high number of root linking domains. These anomalies, like "Comment Spammer 1," can be detected quite easily, but as a general measurement the ratio does not serve as a good classifier for spam or ham.
So what's next?
Moz is continually traversing the the link graph looking for ways to improve Domain Authority using everything from basic linear algebra to complex neural networks. The goal in mind is simple: We want to make the best Domain Authority metric ever. We want a metric which users can trust in the long run to root out spam just like Google (and help you determine when you or your competitors are pushing the limits) while at the same time maintaining or improving correlations with rankings. Of course, we have no expectation of rooting out all spam — no one can do that. But we can do a better job. Led by the incomparable Neil Martinsen-Burrell, our metric will stand alone in the industry as the canonical method for measuring the likelihood a site will rank in Google.
We're launching Domain Authority 2.0 on March 5th! Check out our helpful resources here, or sign up for our webinar this Thursday, February 21st for more info on how to communicate changes like this to clients and stakeholders:
Save my spot!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
http://bit.ly/2DNEo3T
0 notes
tracisimpson · 6 years
Text
Detecting Link Manipulation and Spam with Domain Authority
Posted by rjonesx.
Over 7 years ago, while still an employee at Virante, Inc. (now Hive Digital), I wrote a post on Moz outlining some simple methods for detecting backlink manipulation by comparing one's backlink profile to an ideal model based on Wikipedia. At the time, I was limited in the research I could perform because I was a consumer of the API, lacking access to deeper metrics, measurements, and methodologies to identify anomalies in backlink profiles. We used these techniques in spotting backlink manipulation with tools like Remove'em and Penguin Risk, but they were always handicapped by the limitations of consumer facing APIs. Moreover, they didn't scale. It is one thing to collect all the backlinks for a site, even a large site, and judge every individual link for source type, quality, anchor text, etc. Reports like these can be accessed from dozens of vendors if you are willing to wait a few hours for the report to complete. But how do you do this for 30 trillion links every single day?
Since the launch of Link Explorer and my residency here at Moz, I have had the luxury of far less filtered data, giving me a far deeper, clearer picture of the tools available to backlink index maintainers to identify and counter manipulation. While I in no way intend to say that all manipulation can be detected, I want to outline just some of the myriad surprising methodologies to detect spam.
The general methodology
You don't need to be a data scientist or a math nerd to understand this simple practice for identifying link spam. While there certainly is a great deal of math used in the execution of measuring, testing, and building practical models, the general gist is plainly understandable.
The first step is to get a good random sample of links from the web, which you can read about here. But let's assume you have already finished that step. Then, for any property of those random links (DA, anchor text, etc.), you figure out what is normal or expected. Finally, you look for outliers and see if those correspond with something important - like sites that are manipulating the link graph, or sites that are exceptionally good. Let's start with an easy example, link decay.
Link decay and link spam
Link decay is the natural occurrence of links either dropping off the web or changing URLs. For example, if you get links after you send out a press release, you would expect some of those links to eventually disappear as the pages are archived or removed for being old. And, if you were to get a link from a blog post, you might expect to have a homepage link on the blog until that post is pushed to the second or third page by new posts.
But what if you bought your links? What if you own a large number of domains and all the sites link to each other? What if you use a PBN? These links tend not to decay. Exercising control over your inbound links often means that you keep them from ever decaying. Thus, we can create a simple hypothesis:
Hypothesis: The link decay rate of sites manipulating the link graph will differ from sites with natural link profiles.
The methodology for testing this hypothesis is just as we discussed before. We first figure out what is natural. What does a random site's link decay rate look like? Well, we simply get a bunch of sites and record how fast links are deleted (we visit a page and see a link is gone) vs. their total number of links. We then can look for anomalies.
In this case of anomaly hunting, I'm going to make it really easy. No statistics, no math, just a quick look at what pops up when we first sort by Lowest Decay Rate and then sort by Highest Domain Authority to see who is at the tail-end of the spectrum.
Success! Every example we see of a good DA score but 0 link decay appears to be powered by a link network of some sort. This is the Aha! moment of data science that is so fun. What is particularly interesting is we find spam on both ends of the distribution — that is to say, sites that have 0 decay or near 100% decay rates both tend to be spammy. The first type tends to be part of a link network, the second part tends to spam their backlinks to sites others are spamming, so their links quickly shuffle off to other pages.
Of course, now we do the hard work of building a model that actually takes this into account and accurately reduces Domain Authority relative to the severity of the link spam. But you might be asking...
These sites don't rank in Google — why do they have decent DAs in the first place?
Well, this is a common problem with training sets. DA is trained on sites that rank in Google so that we can figure out who will rank above who. However, historically, we haven't (and no one to my knowledge in our industry has) taken into account random URLs that don't rank at all. This is something we're solving for in the new DA model set to launch in early March, so stay tuned, as this represents a major improvement on the way we calculate DA!
Spam Score distribution and link spam
One of the most exciting new additions to the upcoming Domain Authority 2.0 is the use of our Spam Score. Moz's Spam Score is a link-blind (we don't use links at all) metric that predicts the likelihood a domain will be indexed in Google. The higher the score, the worse the site.
Now, we could just ignore any links from sites with Spam Scores over 70 and call it a day, but it turns out there are fascinating patterns left behind by common link manipulation schemes waiting to be discovered by using this simple methodology of using a random sample of URLs to find out what a normal backlink profile looks like, and then see if there are anomalies in the way Spam Score is distributed among the backlinks to a site. Let me show you just one.
It turns out that acting natural is really hard to do. Even the best attempts often fall short, as did this particularly pernicious link spam network. This network had haunted me for 2 years because it included a directory of the top million sites, so if you were one of those sites, you could see anywhere from 200 to 600 followed links show up in your backlink profile. I called it "The Globe" network. It was easy to look at the network and see what they were doing, but could we spot it automatically so that we could devalue other networks like it in the future? When we looked at the link profile of sites included in the network, the Spam Score distribution lit up like a Christmas tree.
Most sites get the majority of their backlinks from low Spam Score domains and get fewer and fewer as the Spam Score of the domains go up. But this link network couldn't hide because we were able to detect the sites in their network as having quality issues using Spam Score. If we relied only on ignoring the bad Spam Score links, we would have never discovered this issue. Instead, we found a great classifier for finding sites that are likely to be penalized by Google for bad link building practices.
DA distribution and link spam
We can find similar patterns among sites with the distribution of inbound Domain Authority. It's common for businesses seeking to increase their rankings to set minimum quality standards on their outreach campaigns, often DA30 and above. An unfortunate outcome of this is that what remains are glaring examples of sites with manipulated link profiles.
Let me take a moment and be clear here. A manipulated link profile is not necessarily against Google's guidelines. If you do targeted PR outreach, it is reasonable to expect that such a distribution might occur without any attempt to manipulate the graph. However, the real question is whether Google wants sites that perform such outreach to perform better. If not, this glaring example of link manipulation is pretty easy for Google to dampen, if not ignore altogether.
A normal link graph for a site that is not targeting high link equity domains will have the majority of their links coming from DA0–10 sites, slightly fewer for DA10–20, and so on and so forth until there are almost no links from DA90+. This makes sense, as the web has far more low DA sites than high. But all the sites above have abnormal link distributions, which make it easy to detect and correct — at scale — link value.
Now, I want to be clear: these are not necessarily examples of violating Google's guidelines. However, they are manipulations of the link graph. It's up to you to determine whether you believe Google takes the time to differentiate between how the outreach was conducted that resulted in the abnormal link distribution.
What doesn't work
For every type of link manipulation detection method we discover, we scrap dozens more. Some of these are actually quite surprising. Let me write about just one of the many.
The first surprising example was the ratio of nofollow to follow links. It seems pretty straightforward that comment, forum, and other types of spammers would end up accumulating lots of nofollowed links, thereby leaving a pattern that is easy to discern. Well, it turns out this is not true at all.
The ratio of nofollow to follow links turns out to be a poor indicator, as popular sites like facebook.com often have a higher ratio than even pure comment spammers. This is likely due to the use of widgets and beacons and the legitimate usage of popular sites like facebook.com in comments across the web. Of course, this isn't always the case. There are some sites with 100% nofollow links and a high number of root linking domains. These anomalies, like "Comment Spammer 1," can be detected quite easily, but as a general measurement the ratio does not serve as a good classifier for spam or ham.
So what's next?
Moz is continually traversing the the link graph looking for ways to improve Domain Authority using everything from basic linear algebra to complex neural networks. The goal in mind is simple: We want to make the best Domain Authority metric ever. We want a metric which users can trust in the long run to root out spam just like Google (and help you determine when you or your competitors are pushing the limits) while at the same time maintaining or improving correlations with rankings. Of course, we have no expectation of rooting out all spam — no one can do that. But we can do a better job. Led by the incomparable Neil Martinsen-Burrell, our metric will stand alone in the industry as the canonical method for measuring the likelihood a site will rank in Google.
We're launching Domain Authority 2.0 on March 5th! Check out our helpful resources here, or sign up for our webinar this Thursday, February 21st for more info on how to communicate changes like this to clients and stakeholders:
Save my spot!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
0 notes
christinesumpmg1 · 6 years
Text
Detecting Link Manipulation and Spam with Domain Authority
Posted by rjonesx.
Over 7 years ago, while still an employee at Virante, Inc. (now Hive Digital), I wrote a post on Moz outlining some simple methods for detecting backlink manipulation by comparing one's backlink profile to an ideal model based on Wikipedia. At the time, I was limited in the research I could perform because I was a consumer of the API, lacking access to deeper metrics, measurements, and methodologies to identify anomalies in backlink profiles. We used these techniques in spotting backlink manipulation with tools like Remove'em and Penguin Risk, but they were always handicapped by the limitations of consumer facing APIs. Moreover, they didn't scale. It is one thing to collect all the backlinks for a site, even a large site, and judge every individual link for source type, quality, anchor text, etc. Reports like these can be accessed from dozens of vendors if you are willing to wait a few hours for the report to complete. But how do you do this for 30 trillion links every single day?
Since the launch of Link Explorer and my residency here at Moz, I have had the luxury of far less filtered data, giving me a far deeper, clearer picture of the tools available to backlink index maintainers to identify and counter manipulation. While I in no way intend to say that all manipulation can be detected, I want to outline just some of the myriad surprising methodologies to detect spam.
The general methodology
You don't need to be a data scientist or a math nerd to understand this simple practice for identifying link spam. While there certainly is a great deal of math used in the execution of measuring, testing, and building practical models, the general gist is plainly understandable.
The first step is to get a good random sample of links from the web, which you can read about here. But let's assume you have already finished that step. Then, for any property of those random links (DA, anchor text, etc.), you figure out what is normal or expected. Finally, you look for outliers and see if those correspond with something important - like sites that are manipulating the link graph, or sites that are exceptionally good. Let's start with an easy example, link decay.
Link decay and link spam
Link decay is the natural occurrence of links either dropping off the web or changing URLs. For example, if you get links after you send out a press release, you would expect some of those links to eventually disappear as the pages are archived or removed for being old. And, if you were to get a link from a blog post, you might expect to have a homepage link on the blog until that post is pushed to the second or third page by new posts.
But what if you bought your links? What if you own a large number of domains and all the sites link to each other? What if you use a PBN? These links tend not to decay. Exercising control over your inbound links often means that you keep them from ever decaying. Thus, we can create a simple hypothesis:
Hypothesis: The link decay rate of sites manipulating the link graph will differ from sites with natural link profiles.
The methodology for testing this hypothesis is just as we discussed before. We first figure out what is natural. What does a random site's link decay rate look like? Well, we simply get a bunch of sites and record how fast links are deleted (we visit a page and see a link is gone) vs. their total number of links. We then can look for anomalies.
In this case of anomaly hunting, I'm going to make it really easy. No statistics, no math, just a quick look at what pops up when we first sort by Lowest Decay Rate and then sort by Highest Domain Authority to see who is at the tail-end of the spectrum.
Success! Every example we see of a good DA score but 0 link decay appears to be powered by a link network of some sort. This is the Aha! moment of data science that is so fun. What is particularly interesting is we find spam on both ends of the distribution — that is to say, sites that have 0 decay or near 100% decay rates both tend to be spammy. The first type tends to be part of a link network, the second part tends to spam their backlinks to sites others are spamming, so their links quickly shuffle off to other pages.
Of course, now we do the hard work of building a model that actually takes this into account and accurately reduces Domain Authority relative to the severity of the link spam. But you might be asking...
These sites don't rank in Google — why do they have decent DAs in the first place?
Well, this is a common problem with training sets. DA is trained on sites that rank in Google so that we can figure out who will rank above who. However, historically, we haven't (and no one to my knowledge in our industry has) taken into account random URLs that don't rank at all. This is something we're solving for in the new DA model set to launch in early March, so stay tuned, as this represents a major improvement on the way we calculate DA!
Spam Score distribution and link spam
One of the most exciting new additions to the upcoming Domain Authority 2.0 is the use of our Spam Score. Moz's Spam Score is a link-blind (we don't use links at all) metric that predicts the likelihood a domain will be indexed in Google. The higher the score, the worse the site.
Now, we could just ignore any links from sites with Spam Scores over 70 and call it a day, but it turns out there are fascinating patterns left behind by common link manipulation schemes waiting to be discovered by using this simple methodology of using a random sample of URLs to find out what a normal backlink profile looks like, and then see if there are anomalies in the way Spam Score is distributed among the backlinks to a site. Let me show you just one.
It turns out that acting natural is really hard to do. Even the best attempts often fall short, as did this particularly pernicious link spam network. This network had haunted me for 2 years because it included a directory of the top million sites, so if you were one of those sites, you could see anywhere from 200 to 600 followed links show up in your backlink profile. I called it "The Globe" network. It was easy to look at the network and see what they were doing, but could we spot it automatically so that we could devalue other networks like it in the future? When we looked at the link profile of sites included in the network, the Spam Score distribution lit up like a Christmas tree.
Most sites get the majority of their backlinks from low Spam Score domains and get fewer and fewer as the Spam Score of the domains go up. But this link network couldn't hide because we were able to detect the sites in their network as having quality issues using Spam Score. If we relied only on ignoring the bad Spam Score links, we would have never discovered this issue. Instead, we found a great classifier for finding sites that are likely to be penalized by Google for bad link building practices.
DA distribution and link spam
We can find similar patterns among sites with the distribution of inbound Domain Authority. It's common for businesses seeking to increase their rankings to set minimum quality standards on their outreach campaigns, often DA30 and above. An unfortunate outcome of this is that what remains are glaring examples of sites with manipulated link profiles.
Let me take a moment and be clear here. A manipulated link profile is not necessarily against Google's guidelines. If you do targeted PR outreach, it is reasonable to expect that such a distribution might occur without any attempt to manipulate the graph. However, the real question is whether Google wants sites that perform such outreach to perform better. If not, this glaring example of link manipulation is pretty easy for Google to dampen, if not ignore altogether.
A normal link graph for a site that is not targeting high link equity domains will have the majority of their links coming from DA0–10 sites, slightly fewer for DA10–20, and so on and so forth until there are almost no links from DA90+. This makes sense, as the web has far more low DA sites than high. But all the sites above have abnormal link distributions, which make it easy to detect and correct — at scale — link value.
Now, I want to be clear: these are not necessarily examples of violating Google's guidelines. However, they are manipulations of the link graph. It's up to you to determine whether you believe Google takes the time to differentiate between how the outreach was conducted that resulted in the abnormal link distribution.
What doesn't work
For every type of link manipulation detection method we discover, we scrap dozens more. Some of these are actually quite surprising. Let me write about just one of the many.
The first surprising example was the ratio of nofollow to follow links. It seems pretty straightforward that comment, forum, and other types of spammers would end up accumulating lots of nofollowed links, thereby leaving a pattern that is easy to discern. Well, it turns out this is not true at all.
The ratio of nofollow to follow links turns out to be a poor indicator, as popular sites like facebook.com often have a higher ratio than even pure comment spammers. This is likely due to the use of widgets and beacons and the legitimate usage of popular sites like facebook.com in comments across the web. Of course, this isn't always the case. There are some sites with 100% nofollow links and a high number of root linking domains. These anomalies, like "Comment Spammer 1," can be detected quite easily, but as a general measurement the ratio does not serve as a good classifier for spam or ham.
So what's next?
Moz is continually traversing the the link graph looking for ways to improve Domain Authority using everything from basic linear algebra to complex neural networks. The goal in mind is simple: We want to make the best Domain Authority metric ever. We want a metric which users can trust in the long run to root out spam just like Google (and help you determine when you or your competitors are pushing the limits) while at the same time maintaining or improving correlations with rankings. Of course, we have no expectation of rooting out all spam — no one can do that. But we can do a better job. Led by the incomparable Neil Martinsen-Burrell, our metric will stand alone in the industry as the canonical method for measuring the likelihood a site will rank in Google.
We're launching Domain Authority 2.0 on March 5th! Check out our helpful resources here, or sign up for our webinar this Thursday, February 21st for more info on how to communicate changes like this to clients and stakeholders:
Save my spot!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
http://bit.ly/2DNEo3T
0 notes
maryhare96 · 6 years
Text
Detecting Link Manipulation and Spam with Domain Authority
Posted by rjonesx.
Over 7 years ago, while still an employee at Virante, Inc. (now Hive Digital), I wrote a post on Moz outlining some simple methods for detecting backlink manipulation by comparing one's backlink profile to an ideal model based on Wikipedia. At the time, I was limited in the research I could perform because I was a consumer of the API, lacking access to deeper metrics, measurements, and methodologies to identify anomalies in backlink profiles. We used these techniques in spotting backlink manipulation with tools like Remove'em and Penguin Risk, but they were always handicapped by the limitations of consumer facing APIs. Moreover, they didn't scale. It is one thing to collect all the backlinks for a site, even a large site, and judge every individual link for source type, quality, anchor text, etc. Reports like these can be accessed from dozens of vendors if you are willing to wait a few hours for the report to complete. But how do you do this for 30 trillion links every single day?
Since the launch of Link Explorer and my residency here at Moz, I have had the luxury of far less filtered data, giving me a far deeper, clearer picture of the tools available to backlink index maintainers to identify and counter manipulation. While I in no way intend to say that all manipulation can be detected, I want to outline just some of the myriad surprising methodologies to detect spam.
The general methodology
You don't need to be a data scientist or a math nerd to understand this simple practice for identifying link spam. While there certainly is a great deal of math used in the execution of measuring, testing, and building practical models, the general gist is plainly understandable.
The first step is to get a good random sample of links from the web, which you can read about here. But let's assume you have already finished that step. Then, for any property of those random links (DA, anchor text, etc.), you figure out what is normal or expected. Finally, you look for outliers and see if those correspond with something important - like sites that are manipulating the link graph, or sites that are exceptionally good. Let's start with an easy example, link decay.
Link decay and link spam
Link decay is the natural occurrence of links either dropping off the web or changing URLs. For example, if you get links after you send out a press release, you would expect some of those links to eventually disappear as the pages are archived or removed for being old. And, if you were to get a link from a blog post, you might expect to have a homepage link on the blog until that post is pushed to the second or third page by new posts.
But what if you bought your links? What if you own a large number of domains and all the sites link to each other? What if you use a PBN? These links tend not to decay. Exercising control over your inbound links often means that you keep them from ever decaying. Thus, we can create a simple hypothesis:
Hypothesis: The link decay rate of sites manipulating the link graph will differ from sites with natural link profiles.
The methodology for testing this hypothesis is just as we discussed before. We first figure out what is natural. What does a random site's link decay rate look like? Well, we simply get a bunch of sites and record how fast links are deleted (we visit a page and see a link is gone) vs. their total number of links. We then can look for anomalies.
In this case of anomaly hunting, I'm going to make it really easy. No statistics, no math, just a quick look at what pops up when we first sort by Lowest Decay Rate and then sort by Highest Domain Authority to see who is at the tail-end of the spectrum.
Success! Every example we see of a good DA score but 0 link decay appears to be powered by a link network of some sort. This is the Aha! moment of data science that is so fun. What is particularly interesting is we find spam on both ends of the distribution — that is to say, sites that have 0 decay or near 100% decay rates both tend to be spammy. The first type tends to be part of a link network, the second part tends to spam their backlinks to sites others are spamming, so their links quickly shuffle off to other pages.
Of course, now we do the hard work of building a model that actually takes this into account and accurately reduces Domain Authority relative to the severity of the link spam. But you might be asking...
These sites don't rank in Google — why do they have decent DAs in the first place?
Well, this is a common problem with training sets. DA is trained on sites that rank in Google so that we can figure out who will rank above who. However, historically, we haven't (and no one to my knowledge in our industry has) taken into account random URLs that don't rank at all. This is something we're solving for in the new DA model set to launch in early March, so stay tuned, as this represents a major improvement on the way we calculate DA!
Spam Score distribution and link spam
One of the most exciting new additions to the upcoming Domain Authority 2.0 is the use of our Spam Score. Moz's Spam Score is a link-blind (we don't use links at all) metric that predicts the likelihood a domain will be indexed in Google. The higher the score, the worse the site.
Now, we could just ignore any links from sites with Spam Scores over 70 and call it a day, but it turns out there are fascinating patterns left behind by common link manipulation schemes waiting to be discovered by using this simple methodology of using a random sample of URLs to find out what a normal backlink profile looks like, and then see if there are anomalies in the way Spam Score is distributed among the backlinks to a site. Let me show you just one.
It turns out that acting natural is really hard to do. Even the best attempts often fall short, as did this particularly pernicious link spam network. This network had haunted me for 2 years because it included a directory of the top million sites, so if you were one of those sites, you could see anywhere from 200 to 600 followed links show up in your backlink profile. I called it "The Globe" network. It was easy to look at the network and see what they were doing, but could we spot it automatically so that we could devalue other networks like it in the future? When we looked at the link profile of sites included in the network, the Spam Score distribution lit up like a Christmas tree.
Most sites get the majority of their backlinks from low Spam Score domains and get fewer and fewer as the Spam Score of the domains go up. But this link network couldn't hide because we were able to detect the sites in their network as having quality issues using Spam Score. If we relied only on ignoring the bad Spam Score links, we would have never discovered this issue. Instead, we found a great classifier for finding sites that are likely to be penalized by Google for bad link building practices.
DA distribution and link spam
We can find similar patterns among sites with the distribution of inbound Domain Authority. It's common for businesses seeking to increase their rankings to set minimum quality standards on their outreach campaigns, often DA30 and above. An unfortunate outcome of this is that what remains are glaring examples of sites with manipulated link profiles.
Let me take a moment and be clear here. A manipulated link profile is not necessarily against Google's guidelines. If you do targeted PR outreach, it is reasonable to expect that such a distribution might occur without any attempt to manipulate the graph. However, the real question is whether Google wants sites that perform such outreach to perform better. If not, this glaring example of link manipulation is pretty easy for Google to dampen, if not ignore altogether.
A normal link graph for a site that is not targeting high link equity domains will have the majority of their links coming from DA0–10 sites, slightly fewer for DA10–20, and so on and so forth until there are almost no links from DA90+. This makes sense, as the web has far more low DA sites than high. But all the sites above have abnormal link distributions, which make it easy to detect and correct — at scale — link value.
Now, I want to be clear: these are not necessarily examples of violating Google's guidelines. However, they are manipulations of the link graph. It's up to you to determine whether you believe Google takes the time to differentiate between how the outreach was conducted that resulted in the abnormal link distribution.
What doesn't work
For every type of link manipulation detection method we discover, we scrap dozens more. Some of these are actually quite surprising. Let me write about just one of the many.
The first surprising example was the ratio of nofollow to follow links. It seems pretty straightforward that comment, forum, and other types of spammers would end up accumulating lots of nofollowed links, thereby leaving a pattern that is easy to discern. Well, it turns out this is not true at all.
The ratio of nofollow to follow links turns out to be a poor indicator, as popular sites like facebook.com often have a higher ratio than even pure comment spammers. This is likely due to the use of widgets and beacons and the legitimate usage of popular sites like facebook.com in comments across the web. Of course, this isn't always the case. There are some sites with 100% nofollow links and a high number of root linking domains. These anomalies, like "Comment Spammer 1," can be detected quite easily, but as a general measurement the ratio does not serve as a good classifier for spam or ham.
So what's next?
Moz is continually traversing the the link graph looking for ways to improve Domain Authority using everything from basic linear algebra to complex neural networks. The goal in mind is simple: We want to make the best Domain Authority metric ever. We want a metric which users can trust in the long run to root out spam just like Google (and help you determine when you or your competitors are pushing the limits) while at the same time maintaining or improving correlations with rankings. Of course, we have no expectation of rooting out all spam — no one can do that. But we can do a better job. Led by the incomparable Neil Martinsen-Burrell, our metric will stand alone in the industry as the canonical method for measuring the likelihood a site will rank in Google.
We're launching Domain Authority 2.0 on March 5th! Check out our helpful resources here, or sign up for our webinar this Thursday, February 21st for more info on how to communicate changes like this to clients and stakeholders:
Save my spot!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
http://bit.ly/2DNEo3T
0 notes
byronheeutgm · 6 years
Text
Detecting Link Manipulation and Spam with Domain Authority
Posted by rjonesx.
Over 7 years ago, while still an employee at Virante, Inc. (now Hive Digital), I wrote a post on Moz outlining some simple methods for detecting backlink manipulation by comparing one's backlink profile to an ideal model based on Wikipedia. At the time, I was limited in the research I could perform because I was a consumer of the API, lacking access to deeper metrics, measurements, and methodologies to identify anomalies in backlink profiles. We used these techniques in spotting backlink manipulation with tools like Remove'em and Penguin Risk, but they were always handicapped by the limitations of consumer facing APIs. Moreover, they didn't scale. It is one thing to collect all the backlinks for a site, even a large site, and judge every individual link for source type, quality, anchor text, etc. Reports like these can be accessed from dozens of vendors if you are willing to wait a few hours for the report to complete. But how do you do this for 30 trillion links every single day?
Since the launch of Link Explorer and my residency here at Moz, I have had the luxury of far less filtered data, giving me a far deeper, clearer picture of the tools available to backlink index maintainers to identify and counter manipulation. While I in no way intend to say that all manipulation can be detected, I want to outline just some of the myriad surprising methodologies to detect spam.
The general methodology
You don't need to be a data scientist or a math nerd to understand this simple practice for identifying link spam. While there certainly is a great deal of math used in the execution of measuring, testing, and building practical models, the general gist is plainly understandable.
The first step is to get a good random sample of links from the web, which you can read about here. But let's assume you have already finished that step. Then, for any property of those random links (DA, anchor text, etc.), you figure out what is normal or expected. Finally, you look for outliers and see if those correspond with something important - like sites that are manipulating the link graph, or sites that are exceptionally good. Let's start with an easy example, link decay.
Link decay and link spam
Link decay is the natural occurrence of links either dropping off the web or changing URLs. For example, if you get links after you send out a press release, you would expect some of those links to eventually disappear as the pages are archived or removed for being old. And, if you were to get a link from a blog post, you might expect to have a homepage link on the blog until that post is pushed to the second or third page by new posts.
But what if you bought your links? What if you own a large number of domains and all the sites link to each other? What if you use a PBN? These links tend not to decay. Exercising control over your inbound links often means that you keep them from ever decaying. Thus, we can create a simple hypothesis:
Hypothesis: The link decay rate of sites manipulating the link graph will differ from sites with natural link profiles.
The methodology for testing this hypothesis is just as we discussed before. We first figure out what is natural. What does a random site's link decay rate look like? Well, we simply get a bunch of sites and record how fast links are deleted (we visit a page and see a link is gone) vs. their total number of links. We then can look for anomalies.
In this case of anomaly hunting, I'm going to make it really easy. No statistics, no math, just a quick look at what pops up when we first sort by Lowest Decay Rate and then sort by Highest Domain Authority to see who is at the tail-end of the spectrum.
Success! Every example we see of a good DA score but 0 link decay appears to be powered by a link network of some sort. This is the Aha! moment of data science that is so fun. What is particularly interesting is we find spam on both ends of the distribution — that is to say, sites that have 0 decay or near 100% decay rates both tend to be spammy. The first type tends to be part of a link network, the second part tends to spam their backlinks to sites others are spamming, so their links quickly shuffle off to other pages.
Of course, now we do the hard work of building a model that actually takes this into account and accurately reduces Domain Authority relative to the severity of the link spam. But you might be asking...
These sites don't rank in Google — why do they have decent DAs in the first place?
Well, this is a common problem with training sets. DA is trained on sites that rank in Google so that we can figure out who will rank above who. However, historically, we haven't (and no one to my knowledge in our industry has) taken into account random URLs that don't rank at all. This is something we're solving for in the new DA model set to launch in early March, so stay tuned, as this represents a major improvement on the way we calculate DA!
Spam Score distribution and link spam
One of the most exciting new additions to the upcoming Domain Authority 2.0 is the use of our Spam Score. Moz's Spam Score is a link-blind (we don't use links at all) metric that predicts the likelihood a domain will be indexed in Google. The higher the score, the worse the site.
Now, we could just ignore any links from sites with Spam Scores over 70 and call it a day, but it turns out there are fascinating patterns left behind by common link manipulation schemes waiting to be discovered by using this simple methodology of using a random sample of URLs to find out what a normal backlink profile looks like, and then see if there are anomalies in the way Spam Score is distributed among the backlinks to a site. Let me show you just one.
It turns out that acting natural is really hard to do. Even the best attempts often fall short, as did this particularly pernicious link spam network. This network had haunted me for 2 years because it included a directory of the top million sites, so if you were one of those sites, you could see anywhere from 200 to 600 followed links show up in your backlink profile. I called it "The Globe" network. It was easy to look at the network and see what they were doing, but could we spot it automatically so that we could devalue other networks like it in the future? When we looked at the link profile of sites included in the network, the Spam Score distribution lit up like a Christmas tree.
Most sites get the majority of their backlinks from low Spam Score domains and get fewer and fewer as the Spam Score of the domains go up. But this link network couldn't hide because we were able to detect the sites in their network as having quality issues using Spam Score. If we relied only on ignoring the bad Spam Score links, we would have never discovered this issue. Instead, we found a great classifier for finding sites that are likely to be penalized by Google for bad link building practices.
DA distribution and link spam
We can find similar patterns among sites with the distribution of inbound Domain Authority. It's common for businesses seeking to increase their rankings to set minimum quality standards on their outreach campaigns, often DA30 and above. An unfortunate outcome of this is that what remains are glaring examples of sites with manipulated link profiles.
Let me take a moment and be clear here. A manipulated link profile is not necessarily against Google's guidelines. If you do targeted PR outreach, it is reasonable to expect that such a distribution might occur without any attempt to manipulate the graph. However, the real question is whether Google wants sites that perform such outreach to perform better. If not, this glaring example of link manipulation is pretty easy for Google to dampen, if not ignore altogether.
A normal link graph for a site that is not targeting high link equity domains will have the majority of their links coming from DA0–10 sites, slightly fewer for DA10–20, and so on and so forth until there are almost no links from DA90+. This makes sense, as the web has far more low DA sites than high. But all the sites above have abnormal link distributions, which make it easy to detect and correct — at scale — link value.
Now, I want to be clear: these are not necessarily examples of violating Google's guidelines, but they are manipulations of the link graph. It's up to you to determine whether you believe Google takes the time to differentiate between the kinds of outreach that produce such abnormal link distributions.
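Here is a hedged Python sketch of such a check. The DA30 split mirrors the outreach threshold mentioned above, but the exact cutoffs are illustrative assumptions rather than Moz's production values.

```python
def da_decile_counts(inbound_das):
    """Count linking root domains per DA decile (DA0-10, DA10-20, ...)."""
    counts = [0] * 10
    for da in inbound_das:
        counts[min(int(da) // 10, 9)] += 1
    return counts

def looks_outreach_shaped(inbound_das, min_links=50):
    """Flag profiles where DA30+ domains outnumber DA0-30 domains, the
    inverse of a natural, mostly decreasing distribution. The 50-link
    minimum and the DA30 split are illustrative guesses."""
    counts = da_decile_counts(inbound_das)
    if sum(counts) < min_links:
        return False  # too few linking domains to judge the shape
    return sum(counts[3:]) > sum(counts[:3])
```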
What doesn't work
For every link manipulation detection method we discover, we scrap dozens more, and some of the failures are actually quite surprising. Let me write about just one of them.
The first surprising example was the ratio of nofollow to follow links. It seems pretty straightforward that comment, forum, and other types of spammers would end up accumulating lots of nofollowed links, thereby leaving a pattern that is easy to discern. Well, it turns out this is not true at all.
The ratio of nofollow to follow links turns out to be a poor indicator, as popular sites like facebook.com often have a higher ratio than even pure comment spammers. This is likely due to widgets and beacons, and to the legitimate use of such popular sites in comments across the web. Of course, this isn't always the case: there are some sites with 100% nofollow links and a high number of linking root domains. These anomalies, like "Comment Spammer 1," can be detected quite easily, but as a general measurement the ratio does not serve as a good classifier for spam or ham.
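For completeness, here is a small Python sketch showing both the weak general signal and the one extreme case that still works; the 0.99 ratio and the 100-domain cutoff are illustrative assumptions.

```python
def nofollow_ratio(follow_count, nofollow_count):
    """Share of a profile's links that are nofollowed."""
    total = follow_count + nofollow_count
    return nofollow_count / total if total else 0.0

def pure_nofollow_anomaly(follow_count, nofollow_count, root_domains):
    """The raw ratio is a weak spam signal (facebook.com scores high too),
    but a near-100% nofollow profile spread across many linking root
    domains is still easy to flag. Thresholds are illustrative."""
    ratio = nofollow_ratio(follow_count, nofollow_count)
    return ratio > 0.99 and root_domains >= 100
```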
So what's next?
Moz is continually traversing the link graph looking for ways to improve Domain Authority, using everything from basic linear algebra to complex neural networks. The goal is simple: we want to make the best Domain Authority metric ever. We want a metric that users can trust in the long run to root out spam just like Google does (and to help you determine when you or your competitors are pushing the limits), while at the same time maintaining or improving correlations with rankings. Of course, we have no expectation of rooting out all spam — no one can do that. But we can do a better job. Led by the incomparable Neil Martinsen-Burrell, our metric will stand alone in the industry as the canonical method for measuring the likelihood that a site will rank in Google.
We're launching Domain Authority 2.0 on March 5th! Check out our helpful resources here, or sign up for our webinar this Thursday, February 21st for more info on how to communicate changes like this to clients and stakeholders:
Save my spot!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
http://bit.ly/2DNEo3T
0 notes